Universal Web Scraper Test Command
WP-CLI command for testing the Universal Web Scraper handler with any target URL. Supports web pages, ICS feeds, and JSON APIs.
wp datamachine-events test-event-scraper --target_url=<url>
Parameters
Required
--target_url=<url>: The web page URL, ICS feed, or JSON API to test
Examples
Web Page Scraping
wp datamachine-events test-event-scraper --target_url=https://example.com/events
ICS Calendar Feeds
wp datamachine-events test-event-scraper --target_url=https://tockify.com/api/feeds/ics/calendar-name
Google Calendar Export
wp datamachine-events test-event-scraper --target_url=webcal://calendar.google.com/calendar/ical/...
Output
The command displays:
- Target URL: The URL being tested
- Extraction Details:
- Packet title
- Source type (e.g.,
wix_events,json_ld,raw_html) - Extraction method
- Event title and start date
- Venue name and address
- Status: OK (complete venue/address coverage) or WARNING (incomplete coverage)
- Warnings: Any extraction warnings encountered
Venue Coverage Warnings
The command evaluates venue data completeness:
- Missing venue name: Venue override required in flow configuration
- Missing address fields: Address, city, and state are required for geocoding
Raw HTML packets indicate AI extraction is needed for venue data.
Exit Codes
0: Command completed successfully1: Error (missing required parameter)
Use Cases
- Handler Testing: Verify the scraper works on a new venue website
- Extraction Debugging: Inspect raw extraction results before running a full pipeline. If extraction fails, the command outputs the full raw HTML to assist in troubleshooting.
- Coverage Assessment: Check if venue data will be complete after import
- Platform Detection: Identify which extractor is being used (Wix, Squarespace, etc.)
Reliability & Debugging
The test command is essential for verifying the scraper’s Smart Fallback and Browser Spoofing capabilities. When testing URLs known to have strict bot detection, observe the logs for "retrying with standard mode" to confirm the fallback is functioning correctly.
ICS Calendar Feed Support
The Universal Web Scraper now directly supports ICS/iCal feed URLs, replacing the deprecated ICS Calendar handler.
Supported ICS Formats
- Direct
.icsfiles - Tockify feeds
- Google Calendar exports
- Apple Calendar exports
- Outlook calendar exports
- Any standard ICS/iCal feed
ICS Handler Migration
The legacy ICS Calendar handler is deprecated. Migrate existing flows using:
wp datamachine-events migrate-handlers --handler=ics_calendar --dry-run
To perform the migration:
wp datamachine-events migrate-handlers --handler=ics_calendar
The migration tool automatically:
- Updates flow handler from
ics_calendartouniversal_web_scraper - Maps
feed_urlconfig tosource_url - Preserves all venue override and keyword filtering settings
- Validates configuration before applying changes
ICS-Specific Output
When testing ICS feeds, the command displays:
- Source type:
ics_feed - Timezone information: Calendar timezone and event-specific timezone
- Venue data: From ICS location field (optional venue override available)
- Time coverage warnings: For missing start/end times (common in all-day events)