WebFetch AI Tool

File Location: inc/Engine/AI/Tools/Global/WebFetch.php

Registration: datamachine_global_tools filter (available to all AI agents – pipeline + chat)

Enables AI models to retrieve web page content from external websites for analysis, research, and content extraction. Built-in safeguards include URL validation and a content length limit.

Configuration

No Configuration Required: Tool is always available without external API keys or authentication setup.

Universal Availability: Accessible to all AI steps as a general-purpose tool for web content retrieval.

Parameters

| Parameter | Type   | Required | Description                      |
|-----------|--------|----------|----------------------------------|
| url       | string | Yes      | Valid HTTP or HTTPS URL to fetch |

Usage Examples

Basic Web Fetch:

$parameters = [
    'url' => 'https://example.com/article'
];

Article Analysis:

$parameters = [
    'url' => 'https://techcrunch.com/2024/01/15/ai-breakthrough'
];

Content Processing

HTML Processing: Retrieves the complete HTML of the page and returns it unfiltered, apart from the length limit below.

Content Limits: Content is capped at 50,000 characters to prevent excessive response sizes and processing overhead.

Safety Validation: Validates URL format and restricts to HTTP/HTTPS protocols only.

URL Validation

Format Validation: Uses PHP’s filter_var() with FILTER_VALIDATE_URL for strict URL validation.

Protocol Restrictions: Only HTTP and HTTPS URLs are accepted for security.

Error Handling: Clear error messages for invalid URL formats or unsupported protocols.
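The two checks above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name example_validate_fetch_url is hypothetical, and the error strings are copied from the error responses documented on this page. Note that FILTER_VALIDATE_URL on its own accepts non-HTTP schemes, which is why the scheme is checked separately.

```php
<?php
// Hypothetical helper for illustration; not the plugin's actual implementation.
// Returns null when the URL is acceptable, or an error message otherwise.
function example_validate_fetch_url($url): ?string
{
    if (!is_string($url) || trim($url) === '') {
        return 'URL parameter is required';
    }

    // FILTER_VALIDATE_URL checks overall syntax but accepts any scheme
    // (for example ftp://), so the scheme needs its own check.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return 'Invalid URL format. Must be a valid HTTP or HTTPS URL';
    }

    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return 'Invalid URL format. Must be a valid HTTP or HTTPS URL';
    }

    return null;
}
```

Returning the message rather than throwing keeps the sketch close to the array-based error responses the tool reports.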

Tool Response

Success Response:

[
    'success' => true,
    'data' => [
        'url' => 'https://example.com/page',
        'content' => 'Retrieved web page content...',
        'content_length' => 15420,
        'content_truncated' => false,
        'fetch_timestamp' => '2024-01-15 14:30:00'
    ],
    'tool_name' => 'web_fetch'
]

Success Response (Truncated):

[
    'success' => true,
    'data' => [
        'url' => 'https://example.com/very-long-page',
        'content' => 'Retrieved content truncated to 50,000 characters...',
        'content_length' => 50000,
        'content_truncated' => true,
        'original_length' => 87234,
        'fetch_timestamp' => '2024-01-15 14:30:00'
    ],
    'tool_name' => 'web_fetch'
]

Error Responses:

// Missing URL
[
    'success' => false,
    'error' => 'URL parameter is required',
    'tool_name' => 'web_fetch'
]

// Invalid URL format
[
    'success' => false,
    'error' => 'Invalid URL format. Must be a valid HTTP or HTTPS URL',
    'tool_name' => 'web_fetch'
]

// Fetch failure
[
    'success' => false,
    'error' => 'Failed to fetch URL: Connection timeout',
    'tool_name' => 'web_fetch'
]
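Because every response carries a success flag, calling code can branch on it before touching the data. The sketch below assumes only the response shapes shown above; the function name example_content_or_fail is hypothetical, and throwing an exception is one possible design, not something the tool itself does.

```php
<?php
// Hypothetical consumer of a web_fetch result array (illustration only).
function example_content_or_fail(array $result): string
{
    // Error responses carry 'success' => false and an 'error' string.
    if (($result['success'] ?? false) !== true) {
        $error = $result['error'] ?? 'unknown error';
        throw new RuntimeException('web_fetch failed: ' . $error);
    }

    // Success responses carry the page source under data.content.
    return $result['data']['content'];
}
```

A caller that prefers not to use exceptions could return null or the error string instead; the branch on success stays the same.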

Content Features

Complete HTML: Retrieves entire HTML content including markup, scripts, and styles.

Raw Content: No content filtering or extraction – provides complete page source for AI processing.

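Character Limit: Content longer than 50,000 characters is truncated automatically. The response then sets content_truncated to true and reports the pre-truncation size in original_length.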

Timestamp Tracking: Records fetch time for content freshness verification.
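The limit and timestamp behavior can be sketched as a small function that builds the data portion of a success response. The names example_build_fetch_data and EXAMPLE_MAX_CONTENT_CHARS are hypothetical. Two further assumptions: length is counted in bytes with strlen(), and the timestamp is UTC; the page does not say how the tool counts multibyte characters or which timezone it uses.

```php
<?php
// Hypothetical constant and helper for illustration; not the plugin's code.
const EXAMPLE_MAX_CONTENT_CHARS = 50000;

function example_build_fetch_data(string $url, string $content, int $limit = EXAMPLE_MAX_CONTENT_CHARS): array
{
    // Assumption: length is measured in bytes. A multibyte-aware version
    // would use mb_strlen() and mb_substr() instead.
    $original_length = strlen($content);
    $truncated = $original_length > $limit;

    if ($truncated) {
        $content = substr($content, 0, $limit);
    }

    $data = [
        'url' => $url,
        'content' => $content,
        'content_length' => strlen($content),
        'content_truncated' => $truncated,
    ];

    // original_length appears only when truncation happened.
    if ($truncated) {
        $data['original_length'] = $original_length;
    }

    // Assumption: UTC time in the 'Y-m-d H:i:s' format shown in the examples.
    $data['fetch_timestamp'] = gmdate('Y-m-d H:i:s');

    return $data;
}
```

The limit is a parameter here only so the behavior is easy to exercise with small inputs.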

Network Handling

WordPress HTTP API: Uses WordPress’s built-in wp_remote_get() for consistent network handling.

Timeout Management: Inherits the WordPress default request timeout (5 seconds for wp_remote_get() unless changed, for example through the http_request_timeout filter).

Error Reporting: Clear error messages for network failures, timeouts, and HTTP errors.
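A rough sketch of the request and its error paths follows. It is not the tool's actual code. The function name example_fetch_body, the injectable $http_get parameter (added so the sketch can run outside WordPress), and the exact wording of the HTTP status message are all assumptions. Inside WordPress the default transport is wp_remote_get(), which returns either a WP_Error or an array with body and response.code entries.

```php
<?php
// Hypothetical helper for illustration; not the plugin's implementation.
function example_fetch_body(string $url, ?callable $http_get = null): array
{
    // Default to the WordPress HTTP API; a callable can be injected for testing.
    $http_get = $http_get ?? 'wp_remote_get';
    $response = $http_get($url);

    // wp_remote_get() returns a WP_Error on network failures and timeouts.
    if ($response instanceof WP_Error) {
        return [
            'success' => false,
            'error' => 'Failed to fetch URL: ' . $response->get_error_message(),
            'tool_name' => 'web_fetch',
        ];
    }

    // Assumption: any non-2xx status is reported as a fetch failure.
    $status = (int) ($response['response']['code'] ?? 0);
    if ($status < 200 || $status >= 300) {
        return [
            'success' => false,
            'error' => 'Failed to fetch URL: HTTP ' . $status,
            'tool_name' => 'web_fetch',
        ];
    }

    return ['success' => true, 'body' => (string) ($response['body'] ?? '')];
}
```

In plugin code, is_wp_error(), wp_remote_retrieve_response_code(), and wp_remote_retrieve_body() are the usual way to perform the same checks.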

Use Cases

Content Analysis: Retrieve web pages for AI content analysis and insights.

Competitive Research: Analyze competitor websites and content strategies.

Reference Material: Fetch source material for fact-checking and reference.

Content Inspiration: Retrieve content from external sources for inspiration and ideas.

URL Processing: Extract and analyze content from URLs found in data sources.

Research Enhancement: Gather external information to enhance AI responses with current web data.

Performance Considerations

Content Limits: The 50,000-character limit prevents excessive memory usage and processing time.

Single Request: One HTTP request per tool call for efficient resource usage.

No Caching: Fresh content retrieval on every request for current information.

Truncation Handling: Graceful content truncation with clear indicators when limits are reached.

Safety Features

URL Validation: Strict URL format and protocol validation prevents malformed requests.

Protocol Restrictions: Only HTTP/HTTPS allowed – blocks file://, ftp://, and other protocols.

Error Boundaries: Comprehensive error handling prevents tool failures from breaking AI workflows.

Content Limits: Prevents resource exhaustion from extremely large web pages.

Success Messaging

Custom Success Messages: Uses the datamachine_tool_success_message filter to format successful results for AI conversations.

Content Length Reporting: Clear indication of content size and truncation status.

URL Confirmation: Confirms successful retrieval with original URL reference.

Timestamp Information: Provides fetch time for content freshness awareness.
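As an illustration of what such a message might contain, the sketch below formats the data portion of a success response into one line covering the URL, timestamp, size, and truncation status. The function name example_format_fetch_success and the message wording are hypothetical. How a formatter is attached to datamachine_tool_success_message depends on the arguments that filter passes, which this page does not specify.

```php
<?php
// Hypothetical formatter for illustration; the real message text may differ.
function example_format_fetch_success(array $data): string
{
    $message = sprintf(
        'Fetched %s at %s: %s characters',
        $data['url'],
        $data['fetch_timestamp'],
        number_format($data['content_length'])
    );

    // Mention truncation only when the response flags it.
    if (!empty($data['content_truncated'])) {
        $message .= sprintf(' (truncated from %s)', number_format($data['original_length']));
    }

    return $message . '.';
}
```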