Data Machine Architecture

Data Machine is an AI-first WordPress plugin that uses a Pipeline+Flow architecture for automated content processing and publication. It provides multi-provider AI integration with tool-first design patterns, centered around a reliability-first Single Item Execution Model.

Core Components

Pipeline+Flow System

  • Pipelines: Reusable templates containing step configurations
  • Flows: Configured instances of pipelines with scheduling
  • Jobs: Individual executions of flows with status tracking, each processing exactly one item
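The three concepts can be pictured as plain value objects. This sketch is illustrative only. `PipelineTemplate`, `FlowInstance`, `JobRecord` and all of their properties are hypothetical, including the `schedule` strings and the `handler` keys. They are not the plugin's classes or database columns.

```php
<?php
// Hypothetical value objects illustrating Pipeline -> Flow -> Job.
// None of these names come from the plugin; they only mirror the concepts above.

final class PipelineTemplate
{
    /** @param list<string> $stepTypes ordered step types of the reusable template */
    public function __construct(
        public readonly int $id,
        public readonly array $stepTypes,
    ) {}
}

final class FlowInstance
{
    /** @param array<int, array<string, mixed>> $stepConfig hypothetical handler settings keyed by step position */
    public function __construct(
        public readonly int $id,
        public readonly PipelineTemplate $pipeline,
        public readonly string $schedule,
        public readonly array $stepConfig,
    ) {}
}

final class JobRecord
{
    public string $status = 'pending';

    public function __construct(
        public readonly int $id,
        public readonly FlowInstance $flow,
        public readonly string $itemId, // exactly one item per job
    ) {}
}

// One reusable template backs two differently configured, scheduled flows.
$template = new PipelineTemplate(1, ['fetch', 'ai', 'publish']);
$hourly   = new FlowInstance(10, $template, 'hourly', [0 => ['handler' => 'rss']]);
$daily    = new FlowInstance(11, $template, 'daily', [0 => ['handler' => 'reddit']]);

// Each execution of a flow is a separate job tied to a single item.
$job = new JobRecord(100, $hourly, 'item-abc');
$job->status = 'completed';
```

The step list lives on the template, while handler settings and the schedule live on the flow. That split is what lets one pipeline back many flows.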

Execution Engine

The engine is built on a services layer that uses direct method calls rather than filter indirection. It implements a four-action execution cycle that processes exactly one item per job, so each job is isolated and a failure is confined to a single item.
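The single-item rule can be illustrated with a small sketch. A job scans the candidates returned by a fetch source, skips anything already recorded for that flow step, and processes exactly one new item. The in-memory `ProcessedItemsStore` stands in for the wp_datamachine_processed_items table. The class, the `runSingleItemJob()` helper and the `flowStepId|itemId` key format are all hypothetical.

```php
<?php
// Hypothetical sketch of the Single Item Execution Model with deduplication.
// ProcessedItemsStore is an in-memory stand-in for a processed-items table.

final class ProcessedItemsStore
{
    /** @var array<string, true> keys of items already processed, per flow step */
    private array $seen = [];

    public function has(string $flowStepId, string $itemId): bool
    {
        return isset($this->seen[$flowStepId . '|' . $itemId]);
    }

    public function mark(string $flowStepId, string $itemId): void
    {
        $this->seen[$flowStepId . '|' . $itemId] = true;
    }
}

/**
 * Process at most one item not yet processed for this flow step.
 *
 * @param list<string> $candidates item identifiers returned by a fetch source
 * @return string|null the processed item id, or null when nothing new was found
 */
function runSingleItemJob(string $flowStepId, array $candidates, ProcessedItemsStore $store, callable $process): ?string
{
    foreach ($candidates as $itemId) {
        if ($store->has($flowStepId, $itemId)) {
            continue; // handled by an earlier job
        }
        $process($itemId);
        // Record only after processing returns, so an item whose processing throws can be retried.
        $store->mark($flowStepId, $itemId);
        return $itemId; // stop after exactly one item
    }
    return null;
}

$store = new ProcessedItemsStore();
$log = [];
$collect = function (string $id) use (&$log): void {
    $log[] = $id;
};

$first  = runSingleItemJob('flow10_step0', ['a', 'b'], $store, $collect);
$second = runSingleItemJob('flow10_step0', ['a', 'b'], $store, $collect);
$third  = runSingleItemJob('flow10_step0', ['a', 'b'], $store, $collect);
```

Successive runs return 'a', then 'b', then null once the source has nothing new. Each job therefore performs one isolated unit of work.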

Database Schema

  • wp_datamachine_pipelines – Pipeline templates (reusable)
  • wp_datamachine_flows – Flow instances (scheduled + configured)
  • wp_datamachine_logs – Centralized system logs for all agent activity (@since v0.4.0)
  • wp_datamachine_processed_items – Deduplication tracking per execution

Engine Data Architecture

Clean Data Separation: AI agents receive clean data packets without URLs, while handlers access engine parameters through a centralized filter.

Enhanced Database Storage + Filter Access: Fetch handlers store engine parameters (source_url, image_url) in the database. Later steps retrieve them through the centralized datamachine_engine_data filter, which switches between storage and retrieval mode depending on how it is called.

Core Pattern:

php
// Fetch handlers store via centralized filter (array storage)
if ($job_id) {
    apply_filters('datamachine_engine_data', null, $job_id, [
        'source_url' => $source_url,
        'image_url' => $image_url
    ]);
}

// Steps retrieve via centralized filter (EngineData.php)
$engine_data = apply_filters('datamachine_engine_data', [], $job_id);
$source_url = $engine_data['source_url'] ?? null;
$image_url = $engine_data['image_url'] ?? null;
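The dual-mode behavior of datamachine_engine_data can be imitated outside WordPress with a small in-memory class. This sketch assumes the mode is chosen by whether a third (data) argument is passed. That matches the two call shapes shown above, but the source does not confirm it. `EngineDataSketch` and its `handle()` method are hypothetical, and the real filter persists data to the database keyed by job ID.

```php
<?php
// Hypothetical in-memory imitation of a dual-mode engine data filter.
// Storage mode: called with a data array, merges it into that job's record.
// Retrieval mode: called without data, returns that job's record or the default.

final class EngineDataSketch
{
    /** @var array<int, array<string, mixed>> engine data keyed by job ID */
    private array $byJob = [];

    /** @param array<string, mixed>|null $data parameters to store, or null to retrieve */
    public function handle(mixed $value, int $jobId, ?array $data = null): mixed
    {
        if ($data !== null) {
            // Storage mode: newer keys overwrite older ones for the same job.
            $this->byJob[$jobId] = array_merge($this->byJob[$jobId] ?? [], $data);
            return $value;
        }
        // Retrieval mode: fall back to the caller's default when nothing is stored.
        return $this->byJob[$jobId] ?? $value;
    }
}

$engine = new EngineDataSketch();

// A fetch step stores URLs that never enter the AI data packet.
$engine->handle(null, 42, ['source_url' => 'https://example.com/post', 'image_url' => null]);

// A later step reads them back by job ID.
$engineData = $engine->handle([], 42);
$sourceUrl  = $engineData['source_url'] ?? null;
$unknownJob = $engine->handle([], 99);
```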

Benefits:

  • Clean AI Data: AI processes content without URLs for better model performance
  • Centralized Access: Single filter interface for all engine data retrieval
  • Filter Consistency: Maintains architectural pattern of filter-based service discovery
  • Flexible Storage: Steps access only what they need via a single filter call

Services Layer Architecture (@since v0.4.0)

Performance Revolution: The filter-based action system was replaced entirely with OOP service managers. Their direct method calls deliver a 3x performance improvement. Most services have since been migrated to the WordPress 6.9 Abilities API.

Remaining Services (utilities for cross-cutting concerns):

  • JobManager – Job execution monitoring and management
  • LogsManager – Centralized log access and filtering
  • Cache Invalidation – Ability-level clearCache() methods for handlers, step types, tools, and settings
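The cache-invalidation pattern above can be sketched as an ability that memoizes an expensive lookup and is reset by a direct method call rather than a filter. Only the name clearCache() comes from the text. `HandlerRegistryAbility`, `getHandlers()`, the `$loads` counter and the loader closure are hypothetical.

```php
<?php
// Hypothetical ability-style class with a per-class cache and clearCache().
// The loader closure stands in for whatever discovery the real code performs.

final class HandlerRegistryAbility
{
    /** @var array<string, string>|null memoized handler list, null when not yet loaded */
    private ?array $cache = null;

    public int $loads = 0; // counts how often the loader actually ran

    /** @param \Closure(): array<string, string> $loader */
    public function __construct(private \Closure $loader) {}

    /** @return array<string, string> */
    public function getHandlers(): array
    {
        if ($this->cache === null) {
            $this->loads++;
            $this->cache = ($this->loader)();
        }
        return $this->cache;
    }

    public function clearCache(): void
    {
        $this->cache = null; // the next getHandlers() call reloads
    }
}

$ability = new HandlerRegistryAbility(fn (): array => ['rss' => 'fetch', 'bluesky' => 'publish']);
$ability->getHandlers();
$ability->getHandlers(); // served from the cache
$ability->clearCache();  // e.g. after a new handler registers
$handlers = $ability->getHandlers();
```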

Abilities API (business logic):

  • FlowAbilities – Flow CRUD operations, duplication
  • PipelineAbilities – Pipeline CRUD operations with complete/simple creation modes
  • FlowStepAbilities – Individual flow step configuration and handler management
  • PipelineStepAbilities – Pipeline step template management
  • ProcessedItemsAbilities – Deduplication tracking across workflows

Benefits:

  • 3x Performance Improvement: Direct method calls eliminate filter indirection
  • Centralized Business Logic: Consistent validation and error handling
  • Reduced Database Queries: Optimized data access patterns
  • Clean Architecture: Single responsibility per ability class
  • Backward Compatibility: Maintains WordPress hook integration

Step Types

  • Fetch: Data retrieval with clean content processing (Files, RSS, Reddit, Google Sheets, WordPress Local, WordPress Media, WordPress API)
  • AI: Content processing with multi-provider support (OpenAI, Anthropic, Google, Grok)
  • Publish: Content distribution with modular handler architecture (Twitter, Facebook, Threads, Bluesky, WordPress with specialized components)
  • Update: Content modification (WordPress posts/pages)

Authentication System

Base Authentication Provider Architecture (@since v0.2.6): A complete inheritance hierarchy centralizes option storage and validation for all authentication providers.

Base Classes:

  • BaseAuthProvider (/inc/Core/OAuth/BaseAuthProvider.php): Abstract base for all authentication providers with unified option storage, callback URL generation, and authentication state checking
  • BaseOAuth1Provider (/inc/Core/OAuth/BaseOAuth1Provider.php): OAuth 1.0a providers (TwitterAuth) extending BaseAuthProvider
  • BaseOAuth2Provider (/inc/Core/OAuth/BaseOAuth2Provider.php): OAuth 2.0 providers (RedditAuth, FacebookAuth, ThreadsAuth, GoogleSheetsAuth) extending BaseAuthProvider

OAuth Handlers:

  • OAuth1Handler (/inc/Core/OAuth/OAuth1Handler.php): Three-legged OAuth 1.0a flow implementation
  • OAuth2Handler (/inc/Core/OAuth/OAuth2Handler.php): OAuth 2.0 authorization code flow implementation

Authentication Providers:

  • OAuth 1.0a: TwitterAuth extends BaseOAuth1Provider
  • OAuth 2.0: RedditAuth, FacebookAuth, ThreadsAuth, GoogleSheetsAuth extend BaseOAuth2Provider
  • Direct: BlueskyAuth extends BaseAuthProvider (app password authentication)
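The inheritance scheme can be sketched as an abstract base class that owns option storage, callback URL generation and authentication-state checks, so each provider declares only what differs. This is a minimal sketch under stated assumptions. Options live in an in-memory array instead of WordPress options. The class names, every method name (`saveAccount()`, `getCallbackUrl()`, `isAuthenticated()`, `requiredKeys()`) and the callback URL shape are hypothetical.

```php
<?php
// Hypothetical sketch of a shared auth-provider base class.
// An in-memory array replaces WordPress option storage.

abstract class BaseAuthProviderSketch
{
    /** @var array<string, array<string, string>> account data keyed by provider slug */
    protected static array $options = [];

    public function __construct(protected string $providerSlug, protected string $siteUrl) {}

    /** Keys a stored account must contain for the provider to count as authenticated. */
    abstract protected function requiredKeys(): array;

    public function saveAccount(array $account): void
    {
        self::$options[$this->providerSlug] = $account;
    }

    public function getCallbackUrl(): string
    {
        // Illustrative URL shape only.
        return rtrim($this->siteUrl, '/') . '/datamachine-auth/' . $this->providerSlug . '/';
    }

    public function isAuthenticated(): bool
    {
        $account = self::$options[$this->providerSlug] ?? [];
        foreach ($this->requiredKeys() as $key) {
            if (empty($account[$key])) {
                return false;
            }
        }
        return true;
    }
}

// A direct (non-OAuth) provider only declares what differs from the base.
final class BlueskyAuthSketch extends BaseAuthProviderSketch
{
    public function __construct(string $siteUrl)
    {
        parent::__construct('bluesky', $siteUrl);
    }

    protected function requiredKeys(): array
    {
        return ['handle', 'app_password'];
    }
}

$bluesky = new BlueskyAuthSketch('https://example.com/');
$before = $bluesky->isAuthenticated();
$bluesky->saveAccount(['handle' => 'example.bsky.social', 'app_password' => 'xxxx-xxxx']);
$after = $bluesky->isAuthenticated();
$callback = $bluesky->getCallbackUrl();
```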

OAuth2 Flow:

  1. Create state nonce for CSRF protection
  2. Build authorization URL with parameters
  3. Handle callback: verify state, exchange code for token, retrieve account details, store credentials
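Steps 1 and 2, plus the state check in step 3, can be illustrated with standard PHP functions. This is a sketch, not the OAuth2Handler implementation:

  • State is kept in a plain array instead of WordPress storage.
  • Endpoint and client values are placeholders.
  • Function names are hypothetical.
  • The code-for-token exchange is omitted because it requires an HTTP call to the provider.

```php
<?php
// Hypothetical sketch of OAuth 2.0 authorization-code steps 1-3 (state handling only).

/** Step 1: create an unguessable state value and remember it for this user. */
function createState(array &$stateStore, int $userId): string
{
    $state = bin2hex(random_bytes(16));
    $stateStore[$userId] = $state;
    return $state;
}

/** Step 2: build the authorization URL with the RFC 6749 section 4.1.1 parameters. */
function buildAuthorizationUrl(string $endpoint, string $clientId, string $redirectUri, string $scope, string $state): string
{
    return $endpoint . '?' . http_build_query([
        'response_type' => 'code',
        'client_id'     => $clientId,
        'redirect_uri'  => $redirectUri,
        'scope'         => $scope,
        'state'         => $state,
    ], '', '&', PHP_QUERY_RFC3986);
}

/** Step 3 (first part): accept the callback only if the returned state matches. */
function verifyCallbackState(array &$stateStore, int $userId, string $returnedState): bool
{
    $expected = $stateStore[$userId] ?? null;
    unset($stateStore[$userId]); // single use, whether or not it matches
    return $expected !== null && hash_equals($expected, $returnedState);
}

$states = [];
$state = createState($states, 7);
$url = buildAuthorizationUrl('https://provider.example/oauth/authorize', 'client-123', 'https://example.com/callback', 'read write', $state);
$ok = verifyCallbackState($states, 7, $state);
$replay = verifyCallbackState($states, 7, $state); // already consumed
```

Consuming the state on first use means a replayed callback is rejected. hash_equals() compares in constant time.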

OAuth1 Flow:

  1. Get request token
  2. Build authorization URL
  3. Handle callback: validate parameters, exchange for access token, store credentials
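Every OAuth 1.0a request in this flow is signed, starting with the request-token call in step 1. Below is a sketch of the HMAC-SHA1 signature construction from RFC 5849 (section 3.4). The function name is hypothetical. The sketch leaves out nonce and timestamp generation and the HTTP request itself, which the text attributes to OAuth1Handler.

```php
<?php
// Sketch of OAuth 1.0a HMAC-SHA1 request signing (RFC 5849, section 3.4).
// $baseUrl must already be normalized: scheme, host and path, no query string.

/** @param array<string, string> $params oauth_* parameters plus any query/body parameters */
function oauth1Signature(string $method, string $baseUrl, array $params, string $consumerSecret, string $tokenSecret = ''): string
{
    // Percent-encode every name and value using RFC 3986 rules (rawurlencode).
    $encoded = [];
    foreach ($params as $name => $value) {
        $encoded[] = [rawurlencode((string) $name), rawurlencode((string) $value)];
    }

    // Sort by encoded name, then by encoded value, using byte order.
    usort($encoded, fn (array $a, array $b): int => strcmp($a[0], $b[0]) ?: strcmp($a[1], $b[1]));

    // Normalized parameter string: name=value pairs joined with '&'.
    $paramString = implode('&', array_map(fn (array $p): string => $p[0] . '=' . $p[1], $encoded));

    // Signature base string: METHOD & encoded base URL & encoded parameter string.
    $baseString = strtoupper($method) . '&' . rawurlencode($baseUrl) . '&' . rawurlencode($paramString);

    // Signing key: consumer secret & token secret (token secret is empty for the request-token call).
    $key = rawurlencode($consumerSecret) . '&' . rawurlencode($tokenSecret);

    return base64_encode(hash_hmac('sha1', $baseString, $key, true));
}

// Step 1 of the OAuth1 flow: sign the request-token call (placeholder values).
$params = [
    'oauth_callback'         => 'https://example.com/callback',
    'oauth_consumer_key'     => 'consumer-key',
    'oauth_nonce'            => 'abc123',
    'oauth_signature_method' => 'HMAC-SHA1',
    'oauth_timestamp'        => '1700000000',
    'oauth_version'          => '1.0',
];
$requestTokenUrl = 'https://api.example.com/oauth/request_token';
$signature = oauth1Signature('POST', $requestTokenUrl, $params, 'consumer-secret');
```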

Benefits:

  • Eliminates duplicated storage logic across all providers (~60% code reduction per provider)
  • Standardized error handling and logging
  • Unified security implementation
  • Easy integration of new providers via base class extension

Universal Engine Architecture

Data Machine v0.2.0 introduced a universal Engine layer (/inc/Engine/AI/) that serves both Pipeline and Chat agents with shared AI infrastructure:

Core Engine Components:

  • AIConversationLoop: Multi-turn conversation execution with tool calling, completion detection, and state management
  • ToolExecutor: Universal tool discovery, enablement validation, and execution across agent types
  • ToolParameters: Centralized parameter building for AI tools with data packet integration
  • ConversationManager: Message formatting and conversation state management
  • RequestBuilder: AI request construction with directive application and tool restructuring
  • ToolResultFinder: Utility for finding tool execution results in data packets
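A multi-turn loop with tool calling can be sketched with the model and tools injected as callables, so it runs without any provider. In this sketch, completion means the model replied without requesting a tool. The message array shape, the `tool_calls` key, the `$maxTurns` guard and the name `runConversationLoop()` are all hypothetical. They do not reflect AIConversationLoop's actual interface.

```php
<?php
// Hypothetical sketch of a multi-turn conversation loop with tool calling.

/**
 * @param callable(array): array $model returns ['content' => string, 'tool_calls' => list<array{name: string, args: array}>]
 * @param array<string, callable(array): string> $tools tool name => implementation
 * @return array{messages: array, turns: int, completed: bool}
 */
function runConversationLoop(array $messages, callable $model, array $tools, int $maxTurns = 5): array
{
    for ($turn = 1; $turn <= $maxTurns; $turn++) {
        $reply = $model($messages);
        $messages[] = ['role' => 'assistant', 'content' => $reply['content']];

        // Completion detection: the model answered without requesting tools.
        if (empty($reply['tool_calls'])) {
            return ['messages' => $messages, 'turns' => $turn, 'completed' => true];
        }

        // Execute each requested tool and feed the result back for the next turn.
        foreach ($reply['tool_calls'] as $call) {
            $result = isset($tools[$call['name']])
                ? $tools[$call['name']]($call['args'])
                : 'Unknown tool: ' . $call['name'];
            $messages[] = ['role' => 'tool', 'name' => $call['name'], 'content' => $result];
        }
    }
    // Turn budget exhausted without a final answer.
    return ['messages' => $messages, 'turns' => $maxTurns, 'completed' => false];
}

// Scripted stand-in model: request a search first, then answer from the tool result.
$model = function (array $messages): array {
    $last = end($messages);
    if ($last['role'] === 'tool') {
        return ['content' => 'Summary based on: ' . $last['content'], 'tool_calls' => []];
    }
    return ['content' => '', 'tool_calls' => [['name' => 'local_search', 'args' => ['query' => 'wordpress']]]];
};
$tools = ['local_search' => fn (array $args): string => 'found 3 posts for ' . $args['query']];

$run = runConversationLoop([['role' => 'user', 'content' => 'Summarize recent posts']], $model, $tools);
```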

Tool Categories:

  • Handler-specific tools for publish/update operations
  • Global tools for search and analysis (GoogleSearch, LocalSearch, WebFetch, WordPressPostReader)
  • Chat-only tools for workflow building (@since v0.4.3):
    • AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow
  • Automatic tool discovery and three-layer enablement system
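Automatic tool discovery rests on self-registration through WordPress filters. Each tool adds itself to a shared filter, and the engine reads the merged result. WordPress's add_filter()/apply_filters() are unavailable outside WordPress, so the sketch uses a tiny stand-in registry. `MiniHooks`, the tool array shape and the `enabled` flag are hypothetical. The plugin's actual three-layer enablement rules are not modeled.

```php
<?php
// Minimal stand-in for WordPress filters, used only to illustrate self-registration.

final class MiniHooks
{
    /** @var array<string, array<int, list<callable>>> callbacks by hook name and priority */
    private static array $filters = [];

    public static function addFilter(string $hook, callable $callback, int $priority = 10): void
    {
        self::$filters[$hook][$priority][] = $callback;
    }

    public static function applyFilters(string $hook, mixed $value, mixed ...$args): mixed
    {
        $byPriority = self::$filters[$hook] ?? [];
        ksort($byPriority); // lower priority numbers run first, as in WordPress
        foreach ($byPriority as $callbacks) {
            foreach ($callbacks as $callback) {
                $value = $callback($value, ...$args);
            }
        }
        return $value;
    }
}

// Each tool registers itself; no central list has to be edited.
MiniHooks::addFilter('chubes_ai_tools', function (array $tools): array {
    $tools['local_search'] = ['description' => 'Search this site', 'enabled' => true];
    return $tools;
});
MiniHooks::addFilter('chubes_ai_tools', function (array $tools): array {
    $tools['web_fetch'] = ['description' => 'Fetch a URL', 'enabled' => false];
    return $tools;
});

// The engine discovers everything that registered, then keeps only enabled tools.
$allTools = MiniHooks::applyFilters('chubes_ai_tools', []);
$enabled  = array_keys(array_filter($allTools, fn (array $t): bool => $t['enabled']));
```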

Filter-Based Discovery

All components self-register via WordPress filters:

  • datamachine_handlers – Register fetch/publish/update handlers
  • chubes_ai_tools – Register AI tools and capabilities
  • datamachine_auth_providers – Register authentication providers
  • datamachine_step_types – Register custom step types
  • datamachine_get_oauth1_handler – OAuth 1.0a handler service discovery
  • datamachine_get_oauth2_handler – OAuth 2.0 handler service discovery

Modular Component Architecture (@since v0.2.1)

Data Machine v0.2.1 introduced modular component systems for enhanced code organization and maintainability:

FilesRepository Components (/inc/Core/FilesRepository/):

  • DirectoryManager – Directory creation and path management
  • FileStorage – File operations and flow-isolated storage
  • FileCleanup – Retention policy enforcement and cleanup
  • ImageValidator – Image validation and metadata extraction
  • RemoteFileDownloader – Remote file downloading with validation
  • FileRetrieval – Data retrieval from file storage

WordPress Shared Components (/inc/Core/WordPress/):

  • TaxonomyHandler – Taxonomy selection and term creation (skip, AI-decided, pre-selected modes)
  • WordPressSettingsHandler – Shared WordPress settings fields
  • WordPressFilters – Service discovery registration

EngineData (/inc/Core/EngineData.php):

  • Consolidated Operations – Featured image attachment, source URL attribution, and engine data access (@since v0.2.1, enhanced v0.2.6)
  • Unified Interface – Single class for all engine data operations (replaces FeaturedImageHandler and SourceUrlHandler in v0.2.6)

Engine Components (/inc/Engine/):

  • StepNavigator – Centralized step navigation logic for execution flow

Benefits:

  • Code Deduplication: Eliminates repetitive functionality across handlers
  • Single Responsibility: Each component has a focused purpose
  • Maintainability: Centralized logic simplifies updates
  • Extensibility: Easy to add new functionality via composition

For detailed documentation:

  • FilesRepository Components
  • WordPress Shared Components
  • EngineData
  • StepNavigator

Centralized Handler Filter System

Unified Cross-Cutting Functionality: The engine provides centralized filters for functionality shared across multiple handlers. This eliminates code duplication and keeps behavior consistent.

Core Centralized Filters:

  • datamachine_timeframe_limit: Shared timeframe parsing with discovery/conversion modes
    • Discovery mode: Returns available timeframe options for UI dropdowns
    • Conversion mode: Returns Unix timestamp for specified timeframe
    • Used by: RSS, Reddit, WordPress Local, WordPress Media, WordPress API
  • datamachine_keyword_search_match: Universal keyword matching with OR logic
    • Case-insensitive Unicode-safe matching
    • Comma-separated keyword support
    • Used by: RSS, Reddit, WordPress Local, WordPress Media, WordPress API
  • datamachine_data_packet: Standardized data packet creation and structure
    • Ensures type and timestamp fields are present
    • Maintains chronological ordering via array_unshift()
    • Used by: All step types for consistent data flow

Implementation:

php
// Timeframe parsing example: convert a timeframe key to a cutoff timestamp
$cutoff_timestamp = apply_filters('datamachine_timeframe_limit', null, '24_hours');
$date_query = $cutoff_timestamp ? ['after' => gmdate('Y-m-d H:i:s', $cutoff_timestamp)] : [];

// Keyword matching example (inside a handler's item loop; $items is illustrative)
foreach ($items as $content) {
    $matches = apply_filters('datamachine_keyword_search_match', true, $content, $search_keywords);
    if (!$matches) {
        continue; // Skip non-matching items
    }
    // ... process the matching item
}

// Data packet creation example
$data = apply_filters('datamachine_data_packet', $data, $packet_data, $flow_step_id, $step_type);

Benefits:

  • Code Consistency: Identical behavior across all handlers using shared filters
  • Maintainability: Single implementation location for shared functionality
  • Extensibility: New handlers automatically inherit shared capabilities
  • Performance: Optimized implementations used across all handlers

WordPress Publish Handler Architecture

Modular Component System: The WordPress publish handler uses specialized processing modules for enhanced maintainability and extensibility.

Core Components:

  • EngineData: Consolidated featured image attachment and source URL attribution with configuration hierarchy (system defaults override handler config) (@since v0.2.1, enhanced v0.2.6)
  • TaxonomyHandler: Configuration-based taxonomy processing with three selection modes (skip, AI-decided, pre-selected)
  • Direct Integration: WordPress handlers use EngineData and TaxonomyHandler directly for single source of truth data access

Configuration Hierarchy: When set, system-wide defaults always override handler-specific configuration. This provides consistent behavior across all WordPress publish operations.

Features:

  • Specialized component isolation for maintainability
  • Configuration validation and error handling per component
  • WordPress native function integration for optimal performance
  • Comprehensive logging throughout all components
  • Unified engine data operations via EngineData class

File Management

Flow-isolated UUID storage with automatic cleanup:

  • Files organized by flow instance
  • Automatic purging on job completion
  • Support for local and remote file processing

HTTP Client

The centralized HttpClient class (/inc/Core/HttpClient.php) standardizes all outbound requests for fetch and publish handlers by wrapping the native WordPress HTTP helpers.

See HTTP Client for implementation details and usage guidance.

Admin Interface

Modern React Architecture: The entire Data Machine admin interface (Pipelines, Logs, Settings, and Jobs) uses a complete React implementation with zero jQuery or AJAX dependencies.

React Implementation:

Component Architecture:

Complete REST API Integration: All admin pages communicate with the backend through the REST API.

Security Model: All admin operations require manage_options capability with WordPress nonce validation.

Extension Framework

Complete extension system for custom handlers and tools.

Key Features

AI Integration

Data Processing

Scheduling

Security