Data Machine Architecture

Data Machine is an AI-first WordPress plugin that uses a Pipeline+Flow architecture for automated content processing and publication. It provides multi-provider AI integration with tool-first design patterns, centered around a reliability-first Single Item Execution Model, with multi-agent support and a layered memory system.

Core Components

Pipeline+Flow System

  • Pipelines: Reusable templates containing step configurations
  • Flows: Configured instances of pipelines with scheduling
  • Jobs: Individual executions of flows with status tracking, each processing exactly one item. Support parent-child relationships for batch execution via parent_job_id.
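
As a rough illustration of the parent-child relationship described above, the following sketch fans a batch out into one child job per item. The function name, the array shape, and the 'pending' status value are assumptions made for illustration; only the parent_job_id link and the one-item-per-job rule come from the text.

```php
<?php
// Hypothetical sketch: one child job per item, each pointing at its parent.
// Not the plugin's actual job-creation code.

function sketch_fan_out_batch(int $parent_job_id, array $items, int $next_job_id): array
{
    $children = [];
    foreach ($items as $item) {
        $children[] = [
            'job_id'        => $next_job_id++,   // assumed sequential ids
            'parent_job_id' => $parent_job_id,   // links the child to the batch
            'status'        => 'pending',        // assumed initial status
            'item'          => $item,            // exactly one item per job
        ];
    }
    return $children;
}

$children = sketch_fan_out_batch(100, ['post-a', 'post-b', 'post-c'], 101);
```

Because every child carries a single item, a failure in one child would not need to affect its siblings under this model.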

Execution Engine

The execution engine uses a services-layer architecture with direct method calls for performance. It implements a four-action execution cycle that processes exactly one item per job, so a failure stays isolated to that single item.

Database Schema

Eight core tables:

Table                            Purpose
wp_datamachine_pipelines         Pipeline templates (reusable), with user_id and agent_id
wp_datamachine_flows             Flow instances (scheduled + configured), with user_id and agent_id
wp_datamachine_jobs              Job execution records, with user_id, agent_id, parent_job_id, source, label
wp_datamachine_processed_items   Deduplication tracking per execution
wp_datamachine_chat_sessions     Persistent conversation state, with agent_id, title, context
wp_datamachine_agents            Agent registry (slug, name, owner, config, status)
wp_datamachine_agent_access      Role-based access control (viewer, operator, admin)
wp_datamachine_logs              Centralized system logs with agent scoping

See Database Schema for full table definitions and relationships.

Multi-Agent Architecture

Data Machine supports multiple agents on a single WordPress installation (@since v0.36.1):

┌─────────────────────────────────────────────────┐
│                WordPress Site                    │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │ Agent A  │  │ Agent B  │  │ Agent C  │      │
│  │          │  │          │  │          │      │
│  │ SOUL.md  │  │ SOUL.md  │  │ SOUL.md  │      │
│  │ MEMORY.md│  │ MEMORY.md│  │ MEMORY.md│      │
│  │ daily/   │  │ daily/   │  │ daily/   │      │
│  │          │  │          │  │          │      │
│  │ pipelines│  │ pipelines│  │ pipelines│      │
│  │ flows    │  │ flows    │  │ flows    │      │
│  │ jobs     │  │ jobs     │  │ jobs     │      │
│  │ chat     │  │ chat     │  │ chat     │      │
│  └──────────┘  └──────────┘  └──────────┘      │
│                                                  │
│  ┌──────────────────────────────────────────┐   │
│  │ Shared Layer: SITE.md, RULES.md          │   │
│  └──────────────────────────────────────────┘   │
│                                                  │
│  ┌──────────┐  ┌──────────┐                     │
│  │ User 1   │  │ User 2   │                     │
│  │ USER.md  │  │ USER.md  │                     │
│  └──────────┘  └──────────┘                     │
└─────────────────────────────────────────────────┘

Key components:

  • Agent Registry (datamachine_agents): Each agent has a unique slug, owner, and configuration
  • Access Control (datamachine_agent_access): Role-based sharing (viewer < operator < admin)
  • Resource Scoping: All pipelines, flows, jobs, and chat sessions carry agent_id
  • Filesystem Isolation: Each agent gets agents/{slug}/ for identity files and daily memory
  • Permission Helper: PermissionHelper resolves agent context and enforces access checks
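
The role ordering above (viewer < operator < admin) could be checked along these lines. This is a minimal sketch with hypothetical function names; it is not the PermissionHelper API, whose actual methods are not shown in this document.

```php
<?php
// Hypothetical helpers illustrating an ordered role check.

function sketch_role_rank(string $role): int
{
    // Higher number means more access; unknown roles rank 0.
    $ranks = ['viewer' => 1, 'operator' => 2, 'admin' => 3];
    return $ranks[$role] ?? 0;
}

function sketch_role_satisfies(string $granted, string $required): bool
{
    $need = sketch_role_rank($required);
    // An unknown required role never passes, and the granted role
    // must rank at least as high as the required one.
    return $need > 0 && sketch_role_rank($granted) >= $need;
}
```

Under these assumptions an admin grant satisfies an operator requirement, while a viewer grant does not.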

Layered Memory Architecture

Agent memory is organized in a three-layer directory system below Data Machine’s files root in WordPress uploads:

datamachine-files/
├── shared/              # Site-wide (all agents)
│   ├── SITE.md
│   └── RULES.md
├── agents/              # Per-agent identity
│   ├── agent-a/
│   │   ├── SOUL.md
│   │   ├── MEMORY.md
│   │   └── daily/
│   │       └── 2026/
│   │           └── 03/
│   │               ├── 15.md
│   │               └── 16.md
│   └── agent-b/
│       ├── SOUL.md
│       └── MEMORY.md
├── users/               # Per-user preferences
│   ├── 1/
│   │   ├── USER.md
│   │   └── MEMORY.md
│   └── 2/
│       └── USER.md

CoreMemoryFilesDirective (Priority 20) loads files from layers in order:

  1. SITE.md → RULES.md from the shared layer
  2. SOUL.md → MEMORY.md from the agent layer
  3. USER.md → MEMORY.md from the user layer
  4. Custom files from MemoryFileRegistry (extensions), including files selected by pipelines and flows
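
The load order above can be pictured as an ordered list of paths. The sketch below assumes a hypothetical helper and a plain array of custom files; it does not reproduce CoreMemoryFilesDirective or MemoryFileRegistry, and the root path is a placeholder.

```php
<?php
// Hypothetical sketch of the layer order: shared, agent, user, then custom.

function sketch_core_memory_paths(string $root, string $agent_slug, int $user_id, array $custom = []): array
{
    $paths = [
        "$root/shared/SITE.md",
        "$root/shared/RULES.md",
        "$root/agents/$agent_slug/SOUL.md",
        "$root/agents/$agent_slug/MEMORY.md",
        "$root/users/$user_id/USER.md",
        "$root/users/$user_id/MEMORY.md",
    ];
    // Custom (extension, pipeline, flow) files come last in this sketch.
    return array_merge($paths, array_values($custom));
}

$paths = sketch_core_memory_paths('datamachine-files', 'agent-a', 1, ['datamachine-files/agents/agent-a/STYLE.md']);
```

The STYLE.md entry is an invented example of a custom file, used only to show where such files would fall in the order.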

See WordPress as Agent Memory for full memory documentation.

Daily Memory System

Temporal knowledge management via date-organized files:

  • DailyMemory: File operations at agents/{slug}/daily/YYYY/MM/DD.md
  • DailyMemoryTask: System task with two phases:
    • Phase 1: Synthesizes daily activity (jobs, chat) into daily file
    • Phase 2: Prunes MEMORY.md when it exceeds 8KB, archiving session content to the daily file
  • DailyMemorySelectorDirective (Priority 46): Injects daily memory into pipeline AI requests with configurable selection modes (recent days, specific dates, date range, months). Capped at 100KB total.
  • DailyMemoryAbilities: CRUD + search via Abilities API with multi-agent scoping
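
The path layout and size limits above can be sketched as follows. All function names are hypothetical. Treating 8KB as 8192 bytes and 100KB as 102400 bytes is an assumption, and so is stopping at the first file that would exceed the cap; the actual selection logic may differ.

```php
<?php
// Hypothetical sketches of the daily-memory path and size rules.

function sketch_daily_memory_path(string $agent_slug, DateTimeInterface $date): string
{
    // agents/{slug}/daily/YYYY/MM/DD.md
    return sprintf('agents/%s/daily/%s.md', $agent_slug, $date->format('Y/m/d'));
}

function sketch_memory_needs_pruning(string $memory_contents, int $limit_bytes = 8192): bool
{
    // Assumed: the 8KB threshold is measured in bytes of file content.
    return strlen($memory_contents) > $limit_bytes;
}

function sketch_select_within_cap(array $contents_by_file, int $cap_bytes = 102400): array
{
    $selected = [];
    $total = 0;
    foreach ($contents_by_file as $file => $contents) {
        $size = strlen($contents);
        if ($total + $size > $cap_bytes) {
            break; // stop once the next file would exceed the cap
        }
        $selected[] = $file;
        $total += $size;
    }
    return $selected;
}

$path = sketch_daily_memory_path('agent-a', new DateTimeImmutable('2026-03-15'));
```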

System Tasks Framework

Background AI operations that run outside the normal pipeline model:

┌─────────────────────┐
│     SystemTask      │  (abstract base)
│                     │
│ execute()           │  ← Task-specific logic
│ completeJob()       │  ← Mark done + store engine_data
│ failJob()           │  ← Record failure
│ reschedule()        │  ← Retry with backoff (max 24)
│ supportsUndo()      │  ← Opt-in undo support
│ undo()              │  ← Reverse recorded effects
│ getPromptDefs()     │  ← Editable AI prompts
│ resolveSystemModel()│  ← Agent-aware model selection
└─────────────────────┘
         ▲
         │ extends
    ┌────┴────────────────────────────────────┐
    │                                          │
ImageGenerationTask  AltTextTask  DailyMemoryTask
ImageOptimizationTask  InternalLinkingTask
AgentCallTask  MetaDescriptionTask  RetentionTask
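
The base-class contract in the diagram might look roughly like this in code. The class below is a hypothetical stand-in, not the plugin's SystemTask: the method signatures, the status strings, and the example subclass are all assumptions, and reschedule(), undo(), getPromptDefs(), and resolveSystemModel() are left out for brevity.

```php
<?php
// Hypothetical stand-in for the SystemTask contract shown in the diagram.

abstract class SketchSystemTask
{
    /** @var array<string, mixed> Data stored when the job finishes. */
    public array $engine_data = [];
    public string $status = 'pending'; // assumed status values

    // Task-specific logic lives in the subclass.
    abstract public function execute(array $params): void;

    protected function completeJob(array $engine_data): void
    {
        $this->engine_data = $engine_data; // mark done + store engine_data
        $this->status = 'completed';
    }

    protected function failJob(string $reason): void
    {
        $this->engine_data = ['error' => $reason]; // record the failure
        $this->status = 'failed';
    }

    public function supportsUndo(): bool
    {
        return false; // undo is opt-in, so the default is no support
    }
}

// Invented example task, used only to exercise the contract.
final class SketchUppercaseTitleTask extends SketchSystemTask
{
    public function execute(array $params): void
    {
        if (!isset($params['title'])) {
            $this->failJob('missing title');
            return;
        }
        $this->completeJob(['title' => strtoupper($params['title'])]);
    }
}
```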

Undo System: Tasks that record effects in engine_data can be reversed:

  • post_content_modified → restore WordPress revision
  • post_meta_set → restore previous value
  • attachment_created → delete attachment
  • featured_image_set → restore/remove thumbnail

Workspace System

Secure file management outside the web root for agent operations lives in the data-machine-code extension plugin, not Data Machine core.

  • Location: Managed by data-machine-code workspace settings
  • Git-aware: Clone, status, pull, add, commit, push, log, diff
  • File ops: Read (with pagination), write, edit (find-replace), list directory
  • Security: Outside web root; mutating ops are CLI-only (not REST-exposed)
  • CLI: wp datamachine-code workspace {path,list,clone,remove,show,read,ls,write,edit,git,worktree}

Engine Data Architecture

Clean Data Separation: AI agents receive clean data packets without URLs while handlers access engine parameters via centralized filter pattern.

Enhanced Database Storage + Filter Access: Fetch handlers store engine parameters (source_url, image_url) in database; steps retrieve via centralized datamachine_engine_data filter with storage/retrieval mode detection for unified access.

Core Pattern:

php
// Fetch handlers store via centralized filter (array storage)
if ($job_id) {
    apply_filters('datamachine_engine_data', null, $job_id, [
        'source_url' => $source_url,
        'image_url' => $image_url
    ]);
}

// Steps retrieve via centralized filter (EngineData.php)
$engine_data = apply_filters('datamachine_engine_data', [], $job_id);
$source_url = $engine_data['source_url'] ?? null;
$image_url = $engine_data['image_url'] ?? null;

Benefits:

  • Clean AI Data: AI processes content without URLs for better model performance
  • Centralized Access: Single filter interface for all engine data retrieval
  • Filter Consistency: Maintains architectural pattern of filter-based service discovery
  • Flexible Storage: Steps access only what they need via filter call

Abilities-First Architecture (@since v0.11.7)

Performance Revolution: Complete replacement of the older filter-based action and service-manager layers with direct ability classes. REST endpoints, WP-CLI commands, and chat tools all delegate to WordPress Abilities API registrations under inc/Abilities/.

Ability domains (business logic):

  • Flow abilities – Flow CRUD, duplication, pause/resume, scheduling, webhooks, and queue management
  • Pipeline abilities – Pipeline CRUD, import/export, and pipeline-step template management
  • Flow step abilities – Individual flow step configuration and handler management
  • Job abilities – Workflow execution, retry/fail/delete/recovery, flow health, and summaries
  • Processed item abilities – Deduplication tracking across workflows
  • Agent abilities – Agent CRUD, access grants, tokens, remote calls, memory, and daily memory
  • File abilities – Agent files, flow uploads, cleanup, and memory scaffolding

Coding workspace note: Git-aware workspace and GitHub coding operations live in the data-machine-code extension plugin. Data Machine core no longer registers WorkspaceAbilities or GitHub issue abilities.

Benefits:

  • 3x Performance Improvement: Direct method calls eliminate filter indirection
  • Centralized Business Logic: Consistent validation and error handling
  • Reduced Database Queries: Optimized data access patterns
  • Clean Architecture: Single responsibility per ability class
  • Backward Compatibility: Maintains WordPress hook integration

Step Types

  • Fetch: Data retrieval with clean content processing (core handlers include Files, RSS, Email, WordPress Local, WordPress Media, and WordPress API; extension plugins can register more)
  • AI: Content processing with multi-provider support (OpenAI, Anthropic, Google, Grok)
  • Publish: Content distribution with modular handler architecture (core handlers include WordPress and Email; extension plugins can register social destinations)
  • Upsert: Content modification (WordPress posts/pages)
  • System Task: Execute system tasks within pipeline flows
  • Agent Ping: Outbound webhook notifications to external agents
  • Webhook Gate: Wait for inbound webhook before proceeding

Directive System

Priority-ordered context injection into every AI request:

Priority  Directive                      Context               Purpose
20        CoreMemoryFilesDirective       All                   Layer files + custom registry
22        AgentModeDirective             All                   Mode-specific guidance for chat, pipeline, and system
25        CallerContextDirective         All, cross-site only  Authenticated A2A caller identity
35        AgentDailyMemoryDirective      Chat, pipeline        Recent daily archives when enabled
35        ClientContextDirective         All                   Free-form client-reported context
40        PipelineMemoryFilesDirective   Pipeline              Per-pipeline memory files
45        ChatPipelinesDirective         Chat                  Pipeline/flow context
45        FlowMemoryFilesDirective       Pipeline              Per-flow memory files
50        PipelineSystemPromptDirective  Pipeline              Workflow instructions

Directives implement DirectiveInterface and return arrays of typed outputs:

  • system_text – plain text content
  • system_json – labeled structured data
  • system_file – file path with MIME type

Authentication System

Base Authentication Provider Architecture (@since v0.2.6): Complete inheritance system with centralized option storage and validation across all authentication providers.

Base Classes:

  • BaseAuthProvider (/inc/Core/OAuth/BaseAuthProvider.php): Abstract base for all authentication providers with unified option storage, callback URL generation, and authentication state checking
  • BaseOAuth1Provider (/inc/Core/OAuth/BaseOAuth1Provider.php): Base for extension-provided OAuth 1.0a providers
  • BaseOAuth2Provider (/inc/Core/OAuth/BaseOAuth2Provider.php): Base for core and extension OAuth 2.0 providers

OAuth Handlers:

  • OAuth1Handler (/inc/Core/OAuth/OAuth1Handler.php): Three-legged OAuth 1.0a flow implementation
  • OAuth2Handler (/inc/Core/OAuth/OAuth2Handler.php): Authorization code flow implementation

Authentication Providers:

  • Core ships base classes plus concrete providers next to the handlers that need them, such as Email auth.
  • Extension plugins register their own providers through datamachine_auth_providers.

OAuth2 Flow:

  1. Create state nonce for CSRF protection
  2. Build authorization URL with parameters
  3. Handle callback: verify state, exchange code for token, retrieve account details, store credentials

OAuth1 Flow:

  1. Get request token
  2. Build authorization URL
  3. Handle callback: validate parameters, exchange for access token, store credentials

Benefits:

  • Eliminates duplicated storage logic across all providers (~60% code reduction per provider)
  • Standardized error handling and logging
  • Unified security implementation
  • Easy integration of new providers via base class extension

Universal Engine Architecture

Data Machine v0.2.0 introduced a universal Engine layer (/inc/Engine/AI/) that serves both Pipeline and Chat agents with shared AI infrastructure:

Core Engine Components:

  • AIConversationLoop: Multi-turn conversation execution with tool calling, completion detection, and state management
  • ToolExecutor: Universal tool discovery, enablement validation, and execution across agent types
  • ToolParameters: Centralized parameter building for AI tools with data packet integration
  • ConversationManager: Message formatting and conversation state management
  • RequestBuilder: AI request construction with directive application and tool restructuring
  • ToolResultFinder: Utility for finding tool execution results in data packets

Tool Categories:

  • Handler-specific tools for publish/update operations
  • Global tools for search and analysis (GoogleSearch, LocalSearch, WebFetch, WordPressPostReader)
  • Coding workspace tools live in the data-machine-code extension plugin
  • Agent memory tools (AgentMemory, AgentDailyMemory) for runtime memory access
  • Chat-only tools for workflow building (@since v0.4.3):
    • AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow
  • Automatic tool discovery and three-layer enablement system

Filter-Based Discovery

All components self-register via WordPress filters:

  • datamachine_handlers – Register fetch/publish/upsert handlers
  • datamachine_tools – Register AI tools and capabilities (unified static + runtime handler tool registry)
  • datamachine_auth_providers – Register authentication providers
  • datamachine_step_types – Register custom step types
  • datamachine_directives – Register AI context directives
  • datamachine_get_oauth1_handler – OAuth 1.0a handler service discovery
  • datamachine_get_oauth2_handler – OAuth 2.0 handler service discovery

Modular Component Architecture (@since v0.2.1)

Data Machine v0.2.1 introduced modular component systems for enhanced code organization and maintainability:

FilesRepository Components (/inc/Core/FilesRepository/):

  • DirectoryManager – Directory creation, path management, and three-layer resolution
  • FileStorage – File operations and flow-isolated storage
  • FileCleanup – Retention policy enforcement and cleanup
  • ImageValidator – Image validation and metadata extraction
  • VideoValidator – Video file validation
  • RemoteFileDownloader – Remote file downloading with validation
  • FileRetrieval – Data retrieval from file storage
  • DailyMemory – Daily memory file operations (read, write, append, search, list)

WordPress Shared Components (/inc/Core/WordPress/):

  • TaxonomyHandler – Taxonomy selection and term creation (skip, AI-decided, pre-selected modes)
  • WordPressSettingsHandler – Shared WordPress settings fields
  • WordPressFilters – Service discovery registration

EngineData (/inc/Core/EngineData.php):

  • Consolidated Operations – Featured image attachment, source URL attribution, and engine data access (@since v0.2.1, enhanced v0.2.6)
  • Unified Interface – Single class for all engine data operations (replaces FeaturedImageHandler and SourceUrlHandler in v0.2.6)

Engine Components (/inc/Engine/):

  • StepNavigator – Centralized step navigation logic for execution flow

Benefits:

  • Code Deduplication: Eliminates repetitive functionality across handlers
  • Single Responsibility: Each component has focused purpose
  • Maintainability: Centralized logic simplifies updates
  • Extensibility: Easy to add new functionality via composition

For detailed documentation:

  • FilesRepository Components
  • WordPress Shared Components
  • EngineData
  • StepNavigator

Centralized Handler Filter System

Unified Cross-Cutting Functionality: The engine provides centralized filters for shared functionality across multiple handlers, eliminating code duplication and ensuring consistency.

Core Centralized Filters:

  • datamachine_timeframe_limit: Shared timeframe parsing with discovery/conversion modes
    • Discovery mode: Returns available timeframe options for UI dropdowns
    • Conversion mode: Returns Unix timestamp for specified timeframe
    • Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
  • datamachine_keyword_search_match: Universal keyword matching with OR logic
    • Case-insensitive Unicode-safe matching
    • Comma-separated keyword support
    • Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
  • datamachine_data_packet: Standardized data packet creation and structure
    • Ensures type and timestamp fields are present
    • Maintains chronological ordering via array_unshift()
    • Used by: All step types for consistent data flow

Implementation:

php
// Timeframe parsing example
$cutoff_timestamp = apply_filters('datamachine_timeframe_limit', null, '24_hours');
$date_query = $cutoff_timestamp ? ['after' => gmdate('Y-m-d H:i:s', $cutoff_timestamp)] : [];

// Keyword matching example
$matches = apply_filters('datamachine_keyword_search_match', true, $content, $search_keywords);
if (!$matches) continue; // Skip non-matching items

// Data packet creation example
$data = apply_filters('datamachine_data_packet', $data, $packet_data, $flow_step_id, $step_type);

Benefits:

  • Code Consistency: Identical behavior across all handlers using shared filters
  • Maintainability: Single implementation location for shared functionality
  • Extensibility: New handlers automatically inherit shared capabilities
  • Performance: Optimized implementations used across all handlers

WordPress Publish Handler Architecture

Modular Component System: The WordPress publish handler uses specialized processing modules for enhanced maintainability and extensibility.

Core Components:

  • EngineData: Consolidated featured image attachment and source URL attribution with configuration hierarchy (system defaults override handler config) (@since v0.2.1, enhanced v0.2.6)
  • TaxonomyHandler: Configuration-based taxonomy processing with three selection modes (skip, AI-decided, pre-selected)
  • Direct Integration: WordPress handlers use EngineData and TaxonomyHandler directly for single source of truth data access

Configuration Hierarchy: System-wide defaults ALWAYS override handler-specific configuration when set, providing consistent behavior across all WordPress publish operations.

File Management

Flow-isolated UUID storage with automatic cleanup.

HTTP Client

The centralized HttpClient class (/inc/Core/HttpClient.php) standardizes all outbound requests for fetch and publish handlers. It wraps the native WordPress HTTP helpers.

See HTTP Client for implementation details and usage guidance.

Admin Interface

Modern React Architecture: The entire Data Machine admin interface (Pipelines, Logs, Settings, Jobs, and Agents) uses a complete React implementation with zero jQuery or AJAX dependencies.

Complete REST API Integration: All admin pages now use REST API architecture with zero jQuery/AJAX dependencies.

Security Model: All admin operations require manage_options capability with WordPress nonce validation.

Extension Framework

Complete extension system for custom handlers and tools.

Key Features

  • AI Integration
  • Data Processing
  • Scheduling
  • Security