Data Machine Architecture
Data Machine is an AI-first WordPress plugin that uses a Pipeline+Flow architecture for automated content processing and publication. It provides multi-provider AI integration with tool-first design patterns, centered around a reliability-first Single Item Execution Model, with multi-agent support and a layered memory system.
Core Components
Pipeline+Flow System
- Pipelines: Reusable templates containing step configurations
- Flows: Configured instances of pipelines with scheduling
- Jobs: Individual executions of flows with status tracking, each processing exactly one item. Jobs support parent-child relationships for batch execution via parent_job_id.
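The parent-child relationship can be pictured as a batch fanning out into one child job per item. The following is a minimal sketch only: the function name `dm_sketch_fan_out`, the `pending` status value, and the use of `label` for a position string are assumptions made for illustration and are not taken from the plugin.

```php
<?php
// Hypothetical helper: turn a batch of items into one child job record per item.
// Each child points back at the parent through parent_job_id.
function dm_sketch_fan_out(int $parent_job_id, array $items): array {
    $items    = array_values($items); // normalise keys to 0..n-1
    $total    = count($items);
    $children = [];

    foreach ($items as $index => $item) {
        $children[] = [
            'parent_job_id' => $parent_job_id,
            'label'         => sprintf('item %d of %d', $index + 1, $total),
            'item'          => $item,      // exactly one item per job
            'status'        => 'pending',  // assumed status name
        ];
    }

    return $children;
}

$children = dm_sketch_fan_out(42, ['post-a', 'post-b', 'post-c']);
```

Because every child carries a single item, a failure in one child does not affect its siblings, which is the isolation property the single-item model is aiming for.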
Execution Engine
The engine uses a services-layer architecture with direct method calls. It implements a four-action execution cycle that processes exactly one item per job, so a failure stays isolated to that item.
Database Schema
Eight core tables:
| Table | Purpose |
|---|---|
| wp_datamachine_pipelines | Pipeline templates (reusable), with user_id and agent_id |
| wp_datamachine_flows | Flow instances (scheduled + configured), with user_id and agent_id |
| wp_datamachine_jobs | Job execution records, with user_id, agent_id, parent_job_id, source, label |
| wp_datamachine_processed_items | Deduplication tracking per execution |
| wp_datamachine_chat_sessions | Persistent conversation state, with agent_id, title, context |
| wp_datamachine_agents | Agent registry (slug, name, owner, config, status) |
| wp_datamachine_agent_access | Role-based access control (viewer, operator, admin) |
| wp_datamachine_logs | Centralized system logs with agent scoping |
See Database Schema for full table definitions and relationships.
Multi-Agent Architecture
Data Machine supports multiple agents on a single WordPress installation (@since v0.36.1):
┌─────────────────────────────────────────────────┐
│ WordPress Site │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │
│ │ │ │ │ │ │ │
│ │ SOUL.md │ │ SOUL.md │ │ SOUL.md │ │
│ │ MEMORY.md│ │ MEMORY.md│ │ MEMORY.md│ │
│ │ daily/ │ │ daily/ │ │ daily/ │ │
│ │ │ │ │ │ │ │
│ │ pipelines│ │ pipelines│ │ pipelines│ │
│ │ flows │ │ flows │ │ flows │ │
│ │ jobs │ │ jobs │ │ jobs │ │
│ │ chat │ │ chat │ │ chat │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Shared Layer: SITE.md, RULES.md │ │
│ └──────────────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ User 1 │ │ User 2 │ │
│ │ USER.md │ │ USER.md │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
Key components:
- Agent Registry (datamachine_agents): Each agent has a unique slug, owner, and configuration
- Access Control (datamachine_agent_access): Role-based sharing (viewer < operator < admin)
- Resource Scoping: All pipelines, flows, jobs, and chat sessions carry agent_id
- Filesystem Isolation: Each agent gets agents/{slug}/ for identity files and daily memory
- Permission Helper: PermissionHelper resolves agent context and enforces access checks
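The role ordering viewer < operator < admin lends itself to a simple rank comparison. This is a hedged sketch, not the actual PermissionHelper logic: the constant `DM_SKETCH_ROLE_RANK` and the function `dm_sketch_role_allows` are hypothetical names, and rejecting unknown roles is an assumption.

```php
<?php
// Hypothetical rank table mirroring the documented ordering viewer < operator < admin.
const DM_SKETCH_ROLE_RANK = ['viewer' => 1, 'operator' => 2, 'admin' => 3];

// Returns true when the granted role is at least as strong as the required role.
function dm_sketch_role_allows(string $granted, string $required): bool {
    $rank = DM_SKETCH_ROLE_RANK;

    // Assumption: a role outside the table never passes a check.
    if (!isset($rank[$granted], $rank[$required])) {
        return false;
    }

    return $rank[$granted] >= $rank[$required];
}

$admin_can_operate  = dm_sketch_role_allows('admin', 'operator');
$viewer_can_operate = dm_sketch_role_allows('viewer', 'operator');
```

Failing closed on unknown roles keeps a typo in a stored role from silently granting access.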
Layered Memory Architecture
Agent memory is organized in a three-layer directory system below Data Machine’s files root in WordPress uploads:
datamachine-files/
├── shared/ # Site-wide (all agents)
│ ├── SITE.md
│ └── RULES.md
├── agents/ # Per-agent identity
│ ├── agent-a/
│ │ ├── SOUL.md
│ │ ├── MEMORY.md
│ │ └── daily/
│ │ └── 2026/
│ │ └── 03/
│ │ ├── 15.md
│ │ └── 16.md
│ └── agent-b/
│ ├── SOUL.md
│ └── MEMORY.md
├── users/ # Per-user preferences
│ ├── 1/
│ │ ├── USER.md
│ │ └── MEMORY.md
│ └── 2/
│ └── USER.md
CoreMemoryFilesDirective (Priority 20) loads files from layers in order:
- SITE.md → RULES.md from the shared layer
- SOUL.md → MEMORY.md from the agent layer
- USER.md → MEMORY.md from the user layer
- Custom files from MemoryFileRegistry (extensions), including files selected by pipelines and flows
See WordPress as Agent Memory for full memory documentation.
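The load order across the shared, agent, and user layers can be sketched as an ordered list of paths. The function `dm_sketch_memory_load_order` and its parameters are hypothetical; the sketch only builds path strings and makes no claim about how CoreMemoryFilesDirective or MemoryFileRegistry resolve files internally.

```php
<?php
// Hypothetical helper: list memory files in the documented layer order
// (shared, then agent, then user, then any custom registry files).
function dm_sketch_memory_load_order(
    string $root,
    string $agent_slug,
    int $user_id,
    array $custom = []
): array {
    $root = rtrim($root, '/');

    $paths = [
        "$root/shared/SITE.md",
        "$root/shared/RULES.md",
        "$root/agents/$agent_slug/SOUL.md",
        "$root/agents/$agent_slug/MEMORY.md",
        "$root/users/$user_id/USER.md",
        "$root/users/$user_id/MEMORY.md",
    ];

    // Custom files are appended last, in the order given.
    return array_merge($paths, array_values($custom));
}

$order = dm_sketch_memory_load_order('/uploads/datamachine-files/', 'agent-a', 1, ['extra.md']);
```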
Daily Memory System
Temporal knowledge management via date-organized files:
- DailyMemory: File operations at agents/{slug}/daily/YYYY/MM/DD.md
- DailyMemoryTask: System task with two phases:
  - Phase 1: Synthesizes daily activity (jobs, chat) into the daily file
  - Phase 2: Prunes MEMORY.md when it exceeds 8KB, archiving session content to the daily file
- DailyMemorySelectorDirective (Priority 46): Injects daily memory into pipeline AI requests with configurable selection modes (recent days, specific dates, date range, months). Capped at 100KB total.
- DailyMemoryAbilities: CRUD + search via Abilities API with multi-agent scoping
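The date-organized path and the pruning threshold can be illustrated with two small helpers. Both function names are hypothetical, and treating 8KB as 8192 bytes is an assumption; the plugin may measure the threshold differently.

```php
<?php
// Hypothetical helper: build the relative daily memory path
// agents/{slug}/daily/YYYY/MM/DD.md for a given date.
function dm_sketch_daily_path(string $slug, DateTimeInterface $date): string {
    return sprintf('agents/%s/daily/%s.md', $slug, $date->format('Y/m/d'));
}

// Hypothetical helper: decide whether MEMORY.md is large enough to prune.
// Assumption: "8KB" means 8192 bytes, compared strictly.
function dm_sketch_needs_pruning(string $memory_contents, int $threshold_bytes = 8192): bool {
    return strlen($memory_contents) > $threshold_bytes;
}

$path = dm_sketch_daily_path('agent-a', new DateTimeImmutable('2026-03-15'));
```

With the example date, the resulting path matches the 2026/03/15.md entry shown in the directory tree above.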
System Tasks Framework
Background AI operations that run outside the normal pipeline model:
┌─────────────────────┐
│ SystemTask │ (abstract base)
│ │
│ execute() │ ← Task-specific logic
│ completeJob() │ ← Mark done + store engine_data
│ failJob() │ ← Record failure
│ reschedule() │ ← Retry with backoff (max 24)
│ supportsUndo() │ ← Opt-in undo support
│ undo() │ ← Reverse recorded effects
│ getPromptDefs() │ ← Editable AI prompts
│ resolveSystemModel()│ ← Agent-aware model selection
└─────────────────────┘
▲
│ extends
┌────┴────────────────────────────────────┐
│ │
ImageGenerationTask AltTextTask DailyMemoryTask
ImageOptimizationTask InternalLinkingTask
AgentCallTask MetaDescriptionTask RetentionTask
Undo System: Tasks that record effects in engine_data can be reversed:
- post_content_modified → restore WordPress revision
- post_meta_set → restore previous value
- attachment_created → delete attachment
- featured_image_set → restore/remove thumbnail
Workspace System
Secure file management outside the web root for agent operations lives in the data-machine-code extension plugin, not Data Machine core.
- Location: Managed by data-machine-code workspace settings
- Git-aware: Clone, status, pull, add, commit, push, log, diff
- File ops: Read (with pagination), write, edit (find-replace), list directory
- Security: Outside web root; mutating ops are CLI-only (not REST-exposed)
- CLI: wp datamachine-code workspace {path,list,clone,remove,show,read,ls,write,edit,git,worktree}
Engine Data Architecture
Clean Data Separation: AI agents receive clean data packets without URLs, while handlers access engine parameters through a centralized filter pattern.
Enhanced Database Storage + Filter Access: Fetch handlers store engine parameters (source_url, image_url) in the database; steps retrieve them via the centralized datamachine_engine_data filter, which detects storage versus retrieval mode so both use one interface.
Core Pattern:
// Fetch handlers store via centralized filter (array storage)
if ($job_id) {
    apply_filters('datamachine_engine_data', null, $job_id, [
        'source_url' => $source_url,
        'image_url' => $image_url
    ]);
}
// Steps retrieve via centralized filter (EngineData.php)
$engine_data = apply_filters('datamachine_engine_data', [], $job_id);
$source_url = $engine_data['source_url'] ?? null;
$image_url = $engine_data['image_url'] ?? null;
Benefits:
- Clean AI Data: AI processes content without URLs for better model performance
- Centralized Access: Single filter interface for all engine data retrieval
- Filter Consistency: Maintains architectural pattern of filter-based service discovery
- Flexible Storage: Steps access only what they need via filter call
Abilities-First Architecture (@since v0.11.7)
Performance Revolution: Complete replacement of the older filter-based action and service-manager layers with direct ability classes. REST endpoints, WP-CLI commands, and chat tools all delegate to WordPress Abilities API registrations under inc/Abilities/.
Ability domains (business logic):
- Flow abilities – Flow CRUD, duplication, pause/resume, scheduling, webhooks, and queue management
- Pipeline abilities – Pipeline CRUD, import/export, and pipeline-step template management
- Flow step abilities – Individual flow step configuration and handler management
- Job abilities – Workflow execution, retry/fail/delete/recovery, flow health, and summaries
- Processed item abilities – Deduplication tracking across workflows
- Agent abilities – Agent CRUD, access grants, tokens, remote calls, memory, and daily memory
- File abilities – Agent files, flow uploads, cleanup, and memory scaffolding
Coding workspace note: Git-aware workspace and GitHub coding operations live in the data-machine-code extension plugin. Data Machine core no longer registers WorkspaceAbilities or GitHub issue abilities.
Benefits:
- 3x Performance Improvement: Direct method calls eliminate filter indirection
- Centralized Business Logic: Consistent validation and error handling
- Reduced Database Queries: Optimized data access patterns
- Clean Architecture: Single responsibility per ability class
- Backward Compatibility: Maintains WordPress hook integration
Step Types
- Fetch: Data retrieval with clean content processing (core handlers include Files, RSS, Email, WordPress Local, WordPress Media, and WordPress API; extension plugins can register more)
- AI: Content processing with multi-provider support (OpenAI, Anthropic, Google, Grok)
- Publish: Content distribution with modular handler architecture (core handlers include WordPress and Email; extension plugins can register social destinations)
- Upsert: Content modification (WordPress posts/pages)
- System Task: Execute system tasks within pipeline flows
- Agent Ping: Outbound webhook notifications to external agents
- Webhook Gate: Wait for inbound webhook before proceeding
Directive System
Priority-ordered context injection into every AI request:
| Priority | Directive | Context | Purpose |
|---|---|---|---|
| 20 | CoreMemoryFilesDirective | All | Layer files + custom registry |
| 22 | AgentModeDirective | All | Mode-specific guidance for chat, pipeline, and system |
| 25 | CallerContextDirective | All, cross-site only | Authenticated A2A caller identity |
| 35 | AgentDailyMemoryDirective | Chat, pipeline | Recent daily archives when enabled |
| 35 | ClientContextDirective | All | Free-form client-reported context |
| 40 | PipelineMemoryFilesDirective | Pipeline | Per-pipeline memory files |
| 45 | ChatPipelinesDirective | Chat | Pipeline/flow context |
| 45 | FlowMemoryFilesDirective | Pipeline | Per-flow memory files |
| 50 | PipelineSystemPromptDirective | Pipeline | Workflow instructions |
Directives implement DirectiveInterface and return arrays of typed outputs:
- system_text: plain text content
- system_json: labeled structured data
- system_file: file path with MIME type
Authentication System
Base Authentication Provider Architecture (@since v0.2.6): Complete inheritance system with centralized option storage and validation across all authentication providers.
Base Classes:
- BaseAuthProvider (/inc/Core/OAuth/BaseAuthProvider.php): Abstract base for all authentication providers with unified option storage, callback URL generation, and authentication state checking
- BaseOAuth1Provider (/inc/Core/OAuth/BaseOAuth1Provider.php): Base for extension-provided OAuth 1.0a providers
- BaseOAuth2Provider (/inc/Core/OAuth/BaseOAuth2Provider.php): Base for core and extension OAuth 2.0 providers
OAuth Handlers:
- OAuth1Handler (/inc/Core/OAuth/OAuth1Handler.php): Three-legged OAuth 1.0a flow implementation
- OAuth2Handler (/inc/Core/OAuth/OAuth2Handler.php): Authorization code flow implementation
Authentication Providers:
- Core ships base classes plus concrete providers next to the handlers that need them, such as Email auth.
- Extension plugins register their own providers through datamachine_auth_providers.
OAuth2 Flow:
- Create state nonce for CSRF protection
- Build authorization URL with parameters
- Handle callback: verify state, exchange code for token, retrieve account details, store credentials
OAuth1 Flow:
- Get request token
- Build authorization URL
- Handle callback: validate parameters, exchange for access token, store credentials
Benefits:
- Eliminates duplicated storage logic across all providers (~60% code reduction per provider)
- Standardized error handling and logging
- Unified security implementation
- Easy integration of new providers via base class extension
Universal Engine Architecture
Data Machine v0.2.0 introduced a universal Engine layer (/inc/Engine/AI/) that serves both Pipeline and Chat agents with shared AI infrastructure.
Core Engine Components:
- AIConversationLoop: Multi-turn conversation execution with tool calling, completion detection, and state management
- ToolExecutor: Universal tool discovery, enablement validation, and execution across agent types
- ToolParameters: Centralized parameter building for AI tools with data packet integration
- ConversationManager: Message formatting and conversation state management
- RequestBuilder: AI request construction with directive application and tool restructuring
- ToolResultFinder: Utility for finding tool execution results in data packets
Tool Categories:
- Handler-specific tools for publish/update operations
- Global tools for search and analysis (GoogleSearch, LocalSearch, WebFetch, WordPressPostReader)
- Coding workspace tools live in the data-machine-code extension plugin
- Agent memory tools (AgentMemory, AgentDailyMemory) for runtime memory access
- Chat-only tools for workflow building (@since v0.4.3):
  - AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow
- Automatic tool discovery and three-layer enablement system
Filter-Based Discovery
All components self-register via WordPress filters:
- datamachine_handlers – Register fetch/publish/upsert handlers
- datamachine_tools – Register AI tools and capabilities (unified static + runtime handler tool registry)
- datamachine_auth_providers – Register authentication providers
- datamachine_step_types – Register custom step types
- datamachine_directives – Register AI context directives
- datamachine_get_oauth1_handler – OAuth 1.0a handler service discovery
- datamachine_get_oauth2_handler – OAuth 2.0 handler service discovery
Modular Component Architecture (@since v0.2.1)
Data Machine v0.2.1 introduced modular component systems for enhanced code organization and maintainability:
FilesRepository Components (/inc/Core/FilesRepository/):
- DirectoryManager – Directory creation, path management, and three-layer resolution
- FileStorage – File operations and flow-isolated storage
- FileCleanup – Retention policy enforcement and cleanup
- ImageValidator – Image validation and metadata extraction
- VideoValidator – Video file validation
- RemoteFileDownloader – Remote file downloading with validation
- FileRetrieval – Data retrieval from file storage
- DailyMemory – Daily memory file operations (read, write, append, search, list)
WordPress Shared Components (/inc/Core/WordPress/):
- TaxonomyHandler – Taxonomy selection and term creation (skip, AI-decided, pre-selected modes)
- WordPressSettingsHandler – Shared WordPress settings fields
- WordPressFilters – Service discovery registration
EngineData (/inc/Core/EngineData.php):
- Consolidated Operations – Featured image attachment, source URL attribution, and engine data access (@since v0.2.1, enhanced v0.2.6)
- Unified Interface – Single class for all engine data operations (replaces FeaturedImageHandler and SourceUrlHandler in v0.2.6)
Engine Components (/inc/Engine/):
- StepNavigator – Centralized step navigation logic for execution flow
Benefits:
- Code Deduplication: Eliminates repetitive functionality across handlers
- Single Responsibility: Each component has focused purpose
- Maintainability: Centralized logic simplifies updates
- Extensibility: Easy to add new functionality via composition
For detailed documentation:
- FilesRepository Components
- WordPress Shared Components
- EngineData
- StepNavigator
Centralized Handler Filter System
Unified Cross-Cutting Functionality: The engine provides centralized filters for shared functionality across multiple handlers, eliminating code duplication and ensuring consistency.
Core Centralized Filters:
- datamachine_timeframe_limit: Shared timeframe parsing with discovery/conversion modes
  - Discovery mode: Returns available timeframe options for UI dropdowns
  - Conversion mode: Returns Unix timestamp for specified timeframe
  - Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
- datamachine_keyword_search_match: Universal keyword matching with OR logic
  - Case-insensitive Unicode-safe matching
  - Comma-separated keyword support
  - Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
- datamachine_data_packet: Standardized data packet creation and structure
  - Ensures type and timestamp fields are present
  - Maintains chronological ordering via array_unshift()
  - Used by: All step types for consistent data flow
Implementation:
// Timeframe parsing example
$cutoff_timestamp = apply_filters('datamachine_timeframe_limit', null, '24_hours');
$date_query = $cutoff_timestamp ? ['after' => gmdate('Y-m-d H:i:s', $cutoff_timestamp)] : [];
// Keyword matching example (inside a loop over fetched items)
$matches = apply_filters('datamachine_keyword_search_match', true, $content, $search_keywords);
if (!$matches) continue; // Skip non-matching items
// Data packet creation example
$data = apply_filters('datamachine_data_packet', $data, $packet_data, $flow_step_id, $step_type);
Benefits:
- Code Consistency: Identical behavior across all handlers using shared filters
- Maintainability: Single implementation location for shared functionality
- Extensibility: New handlers automatically inherit shared capabilities
- Performance: Optimized implementations used across all handlers
WordPress Publish Handler Architecture
Modular Component System: The WordPress publish handler uses specialized processing modules for enhanced maintainability and extensibility.
Core Components:
- EngineData: Consolidated featured image attachment and source URL attribution with configuration hierarchy (system defaults override handler config) (@since v0.2.1, enhanced v0.2.6)
- TaxonomyHandler: Configuration-based taxonomy processing with three selection modes (skip, AI-decided, pre-selected)
- Direct Integration: WordPress handlers use EngineData and TaxonomyHandler directly for single source of truth data access
Configuration Hierarchy: System-wide defaults ALWAYS override handler-specific configuration when set, providing consistent behavior across all WordPress publish operations.
File Management
Flow-isolated UUID storage with automatic cleanup.
HTTP Client
The centralized HttpClient class (/inc/Core/HttpClient.php) standardizes all outbound requests for fetch and publish handlers. It wraps the native WordPress HTTP helpers.
See HTTP Client for implementation details and usage guidance.
Admin Interface
Modern React Architecture: The entire Data Machine admin interface (Pipelines, Logs, Settings, Jobs, and Agents) uses a complete React implementation with zero jQuery or AJAX dependencies.
Complete REST API Integration: All admin pages now use REST API architecture with zero jQuery/AJAX dependencies.
Security Model: All admin operations require manage_options capability with WordPress nonce validation.
Extension Framework
Complete extension system for custom handlers and tools.
Key Features
AI Integration
Data Processing
Scheduling
Security