Data Machine User Documentation
AI-first WordPress plugin for automating and orchestrating content workflows with a visual pipeline builder, conversational chat agent, REST API, and extensibility through handlers and tools.
Agent-First Architecture
Data Machine is designed for AI agents as primary users, not just tool operators.
The Self-Orchestration Pattern
While humans use Data Machine to automate content workflows, AI agents can use it to automate themselves:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ AGENT │ ──▶ │ QUEUE │ ──▶ │ PIPELINE │ ──▶ │ AGENT PING │
│ queues task │ │ persists │ │ executes │ │ wakes agent│
│ │ │ context │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘
▲ │
└──────────────────────────────────────────────────────────┘
Agent processes, queues next taskKey concepts:
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
Agent Ping for Continuity: The
agent_pingstep type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Autonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
System Architecture
- Pipelines are reusable workflow templates that store handler order, tool selections, and AI settings.
- Flows instantiate pipelines with schedule metadata, flow-level overrides, and runtime configuration values stored per flow.
- Ephemeral Workflows (@since v0.8.0) are temporary, on-the-fly workflows triggered via the REST API. They skip database persistence for the workflow definition itself, using sentinel values (
flow_id='direct',pipeline_id='direct') and dynamic configuration stored within the job’s engine snapshot. - Jobs track individual flow executions, persist engine parameters, and power the fully React-based Jobs dashboard for real-time monitoring. Jobs support parent-child relationships for batch execution via
parent_job_id. - Steps execute sequentially (Fetch → AI → Publish/Update) with shared base classes that enforce validation, logging, and engine data synchronization.
Multi-Agent Architecture
Agent Ping for Continuity: The agent_ping step type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.
- Agent Registry: Agents are stored in
datamachine_agentswith unique slugs, owner relationships, and configuration. - Access Control: The
datamachine_agent_accesstable implements role-based access (viewer, operator, admin) for sharing agents across WordPress users. - Resource Scoping: All major resources (pipelines, flows, jobs, chat sessions) carry an
agent_idcolumn. Queries filter by agent context automatically. - Filesystem Isolation: Each agent gets its own directory under
agents/{slug}/for identity files (SOUL.md, MEMORY.md) and daily memory. - Three-Layer Directory System: Memory files are organized into shared (site-wide), agent (identity), and user (personal) layers below Data Machine’s files root in WordPress uploads.
Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Memory System
Autonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
Agent Memory Files
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
| Layer | Directory | Contents |
|---|---|---|
| Shared | shared layer | SITE.md, RULES.md (site-wide context) |
| Agent | agent slug layer | SOUL.md, MEMORY.md (agent identity and knowledge) |
| User | user ID layer | USER.md, MEMORY.md (human preferences) |
Daily Memory System
Agent Ping for Continuity: The agent_ping step type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.
- Synthesizes daily activity into summary files
- Prunes MEMORY.md when it exceeds 8KB, archiving session-specific content to daily files
Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Memory Path Discovery
wp datamachine memory paths --allow-rootAutonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
This transforms Data Machine from a content automation tool into a self-scheduling execution layer for AI agents.
Abilities API
Data Machine supports multiple agents on a single WordPress installation (@since v0.36.1). Each agent has its own identity, memory, and resource scope.
- Flow, pipeline, and flow-step operations live in focused classes under
inc/Abilities/Flow/,inc/Abilities/Pipeline/, andinc/Abilities/FlowStep/;PipelineStepAbilitieshandles pipeline-step ordering and synchronization. - Job abilities monitor execution outcomes, retries, manual failure, recovery, summaries, and deletion.
ProcessedItemsAbilitiesdeduplicates content across executions by tracking previously processed identifiers.AgentAbilitiesmanages agent CRUD, renaming (with filesystem migration), and deletion.AgentMemoryAbilitiesprovides section-based read, write, append, and search operations on memory files.DailyMemoryAbilitiesmanages daily memory files — read, write, list, search, and delete by date.LogAbilitiesand theLogRepositoryaggregate log entries in thewp_datamachine_logstable for filtering in the admin UI.- Cache invalidation is handled by ability-level
clearCache()methods to ensure dynamic handler and step type registrations are immediately reflected across the system.
See Multi-Agent Architecture for details.
System Tasks Framework
Data Machine uses WordPress itself as the persistent memory layer for AI agents — files on disk, conversations in the database, context assembled at request time.
- Job lifecycle:
completeJob(),failJob(),reschedule()with attempt tracking (max 24 retries) - Editable prompts:
getPromptDefinitions()system with overrides stored indatamachine_task_promptsoption - Undo system:
supportsUndo()andundo()for reversible operations, with effect types for post content, meta, attachments, and featured images
Built-in System Tasks
| Task Type | Class | Description |
|---|---|---|
image_generation | ImageGenerationTask | AI-powered image generation |
image_optimization | ImageOptimizationTask | Image compression and optimization |
alt_text_generation | AltTextTask | AI-generated alt text for images |
internal_linking | InternalLinkingTask | Automated internal link injection |
daily_memory_generation | DailyMemoryTask | Daily memory synthesis and MEMORY.md cleanup |
meta_description_generation | MetaDescriptionTask | AI-generated meta descriptions |
Job Undo System
Markdown files organized in three layers:
post_content_modified— restores WordPress revisionspost_meta_set— restores previous meta valuesattachment_created— deletes created attachmentsfeatured_image_set— restores or removes thumbnails
wp datamachine jobs undo <job_id> --allow-root
wp datamachine jobs undo <job_id> --dry-run --allow-rootData Flow
- DataPacket standardizes the payload (content, metadata, attachments) that AI agents receive, keeping packets chronological and clean of URLs when not needed.
- EngineData stores engine-specific parameters such as
source_url,image_url, and flow context, which fetch handlers persist via thedatamachine_engine_datafilter for downstream handlers. - FilesRepository modules (DirectoryManager, FileStorage, RemoteFileDownloader, ImageValidator, FileCleanup, FileRetrieval) isolate file storage per flow, validate uploads, and enforce automatic cleanup after jobs complete.
AI Integration
- Tool-first architecture enables AI agents (pipeline and chat) to call tools that interact with handlers, external APIs, or workflow metadata.
- PromptBuilder + RequestBuilder apply layered directives via the
datamachine_directivesfilter so every request includes identity, context, and site-specific instructions. - Global tools (Google Search, Local Search, Web Fetch, WordPress Post Reader) are registered under
/inc/Engine/AI/Tools/and available to all agents. - Chat-specific tools (AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow) orchestrate pipeline and flow management within conversations.
- ToolParameters + ToolResultFinder gather parameter metadata for tools and interpret results inside data packets to keep conversations consistent.
Authentication & Security
- Authentication providers extend BaseAuthProvider, BaseOAuth1Provider, or BaseOAuth2Provider under
/inc/Core/OAuth/; concrete providers live next to their handlers in core or extension plugins. - OAuth handlers (
OAuth1Handler,OAuth2Handler) standardize callback handling, nonce validation, and credential storage. - Capability checks (
manage_options) and WordPress nonces guard REST endpoints; inputs run throughsanitize_*helpers before hitting services. - Multi-agent permissions:
PermissionHelperhandles agent-level access checks viaresolve_scoped_agent_id(),can_access_agent(), andowns_agent_resource(). - HttpClient centralizes outbound HTTP requests with consistent headers, browser-mode simulation, timeout control, and logging via
datamachine_log.
Scheduling & Jobs
- Action Scheduler drives scheduled flow execution while REST endpoints handle immediate runs.
- Flow schedules support manual runs, one-time execution, and recurring intervals (from 5 minutes to weekly). See Scheduling Intervals for available options.
- System task scheduling: DailyMemoryTask and other system tasks run on cron schedules via Action Scheduler.
- Batch execution: Jobs support parent-child relationships via
parent_job_idfor processing multiple items in coordinated batches. - Job abilities and repositories update statuses, emit extensibility actions (
datamachine_update_job_status), and link jobs to logs and processed items for auditing.
Admin Interface
- React-First Architecture: Admin pages are React apps built with
@wordpress/componentsand TanStack Query for server state. - Client UI state: The Pipelines page uses a small Zustand store for UI state (pipeline selection, modals, chat sidebar). Other pages may use local React state.
- Pipeline Builder: Visual pipeline/flow configuration with modal-driven step and handler settings.
- Job Management: React dashboard for job history with server-driven pagination and admin cleanup modal.
- Logs Interface: React logs viewer with filtering controls and REST-backed content loading.
- Integrated Chat: Collapsible sidebar for context-aware pipeline automation and AI-driven workflow assistance, using specialized tools to manage the entire ecosystem.
- Agent Management: Agent creation, configuration, and access control UI.
Key Capabilities
- Multi-agent support with isolated identity, memory, and resources per agent on a single WordPress installation.
- Multi-platform publishing via core fetch/publish/upsert handlers for files, RSS, email, and WordPress, plus extension-provided handlers for social, business, and event destinations.
- Daily memory system for automatic temporal knowledge management with AI-driven pruning.
- System tasks for background AI operations (image generation, alt text, internal linking, meta descriptions) with undo support.
- Extension points through filters such as
datamachine_handlers,datamachine_tools,datamachine_step_types,datamachine_auth_providers, anddatamachine_engine_data. - Directive orchestration ensures every AI request is context-aware, tool-enabled, and consistent with site policies.
- Chartable logging, deduplication, and error handling keep operators informed about job outcomes and prevent duplicate processing.