Data Machine User Documentation
AI-first WordPress plugin for automating and orchestrating content workflows with a visual pipeline builder, conversational chat agent, REST API, and extensibility through handlers and tools.
Agent-First Architecture
Data Machine is designed for AI agents as primary users, not just tool operators.
The Self-Orchestration Pattern
While humans use Data Machine to automate content workflows, AI agents can use it to automate themselves:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ AGENT │ ──▶ │ QUEUE │ ──▶ │ PIPELINE │ ──▶ │ AGENT PING │
│ queues task │ │ persists │ │ executes │ │ wakes agent│
│ │ │ context │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘
▲ │
└──────────────────────────────────────────────────────────┘
Agent processes, queues next taskKey concepts:
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
Agent Ping for Continuity: The
agent_pingstep type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Autonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
System Architecture
- Pipelines are reusable workflow templates that store handler order, tool selections, and AI settings.
- Flows instantiate pipelines with schedule metadata, flow-level overrides, and runtime configuration values stored per flow.
- Ephemeral Workflows (@since v0.8.0) are temporary, on-the-fly workflows triggered via the REST API. They skip database persistence for the workflow definition itself, using sentinel values (
flow_id='direct',pipeline_id='direct') and dynamic configuration stored within the job’s engine snapshot. - Jobs track individual flow executions, persist engine parameters, and power the fully React-based Jobs dashboard for real-time monitoring. Jobs support parent-child relationships for batch execution via
parent_job_id. - Steps execute sequentially (Fetch → AI → Publish/Update) with shared base classes that enforce validation, logging, and engine data synchronization.
Multi-Agent Architecture
Agent Ping for Continuity: The agent_ping step type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.
- Agent Registry: Agents are stored in
datamachine_agentswith unique slugs, owner relationships, and configuration. - Access Control: The
datamachine_agent_accesstable implements role-based access (viewer, operator, admin) for sharing agents across WordPress users. - Resource Scoping: All major resources (pipelines, flows, jobs, chat sessions) carry an
agent_idcolumn. Queries filter by agent context automatically. - Filesystem Isolation: Each agent gets its own directory under
agents/{slug}/for identity files (SOUL.md, MEMORY.md) and daily memory. - Three-Layer Directory System: Memory files are organized into shared (site-wide), agent (identity), and user (personal) layers under
wp-content/uploads/datamachine-files/.
Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Memory System
Autonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
Agent Memory Files
Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
| Layer | Directory | Contents |
|---|---|---|
| Shared | shared/ | SITE.md, RULES.md (site-wide context) |
| Agent | agents/{slug}/ | SOUL.md, MEMORY.md (agent identity and knowledge) |
| User | users/{id}/ | USER.md, MEMORY.md (human preferences) |
Daily Memory System
Agent Ping for Continuity: The agent_ping step type triggers external agents (via webhook) after pipeline completion. This is how the loop closes — you get notified when it’s your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.
- Synthesizes daily activity into summary files
- Prunes MEMORY.md when it exceeds 8KB, archiving session-specific content to daily files
Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
Memory Path Discovery
wp datamachine agent paths --allow-rootAutonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
This transforms Data Machine from a content automation tool into a self-scheduling execution layer for AI agents.
Abilities API
Data Machine supports multiple agents on a single WordPress installation (@since v0.36.1). Each agent has its own identity, memory, and resource scope.
FlowAbilities,PipelineAbilities,FlowStepAbilities, andPipelineStepAbilitieshandle creation, duplication, synchronization, and ordering.JobAbilitiesmonitors execution outcomes and updates statuses.ProcessedItemsAbilitiesdeduplicates content across executions by tracking previously processed identifiers.AgentAbilitiesmanages agent CRUD, renaming (with filesystem migration), and deletion.AgentMemoryAbilitiesprovides section-based read, write, append, and search operations on memory files.DailyMemoryAbilitiesmanages daily memory files — read, write, list, search, and delete by date.WorkspaceAbilitiesprovides git-aware workspace operations: clone, read, write, edit files, and run git commands. Moved to data-machine-code extension.
See Multi-Agent Architecture for details.
LogsManageraggregates log entries in thewp_datamachine_logstable for filtering in the admin UI.- Cache invalidation is handled by ability-level
clearCache()methods to ensure dynamic handler and step type registrations are immediately reflected across the system.
Data Machine uses WordPress itself as the persistent memory layer for AI agents — files on disk, conversations in the database, context assembled at request time.
System Tasks Framework
Markdown files organized in three layers:
- Job lifecycle:
completeJob(),failJob(),reschedule()with attempt tracking (max 24 retries) - Editable prompts:
getPromptDefinitions()system with overrides stored indatamachine_task_promptsoption - Undo system:
supportsUndo()andundo()for reversible operations, with effect types for post content, meta, attachments, and featured images
Built-in System Tasks
| Task Type | Class | Description |
|---|---|---|
image_generation | ImageGenerationTask | AI-powered image generation |
image_optimization | ImageOptimizationTask | Image compression and optimization |
alt_text_generation | AltTextTask | AI-generated alt text for images |
internal_linking | InternalLinkingTask | Automated internal link injection |
daily_memory_generation | DailyMemoryTask | Daily memory synthesis and MEMORY.md cleanup |
meta_description_generation | MetaDescriptionTask | AI-generated meta descriptions |
Job Undo System
Temporal knowledge preserved in date-organized files (agents/{slug}/daily/YYYY/MM/DD.md). The DailyMemoryTask system task automatically:
post_content_modified— restores WordPress revisionspost_meta_set— restores previous meta valuesattachment_created— deletes created attachmentsfeatured_image_set— restores or removes thumbnails
wp datamachine jobs undo <job_id> --allow-root
wp datamachine jobs undo <job_id> --dry-run --allow-rootWorkspace System
Note: The workspace system has been moved to the
data-machine-codeextension plugin. The following documentation is for reference only.
Pipelines can selectively inject daily memory via the DailyMemorySelectorDirective with modes: recent days, specific dates, date range, or by month.
- Location:
/var/lib/datamachine/workspace/(configurable viaDATAMACHINE_WORKSPACE_PATH) - Git-aware: Clone repos, track changes, commit, push — all through the Abilities API
- File operations: Read, write, edit files with
@filesyntax support in CLI - Security: Located outside the web root; mutating operations are CLI-only (not exposed via REST)
This canonical CLI command returns the full directory structure and file locations for any agent — the recommended way for external consumers to discover memory file paths.
wp datamachine-code workspace list --allow-root
wp datamachine-code workspace clone https://github.com/org/repo.git --allow-root
wp datamachine-code workspace read path/to/file --allow-root
wp datamachine-code workspace git status --repo=my-repo --allow-rootData Flow
- DataPacket standardizes the payload (content, metadata, attachments) that AI agents receive, keeping packets chronological and clean of URLs when not needed.
- EngineData stores engine-specific parameters such as
source_url,image_url, and flow context, which fetch handlers persist via thedatamachine_engine_datafilter for downstream handlers. - FilesRepository modules (DirectoryManager, FileStorage, RemoteFileDownloader, ImageValidator, FileCleanup, FileRetrieval) isolate file storage per flow, validate uploads, and enforce automatic cleanup after jobs complete.
AI Integration
- Tool-first architecture enables AI agents (pipeline and chat) to call tools that interact with handlers, external APIs, or workflow metadata.
- PromptBuilder + RequestBuilder apply layered directives via the
datamachine_directivesfilter so every request includes identity, context, and site-specific instructions. - Global tools (Google Search, Local Search, Web Fetch, WordPress Post Reader) are registered under
/inc/Engine/AI/Tools/and available to all agents. - Chat-specific tools (AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow) orchestrate pipeline and flow management within conversations.
- ToolParameters + ToolResultFinder gather parameter metadata for tools and interpret results inside data packets to keep conversations consistent.
Authentication & Security
- Authentication providers extend BaseAuthProvider, BaseOAuth1Provider, or BaseOAuth2Provider under
/inc/Core/OAuth/, covering Twitter, Reddit, Facebook, Threads, Google Sheets, and Bluesky (app passwords). - OAuth handlers (
OAuth1Handler,OAuth2Handler) standardize callback handling, nonce validation, and credential storage. - Capability checks (
manage_options) and WordPress nonces guard REST endpoints; inputs run throughsanitize_*helpers before hitting services. - Multi-agent permissions:
PermissionHelperhandles agent-level access checks viaresolve_scoped_agent_id(),can_access_agent(), andowns_agent_resource(). - HttpClient centralizes outbound HTTP requests with consistent headers, browser-mode simulation, timeout control, and logging via
datamachine_log.
Scheduling & Jobs
- Action Scheduler drives scheduled flow execution while REST endpoints handle immediate runs.
- Flow schedules support manual runs, one-time execution, and recurring intervals (from 5 minutes to weekly). See Scheduling Intervals for available options.
- System task scheduling: DailyMemoryTask and other system tasks run on cron schedules via Action Scheduler.
- Batch execution: Jobs support parent-child relationships via
parent_job_idfor processing multiple items in coordinated batches. - JobManager updates statuses, emits extensibility actions (
datamachine_update_job_status), and links jobs to logs and processed items for auditing.
Admin Interface
- React-First Architecture: Admin pages are React apps built with
@wordpress/componentsand TanStack Query for server state. - Client UI state: The Pipelines page uses a small Zustand store for UI state (pipeline selection, modals, chat sidebar). Other pages may use local React state.
- Pipeline Builder: Visual pipeline/flow configuration with modal-driven step and handler settings.
- Job Management: React dashboard for job history with server-driven pagination and admin cleanup modal.
- Logs Interface: React logs viewer with filtering controls and REST-backed content loading.
- Integrated Chat: Collapsible sidebar for context-aware pipeline automation and AI-driven workflow assistance, using specialized tools to manage the entire ecosystem.
- Agent Management: Agent creation, configuration, and access control UI.
Key Capabilities
- Multi-agent support with isolated identity, memory, and resources per agent on a single WordPress installation.
- Multi-platform publishing via dedicated fetch/publish/update handlers for files, RSS, Reddit, Google Sheets, WordPress, Twitter, Threads, Bluesky, Facebook, and Google Sheets output.
- Daily memory system for automatic temporal knowledge management with AI-driven pruning.
- System tasks for background AI operations (image generation, alt text, internal linking, meta descriptions) with undo support.
- Workspace system for secure git-aware file management outside the web root (moved to data-machine-code extension).
- Extension points through filters such as
datamachine_handlers,chubes_ai_tools,datamachine_step_types,datamachine_auth_providers, anddatamachine_engine_data. - Directive orchestration ensures every AI request is context-aware, tool-enabled, and consistent with site policies.
- Chartable logging, deduplication, and error handling keep operators informed about job outcomes and prevent duplicate processing.