Data Machine User Documentation
AI-first WordPress plugin for automating and orchestrating content workflows with a visual pipeline builder, conversational chat agent, REST API, and extensibility through handlers and tools.
Agent-First Architecture
Data Machine is designed for AI agents as primary users, not just tool operators.
The Self-Orchestration Pattern
While humans use Data Machine to automate content workflows, AI agents can use it to automate themselves:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    AGENT    │ ──▶ │    QUEUE    │ ──▶ │  PIPELINE   │ ──▶ │ AGENT PING  │
│ queues task │     │  persists   │     │  executes   │     │ wakes agent │
│             │     │  context    │     │             │     └──────┬──────┘
└─────────────┘     └─────────────┘     └─────────────┘            │
       ▲                                                           │
       └───────────────────────────────────────────────────────────┘
                    Agent processes, queues next task
```

Key concepts:
- Prompt Queue as Project Memory: Queue items persist across sessions, storing project context that survives context window limits. Your multi-week project becomes a series of queued prompts.
- Agent Ping for Continuity: The agent_ping step type triggers external agents (via webhook) after pipeline completion. This is how the loop closes: you get notified when it is your turn to act. Agent Ping is outbound-only; inbound triggers use the REST API.
- Phased Execution: Complex projects execute in stages over days or weeks. Each stage completes, pings the agent, and the agent queues the next stage.
- Autonomous Loops: An agent can run indefinitely: process result → queue next task → sleep → wake on ping → repeat. Use explicit stop conditions to avoid runaway loops.
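The self-orchestration loop can be sketched as a minimal in-memory simulation. All names and stages below are hypothetical; in Data Machine the queue is database-backed and the wake-up arrives as a webhook ping rather than a function call:

```python
from collections import deque

# In-memory stand-in for the prompt queue; the real queue persists
# project context in the database across agent sessions.
queue = deque(["stage 1: outline", "stage 2: draft", "stage 3: publish"])

MAX_ITERATIONS = 10  # explicit stop condition to avoid a runaway loop

def run_pipeline(task):
    # Placeholder for a pipeline execution (Fetch -> AI -> Publish).
    return f"result of {task}"

iterations = 0
results = []
while queue and iterations < MAX_ITERATIONS:
    task = queue.popleft()              # agent wakes on ping, takes next task
    results.append(run_pipeline(task))  # pipeline executes the stage
    iterations += 1                     # agent processes result; loop repeats
```

The explicit iteration cap mirrors the documentation's advice: an autonomous loop should always carry a stop condition of its own.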
System Architecture
- Pipelines are reusable workflow templates that store handler order, tool selections, and AI settings.
- Flows instantiate pipelines with schedule metadata, flow-level overrides, and runtime configuration values stored per flow.
- Ephemeral Workflows (@since v0.8.0) are temporary, on-the-fly workflows triggered via the REST API. They skip database persistence for the workflow definition itself, using sentinel values (flow_id='direct', pipeline_id='direct') and dynamic configuration stored within the job's engine snapshot.
- Jobs track individual flow executions, persist engine parameters, and power the fully React-based Jobs dashboard for real-time monitoring.
- Steps execute sequentially (Fetch → AI → Publish/Update) with shared base classes that enforce validation, logging, and engine data synchronization.
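To make the ephemeral-workflow idea concrete, here is a sketch of what such a request body could look like. The flow_id/pipeline_id sentinel values come from the documentation above; the step and handler field names are illustrative assumptions, not the plugin's exact REST schema:

```python
import json

# Hypothetical ephemeral workflow payload: the workflow definition lives
# only in this request, never in the pipelines/flows tables.
ephemeral_job = {
    "flow_id": "direct",      # sentinel: no persisted flow definition
    "pipeline_id": "direct",  # sentinel: no persisted pipeline definition
    "steps": [                # illustrative step list (Fetch -> AI -> Publish)
        {"type": "fetch", "handler": "rss"},
        {"type": "ai"},
        {"type": "publish", "handler": "wordpress_posts"},
    ],
}

payload = json.dumps(ephemeral_job)
```

The dynamic configuration would then be captured in the job's engine snapshot, so the job record remains auditable even though no pipeline row exists.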
Abilities API
- FlowAbilities, PipelineAbilities, FlowStepAbilities, and PipelineStepAbilities handle creation, duplication, synchronization, and ordering.
- JobAbilities monitors execution outcomes and updates statuses.
- ProcessedItemsAbilities deduplicates content across executions by tracking previously processed identifiers.
- LogsManager aggregates log entries in the wp_datamachine_logs table for filtering in the admin UI.
- Cache invalidation is handled by ability-level clearCache() methods to ensure dynamic handler and step type registrations are immediately reflected across the system.
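The deduplication idea behind ProcessedItemsAbilities reduces to remembering which identifiers have already been handled. This minimal sketch uses an in-memory set; the plugin tracks processed identifiers in a database table so the check survives across executions:

```python
# Identifiers already handled in previous executions.
processed = set()

def should_process(item_id: str) -> bool:
    """Return True the first time an identifier is seen, False afterwards."""
    if item_id in processed:
        return False          # duplicate: skip to avoid reprocessing
    processed.add(item_id)
    return True

first = should_process("rss:article-123")   # new item -> processed
second = should_process("rss:article-123")  # same item -> skipped
```

The identifier format ("rss:article-123") is a made-up example; any stable per-item key works with this pattern.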
Data Flow
- DataPacket standardizes the payload (content, metadata, attachments) that AI agents receive, keeping packets chronological and clean of URLs when not needed.
- EngineData stores engine-specific parameters such as source_url, image_url, and flow context, which fetch handlers persist via the datamachine_engine_data filter for downstream handlers.
- FilesRepository modules (DirectoryManager, FileStorage, RemoteFileDownloader, ImageValidator, FileCleanup, FileRetrieval) isolate file storage per flow, validate uploads, and enforce automatic cleanup after jobs complete.
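A DataPacket, as described above, is a standardized content/metadata/attachments payload kept in chronological order. The sketch below shows one plausible shape; the field names are assumptions for illustration, not Data Machine's exact structure:

```python
from datetime import datetime, timezone

# Illustrative DataPacket factory: a uniform payload that downstream
# AI steps can consume regardless of which fetch handler produced it.
def make_packet(content, metadata=None, attachments=None):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content": content,
        "metadata": metadata or {},      # e.g. source handler, item id
        "attachments": attachments or [],
    }

packets = [make_packet("first item"), make_packet("second item")]
```

Keeping packets uniform means an AI step never needs handler-specific parsing logic; it sees the same envelope whether the content came from RSS, Reddit, or a file upload.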
AI Integration
- Tool-first architecture enables AI agents (pipeline and chat) to call tools that interact with handlers, external APIs, or workflow metadata.
- PromptBuilder + RequestBuilder apply layered directives via the datamachine_directives filter so every request includes identity, context, and site-specific instructions.
- Global tools (Google Search, Local Search, Web Fetch, WordPress Post Reader) are registered under /inc/Engine/AI/Tools/ and available to all agents.
- Chat-specific tools (AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow) orchestrate pipeline and flow management within conversations.
- ToolParameters + ToolResultFinder gather parameter metadata for tools and interpret results inside data packets to keep conversations consistent.
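The layered-directive idea can be sketched as ordered text layers concatenated into one system prompt. The layer names and contents below are hypothetical; in the plugin, layers are contributed through the datamachine_directives filter rather than a hard-coded list:

```python
# Each layer contributes one block of instructions; the final system
# prompt is their ordered concatenation, so later layers refine earlier ones.
directives = [
    ("identity", "You are the site's content automation agent."),
    ("context", "Current pipeline: RSS fetch -> AI -> WordPress publish."),
    ("site", "Follow the site's editorial style guide."),
]

system_prompt = "\n\n".join(text for _, text in directives)
```

Because every request is built the same way, adding a new site-wide instruction means registering one more layer, not editing every prompt.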
Authentication & Security
- Authentication providers extend BaseAuthProvider, BaseOAuth1Provider, or BaseOAuth2Provider under /inc/Core/OAuth/, covering Twitter, Reddit, Facebook, Threads, Google Sheets, and Bluesky (app passwords).
- OAuth handlers (OAuth1Handler, OAuth2Handler) standardize callback handling, nonce validation, and credential storage.
- Capability checks (manage_options) and WordPress nonces guard REST endpoints; inputs run through sanitize_* helpers before hitting services.
- HttpClient centralizes outbound HTTP requests with consistent headers, browser-mode simulation, timeout control, and logging via datamachine_log.
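The value of a centralized HTTP client is that headers, timeouts, and browser-mode behavior are decided in one place. This is a minimal sketch of that pattern; the class and method names are illustrative, not the actual HttpClient API:

```python
import urllib.request

class SimpleHttpClient:
    """One place for default headers, timeout policy, and browser-mode UA."""

    def __init__(self, timeout=10, browser_mode=False):
        self.timeout = timeout
        # Browser mode swaps in a browser-like User-Agent for sites that
        # block obvious automation; names here are illustrative.
        ua = "Mozilla/5.0" if browser_mode else "DataMachine-Sketch/0.1"
        self.headers = {"User-Agent": ua}

    def build_request(self, url):
        # Every outbound request inherits the same header set.
        return urllib.request.Request(url, headers=self.headers)

client = SimpleHttpClient(browser_mode=True)
req = client.build_request("https://example.com/feed")
```

Handlers that need outbound HTTP then never set their own headers or timeouts, which keeps logging and rate behavior uniform across the plugin.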
Scheduling & Jobs
- Action Scheduler drives scheduled flow execution while REST endpoints handle immediate runs.
- Flow schedules support manual runs, one-time execution, and recurring intervals (from 5 minutes to weekly). See Scheduling Intervals for available options.
- JobManager updates statuses, emits extensibility actions (datamachine_update_job_status), and links jobs to logs and processed items for auditing.
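The status-update-plus-action pattern means a status change also notifies registered listeners, analogous to WordPress firing do_action('datamachine_update_job_status'). This Python sketch shows the shape of that hook; all names are illustrative:

```python
# Registered listeners, standing in for WordPress action callbacks.
listeners = []

def add_action(callback):
    listeners.append(callback)

def update_job_status(job, status):
    job["status"] = status
    for callback in listeners:   # extensibility hook fires on every update
        callback(job, status)

seen = []
add_action(lambda job, status: seen.append(status))

job = {"id": 42, "status": "pending"}
update_job_status(job, "completed")
```

Extensions can therefore react to job outcomes (alerting, metrics, chaining) without modifying JobManager itself.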
Admin Interface
- React-First Architecture: Admin pages are React apps built with @wordpress/components and TanStack Query for server state.
- Client UI state: The Pipelines page uses a small Zustand store for UI state (pipeline selection, modals, chat sidebar). Other pages may use local React state.
- Pipeline Builder: Visual pipeline/flow configuration with modal-driven step and handler settings.
- Job Management: React dashboard for job history with server-driven pagination and admin cleanup modal.
- Logs Interface: React logs viewer with filtering controls and REST-backed content loading.
- Integrated Chat: Collapsible sidebar for context-aware pipeline automation and AI-driven workflow assistance, using specialized tools to manage the entire ecosystem.
Key Capabilities
- Multi-platform publishing via dedicated fetch/publish/update handlers for files, RSS, Reddit, WordPress, Twitter, Threads, Bluesky, and Facebook, with Google Sheets supported for both input and output.
- Extension points through filters such as datamachine_handlers, chubes_ai_tools, datamachine_step_types, datamachine_auth_providers, and datamachine_engine_data.
- Directive orchestration ensures every AI request is context-aware, tool-enabled, and consistent with site policies.
- Auditable logging, deduplication, and error handling keep operators informed about job outcomes and prevent duplicate processing.