Agent Bundles
Agent bundles are portable, versioned packages for an agent’s runtime behavior. They describe what to install without embedding local database IDs or secrets.
Directory Layout
<bundle>/
├── manifest.json
├── memory/
├── pipelines/<pipeline-slug>.json
├── flows/<flow-slug>.json
├── prompts/<prompt-slug>.md
├── rubrics/<rubric-slug>.md
├── tool-policies/<policy-slug>.json
├── seed-queues/<queue-slug>.json
├── extensions/
└── <extras-key>/ # Plugin-owned opaque extras (e.g. wiki/, datasets/)
Only manifest.json, pipelines/, flows/, and memory/ have import/export adapters today. The remaining directories are reserved schema surface for prompts, rubrics, tool policies, auth references, seed queue artifacts, and plugin-owned extension artifacts.
Reserved Trees And Extras
Data Machine owns the directory names listed in BundleSchema::RESERVED_TREES plus the manifest filename. Anything else at the bundle root is opaque to the Bundle layer and round-trips as an extra: a top-level directory whose contents are carried as a file map (<key>/<relative-path> => string contents) under bundle["extras"].
Reserved trees as of schema_version 1:
- manifest.json, agent/, memory/
- pipelines/, flows/
- prompts/, rubrics/, tool-policies/, auth-refs/, seed-queues/
- extensions/
Extras rules:
- Extras keys are slug-like directory names (ASCII alphanumerics, dashes, underscores; no slashes).
- Each path inside an extra must start with <key>/, must not contain .. or . segments, and must not be absolute.
- Empty extras directories are dropped on read.
- Hidden entries (names starting with .), symlinks that escape the bundle root, and binary files (NUL-byte heuristic) are skipped on read with a logged warning.
- Data Machine does NOT auto-extract extras to disk on install. Consumers persist extras themselves via the post-install hook below.
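The key and path rules above can be sketched as two small checks. This is a minimal illustration, not the Bundle layer's actual validator: both function names are hypothetical, and rejecting empty path segments (doubled or trailing slashes) is an extra assumption beyond the rules listed. It needs PHP 8.0+ for str_starts_with().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when $key is slug-like
 * (ASCII alphanumerics, dashes, underscores; no slashes).
 */
function example_is_valid_extras_key( string $key ): bool {
    return 1 === preg_match( '/\A[A-Za-z0-9_-]+\z/', $key );
}

/**
 * Hypothetical helper: true when $path is an acceptable file-map path
 * for the extra named $key.
 */
function example_is_valid_extras_path( string $key, string $path ): bool {
    // The path must live under "<key>/" and name something beyond the prefix.
    // Absolute paths fail here because they start with "/" instead.
    $prefix = $key . '/';
    if ( ! str_starts_with( $path, $prefix ) || $path === $prefix ) {
        return false;
    }

    // Reject "." and ".." segments. Empty segments are rejected too,
    // which is an assumption of this sketch.
    foreach ( explode( '/', $path ) as $segment ) {
        if ( '' === $segment || '.' === $segment || '..' === $segment ) {
            return false;
        }
    }

    return true;
}
```

A consumer could run checks like these on its own file map before returning it from the export filter, so malformed paths are caught before they reach the bundle.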
Consumer Pattern
Plugins claim extras keys through two hooks. Data Machine ships the transport; the consumer ships the semantics.
// At export time, contribute extras the consumer wants packaged.
add_filter(
    'datamachine_bundle_export_extras',
    function ( array $extras, int $agent_id, array $agent ): array {
        $files = my_plugin_collect_wiki_files_for_agent( $agent_id );
        if ( ! empty( $files ) ) {
            $extras['wiki'] = $files; // keys must start with 'wiki/'
        }
        return $extras;
    },
    10,
    3
);
// At import time, claim and persist extras the consumer owns.
add_action(
    'datamachine_bundle_install_succeeded',
    function ( int $agent_id, string $slug, array $bundle_metadata, array $extras, array $context ): void {
        if ( empty( $extras['wiki'] ) ) {
            return;
        }
        my_plugin_import_wiki_files( $agent_id, $extras['wiki'], $context['is_upgrade'] );
    },
    10,
    5
);
The hook fires after the install/upgrade transaction commits. It never fires on dry-run or on failure. Listener exceptions are caught, logged, and suppressed; they do not roll back the install.
See Automattic/intelligence#467 for the reference consumer that claims the wiki/ extras key.
Manifest Schema
manifest.json uses schema_version: 1 and requires:
- bundle_slug: stable bundle identity.
- bundle_version: explicit bundle version for upgrade planning.
- source_ref and source_revision: optional source coordinates.
- exported_at and exported_by: export metadata.
- agent: slug, label, description, and agent_config defaults.
- included: lists of memory, pipelines, flows, prompts, rubrics, tool_policies, auth_refs, and seed_queues, plus handler_auth mode.
handler_auth is one of:
- refs: include auth_ref handles only.
- full: reserved for encrypted credential exports.
- omit: omit handler auth material.
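To make the field list concrete, here is a sketch of a manifest expressed as a PHP array, with a small shape check. Every value is made up for illustration, the timestamp format is a guess, and placing handler_auth inside included is an assumption read from the field list above. The helper example_manifest_problems() is hypothetical and is not part of BundleSchema.

```php
<?php
declare(strict_types=1);

// Illustrative manifest; all slugs, labels, and dates are invented.
$example_manifest = [
    'schema_version'  => 1,
    'bundle_slug'     => 'example-agent',
    'bundle_version'  => '1.0.0',
    'source_ref'      => 'main', // optional
    'source_revision' => null,   // optional
    'exported_at'     => '2025-01-01T00:00:00Z',
    'exported_by'     => 'example-operator',
    'agent'           => [
        'slug'         => 'example-agent',
        'label'        => 'Example Agent',
        'description'  => 'Illustrative agent definition.',
        'agent_config' => [],
    ],
    'included'        => [
        'memory'        => [],
        'pipelines'     => [ 'example-pipeline' ],
        'flows'         => [ 'example-flow' ],
        'prompts'       => [],
        'rubrics'       => [],
        'tool_policies' => [],
        'auth_refs'     => [],
        'seed_queues'   => [],
        'handler_auth'  => 'refs',
    ],
];

/**
 * Hypothetical helper: list the ways $manifest departs from the
 * documented shape. An empty list means no problem was found.
 */
function example_manifest_problems( array $manifest ): array {
    $problems = [];

    if ( 1 !== ( $manifest['schema_version'] ?? null ) ) {
        $problems[] = 'schema_version must be 1';
    }

    $required = [ 'bundle_slug', 'bundle_version', 'exported_at', 'exported_by', 'agent', 'included' ];
    foreach ( $required as $key ) {
        if ( ! isset( $manifest[ $key ] ) ) {
            $problems[] = "missing {$key}";
        }
    }

    $mode = $manifest['included']['handler_auth'] ?? null;
    if ( ! in_array( $mode, [ 'refs', 'full', 'omit' ], true ) ) {
        $problems[] = 'handler_auth must be refs, full, or omit';
    }

    return $problems;
}
```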
Artifact Tracking
Bundle installs track artifacts independently of their runtime table. The foundation record shape is:
- agent_id
- bundle_slug
- bundle_version
- artifact_type
- artifact_id
- source_path
- installed_hash
- current_hash
- local_status
- installed_at
- updated_at
artifact_type is one of agent, memory, pipeline, flow, prompt, rubric, tool_policy, auth_ref, seed_queue, or schedule.
Statuses:
- clean: current hash equals the installed hash.
- modified: current hash differs from the installed hash.
- missing: an installed artifact no longer exists in runtime state.
- orphaned: runtime state exists without an installed bundle record.
Hashing is deterministic: arrays are recursively sorted and JSON-encoded through BundleSchema::encode_json() before SHA-256 hashing. This makes formatting and associative key order irrelevant while preserving list order.
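The hashing rule and the four statuses can be sketched together. This is an approximation under stated assumptions: plain json_encode() stands in for BundleSchema::encode_json(), whose flags are not documented here, so the digests will not necessarily match real installs. All three function names are hypothetical, and a null hash is used to mean "no such record". It needs PHP 8.1+ for array_is_list().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: sort associative arrays by key at every depth
 * while leaving list (0..n-1 indexed) arrays in their original order.
 */
function example_sort_recursive( array $value ): array {
    foreach ( $value as $key => $item ) {
        if ( is_array( $item ) ) {
            $value[ $key ] = example_sort_recursive( $item );
        }
    }
    if ( ! array_is_list( $value ) ) {
        ksort( $value, SORT_STRING );
    }
    return $value;
}

/**
 * Hypothetical helper: SHA-256 over the canonical JSON form.
 * json_encode() is a stand-in for BundleSchema::encode_json().
 */
function example_artifact_hash( array $artifact ): string {
    return hash( 'sha256', (string) json_encode( example_sort_recursive( $artifact ) ) );
}

/**
 * Hypothetical helper: derive local_status from the two hashes.
 * Null installed hash: no bundle record. Null current hash: no runtime row.
 */
function example_local_status( ?string $installed_hash, ?string $current_hash ): string {
    if ( null === $installed_hash ) {
        return 'orphaned';
    }
    if ( null === $current_hash ) {
        return 'missing';
    }
    return $installed_hash === $current_hash ? 'clean' : 'modified';
}
```

With this shape, two exports of the same flow that differ only in associative key order hash identically, while reordering a list such as a step sequence produces a different hash and therefore a modified status.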
Memory Section Artifacts
Memory artifacts can be tracked at section granularity. Section records extend the artifact identity with operational memory ownership fields:
- agent_id
- section_id
- section_heading
- section_type
- owner: bundle, user, runtime, or compaction
- bundle_slug and bundle_version for bundle-owned seed sections
- installed_hash, current_hash, and local_status
Self-memory writes are policy-constrained: the default target is the current acting agent, allowed section types are operational (operating_note, source_quirk, run_lesson, task_note), durable facts are rejected, and bundle-owned sections are staged through PendingActions instead of overwritten directly.
Pipeline And Flow Updates
Bundle-installed pipelines and flows use stable portable_slug values as their runtime identity:
- Pipelines resolve by (agent_id, portable_slug).
- Flows resolve by (pipeline_id, portable_slug).
- Re-importing the same bundle updates the existing clean rows instead of creating duplicates.
- Local edits are detected by comparing the current artifact hash with the install-time hash recorded in the owning agent’s datamachine_bundle.artifacts config.
- Installed artifact tracking is persisted in the site-scoped datamachine_bundle_artifacts table, keyed by (agent_id, bundle_slug, artifact_type, artifact_id), so bundle state is independent of runtime tables.
- Modified artifacts are reported as conflicts and are not overwritten by the import path.
Schedule and queue policy is intentionally conservative:
- New flows are installed paused/manual. The source schedule is retained as _original_interval for operators to opt in later.
- Existing flow schedules are preserved during upgrades.
- Queue slots (prompt_queue, config_patch_queue, queue_mode) seed new flows but are preserved on upgrades so runtime backlogs are not discarded.
Portable flow-step policy fields are explicit in flows/<flow-slug>.json:
- enabled_tools: flow-scoped AI allow-list consumed by ToolPolicyResolver.
- disabled_tools: flow-scoped AI deny-list composed with pipeline-level disabled tools.
- prompt_queue: AI seed queue entries shaped as { "prompt": "...", "added_at": "..." }.
- config_patch_queue: fetch seed queue entries shaped as { "patch": { ... }, "added_at": "..." }.
- queue_mode: one of drain, loop, or static; shared by both queue slots on that flow step.
CLI
Agent package operations live under wp datamachine agent:
- install <path|url> imports a local bundle path (.zip, .json, or directory) or a remote URL.
- installed reports installed package-backed agents without clobbering agent list.
- status <slug> reports installed version and tracked artifact state by agent slug or bundle slug.
- diff <path|url> builds a read-only upgrade plan.
- upgrade <path|url> applies clean updates and stages locally modified artifacts as PendingActions.
- apply <pending_action_id> accepts a staged bundle PendingAction.
Every read/preview command supports --format=json for automation.
install, import, upgrade, and diff accept --token=<token> and --token-env=<varname> for one-off authenticated downloads — see Authenticated Bundle Sources.
Authenticated Bundle Sources
Public archives install with no configuration. Private GitHub repositories, GitHub Enterprise (GHE) hosts, and any other URL behind Authorization: Bearer <token> are supported through a layered opt-in chain:
- CLI flag (--token=<token> / --token-env=<varname>): single-call, never persisted, never logged.
- Environment variable / PHP constant: DATAMACHINE_GITHUB_TOKEN for github.com / raw.githubusercontent.com; per-host symbol for GHE (configured via filter).
- WP option (github.com only): datamachine_bundle_source_github_token. Lowest-precedence convenience slot.
- Filter fallback: datamachine_bundle_source_token_for_url for sigillo, AWS Secrets Manager, Vault, etc.
- Generic per-request filter: datamachine_bundle_source_download_args for arbitrary header injection (any host, any auth scheme).
Data Machine never persists tokens itself. Storage and rotation live in the host plugin or operator’s environment. Errors that surface from a failed download include the URL but never the Authorization header — see BundleSourceAuth::redact_args_for_log() if you log the args downstream.
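The layered chain can be pictured as "first non-empty slot wins". The sketch below shows only that pattern and is not how BundleSourceAuth is implemented: example_first_token() and the closures are hypothetical stand-ins, and the option and fallback slots return fixed values instead of calling WordPress.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: walk resolvers in precedence order and return the
 * first non-empty token, or null when no slot yields one.
 *
 * @param array<string, callable> $resolvers Each takes a URL, returns ?string.
 */
function example_first_token( array $resolvers, string $url ): ?string {
    foreach ( $resolvers as $resolver ) {
        $token = $resolver( $url );
        if ( is_string( $token ) && '' !== $token ) {
            return $token;
        }
    }
    return null;
}

// Stand-ins for the documented slots, highest precedence first.
$example_chain = [
    // No --token / --token-env flag was passed on this call.
    'cli_flag' => static fn ( string $url ): ?string => null,
    // Environment variable slot for github.com.
    'env_var'  => static fn ( string $url ): ?string => getenv( 'DATAMACHINE_GITHUB_TOKEN' ) ?: null,
    // Stands in for the datamachine_bundle_source_github_token option.
    'option'   => static fn ( string $url ): ?string => 'token-from-option',
    // Stands in for the datamachine_bundle_source_token_for_url filter.
    'fallback' => static fn ( string $url ): ?string => null,
];

$example_token = example_first_token(
    $example_chain,
    'https://github.com/private-org/private-repo/archive/refs/heads/main.zip'
);
```

Here the environment variable wins when it is set; otherwise resolution falls through to the option stand-in. The token is returned to the caller and never written anywhere, in keeping with the no-persistence rule above.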
Example 1 — github.com private archive via env var
export DATAMACHINE_GITHUB_TOKEN=ghp_xxx
wp datamachine agent install https://github.com/private-org/private-repo/archive/refs/heads/main.zip --yes
A 401/403/404 response surfaces as WP_Error( 'datamachine_bundle_source_auth_required', ... ) with a hint about the configured token slots.
URL routing. When a token is available for api.github.com, web-host archive URLs (github.com/<o>/<r>/archive/refs/heads/<branch>.zip, archive/<sha>.zip, and tree/<branch>) are rewritten to api.github.com/repos/<o>/<r>/zipball/<ref> before the request is sent. The web-host archive endpoint rejects Personal Access Tokens with HTTP 404; the API endpoint accepts both fine-grained and classic PATs and returns a 302 to a signed S3 URL. Public installs (no token configured) keep using the web-host URL so they avoid the API rate-limit cost. The Authorization header is dropped on cross-host redirects so it never leaks to S3 (which would respond with HTTP 400 to the unrecognized auth scheme).
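The rewrite described above can be approximated with three patterns, one per URL shape. This is a sketch, not the routing code Data Machine ships: example_rewrite_github_archive_url() is a hypothetical name, the 7-to-40 hex length for <sha> is an assumption, and the check for whether a token is available is left to the caller.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a github.com web-host archive URL to the
 * api.github.com zipball endpoint. Unrecognized URLs pass through as-is.
 */
function example_rewrite_github_archive_url( string $url ): string {
    $patterns = [
        // github.com/<o>/<r>/archive/refs/heads/<branch>.zip
        '#\Ahttps://github\.com/([^/]+)/([^/]+)/archive/refs/heads/(.+)\.zip\z#',
        // github.com/<o>/<r>/archive/<sha>.zip (hex length is an assumption)
        '#\Ahttps://github\.com/([^/]+)/([^/]+)/archive/([0-9a-f]{7,40})\.zip\z#',
        // github.com/<o>/<r>/tree/<branch>
        '#\Ahttps://github\.com/([^/]+)/([^/]+)/tree/(.+)\z#',
    ];

    foreach ( $patterns as $pattern ) {
        if ( 1 === preg_match( $pattern, $url, $matches ) ) {
            return sprintf(
                'https://api.github.com/repos/%s/%s/zipball/%s',
                $matches[1],
                $matches[2],
                $matches[3]
            );
        }
    }

    return $url;
}
```

A caller would apply this only when a token resolved for the request, so public installs keep the web-host URL as described above.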
Example 2 — GHE archive via filter-configured host + constant
// mu-plugin or theme bootstrap.
add_filter( 'datamachine_bundle_source_ghe_hosts', function ( array $hosts ): array {
    $hosts['github.a8c.com'] = 'DATAMACHINE_A8C_GHE_TOKEN';
    return $hosts;
} );
define( 'DATAMACHINE_A8C_GHE_TOKEN', 'ghp_yyy' );
wp datamachine agent install https://github.a8c.com/team/brain/archive/refs/heads/main.zip --yes
Example 3 — Custom secret store via filter callback
For sigillo / AWS Secrets Manager / Vault — anything that doesn’t fit the env-var/constant model — wire a callback on datamachine_bundle_source_download_args and skip the resolution chain entirely:
add_filter(
    'datamachine_bundle_source_download_args',
    function ( array $args, string $source, string $fetch_url ): array {
        $host = wp_parse_url( $fetch_url, PHP_URL_HOST ) ?: '';
        if ( str_ends_with( $host, '.example.com' ) ) {
            $token = my_sigillo_client()->get( "bundles/{$host}" );
            if ( $token ) {
                $args['headers']['Authorization'] = 'Bearer ' . $token;
            }
        }
        return $args;
    },
    10,
    3
);
Or hook the lower-cost datamachine_bundle_source_token_for_url fallback when you only want to fill the gap for hosts that have no built-in slot.
Source Revision Capture
For GitHub archive responses, BundleSource::fetch_to_tempfile() reads the response ETag header and parses the git SHA out of the W/"<sha>:<format>" (or bare "<sha>") shape GitHub serves. When present and parseable the SHA is exposed via BundleSource::last_resolved_revision() and stamped onto the bundle as source_revision. Best-effort only — installs do not fail when the ETag is absent or unparseable.
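The ETag parsing step might look like the following. It is a sketch under stated assumptions, not the body of BundleSource::fetch_to_tempfile(): example_sha_from_etag() is a hypothetical name, and treating the SHA as exactly 40 hex characters is an assumption about what GitHub serves.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pull a git SHA out of an ETag header value.
 * Accepts W/"<sha>:<format>" and bare "<sha>", with or without the
 * weak-validator prefix. Returns null for any other shape.
 */
function example_sha_from_etag( string $etag ): ?string {
    $pattern = '/\A(?:W\/)?"([0-9a-f]{40})(?::[^"]*)?"\z/i';
    if ( 1 === preg_match( $pattern, trim( $etag ), $matches ) ) {
        return strtolower( $matches[1] );
    }
    return null;
}
```

Returning null instead of throwing mirrors the best-effort behavior described above: a missing or unparseable ETag simply leaves source_revision unset.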
Follow-Ups
- Wire importer/exporter paths for prompts, rubrics, tool policies, auth refs, and seed queues.
- Use artifact statuses during bundle upgrade planning: auto-update clean, stage PendingActions for modified, surface missing and orphaned explicitly.
- Add registered bundle sources once bundles move beyond local paths.