Processed Items Endpoint

Implementation: inc/Api/ProcessedItems.php

Base URL: /wp-json/datamachine/v1/processed-items

Overview

Processed items are the deduplication records Data Machine uses to avoid re-processing the same source item across flow executions. Fetch handlers and engine code write these records during execution; the REST surface currently exposes an operator cleanup endpoint for resetting deduplication by pipeline or flow.

For richer read/check/stale-history operations, use the datamachine/* abilities in ProcessedItemsAbilities or the WP-CLI wp datamachine processed-items command.

Revision-Key Conventions

Refresh workloads should use processed items as revision-level dedupe records when the same logical item may be revisited across scheduled runs. Data Machine owns the durable processed-item table and runtime counters; the workload owner owns domain-specific source types and identifier parts.

Use these generic rules for revision keys:

| Field | Convention |
| --- | --- |
| flow_step_id | The flow step that owns the refresh boundary. |
| source_type | A stable workload-defined namespace, such as a source or artifact kind. |
| item_identifier | A stable string assembled from the workload scope, logical item ID, and a content/revision hash. |
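
As a minimal sketch of the item_identifier convention above, the following Python helper joins a workload scope, a logical item ID, and a SHA-256 content hash into one stable string. The function name build_item_identifier, the colon-separated layout, and the 16-character hash prefix are illustrative assumptions, not Data Machine APIs or required formats.

```python
import hashlib


def build_item_identifier(scope: str, item_id: str, content: bytes) -> str:
    """Join scope, logical item ID, and a content hash into one stable key.

    Hypothetical helper: the layout and hash length are illustrative only.
    """
    # Hash the content so a changed revision yields a different identifier.
    revision = hashlib.sha256(content).hexdigest()[:16]
    return f"{scope}:{item_id}:{revision}"


first = build_item_identifier("docs", "guide-7", b"version one")
same = build_item_identifier("docs", "guide-7", b"version one")
changed = build_item_identifier("docs", "guide-7", b"version two")

# Unchanged content maps to the same key, so a scheduled run can skip it;
# edited content produces a new key and is treated as unprocessed.
print(first == same)     # True
print(first == changed)  # False
```

Because the hash is part of the key, a refresh run does not need to compare content itself: a lookup on the identifier is enough to tell whether this exact revision was already handled.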

Useful identifier shapes for source/index refresh workloads include:

ShapeUse
`document=
`chunk=
`provider=

Batch jobs should record refresh progress under the generic run-metric counter names: selected, skipped, processed, failed, and retried. RunMetrics::fromJob() reads those counters from engine_data.batch_results, so batch parent jobs can report refresh progress without a workload-specific metrics table.
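
The counter convention can be illustrated with a rough Python sketch that sums per-child counters into parent totals. The merge_counters helper and the flat dictionary shape are assumptions made for illustration; the actual layout of engine_data.batch_results is not specified here.

```python
# Counter names taken from the run-metric convention described above.
COUNTER_NAMES = ("selected", "skipped", "processed", "failed", "retried")


def merge_counters(child_results):
    """Sum the five generic counters across child results.

    Hypothetical helper: missing counters are treated as zero.
    """
    totals = {name: 0 for name in COUNTER_NAMES}
    for result in child_results:
        for name in COUNTER_NAMES:
            totals[name] += int(result.get(name, 0))
    return totals


children = [
    {"selected": 10, "skipped": 4, "processed": 5, "failed": 1},
    {"selected": 8, "skipped": 8},
]
totals = merge_counters(children)
print(totals)
# {'selected': 18, 'skipped': 12, 'processed': 5, 'failed': 1, 'retried': 0}
```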

Authentication

The REST endpoint requires PermissionHelper::can( 'manage_flows' ). Administrators pass through the mapped Data Machine capability fallback.

Endpoint

DELETE /processed-items

Clear processed-item records for a flow or pipeline.

Permission: manage_flows

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| clear_type | string | Yes | Either pipeline or flow. |
| target_id | integer | Yes | Pipeline ID or flow ID matching clear_type. |

Example request:

bash
curl -X DELETE https://example.com/wp-json/datamachine/v1/processed-items \
  -u username:application_password \
  -H 'Content-Type: application/json' \
  -d '{"clear_type":"flow","target_id":42}'
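
For callers scripting the cleanup, the same request can be sketched in Python with only the standard library. The build_clear_request helper is hypothetical, and the client-side clear_type check simply mirrors the parameter table above; the example builds the request without sending it.

```python
import base64
import json
import urllib.request

VALID_CLEAR_TYPES = {"pipeline", "flow"}


def build_clear_request(site_url, username, app_password, clear_type, target_id):
    """Build (but do not send) the DELETE request for clearing processed items."""
    if clear_type not in VALID_CLEAR_TYPES:
        raise ValueError("clear_type must be 'pipeline' or 'flow'")
    # HTTP Basic auth with the same credentials the curl -u flag would send.
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    body = json.dumps({"clear_type": clear_type, "target_id": int(target_id)}).encode()
    return urllib.request.Request(
        f"{site_url.rstrip('/')}/wp-json/datamachine/v1/processed-items",
        data=body,
        method="DELETE",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


request = build_clear_request(
    "https://example.com", "username", "application_password", "flow", 42
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping construction separate from sending makes it easy to inspect the method, URL, and body before anything destructive runs against a live site.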

Success response:

json
{
  "success": true,
  "message": "Cleared processed items for flow 42",
  "cleared_count": 17,
  "clear_type": "flow",
  "target_id": 42
}

Ability Surface

ProcessedItemsAbilities registers the current ability surface for programmatic callers:

| Ability | Purpose |
| --- | --- |
| datamachine/clear-processed-items | Clear dedupe records by flow or pipeline. |
| datamachine/check-processed-item | Check whether an item has been processed. |
| datamachine/has-processed-history | Check whether a flow has any processed history. |
| datamachine/processed-items-get-processed-at | Return the last processed timestamp for an item. |
| datamachine/processed-items-find-stale | Filter candidate items to those processed before a threshold. |
| datamachine/processed-items-find-never-processed | Filter candidate items to those with no processed record. |
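
The difference between the find-stale and find-never-processed filters can be modeled locally with a short Python sketch. It works on a hypothetical in-memory mapping of item identifiers to timestamps and does not call the abilities themselves; their actual input and output schemas are not described here.

```python
from datetime import datetime, timezone


def find_stale(candidates, processed_at, threshold):
    """Return candidates whose last processed time is earlier than threshold."""
    return [c for c in candidates if c in processed_at and processed_at[c] < threshold]


def find_never_processed(candidates, processed_at):
    """Return candidates that have no processed record at all."""
    return [c for c in candidates if c not in processed_at]


# Hypothetical in-memory records standing in for the processed-item table.
processed_at = {
    "item-a": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "item-b": datetime(2024, 6, 1, tzinfo=timezone.utc),
}
candidates = ["item-a", "item-b", "item-c"]
threshold = datetime(2024, 3, 1, tzinfo=timezone.utc)

print(find_stale(candidates, processed_at, threshold))   # ['item-a']
print(find_never_processed(candidates, processed_at))    # ['item-c']
```

In this model an item with no record is never reported as stale, which matches the two purposes as worded in the table; whether the real abilities treat that case the same way is an assumption.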