Problem
skip_item currently mixes two very different decisions:
- Source rejection — the AI reasons that the fetched source is irrelevant, too thin, duplicate, spammy, or otherwise not worth writing. This should mark the item processed so it does not return.
- Execution/tool deferral — the AI cannot complete the task because a tool, quality gate, provider, or runtime path failed. This should release the claim and keep the item eligible for a future retry/reprocess.
Because the tool is named and implemented as a generic skip, the model can turn transient/tooling failures into permanent source exclusion.
Runtime evidence
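On intelligence-chubes4, Flow 2 / Job 640:
- wiki_upsert failed because of an Intelligence mounted target-path contract bug:
    Generated wiki article failed quality gates: generated_identity.target_path must stay inside generated_identity.root.
- The failure was then reported as { "reason": "tool-error" } via skip_item.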
That marked the source as processed even though the source was usable and the failure was a tooling contract issue. The underlying Intelligence bug was later fixed in Automattic/intelligence#456, but this exposed a Data Machine tool-contract weakness.
Current behavior
DataMachine\Core\Steps\Fetch\Tools\SkipItemTool:
    // Mark item as processed so it won't be refetched
    do_action(
        'datamachine_mark_item_processed',
        $flow_step_id,
        $source_type,
        $item_identifier,
        $job_id
    );
    $status = JobStatus::agentSkipped( $reason );
    datamachine_merge_engine_data( $job_id, array( 'job_status' => $status->toString() ) );
ExecuteStepAbility then treats non-failed status overrides as completion and can mark processed again:
    if ( str_starts_with( $status_override, JobStatus::FAILED ) === false ) {
        $this->markCompletedItemProcessed( $job_id );
    }
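One way to close this second path is to treat deferred outcomes like failures when deciding whether to mark an item processed. The following is a minimal sketch, not the actual ExecuteStepAbility code: the function name, the 'deferred' prefix, and the literal value of the failed prefix are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical status prefixes; the real JobStatus constants may differ.
const STATUS_FAILED   = 'failed';
const STATUS_DEFERRED = 'deferred';

/**
 * Decide whether a status override should mark the source item processed.
 * Failed and deferred outcomes both leave the item eligible for another run.
 */
function should_mark_processed( string $status_override ): bool {
    foreach ( array( STATUS_FAILED, STATUS_DEFERRED ) as $retryable_prefix ) {
        if ( str_starts_with( $status_override, $retryable_prefix ) ) {
            return false;
        }
    }
    return true;
}

var_dump( should_mark_processed( 'agent_skipped - irrelevant' ) ); // bool(true)
var_dump( should_mark_processed( 'deferred - tool-error' ) );      // bool(false)
```

With a guard like this, a deferral status written by a tool would not be converted into a processed mark by the completion path.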
Desired model
Make the tool surface encode the state machine instead of relying on negative prompt constraints.
Add explicit terminal/disposition tools:
reject_source
Use when the AI determines the source itself is not worth processing.
Examples:
- irrelevant to the flow/topic
- too thin to support a durable page
- duplicate of existing durable knowledge
- spam/noise/transient chatter
- fails source-quality reasoning gates
Behavior:
- mark item processed
- complete job as agent_rejected / agent_skipped - source-rejected or equivalent
- source should not be selected again by normal fresh-candidate flow
defer_item
Use when the AI is struggling or the runtime/tooling prevented a safe write, but the source may still be good.
Examples:
- wiki_upsert/publish/update tool failed
- quality gate/tool contract mismatch
- model lacks enough confidence and wants a future attempt or human review
- provider/network instability after partial progress
- temporary dependency outage
Behavior:
- release source claim
- do not mark processed
- complete job as deferred/retryable/manual-review depending on input
- item remains eligible for future fetch/reprocess
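The two behavior lists above can be sketched as a small enum where each disposition carries its own consequences, so the engine never has to interpret a free-form reason. This is an illustrative sketch only: the Disposition enum, apply_disposition(), and the in-memory item array are hypothetical stand-ins, not existing Data Machine APIs, and what reject_source does to the claim is left untouched because the proposal does not specify it. It requires PHP 8.1 or later for enums.

```php
<?php
declare(strict_types=1);

// Hypothetical enum: each disposition knows its engine-side consequences.
enum Disposition: string {
    case RejectSource = 'reject_source';
    case DeferItem    = 'defer_item';

    // Only a reasoned source rejection permanently excludes the item.
    public function marksProcessed(): bool {
        return self::RejectSource === $this;
    }

    // A deferral hands the item back so a later run can claim it again.
    public function releasesClaim(): bool {
        return self::DeferItem === $this;
    }
}

/**
 * Apply a disposition to an in-memory item record (a stand-in for the real stores).
 *
 * @param array{claimed: bool, processed: bool} $item
 * @return array{claimed: bool, processed: bool}
 */
function apply_disposition( array $item, Disposition $disposition ): array {
    if ( $disposition->marksProcessed() ) {
        $item['processed'] = true;
    }
    if ( $disposition->releasesClaim() ) {
        $item['claimed'] = false;
    }
    return $item;
}

$item     = array( 'claimed' => true, 'processed' => false );
$deferred = apply_disposition( $item, Disposition::DeferItem );
$rejected = apply_disposition( $item, Disposition::RejectSource );

var_dump( $deferred['processed'] ); // bool(false)
var_dump( $rejected['processed'] ); // bool(true)
```

Because the enum is string-backed, a tool handler could resolve the disposition from the tool name with Disposition::from( 'defer_item' ) and let the engine apply the effects.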
Acceptance criteria
- Add tool(s) or explicit disposition contract so reject_source marks processed and defer_item does not.
- Preserve backwards compatibility for existing skip_item if needed, but steer new handler tool definitions toward explicit dispositions.
- Update tool descriptions so positive affordances are clear:
  - reject_source: reasoned content/source rejection.
  - defer_item: cannot safely complete now; keep item eligible.
- Add tests proving:
  - reject_source marks processed.
  - defer_item releases claim and leaves item unprocessed.
  - generic/tool-error deferral does not permanently exclude an item.
  - existing successful pipeline completion still marks processed normally.
- Consider whether failed required update/upsert tools should default to engine-level fail/retry instead of inviting the model to call a source-rejection tool.
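For the backwards-compatibility criterion, one possible shim routes legacy skip_item calls to an explicit disposition based on the reason string. This is a sketch under stated assumptions: route_legacy_skip() and the EXECUTION_REASONS list are hypothetical, and only the 'tool-error' reason comes from the incident described above; the other entries are illustrative guesses.

```php
<?php
declare(strict_types=1);

// Hypothetical list of legacy reasons that describe execution trouble, not source quality.
// Only 'tool-error' was observed in the incident; the rest are illustrative.
const EXECUTION_REASONS = array( 'tool-error', 'provider-error', 'timeout' );

/**
 * Route a legacy skip_item call to one of the explicit dispositions.
 * Unknown reasons fall back to reject_source, which matches what skip_item does today.
 */
function route_legacy_skip( string $reason ): string {
    $normalized = strtolower( trim( $reason ) );

    return in_array( $normalized, EXECUTION_REASONS, true )
        ? 'defer_item'
        : 'reject_source';
}

var_dump( route_legacy_skip( 'tool-error' ) ); // string(10) "defer_item"
var_dump( route_legacy_skip( 'duplicate' ) );  // string(13) "reject_source"
```

Matching on reason strings is fragile, so a shim like this would only be a transitional measure while handlers move to the explicit tools.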
Why this matters
This improves quality gating and self-policing without negative constraints like “do not call skip_item for tool errors.” The AI can reason about source quality, while the engine enforces the consequences of each explicit disposition.