Report and reconcile bundle-preserved queue drift #1781

@chubes4

Problem

Agent bundle install/diff currently handles portable pipeline/flow ownership, but it hides an important runtime drift class for self-operating flows: preserved queue/runtime state can differ materially from the bundle seed queue.

This surfaced while trying to make the WordPress.com Intelligence brain self-operating on intelligence-chubes4.

The WordPress.com brain bundle installed successfully:

studio wp datamachine agent install /Users/chubes/Developer/intelligence/setup/agent-bundles/wordpress-com-wiki --yes --format=json
studio wp datamachine agent diff /Users/chubes/Developer/intelligence/setup/agent-bundles/wordpress-com-wiki --format=json

After install, diff reported all 7 artifacts as no_op, so portable ownership worked.

But flow 2 (wordpress-com-history-mgs-loop-queue) retained old runtime queue state from pre-bundle manual experiments:

  • queue_mode: drain
  • stale config_patch_queue entries with max_items: 50
  • broad query patches such as "WordPress.com performance incident" and "WordPress.com support help docs"

The bundle seed expects a safer autonomous loop:

  • queue_mode: loop
  • small seed patches capped at max_items: 5
  • reviewed brain query-set metadata

Because queue_policy is currently create_seed_upgrade_preserve_existing, the install preserved the old queue, and the subsequent run consumed the stale broad entries. The parent job failed because all 10 child jobs failed; child failures included:

  • ai_processing_failed
  • empty_data_packet_returned
  • Cannot generate from an empty prompt. Add content using withText() or similar methods.

Why this matters

The preserve-existing default is safe for generic upgrades, but autonomous brain bundles need a reviewable adoption path when live runtime queues differ from bundle seeds.

A second agent diff that reports no_op is misleading when the installed flow is still running on stale preserved queue state. Operators need to see that difference and choose what to do.

Desired behavior

Data Machine should explicitly detect and report preserved runtime queue/scheduling drift for bundle-owned flows.

At minimum, datamachine agent diff should show a warning or needs_approval item when:

  • existing flow runtime queue fields differ from the bundle seed queue;
  • existing queue_mode differs from the bundle queue mode;
  • existing scheduling differs from bundle scheduling in a way that prevents autonomous operation;
  • preserved queue entries are broad/high-volume relative to the bundle seed.
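
The four checks above could be sketched roughly as follows. This is an illustration in Python, not Data Machine code: `QueueState`, `detect_queue_drift`, and `diff_status` are hypothetical names, schedule drift is simplified to plain inequality, and "broad/high-volume" is approximated by comparing the largest max_items value on each side.

```python
from dataclasses import dataclass, field


@dataclass
class QueueState:
    """Hypothetical snapshot of a flow's queue/runtime fields."""
    queue_mode: str
    entries: list = field(default_factory=list)  # dicts, each may carry "max_items"
    schedule: str = "manual"


def detect_queue_drift(live: QueueState, seed: QueueState) -> list:
    """Return human-readable drift findings; an empty list means no drift."""
    findings = []
    if live.queue_mode != seed.queue_mode:
        findings.append(f"queue_mode: live={live.queue_mode} bundle={seed.queue_mode}")
    if live.schedule != seed.schedule:
        findings.append(f"schedule: live={live.schedule} bundle={seed.schedule}")
    if live.entries != seed.entries:
        findings.append(
            f"queue_entries: live depth={len(live.entries)} bundle depth={len(seed.entries)}"
        )
    # Assumption: an entry is "broad" when its max_items exceeds the seed's largest cap.
    seed_cap = max((e.get("max_items", 0) for e in seed.entries), default=0)
    live_cap = max((e.get("max_items", 0) for e in live.entries), default=0)
    if live_cap > seed_cap:
        findings.append(f"max_items: live={live_cap} exceeds bundle cap={seed_cap}")
    return findings


def diff_status(live: QueueState, seed: QueueState) -> str:
    """Map findings onto the statuses discussed in this issue."""
    return "needs_approval" if detect_queue_drift(live, seed) else "no_op"
```

With the flow 2 state described earlier (drain mode, max_items: 50) against the bundle seed (loop mode, max_items: 5), this sketch would report needs_approval rather than no_op.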

Then install/sync should expose an explicit policy for queue reconciliation, such as:

  • preserve_existing — current behavior;
  • replace_with_bundle_seed — replace queue/runtime fields with bundle artifact seed;
  • archive_existing_then_replace — move old queue entries into audit metadata/log/PendingAction, then apply bundle seed;
  • plus a dry-run preview for each of the modes above.

Names are flexible; the important contract is that replacing runtime queue state must be explicit and previewable.
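
A minimal sketch of that contract, assuming flows are plain dicts with queue_mode and entries keys. The function name `reconcile_queue` and the shape of the returned plan are hypothetical; only the three policy names come from the list above.

```python
import copy

POLICIES = (
    "preserve_existing",
    "replace_with_bundle_seed",
    "archive_existing_then_replace",
)


def reconcile_queue(live, seed, policy="preserve_existing", dry_run=False):
    """Return (resulting_queue_state, plan) without mutating the inputs.

    With dry_run=True the returned state is always a copy of the live state,
    while the plan still describes what the policy would have changed.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown queue policy: {policy}")

    plan = {"policy": policy, "dry_run": dry_run, "archived": []}

    if policy == "preserve_existing":
        result = copy.deepcopy(live)
    else:
        if policy == "archive_existing_then_replace":
            # Keep the old entries so they can go to an audit log or similar.
            plan["archived"] = copy.deepcopy(live["entries"])
        result = copy.deepcopy(seed)

    plan["changes"] = result != live
    if dry_run:
        return copy.deepcopy(live), plan
    return result, plan
```

The default argument keeps the current preserve behavior, so a caller has to name a replacing policy explicitly, and the dry-run path returns the same plan that a real run would act on.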

Acceptance criteria

  • datamachine agent diff <bundle> --format=json reports queue/runtime drift instead of treating a flow as plain no_op when preserved queue state differs from the bundle seed.
  • Diff output includes enough detail for operators to see current queue depth, current queue mode, bundle queue depth, bundle queue mode, and scheduling differences.
  • datamachine agent install <bundle> --dry-run --format=json previews the same queue reconciliation decision.
  • There is an explicit non-interactive install option or policy to replace preserved queue state with the bundle seed.
  • There is a safe option to archive existing queue state before replacement, or an equivalent audit trail.
  • Existing default behavior remains safe: installs do not silently wipe live queues.
  • Focused smoke coverage proves:
    • existing flow with matching portable slug but stale queue is reported as drift;
    • default install preserves queue;
    • explicit replace/archive mode applies the bundle seed queue;
    • second diff after explicit reconcile reports no queue drift.
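
To make the "enough detail" criterion concrete, here is one possible shape for a single diff entry, sketched in Python. Every key name and the queue_drift status are assumptions for illustration, not the current datamachine agent diff output.

```python
import json


def queue_drift_item(flow_slug, live, seed):
    """Build one hypothetical diff entry for a bundle-owned flow."""
    details = {
        "current_queue_depth": len(live["entries"]),
        "bundle_queue_depth": len(seed["entries"]),
        "current_queue_mode": live["queue_mode"],
        "bundle_queue_mode": seed["queue_mode"],
        "current_schedule": live.get("schedule"),
        "bundle_schedule": seed.get("schedule"),
    }
    drifted = (
        live["entries"] != seed["entries"]
        or live["queue_mode"] != seed["queue_mode"]
        or live.get("schedule") != seed.get("schedule")
    )
    return {
        "artifact": flow_slug,
        "status": "queue_drift" if drifted else "no_op",
        "queue": details,
    }


if __name__ == "__main__":
    live = {"queue_mode": "drain", "entries": [{"max_items": 50}], "schedule": "manual"}
    seed = {"queue_mode": "loop", "entries": [{"max_items": 5}], "schedule": "manual"}
    item = queue_drift_item("wordpress-com-history-mgs-loop-queue", live, seed)
    print(json.dumps(item, indent=2))
```

The same entry could back both the diff output and the install --dry-run preview, so the two commands cannot disagree about what counts as drift.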

Related robustness issue

The same run also showed AI child jobs failing hard when fetched items produced empty data packets/prompts:

Cannot generate from an empty prompt. Add content using withText() or similar methods.
empty_data_packet_returned

This may belong in a separate issue. Either way, autonomous ingestion should fail soft on empty or thin fetched data where possible (completed_no_items or an agent skip) rather than failing the entire parent batch.
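
One way the fail-soft behavior could look, sketched in Python under stated assumptions: `process_child` and `summarize_parent` are hypothetical helpers, a data packet is assumed to be a dict with a content key, and `generate` stands in for whatever AI call the step makes. Only the completed_no_items status name is taken from this issue.

```python
def process_child(packet, generate):
    """Run one child job, skipping empty packets instead of failing on them."""
    text = (packet or {}).get("content") or ""
    if not text.strip():
        # Nothing to prompt with: skip before the AI call can raise
        # "Cannot generate from an empty prompt".
        return "completed_no_items"
    try:
        generate(text)
    except Exception:
        return "failed"
    return "completed"


def summarize_parent(child_statuses):
    """Fail the parent only when every child that had data actually failed."""
    worked = [s for s in child_statuses if s != "completed_no_items"]
    if not worked:
        return "completed_no_items"
    if all(s == "failed" for s in worked):
        return "failed"
    return "completed"
```

Under this sketch, a batch in which all 10 children fetched empty data would end as completed_no_items instead of a failed parent job.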

AI assistance

  • AI assistance: Yes
  • Tool(s): OpenCode (GPT-5.5)
  • Used for: Diagnosed the WordPress.com brain bundle adoption failure on intelligence-chubes4 and drafted this issue. Chris remains responsible for prioritization and merge decisions.
