
Avoid duplicating loop-mode queue entries during job retry and stuck recovery #1810

@chubes4

Problem

Recovering or manually retrying a job that consumed a queued config patch from a queue_mode: loop flow can duplicate the queue entry.

Observed on intelligence-chubes4 Flow 2 (WordPress.com wiki: MGS history queue):

  1. Flow 2 fetch step uses config_patch_queue with queue_mode: loop.
  2. Job 638 consumed a config patch and became stuck in processing.
  3. Running:
    studio wp datamachine jobs recover-stuck --flow=2
    reported Requeued: 1.
  4. Flow 2 then had a duplicate performance-mgs config patch.

Root cause

Loop-mode queue consumption already rotates the consumed entry back to the tail:

inc/Abilities/Flow/QueueAbility.php

// Drain or loop: pop the head, optionally rotate.
$entry = array_shift( $queue );

if ( 'loop' === $queue_mode ) {
    $queue[] = $entry;
}

But stuck-job recovery blindly appends the backup again:

inc/Abilities/Job/RecoverStuckJobsAbility.php

$flow_config[ $step_id ][ $slot ][] = $entry;

Manual retry has the same blind append pattern:

inc/Abilities/Job/RetryJobAbility.php

$flow_config[ $step_id ][ $slot ][] = $entry;

For queue_mode: loop, this turns one logical queued patch into two physical entries.
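The duplication can be reproduced in isolation with a minimal sketch. `consume_head()` is a hypothetical stand-in for the consume logic quoted above, not the actual `QueueAbility` method, and `other-patch` is an invented second entry used only to make the rotation visible:

```php
<?php
// Hypothetical stand-in for the loop/drain consume step quoted from QueueAbility.php.
function consume_head( array &$queue, string $queue_mode ) {
    if ( empty( $queue ) ) {
        return null;
    }

    // Pop the head; in loop mode, rotate it back to the tail.
    $entry = array_shift( $queue );

    if ( 'loop' === $queue_mode ) {
        $queue[] = $entry;
    }

    return $entry;
}

$queue = array( 'performance-mgs', 'other-patch' );

// The job consumes the head; the consumed entry is also kept as the backup.
$backup = consume_head( $queue, 'loop' );
$after_consume = $queue; // 'performance-mgs' has already been rotated to the tail.

// Blind append, as done today by recover-stuck and manual retry.
$queue[] = $backup;
$after_recovery = $queue; // 'performance-mgs' now appears twice.
```

After the consume step the queue still holds each patch exactly once; the second copy only appears when the backup is appended.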

Scope

Fix Data Machine core queue restoration so retry/recovery does not duplicate loop-mode queue entries.

Likely affected files:

  • inc/Abilities/Job/RecoverStuckJobsAbility.php
  • inc/Abilities/Job/RetryJobAbility.php
  • tests covering queued prompt/config patch backup restoration

Desired behavior

  • drain mode: restore consumed prompt/config patch on retry/recovery because the item was removed from the queue.
  • loop mode: do not append a duplicate because consume already rotated the item back into the queue.
  • static mode: no restoration needed; static mode does not mutate queue state.
  • Both prompt queues and config patch queues should behave consistently.
  • Manual retry and recover-stuck should share the same semantics.
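One possible shape for the shared semantics is sketched below. The function name `restore_consumed_entry()` and its signature are hypothetical and do not exist in Data Machine today. The sketch assumes the backup records the queue mode that was active at consume time, and it treats any mode other than `drain` as "do not restore", which is an assumption for unknown or missing modes:

```php
<?php
/**
 * Hypothetical shared helper for retry and stuck recovery.
 *
 * @param array  $queue      Current queue for the slot (prompt or config patch).
 * @param mixed  $entry      The entry the job consumed, taken from its backup.
 * @param string $queue_mode Mode recorded at consume time: drain, loop, or static.
 * @return array Queue after mode-aware restoration.
 */
function restore_consumed_entry( array $queue, $entry, string $queue_mode ): array {
    // loop: consume already rotated the entry to the tail.
    // static: consume never mutated the queue.
    // Anything that is not drain is left untouched (assumption).
    if ( 'drain' !== $queue_mode ) {
        return $queue;
    }

    // drain: the entry was removed on consume, so put it back.
    // Appending mirrors the existing restore code; position is not specified here.
    $queue[] = $entry;

    return $queue;
}

$loop_result   = restore_consumed_entry( array( 'other-patch', 'performance-mgs' ), 'performance-mgs', 'loop' );
$drain_result  = restore_consumed_entry( array( 'other-patch' ), 'performance-mgs', 'drain' );
$static_result = restore_consumed_entry( array( 'performance-mgs' ), 'performance-mgs', 'static' );
```

Both call sites could then replace the blind append with something like `$flow_config[ $step_id ][ $slot ] = restore_consumed_entry( $flow_config[ $step_id ][ $slot ], $entry, $mode );`, where `$mode` is whatever the backup stores.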

Acceptance criteria

  • Tests prove recover-stuck does not duplicate config_patch_queue entries for queue_mode: loop.
  • Tests prove manual retry does not duplicate config_patch_queue entries for queue_mode: loop.
  • Tests prove drain mode still restores consumed entries.
  • If code is duplicated between retry and recovery, extract a small shared helper or otherwise keep the semantics identical.
  • No source-specific workaround for Intelligence/MGS/Flow 2.

Runtime evidence

Flow 2 job 638 had a queued_prompt_backup for config_patch_queue and mode: loop. Recovery appended that backup even though the original loop consume had already rotated the entry.
