fix(github): dedupe re-run workflows in getCheckSuiteStatus by zbigniewsobiecki · Pull Request #1254 · mongrel-intelligence/cascade

zbigniewsobiecki · 2026-05-02T20:01:58Z

Live regression — wrong fork dispatched respond-to-ci on a green PR

ucho/PR #231 (CI fully green at the latest attempt) dispatched `respond-to-ci` at 19:49:41 with `triggerType=check-failure`, even though the user could see the PR was green. Wasteful agent run; `respond-to-ci` is the agent that fixes failing CI, so running it on a green PR confuses the loop. (I cancelled the stuck run and the worker hit its watchdog at 5m16s — message: "Agent Timeout".)

Root cause

`gh pr view 231` showed each CI workflow had run TWICE on the same head_sha. The first attempt of `Rebuild ucho-cli template` FAILED at 19:44:40; the second SUCCEEDED at 19:45:28. PR's overall UI status used the latest attempt → green. But cascade's API call returned BOTH attempts.

`src/github/client.ts:306-355` `getCheckSuiteStatus`:

Calls `listWorkflowRunsForRepo({head_sha})` — returns ALL workflow_runs including failed-then-rerun ones.
For each run, calls `listJobsForWorkflowRun(run.id)`.
Concatenates all jobs into `checkRuns`.

Result: 2 workflow_runs × N jobs each. The first run's failed `Rebuild ucho-cli template` job appears alongside the second run's successful one.

`check-suite-success.handle` (the fork from PR #1241/#1243) computes:
```ts
const anyFailed = checkRuns.some(cr => cr.conclusion === 'failure' || ...);
if (allComplete && anyFailed) return dispatchRespondToCi(...); // ← WRONG on green PR
```

`anyFailed=true` because of the stale failure. Mistakenly forks. The `triggerType: 'check-failure'` in the run record confirms it.

Why `filter=latest` on `listJobsForWorkflowRun` didn't help

GitHub's `listJobsForWorkflowRun` accepts `filter='latest' | 'all'` (default: `latest`). That filter dedupes job ATTEMPTS within a SINGLE workflow_run (the "Re-run failed jobs" case). It does NOT dedupe across multiple workflow_runs of the same workflow.

In PR #231's case, the two attempts came from TWO DIFFERENT workflow_runs (each a complete run). Each run's `filter=latest` is "latest within ITS run" — which is the only run for that run. Both runs' jobs are returned.

The fix

Dedupe `workflowRuns` by `workflow_id` BEFORE fetching jobs. GitHub returns runs sorted by `created_at` desc, so the first occurrence per workflow_id is the latest.

```ts
const latestRunByWorkflow = new Map<number, (typeof workflowRuns)[number]>();
for (const run of workflowRuns) {
if (!latestRunByWorkflow.has(run.workflow_id)) {
latestRunByWorkflow.set(run.workflow_id, run);
}
}
const dedupedRuns = [...latestRunByWorkflow.values()];
```

3-line addition. The downstream `anyFailed` / `allPassing` computations and the `check-suite-success.handle` fork all become correct without changing.

Why at the github-client layer

Source of truth match: GitHub's PR UI uses the latest attempt's status; cascade should match that.
Centralization: every caller of `getCheckSuiteStatus` (`check-suite-success`, `check-suite-failure`, anywhere else asking "current state of CI?") benefits from the dedup without changing.

Edge cases handled

Same `workflow_id` across different events (push vs pull_request) — keeps the most recent regardless of event. Matches what the PR UI shows.
Different `workflow_id`s on the same SHA (CI + CodeQL + E2B) — different keys, all kept. New `keeps separate workflow_ids distinct` test pins this so over-aggressive dedup can't slip in.
No workflow runs: empty Map → empty list → downstream `checkRuns.length === 0` handling unchanged.

Tests

NEW `dedupes workflow runs by workflow_id, keeping only the latest re-run` — direct PR fix: install Claude Code CLI in worker image #231 regression pin. Two workflow_runs same workflow_id, first failed second succeeded → `allPassing=true`, only 1 check_run, `listJobsForWorkflowRun` NOT called for the older run (proves the dedup actually skipped the stale jobs).
NEW `keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)` — over-aggressive dedup guard.
Updated `mockWorkflowRuns` helper to default `workflow_id` to the run id (existing tests don't care; this lets them keep semantically matching).
Updated 2 pagination tests to include `workflow_id` so dedup doesn't collapse undefined-keyed runs.

Out of scope (intentionally — flag for follow-up)

Audit `getFailedWorkflowRunJobs` (used by respond-to-ci agent for failure context). Different semantics — it WANTS to surface what failed even if subsequently succeeded. Different audit; not this PR.
Eventually-consistent GitHub API: workflow-runs-list endpoint may take a few seconds to reflect a new attempt. Orthogonal.

Test plan

`npx vitest run --project unit-core tests/unit/github/client.test.ts` → 53/53
`npx vitest run --project unit-triggers --project unit-api --project unit-core` → 7687/7687
`npm run typecheck` clean
`npm run lint` clean
CI green
Reviewer approval (@nhopeatall)
After deploy: open a fresh cascade-impl PR; manually re-run a CI job from the GitHub UI to create the multi-run-same-SHA scenario. On the LAST `check_suite.completed` event, Loki should show `Check-suite success trigger matched — dispatching review` (NOT the `dispatchRespondToCi` log line). New run record should be `agentType=review`, NOT `respond-to-ci`.

🤖 Generated with Claude Code

ucho/PR #231 had its CI fully green at the latest attempt but cascade dispatched respond-to-ci anyway, with triggerType=check-failure. Wasteful agent run; respond-to-ci is the agent that fixes failing CI, so running it on a green PR confuses the loop semantics. Root cause: each CI workflow on the PR's head_sha ran TWICE on the same SHA (push-then-rerun, or two pushes that resolved to the same SHA). The first attempt of `Rebuild ucho-cli template` FAILED at 19:44:40; the second attempt SUCCEEDED at 19:45:28. GitHub's `listWorkflowRunsForRepo` returns BOTH workflow_runs. cascade's `getCheckSuiteStatus` iterated both and concatenated their jobs — including the stale FAILURE record. `check-suite-success.handle` (the fork from PR #1241/#1243) computes `anyFailed = checkRuns.some(cr => cr.conclusion === 'failure' || ...)`. With the stale failure in the list, `anyFailed=true` even though the PR is green at the latest attempt. The handler mistakenly forks to `dispatchRespondToCi(...)`. The triggerType=check-failure in the run record confirms it came through this fork. GitHub's `listJobsForWorkflowRun` accepts `filter='latest'` (default) which dedupes job ATTEMPTS within a single workflow_run (the "Re-run failed jobs" case). It does NOT dedupe across multiple workflow_runs of the same workflow on the same SHA — which is what bit us. Fix: dedupe `workflowRuns` by `workflow_id` BEFORE fetching jobs. GitHub returns runs sorted by `created_at` desc, so the first occurrence per workflow_id is the latest. Three-line addition in `getCheckSuiteStatus` at the GitHub-client layer — every caller (`check-suite-success`, `check-suite-failure`, anywhere else that asks "current state of CI?") benefits without changing. Why at the client layer: - Source of truth match: GitHub's PR UI uses the latest attempt's status; cascade should match that. - Centralization: the same dedup bug would otherwise surface separately per caller. Closing it once at the boundary eliminates the class. Edge cases handled: - Same `workflow_id` across different events (push vs pull_request) — keeps the most recent regardless of event. - Different `workflow_id`s (e.g. CI + CodeQL) — both kept; new test pins this so over-aggressive dedup can't ship. - No workflow runs — Map yields empty list, downstream code already handles `checkRuns.length === 0`. Tests: - New `dedupes workflow runs by workflow_id, keeping only the latest re-run` test directly pins the PR #231 incident: 2 workflow_runs same workflow_id, first failed second succeeded — assert allPassing=true and only 1 check_run returned. Also asserts `listJobsForWorkflowRun` is NOT called for the older run (saves API quota and proves the dedup actually skipped the stale jobs). - New `keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)` guard against over-aggressive dedup. - Updated `mockWorkflowRuns` helper to default `workflow_id` to the run id so existing tests (which don't care about workflow_id) keep semantically matching. - Updated two pagination tests to include `workflow_id` so the dedup doesn't collapse multiple-runs-with-undefined-workflow_id to one. Out of scope (follow-up): - Audit `getFailedWorkflowRunJobs` (used by respond-to-ci agent) — different semantics (it WANTS to show what failed even if subsequently succeeded). - Eventually-consistent GitHub API — the workflow-runs-list endpoint may take a few seconds to reflect a new attempt. Orthogonal to dedup. Verification: vitest 7687/7687, typecheck clean, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nhopeatall

LGTM — Correct, minimal fix for a real production regression. The dedup-by-workflow_id approach correctly matches GitHub PR UI semantics (latest attempt wins), is placed at the right layer (github client, so all callers benefit), and the tests pin both the regression case and the over-aggressive-dedup guard.

🕵️ claude-code · claude-opus-4-6 · run details

codecov · 2026-05-02T20:06:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

zbigniewsobiecki requested a review from nhopeatall May 2, 2026 20:01

nhopeatall approved these changes May 2, 2026

View reviewed changes

zbigniewsobiecki merged commit a68c3a7 into dev May 2, 2026
9 checks passed

zbigniewsobiecki deleted the fix/check-suite-dedupe-rerun-workflows branch May 2, 2026 20:06

zbigniewsobiecki mentioned this pull request May 2, 2026

Promote dev to main: getCheckSuiteStatus dedupes re-run workflows #1255

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(github): dedupe re-run workflows in getCheckSuiteStatus#1254

fix(github): dedupe re-run workflows in getCheckSuiteStatus#1254
zbigniewsobiecki merged 1 commit intodevfrom
fix/check-suite-dedupe-rerun-workflows

zbigniewsobiecki commented May 2, 2026

Uh oh!

nhopeatall left a comment

Uh oh!

codecov Bot commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zbigniewsobiecki commented May 2, 2026

Live regression — wrong fork dispatched respond-to-ci on a green PR

Root cause

Why `filter=latest` on `listJobsForWorkflowRun` didn't help

The fix

Why at the github-client layer

Edge cases handled

Tests

Out of scope (intentionally — flag for follow-up)

Test plan

Uh oh!

nhopeatall left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 2, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants