From 0545d5d7c5d0f6a2a8617ed10f6051eea5065051 Mon Sep 17 00:00:00 2001
From: Zbigniew Sobiecki <zbigniew@sobiecki.name>
Date: Sat, 2 May 2026 20:00:35 +0000
Subject: [PATCH] fix(github): dedupe re-run workflows in getCheckSuiteStatus
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ucho/PR #231 had its CI fully green at the latest attempt but cascade
dispatched respond-to-ci anyway, with triggerType=check-failure. Wasteful
agent run; respond-to-ci is the agent that fixes failing CI, so running it
on a green PR confuses the loop semantics.

Root cause: each CI workflow on the PR's head_sha ran TWICE on the same
SHA (push-then-rerun, or two pushes that resolved to the same SHA). The
first attempt of `Rebuild ucho-cli template` FAILED at 19:44:40; the
second attempt SUCCEEDED at 19:45:28. GitHub's `listWorkflowRunsForRepo`
returns BOTH workflow_runs. cascade's `getCheckSuiteStatus` iterated
both and concatenated their jobs — including the stale FAILURE record.

`check-suite-success.handle` (the fork from PR #1241/#1243) computes
`anyFailed = checkRuns.some(cr => cr.conclusion === 'failure' || ...)`.
With the stale failure in the list, `anyFailed=true` even though the PR
is green at the latest attempt. The handler mistakenly forks to
`dispatchRespondToCi(...)`. The triggerType=check-failure in the run
record confirms it came through this fork.

GitHub's `listJobsForWorkflowRun` accepts `filter='latest'` (default)
which dedupes job ATTEMPTS within a single workflow_run (the
"Re-run failed jobs" case). It does NOT dedupe across multiple
workflow_runs of the same workflow on the same SHA — which is what bit us.

Fix: dedupe `workflowRuns` by `workflow_id` BEFORE fetching jobs. GitHub
returns runs sorted by `created_at` desc, so the first occurrence per
workflow_id is the latest. Three-line addition in `getCheckSuiteStatus`
at the GitHub-client layer — every caller (`check-suite-success`,
`check-suite-failure`, anywhere else that asks "current state of CI?")
benefits without changing.

Why at the client layer:
- Source of truth match: GitHub's PR UI uses the latest attempt's status;
  cascade should match that.
- Centralization: the same dedup bug would otherwise surface separately
  per caller. Closing it once at the boundary eliminates the class.

Edge cases handled:
- Same `workflow_id` across different events (push vs pull_request) — keeps
  the most recent regardless of event.
- Different `workflow_id`s (e.g. CI + CodeQL) — both kept; new test pins
  this so over-aggressive dedup can't ship.
- No workflow runs — Map yields empty list, downstream code already
  handles `checkRuns.length === 0`.

Tests:
- New `dedupes workflow runs by workflow_id, keeping only the latest re-run`
  test directly pins the PR #231 incident: 2 workflow_runs same workflow_id,
  first failed second succeeded — assert allPassing=true and only 1
  check_run returned. Also asserts `listJobsForWorkflowRun` is NOT called
  for the older run (saves API quota and proves the dedup actually skipped
  the stale jobs).
- New `keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)`
  guard against over-aggressive dedup.
- Updated `mockWorkflowRuns` helper to default `workflow_id` to the run id
  so existing tests (which don't care about workflow_id) keep semantically
  matching.
- Updated two pagination tests to include `workflow_id` so the dedup
  doesn't collapse multiple-runs-with-undefined-workflow_id to one.

Out of scope (follow-up):
- Audit `getFailedWorkflowRunJobs` (used by respond-to-ci agent) — different
  semantics (it WANTS to show what failed even if subsequently succeeded).
- Eventually-consistent GitHub API — the workflow-runs-list endpoint may
  take a few seconds to reflect a new attempt. Orthogonal to dedup.

Verification: vitest 7687/7687, typecheck clean, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 src/github/client.ts             | 21 +++++++-
 tests/unit/github/client.test.ts | 89 ++++++++++++++++++++++++++++++--
 2 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/src/github/client.ts b/src/github/client.ts
index ea98769e..82ad0554 100644
--- a/src/github/client.ts
+++ b/src/github/client.ts
@@ -316,9 +316,28 @@ export const githubClient = {
 			per_page: 100,
 		});
 
+		// Dedupe by workflow_id — keep only the most recent run per workflow.
+		// `listWorkflowRunsForRepo` returns runs sorted by `created_at` desc,
+		// so the first occurrence we see for a given workflow_id is the
+		// latest. Closes the bug where a failed-then-rerun workflow's stale
+		// FAILURE job (e.g. `Rebuild ucho-cli template` on ucho/PR #231)
+		// leaked into the aggregate-status query, made `anyFailed=true`, and
+		// caused the success handler to mistakenly fork to respond-to-ci on
+		// a PR whose CI was actually green at the LATEST attempt. The default
+		// `filter=latest` on `listJobsForWorkflowRun` only dedupes job
+		// attempts WITHIN a single workflow_run; it does not dedupe across
+		// multiple workflow_runs of the same workflow on the same SHA.
+		const latestRunByWorkflow = new Map<number, (typeof workflowRuns)[number]>();
+		for (const run of workflowRuns) {
+			if (!latestRunByWorkflow.has(run.workflow_id)) {
+				latestRunByWorkflow.set(run.workflow_id, run);
+			}
+		}
+		const dedupedRuns = [...latestRunByWorkflow.values()];
+
 		// Fetch jobs for each workflow run to get per-job granularity
 		const jobResults = await Promise.all(
-			workflowRuns.map((run) =>
+			dedupedRuns.map((run) =>
 				client.paginate(client.actions.listJobsForWorkflowRun, {
 					owner,
 					repo,
diff --git a/tests/unit/github/client.test.ts b/tests/unit/github/client.test.ts
index bf859cba..c64e6749 100644
--- a/tests/unit/github/client.test.ts
+++ b/tests/unit/github/client.test.ts
@@ -488,11 +488,18 @@ describe('githubClient', () => {
 
 	describe('getCheckSuiteStatus', () => {
 		function mockWorkflowRuns(
-			runs: { id: number }[],
+			runs: { id: number; workflow_id?: number }[],
 			jobsMap: Record<number, { name: string; status: string; conclusion: string | null }[]>,
 		) {
+			// Default workflow_id to the run id so each run looks like its own
+			// distinct workflow — matches existing test semantics. Tests that
+			// exercise the rerun-dedup behavior pass workflow_id explicitly.
+			const runsWithWorkflowId = runs.map((r) => ({
+				workflow_id: r.workflow_id ?? r.id,
+				...r,
+			}));
 			mockActions.listWorkflowRunsForRepo.mockResolvedValue({
-				data: { workflow_runs: runs },
+				data: { workflow_runs: runsWithWorkflowId },
 			});
 			mockActions.listJobsForWorkflowRun.mockImplementation(({ run_id }: { run_id: number }) => {
 				return Promise.resolve({
@@ -585,6 +592,72 @@ describe('githubClient', () => {
 			expect(result.totalCount).toBe(2);
 			expect(result.checkRuns).toHaveLength(2);
 		});
+
+		// PR #231 (ucho, 2026-05-02) regression pin. Same `head_sha` had TWO
+		// workflow_runs of the same workflow (push + manual re-run, or two
+		// pushes resolving to the same SHA). The first run's job FAILED;
+		// the re-run SUCCEEDED. GitHub's `listWorkflowRunsForRepo` returns
+		// BOTH runs (sorted desc by created_at). Pre-fix, the aggregate
+		// included the stale failure → `anyFailed=true` → check-suite-success
+		// mistakenly forked to dispatch respond-to-ci on a green PR.
+		it('dedupes workflow runs by workflow_id, keeping only the latest re-run', async () => {
+			// Two runs of the same workflow on the same SHA. Returned by GitHub
+			// in descending created_at order: id=2 (latest, success) first,
+			// id=1 (oldest, failed) second.
+			mockWorkflowRuns(
+				[
+					{ id: 2, workflow_id: 100 }, // latest re-run — success
+					{ id: 1, workflow_id: 100 }, // first attempt — failed
+				],
+				{
+					2: [{ name: 'Rebuild ucho-cli template', status: 'completed', conclusion: 'success' }],
+					1: [{ name: 'Rebuild ucho-cli template', status: 'completed', conclusion: 'failure' }],
+				},
+			);
+
+			const result = await withGitHubToken('test-token', () =>
+				githubClient.getCheckSuiteStatus('owner', 'repo', 'sha123'),
+			);
+
+			// Only the latest run's jobs should be in the result — the stale
+			// failure must NOT be counted.
+			expect(result.checkRuns).toHaveLength(1);
+			expect(result.checkRuns[0].conclusion).toBe('success');
+			expect(result.allPassing).toBe(true);
+			// And we should NOT have called listJobsForWorkflowRun for the
+			// older run (id=1) — that's what makes the dedup actually save
+			// API quota and avoid the bug.
+			expect(mockActions.listJobsForWorkflowRun).toHaveBeenCalledWith(
+				expect.objectContaining({ run_id: 2 }),
+			);
+			expect(mockActions.listJobsForWorkflowRun).not.toHaveBeenCalledWith(
+				expect.objectContaining({ run_id: 1 }),
+			);
+		});
+
+		it('keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)', async () => {
+			// Two DIFFERENT workflows on the same SHA must both be kept —
+			// dedup is per-workflow_id, not per-SHA. Regression guard against
+			// over-aggressive dedup that would drop CodeQL because CI is also
+			// running.
+			mockWorkflowRuns(
+				[
+					{ id: 10, workflow_id: 100 },
+					{ id: 11, workflow_id: 200 },
+				],
+				{
+					10: [{ name: 'lint', status: 'completed', conclusion: 'success' }],
+					11: [{ name: 'codeql', status: 'completed', conclusion: 'success' }],
+				},
+			);
+
+			const result = await withGitHubToken('test-token', () =>
+				githubClient.getCheckSuiteStatus('owner', 'repo', 'sha123'),
+			);
+
+			expect(result.checkRuns).toHaveLength(2);
+			expect(result.allPassing).toBe(true);
+		});
 	});
 
 	describe('getPRDiff', () => {
@@ -676,7 +749,13 @@ describe('githubClient', () => {
 		});
 
 		it('getCheckSuiteStatus uses paginate for both workflow runs and jobs', async () => {
-			mockPaginate.mockResolvedValueOnce([{ id: 1 }, { id: 2 }]); // 2 runs
+			// Distinct workflow_ids so getCheckSuiteStatus's per-workflow dedup
+			// keeps both runs (otherwise both would dedupe to a single
+			// `undefined` workflow_id key — see PR #231 dedup fix).
+			mockPaginate.mockResolvedValueOnce([
+				{ id: 1, workflow_id: 100 },
+				{ id: 2, workflow_id: 200 },
+			]);
 			mockPaginate.mockResolvedValueOnce([
 				{ name: 'job1', status: 'completed', conclusion: 'success' },
 			]);
@@ -705,8 +784,8 @@ describe('githubClient', () => {
 
 		it('getFailedWorkflowRunJobs uses paginate for both workflow runs and jobs', async () => {
 			mockPaginate.mockResolvedValueOnce([
-				{ id: 1, name: 'CI', conclusion: 'failure' },
-				{ id: 2, name: 'OK', conclusion: 'success' },
+				{ id: 1, workflow_id: 100, name: 'CI', conclusion: 'failure' },
+				{ id: 2, workflow_id: 200, name: 'OK', conclusion: 'success' },
 			]);
 			mockPaginate.mockResolvedValueOnce([
 				{ name: 'lint', conclusion: 'failure', steps: [{ name: 'eslint', conclusion: 'failure' }] },