From 0545d5d7c5d0f6a2a8617ed10f6051eea5065051 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Sat, 2 May 2026 20:00:35 +0000 Subject: [PATCH] fix(github): dedupe re-run workflows in getCheckSuiteStatus MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ucho/PR #231 had its CI fully green at the latest attempt but cascade dispatched respond-to-ci anyway, with triggerType=check-failure. Wasteful agent run; respond-to-ci is the agent that fixes failing CI, so running it on a green PR confuses the loop semantics. Root cause: each CI workflow on the PR's head_sha ran TWICE on the same SHA (push-then-rerun, or two pushes that resolved to the same SHA). The first attempt of `Rebuild ucho-cli template` FAILED at 19:44:40; the second attempt SUCCEEDED at 19:45:28. GitHub's `listWorkflowRunsForRepo` returns BOTH workflow_runs. cascade's `getCheckSuiteStatus` iterated both and concatenated their jobs — including the stale FAILURE record. `check-suite-success.handle` (the fork from PR #1241/#1243) computes `anyFailed = checkRuns.some(cr => cr.conclusion === 'failure' || ...)`. With the stale failure in the list, `anyFailed=true` even though the PR is green at the latest attempt. The handler mistakenly forks to `dispatchRespondToCi(...)`. The triggerType=check-failure in the run record confirms it came through this fork. GitHub's `listJobsForWorkflowRun` accepts `filter='latest'` (default) which dedupes job ATTEMPTS within a single workflow_run (the "Re-run failed jobs" case). It does NOT dedupe across multiple workflow_runs of the same workflow on the same SHA — which is what bit us. Fix: dedupe `workflowRuns` by `workflow_id` BEFORE fetching jobs. GitHub returns runs sorted by `created_at` desc, so the first occurrence per workflow_id is the latest. Three-line addition in `getCheckSuiteStatus` at the GitHub-client layer — every caller (`check-suite-success`, `check-suite-failure`, anywhere else that asks "current state of CI?") benefits without changing. Why at the client layer: - Source of truth match: GitHub's PR UI uses the latest attempt's status; cascade should match that. - Centralization: the same dedup bug would otherwise surface separately per caller. Closing it once at the boundary eliminates the class. Edge cases handled: - Same `workflow_id` across different events (push vs pull_request) — keeps the most recent regardless of event. - Different `workflow_id`s (e.g. CI + CodeQL) — both kept; new test pins this so over-aggressive dedup can't ship. - No workflow runs — Map yields empty list, downstream code already handles `checkRuns.length === 0`. Tests: - New `dedupes workflow runs by workflow_id, keeping only the latest re-run` test directly pins the PR #231 incident: 2 workflow_runs same workflow_id, first failed second succeeded — assert allPassing=true and only 1 check_run returned. Also asserts `listJobsForWorkflowRun` is NOT called for the older run (saves API quota and proves the dedup actually skipped the stale jobs). - New `keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)` guard against over-aggressive dedup. - Updated `mockWorkflowRuns` helper to default `workflow_id` to the run id so existing tests (which don't care about workflow_id) keep semantically matching. - Updated two pagination tests to include `workflow_id` so the dedup doesn't collapse multiple-runs-with-undefined-workflow_id to one. Out of scope (follow-up): - Audit `getFailedWorkflowRunJobs` (used by respond-to-ci agent) — different semantics (it WANTS to show what failed even if subsequently succeeded). - Eventually-consistent GitHub API — the workflow-runs-list endpoint may take a few seconds to reflect a new attempt. Orthogonal to dedup. Verification: vitest 7687/7687, typecheck clean, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/github/client.ts | 21 +++++++- tests/unit/github/client.test.ts | 89 ++++++++++++++++++++++++++++++-- 2 files changed, 104 insertions(+), 6 deletions(-) diff --git a/src/github/client.ts b/src/github/client.ts index ea98769e..82ad0554 100644 --- a/src/github/client.ts +++ b/src/github/client.ts @@ -316,9 +316,28 @@ export const githubClient = { per_page: 100, }); + // Dedupe by workflow_id — keep only the most recent run per workflow. + // `listWorkflowRunsForRepo` returns runs sorted by `created_at` desc, + // so the first occurrence we see for a given workflow_id is the + // latest. Closes the bug where a failed-then-rerun workflow's stale + // FAILURE job (e.g. `Rebuild ucho-cli template` on ucho/PR #231) + // leaked into the aggregate-status query, made `anyFailed=true`, and + // caused the success handler to mistakenly fork to respond-to-ci on + // a PR whose CI was actually green at the LATEST attempt. The default + // `filter=latest` on `listJobsForWorkflowRun` only dedupes job + // attempts WITHIN a single workflow_run; it does not dedupe across + // multiple workflow_runs of the same workflow on the same SHA. + const latestRunByWorkflow = new Map(); + for (const run of workflowRuns) { + if (!latestRunByWorkflow.has(run.workflow_id)) { + latestRunByWorkflow.set(run.workflow_id, run); + } + } + const dedupedRuns = [...latestRunByWorkflow.values()]; + // Fetch jobs for each workflow run to get per-job granularity const jobResults = await Promise.all( - workflowRuns.map((run) => + dedupedRuns.map((run) => client.paginate(client.actions.listJobsForWorkflowRun, { owner, repo, diff --git a/tests/unit/github/client.test.ts b/tests/unit/github/client.test.ts index bf859cba..c64e6749 100644 --- a/tests/unit/github/client.test.ts +++ b/tests/unit/github/client.test.ts @@ -488,11 +488,18 @@ describe('githubClient', () => { describe('getCheckSuiteStatus', () => { function mockWorkflowRuns( - runs: { id: number }[], + runs: { id: number; workflow_id?: number }[], jobsMap: Record, ) { + // Default workflow_id to the run id so each run looks like its own + // distinct workflow — matches existing test semantics. Tests that + // exercise the rerun-dedup behavior pass workflow_id explicitly. + const runsWithWorkflowId = runs.map((r) => ({ + workflow_id: r.workflow_id ?? r.id, + ...r, + })); mockActions.listWorkflowRunsForRepo.mockResolvedValue({ - data: { workflow_runs: runs }, + data: { workflow_runs: runsWithWorkflowId }, }); mockActions.listJobsForWorkflowRun.mockImplementation(({ run_id }: { run_id: number }) => { return Promise.resolve({ @@ -585,6 +592,72 @@ describe('githubClient', () => { expect(result.totalCount).toBe(2); expect(result.checkRuns).toHaveLength(2); }); + + // PR #231 (ucho, 2026-05-02) regression pin. Same `head_sha` had TWO + // workflow_runs of the same workflow (push + manual re-run, or two + // pushes resolving to the same SHA). The first run's job FAILED; + // the re-run SUCCEEDED. GitHub's `listWorkflowRunsForRepo` returns + // BOTH runs (sorted desc by created_at). Pre-fix, the aggregate + // included the stale failure → `anyFailed=true` → check-suite-success + // mistakenly forked to dispatch respond-to-ci on a green PR. + it('dedupes workflow runs by workflow_id, keeping only the latest re-run', async () => { + // Two runs of the same workflow on the same SHA. Returned by GitHub + // in descending created_at order: id=2 (latest, success) first, + // id=1 (oldest, failed) second. + mockWorkflowRuns( + [ + { id: 2, workflow_id: 100 }, // latest re-run — success + { id: 1, workflow_id: 100 }, // first attempt — failed + ], + { + 2: [{ name: 'Rebuild ucho-cli template', status: 'completed', conclusion: 'success' }], + 1: [{ name: 'Rebuild ucho-cli template', status: 'completed', conclusion: 'failure' }], + }, + ); + + const result = await withGitHubToken('test-token', () => + githubClient.getCheckSuiteStatus('owner', 'repo', 'sha123'), + ); + + // Only the latest run's jobs should be in the result — the stale + // failure must NOT be counted. + expect(result.checkRuns).toHaveLength(1); + expect(result.checkRuns[0].conclusion).toBe('success'); + expect(result.allPassing).toBe(true); + // And we should NOT have called listJobsForWorkflowRun for the + // older run (id=1) — that's what makes the dedup actually save + // API quota and avoid the bug. + expect(mockActions.listJobsForWorkflowRun).toHaveBeenCalledWith( + expect.objectContaining({ run_id: 2 }), + ); + expect(mockActions.listJobsForWorkflowRun).not.toHaveBeenCalledWith( + expect.objectContaining({ run_id: 1 }), + ); + }); + + it('keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)', async () => { + // Two DIFFERENT workflows on the same SHA must both be kept — + // dedup is per-workflow_id, not per-SHA. Regression guard against + // over-aggressive dedup that would drop CodeQL because CI is also + // running. + mockWorkflowRuns( + [ + { id: 10, workflow_id: 100 }, + { id: 11, workflow_id: 200 }, + ], + { + 10: [{ name: 'lint', status: 'completed', conclusion: 'success' }], + 11: [{ name: 'codeql', status: 'completed', conclusion: 'success' }], + }, + ); + + const result = await withGitHubToken('test-token', () => + githubClient.getCheckSuiteStatus('owner', 'repo', 'sha123'), + ); + + expect(result.checkRuns).toHaveLength(2); + expect(result.allPassing).toBe(true); + }); }); describe('getPRDiff', () => { @@ -676,7 +749,13 @@ describe('githubClient', () => { }); it('getCheckSuiteStatus uses paginate for both workflow runs and jobs', async () => { - mockPaginate.mockResolvedValueOnce([{ id: 1 }, { id: 2 }]); // 2 runs + // Distinct workflow_ids so getCheckSuiteStatus's per-workflow dedup + // keeps both runs (otherwise both would dedupe to a single + // `undefined` workflow_id key — see PR #231 dedup fix). + mockPaginate.mockResolvedValueOnce([ + { id: 1, workflow_id: 100 }, + { id: 2, workflow_id: 200 }, + ]); mockPaginate.mockResolvedValueOnce([ { name: 'job1', status: 'completed', conclusion: 'success' }, ]); @@ -705,8 +784,8 @@ describe('githubClient', () => { it('getFailedWorkflowRunJobs uses paginate for both workflow runs and jobs', async () => { mockPaginate.mockResolvedValueOnce([ - { id: 1, name: 'CI', conclusion: 'failure' }, - { id: 2, name: 'OK', conclusion: 'success' }, + { id: 1, workflow_id: 100, name: 'CI', conclusion: 'failure' }, + { id: 2, workflow_id: 200, name: 'OK', conclusion: 'success' }, ]); mockPaginate.mockResolvedValueOnce([ { name: 'lint', conclusion: 'failure', steps: [{ name: 'eslint', conclusion: 'failure' }] },