Skip to content

fix(triggers): read REDIS_URL via routerConfig so dedup survives envScrub#1250

Merged
zbigniewsobiecki merged 1 commit intodevfrom
fix/review-dedup-survives-env-scrub
May 1, 2026
Merged

fix(triggers): read REDIS_URL via routerConfig so dedup survives envScrub#1250
zbigniewsobiecki merged 1 commit intodevfrom
fix/review-dedup-survives-env-scrub

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Live prod regression — fixes silent post-completion-hook failure in worker processes

PR #1248 shipped Redis-backed dedup. It works in cascade-router but throws inside cascade-worker processes with:

```
ERROR Review-dispatch dedup Redis call failed — failing closed
error: 'Error: REDIS_URL is required for review-dispatch dedup'
```

Confirmed live at 2026-05-01T17:25:44 in worker `cascade-worker-coalesce_ucho_MNG-461_*`. Sentry tag: `review_dedup_redis_down`.

Root cause

`src/utils/envScrub.ts:13-18` lists `REDIS_URL` in `SENSITIVE_ENV_KEYS`. Every cascade-worker calls `scrubSensitiveEnv()` early in startup (after `getDb()` initializes the DB pool), which does `delete process.env.REDIS_URL` — intentional, to keep infra secrets out of agent-spawned subprocesses.

The dedup module reads `process.env.REDIS_URL` lazily at first dispatch — well after the scrub. Sees `undefined`. Throws. Fail-closed handler returns false. Post-completion-hook logs `Skipping post-completion review: already dispatched` — which is the same log line the working dedup emits when correctly rejecting a duplicate, hence the bug evaded the prod canary until Loki searched specifically for `REDIS_URL is required`.

Why router is unaffected, why this didn't bite immediately

  • cascade-router never calls scrubSensitiveEnv. Its `process.env.REDIS_URL` stays set; check-suite-success path works correctly. PR refactor(agents): extract shared GitHub agent abstraction #196's verified-clean review came through this path.
  • The post-completion-hook already short-circuits when CI isn't all passing at impl exit (`agent-execution.ts:269-275`). The bug only triggers on the narrow timing where CI is fully green BEFORE impl finishes. Small fraction of runs.
  • Infra is fine — verified `~/Code/infra/compose/docker-compose.yaml` correctly forwards `REDIS_URL=redis://redis:6379` to cascade-router (line 52, 76) and `REDIS_URL=redis://redis-dev:6379` to cascade-router-dev (line 124, 148). No infra change needed.

The fix

Read the URL via `routerConfig.redisUrl` instead of `process.env.REDIS_URL`. `routerConfig` is captured at module-load time in `src/router/config.ts:137-138` — well before `scrubSensitiveEnv()` runs. The cached value survives the scrub, exactly the pattern envScrub.ts's docstring describes for the DB pool.

Two-line source change in `src/triggers/github/review-dispatch-dedup.ts`:

  • `import { routerConfig } from '../../router/config.js';` at the top.
  • Replace `process.env.REDIS_URL` with `routerConfig.redisUrl` in `getRedis()`.

JSDoc rewritten to explain the scrub interaction.

Tests

New regression-pin test in `review-dispatch-dedup.test.ts`:

```ts
it('claims successfully even after process.env.REDIS_URL is deleted', async () => {
// 1) claim once with REDIS_URL set — succeeds (creates singleton).
// 2) delete process.env.REDIS_URL (simulating envScrub).
// 3) claim again with the SAME key → false (dedup hit, NOT throw).
// 4) claim with DIFFERENT key → true (succeeds despite scrubbed env).
});
```

Pre-PR-#1250 this test fails on step 3 with `Error: REDIS_URL is required`. Post-fix it passes cleanly.

Out of scope (intentionally — follow-up)

  • `src/queue/cancel.ts` has the same lazy-read pattern in `getPublisher()`/`getSubscriber()`. Only called from cascade-router and cascade-dashboard (neither scrubs env), so it doesn't bite today. Apply the same routerConfig swap in a separate atomic PR for symmetry.
  • Broader `SENSITIVE_ENV_KEYS` audit — the list is short (`CREDENTIAL_MASTER_KEY`, `DATABASE_URL`, `DATABASE_SSL`, `REDIS_URL`); a follow-up audit can confirm everything else uses the `routerConfig` cache or its own module-load capture.

Test plan

  • `npx vitest run --project unit-triggers tests/unit/triggers/github/review-dispatch-dedup.test.ts` → 16/16 (including new pin)
  • `npx vitest run --project unit-triggers --project unit-api` → 2497/2497
  • `npm run typecheck` clean
  • `npm run lint` clean (13 pre-existing warnings, none on changed files)
  • CI green
  • Reviewer approval (@nhopeatall)
  • After deploy: Loki search for `REDIS_URL is required for review-dispatch dedup` since deploy → 0 hits within 1 hour.
  • After deploy: positive signal — Loki shows `Claimed review dispatch for PR+SHA { trigger: 'post-completion-hook' }` from a worker container, confirming the post-completion-hook is alive again.

🤖 Generated with Claude Code

…crub

PR #1248's Redis-backed dedup throws inside cascade-worker processes:

  ERROR Review-dispatch dedup Redis call failed — failing closed
    error: 'Error: REDIS_URL is required for review-dispatch dedup'

Confirmed live at 2026-05-01T17:25:44 in worker
`cascade-worker-coalesce_ucho_MNG-461_*`. Sentry tag: `review_dedup_redis_down`.

Root cause: `src/utils/envScrub.ts:13-18` lists REDIS_URL in
`SENSITIVE_ENV_KEYS`. Worker-entry calls `scrubSensitiveEnv()` early in
startup, which does `delete process.env.REDIS_URL` to keep infra secrets
out of agent-spawned subprocesses. The dedup module reads
`process.env.REDIS_URL` lazily at first dispatch — well after the scrub —
so it sees `undefined` and throws. The fail-closed handler returns false,
and the post-completion-hook silently logs `Skipping post-completion
review: already dispatched` (which looks identical to the working dedup's
"correctly rejected duplicate" log, hence the bug evaded the prod canary).

Why it didn't bite immediately: the post-completion-hook short-circuits
when CI isn't all passing at impl exit
(`agent-execution.ts:269-275`), so the bug only triggers on the narrow
timing where CI is fully green BEFORE impl finishes. Small fraction of
runs.

Why router is unaffected: cascade-router never calls scrubSensitiveEnv.
PR #196's review (verified 1-run earlier) was dispatched from the router
via check-suite-success — that path was never broken.

Fix: read the URL via `routerConfig.redisUrl` instead of
`process.env.REDIS_URL`. routerConfig is captured at module-load time in
`src/router/config.ts:137-138`, well before scrubSensitiveEnv runs. The
cached value survives the scrub the same way the DB pool's cached
connection string does — exactly the pattern envScrub.ts's docstring
describes.

Two-line source change + new regression-pin test that claims, deletes
process.env.REDIS_URL, and claims again — both must succeed (second
deduped) without throwing `REDIS_URL is required`.

Verification: vitest 2497/2497, typecheck clean, lint clean.

Out of scope (follow-up): `src/queue/cancel.ts` has the same
`process.env.REDIS_URL` lazy pattern but is only called from cascade-router
and cascade-dashboard (neither scrubs env), so it doesn't bite today.
Apply the same routerConfig swap for symmetry in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — minimal, well-targeted fix for a real production regression.

The root cause analysis is correct: routerConfig.redisUrl is captured at module-load time (config.ts:137-138) which is before scrubSensitiveEnv() runs (worker-entry.ts:480), so the cached value survives the scrub. This matches the established pattern described in envScrub.ts own docstring for the DB pool.

The regression-pin test is solid — it exercises the exact failure scenario (singleton reuse after env deletion) with proper cleanup in a finally block.

The out-of-scope note about src/queue/cancel.ts having the same lazy-read pattern is accurate and correctly scoped out (only called from router/dashboard which do not scrub).

🕵️ claude-code · claude-opus-4-6 · run details

@zbigniewsobiecki zbigniewsobiecki merged commit 97bb822 into dev May 1, 2026
8 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/review-dedup-survives-env-scrub branch May 1, 2026 17:52
@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/triggers/github/review-dispatch-dedup.ts 66.66% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants