fix(triggers): read REDIS_URL via routerConfig so dedup survives envScrub (#1250)
nhopeatall left a comment:
LGTM — minimal, well-targeted fix for a real production regression.
The root cause analysis is correct: `routerConfig.redisUrl` is captured at module-load time (`config.ts:137-138`), before `scrubSensitiveEnv()` runs (`worker-entry.ts:480`), so the cached value survives the scrub. This matches the established pattern described in envScrub.ts's own docstring for the DB pool.
The regression-pin test is solid — it exercises the exact failure scenario (singleton reuse after env deletion) with proper cleanup in a finally block.
The out-of-scope note about src/queue/cancel.ts having the same lazy-read pattern is accurate and correctly scoped out (only called from router/dashboard which do not scrub).
🕵️ claude-code · claude-opus-4-6 · run details
Live prod regression — fixes silent post-completion-hook failure in worker processes
PR #1248 shipped Redis-backed dedup. It works in cascade-router but throws inside cascade-worker processes with:
```
ERROR Review-dispatch dedup Redis call failed — failing closed
error: 'Error: REDIS_URL is required for review-dispatch dedup'
```
Confirmed live at 2026-05-01T17:25:44 in worker `cascade-worker-coalesce_ucho_MNG-461_*`. Sentry tag: `review_dedup_redis_down`.
Root cause
`src/utils/envScrub.ts:13-18` lists `REDIS_URL` in `SENSITIVE_ENV_KEYS`. Every cascade-worker calls `scrubSensitiveEnv()` early in startup (after `getDb()` initializes the DB pool), which does `delete process.env.REDIS_URL` — intentional, to keep infra secrets out of agent-spawned subprocesses.
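A minimal sketch of that scrub step (the key list here is illustrative; the real one lives in `src/utils/envScrub.ts:13-18`):

```typescript
// Illustrative reduction of scrubSensitiveEnv(); the real key list is
// defined in src/utils/envScrub.ts and includes REDIS_URL.
const SENSITIVE_ENV_KEYS = ['REDIS_URL'] as const;

function scrubSensitiveEnv(): void {
  for (const key of SENSITIVE_ENV_KEYS) {
    // Deletes the secret so agent-spawned subprocesses never inherit it.
    delete process.env[key];
  }
}
```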
The dedup module reads `process.env.REDIS_URL` lazily at first dispatch, well after the scrub, so it sees `undefined` and throws. The fail-closed handler returns false, and the post-completion-hook logs `Skipping post-completion review: already dispatched` — the same line the working dedup emits when correctly rejecting a duplicate — which is why the bug evaded the prod canary until a Loki search specifically for `REDIS_URL is required` surfaced it.
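A hypothetical reduction of the broken lazy-read pattern (names are illustrative; the real module is `src/triggers/github/review-dispatch-dedup.ts`):

```typescript
// The URL is read lazily at first use, so any earlier
// `delete process.env.REDIS_URL` (the scrub) wins the race.
class DedupClient {
  constructor(public readonly url: string) {}
}

let client: DedupClient | null = null;

function getDedupClient(): DedupClient {
  if (client !== null) return client;
  // Evaluated at the FIRST dispatch, which in a worker process is
  // after scrubSensitiveEnv() has already deleted the variable.
  const url = process.env.REDIS_URL;
  if (!url) throw new Error('REDIS_URL is required for review-dispatch dedup');
  client = new DedupClient(url);
  return client;
}
```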
Why router is unaffected, why this didn't bite immediately
The post-completion-hook short-circuits when CI isn't all passing at impl exit (`agent-execution.ts:269-275`), so the bug only triggers in the narrow window where CI is fully green before impl finishes: a small fraction of runs. cascade-router never calls `scrubSensitiveEnv()`, so PR #196's review (verified one run earlier) was dispatched from the router via check-suite-success; that path was never broken.
The fix
Read the URL via `routerConfig.redisUrl` instead of `process.env.REDIS_URL`. `routerConfig` is captured at module-load time in `src/router/config.ts:137-138` — well before `scrubSensitiveEnv()` runs. The cached value survives the scrub, exactly the pattern envScrub.ts's docstring describes for the DB pool.
Two-line source change in `src/triggers/github/review-dispatch-dedup.ts`, plus a JSDoc rewrite explaining the scrub interaction.
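The mechanism, sketched with illustrative names (the real capture is `routerConfig` in `src/router/config.ts:137-138`):

```typescript
process.env.REDIS_URL ??= 'redis://localhost:6379'; // stand-in deployment value

// Captured once, at module-load time: this line runs when the config
// module is imported, which is before scrubSensitiveEnv() executes.
const routerConfig = { redisUrl: process.env.REDIS_URL };

function getRedisUrl(): string {
  // Reads the cached copy, so a later delete of the env var is harmless.
  if (!routerConfig.redisUrl) throw new Error('REDIS_URL is required');
  return routerConfig.redisUrl;
}
```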
Tests
New regression-pin test in `review-dispatch-dedup.test.ts`:
```ts
it('claims successfully even after process.env.REDIS_URL is deleted', async () => {
// 1) claim once with REDIS_URL set — succeeds (creates singleton).
// 2) delete process.env.REDIS_URL (simulating envScrub).
// 3) claim again with the SAME key → false (dedup hit, NOT throw).
// 4) claim with DIFFERENT key → true (succeeds despite scrubbed env).
});
```
Pre-PR-#1250 this test fails on step 3 with `Error: REDIS_URL is required`. Post-fix it passes cleanly.
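A runnable sketch of that scenario with stand-ins (a `Set` replaces Redis, and `claim` stands in for the real dedup entry point; all names here are hypothetical):

```typescript
process.env.REDIS_URL ??= 'redis://localhost:6379'; // stand-in value

const cachedUrl = process.env.REDIS_URL; // module-load capture: the fix
const claimed = new Set<string>();

function claim(key: string): boolean {
  if (!cachedUrl) throw new Error('REDIS_URL is required for review-dispatch dedup');
  if (claimed.has(key)) return false; // duplicate: dedup hit, no dispatch
  claimed.add(key);
  return true;
}

// The four steps of the regression-pin test outline:
if (!claim('review:pr-1250')) throw new Error('step 1: first claim should succeed');
delete process.env.REDIS_URL;                       // step 2: simulate envScrub
if (claim('review:pr-1250')) throw new Error('step 3: same key must be a dedup hit, not a throw');
if (!claim('review:pr-9999')) throw new Error('step 4: a new key must still claim');
```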
Out of scope (intentionally — follow-up)
`src/queue/cancel.ts` has the same lazy `process.env.REDIS_URL` read pattern, but it is only called from cascade-router and cascade-dashboard (neither scrubs env), so it doesn't bite today. The same routerConfig swap will be applied there for symmetry in a separate PR.
Test plan
vitest 2497/2497 passing; typecheck clean; lint clean.
🤖 Generated with Claude Code