fix(router): don't consume circuit breaker probe slot during routing #172

Merged
kianwoon merged 2 commits into main from fix/circuit-breaker-probe-consumption-156
Apr 3, 2026

kianwoon commented Apr 3, 2026

Summary

  • Replace canProceed() with getState() in distribution routing to prevent probe slot consumption
  • Add 2 regression tests proving getState() is read-only and canProceed() correctly gates probes

Problem

In router.ts, the distribution routing code called canProceed() to check whether a provider's circuit breaker was open. That call has a side effect: it transitions the breaker from open → half-open and consumes the probe slot. When the actual request then reached forwardWithFallback() and called canProceed() again, the probe had already been consumed (halfOpenInProgress=true), so the provider was skipped.

Result: in distribution mode, providers stayed blocked after the circuit cooldown had elapsed. Recovery happened only via the slow health-probe timeout path (5s delay per cycle).
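
The double-call failure can be reproduced with a small model. This is a minimal sketch, not the repository's circuit breaker: the class `SketchBreaker`, its `trip()` method, the injected clock, and the 1000 ms cooldown are all assumptions made for illustration. Only the names `canProceed()` and `halfOpenInProgress` come from the description above.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical minimal breaker; the real class in the repository may differ.
class SketchBreaker {
  private state: BreakerState = "closed";
  private openedAt = 0;
  private halfOpenInProgress = false;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number, now: () => number) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Open the circuit and start the cooldown (assumed helper for the demo).
  trip(): void {
    this.state = "open";
    this.openedAt = this.now();
    this.halfOpenInProgress = false;
  }

  // Side-effecting check: after the cooldown it moves open -> half-open
  // and claims the single probe slot for the caller.
  canProceed(): boolean {
    if (this.state === "closed") return true;
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) return false;
      this.state = "half-open";
    }
    if (this.halfOpenInProgress) return false;
    this.halfOpenInProgress = true;
    return true;
  }
}

let clock = 0;
const breaker = new SketchBreaker(1000, () => clock);
breaker.trip();
clock = 1500; // the cooldown has elapsed

const routingCheck = breaker.canProceed(); // routing layer claims the probe
const forwardCheck = breaker.canProceed(); // forwarding layer is refused
console.log(routingCheck, forwardCheck); // prints: true false
```

In this model the routing check wins the probe slot but never sends a request, so the second caller, which would have sent one, is turned away.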

Fix

Use getState() === "open" for the read-only routing check. The actual canProceed() call in forwardWithFallback() is now the only place that can consume the probe slot.
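
The division of responsibility after the fix can be sketched as follows. Everything here is illustrative: `StubBreaker`, `routeCandidates`, `forwardSketch`, and the `Provider` shape are hypothetical names, not code from router.ts. The PR does not state how getState() reports a breaker whose cooldown has elapsed, so the stub simply starts a recovering provider in the half-open state.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  getState(): BreakerState; // read-only view of the breaker
  canProceed(): boolean;    // may consume the probe slot
}

// Hypothetical stub with a fixed state and a single probe slot.
class StubBreaker implements Breaker {
  probesClaimed = 0;
  private readonly state: BreakerState;

  constructor(state: BreakerState) {
    this.state = state;
  }

  getState(): BreakerState {
    return this.state;
  }

  canProceed(): boolean {
    if (this.state === "closed") return true;
    if (this.state === "open") return false;
    if (this.probesClaimed > 0) return false;
    this.probesClaimed += 1;
    return true;
  }
}

interface Provider {
  name: string;
  breaker: StubBreaker;
}

// Routing: skip providers whose breaker is open, without side effects.
function routeCandidates(providers: Provider[]): Provider[] {
  return providers.filter((p) => p.breaker.getState() !== "open");
}

// Forwarding: the only place that calls canProceed().
function forwardSketch(candidates: Provider[]): string | null {
  for (const p of candidates) {
    if (p.breaker.canProceed()) return p.name;
  }
  return null;
}

const recovering = new StubBreaker("half-open");
const providers: Provider[] = [
  { name: "a", breaker: new StubBreaker("open") },
  { name: "b", breaker: recovering },
];

const candidates = routeCandidates(providers);
const claimedAfterRouting = recovering.probesClaimed; // still 0
const chosen = forwardSketch(candidates);
console.log(candidates.map((p) => p.name), claimedAfterRouting, chosen);
```

Because the routing step only reads state, the probe slot is still free when the forwarding step asks for it, and the recovering provider receives the request that decides whether its circuit closes.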

Test plan

  • 2 new regression tests in circuit-breaker.test.ts
  • All 274 existing tests pass
  • Build succeeds

Closes #156

kianwoon added 2 commits April 3, 2026 21:18
…agent traffic

Claude Code can fire multiple concurrent upstream requests via subagents.
A single HTTP/2 connection was sufficient for basic use but could bottleneck
during parallel agent workflows. Bumping to 3 connections per provider
per session gives headroom for typical subagent concurrency.
…resolution

Replace canProceed() with getState() === "open" in the distribution routing
path. canProceed() has a side-effect of transitioning open→half-open and
consuming the probe slot, which prevented the actual request in
forwardWithFallback() from probing the provider.

This caused providers to stay blocked after circuit cooldown in distribution
mode — the routing layer consumed the probe but no actual request was sent,
so recovery depended on the slow health-probe timeout path (5s).

Add regression tests proving getState() is read-only and canProceed()
correctly gates probe access.

Closes #156
kianwoon merged commit b13998e into main Apr 3, 2026
5 checks passed
kianwoon deleted the fix/circuit-breaker-probe-consumption-156 branch April 3, 2026 13:45

Development

Successfully merging this pull request may close these issues.

[Critical] Circuit breaker probe consumed by routing check in distribution mode
