
feat: provider fallback chain with primary-leave cooldown #413

Open
furukama wants to merge 3 commits into main from feat/provider-fallbacks

Conversation

@furukama
Contributor

Summary

Adds a resilience layer in front of the OpenAI-compatible gateway path so a single auth or rate-limit failure no longer breaks a chat. Inspired by the _try_activate_fallback pattern in hermes-agent/run_agent.py:

  • Auth (401/403) → immediate switch. On any provider auth/permission error, the next entry in the configured fallback chain is activated and the request retried.
  • Rate limit (429/quota) → switch + cooldown, but only on primary-leave. A 60-second cooldown is armed on the primary provider only when we leave it (first-time switch, or returning to primary after a fallback). Switching from one fallback to the next does not rearm primary's cooldown.
  • Subsequent requests skip a cooled-down primary and go straight to the first healthy fallback until the cooldown elapses.
  • Streaming-safe. The streaming handler tracks streamStarted and refuses mid-stream switches to avoid duplicated text deltas.
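The primary-leave cooldown rule above can be sketched as follows. This is a minimal illustration of the semantics described in the bullets, not the PR's actual implementation: the helper names mirror `isProviderCooledDown` from the diff, but the bodies here are assumptions.

```typescript
// Hypothetical sketch of the primary-leave cooldown rule.
const COOLDOWN_MS = 60_000;
const cooldownUntil = new Map<string, number>(); // provider id -> epoch ms deadline

function armPrimaryCooldown(primary: string, leavingFrom: string): void {
  // Arm only when we leave the primary itself; switching from one
  // fallback to the next must not push the primary's deadline forward.
  if (leavingFrom === primary) {
    cooldownUntil.set(primary, Date.now() + COOLDOWN_MS);
  }
}

function isProviderCooledDown(provider: string): boolean {
  const until = cooldownUntil.get(provider);
  return until !== undefined && Date.now() < until;
}
```

Under this rule, steady traffic during the cooldown window is routed straight to the first healthy fallback without ever resetting the primary's deadline.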

Configuration is via a new env var, HYBRIDAI_FALLBACK_CHAIN, holding a JSON array of entries:

HYBRIDAI_FALLBACK_CHAIN='[
  {"model":"openrouter/anthropic/claude-3.5-haiku","keyEnv":"OPENROUTER_API_KEY"},
  {"model":"mistral/mistral-small"}
]'

Each entry supports model (required), optional baseUrl, keyEnv (env var to read the API key from), chatbotId, and agentId.
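A minimal sketch of how such a chain could be parsed, assuming the entry fields listed above; the parser shown here is illustrative and not the PR's `loadFallbackChainFromEnv`.

```typescript
// Hypothetical parser for the HYBRIDAI_FALLBACK_CHAIN JSON array.
interface FallbackEntry {
  model: string;      // required
  baseUrl?: string;
  keyEnv?: string;    // env var to read the API key from
  chatbotId?: string;
  agentId?: string;
}

function parseFallbackChain(raw: string | undefined): FallbackEntry[] {
  if (!raw) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // malformed JSON: treat as "no fallbacks" rather than crash
  }
  if (!Array.isArray(parsed)) return [];
  // Keep only entries that carry the required `model` string.
  return parsed.filter(
    (e): e is FallbackEntry =>
      !!e &&
      typeof e === 'object' &&
      typeof (e as { model?: unknown }).model === 'string',
  );
}
```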

Changes

  • src/gateway/provider-fallback.ts (new): ProviderFallbackController, module-level cooldown map, classifyProviderError, loadFallbackChainFromEnv, and the callWithProviderFallback wrapper.
  • src/gateway/openai-compatible.ts: wraps the two tool-chat handlers (non-streaming + streaming) in callWithProviderFallback.
  • tests/provider-fallback.test.ts (new): 11 unit tests covering chain parsing, error classification, chain advancement with skip-on-resolve-failure, primary-leave cooldown semantics, key-env override, primary-cooldown short-circuit, and exhausted-chain re-throw.

Verification

  • npx tsc --noEmit — clean.
  • npx vitest run tests/provider-fallback.test.ts — 11/11 passing.
  • Full unit suite delta vs main: identical pass count, no regressions introduced (the pre-existing providers.factory.test.ts and gateway-http-server.test.ts failures predate this branch).

Test plan

  • Set HYBRIDAI_FALLBACK_CHAIN to a chain that starts with a deliberately-bad API key for the primary; confirm the request still completes via fallback.
  • Trigger a 429 on the primary; confirm cooldown is set and the next request goes straight to the fallback.
  • Confirm chain-internal switches (fallback A → fallback B) do NOT re-arm primary cooldown.
  • Exercise streaming chat completions and confirm a mid-stream provider failure surfaces as an error rather than producing duplicated content.

🤖 Generated with Claude Code

Benedikt Koehler and others added 2 commits April 26, 2026 22:34
Adds a resilience layer modeled on hermes-agent's `_try_activate_fallback`
pattern: an ordered fallback chain swaps in a backup provider on auth
(401/403) or rate-limit (429/quota) failures, while a 60s cooldown clock
is set only when leaving the primary — chain-internal switches don't
re-arm it. Subsequent requests skip a cooled-down primary and go
straight to the first healthy fallback.

- New `src/gateway/provider-fallback.ts` with `ProviderFallbackController`,
  module-level cooldown map, error classifier, and `callWithProviderFallback`
  wrapper.
- Wraps both tool-chat and streaming tool-chat handlers in
  `openai-compatible.ts`. Streaming retries refuse mid-stream switches
  to avoid duplicated text deltas.
- Configured via `HYBRIDAI_FALLBACK_CHAIN` env var (JSON array of
  `{model, baseUrl?, keyEnv?, chatbotId?, agentId?}` entries).
- 11 new unit tests covering chain parsing, error classification,
  primary-leave cooldown semantics, key-env override, primary-cooldown
  skip, and exhausted-chain re-throw.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 20:35
Contributor

Copilot AI left a comment


Pull request overview

Adds a provider fallback controller for the OpenAI-compatible gateway path to improve resiliency by retrying requests against a configured fallback chain when auth or rate-limit errors occur, including a “primary leave” cooldown mechanism and streaming-safe behavior.

Changes:

  • Introduces ProviderFallbackController + helpers to parse HYBRIDAI_FALLBACK_CHAIN, classify provider errors, and apply a primary-provider cooldown.
  • Wraps OpenAI-compatible tool-chat handlers (non-streaming + streaming) with callWithProviderFallback.
  • Adds a new unit test suite validating parsing, classification, chain advancement, cooldown semantics, and keyEnv behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/gateway/provider-fallback.ts Implements fallback chain parsing, error classification, cooldown tracking, and the fallback wrapper/controller.
src/gateway/openai-compatible.ts Applies the fallback wrapper to tool-chat request paths (including streaming).
tests/provider-fallback.test.ts Adds unit coverage for fallback chain behavior and cooldown semantics.


Comment thread src/gateway/provider-fallback.ts Outdated
Comment on lines +26 to +28
function isRecord(value: unknown): value is Record<string, unknown> {
  return !!value && typeof value === 'object' && !Array.isArray(value);
}
Comment on lines +212 to +225
if (
  params.chain.length > 0 &&
  isProviderCooledDown(params.primaryRuntime.provider)
) {
  const activation = await controller.tryActivate(
    'rate_limit',
    params.primaryRuntime.provider,
  );
  if (activation) {
    runtime = activation.runtime;
    model = activation.model;
    params.onFallback?.(activation, 'rate_limit');
  }
}
Comment thread src/gateway/openai-compatible.ts Outdated
Comment on lines +708 to +738
const result = await callWithProviderFallback({
  primaryRuntime: runtime,
  primaryModel: prepared.model,
  chain: loadFallbackChainFromEnv(),
  invoke: async (activeRuntime, activeModel) => {
    if (streamStarted) {
      throw new Error(
        'Stream already started; cannot retry provider fallback mid-stream.',
      );
    }
    return callOpenAICompatibleModelStream({
      runtime: activeRuntime,
      model: activeModel,
      messages,
      tools: input.tools,
      toolChoice: input.toolChoice,
      onTextDelta: (delta) => {
        if (!delta) return;
        streamStarted = true;
        if (!isResponseWritable(res)) return;
        writeOpenAICompatibleStreamChunk(
          res,
          buildOpenAICompatibleStreamTextChunk({
            completionId,
            created,
            model: prepared.responseModel,
            content: delta,
          }),
        );
      },
    });
- Reuse the shared `isRecord` helper from `src/utils/type-guards.ts`
  instead of redeclaring it locally.
- Stop re-arming the primary cooldown on the cooled-down skip path:
  `tryActivate` now accepts `{ markCooldown }`, and the initial skip in
  `callWithProviderFallback` passes `false`. Without this, steady traffic
  while the primary was cooling down would push its deadline forward on
  every request and the primary would never recover. Covered by a new
  test that fires three back-to-back requests against a cooled-down
  primary and asserts the original 5 s deadline is honored.
- Add an optional `shouldFallback(err, reason)` callback to
  `callWithProviderFallback`. The streaming tool-chat handler passes
  `() => !streamStarted`, so a mid-stream provider failure now
  re-throws the original 401/429 error instead of being masked by a
  generic "Stream already started" placeholder. Covered by tests for
  both the suppress and allow paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
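The `shouldFallback(err, reason)` veto described in the commit above can be sketched with a simplified wrapper. This is an assumption-laden illustration: the real `callWithProviderFallback` takes a richer parameter object, and `callWithFallback`/`classify` below are hypothetical names.

```typescript
// Sketch: a fallback loop where the caller can veto a switch, in which case
// the ORIGINAL provider error (e.g. a 401/429) is re-thrown, not a placeholder.
type FallbackReason = 'auth' | 'rate_limit';

async function callWithFallback<T>(
  providers: string[],
  invoke: (provider: string) => Promise<T>,
  classify: (err: unknown) => FallbackReason | null,
  shouldFallback: (err: unknown, reason: FallbackReason) => boolean = () => true,
): Promise<T> {
  let lastErr: unknown;
  for (const provider of providers) {
    try {
      return await invoke(provider);
    } catch (err) {
      lastErr = err;
      const reason = classify(err);
      // Not a fallback-worthy error, or the caller vetoes the switch
      // (e.g. the stream already started): surface the original error.
      if (reason === null || !shouldFallback(err, reason)) throw err;
    }
  }
  throw lastErr; // exhausted chain: re-throw the last provider error
}
```

A streaming handler would pass `() => !streamStarted` as the veto, so a mid-stream failure propagates the provider's own error instead of a generic wrapper message.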
