
feat: provider fallback chain with primary-leave cooldown #413

Open
furukama wants to merge 3 commits into main from feat/provider-fallbacks

Conversation

@furukama
Contributor

Summary

Adds a resilience layer in front of the OpenAI-compatible gateway path so a single auth or rate-limit failure no longer breaks a chat. Inspired by the _try_activate_fallback pattern in hermes-agent/run_agent.py:

  • Auth (401/403) → immediate switch. On any provider auth/permission error, the next entry in the configured fallback chain is activated and the request retried.
  • Rate limit (429/quota) → switch + cooldown, but only on primary-leave. A 60-second cooldown is armed on the primary provider only when we leave it (first-time switch, or returning to primary after a fallback). Switching from one fallback to the next does not rearm primary's cooldown.
  • Subsequent requests skip a cooled-down primary and go straight to the first healthy fallback until the cooldown elapses.
  • Streaming-safe. The streaming handler tracks streamStarted and refuses mid-stream switches to avoid duplicated text deltas.
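The primary-leave cooldown rule above can be sketched as follows. This is a minimal illustration of the semantics described in the bullets, not the PR's actual implementation: the helper names mirror `isProviderCooledDown` from the diff, but the bodies here are assumptions.

```typescript
// Hypothetical sketch of the primary-leave cooldown rule.
const COOLDOWN_MS = 60_000;
const cooldownUntil = new Map<string, number>(); // provider id -> epoch ms deadline

function armPrimaryCooldown(primary: string, leavingFrom: string): void {
  // Arm only when we leave the primary itself; switching from one
  // fallback to the next must not push the primary's deadline forward.
  if (leavingFrom === primary) {
    cooldownUntil.set(primary, Date.now() + COOLDOWN_MS);
  }
}

function isProviderCooledDown(provider: string): boolean {
  const until = cooldownUntil.get(provider);
  return until !== undefined && Date.now() < until;
}
```

Under this rule, steady traffic during the cooldown window is routed straight to the first healthy fallback without ever resetting the primary's deadline.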

Configuration is via a new env var, HYBRIDAI_FALLBACK_CHAIN, holding a JSON array of entries:

HYBRIDAI_FALLBACK_CHAIN='[
  {"model":"openrouter/anthropic/claude-3.5-haiku","keyEnv":"OPENROUTER_API_KEY"},
  {"model":"mistral/mistral-small"}
]'

Each entry supports model (required), optional baseUrl, keyEnv (env var to read the API key from), chatbotId, and agentId.
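A minimal sketch of how such a chain could be parsed, assuming the entry fields listed above; the parser shown here is illustrative and not the PR's `loadFallbackChainFromEnv`.

```typescript
// Hypothetical parser for the HYBRIDAI_FALLBACK_CHAIN JSON array.
interface FallbackEntry {
  model: string;      // required
  baseUrl?: string;
  keyEnv?: string;    // env var to read the API key from
  chatbotId?: string;
  agentId?: string;
}

function parseFallbackChain(raw: string | undefined): FallbackEntry[] {
  if (!raw) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // malformed JSON: treat as "no fallbacks" rather than crash
  }
  if (!Array.isArray(parsed)) return [];
  // Keep only entries that carry the required `model` string.
  return parsed.filter(
    (e): e is FallbackEntry =>
      !!e &&
      typeof e === 'object' &&
      typeof (e as { model?: unknown }).model === 'string',
  );
}
```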

Changes

  • src/gateway/provider-fallback.ts (new): ProviderFallbackController, module-level cooldown map, classifyProviderError, loadFallbackChainFromEnv, and the callWithProviderFallback wrapper.
  • src/gateway/openai-compatible.ts: wraps the two tool-chat handlers (non-streaming + streaming) in callWithProviderFallback.
  • tests/provider-fallback.test.ts (new): 11 unit tests covering chain parsing, error classification, chain advancement with skip-on-resolve-failure, primary-leave cooldown semantics, key-env override, primary-cooldown short-circuit, and exhausted-chain re-throw.

Verification

  • npx tsc --noEmit — clean.
  • npx vitest run tests/provider-fallback.test.ts — 11/11 passing.
  • Full unit suite delta vs main: identical pass count, no regressions introduced (the pre-existing providers.factory.test.ts and gateway-http-server.test.ts failures predate this branch).

Test plan

  • Set HYBRIDAI_FALLBACK_CHAIN to a chain that starts with a deliberately-bad API key for the primary; confirm the request still completes via fallback.
  • Trigger a 429 on the primary; confirm cooldown is set and the next request goes straight to the fallback.
  • Confirm chain-internal switches (fallback A → fallback B) do NOT re-arm primary cooldown.
  • Exercise streaming chat completions and confirm a mid-stream provider failure surfaces as an error rather than producing duplicated content.

🤖 Generated with Claude Code

Benedikt Koehler and others added 2 commits April 26, 2026 22:34
Adds a resilience layer modeled on hermes-agent's `_try_activate_fallback`
pattern: an ordered fallback chain swaps in a backup provider on auth
(401/403) or rate-limit (429/quota) failures, while a 60s cooldown clock
is set only when leaving the primary — chain-internal switches don't
re-arm it. Subsequent requests skip a cooled-down primary and go
straight to the first healthy fallback.

- New `src/gateway/provider-fallback.ts` with `ProviderFallbackController`,
  module-level cooldown map, error classifier, and `callWithProviderFallback`
  wrapper.
- Wraps both tool-chat and streaming tool-chat handlers in
  `openai-compatible.ts`. Streaming retries refuse mid-stream switches
  to avoid duplicated text deltas.
- Configured via `HYBRIDAI_FALLBACK_CHAIN` env var (JSON array of
  `{model, baseUrl?, keyEnv?, chatbotId?, agentId?}` entries).
- 11 new unit tests covering chain parsing, error classification,
  primary-leave cooldown semantics, key-env override, primary-cooldown
  skip, and exhausted-chain re-throw.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 20:35
Contributor

Copilot AI left a comment


Pull request overview

Adds a provider fallback controller for the OpenAI-compatible gateway path to improve resiliency by retrying requests against a configured fallback chain when auth or rate-limit errors occur, including a “primary leave” cooldown mechanism and streaming-safe behavior.

Changes:

  • Introduces ProviderFallbackController + helpers to parse HYBRIDAI_FALLBACK_CHAIN, classify provider errors, and apply a primary-provider cooldown.
  • Wraps OpenAI-compatible tool-chat handlers (non-streaming + streaming) with callWithProviderFallback.
  • Adds a new unit test suite validating parsing, classification, chain advancement, cooldown semantics, and keyEnv behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/gateway/provider-fallback.ts Implements fallback chain parsing, error classification, cooldown tracking, and the fallback wrapper/controller.
src/gateway/openai-compatible.ts Applies the fallback wrapper to tool-chat request paths (including streaming).
tests/provider-fallback.test.ts Adds unit coverage for fallback chain behavior and cooldown semantics.


Comment thread src/gateway/provider-fallback.ts Outdated
Comment on lines +26 to +28
function isRecord(value: unknown): value is Record<string, unknown> {
  return !!value && typeof value === 'object' && !Array.isArray(value);
}
Comment on lines +212 to +225
if (
  params.chain.length > 0 &&
  isProviderCooledDown(params.primaryRuntime.provider)
) {
  const activation = await controller.tryActivate(
    'rate_limit',
    params.primaryRuntime.provider,
  );
  if (activation) {
    runtime = activation.runtime;
    model = activation.model;
    params.onFallback?.(activation, 'rate_limit');
  }
}
Comment thread src/gateway/openai-compatible.ts Outdated
Comment on lines +708 to +738
const result = await callWithProviderFallback({
  primaryRuntime: runtime,
  primaryModel: prepared.model,
  chain: loadFallbackChainFromEnv(),
  invoke: async (activeRuntime, activeModel) => {
    if (streamStarted) {
      throw new Error(
        'Stream already started; cannot retry provider fallback mid-stream.',
      );
    }
    return callOpenAICompatibleModelStream({
      runtime: activeRuntime,
      model: activeModel,
      messages,
      tools: input.tools,
      toolChoice: input.toolChoice,
      onTextDelta: (delta) => {
        if (!delta) return;
        streamStarted = true;
        if (!isResponseWritable(res)) return;
        writeOpenAICompatibleStreamChunk(
          res,
          buildOpenAICompatibleStreamTextChunk({
            completionId,
            created,
            model: prepared.responseModel,
            content: delta,
          }),
        );
      },
    });
- Reuse the shared `isRecord` helper from `src/utils/type-guards.ts`
  instead of redeclaring it locally.
- Stop re-arming the primary cooldown on the cooled-down skip path:
  `tryActivate` now accepts `{ markCooldown }`, and the initial skip in
  `callWithProviderFallback` passes `false`. Without this, steady traffic
  while the primary was cooling down would push its deadline forward on
  every request and the primary would never recover. Covered by a new
  test that fires three back-to-back requests against a cooled-down
  primary and asserts the original 5 s deadline is honored.
- Add an optional `shouldFallback(err, reason)` callback to
  `callWithProviderFallback`. The streaming tool-chat handler passes
  `() => !streamStarted`, so a mid-stream provider failure now
  re-throws the original 401/429 error instead of being masked by a
  generic "Stream already started" placeholder. Covered by tests for
  both the suppress and allow paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
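The `shouldFallback(err, reason)` veto described in the commit above can be sketched with a simplified wrapper. This is an assumption-laden illustration: the real `callWithProviderFallback` takes a richer parameter object, and `callWithFallback`/`classify` below are hypothetical names.

```typescript
// Sketch: a fallback loop where the caller can veto a switch, in which case
// the ORIGINAL provider error (e.g. a 401/429) is re-thrown, not a placeholder.
type FallbackReason = 'auth' | 'rate_limit';

async function callWithFallback<T>(
  providers: string[],
  invoke: (provider: string) => Promise<T>,
  classify: (err: unknown) => FallbackReason | null,
  shouldFallback: (err: unknown, reason: FallbackReason) => boolean = () => true,
): Promise<T> {
  let lastErr: unknown;
  for (const provider of providers) {
    try {
      return await invoke(provider);
    } catch (err) {
      lastErr = err;
      const reason = classify(err);
      // Not a fallback-worthy error, or the caller vetoes the switch
      // (e.g. the stream already started): surface the original error.
      if (reason === null || !shouldFallback(err, reason)) throw err;
    }
  }
  throw lastErr; // exhausted chain: re-throw the last provider error
}
```

A streaming handler would pass `() => !streamStarted` as the veto, so a mid-stream failure propagates the provider's own error instead of a generic wrapper message.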
