# Capability-aware multi-provider failover with budget gates
## Why now
While dogfooding #225's decompose flow on the OpenRouter free tier, I hit 4 consecutive `429 Provider returned error... is temporarily rate-limited upstream` responses on `deepseek/deepseek-v4-flash` and `deepseek/deepseek-v4-pro`. The agent gave up. There's no auto-failover, so any user who steps away mid-decompose comes back to a dead session with toast errors and partial state.
OpenRouter's free models specifically have aggressive `free-models-per-min` rate limits. This isn't a bug in OpenRouter — it's a contract. But it is a fatal failure mode for hands-off agentic flows unless we fail over.
The plumbing for this already exists (multi-provider config, the model-discovery `config:v1:test-endpoint` we use for "Found 355 models", typed `complete()` abstraction, generation events). What's missing is the routing layer.
## Proposed shape — two layers
OpenRouter and most modern provider gateways have their own native failover (`provider.allow_fallbacks: true` + `provider.order: [...]`). But that only handles transient failures within a single gateway. When the whole gateway dies, or capability requirements aren't met, you need a client-side layer.
**Layer 1: Server-side native routing (free, fixes transient failures).** Set `provider.allow_fallbacks: true` + `provider.order: [...]` in the OpenAI-compatible request body when `baseUrl` is OpenRouter. Handles 429/5xx within OpenRouter's pool with no client-side complexity.

**Layer 2: Client-side cross-provider failover (catches when OpenRouter itself is down).** Capability-aware resolver: given `(currentModel, requiredCapabilities, budgetRemaining)`, return an ordered candidate list. Same-tier-or-better only, never a silent downgrade.
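Layer 1 could be as small as a helper that attaches OpenRouter's provider-routing preferences (which OpenRouter accepts in the JSON request body) to an OpenAI-compatible request. This is a hedged sketch: `withNativeRouting` and the upstream names are illustrative, not existing code.

```typescript
// Sketch of Layer 1: attach OpenRouter's native provider-routing
// preferences to an OpenAI-compatible chat request body.
interface ProviderRouting {
  order: string[];          // preferred upstream providers, tried in sequence
  allow_fallbacks: boolean; // let OpenRouter fail over past the listed ones
}

interface ChatRequestBody {
  model: string;
  messages: { role: "user" | "assistant" | "system"; content: string }[];
  provider?: ProviderRouting;
}

function withNativeRouting(body: ChatRequestBody, order: string[]): ChatRequestBody {
  // Only attach routing preferences when we actually have an order to express.
  return order.length > 0
    ? { ...body, provider: { order, allow_fallbacks: true } }
    : body;
}

// Example: prefer two upstreams, let OpenRouter fall back past them on 429/5xx.
const req = withNativeRouting(
  { model: "deepseek/deepseek-v4-flash", messages: [{ role: "user", content: "hi" }] },
  ["DeepSeek", "Together"],
);
```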
## Anti-pattern to actively avoid
Cascading degradation. A naive failover chain goes `gpt-5 → claude-haiku → gemini-flash → tiny-7B` and silently retries all of them. The user gets a "completed" job with terrible output and no idea why. Three guardrails:

1. **Capability tier preserved or improved, never lowered** — the chain resolver filters candidates by `tier >= currentTier`. Fall sideways or up, never down.
2. **Every switch is visible** — a new `model_switched` chat message kind ("Switched `deepseek-v4-flash` → `kimi-k2.6` because: 429 rate-limit, capability tier preserved. Continuing your task.") makes every degradation user-visible.
3. **Capability mismatch is an honest failure** — if vision is required and no vision-capable fallback exists, return `{ok: false, error: 'no_capable_fallback', tried: [...]}` (HONEST_STATUS rule); don't silently strip the image.
## Implementation outline (8 PRs, sequenced)
| # | PR | Effort | Highest leverage? |
|---|----|--------|-------------------|
| 1 | OpenRouter native routing config — set `provider.allow_fallbacks: true` + `provider.order: [...]` in the OpenAI client request when `baseUrl` is OpenRouter | ~30 min | Ship first, fixes today's failure with one config change |
| 2 | `packages/core/src/fallback/model-capability-registry.ts` — small JSON map of known model IDs → tags `{vision, tools, longContext, tier: 'frontier'\|'strong'\|'fast'\|'tiny'}`, inferred from ID with override list | ~1h | enables rest of plan |
| 3 | `packages/core/src/fallback/chain-resolver.ts` — given `(currentModel, requiredCaps, budgetRemaining)`, returns ordered candidate list. Same-tier-or-better only | ~1.5h | the brain |
| 4 | Wire `complete()` in `packages/providers/src/index.ts` to catch 429/5xx/timeout/upstream-error, invoke resolver, swap model, retry once with new model, emit typed `model_switched` event | | |
| 5 | New `model_switched` chat message kind — grey card with from/to/reason. Visible audit trail, non-blocking | | |
| 6 | Budget gates — hard caps `{perTurnUsd, perSessionUsd, perDesignUsd}`. When exceeded, pause job + push toast "Budget cap hit — resume?" rather than silent runaway | | |
| 8 | `LESSONS.md` integration (cross-link with the rollback Discussion) — append `{model: 'A', failedWith: 429, succeededWith: 'B', count: N}` per failover. Resolver reads this on next turn to upgrade B's priority for this design specifically | ~1h | makes the system learn its own infra |
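The failover wiring sketched for PR-4 — catch a retryable failure from `complete()`, swap to a candidate, retry once, emit a `model_switched` event — could look roughly like this. `CompleteFn`, `pickFallback`, and the event shape are illustrative names, not the actual `packages/providers` API.

```typescript
// Hedged sketch of PR-4: one retry with a fallback model, with the switch
// surfaced as a typed event rather than hidden.
type CompleteFn = (model: string, prompt: string) => Promise<string>;

interface ModelSwitchedEvent {
  kind: "model_switched";
  from: string;
  to: string;
  reason: string;
}

async function completeWithFailover(
  complete: CompleteFn,
  model: string,
  prompt: string,
  pickFallback: (failed: string) => string | undefined,
  emit: (e: ModelSwitchedEvent) => void,
): Promise<string> {
  try {
    return await complete(model, prompt);
  } catch (err) {
    const fallback = pickFallback(model);
    if (fallback === undefined) throw err; // honest failure: no capable fallback
    emit({ kind: "model_switched", from: model, to: fallback, reason: String(err) });
    return complete(fallback, prompt); // retry exactly once with the new model
  }
}
```

A real version would classify errors first (only 429/5xx/timeout are retryable) and respect the per-turn retry budget from the risks table.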
PR-1 alone is the cheapest meaningful ship and would have prevented today's specific failure.
## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Silent capability downgrade | Tier-floor enforcement in resolver + visible `model_switched` card |
| Runaway cost | Hard budget caps (turn/session/design); pause-on-exceed, not silent-on-exceed |
| Tool-call format drift across providers | Capability tag includes `toolCallFormat`; only swap within same format family |
| Stale fallback chain references dead models | Validate against discovered models on app start; prune missing ones with notification |
| OpenRouter native fallback hides which model actually answered | Surface `x-router-model` response header in chat audit trail |
| Failover loop (B fails, retry → A fails, retry → B fails…) | Per-turn retry budget = 3; after that, surface honest failure with full chain attempt log |
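The hard budget caps behind the "Runaway cost" mitigation could be a pure check that the job runner consults before each call. The cap names follow the plan (`perTurnUsd`, `perSessionUsd`, `perDesignUsd`); `checkBudget` and the decision shape are hypothetical.

```typescript
// Sketch of pause-on-exceed budget gates: never silently continue past a cap.
interface BudgetCaps { perTurnUsd: number; perSessionUsd: number; perDesignUsd: number; }
interface Spend { turnUsd: number; sessionUsd: number; designUsd: number; }

type BudgetDecision =
  | { action: "continue" }
  | { action: "pause"; exceeded: keyof BudgetCaps }; // caller pauses job + shows toast

function checkBudget(caps: BudgetCaps, spend: Spend): BudgetDecision {
  // Check the tightest scope first so the reported cap is the most specific one.
  if (spend.turnUsd > caps.perTurnUsd) return { action: "pause", exceeded: "perTurnUsd" };
  if (spend.sessionUsd > caps.perSessionUsd) return { action: "pause", exceeded: "perSessionUsd" };
  if (spend.designUsd > caps.perDesignUsd) return { action: "pause", exceeded: "perDesignUsd" };
  return { action: "continue" };
}
```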
## Alignment check before code

- **Capability tag scope** — start with a hand-curated map of ~30 known models, or invest upfront in auto-extraction from provider metadata? My lean: hand-curated for v1, auto-extraction once the map proves the right shape.
- **Budget enforcement strictness** — soft warning + ask, vs hard pause? My lean: hard pause for `perTurnUsd`, soft warning for `perDesignUsd`. Configurable.
- **Resolver determinism** — should the resolver be deterministic, or should it weight by recent success/failure rate? Deterministic is debuggable; weighted is better behavior. My lean: deterministic for v1 with `LESSONS.md` as the explicit "weighting" channel (item 8 above).
- **OpenRouter `provider.order` config** — global default, per-design override, or both? My lean: both, with sensible defaults.
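The "`LESSONS.md` as the explicit weighting channel" lean could keep the resolver itself deterministic: the lessons file holds the per-failover records from the plan (`{model, failedWith, succeededWith, count}`), and a stable re-sort bumps previously successful fallbacks. `upgradePriority` is a hypothetical helper name.

```typescript
// Sketch: deterministic candidate list, reordered by explicit lesson counts.
interface FailoverLesson { model: string; failedWith: number; succeededWith: string; count: number; }

function upgradePriority(candidates: string[], lessons: FailoverLesson[]): string[] {
  // Tally past failover successes per model for this design.
  const wins = new Map<string, number>();
  for (const l of lessons) wins.set(l.succeededWith, (wins.get(l.succeededWith) ?? 0) + l.count);
  // Stable sort: models with more recorded successes first; ties keep the
  // resolver's original deterministic order.
  return [...candidates].sort((a, b) => (wins.get(b) ?? 0) - (wins.get(a) ?? 0));
}
```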
## Cross-references

- Sibling Discussions: "Time-travel chat: snapshot + rollback + lessons subsystem", "Spiral detector: auto-pause after N consecutive non-progressive turns"
- Item 8 (`LESSONS.md` integration) intentionally couples to the rollback Discussion — the same per-design memory that learns from semantic failures should learn from infrastructure failures.
## Reference fork-state evidence
The 4× 429 spiral that motivated this is reproducible by setting OpenRouter as the active provider, picking any free-tier model, and running the decompose flow on a moderately complex artifact. The agent makes 4-6 tool calls in close succession; OpenRouter's `free-models-per-min` cap kicks in around call 3-4; the existing error path doesn't retry-with-fallback, so generation dies.
Looking forward to direction before writing any of the actual code. PR-1 (OpenRouter native routing, ~30 min) could ship as a one-off if there's appetite while we discuss the broader shape.