🚀 Feature Request
Problem
Provider outages, rate limits, and timeouts cause complete request failures with no recovery. There's no built-in way to fall through to a backup model or distribute load across providers — leaving all resilience logic to the application developer to wire manually.
💡 Proposed Solution
Two focused additions to `@yourgpt/llm-sdk`:
- Fallback Chain — auto-retry with the next provider on failure
- Routing Strategies — priority (default) and round-robin
Fallback Chain
When the primary model fails, the SDK automatically retries with the next model in the `fallbacks` list, transparently and without any change to the calling code.
Triggers fallback on:
- `5xx` server errors
- Network timeouts
- Provider unavailability
- `429` rate limit errors

Does not trigger on:
- `4xx` client errors (bad request, invalid API key); these are caller bugs, not provider failures
An `onFallback` callback fires on each failed attempt, exposing the attempted model, next model, error message, and attempt number, for logging and observability.
A `FallbackExhaustedError` is thrown when all models in the chain fail, with a per-model breakdown of what failed and why.
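For concreteness, here is a sketch of how this could look at the call site. The `createClient` factory, the option names, and the `generate` method are illustrative assumptions for this proposal, not the SDK's current API:

```ts
import { createClient, FallbackExhaustedError } from '@yourgpt/llm-sdk'; // hypothetical exports

const llm = createClient({
  model: 'openai/gpt-4o', // primary
  fallbacks: ['anthropic/claude-3-5-sonnet', 'google/gemini-1.5-pro'],
  // Fires on each failed attempt, before the next model is tried.
  onFallback: ({ attemptedModel, nextModel, error, attempt }) => {
    console.warn(`attempt ${attempt}: ${attemptedModel} failed (${error}); trying ${nextModel}`);
  },
});

try {
  // The call site is identical whether the primary answers or a fallback does.
  const result = await llm.generate({ prompt: 'Summarize this document.' });
  console.log(result.text);
} catch (err) {
  if (err instanceof FallbackExhaustedError) {
    // All models in the chain failed; the error carries a per-model breakdown
    // (the `attempts` property is a hypothetical shape for that breakdown).
    console.error('all providers failed:', err.attempts);
  } else {
    throw err; // 4xx client errors and other caller bugs surface unchanged
  }
}
```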
Routing Strategies
Instead of always trying the primary model first, a routing strategy determines which model in the pool to call first.
| Strategy | Description |
| --- | --- |
| `priority` | Try models in their defined order. Default. |
| `round-robin` | Rotate the starting model evenly across calls. |
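Selecting a strategy might look like this, reusing the hypothetical `createClient` from above (option names again assumed):

```ts
// Hypothetical: 'round-robin' rotates which model is tried first on each call,
// spreading load across the pool instead of always hitting the same primary.
const llm = createClient({
  models: ['openai/gpt-4o', 'anthropic/claude-3-5-sonnet'],
  routing: { strategy: 'round-robin' }, // omit for the default 'priority'
});
```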
Routing Store (Pluggable State for Strategies)
Strategies like `round-robin` need to track state (e.g. which model was last used) to work correctly across multiple calls. By default the SDK uses an in-memory store, which works out of the box for single-process apps but resets on restart and does not share state across instances.
For production multi-instance or serverless deployments, users can plug in their own store via a simple `get`/`set` interface; no specific client is mandated.
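A minimal sketch of what that contract could look like; the `RoutingStore` name and shape are assumptions for this proposal:

```ts
// Hypothetical store contract: async get/set over string keys is all a
// strategy needs to persist its cursor (e.g. the index of the last model used).
interface RoutingStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}
```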
- Built-in: `MemoryRoutingStore` (default, zero config, zero deps)
- Bring your own: Any store that implements the interface — Redis, Upstash, Cloudflare KV, DynamoDB, or anything else. The SDK ships the interface, the user owns the implementation.
This keeps the SDK lightweight and deployment-agnostic — no Redis client is ever bundled or required.
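As an illustration of bring-your-own, a Redis-backed store using the `ioredis` client could satisfy the interface sketched above in a few lines (the `routing.store` option is, again, an assumption):

```ts
import Redis from 'ioredis';

// User-owned implementation; the SDK itself never imports a Redis client.
class RedisRoutingStore implements RoutingStore {
  private redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

  async get(key: string): Promise<string | null> {
    return this.redis.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    await this.redis.set(key, value);
  }
}

const llm = createClient({
  models: ['openai/gpt-4o', 'anthropic/claude-3-5-sonnet'],
  routing: { strategy: 'round-robin', store: new RedisRoutingStore() },
});
```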