A multi-provider LLM abstraction layer with automatic failover, graduated circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard fetch API. Extracted from a production orchestration platform spanning 80K+ LOC across multiple services.
- Multi-provider failover -- OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, and Groq behind a single interface
- Graduated circuit breaker -- 4-state machine (closed / degraded / recovering / open) with probabilistic traffic routing prevents cascading failures
- Exponential backoff retry -- configurable delays, jitter, and per-error-class behavior
- Cost tracking and optimization -- per-provider cost attribution, budget alerts with CreditLedger, automatic routing to cheaper providers
- Declarative model catalog -- semantic model metadata drives recommendations, provider defaults, and fallback routing
- Rate limit enforcement -- CreditLedger tracks RPM/RPD/TPM/TPD per provider; factory skips providers that exceed limits
- Streaming with fallback -- SSE streaming on all providers; factory-level streaming routes through the same circuit-breaker and fallback chain as non-streaming requests
- Tool/function calling -- OpenAI, Anthropic, Cerebras, and Cloudflare tool use with unified response format
- Tool-use loop helper -- `generateResponseWithTools` owns the request → parse → execute → repeat cycle with iteration caps, cost limits, and abort signal support
- Provider-agnostic cache hints -- `LLMRequest.cache` translates to provider-native caching (Anthropic `cache_control` breakpoints; automatic on OpenAI/Groq/Cerebras); cached token counts normalized into `TokenUsage`
- Schema drift detection -- envelope validation on every provider response; streaming frames validated per-chunk; `SchemaDriftError` routes through the fallback chain and fires the `onSchemaDrift` hook
- Schema canary -- `runCanaryCheck` / `extractShape` / `compareShapes` for comparing live response shapes against committed golden fixtures
- Image generation -- Cloudflare Workers AI (SDXL, FLUX) and Google Gemini
- Health monitoring -- per-provider health checks, metrics, and circuit breaker state
- Structured logging -- injectable `Logger` interface; silent by default, opt-in to console or custom loggers
- Zero runtime dependencies -- no transitive dependency tree to audit
Install from npm:

```bash
npm install @stackbilt/llm-providers
```

```ts
import { LLMProviders, MODELS } from '@stackbilt/llm-providers';

const llm = new LLMProviders({
  openai: { apiKey: process.env.OPENAI_API_KEY },
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  cloudflare: { ai: env.AI }, // Cloudflare Workers AI binding
  defaultProvider: 'auto',
  costOptimization: true,
  enableCircuitBreaker: true,
});

const response = await llm.generateResponse({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarize the circuit breaker pattern.' },
  ],
  maxTokens: 1000,
  temperature: 0.7,
});

console.log(response.message);
console.log(`Provider: ${response.provider}, Cost: $${response.usage.cost}`);
```

Or auto-discover providers from the environment:

```ts
import { LLMProviders } from '@stackbilt/llm-providers';
// Scans env for ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY,
// CEREBRAS_API_KEY, and AI binding -- configures only what's present
const llm = LLMProviders.fromEnv(env, {
  costOptimization: true,
  enableCircuitBreaker: true,
});
```

| Provider | Models | Streaming | Tools | Notes |
|---|---|---|---|---|
| OpenAI | GPT-4o Mini, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo | Yes | Yes | Default: gpt-4o-mini |
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5, 3.7 Sonnet, 3.5 Sonnet/Haiku, 3 Opus/Sonnet | Yes | Yes | Default: claude-haiku-4-5-20251001 |
| Cloudflare | Gemma 4 26B, Llama 4 Scout, GPT-OSS 120B, LLaMA 3.x, Mistral 7B, Qwen 1.5, TinyLlama, and more | Yes | GPT-OSS, Gemma 4, Llama 4 Scout | Default is request-aware and catalog-driven |
| Cerebras | LLaMA 3.1 8B, LLaMA 3.3 70B, ZAI-GLM 4.7, Qwen 3 235B | Yes | GLM/Qwen only | ~2,200 tok/s |
| Groq | LLaMA 3.3 70B Versatile, LLaMA 3.1 8B Instant, GPT-OSS 120B | Yes | LLaMA 3.3 70B, GPT-OSS 120B | Ultra-fast inference |
Per-provider configuration shapes:

```ts
// OpenAI
{ apiKey: 'sk-...', organization: 'org-...', project: 'proj-...' }

// Anthropic
{ apiKey: 'sk-ant-...', version: '2023-06-01' }

// Cloudflare Workers AI
{ ai: env.AI, accountId: '...' }

// Cerebras
{ apiKey: 'csk-...' }

// Groq
{ apiKey: 'gsk_...' }
```

The library is silent by default. Opt in to logging by passing a `Logger`:
```ts
import { LLMProviders, consoleLogger } from '@stackbilt/llm-providers';

const llm = new LLMProviders({
  anthropic: { apiKey: '...', logger: consoleLogger },
  logger: consoleLogger, // factory-level logging
});
```

Or implement your own `Logger` interface (`debug`, `info`, `warn`, `error`).
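For example, a minimal custom logger that drops debug/info noise and keeps warnings and errors. This is a sketch: the console-style `(message, ...args)` method signatures are an assumption, so check the exported `Logger` type for the exact shape.

```ts
import { LLMProviders, type Logger } from '@stackbilt/llm-providers';

// Sketch: silence debug/info, forward warn/error with a prefix.
// Assumes console-style (message, ...args) signatures.
const quietLogger: Logger = {
  debug: () => {},
  info: () => {},
  warn: (message, ...args) => console.warn(`[llm] ${message}`, ...args),
  error: (message, ...args) => console.error(`[llm] ${message}`, ...args),
};

const llm = new LLMProviders({
  anthropic: { apiKey: '...' },
  logger: quietLogger,
});
```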
Each provider gets a graduated circuit breaker that routes traffic away from failing providers with probabilistic degradation.
| State | Behavior |
|---|---|
| Closed | 100% traffic to primary. Failures increment counter. |
| Degraded | Traffic splits probabilistically (90% → 70% → 40% → 10%) as failures accumulate. |
| Recovering | Success steps traffic back up one level at a time. |
| Open | 0% traffic. After resetTimeout ms, failures decay and traffic resumes. |
Default: 5-step degradation curve [1.0, 0.9, 0.7, 0.4, 0.1], 60s reset timeout, 5-minute monitoring window.
```ts
import { CircuitBreakerManager } from '@stackbilt/llm-providers';

const manager = new CircuitBreakerManager({
  failureThreshold: 5,
  resetTimeout: 60000,
  monitoringPeriod: 300000,
  degradationCurve: [1.0, 0.9, 0.7, 0.4, 0.1],
});

const breaker = manager.getBreaker('openai');
console.log(breaker.getHealth());
```

Track monthly spend and enforce per-provider rate limits with `CreditLedger`:

```ts
import { CreditLedger, LLMProviders } from '@stackbilt/llm-providers';
const ledger = new CreditLedger({
  budgets: [
    { provider: 'openai', monthlyBudget: 50, rateLimits: { rpm: 60, rpd: 10000 } },
    { provider: 'anthropic', monthlyBudget: 100 },
  ],
});

// Threshold alerts fire at 80%, 90%, 95% utilization
ledger.on((event) => {
  if (event.type === 'threshold_crossed') {
    console.warn(`${event.provider}: ${event.tier} — ${event.utilizationPct.toFixed(0)}% of budget`);
  }
});

const llm = new LLMProviders({
  openai: { apiKey: '...' },
  anthropic: { apiKey: '...' },
  costOptimization: true,
  ledger, // Factory enforces rate limits and tracks spend
});
```

Model selection is driven by a declarative catalog rather than a hardcoded fallback array. The selector intersects:
- requested use case and capabilities
- configured providers
- circuit breaker state (`CLOSED`, `DEGRADED`, `RECOVERING`, `OPEN`)
- CreditLedger utilization and projected burn/depletion pressure
The catalog also distinguishes active, compatibility, and retired models. Retired IDs can remain exported for compatibility, but they are not recommendation targets.
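As a hypothetical sketch, you could audit which catalog entries are still recommendation targets. The record shape and the `lifecycle` values here are assumptions based on the `ModelCatalogEntry` description; verify against the exported types.

```ts
import { MODEL_CATALOG } from '@stackbilt/llm-providers';

// Assumes MODEL_CATALOG is keyed by model ID, with entries carrying a
// lifecycle field of 'active' | 'compatibility' | 'retired'.
const activeIds = Object.entries(MODEL_CATALOG)
  .filter(([, entry]) => entry.lifecycle === 'active')
  .map(([id]) => id);

console.log(activeIds);
```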
```ts
import {
  MODEL_CATALOG,
  MODEL_RECOMMENDATIONS,
  getRecommendedModel,
  inferUseCaseFromRequest,
} from '@stackbilt/llm-providers';

const useCase = inferUseCaseFromRequest({
  messages: [{ role: 'user', content: 'Call the weather tool' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get weather',
      parameters: { type: 'object' },
    },
  }],
});

const model = getRecommendedModel('TOOL_CALLING', ['cloudflare', 'openai']);
```

For runtime-aware recommendations from a configured instance:
```ts
const recommended = llm.getRecommendedModel({
  messages: [{ role: 'user', content: 'Summarize this incident' }],
  maxTokens: 800,
});
```

Customize when and how the factory falls back between providers:
```ts
const llm = new LLMProviders({
  openai: { apiKey: '...' },
  anthropic: { apiKey: '...' },
  cloudflare: { ai: env.AI },
  cerebras: { apiKey: '...' },
  fallbackRules: [
    { condition: 'rate_limit', fallbackProvider: 'cloudflare' },
    { condition: 'cost', threshold: 10, fallbackProvider: 'cloudflare' },
    { condition: 'error', fallbackProvider: 'anthropic' },
  ],
});
```

Default provider precedence remains Cloudflare → Cerebras → Groq → Anthropic → OpenAI, but actual dispatch is catalog-driven and can be reordered at runtime by request fit, circuit-breaker state, and ledger burn-rate pressure.
Structured error classes for each failure mode:
```ts
import {
  RateLimitError,
  QuotaExceededError,
  AuthenticationError,
  CircuitBreakerOpenError,
  TimeoutError,
} from '@stackbilt/llm-providers';

try {
  await llm.generateResponse(request);
} catch (error) {
  if (error instanceof RateLimitError) {
    // Automatic retry already attempted; consider switching providers
  } else if (error instanceof CircuitBreakerOpenError) {
    // Provider is temporarily disabled
  } else if (error instanceof AuthenticationError) {
    // Check API key -- will NOT trigger fallback
  }
}
```

Model IDs are exported as constants:

```ts
import { MODELS, getRecommendedModel } from '@stackbilt/llm-providers';
// Current-gen models
MODELS.CLAUDE_OPUS_4_6; // 'claude-opus-4-6-20250618'
MODELS.CLAUDE_SONNET_4_6; // 'claude-sonnet-4-6-20250618'
MODELS.CLAUDE_HAIKU_4_5; // 'claude-haiku-4-5-20251001'
MODELS.GPT_4O; // 'gpt-4o' (deprecated / compatibility only)
MODELS.GPT_4O_MINI; // 'gpt-4o-mini'
MODELS.CEREBRAS_ZAI_GLM_4_7; // 'zai-glm-4.7'
// Get best active model for a use case given available providers
const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
```

`generateResponseStream` uses the same provider-selection, circuit-breaker, and exhaustion-registry path as `generateResponse`. Pre-stream HTTP errors (401, 429, 503, circuit open) fall over to the next provider before emitting the first chunk.
```ts
const stream = await llm.generateResponseStream({
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  model: 'claude-haiku-4-5-20251001',
});

const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value); // string chunk
}
```

`generateResponseWithTools` owns the `generateResponse` → parse → execute → append → repeat cycle. It enforces iteration caps, cumulative cost limits, and abort-signal cancellation -- no boilerplate needed on the caller side.
```ts
import { LLMProviders, ToolLoopLimitError } from '@stackbilt/llm-providers';

const result = await llm.generateResponseWithTools(
  {
    messages: [{ role: 'user', content: 'What is 2 + 2 * 3?' }],
    tools: [{
      type: 'function',
      function: {
        name: 'calculate',
        description: 'Evaluate a math expression',
        parameters: { type: 'object', properties: { expr: { type: 'string' } }, required: ['expr'] },
      },
    }],
  },
  {
    execute: async (name, args) => {
      // Demo only -- never eval model-supplied input in production code
      if (name === 'calculate') return eval((args as { expr: string }).expr);
      throw new Error(`Unknown tool: ${name}`);
    },
  },
  { maxIterations: 5, maxCostUSD: 0.10 }
);

console.log(result.message); // final assistant response after tool execution
```

Pass a provider-agnostic cache hint on any request. The library translates it to the appropriate provider-native mechanism.
```ts
const response = await llm.generateResponse({
  messages: [{ role: 'user', content: 'Summarize the context.' }],
  systemPrompt: 'You are an expert at analyzing long documents. [... 10KB of stable context ...]',
  model: 'claude-haiku-4-5-20251001',
  cache: {
    strategy: 'provider-prefix', // mark the stable prefix for caching
    cacheablePrefix: 'auto',     // cache system prompt + tools (default)
  },
});

// Cached token counts are normalized in TokenUsage
console.log(response.usage.cacheReadInputTokens); // Anthropic cache hit tokens
console.log(response.usage.cachedInputTokens);    // OpenAI / Groq / Cerebras cache hit tokens
```

| Strategy | Behavior |
|---|---|
| `'off'` | No caching hints sent |
| `'provider-prefix'` | Mark stable prefix for provider-side caching |
| `'response'` | Enable AI Gateway response caching (via `GatewayMetadata`) |
| `'both'` | Both prefix and response caching |
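As a sketch of the other strategies: the field names below (`ttl`, `key`, `sessionId`) come from the `CacheHints` type, but their exact semantics -- including whether `ttl` is in seconds -- are assumptions to verify against your AI Gateway configuration.

```ts
// Hypothetical: cache the full response at the gateway, keyed per session,
// in addition to provider-side prefix caching.
const cached = await llm.generateResponse({
  messages: [{ role: 'user', content: 'List our pricing tiers.' }],
  cache: {
    strategy: 'both',
    ttl: 3600,               // assumed to be seconds
    key: 'pricing-tiers-v1', // hypothetical cache key
    sessionId: 'user-42',
  },
});
```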
Use the canary utilities to compare a live provider response against a committed golden fixture and detect API shape drift before it reaches production.
```ts
import {
  extractShape, compareShapes, runCanaryCheck,
} from '@stackbilt/llm-providers';

// 1. Load your committed golden fixture (flat path → type map)
import goldenShape from './fixtures/openai.json';

// 2. Fetch a raw response from the provider (your responsibility)
const liveResponse = await fetch('https://api.openai.com/v1/chat/completions', ...).then(r => r.json());

// 3. Check for drift
const report = runCanaryCheck('openai', goldenShape, liveResponse);

if (report.status === 'drift') {
  console.error('OpenAI response shape changed!', report.diff);
  // diff.added   -- new fields (additive, usually safe)
  // diff.removed -- missing fields (breaking, alert immediately)
  // diff.changed -- type-changed fields (breaking, alert immediately)
}
```

Generate your initial golden fixture from a known-good response:
```ts
import { extractShape } from '@stackbilt/llm-providers';
import fs from 'fs';

const shape = extractShape(knownGoodResponse);
fs.writeFileSync('fixtures/openai.json', JSON.stringify(shape, null, 2));
```

The package exports the following classes, utilities, and types.

| Class | Description |
|---|---|
| `LLMProviders` | High-level facade -- initialize providers, generate responses, check health |
| `LLMProviderFactory` | Lower-level factory with provider chain building, catalog-based routing, and fallback logic |
| `OpenAIProvider` | OpenAI GPT models (streaming, tools) |
| `AnthropicProvider` | Anthropic Claude models (streaming, tools) |
| `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS/Gemma 4/Llama 4, batch) |
| `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) |
| `GroqProvider` | Groq fast inference (streaming, tools on GPT-OSS/LLaMA 3.3 70B) |
| `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation |
| Class / Export | Description |
|---|---|
| `CircuitBreaker` | Graduated 4-state circuit breaker with probabilistic degradation |
| `CircuitBreakerManager` | Manages circuit breakers across multiple providers |
| `RetryManager` | Exponential backoff retry with jitter |
| `CostTracker` | Per-provider cost accumulation and budget alerts |
| `CreditLedger` | Monthly budgets, rate limits, burn rate projection, threshold events |
| `CostOptimizer` | Static methods for optimal provider selection |
| `MODEL_CATALOG` | Declarative model metadata for routing and recommendation |
| `ImageProvider` | Multi-provider image generation (Cloudflare SDXL/FLUX, Google Gemini) |
| `extractShape` | Walk a raw API response into a flat path → type shape map |
| `compareShapes` | Diff two shape maps into `{ added, removed, changed }` |
| `runCanaryCheck` | One-shot canary: extract live shape, compare against golden, return `CanaryReport` |
| `validateSchema` | Low-level envelope validator (for custom provider authors) |
| Export | Description |
|---|---|
| `Logger` | Interface: `debug`, `info`, `warn`, `error` methods |
| `noopLogger` | Silent logger (default) |
| `consoleLogger` | Forwards to `console.*` (opt-in) |
| Type | Description |
|---|---|
| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format, cache, lora |
| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls |
| `TokenUsage` | Token counts, cost, and cached token fields (`cachedInputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`) |
| `CacheHints` | Cache strategy, key, ttl, sessionId, cacheablePrefix for provider-agnostic prompt caching |
| `ToolExecutor` | Interface for `generateResponseWithTools`: `execute(name, args) => Promise<unknown>` |
| `ToolLoopOptions` | Loop config: maxIterations, maxCostUSD, onIteration, abortSignal |
| `CanaryReport` | Schema canary result: provider, status (`'ok'` \| `'drift'`), diff |
| `ShapeMap` | Flat path → JSON-type map produced by `extractShape` |
| `ProviderFactoryConfig` | Factory config: provider configs, fallback rules, ledger, logger |
| `CostAnalytics` | Cost breakdown, total, and recommendations |
| `ProviderHealthEntry` | Health status, metrics, circuit breaker state, capabilities |
| `ModelCatalogEntry` | Declarative model metadata: provider, lifecycle, capabilities, use cases |
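For example, bounding a tool loop with both an iteration cap and an abort signal. This is a sketch reusing the `request` and `executor` from the tool-loop example above; the five-second budget is arbitrary.

```ts
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 5_000); // arbitrary 5s budget

try {
  const result = await llm.generateResponseWithTools(request, executor, {
    maxIterations: 3,
    abortSignal: controller.signal,
  });
  console.log(result.message);
} finally {
  clearTimeout(timer); // cancel the timer once the loop settles
}
```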
| Function | Description |
|---|---|
| `createLLMProviders(config)` | Create an `LLMProviders` instance |
| `createCostOptimizedLLMProviders(config)` | Create with cost optimization, circuit breakers, and retries enabled |
| `LLMProviders.fromEnv(env)` | Auto-discover providers from environment variables |
| `llm.generateResponse(request)` | Generate a response with provider selection and fallback |
| `llm.generateResponseStream(request)` | Streaming generation; fallback chain active before first chunk |
| `llm.generateResponseWithTools(request, executor, opts?)` | Managed tool-use loop with caps and abort-signal support |
| `llm.getRecommendedModel(request, useCase?)` | Runtime recommendation using configured providers, health, and ledger state |
| `getRecommendedModel(useCase, providers, context?)` | Pick the best active model for a use case |
| `runCanaryCheck(provider, golden, liveResponse)` | Compare live response shape against golden fixture |
| `extractShape(obj)` | Extract flat path → type map from any object |
| `retry(fn, config)` | One-shot retry wrapper for any async function |
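The `retry` helper can wrap any async function, not just LLM calls. A minimal sketch follows; the config field names (`maxRetries`, `baseDelay`, `jitter`) are assumptions, so check the exported retry config type for the exact names.

```ts
import { retry } from '@stackbilt/llm-providers';

// Hypothetical config shape -- verify field names against the library's types.
const data = await retry(
  () => fetch('https://api.example.com/flaky').then((r) => {
    if (!r.ok) throw new Error(`HTTP ${r.status}`); // surface HTTP failures so they are retried
    return r.json();
  }),
  { maxRetries: 3, baseDelay: 250, jitter: true },
);
```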
Apache-2.0
Built by Stackbilt.