Releases: Stackbilt-dev/llm-providers
v1.6.0 — SSE validation, cache hints, schema canary
What's new
Streaming schema validation (#41)
All four providers now surface malformed SSE frames as `SchemaDriftError` and fire `onSchemaDrift` instead of swallowing them silently. Anthropic additionally validates `content_block_delta` event shape and `delta.text` type; future tool-streaming delta types are skipped via a forward-compat discriminator.
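A minimal sketch of observing drift on the stream path. `SchemaDriftError`, the `onSchemaDrift` callback, and `generateResponseStream` are from this package; where the callback is passed and the chunk shape shown are assumptions.

```ts
import { LLMProviders, SchemaDriftError } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);

try {
  // Assumption: onSchemaDrift is accepted alongside the request options,
  // and the stream is async-iterable with a text field per chunk.
  const stream = await llm.generateResponseStream({
    messages: [{ role: 'user', content: 'Summarize this release.' }],
    onSchemaDrift: (drift) => console.warn('Malformed SSE frame:', drift),
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.text ?? '');
  }
} catch (err) {
  // Malformed frames now surface as SchemaDriftError instead of being swallowed.
  if (err instanceof SchemaDriftError) console.error('Provider stream schema drifted', err);
  else throw err;
}
```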
Cache-aware routing (#52)
- New `CacheHints` type — `LLMRequest.cache` is a no-op for callers that don't set it (see the sketch after this list)
- Anthropic: `strategy: 'provider-prefix'` wraps the system prompt as a content block with `cache_control: { type: 'ephemeral' }` and marks the last tool as a breakpoint
- OpenAI / Groq / Cerebras: automatic caching with no request-side translation needed
- Cached token counts normalized into `TokenUsage`: `cachedInputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`
- `supportsPromptCache` flag added to `ModelCapabilities`
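A hedged sketch of opting in to prefix caching. `LLMRequest.cache`, `strategy: 'provider-prefix'`, and the normalized `TokenUsage` fields are from this release; the `system` field and response shape shown are assumptions.

```ts
import { LLMProviders } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);
const LONG_SYSTEM_PROMPT = 'You are a meticulous code reviewer...'; // stable, cache-worthy prefix

const response = await llm.generateResponse({
  system: LONG_SYSTEM_PROMPT, // assumed field name for the system prompt
  messages: [{ role: 'user', content: 'Review this diff.' }],
  // No-op on providers with automatic caching (OpenAI / Groq / Cerebras);
  // on Anthropic this wraps the system prompt with cache_control: { type: 'ephemeral' }.
  cache: { strategy: 'provider-prefix' },
});

// Cached token counts are normalized into TokenUsage.
console.log(response.usage?.cacheReadInputTokens, response.usage?.cacheCreationInputTokens);
```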
Schema drift canary (#39 Part 2)
- `extractShape(obj)` — flat `path → type` map from any response object
- `compareShapes(golden, live)` — diffs two shape maps into `{ added, removed, changed }`
- `runCanaryCheck(provider, golden, liveResponse)` — one-shot canary returning a `CanaryReport`
- Golden fixtures committed for all five providers under `src/__tests__/fixtures/response-shapes/`
- All three utilities exported from the package root (see the sketch after this list)
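A sketch of wiring the canary in a test or scheduled job. The three utilities and the fixtures path are from this release; the fixture filename, the live-response helper, and the fields read from the results are assumptions.

```ts
import { extractShape, compareShapes, runCanaryCheck } from '@stackbilt/llm-providers';
import golden from './src/__tests__/fixtures/response-shapes/anthropic.json'; // assumed fixture name

// Hypothetical helper: a raw provider response captured from a real API call.
const liveResponse: Record<string, unknown> = await fetchCapturedResponse();

// Flatten both payloads into path → type maps and diff them.
const diff = compareShapes(extractShape(golden), extractShape(liveResponse));
console.log(diff); // { added, removed, changed }

// Or run the one-shot check and get a CanaryReport.
const report = runCanaryCheck('anthropic', golden, liveResponse);
console.log(report);
```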
Previously merged, now documented
- Factory-level streaming with fallback (#26) — `generateResponseStream` uses the same circuit-breaker and fallback chain as `generateResponse`
- Tool-use loop helper (#28) — `generateResponseWithTools` with `ToolLoopLimitError`, `ToolLoopAbortedError`, iteration/cost caps, and abort-signal support (sketch after this list)
- Cloudflare AI Gateway metadata forwarding (#29) — `cf-aig-*` headers forwarded only when `baseUrl` matches the Gateway pattern
- Cloudflare LoRA / fine-tune forwarding (#51) — `LLMRequest.lora` forwarded to the Workers AI binding
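A hedged sketch of the tool-use loop helper. `generateResponseWithTools`, both error classes, and abort-signal support are named above; the tool definition shape, the iteration-cap option name, and the response field are assumptions.

```ts
import { LLMProviders, ToolLoopLimitError, ToolLoopAbortedError } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);
const controller = new AbortController();

// Assumed tool definition shape, for illustration only.
const weatherTool = {
  name: 'get_weather',
  description: 'Look up current weather for a city',
  parameters: { type: 'object', properties: { city: { type: 'string' } } },
  execute: async ({ city }: { city: string }) => ({ city, tempC: 7 }),
};

try {
  const result = await llm.generateResponseWithTools({
    messages: [{ role: 'user', content: 'What is the weather in Oslo?' }],
    tools: [weatherTool],
    maxIterations: 5,          // assumed name for the iteration cap
    signal: controller.signal, // abort-signal support
  });
  console.log(result.text);    // assumed response field
} catch (err) {
  if (err instanceof ToolLoopLimitError) console.error('Hit the iteration/cost cap');
  else if (err instanceof ToolLoopAbortedError) console.error('Loop aborted via signal');
  else throw err;
}
```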
Bug fixes
- `stop_sequence` schema false positive — was typed as `string`; the real Anthropic API returns `null` when no stop sequence triggers, causing a `SchemaDriftError` on every normal response. Fixed to `string | null`.
- `AnthropicProvider.getProviderBalance()` — was calling a non-existent endpoint (`/v1/organizations/cost_report`). Now returns `unavailable` with a message directing users to the Admin API, matching the Groq pattern.
Full changelog
See CHANGELOG.md for the complete entry.
v1.5.1 — fix Cloudflare llama-3.2 vision silent empty response
Fixed
- `analyzeImage()` silent empty response on Cloudflare — `@cf/meta/llama-3.2-11b-vision-instruct` via the Workers AI binding requires a raw `{ image: number[], prompt, max_tokens }` input shape, not the OpenAI-compatible `messages` / `image_url` format. The chat path returns `choices[0].message.content === null` via the binding, causing `extractText()` to silently return `""`. The provider now detects this model and dispatches to the raw binding format. Other vision models are unaffected. Fixes #53.
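For context, a sketch contrasting the two input shapes. The raw `{ image: number[], prompt, max_tokens }` shape is quoted from the fix; the chat-shape example and placeholder byte handling are illustrative.

```ts
// Placeholder image bytes / base64 for illustration.
const imageBytes = new Uint8Array([/* ...jpeg bytes... */]).buffer;
const base64 = 'iVBORw0KGgo...'; // truncated

// OpenAI-compatible chat shape used by most CF vision models:
const chatInput = {
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract recipe data' },
      { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${base64}` } },
    ],
  }],
};

// Raw shape @cf/meta/llama-3.2-11b-vision-instruct expects via the binding:
const rawInput = {
  image: Array.from(new Uint8Array(imageBytes)), // number[]
  prompt: 'Extract recipe data',
  max_tokens: 512,
};
```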
Full changelog: https://github.com/Stackbilt-dev/llm-providers/blob/main/CHANGELOG.md
v1.5.0
Consolidates the unreleased 1.4.0 scope and undocumented features into a single minor release. 1.4.0 was tagged in package.json but never published to npm; consumers upgrading from 1.3.0 receive all of the following atomically.
Added
- Declarative model catalog (`src/model-catalog.ts`) — semantic catalog for provider/model metadata, recommendation use cases, lifecycle status, and runtime scoring
- Runtime recommendation API — `LLMProviders#getRecommendedModel(request, useCase?)` exposes the same routing logic the factory uses internally (sketch after this list)
- Schema drift envelope validation — `OpenAIProvider`, `GroqProvider`, `CerebrasProvider`, and `AnthropicProvider` now validate response envelopes at the provider boundary, throwing `SchemaDriftError` on mismatch instead of silently corrupting downstream consumers
- `LLMProviders.fromEnv()` static factory — auto-discovers providers from Cloudflare Workers `env` bindings without manual wiring
- Model drift test — asserts every provider's `models[]` is symmetrically covered by its capabilities map
- Catalog tests — coverage for retired-model exclusion, health-aware ranking, request-shape use-case inference
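A sketch of `fromEnv` plus the recommendation API inside a Workers handler — both are from this release; the use-case string, request fields, and response `text` field are assumptions.

```ts
import { LLMProviders } from '@stackbilt/llm-providers';

export default {
  async fetch(_req: Request, env: Record<string, unknown>): Promise<Response> {
    // Auto-discovers configured providers from Workers env bindings.
    const llm = LLMProviders.fromEnv(env);

    const request = {
      messages: [{ role: 'user' as const, content: 'Classify this support ticket.' }],
    };

    // Same routing logic the factory uses internally; 'classification' is an assumed use case.
    const pick = llm.getRecommendedModel(request, 'classification');
    console.log('Catalog recommends', pick);

    const response = await llm.generateResponse(request);
    return new Response(response.text); // assumed response field
  },
};
```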
Changed
- Factory routing selects provider/model pairs from the catalog instead of hardcoded ordering
- Health-aware dispatch considers circuit-breaker state including degraded and recovering providers, not just fully open
- Budget-aware dispatch — with a `CreditLedger` attached, selection demotes providers under high utilization or near projected depletion (sketch after this list)
- Provider defaults for OpenAI, Anthropic, Cloudflare, Cerebras, and Groq resolve through the shared catalog
- Cloudflare model recommendation prefers modern active baselines (Gemma 4, GPT-OSS) instead of legacy TinyLlama/Qwen heuristics
- Recommendation exports exclude retired targets (e.g. `gpt-4o`) while preserving deprecated constants for compatibility
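A heavily hedged sketch of budget-aware dispatch — `CreditLedger` exists in this package, but the constructor options and attach method shown here are illustrative assumptions, not the documented API.

```ts
import { LLMProviders, CreditLedger } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);

// Hypothetical constructor shape: per-provider budgets plus an alert threshold.
const ledger = new CreditLedger({
  budgets: { openai: 50, anthropic: 50, groq: 10 }, // USD, illustrative
  alertThreshold: 0.8,
});

// Hypothetical attach point; with a ledger present, selection demotes providers
// under high utilization or near projected depletion.
llm.attachCreditLedger(ledger);
```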
Deprecated
- `MODELS.CLAUDE_3_HAIKU` — migrate to `CLAUDE_HAIKU_4_5` or `CLAUDE_3_5_HAIKU`
- `MODELS.GPT_4O` — migrate to `GPT_4O_MINI` or a current GPT-4 successor
Removed
- `claude-3-haiku-20240307`, `gpt-4o`, and the dead alias `gpt-4-turbo-preview` dropped from provider `models[]` and capabilities tables. Arbitrary-string passthrough on request inputs is unchanged — consumers pinning older `MODELS` enum values via string literals are not affected.
Full changelog: CHANGELOG.md
v1.3.0 — Cloudflare Workers AI vision support
Added
- Cloudflare Workers AI vision support — `CloudflareProvider` now accepts `request.images` and routes to vision-capable models. Previously, image data was silently dropped on the CF path.
- Three new CF vision models:
  - `@cf/google/gemma-4-26b-a4b-it` — 256K context, vision + function calling + reasoning
  - `@cf/meta/llama-4-scout-17b-16e-instruct` — natively multimodal, tool calling
  - `@cf/meta/llama-3.2-11b-vision-instruct` — image understanding
- `CloudflareProvider.supportsVision = true` — the factory's `analyzeImage` now dispatches to CF when configured.
- Factory default vision fallback — `getDefaultVisionModel()` falls back to `@cf/google/gemma-4-26b-a4b-it` when neither Anthropic nor OpenAI is configured, enabling CF-only deployments to use `analyzeImage()`.
Changed
- Images are passed to CF using the OpenAI-compatible `image_url` content-part shape (base64 data URIs). HTTP image URLs throw a helpful `ConfigurationError` — fetch the image and pass the bytes in `image.data` (see the conversion sketch below).
- Attempting `request.images` on a non-vision CF model throws a `ConfigurationError` naming the vision-capable alternatives.
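A sketch of converting an HTTP image into the base64 `image.data` the provider expects, assuming a Workers-style runtime with `btoa`; the helper name is illustrative.

```ts
// Illustrative helper: fetch an HTTP image and base64-encode it for image.data.
async function toBase64Image(url: string): Promise<{ data: string; mimeType: string }> {
  const res = await fetch(url);
  const bytes = new Uint8Array(await res.arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return {
    data: btoa(binary),
    mimeType: res.headers.get('content-type') ?? 'image/jpeg',
  };
}
```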
Usage
```ts
factory.analyzeImage({
  image: { data: base64, mimeType: 'image/jpeg' },
  prompt: 'Extract recipe data',
  model: '@cf/google/gemma-4-26b-a4b-it',
});
```

See #43 for details.
v1.1.0 — Multi-Modal: Image Generation
Image Generation Provider
@stackbilt/llm-providers is now multi-modal — text + image inference under one package.
New: ImageProvider
```ts
import { ImageProvider } from '@stackbilt/llm-providers';

const img = new ImageProvider({
  cloudflareAi: env.AI,
  geminiApiKey: env.GEMINI_API_KEY,
});

const result = await img.generateImage({
  prompt: 'a mountain landscape at sunset',
  model: 'flux-dev',
});
// result.image: ArrayBuffer, result.responseTime, result.provider
```

Built-in Models
| Model | Provider | Use Case |
|---|---|---|
| `sdxl-lightning` | Cloudflare | Fast drafts, free tier |
| `flux-klein` | Cloudflare | Balanced quality/speed |
| `flux-dev` | Cloudflare | Highest CF quality |
| `gemini-flash-image` | Gemini | Text rendering capable |
| `gemini-flash-image-preview` | Gemini | Latest preview model |
Extracted from the img-forge production codebase. Battle-tested response normalization handles all Workers AI return formats.
Full changelog: CHANGELOG.md
v1.0.0 — Production Release
First stable release. Production-tested in the AEGIS cognitive kernel since v1.72.0.
Highlights
- Zero runtime dependencies — supply chain security by design
- 5 providers: OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, Groq
- `LLMProviders.fromEnv()` — one-line multi-provider setup
- Graduated circuit breakers — automatic failover with half-open probe recovery
- CreditLedger — per-provider budget tracking with threshold alerts + burn rate projection
- npm provenance — every version cryptographically linked to its source commit
Install
```bash
npm install @stackbilt/llm-providers
```

Quick Start
```ts
import { LLMProviders } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);

const response = await llm.generateResponse({
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

See README for full documentation.
See SECURITY.md for supply chain security policy.