Blazing-fast, zero-overhead local LLM router for production AI apps.
Optimize LLM costs and latency by routing prompts to the right model locally. No extra API calls, no network overhead, just smart heuristic classification in < 1ms.
- 💸 Zero-Cost Routing: Runs 100% locally. No expensive LLM-based classification calls.
- ⚡ Ultra-Low Latency: Heuristic-based classification adds less than 1ms to your stack.
- 🧠 Tiered Intelligence: Automatically maps prompts to `SIMPLE`, `MEDIUM`, `COMPLEX`, or `REASONING` tiers.
- 🤖 Agentic Detection: Specialized logic to identify multi-step, tool-heavy tasks.
- 🌍 Multilingual Support: Native intent detection for 10+ major languages.
- 🛠️ Developer First: Type-safe, customizable, and works with Bun, Node.js, and Deno.
In high-volume AI applications, using high-end models (like GPT-4o or Claude 3.5 Sonnet) for every request is a waste of both time and money. Traditional routers use another LLM call to classify the prompt, which adds latency and cost.
llm-switchboard solves this by using a high-performance heuristic engine that scores prompts across 14 weighted dimensions instantly.
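The engine's internals aren't part of the public API, but the shape of a weighted-dimension heuristic can be sketched roughly like this. The dimension names, weights, and thresholds below are invented for illustration; the real engine scores 14 dimensions.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

interface Dimension {
  weight: number;
  score: (prompt: string) => number; // normalized to 0..1
}

// Two hypothetical dimensions: a length proxy and a reasoning-cue detector.
const dimensions: Dimension[] = [
  { weight: 0.4, score: (p) => Math.min(p.length / 2000, 1) },
  { weight: 0.6, score: (p) => (/\b(prove|derive|step by step)\b/i.test(p) ? 1 : 0) },
];

// Sum the weighted scores, then bucket the total into a tier.
function classify(prompt: string): Tier {
  const total = dimensions.reduce((sum, d) => sum + d.weight * d.score(prompt), 0);
  if (total > 0.55) return "REASONING";
  if (total > 0.35) return "COMPLEX";
  if (total > 0.15) return "MEDIUM";
  return "SIMPLE";
}
```

Because every dimension is a pure, local function of the prompt string, the whole pass stays in-process and well under a millisecond.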
```bash
# Using Bun (Recommended)
bun install llm-switchboard

# Using NPM
npm install llm-switchboard

# Using Yarn
yarn add llm-switchboard
```

llm-switchboard classifies every prompt into one of four tiers, allowing you to map specific models to specific task complexities.
| Tier | Task Type | Ideal For | Default Model |
|---|---|---|---|
| 🟢 SIMPLE | Utility | Greetings, yes/no, simple data extraction. | moonshot/kimi-k2.5 |
| 🟡 MEDIUM | Creative | Summarization, standard chat, basic coding. | xai/grok-code-fast-1 |
| 🔴 COMPLEX | Technical | Systems design, deep analysis, large context. | google/gemini-3.1-pro-preview |
| 🧠 REASONING | Logic | Math, proofs, complex debugging, multi-step logic. | xai/grok-4-1-fast-reasoning |
Set your model preferences once at application startup.
```typescript
import { configureRouter, getProductionModel } from "llm-switchboard";

// Configure your routing table
configureRouter({
  tiers: {
    SIMPLE: { primary: "meta-llama/llama-3-8b-instruct" },
    MEDIUM: { primary: "anthropic/claude-3-haiku" }
  },
  overrides: {
    agenticMode: true
  }
});

// Get the best model for a prompt
const model = getProductionModel("What is the weather like in Tokyo?");
console.log(model); // => "meta-llama/llama-3-8b-instruct"
```

Override global settings for specific, high-priority, or sensitive prompts.
```typescript
const model = getProductionModel(prompt, {
  customTiers: {
    COMPLEX: {
      primary: "local-mixtral-8x7b",
      fallback: [] // No cloud fallbacks for privacy
    }
  }
});
```

The classification engine analyzes prompts across multiple dimensions, including:
- Token Density: Estimating semantic weight vs. length.
- Syntactic Markers: Detecting code chunks, mathematical notation, and imperative verbs.
- Instruction Depth: Identifying complex formatting demands (JSON, Tables, CSV).
- Agentic Signatures: Multi-step planning patterns and tool-use intent.
- Domain Context: Scanning for technical terminology and high-entropy keywords.
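To make two of these dimensions concrete, here is what simple detectors for syntactic markers and instruction depth might look like. These helpers are hypothetical illustrations, not the library's actual internals.

```typescript
// Syntactic markers: TeX-style math notation, arrow functions, or
// imperative coding verbs suggest a technical prompt.
function hasSyntacticMarkers(prompt: string): boolean {
  return /\\frac|\\sum|=>|\b(function|implement|refactor|debug)\b/i.test(prompt);
}

// Instruction depth: counting demands for structured output formats
// (JSON, tables, CSV, ...) is one cheap proxy for formatting complexity.
function instructionDepth(prompt: string): number {
  const formats = ["json", "table", "csv", "yaml", "markdown"];
  return formats.filter((f) => prompt.toLowerCase().includes(f)).length;
}
```

Each detector is a constant-time string scan, which is how the engine keeps total classification latency below 1ms.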
We include a comprehensive test suite to help you benchmark classification accuracy.

```bash
bun run test
```

MIT © Uo1428