
🔀 llm-switchboard

Blazing-fast, zero-overhead local LLM router for production AI apps.


Optimize LLM costs and latency by routing prompts to the right model locally. No extra API calls, no network overhead, just smart heuristic classification in < 1ms.


🌟 Key Features

  • 💸 Zero-Cost Routing: Runs 100% locally. No expensive LLM-based classification calls.
  • ⚡ Ultra-Low Latency: Heuristic-based classification adds less than 1ms to your stack.
  • 🧠 Tiered Intelligence: Automatically maps prompts to SIMPLE, MEDIUM, COMPLEX, or REASONING tiers.
  • 🤖 Agentic Detection: Specialized logic to identify multi-step, tool-heavy tasks.
  • 🌍 Multilingual Support: Native intent detection for 10+ major languages.
  • 🛠️ Developer First: Type-safe, customizable, and works with Bun, Node.js, and Deno.

🚀 Why llm-switchboard?

In high-volume AI applications, using high-end models (like GPT-4o or Claude 3.5 Sonnet) for every request is a waste of both time and money. Traditional routers use another LLM call to classify the prompt, which adds latency and cost.

llm-switchboard solves this by using a high-performance heuristic engine that scores prompts across 14 weighted dimensions instantly.
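To illustrate the idea (this is a sketch, not the library's actual internals; the dimension names, weights, and regexes below are hypothetical), a weighted heuristic score across several dimensions can be computed like this:

```ts
// Hypothetical sketch of weighted heuristic scoring.
// Dimension names, weights, and patterns are illustrative only.
type Dimension = { name: string; weight: number; score: (prompt: string) => number };

const dimensions: Dimension[] = [
  { name: "length",     weight: 1.0, score: (p) => Math.min(p.length / 2000, 1) },
  { name: "codeMarker", weight: 2.0, score: (p) => (/```|function |class /.test(p) ? 1 : 0) },
  { name: "mathMarker", weight: 2.5, score: (p) => (/\bprove\b|\\frac|∑|∫/i.test(p) ? 1 : 0) },
];

function complexityScore(prompt: string): number {
  // Weighted sum of per-dimension scores, normalized by the total weight.
  const total = dimensions.reduce((sum, d) => sum + d.weight, 0);
  return dimensions.reduce((sum, d) => sum + d.weight * d.score(prompt), 0) / total;
}
```

Because every dimension is a pure string check, the whole pass stays local and sub-millisecond.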


📦 Installation

```sh
# Using Bun (Recommended)
bun add llm-switchboard

# Using NPM
npm install llm-switchboard

# Using Yarn
yarn add llm-switchboard
```

🚦 Smart Tiering System

llm-switchboard classifies every prompt into one of four tiers, allowing you to map specific models to specific task complexities.

| Tier | Task Type | Ideal For | Default Model |
| --- | --- | --- | --- |
| 🟢 SIMPLE | Utility | Greetings, yes/no, simple data extraction. | `moonshot/kimi-k2.5` |
| 🟡 MEDIUM | Creative | Summarization, standard chat, basic coding. | `xai/grok-code-fast-1` |
| 🔴 COMPLEX | Technical | Systems design, deep analysis, large context. | `google/gemini-3.1-pro-preview` |
| 🧠 REASONING | Logic | Math, proofs, complex debugging, multi-step logic. | `xai/grok-4-1-fast-reasoning` |
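In TypeScript terms, the four tiers and the default mapping above can be written out as follows (a sketch for reference; whether the library exports a type with this exact shape is an assumption):

```ts
// Sketch: the four tiers and the default models from the table above.
// The library's actual exported types may differ.
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

const DEFAULT_MODELS: Record<Tier, string> = {
  SIMPLE: "moonshot/kimi-k2.5",
  MEDIUM: "xai/grok-code-fast-1",
  COMPLEX: "google/gemini-3.1-pro-preview",
  REASONING: "xai/grok-4-1-fast-reasoning",
};
```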

📖 Usage

⚙️ Global Configuration

Set your model preferences once at application startup.

```ts
import { configureRouter, getProductionModel } from "llm-switchboard";

// Configure your routing table
configureRouter({
  tiers: {
    SIMPLE: { primary: "meta-llama/llama-3-8b-instruct" },
    MEDIUM: { primary: "anthropic/claude-3-haiku" }
  },
  overrides: {
    agenticMode: true
  }
});

// Get the best model for a prompt
const model = getProductionModel("What is the weather like in Tokyo?");
console.log(model); // => "meta-llama/llama-3-8b-instruct"
```

🎯 Per-Request Overrides

Override global settings for specific, high-priority, or sensitive prompts.

```ts
const model = getProductionModel(prompt, {
  customTiers: {
    COMPLEX: {
      primary: "local-mixtral-8x7b",
      fallback: [] // No cloud fallbacks for privacy
    }
  }
});
```
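The `fallback` list in the example suggests an ordered try-in-sequence pattern. A minimal sketch of that idea (the tier shape is assumed from the example, not taken from the library's type definitions, and `resolveModel` is a hypothetical helper):

```ts
// Hypothetical sketch: resolve a model by trying the primary, then fallbacks in order.
type TierConfig = { primary: string; fallback?: string[] };

function resolveModel(
  tier: TierConfig,
  isAvailable: (model: string) => boolean
): string | null {
  // Primary first, then each fallback; null means nothing usable was found.
  const candidates = [tier.primary, ...(tier.fallback ?? [])];
  return candidates.find(isAvailable) ?? null;
}
```

With an empty `fallback` array, an unavailable primary yields no model at all rather than a cloud escape hatch, which matches the privacy intent of the example.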

📊 How it Works

The classification engine analyzes prompts across multiple dimensions including:

  • Token Density: Estimating semantic weight vs. length.
  • Syntactic Markers: Detecting code chunks, mathematical notation, and imperative verbs.
  • Instruction Depth: Identifying complex formatting demands (JSON, Tables, CSV).
  • Agentic Signatures: Multi-step planning patterns and tool-use intent.
  • Domain Context: Scanning for technical terminology and high-entropy keywords.
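As a rough illustration of one of these dimensions, an agentic-signature check might look for multi-step planning and tool-use phrasing (the patterns below are assumptions for the sketch, not the engine's real rules):

```ts
// Hypothetical agentic-signature check: multi-step planning and tool-use phrases.
// These patterns are illustrative; the real engine's rules are not published here.
const AGENTIC_PATTERNS: RegExp[] = [
  /\bstep[- ]by[- ]step\b/i,
  /\bthen\b.*\bfinally\b/i,
  /\b(use|call) (the )?(api|tool|browser|search)\b/i,
];

function looksAgentic(prompt: string): boolean {
  return AGENTIC_PATTERNS.some((re) => re.test(prompt));
}
```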

🧪 Development & Testing

We include a comprehensive test suite to help you benchmark classification accuracy.

```sh
bun run test
```

📄 License

MIT © Uo1428

Built with ❤️ for the open-source AI community.
