backryun/OmniRoute

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Save 15-95% of eligible tokens with RTK+Caveman compression and auto-fallback to FREE & low-cost AI models.

The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (37 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.

Chat Completions • Responses API • Embeddings • Image Generation • Video • Music • Audio Speech/Transcription • Reranking • Moderations • Web Search • MCP Server • A2A Protocol • 4,600+ Tests • 100% TypeScript


Get $100 Free AI Credits

🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits
Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →



🚀 Quick Start · 💡 Features · 🗜️ Compression · 💰 Pricing · 🎯 Use Cases · 🌍 Proxy · ❓ FAQ · 📖 Docs · 💬 WhatsApp


🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština



🖼️ Main Dashboard

OmniRoute Dashboard

📸 Dashboard Preview

Click to see dashboard screenshots
Providers · Combos · Analytics · Health · Translator · Settings · CLI Tools · Usage Logs · Endpoints

🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.

OpenClaw (⭐ 205K) · NanoBot (⭐ 20.9K) · PicoClaw (⭐ 14.6K) · ZeroClaw (⭐ 9.9K) · IronClaw (⭐ 2.1K) · OpenCode (⭐ 106K) · Codex CLI (⭐ 60.8K) · Claude Code (⭐ 67.3K) · Gemini CLI (⭐ 94.7K) · Kilo Code (⭐ 15.5K)

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota


📺 OmniRoute in Action — Video Guides

  • 🇧🇷 Português — complete OmniRoute guide (video in Portuguese)
  • 🇺🇸 English — complete OmniRoute walkthrough
  • 🇷🇺 Русский — complete OmniRoute guide (video in Russian)

🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.


🤔 Why OmniRoute?

Stop wasting money, tokens and hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country

OmniRoute solves all of this:

✅ Prompt Compression — auto-compress prompts & tool outputs, save 15-95% of eligible tokens per request with RTK+Caveman stacked mode
✅ Maximize subscriptions — track quota, use every bit before reset
✅ Auto fallback — Subscription → API Key → Cheap → Free, zero downtime
✅ Multi-account — round-robin between accounts per provider
✅ Format translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works
✅ 3-level proxy — bypass geo-blocks with global, per-provider, and per-key proxies
✅ 10 multi-modal APIs — chat, images, video, music, audio, search in one endpoint
✅ MCP + A2A — 29 MCP tools + agent-to-agent protocol, production-ready
✅ Universal — works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool


📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.


🛠️ Supported CLI Tools

OmniRoute works seamlessly with 16+ AI coding tools — one config, all tools:

Claude Code (Anthropic) · Codex CLI (OpenAI) · Gemini CLI (Google) · Cursor (IDE) · OpenClaw (CLI) · Antigravity (VS Code) · Cline (Extension) · Continue (Extension) · Kilo Code (Extension) · Kiro (AWS IDE) · OpenCode (CLI) · Droid (CLI) · AMP (CLI) · Copilot (GitHub) · Windsurf (IDE) · Hermes (CLI) · Qwen CLI (Alibaba) · Custom (Any tool)

📖 Full setup for each tool: docs/CLI-TOOLS.md


🌐 Supported Providers — 160+

🔐 OAuth Providers

Claude Code (Anthropic OAuth) · Antigravity (Google OAuth) · Codex (OpenAI OAuth) · GitHub Copilot (GitHub OAuth) · Cursor (Cursor OAuth) · Kimi Coding (Moonshot OAuth) · Kilo Code (Kilo OAuth) · Cline (Cline OAuth)

🆓 Free Providers (No Cost)

| Provider | Models | Free Quota |
|---|---|---|
| 🟢 Kiro AI | Claude Sonnet/Haiku | Unlimited FREE |
| 🟢 Qoder AI | Kimi-K2, DeepSeek-R1 | Unlimited FREE |
| 🟢 Pollinations | GPT-5, Claude, Llama 4 | No API key needed |
| 🟢 Qwen Code | Qwen3 Coder Plus | Unlimited FREE |
| 🟢 LongCat AI | Flash-Lite | 50M tokens/day |
| 🟢 Cloudflare AI | 50+ models | 10K neurons/day |
| 🟢 Puter AI | GPT-4.1, Claude | Rate-limited free |
| 🟢 NVIDIA NIM | Llama, Mistral | 1K req/day free |

🔑 API Key Providers (120+)

OpenAI · Anthropic · Gemini · DeepSeek · Groq · xAI (Grok) · Mistral · OpenRouter · GLM · Kimi · MiniMax · Fireworks · Together AI · Cerebras · Cohere · NVIDIA · Perplexity · SiliconFlow · Nebius · HuggingFace · DeepInfra · SambaNova · Vertex AI · Azure OpenAI · AWS Bedrock · Snowflake · Databricks · Venice.ai · AI21 Labs · Meta Llama · ...and 90+ more providers

Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints

🏠 Self-Hosted

LM Studio · Ollama · vLLM · Llamafile · Docker Model Runner · NVIDIA Triton · XInference · oobabooga · ComfyUI · SD WebUI

🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌──────────────────────────────────────────────────┐
│              OmniRoute (Smart Router)             │
│  • 🗜️ Prompt Compression (save 15-95% eligible)  │
│  • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│  • Quota tracking + Embeddings + Images          │
│  • Auto token refresh + Rate limit management    │
└──────┬───────────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost + 15-95% eligible token savings
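The tiering above boils down to an ordered-fallback loop: try each tier in order and fall through on failure. A minimal TypeScript sketch of the idea — this is illustrative only, not OmniRoute's actual router; the `Provider` shape and error handling are assumptions:

```typescript
// Illustrative sketch of ordered-tier fallback — NOT OmniRoute's actual
// router; the Provider shape and error handling here are assumptions.
type Provider = { name: string; call: (prompt: string) => Promise<string> };

async function routeWithFallback(tiers: Provider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const provider of tiers) {
    try {
      // First tier that succeeds wins; quota/rate-limit failures fall through.
      return await provider.call(prompt);
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`all tiers exhausted: ${String(lastError)}`);
}

// Tier 1 simulates "quota exhausted"; tier 2 answers.
const tiers: Provider[] = [
  { name: "subscription", call: async () => { throw new Error("quota exhausted"); } },
  { name: "free", call: async () => "ok from free tier" },
];

routeWithFallback(tiers, "hello").then((reply) => console.log(reply)); // "ok from free tier"
```

The real router layers quota tracking, cooldowns, and circuit breakers on top of this basic loop.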

🗜️ Prompt Compression — Save 15-95% Eligible Tokens Automatically

Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage before requests reach the provider. It combines ideas from RTK - Rust Token Killer and Caveman (⭐ 51K+).

How It Works

Every request passes through the compression pipeline transparently — no client changes needed:

┌──────────────────┐     ┌──────────────────────────────┐     ┌──────────────┐
│   Client sends   │────▶│  OmniRoute Compression       │────▶│  Provider    │
│   full prompt    │     │  Pipeline (7 options)        │     │  receives    │
│   (10,000 tok)   │     │                              │     │  compressed  │
│                  │     │  🪶 Lite ........... ~15%    │     │  (~1,080 tok)│
│                  │     │  🪨 Standard ....... ~30%    │     │              │
│                  │     │  ⚡ Aggressive ..... ~50%    │     │  💰 up to 95%│
│                  │     │  🔥 Ultra .......... ~75%    │     │              │
│                  │     │  🧰 RTK ............ 60-90%  │     │              │
│                  │     │  🔗 Stacked ........ 78-95%  │     │              │
└──────────────────┘     └──────────────────────────────┘     └──────────────┘

7 Compression Options

| Mode | Savings | Technique | Best For |
|---|---|---|---|
| Off | 0% | No compression | When you need exact prompts |
| 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex |
| ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls |
| 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |
| 🧰 RTK | 60-90% | 49 command-aware filters, RTK-style JSON DSL, verify gate, trust-gated custom filters | Shell/test/build/git output in agents |
| 🔗 Stacked | 78-95% | RTK first, then Caveman input condensation; ~89% with upstream average math | Mixed prompts with tool logs + prose |

RTK + Caveman Savings Math

These numbers are based on the upstream project READMEs under _references/_outros:

| Source | Upstream claim used by OmniRoute docs |
|---|---|
| Caveman | ~75% fewer output tokens; benchmark average 65% output savings, range 22-87%; ~46% input compression tool |
| RTK | 60-90% command-output token savings; sample session ~118,000 → ~23,900 tokens, which is 79.7% saved (~80%) |

For the default stacked compression combo, OmniRoute runs:

RTK -> Caveman

When both engines can act on the same tool/context payload, the savings compound:

combined = 1 - (1 - RTK savings) * (1 - Caveman input savings)
average  = 1 - (1 - 0.80) * (1 - 0.46) = 89.2%
range    = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6%

Caveman output mode is separate from prompt compression. When enabled for responses, use Caveman's own upstream output numbers: 65% average, ~75% headline, 22-87% observed range. Total bill savings depend on the prompt/output mix, but coding-agent sessions are often tool-context heavy, so the RTK -> Caveman combo is the best default for maximum context savings.
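The compounding formula above can be sanity-checked in a few lines of TypeScript, using the upstream averages quoted in this section:

```typescript
// Compound savings when two independent passes act on the same payload:
// combined = 1 - (1 - first) * (1 - second)
function stackedSavings(first: number, second: number): number {
  return 1 - (1 - first) * (1 - second);
}

const rtkAverage = 0.80;    // RTK upstream average (~80%)
const cavemanInput = 0.46;  // Caveman input-compression average (~46%)

console.log((stackedSavings(rtkAverage, cavemanInput) * 100).toFixed(1)); // "89.2"
console.log((stackedSavings(0.60, cavemanInput) * 100).toFixed(1));       // "78.4"
console.log((stackedSavings(0.90, cavemanInput) * 100).toFixed(1));       // "94.6"
```

The multiplication of the *remaining* fractions is what makes the stacked mode beat either engine alone: each pass only sees what the previous one left behind.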

Before & After (Standard/Caveman Mode)

🗣️ Before compression (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

🪨 After compression (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss.
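As a toy sketch of the kind of regex rewriting behind this — both rules below are invented for illustration and are not OmniRoute's actual cavemanRules.ts:

```typescript
// Two INVENTED Caveman-style rules, for illustration only — the real
// pipeline applies 30+ rules plus guards that protect code, URLs, and JSON.
const fillerRules: Array<[RegExp, string]> = [
  [/\bis likely because\b/gi, "because"],
  [/\bI would recommend using\b/gi, "use"],
];

function compress(text: string): string {
  return fillerRules
    .reduce((t, [pattern, replacement]) => t.replace(pattern, replacement), text)
    .replace(/\s+/g, " ") // collapse whitespace, as in Lite mode
    .trim();
}

console.log(compress("I would recommend using useMemo to memoize the object."));
// "use useMemo to memoize the object."
```

Each rule is cheap, deterministic, and reversible in meaning, which is why the mode can run on every request without an LLM in the loop.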

Architecture

Request Body
  │
  ├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
  │
  ├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
  ├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
  │   └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
  ├─ engines/rtk/ ────────── Command detection + JSON DSL filters + raw-output recovery
  ├─ engines/registry.ts ─── Shared engine registry for caveman, RTK, and stacked
  ├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
  │   ├─ summarizer.ts ───── Rule-based message summarization
  │   ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
  │   └─ progressiveAging.ts ──── Older messages → shorter summaries
  └─ ultra.ts ────────────── Heuristic token scoring + pruning
      └─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve

Configuration

Dashboard → Context & Cache → Caveman / RTK / Compression Combos

Or per-combo override:

{
  "comboOverrides": {
    "my-coding-combo": "standard",
    "my-cheap-combo": "ultra"
  }
}

Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.

Compression combos can also assign a named compression pipeline to routing combos, so a coding combo can use RTK + Caveman while a paid subscription combo stays on lite mode.
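Put together, a compression config using both settings might look like the sketch below — `autoTriggerTokens` and `comboOverrides` are the keys named in this guide, but the surrounding file layout is illustrative and may differ in your installed version:

```json
{
  "autoTriggerTokens": 8000,
  "comboOverrides": {
    "my-coding-combo": "stacked",
    "my-cheap-combo": "ultra"
  }
}
```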

🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that reports 65% average output-token savings while keeping technical accuracy. OmniRoute takes this further with a 7-option pipeline and a default RTK -> Caveman combo that can reach ~89% average savings on eligible tool/context payloads.

📖 Full compression documentation: docs/COMPRESSION_GUIDE.md · docs/RTK_COMPRESSION.md · docs/COMPRESSION_ENGINES.md · docs/COMPRESSION_RULES_FORMAT.md · docs/COMPRESSION_LANGUAGE_PACKS.md


🎯 What OmniRoute Solves

Every developer using AI tools faces these problems daily. OmniRoute solves them all.

| Problem | OmniRoute Solution |
|---|---|
| 💸 Subscription quota expires mid-coding | Smart 4-Tier Fallback — auto-routes Subscription → API Key → Cheap → Free |
| 🔌 Each provider has a different API format | Format Translation — unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses |
| 🌐 AI providers block my country/region | 3-Level Proxy — global, per-provider, and per-key proxy with TLS fingerprint spoofing |
| 🆓 Can't afford AI subscriptions | 11 Free Providers — Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... |
| 🔒 Gateway is exposed without protection | API Key Management — scoping, rotation, IP filtering, rate limiting, prompt injection guard |
| 🛑 Provider went down, lost coding flow | Circuit Breakers — auto-failover with cooldown, retry, anti-thundering herd |
| 🔧 Configuring each CLI tool is tedious | CLI Tools Dashboard — one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo |
| 🔑 Managing OAuth tokens is hell | Auto Token Refresh — OAuth PKCE for 8 providers, multi-account, LAN/remote fix |
| 📊 Don't know how much I'm spending | Cost Analytics — per-token tracking, budget limits, usage stats per API key |
| 🐛 Can't diagnose errors in AI calls | Unified Logs — 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |
📖 See all 31 problems OmniRoute solves
| # | Problem | Solution |
|---|---|---|
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux — deploy anywhere |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |

📖 Deep dives: Resilience Guide · Proxy Guide · Setup Guide · Compression Guide


🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
|---|---|---|
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.

2) Connect providers

  1. Dashboard → Providers → connect at least one provider (OAuth or API key)
  2. Dashboard → Endpoints → create an API key
  3. Dashboard → Combos → set your fallback chain (optional)

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
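Since the gateway speaks the OpenAI Chat Completions format, a plain HTTP request works from any language. A minimal TypeScript sketch — it assumes a local OmniRoute instance, and `sk-local-anything` is a placeholder key (any string is accepted by default):

```typescript
// Minimal OpenAI-compatible Chat Completions call against a local gateway.
// "sk-local-anything" is a placeholder — any string is accepted by default.
const body = {
  model: "if/kimi-k2-thinking", // provider-prefixed model name
  messages: [{ role: "user", content: "Explain useMemo in one sentence." }],
};

async function ask(): Promise<void> {
  const res = await fetch("http://localhost:20128/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk-local-anything",
    },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

// ask(); // uncomment with OmniRoute running locally
```

Swap the base URL for http://cloud.omniroute.online/v1 to use the hosted endpoint instead.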

📦 More install methods (Docker, source, Arch, Void, pnpm)

Docker:

docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

From source:

cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute

Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service

MCP: omniroute --mcp (stdio transport)

CLI options: omniroute --port 3000, omniroute --no-open, omniroute --help

Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute

Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)

📖 Full details: Setup Guide · Docker · Void Linux template


🐳 Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.

Notes:

  • Quick Tunnel URLs are temporary and change after every restart.
  • Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
  • Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
  • Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
  • Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
  • SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
  • The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
  • Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image | Tag | Size | Description |
|---|---|---|---|
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |

📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.


📱 Multi-Platform — Run Anywhere

OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).

| Platform | Install | Highlights |
|---|---|---|
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode — Windows/macOS/Linux |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot — your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching — Android/iOS/Desktop |
🖥️ Desktop App details
  • Native Electron app with system tray, auto-start, native notifications
  • One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
  • Dev: npm run electron:dev · Build: npm run electron:build
  • 📖 Full docs: electron/README.md
📱 Android (Termux) details
pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest

Access from any device on the same network: http://PHONE_IP:20128/v1

📲 PWA details
  • Android (Chrome): ⋮ → "Add to Home screen"
  • iOS (Safari): Share → "Add to Home Screen"
  • Desktop (Chrome/Edge): Install icon in address bar
  • 📖 Full docs: docs/PWA_GUIDE.md

🌍 Bypass Geographic Blocks — Use AI From Any Country

🇷🇺 🇨🇳 🇮🇷 🇨🇺 🇹🇷 In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.

| Level | Badge | Configure In | Use Case |
|---|---|---|---|
| Global | 🟢 | Settings → Proxy | All traffic through one proxy |
| Per-Provider | 🟡 | Provider → Proxy | Only specific providers proxied |
| Per-Connection | 🔵 | Connection → Proxy | Each API key uses its own proxy |
What gets proxied: API requests ✅ • OAuth flows ✅ • Connection tests ✅ • Token refresh ✅ • Model sync ✅

Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies

🆓 1proxy — Free Proxy Marketplace

Contributed by @oyi77#1847

No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:

  • One-click sync (up to 500 proxies) • Quality scores (0-100) • Country filter • Auto-rotation (quality/random/sequential) • Auto-degradation • Circuit breaker

Anti-Detection

  • 🔒 TLS Fingerprint Spoofing — browser-like TLS via wreq-js
  • 🔏 CLI Fingerprint Matching — matches native CLI binary signatures
  • 🏠 Proxy IP Preservation — stealth + IP masking simultaneously

📖 Full proxy documentation: docs/PROXY_GUIDE.md



💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast 🆕 | $0.20/$0.50 per 1M | None | Fastest + tool calling, ultra-low cost |
| | xAI Grok-4 (standard) 🆕 | $0.20/$1.50 per 1M | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggregated |
| | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |
🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.

💡 See the full $0 Free Stack (11 providers) below.

💡 Understanding Dashboard Costs:

The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.


🆓 Free Models — 11 Providers, $0 Forever

Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.

| Provider | Prefix | Free Models | Quota |
|---|---|---|---|
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 credits per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |
📖 25+ more free providers — Groq, Cerebras, Mistral, GitHub Models, OpenRouter, and more

Also free (API Key required): Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) · Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) · Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) · LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)

Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)

China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel

Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0

📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.


🎙️ Free Transcription Combo

Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

| Provider | Free Credits | Best Model | Rate Limit |
|---|---|---|---|
| 🟢 Deepgram | $200 free (signup) | nova-3 — best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | universal-3-pro — chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | whisper-large-v3 — OpenAI Whisper | 30 RPM (rate limited) |

Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          → uses $200 free first
  [2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    → free forever, emergency fallback

Then in /dashboard/media → Transcription tab: upload any audio or video file → select your combo endpoint → get the transcription in supported formats.

💡 Key Features

4,690+ automated tests across 517 test files. Not just a relay — a full operational platform.

| Feature | Why It Matters |
|---|---|
| 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime |
| 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY CLI tool |
| 🗜️ Prompt Compression — 7 options including Caveman, RTK, and stacked pipelines | Save 15-95% of eligible tokens |
| 🤖 MCP Server — 37 tools, 3 transports (stdio/SSE/HTTP), 10 scopes | IDE/agent tool integration |
| 🛡️ Resilience Engine — circuit breakers, cooldowns, TLS spoofing, anti-thundering herd | Auto-recovery from any failure |
| 🎵 10 Multi-Modal APIs — chat, embed, images, video, music, TTS, STT, moderation, rerank, search | One endpoint for everything |
| 🌍 3-Level Proxy — global, per-provider, per-key + 1proxy free marketplace | Access AI from any country |
| 📊 Full Observability — unified logs, p50/p95/p99 telemetry, cost tracking, budget controls | Know exactly what's happening |
📋 Complete feature list — 30+ capabilities

Routing & Intelligence

  • 13 balancing strategies (priority, weighted, round-robin, P2C, cost-optimized, context-relay...)
  • Task-aware smart routing (coding/vision/analysis) · Context relay session handoffs
  • Thinking budget controls (passthrough/auto/custom) · Wildcard routing · System prompt injection
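
The priority strategy above can be sketched in a few lines. This is an illustrative approximation, not OmniRoute's actual internals; the `Node` type and `pickNode` helper are hypothetical:

```typescript
// Minimal sketch of a "priority" balancing strategy: try combo nodes in
// declaration order and skip any that are cooling down after a failure.
// (Hypothetical helper — the real engine also weighs cost, health, etc.)
type ComboNode = { id: string; coolingDown: boolean };

function pickNode(nodes: ComboNode[]): ComboNode | undefined {
  // Array order encodes priority; first healthy node wins.
  return nodes.find((n) => !n.coolingDown);
}

const combo: ComboNode[] = [
  { id: "cc/claude-opus-4-7", coolingDown: true },  // quota exhausted
  { id: "glm/glm-4.7", coolingDown: false },        // healthy fallback
  { id: "if/kimi-k2-thinking", coolingDown: false },
];

const chosen = pickNode(combo); // falls through to glm/glm-4.7
```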

Translation & Compatibility

  • Auto token refresh (OAuth PKCE for 8 providers) · Multi-account round-robin
  • Responses API — full /v1/responses for Codex · Batch API with Files API
  • OpenAPI 3.0 live spec + Try-It UI

Protocols

  • A2A Server — JSON-RPC 2.0, SSE streaming, task lifecycle, skills
  • ACP — CLI agent discovery (14 agents + custom)

Platform

  • Desktop (Electron) · Android (Termux) · PWA · Docker (AMD64 + ARM64)
  • Cloudflare / Tailscale / ngrok tunnels · 40+ languages with RTL
  • Semantic + signature cache (two-tier) · Request idempotency + deduplication
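
Request deduplication typically means coalescing identical concurrent requests onto a single upstream call. A minimal sketch of that pattern, with a hypothetical `dedupe` helper (not OmniRoute's real implementation):

```typescript
// Identical in-flight requests (same idempotency key, e.g. a hash of the
// request body) share one upstream promise instead of fanning out.
const inFlight = new Map<string, Promise<string>>();

function dedupe(key: string, doFetch: () => Promise<string>): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing; // coalesce onto the in-flight request
  const p = doFetch().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

const fakeUpstream = () => Promise.resolve("response");
const a = dedupe("sha256-of-body", fakeUpstream);
const b = dedupe("sha256-of-body", fakeUpstream); // same key while in flight
```

Both callers receive the same promise object, so the upstream provider sees one request.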

Observability

  • Health dashboard — uptime, breakers, cache, lockouts
  • Evaluation framework — golden set testing · Webhooks · Compliance audit

v3.6+ Highlights: V1 WebSocket Bridge · Sync Tokens & Config Bundle · GLM Thinking (glmt) · Hybrid Token Counting · Safe Outbound Fetch · Wait For Cooldown · Runtime Env Validation · Vision Bridge · Grok-4 Fast · GLM-5 via Z.AI · MiniMax M2.5 · toolCalling flag · Multilingual Intent Detection · Benchmark-Driven Fallbacks · Request Deduplication

Architecture Examples:

Combo: "my-coding-stack"              Format Translation:
  1. cc/claude-opus-4-7                 CLI → OpenAI format
  2. nvidia/llama-3.3-70b               OmniRoute → translates
  3. glm/glm-4.7                        Provider → native format
  4. if/kimi-k2-thinking
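
One concrete translation step can illustrate the idea: OpenAI-style requests carry the system prompt inside the `messages` array, while Claude's Messages API expects it as a top-level `system` field. A simplified sketch (real translation covers tools, streaming, images, and much more):

```typescript
// Pull system messages out of an OpenAI-style array and hoist them to the
// top-level `system` field that Claude's Messages API expects.
type Msg = { role: "system" | "user" | "assistant"; content: string };

function openaiToClaude(messages: Msg[]) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  return {
    system: system || undefined,
    messages: messages.filter((m) => m.role !== "system"),
  };
}

const out = openaiToClaude([
  { role: "system", content: "You are terse." },
  { role: "user", content: "hi" },
]);
```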

📖 MCP Server README · A2A Server README · Resilience Guide · Features Gallery


🎯 Use Cases — Ready-Made Combo Playbooks

Case 1: "I have a Claude Pro subscription"

Problem: Quota expires unused, rate limits during heavy coding sessions.

Combo: "maximize-claude"
  1. cc/claude-opus-4-7        (use subscription fully)
  2. glm/glm-5.1               (cheap backup when quota out — $0.5/1M)
  3. kr/claude-sonnet-4.5      (free emergency fallback via Kiro)

Compression: standard (caveman) — saves 30% tokens = stretch quota further
Monthly cost: $20 (subscription) + ~$3 (backup) = $23 total
vs. $20 + hitting limits + lost productivity = frustration

Case 2: "I want $0 forever"

Problem: Can't afford subscriptions, need reliable AI for coding.

Combo: "free-forever"
  1. kr/claude-sonnet-4.5      (Claude 4.5 free unlimited via Kiro)
  2. if/kimi-k2-thinking       (reasoning model free via Qoder)
  3. pol/gpt-5                 (GPT-5 free via Pollinations — no key)
  4. lc/longcat-flash-lite     (50M tokens/day free backup)

Compression: aggressive — saves 50% tokens = double your free quota
Monthly cost: $0
Quality: Production-ready models + 50% token savings

Case 3: "I need 24/7 coding, no interruptions"

Problem: Deadlines, can't afford any downtime.

Combo: "always-on"
  1. cc/claude-opus-4-7        (best quality — subscription)
  2. cx/gpt-5.5                (second subscription — OpenAI)
  3. glm/glm-5.1               (cheap, resets daily — $0.5/1M)
  4. minimax/MiniMax-M2.5      (cheapest paid — $0.3/1M)
  5. kr/claude-sonnet-4.5      (free unlimited — never fails)

Compression: lite — saves 15% tokens passively, zero risk
Result: 5 layers of fallback = zero downtime
Monthly cost: $20-200 (subscriptions) + $5-10 (backup)

Case 4: "I'm in a blocked region (Russia, China, Iran...)"

Problem: AI providers block my country, VPNs are slow.

Combo: "unblocked-ai"
  1. kr/claude-sonnet-4.5      (free via Kiro + proxy)
  2. pol/deepseek-r1           (Pollinations — no geo-block)
  3. groq/llama-3.3-70b       (Groq + proxy)

Proxy: Global proxy set in Settings → or per-provider proxy override
Result: Access ALL providers from ANY country
Monthly cost: $0 (free providers) + $0 (1proxy free marketplace)

Case 5: "I want maximum token savings"

Problem: Token costs are eating my budget, need to squeeze every token.

Combo: "ultra-saver"
  1. cc/claude-opus-4-7        (subscription — best quality)
  2. glm/glm-5.1               (cheap backup)

Compression: ultra — saves 75% tokens
Result: 10K token prompt → 2.5K tokens sent
Monthly savings: ~$150-300/month in token costs for heavy users

🧪 Evaluations (Evals)

OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via Analytics → Evals in the dashboard.

Built-in Golden Set

The pre-loaded "OmniRoute Golden Set" contains test cases for:

  • Greetings, math, geography, code generation
  • JSON format compliance, translation, markdown generation
  • Safety refusal (harmful content), counting, boolean logic

Evaluation Strategies

| Strategy | Description | Example |
|---|---|---|
| `exact` | Output must match exactly | `"4"` |
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
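
The four strategies boil down to a single predicate. A minimal sketch (the built-in framework wraps this in scoring, golden-set iteration, and reporting):

```typescript
// One predicate covering the four evaluation strategies from the table.
type Strategy =
  | { kind: "exact"; expected: string }
  | { kind: "contains"; expected: string }
  | { kind: "regex"; pattern: string }
  | { kind: "custom"; fn: (output: string) => boolean };

function evaluate(output: string, s: Strategy): boolean {
  switch (s.kind) {
    case "exact":
      return output === s.expected;
    case "contains":
      // Case-insensitive substring match, as described above.
      return output.toLowerCase().includes(s.expected.toLowerCase());
    case "regex":
      return new RegExp(s.pattern).test(output);
    case "custom":
      return s.fn(output);
  }
}
```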

📖 Setup Guide

Connect Your Coding Tool

Point any OpenAI-compatible tool to OmniRoute:

Base URL: http://localhost:20128/v1
API Key:  [from Dashboard → Endpoints]
| Tool | Config Location |
|---|---|
| Claude Code | `claude mcp add-server omniroute --type http --url http://localhost:20128/api/mcp/stream` |
| Codex CLI | `OPENAI_BASE_URL=http://localhost:20128/v1 OPENAI_API_KEY=your-key codex` |
| Cursor | Settings → Models → Add Model → Override Base URL |
| Cline | Extension settings → Custom API Base URL |
| OpenClaw | `OPENAI_BASE_URL=http://localhost:20128/v1 openclaw` |
| Gemini CLI | Uses native OAuth via OmniRoute — connect in Providers |
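
Under the hood, every OpenAI-compatible tool sends the same request shape once pointed at OmniRoute. A sketch of that shape with a hypothetical `buildChatRequest` helper (the path and headers follow the standard OpenAI Chat Completions convention):

```typescript
// Build the request any OpenAI-compatible client would send to OmniRoute.
function buildChatRequest(baseUrl: string, apiKey: string, model: string, prompt: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

const req = buildChatRequest("http://localhost:20128/v1", "your-key", "my-combo", "hello");
```

The `model` field can name a provider model or a combo, so fallback routing happens transparently to the client.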

Protocols (MCP + A2A)

# MCP (stdio transport)
omniroute --mcp

# A2A (JSON-RPC 2.0)
curl http://localhost:20128/.well-known/agent.json

Key Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `PORT` | `20128` | API and dashboard port |
| `DASHBOARD_PORT` | — | Separate dashboard port (split-port mode) |
| `REQUIRE_API_KEY` | `false` | Require API key for all requests |
| `DATA_DIR` | `~/.omniroute` | Database and config storage |
| `REQUEST_TIMEOUT_MS` | `600000` | Upstream response timeout |
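
The defaults in the table can be read the usual Node.js way. A hedged sketch for illustration only — `readConfig` is a hypothetical helper, not OmniRoute's actual config loader:

```typescript
// Read the environment variables above, falling back to documented defaults.
function readConfig(env: Record<string, string | undefined>) {
  return {
    port: Number(env.PORT ?? 20128),
    requireApiKey: (env.REQUIRE_API_KEY ?? "false") === "true",
    requestTimeoutMs: Number(env.REQUEST_TIMEOUT_MS ?? 600_000),
  };
}

const defaults = readConfig({}); // nothing set → documented defaults
```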
📖 Full Setup Guide — All CLI tools, protocols, and environment variables

📖 Complete documentation:


❓ Frequently Asked Questions

📊 Why does my dashboard show high costs if I'm using free models?

The dashboard tracks your token usage and displays estimated costs as if you were using paid APIs directly. This is not actual billing — it's a reference to show how much you're saving.

Example:

  • Dashboard shows: "$290 total cost"
  • Reality: You're using Kiro + Qoder (FREE unlimited)
  • Your actual cost: $0.00
  • What $290 means: Amount you saved by using free models instead of paid APIs!

The cost display is a "savings tracker" to help you understand your usage patterns and optimization opportunities.

💳 Will I be charged by OmniRoute?

No. OmniRoute is free, open-source software that runs on your own computer. It never charges you anything.

You only pay:

  • Subscription providers (Claude Code $20/mo, Codex $20-200/mo) → Pay them directly on their websites
  • API key providers (DeepSeek, xAI, etc.) → Pay them directly, OmniRoute just routes your requests
  • OmniRoute itself → Never charges anything, ever

OmniRoute is a local proxy/router. It doesn't have your credit card, can't send invoices, and has no billing system. It's completely free software.

🆓 Are FREE providers really unlimited?

Yes! The current FREE providers are genuinely free with no hidden charges:

  • Kiro AI: Free unlimited Claude Sonnet/Haiku via AWS Builder ID / Google / GitHub OAuth
  • Qoder: Free unlimited kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 via PAT token
  • Pollinations AI: No API key needed — GPT-5, Claude, DeepSeek, Llama 4
  • LongCat Flash-Lite: 50M tokens/day — largest free quota available
  • Cloudflare Workers AI: 10K Neurons/day — 50+ models at the edge

OmniRoute just routes your requests to them — there's no "catch" or future billing.

💰 How do I minimize my actual AI costs?

Free-First Strategy:

  1. Start with 100% free combo:

    1. kr/claude-sonnet-4.5    (Kiro — unlimited free)
    2. if/kimi-k2-thinking     (Qoder — unlimited free)
    3. pol/gpt-5               (Pollinations — no key needed)
    

    Cost: $0/month

  2. Enable Prompt Compression — even lite mode saves ~15% passively

  3. Add cheap backup only if you need it:

    4. glm/glm-5.1  ($0.5/1M tokens)
    

    Additional cost: Only pay for what you actually use

  4. Use subscription providers last — only if you already have them. OmniRoute helps maximize their value through quota tracking.

Result: Most users can operate at $0/month using only free tiers!

🗜️ Will compression affect response quality?

No. Compression only affects the input (your prompt), not the model's response. Each mode has been designed to preserve technical accuracy:

  • Lite (~15%): Only whitespace/formatting — zero semantic change
  • Standard (~30%): Removes filler words ("please", "I think", "basically") — same meaning
  • Aggressive (~50%): Summarizes old messages + compresses tool outputs — core context preserved
  • Ultra (~75%): Heuristic pruning — use only when token budget is critical

Code blocks, URLs, JSON, and structured data are always protected from compression via the preservation engine.
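
The lite mode's behavior can be sketched as a whitespace collapse that leaves fenced code untouched. This is a minimal approximation of the idea, not the real preservation engine:

```typescript
// Collapse whitespace in prose while passing fenced code blocks through
// verbatim — an illustrative approximation of "lite" compression.
const F = "`".repeat(3); // triple backtick fence

function liteCompress(prompt: string): string {
  // Splitting on a capture group keeps the code blocks at odd indices.
  return prompt
    .split(/(`{3}[\s\S]*?`{3})/)
    .map((part, i) =>
      i % 2 === 1
        ? part // code block: preserved exactly
        : part.replace(/[ \t]+/g, " ").replace(/\n{3,}/g, "\n\n")
    )
    .join("");
}

const input = "Fix   this\n\n\n\nbug:\n" + F + "js\nlet  x = 1;\n" + F;
const out = liteCompress(input);
```

Prose whitespace shrinks, but the double space inside `let  x = 1;` survives because it sits inside a fence.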

🌍 Does OmniRoute work in countries where AI is blocked?

Yes! OmniRoute has a 3-level proxy system:

  1. Global proxy — all requests go through your proxy
  2. Per-provider proxy — different proxy per provider
  3. Per-API-key proxy — different proxy per key

Plus the 1proxy free marketplace for community-shared proxies. Users in Russia, China, Iran, and other restricted regions can access all 160+ providers through OmniRoute's proxy infrastructure.
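
The three levels resolve most-specific-first: a per-key proxy overrides a per-provider one, which overrides the global setting. A sketch with hypothetical types (not OmniRoute's actual config schema):

```typescript
// Resolve which proxy a request should use: per-key > per-provider > global.
type ProxyConfig = {
  global?: string;
  perProvider?: Record<string, string>;
  perKey?: Record<string, string>;
};

function resolveProxy(cfg: ProxyConfig, provider: string, keyId: string): string | undefined {
  return cfg.perKey?.[keyId] ?? cfg.perProvider?.[provider] ?? cfg.global;
}

const cfg: ProxyConfig = {
  global: "socks5://my-global-proxy:1080",
  perProvider: { groq: "http://groq-proxy:8080" },
};
```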

See the Proxy Guide for setup instructions.


🐛 Troubleshooting

| Problem | Quick Fix |
|---|---|
| "Language model did not provide messages" | Provider quota exhausted → check quota tracker, use combo fallback |
| Rate limiting (429) | Add fallback combo: cc/claude → glm/glm-4.7 → if/kimi-k2-thinking |
| OAuth token expired | Auto-refreshed by OmniRoute. If stuck: delete + re-auth in Providers |
| unsupported_country_region_territory | Configure proxy in Settings → Proxy (see Proxy Guide) |
| Docker SQLite locks | Use `--stop-timeout 40` for clean WAL checkpoint on shutdown |
| Node.js runtime errors | Use Node.js >=20.20.2 <21, >=22.22.2 <23, or >=24.0.0 <25 (24 LTS recommended) |
| system-info for bug reports | Run `npm run system-info` and attach system-info.txt to your issue |

📖 Full troubleshooting guide: docs/TROUBLESHOOTING.md

🛠️ Tech Stack

Click to expand tech stack details
  • Runtime: Node.js 20.20.2+, 22.22.2+, or 24.x LTS (24 LTS recommended)
  • Language: TypeScript 5.9 — 100% TypeScript across src/ and open-sse/ (zero any in core modules since v2.0)
  • Framework: Next.js 16 + React 19 + Tailwind CSS 4
  • Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) — domain state, proxy logs, MCP audit, routing decisions, memory, skills
  • Schemas: Zod (MCP tool I/O validation, API contracts)
  • Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
  • Streaming: Server-Sent Events (SSE) + WebSocket bridge (/v1/ws)
  • Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
  • Testing: Node.js test runner + Vitest (4,690+ test cases across 517 files — unit, integration, E2E, security, ecosystem)
  • Platforms: Desktop (Electron), Android (Termux), PWA (any browser)
  • CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
  • Website: omniroute.online
  • Package: npmjs.com/package/omniroute
  • Docker: hub.docker.com/r/diegosouzapw/omniroute
  • Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing

📖 Documentation

📘 Getting Started

| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| Setup Guide | Full install methods, CLI tool configs, protocol setup, timeout tuning |
| CLI Tools Guide | Per-tool setup for Claude Code, Codex, Cursor, Cline, OpenClaw, Kilo, Copilot |
| Quick Start | 3-step install → connect → configure |

🔧 Operations & Deployment

| Document | Description |
|---|---|
| Docker Guide | Docker run, Compose profiles, Caddy HTTPS, tunnels, image tags |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Fly.io Deployment | Deploy to Fly.io with persistent storage |
| Termux Guide | Run OmniRoute on Android via Termux |
| PWA Guide | Progressive Web App install, caching, architecture |
| Uninstall Guide | Clean removal for all install methods |
| Environment Config | Complete .env variables and references |

🧠 Features & Architecture

| Document | Description |
|---|---|
| Architecture | System architecture, data flow, and internals |
| Compression Guide | 7-option pipeline: off / lite / standard / aggressive / ultra / RTK / stacked |
| RTK Compression | Command-output compression, filters, trust, verify, raw-output recovery |
| Compression Engines | Caveman, RTK, stacked pipelines, dashboard/API/MCP surfaces |
| Compression Rules Format | JSON rule-pack schemas for Caveman and RTK filters |
| Compression Language Packs | Language detection and Caveman rule-pack authoring |
| Resilience Guide | Circuit breakers, cooldowns, queue, anti-thundering herd, TLS spoofing |
| Auto-Combo Engine | 6-factor scoring, mode packs, self-healing |
| Proxy Guide | 3-level proxy system, 1proxy marketplace, registry CRUD |
| Free Tiers | 25+ free API providers consolidated directory |
| Features Gallery | Visual dashboard tour with screenshots |
| Codebase Documentation | Beginner-friendly codebase walkthrough |

🤖 Protocols & APIs

| Document | Description |
|---|---|
| API Reference | All endpoints with examples |
| OpenAPI Spec | OpenAPI 3.0 specification |
| MCP Server | 37 MCP tools, IDE configs, Python/TS/Go clients |
| MCP Server Guide | MCP installation, transports, and tool reference |
| A2A Server | JSON-RPC 2.0 protocol, skills, streaming, task management |
| A2A Server Guide | A2A agent card, tasks, skills, and streaming |

📋 Project & Quality

| Document | Description |
|---|---|
| Contributing | Development setup and guidelines |
| Security Policy | Vulnerability reporting and security practices |
| i18n Guide | 40+ language support, translation workflow, RTL |
| Release Checklist | Pre-release validation steps |
| Coverage Plan | Test coverage strategy and 4,690+ test suite |

⭐ Top Contributors

OmniRoute is shaped by a passionate open-source community. These individuals have made exceptional contributions that directly impact the quality, stability, and reach of the project. Thank you.

oyi77
🥇 190 commits • +72K lines
Analytics engine, SQL aggregations, proxy marketplace, test coverage

Chris Staley
🥈 72 commits • +5.7K lines
SSE stream hardening, Responses API, Gemini pagination, test regression fixes

zenobit
🥉 62 commits • +24K lines
CI/CD pipeline, i18n for 33 languages, Void Linux package, platform fixes

R.D. & Randi
🏅 107 commits • +28K lines
Endpoints page, tunnel integrations, Docker workflows, A2A status, compression UI

benzntech
🏅 20 commits • +7.5K lines
Electron desktop app, auto-updater, release build workflows, cross-platform CI

🙏 These contributors' features, bug fixes, and infrastructure improvements are a core part of what makes OmniRoute reliable and feature-rich. Every pull request, every test case, and every i18n translation file matters. Open source is built by people like them.


👥 Contributors

Contributors

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Releasing a New Version

# Create a release — npm publish happens automatically
gh release create v2.0.0 --title "v2.0.0" --generate-notes

📊 Star History

Star History Chart

🌍 StarMapper

StarMapper

🙏 Acknowledgments

Special thanks to 9router by decolua — the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.

Special thanks to CLIProxyAPI by router-for-me — the original Go implementation that inspired this JavaScript port.

Special thanks to Caveman by JuliusBrussee (⭐ 51K+) — the viral "why use many token when few token do trick" project whose caveman-speak compression philosophy inspired OmniRoute's standard compression mode and 30+ filler/condensation regex rules.

Special thanks to RTK - Rust Token Killer by RTK AI — the high-performance command-output compression project whose terminal, build, test, git, and tool-output filtering model inspired OmniRoute's RTK engine, JSON filter DSL, raw-output recovery, and stacked RTK → Caveman compression pipeline.


📄 License

MIT License - see LICENSE for details.


Built with ❤️ for developers who code 24/7
omniroute.online

About

OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for reliable, cost-aware inference.
