Multi-provider LLM gateway with automatic fallback and cost tracking. Provides a single HTTP API that routes requests across DeepSeek, Gemini, OpenAI, Anthropic, Ollama — and any OpenAI-compatible API — trying cheaper providers first and falling back automatically on failure.
```bash
# Install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up at least one provider
export LLM_PROVIDER=deepseek
export DEEPSEEK_API_KEY=your-key

# Start the server
python main.py
```

The server runs on http://localhost:8090 by default.
All endpoints are served under /api/v1.0/.
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1.0/classify` | POST | Classify items using AI (returns JSON) |
| `/api/v1.0/plan` | POST | Generate structured plans using AI (returns JSON) |
| `/api/v1.0/embed` | POST | Generate text embeddings (requires `OPENAI_API_KEY`) |
| `/api/v1.0/chat/completions` | POST | OpenAI-compatible chat with optional tool call support |
| `/api/v1.0/health` | GET | Health check with provider status |
Deprecated endpoints: The following unversioned routes still work but are deprecated and will be removed in a future release. Migrate to the `/api/v1.0/` equivalents above.

| Legacy Endpoint | Replacement |
|---|---|
| POST /classify | POST /api/v1.0/classify |
| POST /plan | POST /api/v1.0/plan |
| POST /embed | POST /api/v1.0/embed |
| POST /v1/chat/completions | POST /api/v1.0/chat/completions |
| GET /health | GET /api/v1.0/health |
Every endpoint that accepts a `model` field resolves the model using a 3-tier priority:

1. Request body `model` field (highest priority, client override)
2. Per-provider environment variable (e.g. `DEEPSEEK_MODEL`, `OPENAI_MODEL`)
3. `providers.json` `default_model` (fallback)

Omit `model` or pass `"default"` to use the server-configured default. Pass any model name to override per-request.
Send a prompt, get back a JSON classification response.

```bash
curl -X POST http://localhost:8090/api/v1.0/classify \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Classify these items: ...", "model": "deepseek-chat"}'
```

Request body:

- `prompt` (string, required) — The classification prompt
- `model` (string, optional) — Model override
Response:

```json
{
  "classification": {
    "owned": [{"subdomain": "app.example.com", "category": "owned"}],
    "third_party": [],
    "interesting": [],
    "ignore": []
  },
  "ai_call_log": {
    "provider": "deepseek",
    "model": "deepseek-chat",
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "cost_microcents": 12,
    "latency_ms": 500,
    "success": true
  }
}
```

Generate a structured plan from context and a system prompt.
```bash
curl -X POST http://localhost:8090/api/v1.0/plan \
  -H "Content-Type: application/json" \
  -d '{
    "context": {"task": "...", "constraints": []},
    "system_prompt": "You are a planner. Return JSON.",
    "model": "deepseek-chat"
  }'
```

Request body:

- `context` (object, required) — Arbitrary context dict for the planning task
- `system_prompt` (string, required) — System prompt for the LLM
- `model` (string, optional) — Model override
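Each probe in a plan response (see the example below) carries a shell `cmd` and a `timeout_s`. A caller could execute them like this (an illustrative sketch only; running LLM-suggested commands verbatim is risky, so sandbox or review them in practice):

```python
import subprocess

def run_probes(plan: dict) -> list[dict]:
    """Execute each probe command with its own timeout; collect results."""
    results = []
    for probe in plan.get("probes", []):
        try:
            proc = subprocess.run(
                probe["cmd"], shell=True, capture_output=True,
                text=True, timeout=probe.get("timeout_s", 30),
            )
            results.append({"cmd": probe["cmd"], "rc": proc.returncode,
                            "stdout": proc.stdout.strip()})
        except subprocess.TimeoutExpired:
            # Record the timeout instead of aborting the remaining probes
            results.append({"cmd": probe["cmd"], "rc": None, "stdout": ""})
    return results
```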
Response:

```json
{
  "plan": {
    "assessment": "needs_followup",
    "confidence": "medium",
    "rationale": "Discovered open ports need verification",
    "probes": [
      {
        "cmd": "curl -s -o /dev/null -w '%{http_code}' https://192.168.1.1:443",
        "timeout_s": 30,
        "note": "Verify HTTPS service"
      }
    ]
  },
  "ai_call_log": {
    "provider": "deepseek",
    "model": "deepseek-chat",
    "prompt_tokens": 200,
    "completion_tokens": 100,
    "cost_microcents": 24,
    "latency_ms": 800,
    "success": true
  }
}
```

Generate text embeddings using OpenAI's embedding models.
```bash
curl -X POST http://localhost:8090/api/v1.0/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "text to embed"}'
```

Request body:

- `text` (string or array of strings, required) — Text to embed
- `model` (string, optional) — Embedding model (default: `text-embedding-ada-002`)
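Embeddings come back as plain float lists, so downstream similarity needs no extra libraries. For instance, a pure-Python cosine similarity over two vectors from a batch `/embed` response (the `embed` helper mentioned in the comment is hypothetical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# e.g. with a batch call through some client helper:
#   resp = embed(["cats", "dogs"])  # hypothetical wrapper around POST /api/v1.0/embed
#   score = cosine_similarity(resp["embeddings"][0], resp["embeddings"][1])
```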
Response:

```json
{
  "embeddings": [[0.1, 0.2, ...]],
  "model": "text-embedding-ada-002",
  "dimensions": 1536,
  "ai_call_log": {
    "provider": "openai",
    "model": "text-embedding-ada-002",
    "prompt_tokens": 5,
    "completion_tokens": 0,
    "cost_microcents": 1,
    "latency_ms": 150,
    "success": true
  }
}
```

OpenAI-compatible endpoint supporting optional tool calls. Provider-specific translation (e.g. Anthropic tool format) is handled transparently.
```bash
curl -X POST http://localhost:8090/api/v1.0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Request body:

- `model` (string, optional) — Model to use (`"default"` or omit for server default)
- `messages` (array, required) — Chat messages, each with `role` and `content`
- `tools` (array, optional) — Tool definitions for function calling
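When `finish_reason` is `"tool_calls"` (see the response examples below), each `arguments` field is a JSON-encoded string that must be parsed before dispatching the tool. A small helper, assuming the OpenAI-style response shape shown in this section:

```python
import json

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function_name, parsed_arguments) pairs, or [] when there are none."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []  # field may be null for plain replies
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls]
```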
Request with tools:
```json
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "user", "content": "scan 192.168.1.1"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "run_nmap",
        "description": "Run nmap scan",
        "parameters": {
          "type": "object",
          "properties": {
            "target": {"type": "string"}
          }
        }
      }
    }
  ]
}
```

Response:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 20,
    "total_tokens": 70
  },
  "model": "deepseek-chat"
}
```

Response with tool calls:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz789",
            "type": "function",
            "function": {
              "name": "run_nmap",
              "arguments": "{\"target\": \"192.168.1.1\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 80,
    "completion_tokens": 30,
    "total_tokens": 110
  },
  "model": "deepseek-chat"
}
```

Check service health and provider status.
```bash
curl http://localhost:8090/api/v1.0/health
```

Response:
```json
{
  "status": "healthy",
  "providers": [{"name": "deepseek", "model": "deepseek-chat"}],
  "embeddings_available": true
}
```

All endpoints return a 500 with detail when all providers fail:
```json
{
  "detail": "All providers failed:\ndeepseek: timeout\ngemini: api error"
}
```

All configuration is via environment variables. Copy `.env.example` to `.env` and fill in your keys. Provider definitions (pricing, timeouts, features) live in `providers.json`.
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `auto` | Provider: `auto`, `ollama`, `deepseek`, `gemini`, `openai`, `anthropic` |
When LLM_PROVIDER=auto, providers are tried in the priority order defined in providers.json (default: cheapest first). Only providers with configured env vars are used.
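That fallback behaviour can be pictured as a simple loop (an illustrative sketch, not the gateway's actual implementation) that tries each configured provider in priority order and aggregates failures into the error format shown under error handling:

```python
def call_with_fallback(providers: list[str], make_call):
    """Try providers in priority order; raise with all errors if every one fails."""
    errors = []
    for name in providers:
        try:
            return make_call(name)
        except Exception as exc:
            # Record the failure and fall through to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed:\n" + "\n".join(errors))
```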
| Variable | Description |
|---|---|
| `OLLAMA_HOST` | Ollama server URL (e.g., `http://localhost:11434`) |
| `OLLAMA_MODEL` | Ollama model (e.g., `qwen2.5-coder:14b`) |
| `DEEPSEEK_API_KEY` | DeepSeek API key |
| `DEEPSEEK_MODEL` | DeepSeek model (default: `deepseek-chat`) |
| `GEMINI_API_KEY` | Google Gemini API key |
| `GEMINI_MODEL` | Gemini model (default: `gemini-2.0-flash`) |
| `OPENAI_API_KEY` | OpenAI API key (also required for `/embed`) |
| `OPENAI_MODEL` | OpenAI model (default: `gpt-4o-mini`) |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `ANTHROPIC_MODEL` | Anthropic model (default: `claude-3-5-sonnet-20241022`) |
At least one provider must have its required env vars configured (API key, or host for Ollama). Model env vars are optional — defaults come from providers.json.
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8090` | HTTP port |
| `LOG_LEVEL` | `INFO` | Logging level |
Any provider with an OpenAI-compatible API (Groq, Together, Mistral, etc.) can be added with just a JSON entry. Add to providers.json:
```json
{
  "providers": {
    "groq": {
      "kind": "openai_compatible",
      "base_url": "https://api.groq.com/openai/v1",
      "env_key": "GROQ_API_KEY",
      "env_model": "GROQ_MODEL",
      "default_model": "llama-3.3-70b-versatile",
      "timeout": 60,
      "features": { "tool_calls": true, "json_mode": true },
      "pricing": { "input_per_1k_microcents": 0.59, "output_per_1k_microcents": 0.79 }
    }
  }
}
```

Then set `GROQ_API_KEY` in your environment. That's it — no Python changes needed.
For providers with non-OpenAI APIs (like Anthropic or Gemini), create a provider class in providers/ that extends Provider, then register its kind in providers/registry.py's _KIND_MAP.
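As a rough illustration only (the real `Provider` base-class interface lives in `providers/` and may differ; the class, method, and attribute names below are assumptions, not the actual API):

```python
# providers/myprovider.py -- hypothetical custom provider sketch
class MyProvider:  # in the real codebase this would extend Provider
    kind = "myprovider"  # the kind referenced from providers.json

    def __init__(self, api_key: str, model: str, timeout: int = 300):
        self.api_key = api_key
        self.model = model
        self.timeout = timeout

    def chat(self, messages: list[dict]) -> dict:
        """Translate the OpenAI-style request to the vendor API and back."""
        raise NotImplementedError("call the vendor SDK/HTTP API here")

# then register the kind, e.g. in providers/registry.py:
#   _KIND_MAP["myprovider"] = MyProvider
```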
| Field | Required | Description |
|---|---|---|
| `kind` | Yes | Provider class: `openai_compatible`, `anthropic`, `gemini`, `ollama` |
| `env_key` | Yes* | Env var for API key (*or `env_host` for Ollama) |
| `env_model` | No | Env var to override default model |
| `default_model` | Yes | Fallback model if env var is unset |
| `base_url` | No | API base URL (omit for default OpenAI endpoint) |
| `timeout` | No | Request timeout in seconds (default: 300) |
| `api_params` | No | Extra API params: `max_tokens`, `temperature`, etc. |
| `features` | No | `tool_calls`, `json_mode`, `reasoning_content` |
| `pricing` | No | `input_per_1k_microcents`, `output_per_1k_microcents` |
```bash
# Run all tests
pytest -v

# Run with coverage
pytest --cov=. --cov-report=term-missing

# Run specific test file
pytest tests/test_providers.py -v
```

Pre-built images are published to GitHub Container Registry on every release.
```bash
# Pull and run
docker run -p 8090:8090 \
  -e LLM_PROVIDER=auto \
  -e DEEPSEEK_API_KEY=key \
  ghcr.io/nullrabbitlabs/llm-gateway:latest
```

Pin to a specific version in production:

```bash
docker pull ghcr.io/nullrabbitlabs/llm-gateway:1.0.0
```

To build locally instead:

```bash
docker build -t llm-gateway .
docker run -p 8090:8090 \
  -e LLM_PROVIDER=auto \
  -e DEEPSEEK_API_KEY=key \
  llm-gateway
```

```
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│  Your Svc A │   │  Your Svc B │   │  Your Svc C │
│             │   │             │   │             │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │ HTTP            │ HTTP            │ HTTP
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────────┐
│                 llm-gateway (Python)                 │
│  ┌────────────────────────────────────────────────┐  │
│  │            providers.json (registry)           │  │
│  │  ┌──────────────────────────────────────────┐  │  │
│  │  │ OpenAI-compatible: DeepSeek, OpenAI, ... │  │  │
│  │  │ Custom: Anthropic, Gemini, Ollama        │  │  │
│  │  └──────────────────────────────────────────┘  │  │
│  │ Features: Auto-fallback, Cost tracking, Retries│  │
│  │ Endpoints: /api/v1.0/classify, /plan, /embed   │  │
│  │            /chat/completions, /health          │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
```
See CONTRIBUTING.md.
LLM Gateway is the provider abstraction layer used by NullRabbit's AI agents for autonomous threat analysis across validator infrastructure and decentralised networks.
It is open-sourced as a standalone tool because multi-provider routing with cost tracking and automatic fallback is useful beyond security — if you're building AI agents or pipelines that need resilient LLM access, this does the job.
MIT — see LICENSE.