token-budget-proxy

Never get a surprise API bill again.

A local HTTP proxy that sits between your code and any OpenAI-compatible API. It counts tokens in every request and blocks calls that would exceed your configured budget.



The problem

You're building a feature that calls an LLM. You run a test loop. You forget to add a break. You wake up to a $40 bill for 2 million tokens. Or worse — a runaway agent keeps retrying a broken prompt and burns through your monthly budget in an hour.

token-budget-proxy is a one-command local proxy that enforces hard token limits on every API call. No SDK changes, no code changes — just point your OPENAI_BASE_URL at the proxy.


How it works

Your code  →  http://localhost:8080  →  token-budget-proxy  →  api.openai.com
                                              |
                                    counts tokens in request
                                    checks against budget
                                    blocks if over limit (HTTP 429)
                                    or forwards if within budget

The proxy:

  1. Intercepts every POST to /v1/chat/completions or /v1/completions
  2. Estimates the token count from the request body (prompt + max_tokens)
  3. Checks against your configured limits (per-request, per-minute, session total)
  4. Either forwards the request to the real API or returns a 429 with a clear error message
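The flow above can be sketched in a few lines. The helper names here are illustrative, not the actual tokenproxy internals; the estimate mirrors the documented heuristic (prompt characters / 4, plus `max_tokens`):

```python
# Sketch of the proxy's gating logic (illustrative names, not the
# real server code).
import json

def estimate_request_tokens(body: bytes) -> int:
    """Approximate tokens as prompt characters / 4 plus max_tokens."""
    payload = json.loads(body)
    if "messages" in payload:  # /v1/chat/completions
        text = "".join(m.get("content", "") for m in payload["messages"])
    else:                      # /v1/completions
        text = payload.get("prompt", "")
    prompt_tokens = len(text) // 4
    return prompt_tokens + payload.get("max_tokens", 0)

def should_block(body: bytes, limit: int) -> bool:
    return estimate_request_tokens(body) > limit
```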

Install

pip install token-budget-proxy

No external dependencies. Works with Python 3.8+.


Quick start

1. Start the proxy

# Basic: block any single request over 2000 tokens
tokenproxy start --max-tokens-per-request 2000 --api-key sk-...

# Full budget control
tokenproxy start \
  --max-tokens-per-request 4096 \
  --max-tokens-per-minute  20000 \
  --max-tokens-total       100000 \
  --api-key sk-...

2. Point your client at the proxy

# Environment variable (works with any OpenAI SDK)
export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_API_KEY=sk-...  # still needed for the proxy to forward

python your_script.py

Or in code:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="sk-...",
)

3. See it in action

  [INFO] Token-budget proxy listening on http://127.0.0.1:8080
  [INFO] Upstream: https://api.openai.com/v1
  [INFO] Per-request limit: 4096 tokens

  [INFO] POST /v1/chat/completions — prompt=312t max_out=512t
  [INFO] POST /v1/chat/completions — prompt=891t max_out=1024t
  [BLOCK] /v1/chat/completions — Request would use 5200 tokens, exceeding per-request limit of 4096.

CLI reference

tokenproxy start

--host HOST                     Bind address (default: 127.0.0.1)
--port PORT                     Listen port (default: 8080)
--upstream URL                  Real API base URL (default: https://api.openai.com/v1)
--api-key KEY                   API key (or set OPENAI_API_KEY)
--max-tokens-per-request N      Hard limit per single request (default: 4096)
--max-tokens-per-minute N       Rolling 60-second window budget (default: unlimited)
--max-tokens-total N            Session lifetime budget (default: unlimited)
--warn-only                     Log violations but forward requests anyway
--quiet                         Suppress per-request logging

Budget modes

| Mode | Flag | Behaviour |
| --- | --- | --- |
| Per-request | `--max-tokens-per-request` | Blocks any single call over the limit |
| Per-minute | `--max-tokens-per-minute` | Rolling 60-second window (rate limiting) |
| Session total | `--max-tokens-total` | Hard cap for the entire proxy session |
| Warn-only | `--warn-only` | Logs violations but never blocks |

All three limits can be combined; a request is blocked if it would exceed any of them.
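A combined check where every configured limit must pass can be sketched like this. This is an illustration of the semantics, not the real `BudgetTracker` (which is also thread-safe):

```python
# Illustrative combined-limit check: per-request, rolling per-minute,
# and session-total budgets, any of which can block a request.
import time
from collections import deque

class BudgetCheck:
    def __init__(self, per_request=None, per_minute=None, total=None):
        self.per_request = per_request
        self.per_minute = per_minute
        self.total = total
        self.spent = 0
        self.window = deque()  # (timestamp, tokens) pairs

    def allow(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        # Drop entries older than the 60-second window.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        minute_spent = sum(t for _, t in self.window)
        if self.per_request is not None and tokens > self.per_request:
            return False
        if self.per_minute is not None and minute_spent + tokens > self.per_minute:
            return False
        if self.total is not None and self.spent + tokens > self.total:
            return False
        # Only successful (forwarded) requests count against budgets.
        self.window.append((now, tokens))
        self.spent += tokens
        return True
```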


Compatible APIs

Any API that follows the OpenAI REST format works:

| Provider | Base URL |
| --- | --- |
| OpenAI | https://api.openai.com/v1 |
| Groq | https://api.groq.com/openai/v1 |
| Together AI | https://api.together.xyz/v1 |
| Ollama | http://localhost:11434/v1 |
| LM Studio | http://localhost:1234/v1 |
| Any OpenAI-compatible | your URL |
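For example, to budget a local Ollama server instead of OpenAI, combine `--upstream` with the limits from the CLI reference (assuming Ollama is running on its default port):

```shell
tokenproxy start \
  --upstream http://localhost:11434/v1 \
  --max-tokens-per-request 2000 \
  --api-key ollama   # local servers typically ignore the key
```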

Use as a library

from tokenproxy import BudgetConfig, start_proxy

config = BudgetConfig(
    max_tokens_per_request=2000,
    max_tokens_per_minute=10000,
    upstream_url="https://api.openai.com/v1",
    upstream_api_key="sk-...",
)

# Blocking (runs until Ctrl+C)
start_proxy(config, port=8080)

# Non-blocking (background thread)
server = start_proxy(config, port=8080, block=False)
print(server.stats)
server.stop()

Token counting

The proxy uses a character-based approximation (1 token ≈ 4 characters) which requires no external libraries. This is accurate enough for budget enforcement — it may be off by 10–15% compared to the exact tokenizer.
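That heuristic is essentially a one-liner (a sketch, not the exact `tokenproxy` function):

```python
def approx_tokens(text: str) -> int:
    # Heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

approx_tokens("The quick brown fox jumps over the lazy dog.")  # 44 chars -> 11
```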

To use exact counting, swap count_tokens_approx in tokenproxy/tokenizer.py with a call to tiktoken (which adds it as a dependency):

import tiktoken

# cl100k_base matches GPT-3.5/GPT-4-era models; hoisting the encoding
# avoids the lookup on every call.
_ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens_approx(text: str) -> int:
    return len(_ENC.encode(text))

Project structure

token-budget-proxy/
├── tokenproxy/
│   ├── __init__.py          # Public API exports
│   ├── tokenizer.py         # Token counting (approx + request body parsing)
│   ├── budget.py            # BudgetConfig, BudgetTracker (thread-safe)
│   ├── server.py            # ProxyServer, HTTP handler, request forwarding
│   ├── cli.py               # CLI entry point
│   └── middleware/
│       └── __init__.py      # Placeholder for future middleware
├── tests/
│   ├── test_tokenizer.py
│   └── test_budget.py
├── docs/
│   └── index.html
└── pyproject.toml

Contributing

See CONTRIBUTING.md. Good first issues are labelled in the issue tracker.


License

MIT © 2026 Jishanahmed AR Shaikh
