Never get a surprise API bill again.
A local HTTP proxy that sits between your code and any OpenAI-compatible API. It counts tokens in every request and blocks calls that would exceed your configured budget.
You're building a feature that calls an LLM. You run a test loop. You forget to add a break. You wake up to a $40 bill for 2 million tokens. Or worse — a runaway agent keeps retrying a broken prompt and burns through your monthly budget in an hour.
token-budget-proxy is a one-command local proxy that enforces hard token limits on every API call. No SDK changes, no code changes — just point your OPENAI_BASE_URL at the proxy.
```
Your code → http://localhost:8080 → token-budget-proxy → api.openai.com
                                          |
                                          ├─ counts tokens in request
                                          ├─ checks against budget
                                          ├─ blocks if over limit (HTTP 429)
                                          └─ or forwards if within budget
```
The proxy:
- Intercepts every POST to `/v1/chat/completions` or `/v1/completions`
- Estimates the token count from the request body (prompt + `max_tokens`)
- Checks against your configured limits (per-request, per-minute, session total)
- Either forwards the request to the real API or returns a `429` with a clear error message
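The estimation step can be sketched as follows. This is a minimal illustration of the idea; `estimate_request_tokens` is a hypothetical helper name, not the library's actual internals:

```python
import json
import math

def estimate_request_tokens(body: bytes) -> int:
    """Approximate the token cost of a /v1/chat/completions request body."""
    req = json.loads(body)
    # The proxy's approximation: 1 token ≈ 4 characters of prompt text
    prompt_chars = sum(len(m.get("content", "")) for m in req.get("messages", []))
    prompt_tokens = math.ceil(prompt_chars / 4)
    # Reserve the full completion budget the caller asked for
    return prompt_tokens + req.get("max_tokens", 0)

body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "x" * 400}],
    "max_tokens": 100,
}).encode()
print(estimate_request_tokens(body))  # 100 prompt tokens + 100 reserved = 200
```

Counting `max_tokens` up front is deliberately pessimistic: a request is blocked if its worst-case total would bust the budget, even if the actual completion comes back shorter.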
```shell
pip install token-budget-proxy
```

No external dependencies. Works with Python 3.8+.
```shell
# Basic: block any single request over 2000 tokens
tokenproxy start --max-tokens-per-request 2000 --api-key sk-...

# Full budget control
tokenproxy start \
  --max-tokens-per-request 4096 \
  --max-tokens-per-minute 20000 \
  --max-tokens-total 100000 \
  --api-key sk-...
```

```shell
# Environment variable (works with any OpenAI SDK)
export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_API_KEY=sk-...   # still needed for the proxy to forward
python your_script.py
```

Or in code:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="sk-...",
)
```

```
[INFO] Token-budget proxy listening on http://127.0.0.1:8080
[INFO] Upstream: https://api.openai.com/v1
[INFO] Per-request limit: 4096 tokens
[INFO] POST /v1/chat/completions — prompt=312t max_out=512t
[INFO] POST /v1/chat/completions — prompt=891t max_out=1024t
[BLOCK] /v1/chat/completions — Request would use 5200 tokens, exceeding per-request limit of 4096.
```
```
--host HOST                   Bind address (default: 127.0.0.1)
--port PORT                   Listen port (default: 8080)
--upstream URL                Real API base URL (default: https://api.openai.com/v1)
--api-key KEY                 API key (or set OPENAI_API_KEY)
--max-tokens-per-request N    Hard limit per single request (default: 4096)
--max-tokens-per-minute N     Rolling 60-second window budget (default: unlimited)
--max-tokens-total N          Session lifetime budget (default: unlimited)
--warn-only                   Log violations but forward requests anyway
--quiet                       Suppress per-request logging
```
| Mode | Flag | Behaviour |
|---|---|---|
| Per-request | `--max-tokens-per-request` | Blocks any single call over the limit |
| Per-minute | `--max-tokens-per-minute` | Rolling 60-second window (rate limiting) |
| Session total | `--max-tokens-total` | Hard cap for the entire proxy session |
| Warn-only | `--warn-only` | Logs violations but never blocks |
All three limits can be combined. The strictest matching limit wins.
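A rolling 60-second window can be tracked with a timestamped queue of spends. This is a sketch of the technique, not the library's actual implementation; `RollingWindowBudget` and `try_spend` are hypothetical names:

```python
import threading
import time
from collections import deque
from typing import Optional

class RollingWindowBudget:
    """Tracks tokens spent in the last `window_s` seconds; thread-safe."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._events = deque()  # (timestamp, tokens) pairs, oldest first
        self._lock = threading.Lock()

    def try_spend(self, tokens: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._lock:
            # Evict events that have aged out of the window
            while self._events and now - self._events[0][0] >= self.window_s:
                self._events.popleft()
            used = sum(t for _, t in self._events)
            if used + tokens > self.limit:
                return False  # caller should answer HTTP 429
            self._events.append((now, tokens))
            return True

budget = RollingWindowBudget(limit=1000)
print(budget.try_spend(800, now=0.0))   # True: window empty
print(budget.try_spend(300, now=1.0))   # False: 800 + 300 > 1000
print(budget.try_spend(300, now=61.0))  # True: the first spend aged out
```

Combining limits is then just an AND over the individual checks, which is why the strictest one wins.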
Any API that follows the OpenAI REST format works:
| Provider | Base URL |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Groq | https://api.groq.com/openai/v1 |
| Together AI | https://api.together.xyz/v1 |
| Ollama | http://localhost:11434/v1 |
| LM Studio | http://localhost:1234/v1 |
| Any OpenAI-compatible | your URL |
```python
from tokenproxy import BudgetConfig, start_proxy

config = BudgetConfig(
    max_tokens_per_request=2000,
    max_tokens_per_minute=10000,
    upstream_url="https://api.openai.com/v1",
    upstream_api_key="sk-...",
)

# Blocking (runs until Ctrl+C)
start_proxy(config, port=8080)

# Non-blocking (background thread)
server = start_proxy(config, port=8080, block=False)
print(server.stats)
server.stop()
```

The proxy uses a character-based approximation (1 token ≈ 4 characters) that requires no external libraries. This is accurate enough for budget enforcement, though it may be off by 10–15% compared to an exact tokenizer.
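The approximation amounts to a ceiling division by four. The function name below matches the one in `tokenproxy/tokenizer.py`, but this body is an illustrative assumption:

```python
import math

def count_tokens_approx(text: str) -> int:
    """Character-based estimate: 1 token ≈ 4 characters, rounded up."""
    return math.ceil(len(text) / 4)

print(count_tokens_approx("Hello, world!"))  # 13 characters → 4 tokens
```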
To use exact counting, swap `count_tokens_approx` in `tokenproxy/tokenizer.py` for a call to tiktoken:

```python
import tiktoken

def count_tokens_approx(text: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))
```

```
token-budget-proxy/
├── tokenproxy/
│   ├── __init__.py       # Public API exports
│   ├── tokenizer.py      # Token counting (approx + request body parsing)
│   ├── budget.py         # BudgetConfig, BudgetTracker (thread-safe)
│   ├── server.py         # ProxyServer, HTTP handler, request forwarding
│   ├── cli.py            # CLI entry point
│   └── middleware/
│       └── __init__.py   # Placeholder for future middleware
├── tests/
│   ├── test_tokenizer.py
│   └── test_budget.py
├── docs/
│   └── index.html
└── pyproject.toml
```
See CONTRIBUTING.md. Good first issues are labelled in the issue tracker.
MIT © 2026 Jishanahmed AR Shaikh