Know what your agents cost. Meter. Budget. Control.
AgentLedger is a reverse proxy that sits between your AI agents and LLM providers, tracking every token, calculating costs, and enforcing budgets — all without changing a single line of your application code.
export OPENAI_BASE_URL=http://localhost:8787/v1
# That's it. Your agents now have cost tracking and budget enforcement.
AI agents make dozens of LLM calls per task. Costs compound fast, loops happen silently, and provider dashboards only show you the damage after the fact.
AgentLedger gives you:
- Real-time cost tracking — every request metered, every token counted
- Budget enforcement — daily and monthly limits with automatic blocking
- Pre-flight estimation — rejects requests that would exceed your budget before they hit the API
- Agent session tracking — group multi-call agent runs into sessions, detect loops and ghost agents
- MCP tool metering — track costs of MCP tool calls alongside LLM usage
- Dashboard — embedded web UI for real-time cost visibility
- Observability — OpenTelemetry metrics with Prometheus endpoint
- Circuit breaker — automatic upstream failure protection
- Multi-provider — 15 providers: OpenAI, Anthropic, Azure OpenAI, Gemini, Groq, Mistral, DeepSeek, Cohere, xAI, Perplexity, Together AI, Fireworks AI, OpenRouter, Cerebras, SambaNova
- Multi-tenancy — isolate costs by team/org with tenant-scoped budgets
- Alerting — Slack and webhook notifications for budget warnings and anomalies
- Rate limiting — per-key request throttling with sliding window counters
- Admin API — runtime budget rule management without restarts
- Zero code changes — works with any OpenAI/Anthropic SDK via base URL override
Homebrew:
brew install wdz-dev/tap/agentledger
Binary download: grab the latest release from GitHub Releases.
From source:
go install github.com/WDZ-Dev/agent-ledger/cmd/agentledger@latest
Docker:
docker run --rm -p 8787:8787 ghcr.io/wdz-dev/agent-ledger:latest
Helm (Kubernetes):
helm install agentledger deploy/helm/agentledger
# Start the proxy with defaults (listens on :8787)
agentledger serve
# Or with a config file
agentledger serve -c configs/agentledger.example.yaml
# Python (OpenAI SDK)
export OPENAI_BASE_URL=http://localhost:8787/v1
# Node.js
const openai = new OpenAI({ baseURL: 'http://localhost:8787/v1' });
# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8787
# Other providers — route via path prefix
# curl http://localhost:8787/groq/v1/chat/completions
# curl http://localhost:8787/mistral/v1/chat/completions
# curl http://localhost:8787/deepseek/v1/chat/completions
# curl http://localhost:8787/gemini/v1beta/models/gemini-2.5-pro:generateContent
# curl http://localhost:8787/cohere/v2/chat
# curl http://localhost:8787/xai/v1/chat/completions
# curl http://localhost:8787/together/v1/chat/completions
# curl http://localhost:8787/fireworks/v1/chat/completions
# curl http://localhost:8787/perplexity/v1/chat/completions
# curl http://localhost:8787/openrouter/v1/chat/completions
# curl http://localhost:8787/cerebras/v1/chat/completions
# curl http://localhost:8787/sambanova/v1/chat/completions
# Last 24 hours, grouped by model
agentledger costs
# Last 7 days, grouped by API key
agentledger costs --last 7d --by key
PROVIDER   MODEL            REQUESTS  INPUT TOKENS  OUTPUT TOKENS  COST (USD)
--------   -----            --------  ------------  -------------  ----------
openai     gpt-4.1-mini          142         28400          14200     $0.0341
openai     gpt-4.1                38         19000           9500     $0.1140
anthropic  claude-sonnet-4        12          6000           3000     $0.0630
--------   -----            --------  ------------  -------------  ----------
TOTAL                            192         53400          26700     $0.2111
cd deploy && docker compose up
┌─────────────┐       ┌──────────────────────┐       ┌──────────────┐
│   Agents    │──────▶│   AgentLedger :8787  │──────▶│  OpenAI      │
│  (any SDK)  │       │                      │       │  Anthropic   │
└─────────────┘       │  ┌────────────────┐  │       │  Azure OpenAI│
                      │  │ Rate Limiting  │  │       │  Gemini      │
┌─────────────┐       │  │ Budget Check   │  │       │  Groq        │
│ MCP Servers │◀─────▶│  │ Token Metering │  │       │  Mistral     │
│(stdio/HTTP) │       │  │ Agent Sessions │  │       │  DeepSeek    │
└─────────────┘       │  │ Cost Calc      │  │       │  + 8 more    │
                      │  │ Async Record   │  │       └──────────────┘
                      │  └────────────────┘  │       ┌──────────────┐
                      │          │           │──────▶│  Slack       │
                      │  ┌───────▼────────┐  │       │  Webhooks    │
                      │  │ SQLite/Postgres│  │       └──────────────┘
                      │  └────────────────┘  │
                      │          │           │
                      │  ┌───────▼────────┐  │
                      │  │ Dashboard :8787│  │
                      │  │ Admin API      │  │
                      │  │ Prometheus     │  │
                      │  └────────────────┘  │
                      └──────────────────────┘
Request flow:
- Agent sends request to AgentLedger
- Budget check: reject immediately if over limit
- Pre-flight estimation: reject if the worst-case max_tokens cost exceeds the remaining budget
- Forward to upstream provider (API key passes through untouched)
- Parse response for token usage
- Calculate cost from model pricing table
- Record asynchronously (never blocks the response)
- Return response to agent with optional budget warning headers
Every request is metered with provider-reported token counts. When streaming responses don't include usage data, AgentLedger falls back to tiktoken estimation (flagged as estimated: true).
Supported providers and models:
| Provider | Routing | Models |
|---|---|---|
| OpenAI | /v1/ (default) | gpt-5 family, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o3-pro, o3-mini, o4-mini, o1, o1-pro, o1-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo |
| Anthropic | /v1/messages | claude-opus-4.6, claude-sonnet-4.6, claude-opus-4.5, claude-sonnet-4.5, claude-haiku-4.5, claude-opus-4, claude-sonnet-4, claude-3.7-sonnet, claude-3.5-sonnet, claude-3.5-haiku, claude-3-opus, claude-3-haiku |
| Azure OpenAI | /azure/ | Same as OpenAI (custom deployment names) |
| Gemini | /gemini/ | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Groq | /groq/v1/ | llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, gemma2-9b-it |
| Mistral | /mistral/v1/ | mistral-large-latest, mistral-small-latest, codestral-latest, open-mistral-nemo |
| DeepSeek | /deepseek/v1/ | deepseek-chat, deepseek-reasoner |
| Cohere | /cohere/ | command-r-plus, command-r, command-light |
| xAI | /xai/v1/ | grok-3, grok-3-mini, grok-2 |
| Perplexity | /perplexity/v1/ | sonar-pro, sonar, sonar-reasoning |
| Together AI | /together/v1/ | Llama 3.3 70B, Llama 3.1 405B/8B, Qwen 2.5 72B, DeepSeek V3 |
| Fireworks AI | /fireworks/v1/ | Llama 3.3 70B, Llama 3.1 8B, Qwen 2.5 72B |
| OpenRouter | /openrouter/v1/ | Any model via OpenRouter routing |
| Cerebras | /cerebras/v1/ | llama-3.3-70b, llama-3.1-8b |
| SambaNova | /sambanova/v1/ | Llama 3.3 70B, Llama 3.1 8B |
83+ models with built-in pricing. Groq, Mistral, DeepSeek, xAI, Perplexity, Together, Fireworks, OpenRouter, Cerebras, and SambaNova use the OpenAI-compatible API format. Gemini and Cohere have custom parsers. Versioned model names (e.g., gpt-4o-2024-11-20) are matched via longest prefix.
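Longest-prefix matching can be illustrated with a short Python sketch. The model list here is a small hypothetical subset and `match_model` is an illustrative name, not AgentLedger's actual function; the point is only that a dated name resolves to the most specific known model.

```python
# Hypothetical subset of known model names; the real pricing table is larger.
KNOWN_MODELS = ["gpt-4o", "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini"]


def match_model(name, known=KNOWN_MODELS):
    """Return the longest known model name that is a prefix of `name`, or None."""
    candidates = [model for model in known if name.startswith(model)]
    return max(candidates, key=len) if candidates else None


if __name__ == "__main__":
    for requested in ("gpt-4o-2024-11-20", "gpt-4o-mini-2024-07-18", "unknown-model"):
        print(requested, "->", match_model(requested))
```

Picking the longest match matters because gpt-4o is itself a prefix of gpt-4o-mini; a first-match lookup could price a mini request at the larger model's rate.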
Set daily and monthly spend limits. When exceeded, requests are rejected with a 429 status and a clear JSON error:
{
  "error": {
    "type": "budget_exceeded",
    "message": "spending limit exceeded",
    "daily_spent": 12.50,
    "daily_limit": 10.00,
    "monthly_spent": 45.00,
    "monthly_limit": 500.00
  }
}
Soft limits add an X-AgentLedger-Budget-Warning response header when you're approaching the threshold, without blocking.
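On the client side, an agent may want to tell a budget block apart from other 429 responses (rate limiting also returns 429) and notice the soft-limit header. The following Python sketch shows one way to do that; `classify_response` and its return labels are hypothetical, and the format of the warning header's value is not assumed, only its presence.

```python
import json


def classify_response(status, headers, body):
    """Return 'budget_exceeded', 'other_429', 'budget_warning', or 'ok'."""
    if status == 429:
        try:
            error = json.loads(body).get("error", {})
        except (ValueError, TypeError, AttributeError):
            error = {}  # Empty, non-JSON, or unexpectedly shaped body.
        if isinstance(error, dict) and error.get("type") == "budget_exceeded":
            return "budget_exceeded"
        return "other_429"
    # Header names are case-insensitive, so compare in lowercase.
    if "x-agentledger-budget-warning" in {name.lower() for name in headers}:
        return "budget_warning"
    return "ok"


if __name__ == "__main__":
    body = '{"error": {"type": "budget_exceeded", "message": "spending limit exceeded"}}'
    print(classify_response(429, {}, body))
    print(classify_response(200, {"X-AgentLedger-Budget-Warning": "1"}, "{}"))
```

A budget block is usually not worth retrying until the daily or monthly window resets, whereas a rate-limit 429 can be retried after a short wait.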
Pre-flight estimation calculates worst-case cost from max_tokens before the request reaches the API. If it would exceed the remaining budget, it's rejected immediately — no wasted spend.
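As a rough illustration of the worst-case arithmetic, the Python sketch below prices a request as if the model used every one of its max_tokens output tokens. The function names are hypothetical, and the per-million-token rates are assumptions chosen to agree with the sample cost report earlier in this README, not values read from AgentLedger's pricing table.

```python
# Illustrative USD rates per 1M tokens as (input, output). Assumed values,
# picked so they reproduce the sample cost report shown earlier.
RATES = {"gpt-4.1": (2.00, 8.00), "gpt-4.1-mini": (0.40, 1.60)}


def worst_case_cost(model, input_tokens, max_tokens):
    """Cost in USD if the response uses the full max_tokens allowance."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + max_tokens * rate_out) / 1_000_000


def preflight_allows(model, input_tokens, max_tokens, spent, limit):
    """True if the worst-case cost still fits within the remaining budget."""
    return spent + worst_case_cost(model, input_tokens, max_tokens) <= limit


if __name__ == "__main__":
    cost = worst_case_cost("gpt-4.1", input_tokens=1000, max_tokens=4000)
    print(f"worst case: ${cost:.4f}")
    print(preflight_allows("gpt-4.1", 1000, 4000, spent=9.98, limit=10.00))
    print(preflight_allows("gpt-4.1", 1000, 4000, spent=9.90, limit=10.00))
```

Because the estimate assumes the full output allowance, a large max_tokens can cause a rejection even when the actual response would have been short; lowering max_tokens is the simplest way to stay under a tight limit.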
Per-key rules let you set different limits for different API keys using glob patterns:
budgets:
  default:
    daily_limit_usd: 50.0
    monthly_limit_usd: 500.0
    soft_limit_pct: 0.8
    action: "block"
  rules:
    - api_key_pattern: "sk-proj-dev-*"
      daily_limit_usd: 5.0
      action: "block"
AgentLedger groups multi-call agent runs into sessions, enabling per-execution cost attribution. Tag requests with agent metadata using headers (stripped before forwarding to the provider):
X-Agent-Id: code-reviewer
X-Agent-Session: sess_abc123
X-Agent-User: user@example.com
X-Agent-Task: "Review PR #456"
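A small helper can keep these headers consistent across every call in one agent run. This is a minimal Python sketch: the helper names and the sess_ ID format are hypothetical, and treating the user and task headers as optional is an assumption.

```python
import uuid


def new_session_id():
    """Generate a session ID; the 'sess_' prefix is only a convention here."""
    return "sess_" + uuid.uuid4().hex[:12]


def agent_headers(agent_id, session_id, user=None, task=None):
    """Build the agent metadata headers for one request."""
    headers = {"X-Agent-Id": agent_id, "X-Agent-Session": session_id}
    if user:
        headers["X-Agent-User"] = user
    if task:
        headers["X-Agent-Task"] = task
    return headers


if __name__ == "__main__":
    session = new_session_id()  # Reuse the same ID for every call in the run.
    print(agent_headers("code-reviewer", session, task="Review PR #456"))
```

The resulting dictionary can be passed as extra request headers to whichever HTTP client or SDK the agent already uses.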
Loop detection identifies runaway agents making repetitive calls:
agent:
  loop_threshold: 20      # same path N times in window = loop
  loop_window_mins: 5
  loop_action: "warn"     # "warn" or "block"
Ghost detection finds forgotten agents still burning tokens:
agent:
  ghost_max_age_mins: 60
  ghost_min_calls: 50
  ghost_min_cost_usd: 1.0
Meter MCP (Model Context Protocol) tool calls alongside LLM costs. Two modes:
HTTP proxy (forwards JSON-RPC to an upstream MCP server):
mcp:
  enabled: true
  upstream: "http://localhost:3000"
  pricing:
    - server: "filesystem"
      tool: "read_file"
      cost_per_call: 0.01
Stdio wrapper (wraps any MCP server process):
agentledger mcp-wrap -- npx @modelcontextprotocol/server-filesystem /tmp
Embedded web UI at the root URL (http://localhost:8787/) with real-time cost breakdowns, session views, and spending trends.
OpenTelemetry metrics exported via Prometheus at /metrics:
- Request latency, token counts, cost totals
- Session lifecycle, loop/ghost alerts
- MCP tool call counts and costs
Protects against upstream failures. After a configurable number of consecutive 5xx responses, the circuit opens and rejects requests immediately. Auto-recovers after a timeout.
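The open/recover cycle described here can be sketched as a small state machine. This Python version is illustrative only: the class and method names are hypothetical, and the recovery behavior (letting requests through again once the timeout has elapsed, then closing on the first non-5xx response) is an assumption about how auto-recovery might work.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive 5xx responses, retries after a timeout."""

    def __init__(self, max_failures=5, timeout_secs=30, clock=time.monotonic):
        self.max_failures = max_failures
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed.

    def allow(self):
        """Return True if a request may be sent upstream right now."""
        if self.opened_at is None:
            return True
        # Once the timeout has passed, let trial requests through.
        return self.clock() - self.opened_at >= self.timeout_secs

    def record(self, status):
        """Update state from an upstream HTTP status code."""
        if status >= 500:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # Open, or re-open after a failed trial.
        else:
            self.failures = 0
            self.opened_at = None


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, timeout_secs=30)
    breaker.record(503)
    breaker.record(503)
    print("allowed after two failures:", breaker.allow())
```

Injecting the clock keeps the sketch testable without sleeping; a real transport wrapper would call allow() before forwarding and record() after each upstream response.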
circuit_breaker:
  max_failures: 5
  timeout_secs: 30
Isolate costs, budgets, and dashboards by team or organization. Enable tenancy and map API keys to tenants:
tenants:
  enabled: true
  key_mappings:
    - api_key_pattern: "sk-proj-team-alpha-*"
      tenant_id: "alpha"
    - api_key_pattern: "sk-proj-team-beta-*"
      tenant_id: "beta"
Or set the tenant per-request via header: X-AgentLedger-Tenant: alpha.
All dashboard and cost endpoints accept an optional ?tenant= filter.
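Tenant resolution from these two sources might look like the Python sketch below. The function name is hypothetical, and the precedence (header first, then the first matching key pattern) is an assumption; the README does not state which source wins when both are present.

```python
from fnmatch import fnmatchcase

# Mirrors the key_mappings example above as (pattern, tenant_id) pairs.
KEY_MAPPINGS = [
    ("sk-proj-team-alpha-*", "alpha"),
    ("sk-proj-team-beta-*", "beta"),
]


def resolve_tenant(api_key, headers, mappings=KEY_MAPPINGS):
    """Return a tenant ID from the header or the key patterns, else None."""
    # Assumed precedence: an explicit header overrides the key mapping.
    for name, value in headers.items():
        if name.lower() == "x-agentledger-tenant" and value:
            return value
    for pattern, tenant_id in mappings:
        if fnmatchcase(api_key, pattern):
            return tenant_id
    return None


if __name__ == "__main__":
    print(resolve_tenant("sk-proj-team-alpha-123", {}))
    print(resolve_tenant("sk-proj-team-beta-456", {"X-AgentLedger-Tenant": "alpha"}))
    print(resolve_tenant("sk-other-789", {}))
```

The same glob matching applies to the api_key_pattern fields used by budget and rate-limit rules.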
Get notified when budgets are approaching limits or agents are misbehaving:
alerts:
  slack:
    webhook_url: "https://hooks.slack.com/services/..."
  webhooks:
    - url: "https://api.example.com/alerts"
      headers:
        Authorization: "Bearer token"
  cooldown_mins: 5   # deduplication window per alert
Alert types: budget_warning, budget_exceeded, loop_detected, ghost_detected.
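The cooldown can be thought of as a per-alert timestamp check. In this Python sketch the class name is hypothetical, and keying deduplication on the pair of alert type and subject (for example an API key hash or session ID) is an assumption.

```python
import time


class AlertDeduplicator:
    """Suppress repeats of the same alert within a cooldown window."""

    def __init__(self, cooldown_mins=5, clock=time.monotonic):
        self.cooldown_secs = cooldown_mins * 60
        self.clock = clock
        self.last_sent = {}  # (alert_type, subject) -> time of last delivery

    def should_send(self, alert_type, subject):
        """Return True and record the send if the alert is outside its cooldown."""
        key = (alert_type, subject)
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown_secs:
            return False
        self.last_sent[key] = now
        return True


if __name__ == "__main__":
    dedup = AlertDeduplicator(cooldown_mins=5)
    print(dedup.should_send("budget_warning", "key-1"))  # First alert goes out.
    print(dedup.should_send("budget_warning", "key-1"))  # Repeat is suppressed.
```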
Throttle request volume per API key with sliding window counters:
rate_limits:
  default:
    requests_per_minute: 60
    requests_per_hour: 1000
  rules:
    - api_key_pattern: "sk-proj-dev-*"
      requests_per_minute: 10
Returns 429 Too Many Requests with a Retry-After header when exceeded.
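One common way to build a sliding window counter is to keep counts for the current and previous fixed windows and weight the previous one by how much of it still overlaps the sliding window. The Python sketch below shows that approximation; it is not necessarily the variant AgentLedger implements, the class name is hypothetical, and timestamps are assumed to be non-decreasing.

```python
class SlidingWindowCounter:
    """Approximate sliding-window limiter using two fixed-window counters."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.index = None   # Index of the current fixed window.
        self.current = 0    # Requests counted in the current window.
        self.previous = 0   # Requests counted in the window before it.

    def allow(self, now):
        """Return True and count the request if it fits under the limit."""
        index = int(now // self.window_secs)
        if self.index is None or index > self.index + 1:
            self.previous, self.current = 0, 0  # Idle for a full window or more.
        elif index == self.index + 1:
            self.previous, self.current = self.current, 0  # Roll to next window.
        self.index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_secs) / self.window_secs
        estimate = self.previous * (1.0 - elapsed) + self.current
        if estimate >= self.limit:
            return False
        self.current += 1
        return True


if __name__ == "__main__":
    limiter = SlidingWindowCounter(limit=3, window_secs=60)
    print([limiter.allow(t) for t in (0, 1, 2, 3, 60, 90)])
```

Compared with a plain fixed window, the weighting prevents a burst at the end of one minute and another at the start of the next from together exceeding the limit.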
Manage budget rules at runtime without restarting. Protected by Bearer token auth:
admin:
  enabled: true
  token: "your-secret-admin-token"
Endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /api/admin/budgets/rules | List budget rules |
| POST | /api/admin/budgets/rules | Create a budget rule |
| DELETE | /api/admin/budgets/rules?pattern=... | Delete a rule by pattern |
| GET | /api/admin/api-keys | List API key hashes with monthly spend |
| GET | /api/admin/providers | List provider status |
Runtime rules take effect immediately and persist across restarts.
Raw API keys are never stored. AgentLedger creates a SHA-256 fingerprint from the first 8 and last 4 characters of the key. The full key passes through to the upstream provider untouched.
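A fingerprint of this kind could be computed as in the Python sketch below. The function name is hypothetical, and simply concatenating the two slices before hashing is an assumption; the exact input layout AgentLedger uses is not specified here.

```python
import hashlib


def fingerprint(api_key):
    """SHA-256 hex digest over the first 8 and last 4 characters of the key."""
    # Assumption: the prefix and suffix are concatenated with no separator.
    material = api_key[:8] + api_key[-4:]
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Example key is made up for illustration.
    print(fingerprint("sk-proj-dev-example-key-0001"))
```

Only 12 characters of the key ever feed the hash, so the stored value cannot be used to recover or replay the full key.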
AgentLedger looks for config in these locations (in order):
- Path passed via the --config / -c flag
- ./agentledger.yaml
- ./configs/agentledger.yaml
- ~/.config/agentledger/agentledger.yaml
- /etc/agentledger/agentledger.yaml
All settings can be overridden with environment variables prefixed AGENTLEDGER_:
AGENTLEDGER_LISTEN=":9090"
AGENTLEDGER_STORAGE_DSN="/tmp/ledger.db"
AGENTLEDGER_LOG_LEVEL="debug"
See configs/agentledger.example.yaml for the full reference.
agentledger serve          Start the proxy
  -c, --config             Path to config file
agentledger costs          Show cost report
  -c, --config             Path to config file
  --last                   Time window: 1h, 24h, 7d, 30d (default: 24h)
  --by                     Group by: model, provider, key (default: model)
agentledger export         Export cost data as CSV or JSON
  -c, --config             Path to config file
  --last                   Time window (default: 30d)
  --by                     Group by: model, provider, key, agent, session
  -f, --format             Output format: csv or json (default: csv)
  --tenant                 Filter by tenant ID
agentledger mcp-wrap       Wrap an MCP server process for tool call metering
  -c, --config             Path to config file
  -- command [args...]     MCP server command to wrap
agentledger version        Print version
AgentLedger adds minimal overhead. Cost recording is fully async — it never blocks responses.
| Benchmark | Latency | Allocations |
|---|---|---|
| Non-streaming proxy | ~115 us | moderate |
| Streaming proxy (SSE) | ~110 us | moderate |
| Health check | ~2 us | minimal |
| Cost calculation | ~192 ns | 0 allocs |
| Token estimation (tiktoken) | ~16 us | cached |
Target: <1ms proxy overhead per request. Actual: ~0.1ms.
- Go 1.25+
- golangci-lint v2
- lefthook (git hooks)
make setup # Install dev tools and git hooks
make build # Build binary to bin/agentledger
make test # Run all tests with race detection
make lint # Run golangci-lint
make dev # Build and run with example config
make check # Run all checks (fmt, vet, lint, test, vulncheck)
make docker # Build Docker image
make docker-run # Build and run in Docker
make helm-lint # Lint Helm chart
make release-dry # GoReleaser snapshot
make docs # Build documentation site
make docs-serve # Serve docs locally with live reload
agent-ledger/
├── cmd/agentledger/ CLI entrypoint
│ ├── main.go Root command + healthcheck (cobra)
│ ├── serve.go Proxy server command
│ ├── costs.go Cost report command
│ ├── export.go CSV/JSON export command
│ └── mcpwrap.go MCP stdio wrapper command
├── internal/
│ ├── proxy/ Reverse proxy core
│ │ ├── proxy.go httputil.ReverseProxy + budget integration
│ │ └── streaming.go SSE stream interception
│ ├── provider/ LLM provider parsers
│ │ ├── provider.go Provider interface + API key handling
│ │ ├── openai_compat.go OpenAI-compatible base (shared by Groq/Mistral/DeepSeek)
│ │ ├── openai.go OpenAI provider constructor
│ │ ├── anthropic.go Anthropic messages API
│ │ ├── gemini.go Google Gemini custom parser
│ │ ├── cohere.go Cohere custom parser
│ │ ├── groq.go Groq (OpenAI-compatible)
│ │ ├── mistral.go Mistral (OpenAI-compatible)
│ │ ├── deepseek.go DeepSeek (OpenAI-compatible)
│ │ ├── azure.go Azure OpenAI
│ │ ├── xai.go xAI/Grok (OpenAI-compatible)
│ │ ├── perplexity.go Perplexity (OpenAI-compatible)
│ │ ├── together.go Together AI (OpenAI-compatible)
│ │ ├── fireworks.go Fireworks AI (OpenAI-compatible)
│ │ ├── openrouter.go OpenRouter (OpenAI-compatible)
│ │ ├── cerebras.go Cerebras (OpenAI-compatible)
│ │ ├── sambanova.go SambaNova (OpenAI-compatible)
│ │ └── registry.go Auto-detect provider from request + path prefix routing
│ ├── meter/ Cost calculation
│ │ ├── meter.go Token-to-USD conversion
│ │ ├── pricing.go Model pricing table (83+ models)
│ │ └── estimator.go Tiktoken fallback estimation
│ ├── ledger/ Storage layer
│ │ ├── ledger.go Ledger interface
│ │ ├── models.go UsageRecord, CostFilter, CostEntry
│ │ ├── sqlite.go SQLite impl (CGO-free)
│ │ ├── postgres.go PostgreSQL impl
│ │ ├── recorder.go Async buffered recording
│ │ └── migrations/ Embedded SQL migrations (goose)
│ │ ├── sqlite/ SQLite-specific migrations
│ │ └── postgres/ PostgreSQL-specific migrations
│ ├── budget/ Budget enforcement
│ │ ├── budget.go Per-key spend limits + caching
│ │ └── circuit_breaker.go Transport wrapper for upstream failures
│ ├── agent/ Agent session tracking
│ │ ├── session.go Session lifecycle management
│ │ └── detector.go Loop/ghost detection
│ ├── mcp/ MCP tool metering
│ │ ├── httpproxy.go HTTP proxy for MCP servers
│ │ ├── interceptor.go JSON-RPC interception
│ │ ├── stdio.go Stdio wrapper for MCP processes
│ │ └── pricing.go Per-call cost rules
│ ├── otel/ Observability
│ │ ├── metrics.go OTel metrics recording
│ │ └── provider.go Prometheus exporter setup
│ ├── dashboard/ Web UI
│ │ ├── handlers.go REST API handlers
│ │ ├── server.go HTTP server + embedded assets
│ │ └── static/ Embedded JS/CSS/HTML assets
│ ├── tenant/ Multi-tenancy
│ │ └── tenant.go Tenant resolver (header/config/chain)
│ ├── alert/ Alerting
│ │ ├── alert.go Alert types + multi-notifier
│ │ ├── slack.go Slack webhook notifier
│ │ ├── webhook.go Generic webhook notifier
│ │ └── ratelimit.go Deduplication wrapper
│ ├── ratelimit/ Request rate limiting
│ │ └── limiter.go Sliding window counter
│ ├── admin/ Admin API
│ │ ├── handlers.go REST API (budget CRUD, key listing)
│ │ └── store.go Runtime config persistence
│ └── config/ YAML/env config (viper)
├── deploy/
│ ├── docker-compose.yml One-command local dev
│ ├── Dockerfile.goreleaser Slim image for releases
│ └── helm/agentledger/ Kubernetes Helm chart
├── configs/
│ └── agentledger.example.yaml
├── docs/ MkDocs Material documentation site
│ ├── getting-started/ Installation, quickstart, CLI reference
│ ├── configuration/ Config overview and full reference
│ ├── features/ Per-feature documentation
│ ├── deployment/ Docker and Kubernetes guides
│ └── stylesheets/ Custom CSS overrides
├── .github/workflows/
│ ├── ci.yml Lint, test, build, vulncheck
│ ├── release.yml GoReleaser on tag push
│ └── docs.yml Build and deploy docs to GitHub Pages
├── Dockerfile Multi-stage Docker build
├── .goreleaser.yml Cross-platform release config
├── mkdocs.yml Documentation site config
├── Makefile
├── go.mod
└── lefthook.yml Pre-commit and pre-push hooks
- Phase 1: Core Proxy — Reverse proxy, token metering, cost calculation, SQLite storage, CLI
- Phase 2: Budget Enforcement — Per-key budgets, pre-flight estimation, circuit breaker
- Phase 3: Agent Attribution — Session tracking, loop detection, ghost agent detection
- Phase 4: Observability — OpenTelemetry metrics, Prometheus endpoint, web dashboard
- Phase 5: MCP Integration — Meter MCP tool calls alongside LLM costs
- Phase 6: Polish & Launch — Docker, GoReleaser, Helm chart, docs
- Phase 7: Multi-Provider — 15 providers with path-prefix routing (Groq, Mistral, DeepSeek, Gemini, Cohere, xAI, Perplexity, Together, Fireworks, OpenRouter, Cerebras, SambaNova, Azure)
- Phase 8: Postgres — Production-grade PostgreSQL storage backend
- Phase 9: Multi-Tenancy — Tenant isolation with header and config-based resolution
- Phase 10: Alerting — Slack and webhook notifications with deduplication
- Phase 11: Rate Limiting — Per-key request throttling + Homebrew tap
- Phase 12: Admin API — Runtime budget rule management
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Run checks (make check)
- Commit your changes
- Open a pull request
Apache 2.0