You're spending money on LLM APIs but have no idea:
- Which feature, route, or user is burning the most tokens
- When your spend is spiking (before the bill arrives)
- Which model is giving you the best cost/quality ratio
- Whether your prompts have quietly gotten 3x longer over the past month
Every team using OpenAI, Anthropic, Gemini, or local models (Ollama) faces this. The only "solutions" are expensive SaaS dashboards that require you to route your API calls through their servers — sending your prompts to a third party.
TokenWatcher is self-hosted, privacy-first, and completely open source.
- 📊 Real-time dashboard — Token usage, cost, and latency per model, route, user, and tag
- 🚨 Budget alerts — Slack, email, or webhook notifications when spend crosses your threshold
- 🔌 Drop-in SDK — Wrap any LLM call in one line. Works with OpenAI, Anthropic, Gemini, Ollama
- 🏷️ Tagging system — Tag calls by feature, user, session, environment, or anything you want
- 📈 Trend analysis — See token usage over time, spot prompt drift, compare models
- 🔒 100% self-hosted — Your prompts never leave your infrastructure
- 🐳 Docker-first — One command to run locally or in production
- 🌐 REST API — Ingest from any language or framework via HTTP
How it fits together:

```
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
│ │
│ import { track } from '@tokenwatcher/sdk' │
│ │
│ const result = await track( │
│ () => openai.chat.completions.create({...}), │
│ { model: 'gpt-4o', tags: { feature: 'chat' } } │
│ ) │
└─────────────────────┬───────────────────────────────────────┘
│ HTTP POST /api/ingest
▼
┌─────────────────────────────────────────────────────────────┐
│ TokenWatcher Server │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Ingest API │──▶│ PostgreSQL │──▶│ Dashboard │ │
│ │ (Next.js) │ │ + Prisma │ │ (Next.js) │ │
│ └──────────────┘ └──────────────┘ └───────────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Alert Engine │ │
│ │ (cron + webhooks) │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
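Each tracked call lands in Postgres as one usage event. The actual Prisma schema lives in the repo; going by the `/api/ingest` payload shown further down, the stored record looks roughly like this (fields beyond the payload, i.e. `id`, `costUsd`, and `createdAt`, are assumptions):

```ts
// Rough shape of one stored usage event, inferred from the /api/ingest payload.
// `id`, `costUsd`, and `createdAt` are assumptions about the server-side record.
interface UsageEvent {
  id: string                        // assigned by the server
  model: string                     // e.g. 'gpt-4o'
  provider: 'openai' | 'anthropic' | 'gemini' | 'ollama' | 'custom'
  inputTokens: number
  outputTokens: number
  latencyMs: number
  costUsd: number                   // derived from the per-model pricing table
  tags: Record<string, string>      // e.g. { feature: 'chat', userId: 'user_123' }
  createdAt: Date
}
```

Everything the dashboard shows is an aggregation over these rows, grouped by model, tag, or time bucket.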
Run it with Docker:

```bash
git clone https://github.com/sandip-sol/token-watcher.git
cd token-watcher
cp .env.example .env
docker compose up -d
```

Open http://localhost:3000 — done.
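Once it's up, you can verify ingestion end to end by posting a hand-written event (this mirrors the REST example further down; `TOKENWATCHER_API_KEY` comes from your `.env`):

```ts
// Smoke test: send one fake usage event to a fresh instance.
// Payload fields match the /api/ingest example later in this README.
const res = await fetch('http://localhost:3000/api/ingest', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.TOKENWATCHER_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    provider: 'openai',
    inputTokens: 512,
    outputTokens: 128,
    latencyMs: 1240,
    tags: { feature: 'smoke-test' },
  }),
})
console.log(res.status) // expect a 2xx if the event was accepted
```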
Or run it from source for local development:

```bash
git clone https://github.com/sandip-sol/token-watcher.git
cd token-watcher
# Install dependencies
npm install
# Set up environment
cp .env.example .env
# Edit .env with your DATABASE_URL and DIRECT_URL
# Set up database
npx prisma migrate dev
# Start dev server
npm run dev
```

Install the SDK in your application:

```bash
npm install @tokenwatcher/sdk
```

Then point it at your instance:

```ts
// tokenwatcher.ts
import { TokenWatcher } from '@tokenwatcher/sdk'
export const tw = new TokenWatcher({
endpoint: 'http://localhost:3000', // your self-hosted instance
apiKey: process.env.TOKENWATCHER_API_KEY,
})
```

```ts
import { tw } from './tokenwatcher'
import OpenAI from 'openai'
const openai = new OpenAI()
// Wrap any LLM call — TokenWatcher captures tokens, cost, and latency automatically
const response = await tw.track(
() => openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello!' }],
}),
{
model: 'gpt-4o',
provider: 'openai',
tags: {
feature: 'chat',
userId: 'user_123',
environment: 'production',
}
}
)
```

Anthropic works the same way:

```ts
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic()
const response = await tw.track(
() => anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hello!' }],
}),
{
model: 'claude-sonnet-4-20250514',
provider: 'anthropic',
tags: { feature: 'summarizer' }
}
)
```

No SDK? Ingest from any language over plain HTTP:

```bash
curl -X POST http://localhost:3000/api/ingest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4o",
"provider": "openai",
"inputTokens": 512,
"outputTokens": 128,
"latencyMs": 1240,
"tags": { "feature": "chat", "userId": "user_123" }
}'
```

The dashboard gives you:
| View | What you see |
|---|---|
| Overview | Total spend, token volume, avg latency — today vs yesterday |
| By Model | Cost breakdown per model, token efficiency comparison |
| By Tag | Which features/users/routes cost the most |
| Trends | 30-day spend and token usage over time |
| Alerts | Budget rules — get notified before you're surprised |
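Budget rules are evaluated by the alert engine from the architecture diagram (cron + webhooks). Conceptually, each scheduled run boils down to something like the sketch below; `sumCostSinceUtcMidnight` and the rule shape are hypothetical names, not the actual implementation:

```ts
// Illustrative only: roughly what one scheduled budget check does.
// The aggregate query is stubbed out; Slack incoming webhooks accept { text }.
declare function sumCostSinceUtcMidnight(): Promise<number>

interface BudgetRule {
  dailyLimitUsd: number
  slackWebhookUrl: string // from SLACK_WEBHOOK_URL in .env
}

async function checkBudget(rule: BudgetRule): Promise<void> {
  const spentUsd = await sumCostSinceUtcMidnight()
  if (spentUsd < rule.dailyLimitUsd) return

  await fetch(rule.slackWebhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `⚠️ TokenWatcher: today's spend $${spentUsd.toFixed(2)} crossed your $${rule.dailyLimitUsd} budget`,
    }),
  })
}
```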
| Provider | Models | Cost Data |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | ✅ Auto-calculated |
| Anthropic | Claude Opus, Sonnet, Haiku | ✅ Auto-calculated |
| Google | Gemini 1.5 Pro, Flash | ✅ Auto-calculated |
| Ollama | Any local model | ✅ (estimated, configurable) |
| Custom | Any model | ✅ Set price per 1M tokens |
Cost data is kept up to date by the community. Submit a PR to add new models!
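Cost falls out of per-million-token prices, so adding a model is just two numbers. Illustratively (the prices below are placeholders, not the shipped table):

```ts
// How per-call cost is derived from per-1M-token pricing.
// The prices here are placeholders; custom/Ollama models use whatever you set.
interface ModelPrice {
  inputPerMillionUsd: number
  outputPerMillionUsd: number
}

function callCostUsd(inputTokens: number, outputTokens: number, p: ModelPrice): number {
  return (inputTokens / 1_000_000) * p.inputPerMillionUsd
       + (outputTokens / 1_000_000) * p.outputPerMillionUsd
}

// The REST example above (512 in / 128 out) at $2.50 in / $10 out per 1M tokens:
callCostUsd(512, 128, { inputPerMillionUsd: 2.5, outputPerMillionUsd: 10 })
// => 0.00256  (0.00128 + 0.00128)
```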
All configuration lives in `.env`:

```env
# Database
DATABASE_URL="postgresql://user:password@localhost:5432/tokenwatcher"
# For Prisma migrations. With local Postgres this can match DATABASE_URL.
# With Supabase, use the Direct connection URL instead of the pooler URL.
DIRECT_URL="postgresql://user:password@localhost:5432/tokenwatcher"
# Auth (generate with: openssl rand -base64 32)
NEXTAUTH_SECRET="your-secret-here"
NEXTAUTH_URL="http://localhost:3000"
# API key for SDK authentication
TOKENWATCHER_API_KEY="your-api-key-here"
# Alerts (optional)
SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
SMTP_HOST="smtp.example.com"
SMTP_PORT="587"
SMTP_USER="alerts@example.com"
SMTP_PASS="your-smtp-password"
```

Roadmap:

- Core ingest API
- PostgreSQL persistence via Prisma
- Dashboard with cost breakdown
- TypeScript SDK
- Budget alerts (Slack + webhook)
- Per-user cost attribution
- Prompt diff tracking (detect prompt growth over time)
- Multi-workspace support
- Grafana datasource plugin
- Python SDK
- AI-powered cost optimization suggestions
We love contributions! See CONTRIBUTING.md for how to get started.
Good first issues are tagged `help wanted`.
See the deployment guide for instructions on deploying to:
- Railway (one click)
- Render
- Fly.io
- Your own VPS (Docker Compose)
MIT — use it however you want. See LICENSE.
Built with ❤️ by the community. If TokenWatcher saves you money, please ⭐ the repo!
