Self-hosted LLM performance testing and monitoring platform.
Benchmark, monitor, and compare LLM API providers in one place — measure latency, TTFT, throughput, and reliability across OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint. Deploy with a single command via Docker or script.
Running LLMs in production means juggling multiple providers, each with different latency profiles, rate limits, and reliability. Public benchmarks don't reflect your network, your prompts, or your traffic patterns. LLM API Bench is a self-hosted tool that lets you:
- Benchmark — Compare providers head-to-head with identical prompts, configurable concurrency (1–5000), and warmup runs
- Monitor — Continuously health-check your providers with four-tier status (healthy / slow / very slow / down) and 24h history
- Playground — Interactively test any model with streaming, vision, and full token-level metrics
- Track — Persistent history of all runs with side-by-side comparison and CSV/JSON export
- Deploy — One-command setup via Docker Compose or shell script, single `.env` for all config
*(Screenshots: Workflow — Configure & Run, Monitor — Health Checks, Playground — Test Models, History — Past Runs.)*
- OpenAI — GPT-4o, GPT-4o-mini, o1, o3, and more
- Anthropic — Claude 4 Sonnet, Opus, Haiku
- Google Gemini — 2.5 Pro, Flash
- OpenAI-Compatible — DeepSeek, Mistral, local LLMs (Ollama, vLLM), and other endpoints that speak the OpenAI protocol
| Metric | Description |
|---|---|
| Response Time | Average / P50 / P95 / P99 latency |
| Token Speed | Input & output tokens per second |
| TTFT | Time to First Token (streaming) |
| Throughput | Requests/sec under concurrent load |
| Success Rate | Completed vs failed requests |
| Cost Estimation | Per-provider cost breakdown |
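As a sketch of how the latency percentiles in the table can be derived from raw samples — the helper below uses the nearest-rank method and is illustrative, not this repo's actual code:

```typescript
// Illustrative nearest-rank percentile over sorted latency samples.
// Not the app's implementation; shown to clarify the metrics table above.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank method
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

const latenciesMs = [120, 95, 210, 180, 99, 450, 130, 101, 170, 2050];
const sorted = [...latenciesMs].sort((a, b) => a - b);
const avg = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;

console.log(`avg=${avg} p50=${percentile(sorted, 50)} p95=${percentile(sorted, 95)} p99=${percentile(sorted, 99)}`);
// → avg=360.5 p50=130 p95=2050 p99=2050
```

Note how a single 2-second outlier dominates P95/P99 while barely moving the average — which is why the tool reports all four.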
- Live streaming metrics during benchmark execution
- Per-provider area charts (response time, token speed)
- Radar comparison across all dimensions
- Color-coded provider identity throughout
- Multi-task workflows with sequential execution
- Per-task prompt, concurrency, and iteration config
- Quick presets — 512 / 4K / 16K tokens, up to 5000 concurrency, up to 10M iterations
- Warmup runs to eliminate cold-start bias
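In code, a multi-task workflow might look like the following — the `BenchTask` interface is hypothetical and only mirrors the options listed above, not the app's real schema:

```typescript
// Hypothetical shape of a multi-task workflow config mirroring the knobs
// listed above. Field names are illustrative; consult the app for the
// actual schema.
interface BenchTask {
  prompt: string;
  concurrency: number; // 1–5000
  iterations: number;  // up to 10M
  warmupRuns: number;  // excluded from results to avoid cold-start bias
}

// Tasks run sequentially, each with its own prompt and load profile.
const workflow: BenchTask[] = [
  { prompt: "Summarize: ...", concurrency: 50, iterations: 1_000, warmupRuns: 3 },
  { prompt: "Translate: ...", concurrency: 500, iterations: 10_000, warmupRuns: 3 },
];

console.log(workflow.length); // 2
```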
- Send a prompt to a specific provider/model and inspect the response
- Supports streaming and non-streaming modes
- Vision support — test multimodal models with image URLs or uploads
- Shows token counts, TTFT, TPS, and response time
- Periodic health checks for selected provider/model combinations
- Rich metrics per probe: TTFT, output tokens, response validation
- Configurable health thresholds (latency, TTFT, min output tokens)
- Per-model check intervals (5 min to 6 hours) with a global default
- Provider-parallel, model-serial scheduling (providers probed concurrently, models within a provider one at a time)
- 24h history bar with color-coded health status
- Auto-refresh dashboard with summary stats
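One way the four-tier status can be derived is from probe success plus latency thresholds. The defaults below are made up for illustration — in the app, the latency, TTFT, and output-token thresholds are user-configurable:

```typescript
// Sketch of a four-tier health classifier. Threshold defaults are invented
// for illustration; the app lets you configure its own thresholds.
type Health = "healthy" | "slow" | "very_slow" | "down";

function classify(
  probeSucceeded: boolean,
  latencyMs: number,
  slowMs = 5_000,
  verySlowMs = 15_000,
): Health {
  if (!probeSucceeded) return "down";
  if (latencyMs > verySlowMs) return "very_slow";
  if (latencyMs > slowMs) return "slow";
  return "healthy";
}

console.log(classify(true, 1_200)); // "healthy"
console.log(classify(true, 7_000)); // "slow"
console.log(classify(false, 300));  // "down"
```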
- JWT-based login with configurable credentials
- Auto-generated persistent secrets (JWT, encryption key, salt) — no hardcoded defaults
- Forced password change on first login with default credentials
- Login rate limiting (5 attempts per 5 minutes)
- Helmet security headers with Content Security Policy
- One-time tokens for SSE/download URLs (no JWT in query strings)
- Session-bound token storage (`sessionStorage`, cleared on tab close)
- CORS restricted to configured origin (same-origin by default)
- Protected API routes and frontend routing
- Auto-redirect to login page on session expiry
- Persistent run history with full result details
- Side-by-side comparison of past runs
- Export results as JSON or CSV
- One-click build script (`start.sh`)
- Docker Compose with SQLite volume persistence
- Single `.env` file for all configuration
With the shell script:

```bash
git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env   # edit .env to set credentials
chmod +x start.sh && ./start.sh
```

With Docker Compose:

```bash
git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env   # edit .env to set credentials
docker compose up -d
```

Manual (development):

```bash
git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env

# Backend
cd backend && npm install && npm run dev &

# Frontend
cd ../frontend && npm install && npm run dev
```

Open http://localhost:5173 (dev) or http://localhost:3001 (production) to access the dashboard.
All configuration is managed via a single .env file in the project root:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3001` | Server port |
| `AUTH_USERNAME` | `admin` | Login username |
| `AUTH_PASSWORD` | `changeme` | Login password (must change on first login) |
| `JWT_SECRET` | auto-generated | JWT signing secret (leave empty to auto-generate) |
| `JWT_EXPIRES_IN` | `24h` | JWT token expiry |
| `ENCRYPTION_SECRET` | auto-generated | API key encryption secret (leave empty to auto-generate) |
| `CORS_ORIGIN` | same-origin only | Allowed CORS origin (e.g. `https://your-domain.com`) |
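Putting the variables together, a minimal `.env` might look like this (values shown are the defaults from the table; leave the secrets empty to have them auto-generated):

```env
# Server
PORT=3001

# Authentication (you must change the default password on first login)
AUTH_USERNAME=admin
AUTH_PASSWORD=changeme

# Secrets (leave empty to auto-generate persistent values)
JWT_SECRET=
JWT_EXPIRES_IN=24h
ENCRYPTION_SECRET=

# CORS (omit to allow same-origin only)
CORS_ORIGIN=https://your-domain.com
```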
- Log in with your credentials
- Go to Settings
- Click Add Provider
- Select a format (OpenAI / Anthropic / Gemini / OpenAI-Compatible)
- Enter your API endpoint and key
- Click Test Connection to verify
- Start benchmarking!
```
┌─────────────────────────────────────────────────────────┐
│                        Browser                          │
│  React 19 · Ant Design 5 · Recharts · Tailwind CSS v4   │
└────────────────────────┬────────────────────────────────┘
                         │ REST / SSE
┌────────────────────────▼────────────────────────────────┐
│                    Express Server                       │
│  ┌──────────┐ ┌───────────┐ ┌──────────────────────┐    │
│  │  Auth    │ │ REST API  │ │     SSE Stream       │    │
│  │  (JWT)   │ │  /api/*   │ │  /api/workflows/:id  │    │
│  └──────────┘ └─────┬─────┘ └──────────┬───────────┘    │
│                     │                  │                │
│  ┌──────────────────▼──────────────────▼─────────────┐  │
│  │                  Service Layer                    │  │
│  │  Benchmark Engine · Workflow Runner · Playground  │  │
│  │           Monitor Scheduler (node-cron)           │  │
│  └──────────────────┬────────────────────────────────┘  │
│                     │                                   │
│  ┌──────────────────▼────────────────────────────────┐  │
│  │                Provider Adapters                  │  │
│  │  OpenAI · Anthropic · Gemini · OpenAI-Compatible  │  │
│  └──────────────────┬────────────────────────────────┘  │
└────────────────────┬┴───────────────────────────────────┘
                     │
        ┌────────────▼────────────┐
        │ SQLite (better-sqlite3) │
        │  Single-file database   │
        └─────────────────────────┘
```
The entire stack runs as a single Node.js process — no Redis, no Postgres, no external dependencies. The frontend is built by Vite and served as static files by Express. SQLite (WAL mode) stores all benchmarks, workflows, monitor history, and provider config in one file, making backup and migration trivial.
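Live benchmark metrics ride over the `/api/workflows/:id` SSE channel shown in the diagram. As a minimal sketch of consuming such a stream, assuming JSON payloads in `data:` fields (the real event schema and client are the app's own, which also uses one-time tokens for SSE URLs):

```typescript
// Minimal SSE frame parser: splits a text chunk into events and decodes
// JSON "data:" payloads. Illustrative only; a real client would typically
// use EventSource and handle multi-line/partial frames.
function parseSseChunk(chunk: string): unknown[] {
  return chunk
    .split("\n\n")                              // events are separated by blank lines
    .map((block) => block.trim())
    .filter((block) => block.startsWith("data: "))
    .map((block) => JSON.parse(block.slice("data: ".length)));
}

const events = parseSseChunk('data: {"progress":10}\n\ndata: {"progress":20}\n\n');
console.log(events.length); // 2
```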
| Layer | Stack |
|---|---|
| Frontend | React 19, Vite 8, TypeScript, Tailwind CSS v4 |
| UI | Ant Design 5 (dark theme), Recharts, Framer Motion |
| Backend | Node.js, Express 4, TypeScript |
| Auth | JWT (jsonwebtoken + bcryptjs) |
| Storage | SQLite (better-sqlite3, raw SQL, no ORM) |
| Scheduler | node-cron |
| Deploy | Docker (multi-stage alpine) / Shell script |
See CONTRIBUTING.md for development setup and guidelines.




