
LLM API Bench

English | 中文


Self-hosted LLM performance testing and monitoring platform.

Benchmark, monitor, and compare LLM API providers in one place — measure latency, TTFT, throughput, and reliability across OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint. Deploy with a single command via Docker or script.


Why LLM API Bench?

Running LLMs in production means juggling multiple providers, each with different latency profiles, rate limits, and reliability. Public benchmarks don't reflect your network, your prompts, or your traffic patterns. LLM API Bench is a self-hosted tool that lets you:

  • Benchmark — Compare providers head-to-head with identical prompts, configurable concurrency (1–5000), and warmup runs
  • Monitor — Continuously health-check your providers with four-tier status (healthy / slow / very slow / down) and 24h history
  • Playground — Interactively test any model with streaming, vision, and full token-level metrics
  • Track — Persistent history of all runs with side-by-side comparison and CSV/JSON export
  • Deploy — One-command setup via Docker Compose or shell script, single .env for all config

Demo

(demo animation)

Screenshots

  • Workflow — Configure & Run
  • Monitor — Health Checks
  • Playground — Test Models
  • History — Past Runs

Features

Multi-Provider Support

  • OpenAI — GPT-4o, GPT-4o-mini, o1, o3, and more
  • Anthropic — Claude 4 Sonnet, Opus, Haiku
  • Google Gemini — 2.5 Pro, Flash
  • OpenAI-Compatible — DeepSeek, Mistral, local LLMs (Ollama, vLLM), and other endpoints that speak the OpenAI protocol
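
All four formats are funneled through the provider adapter layer shown in the Architecture section. As a rough illustration (the interface below is hypothetical, not the project's actual code), an adapter boils down to one chat call that returns text plus timing metrics:

interface ChatRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

interface ChatMetrics {
  ttftMs: number;        // time to first token (streaming only)
  totalMs: number;       // full response time
  outputTokens: number;
  outputTps: number;     // output tokens per second
}

// Hypothetical adapter shape; the real interface lives in the backend and may differ.
interface ProviderAdapter {
  name: 'openai' | 'anthropic' | 'gemini' | 'openai-compatible';
  chat(req: ChatRequest): Promise<{ text: string; metrics: ChatMetrics }>;
}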

Comprehensive Metrics

Metric           Description
Response Time    Average / P50 / P95 / P99 latency
Token Speed      Input & output tokens per second
TTFT             Time to First Token (streaming)
Throughput       Requests/sec under concurrent load
Success Rate     Completed vs. failed requests
Cost Estimation  Per-provider cost breakdown
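
For reference, the percentile metrics can be derived from raw per-request latency samples along these lines (a minimal sketch, not the project's measurement code):

// Minimal sketch: nearest-rank percentiles over raw latency samples (ms).
function percentile(sorted: number[], p: number): number {
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

function summarize(latenciesMs: number[]) {
  if (latenciesMs.length === 0) throw new Error('no samples');
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    avg: sorted.reduce((sum, v) => sum + v, 0) / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}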

Real-Time Visualization

  • Live streaming metrics during benchmark execution
  • Per-provider area charts (response time, token speed)
  • Radar comparison across all dimensions
  • Color-coded provider identity throughout
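
Live metrics arrive over the SSE stream exposed by the backend (see the Architecture section). A browser client might consume it roughly like this; the URL shape and payload are illustrative, not the documented API:

// Sketch: subscribe to live run metrics over SSE in the browser.
declare function updateCharts(metric: unknown): void; // stand-in for chart code

function watchRun(runId: string, oneTimeToken: string): EventSource {
  const es = new EventSource(`/api/workflows/${runId}?token=${oneTimeToken}`);
  es.onmessage = (ev) => updateCharts(JSON.parse(ev.data));
  es.onerror = () => es.close();
  return es;
}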

Workflow Engine

  • Multi-task workflows with sequential execution
  • Per-task prompt, concurrency, and iteration config
  • Quick presets — 512 / 4K / 16K tokens, up to 5000 concurrency, up to 10M iterations
  • Warmup runs to eliminate cold-start bias
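
Conceptually, a workflow is an ordered list of tasks, each carrying its own prompt and load settings. A hypothetical task shape (field names are illustrative):

// Hypothetical workflow shape; field names are illustrative.
interface BenchmarkTask {
  prompt: string;      // per-task prompt
  maxTokens: number;   // quick presets: 512 / 4096 / 16384
  concurrency: number; // 1-5000
  iterations: number;  // up to 10,000,000
  warmupRuns: number;  // excluded from results to avoid cold-start bias
}

interface Workflow {
  name: string;
  tasks: BenchmarkTask[]; // executed sequentially
}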

Playground

  • Send a prompt to a specific provider/model and inspect the response
  • Supports streaming and non-streaming modes
  • Vision support — test multimodal models with image URLs or uploads
  • Shows token counts, TTFT, TPS, and response time
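
The token-level metrics fall out of stream timing: TTFT is the gap between sending the request and the first streamed chunk. A minimal sketch against any OpenAI-compatible streaming endpoint (URL, payload, and chunk-based counting are simplifications, not the project's code):

// Minimal sketch: time-to-first-token and total time over a streaming request.
async function measureStream(baseUrl: string, apiKey: string, model: string, prompt: string) {
  const start = performance.now();
  let firstChunkAt: number | null = null;
  let chunks = 0;

  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, stream: true, messages: [{ role: 'user', content: prompt }] }),
  });

  for await (const _chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    if (firstChunkAt === null) firstChunkAt = performance.now();
    chunks++; // real code would parse SSE lines and count tokens from the deltas
  }

  const totalMs = performance.now() - start;
  const ttftMs = firstChunkAt === null ? totalMs : firstChunkAt - start;
  return { ttftMs, totalMs, chunks };
}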

Monitor

  • Periodic health checks for selected provider/model combinations
  • Rich metrics per probe: TTFT, output tokens, response validation
  • Configurable health thresholds (latency, TTFT, min output tokens)
  • Per-model check intervals (5 min – 6 hours), with a global default
  • Provider-parallel, model-serial scheduling
  • 24h history bar with color-coded health status
  • Auto-refresh dashboard with summary stats
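
The four-tier status can be read as a threshold ladder over each probe's results. A sketch with hypothetical threshold fields and multipliers:

// Sketch: classify a probe into the four-tier status using configurable
// thresholds. Field names and the 2x "very slow" multiplier are illustrative.
type Health = 'healthy' | 'slow' | 'very_slow' | 'down';

interface Thresholds {
  maxLatencyMs: number;
  maxTtftMs: number;
  minOutputTokens: number;
}

function classify(
  probe: { ok: boolean; latencyMs: number; ttftMs: number; outputTokens: number },
  t: Thresholds,
): Health {
  if (!probe.ok || probe.outputTokens < t.minOutputTokens) return 'down';
  if (probe.latencyMs > 2 * t.maxLatencyMs || probe.ttftMs > 2 * t.maxTtftMs) return 'very_slow';
  if (probe.latencyMs > t.maxLatencyMs || probe.ttftMs > t.maxTtftMs) return 'slow';
  return 'healthy';
}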

Authentication & Security

  • JWT-based login with configurable credentials
  • Auto-generated persistent secrets (JWT, encryption key, salt) — no hardcoded defaults
  • Forced password change on first login with default credentials
  • Login rate limiting (5 attempts per 5 minutes)
  • Helmet security headers with Content Security Policy
  • One-time tokens for SSE/download URLs (no JWT in query strings)
  • Session-bound token storage (sessionStorage, cleared on tab close)
  • CORS restricted to configured origin (same-origin by default)
  • Protected API routes and frontend routing
  • Auto-redirect to login page on session expiry
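
The one-time tokens merit a note: EventSource and download links cannot carry an Authorization header, so a short-lived, single-use token is minted via the authenticated API and redeemed once from the URL. A minimal sketch of the pattern (not the project's actual implementation):

// Sketch of the one-time-token pattern for SSE/download URLs.
import { randomBytes } from 'node:crypto';

const tokens = new Map<string, { expiresAt: number }>();

// Issued from an authenticated (JWT-protected) route.
function issueOneTimeToken(ttlMs = 30_000): string {
  const token = randomBytes(32).toString('hex');
  tokens.set(token, { expiresAt: Date.now() + ttlMs });
  return token;
}

// Redeemed exactly once when the SSE/download request arrives.
function redeemOneTimeToken(token: string): boolean {
  const entry = tokens.get(token);
  tokens.delete(token); // single use, success or not
  return !!entry && entry.expiresAt > Date.now();
}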

History & Export

  • Persistent run history with full result details
  • Side-by-side comparison of past runs
  • Export results as JSON or CSV
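
CSV export is essentially a flattening of stored run summaries into rows; a generic sketch (column names would come from the stored results):

// Sketch: flatten result rows to CSV with quoted, escaped fields.
function toCsv(rows: Array<Record<string, string | number>>): string {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (v: string | number) => `"${String(v).replace(/"/g, '""')}"`;
  return [
    headers.join(','),
    ...rows.map((r) => headers.map((h) => escape(r[h] ?? '')).join(',')),
  ].join('\n');
}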

Deployment

  • One-click build script (start.sh)
  • Docker Compose with SQLite volume persistence
  • Single .env file for all configuration

Quick Start

Option 1: One-Click Script (Production)

git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env    # edit .env to set credentials
chmod +x start.sh && ./start.sh

Option 2: Docker Compose (Production)

git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env    # edit .env to set credentials
docker compose up -d

Option 3: Development

git clone https://github.com/idemerge/llm-api-bench.git
cd llm-api-bench
cp .env.example .env

# Backend
cd backend && npm install && npm run dev &

# Frontend
cd ../frontend && npm install && npm run dev

Open http://localhost:5173 (dev) or http://localhost:3001 (production) to access the dashboard.

Configuration

All configuration is managed via a single .env file in the project root:

Variable           Default           Description
PORT               3001              Server port
AUTH_USERNAME      admin             Login username
AUTH_PASSWORD      changeme          Login password (must be changed on first login)
JWT_SECRET         auto-generated    JWT signing secret (leave empty to auto-generate)
JWT_EXPIRES_IN     24h               JWT token expiry
ENCRYPTION_SECRET  auto-generated    API key encryption secret (leave empty to auto-generate)
CORS_ORIGIN        same-origin only  Allowed CORS origin (e.g. https://your-domain.com)
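
Putting it together, a minimal .env might look like this (values are illustrative; see .env.example for the authoritative list):

PORT=3001
AUTH_USERNAME=admin
AUTH_PASSWORD=changeme
# leave the secrets empty to have them auto-generated and persisted
JWT_SECRET=
JWT_EXPIRES_IN=24h
ENCRYPTION_SECRET=
# only needed when the dashboard is served from a different origin
CORS_ORIGIN=https://your-domain.com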

Connect Real Providers

  1. Log in with your credentials
  2. Go to Settings
  3. Click Add Provider
  4. Select a format (OpenAI / Anthropic / Gemini / OpenAI-Compatible)
  5. Enter your API endpoint and key
  6. Click Test Connection to verify
  7. Start benchmarking!

Architecture

┌─────────────────────────────────────────────────────────┐
│                        Browser                          │
│  React 19 · Ant Design 5 · Recharts · Tailwind CSS v4   │
└────────────────────────┬────────────────────────────────┘
                         │ REST / SSE
┌────────────────────────▼────────────────────────────────┐
│                   Express Server                        │
│  ┌──────────┐  ┌───────────┐  ┌──────────────────────┐  │
│  │ Auth     │  │ REST API  │  │ SSE Stream           │  │
│  │ (JWT)    │  │ /api/*    │  │ /api/workflows/:id   │  │
│  └──────────┘  └─────┬─────┘  └──────────┬───────────┘  │
│                      │                   │              │
│  ┌───────────────────▼───────────────────▼───────────┐  │
│  │              Service Layer                        │  │
│  │  Benchmark Engine · Workflow Runner · Playground  │  │
│  │  Monitor Scheduler (node-cron)                    │  │
│  └───────────────────┬───────────────────────────────┘  │
│                      │                                  │
│  ┌───────────────────▼───────────────────────────────┐  │
│  │           Provider Adapters                       │  │
│  │  OpenAI · Anthropic · Gemini · OpenAI-Compatible  │  │
│  └───────────────────┬───────────────────────────────┘  │
└──────────────────────┼──────────────────────────────────┘
                       │
          ┌────────────▼────────────┐
          │ SQLite (better-sqlite3) │
          │  Single-file database   │
          └─────────────────────────┘

The entire stack runs as a single Node.js process: no Redis, no Postgres, no external services. The frontend is built by Vite and served as static files by Express. SQLite (WAL mode) stores all benchmarks, workflows, monitor history, and provider config in one file, making backup and migration trivial.
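
A condensed sketch of that single-process wiring, with illustrative paths and setup:

// Condensed, illustrative wiring of the single Node.js process.
import path from 'node:path';
import express from 'express';
import Database from 'better-sqlite3';

const db = new Database('data/bench.db'); // single-file database (path illustrative)
db.pragma('journal_mode = WAL');          // WAL: reads proceed while probes write

const app = express();
app.use(express.json());
// REST API and SSE routes would be mounted here, e.g. app.use('/api', apiRouter);
app.use(express.static(path.join(__dirname, '../frontend/dist'))); // Vite build output
app.listen(Number(process.env.PORT ?? 3001));

WAL mode matters here because the monitor scheduler writes probe results continuously while dashboard queries read concurrently.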

Layer      Stack
Frontend   React 19, Vite 8, TypeScript, Tailwind CSS v4
UI         Ant Design 5 (dark theme), Recharts, Framer Motion
Backend    Node.js, Express 4, TypeScript
Auth       JWT (jsonwebtoken + bcryptjs)
Storage    SQLite (better-sqlite3, raw SQL, no ORM)
Scheduler  node-cron
Deploy     Docker (multi-stage Alpine) / shell script

Star History

(star history chart)

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT
