Veeksha is a high-fidelity benchmarking framework for LLM inference systems. Whether you're optimizing a production deployment, comparing serving backends, or running capacity planning experiments, Veeksha lets you measure what matters to you: realistic multi-turn conversations, agentic workflows, high-frequency stress tests, or targeted microbenchmarks. One tool, any workload.
From isolated requests to complex agentic sessions, Veeksha captures the full range of modern LLM workloads.
👉 Why Veeksha? — Learn what sets Veeksha apart
📚 Documentation — Full guides and API reference
No install needed; run directly with uvx:
```bash
uvx -p 3.14t veeksha benchmark \
  --client.type openai_chat_completions \
  --client.api_base http://localhost:8000/v1 \
  --client.model meta-llama/Llama-3.2-1B-Instruct \
  --traffic_scheduler.type rate \
  --traffic_scheduler.interval_generator.type poisson \
  --traffic_scheduler.interval_generator.arrival_rate 5.0 \
  --runtime.benchmark_timeout 60
```

This drives an OpenAI-compatible endpoint at `http://localhost:8000/v1` with Poisson-distributed request arrivals, stopping once the benchmark timeout elapses.

Or use a YAML configuration file:
```bash
uvx -p 3.14t veeksha benchmark --config my_benchmark.veeksha.yml
```

Or install with `uv pip install veeksha` / `pip install veeksha` and use `veeksha` directly.
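For reference, here is a sketch of what such a config file might contain, assuming the dotted CLI flags map one-to-one onto nested YAML keys; consult the documentation for the exact schema:

```yaml
# my_benchmark.veeksha.yml
# Sketch only: key nesting is inferred from the dotted CLI flags above.
client:
  type: openai_chat_completions
  api_base: http://localhost:8000/v1
  model: meta-llama/Llama-3.2-1B-Instruct

traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: 5.0

runtime:
  benchmark_timeout: 60
```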
Veeksha requires a free-threaded Python build for worker parallelism (hence the `t` suffix in `3.14t`). To install from source:
```bash
git clone https://github.com/project-vajra/veeksha.git
cd veeksha

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a free-threaded environment and install in editable mode
uv venv --python 3.14t
source .venv/bin/activate
uv pip install -e .
```
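To confirm the environment really is free-threaded, you can inspect the interpreter's build flags. These are standard CPython checks, not Veeksha commands:

```bash
# The version string should include "free-threading build"
python -VV

# Py_GIL_DISABLED is 1 on free-threaded builds
python -c "import sysconfig; print(sysconfig.get_config_var('Py_GIL_DISABLED'))"
```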