NVIDIA NIM model latency benchmarker, written in Nim.
nimakai (నిమ్మకాయి) = lemon in Telugu. NIM + Nim = nimakai.
A focused, single-binary tool that continuously pings NVIDIA NIM models and reports latency metrics. Includes an 80-model tiered catalog, recommendation engine for oh-my-opencode routing, watch mode with alerts, CI health checks, live model discovery, and full sync mode. No bloat, no TUI framework, no telemetry. Just latency numbers.
Also includes nimaproxy — a Rust-based key-rotation proxy for production use.
- Latest — most recent round-trip time
- Avg — rolling average (ring buffer, last 100 samples)
- P50 — median latency
- P95 — 95th percentile (tail spikes)
- P99 — 99th percentile (worst case)
- Jitter — standard deviation (consistency)
- Stability — composite score 0-100 (P95 + jitter + spike rate + reliability)
- Health — UP / TIMEOUT / OVERLOADED / ERROR / NO_KEY / NOT_FOUND
- Verdict — Perfect / Normal / Slow / Spiky / Very Slow / Unstable / Not Active / Not Found
- Up% — uptime percentage
- Tier — S+ / S / A+ / A / A- / B+ / B / C (based on SWE-bench Verified scores)
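These are standard order statistics over the sample buffer. As a rough illustration of the math (not nimakai's actual Nim code in `metrics.nim` — the exact percentile method used there is an assumption here):

```python
import statistics

def latency_metrics(samples: list[float]) -> dict:
    """Compute core latency metrics from a ring buffer of round-trip times (ms)."""
    ordered = sorted(samples)

    def pct(p: float) -> float:
        # Nearest-rank percentile over the sorted samples.
        idx = min(len(ordered) - 1, int(p * len(ordered)))
        return ordered[idx]

    return {
        "avg": sum(samples) / len(samples),
        "p50": pct(0.50),
        "p95": pct(0.95),
        "p99": pct(0.99),
        "jitter": statistics.pstdev(samples),  # population std dev
    }

# One 3000 ms spike dominates the tail percentiles and the jitter:
print(latency_metrics([100, 120, 110, 3000, 115]))
```

The P95/P99 columns exist precisely because a single spike like the one above barely moves the average but is immediately visible in the tail.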
git clone https://github.com/dirmacs/nimakai.git
cd nimakai
nimble build

Requires Nim 2.0+ and OpenSSL.
export NVIDIA_API_KEY="nvapi-..."
# Continuous monitoring (S+ and S tier models by default)
nimakai
# Single round, then exit
nimakai --once
# Specific models only
nimakai -m qwen/qwen3.5-122b-a10b,qwen/qwen3.5-397b-a17b
# Filter by tier
nimakai --tier A --once
# Sort by stability score
nimakai --sort stability
# Benchmark models from opencode.json
nimakai --opencode --once
# JSON output
nimakai --once --json

nimakai Continuous benchmark (default)
nimakai catalog List the full 80-model catalog with tiers and metadata
nimakai recommend Benchmark and recommend routing changes
nimakai watch Monitor OMO-routed models with alerts
nimakai check CI health check with exit codes
nimakai discover Compare API models against catalog
nimakai history Show historical benchmark data
nimakai trends Show latency trend analysis (improving/degrading/stable)
nimakai opencode Show models from opencode.json + OMO routing
nimakai proxy start Start nimaproxy daemon (FFI integration)
nimakai proxy stop Stop nimaproxy daemon
nimakai proxy status Show nimaproxy live stats
nimakai can benchmark models and recommend optimal routing for oh-my-opencode categories:
# Advisory: show recommendations
nimakai recommend --rounds 3
# Full sync: backup -> diff -> apply to oh-my-opencode.json
nimakai recommend --rounds 5 --apply
# Rollback to previous config
nimakai recommend --rollback

Each OMO category is scored using weighted criteria:
| Category Need | SWE Weight | Speed Weight | Stability Weight |
|---|---|---|---|
| Speed (quick) | 0.15 | 0.55 | 0.20 |
| Quality (deep, artistry) | 0.45 | 0.10 | 0.20 |
| Reliability (ultrabrain) | 0.25 | 0.20 | 0.40 |
| Vision (visual-engineering) | 0.30 | 0.20 | 0.30 |
| Balance (writing, default) | 0.30 | 0.30 | 0.25 |
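A hypothetical sketch of how those weights might combine (the weight rows come from the table above; the [0, 1] normalization of each signal is assumed, and since the rows do not sum to 1, the remaining weight presumably goes to other signals such as uptime):

```python
# Category weights copied from the table above (remaining weight assumed
# to go to criteria not shown, e.g. uptime).
WEIGHTS = {
    "speed":       {"swe": 0.15, "speed": 0.55, "stability": 0.20},
    "quality":     {"swe": 0.45, "speed": 0.10, "stability": 0.20},
    "reliability": {"swe": 0.25, "speed": 0.20, "stability": 0.40},
    "vision":      {"swe": 0.30, "speed": 0.20, "stability": 0.30},
    "balance":     {"swe": 0.30, "speed": 0.30, "stability": 0.25},
}

def category_score(category: str, swe: float, speed: float, stability: float) -> float:
    """All three signals normalized to [0, 1]; higher is better."""
    w = WEIGHTS[category]
    return w["swe"] * swe + w["speed"] * speed + w["stability"] * stability
```

Under this sketch, a model with perfect scores on all three signals would top out at the row sum (e.g. 0.90 for the speed category), leaving headroom for the unlisted criteria.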
| Key | Action |
|---|---|
| A | Sort by average latency |
| P | Sort by P95 latency |
| S | Sort by stability score |
| T | Sort by tier |
| N | Sort by model name |
| U | Sort by uptime % |
| 1-9 | Toggle favorite on Nth model |
| Q | Quit |
nimakai v0.10.0 includes FFI integration with nimaproxy, allowing you to start/stop/query the Rust key-rotation proxy directly from the Nim CLI:
# Start the proxy daemon
nimakai proxy start --proxy-config /path/to/nimaproxy.toml --proxy-port 8080
# Check live status
nimakai proxy status
# Stop the daemon
nimakai proxy stop

Requirements:
- libnimaproxy.so must be in the same directory as the nimakai binary, or LD_LIBRARY_PATH must be set
- nimaproxy config file with API keys (see nimaproxy section below)
Status output shows:
- Overall health status
- Active key count
- Routing and racing configuration
- Per-key status (active/cooldown, key hint)
- Per-model latency stats (avg, P95, success rate, degradation)
| Flag | Short | Description | Default |
|---|---|---|---|
| --once | -1 | Single round, then exit | continuous |
| --models | -m | Comma-separated model IDs | S/S+ tier |
| --interval | -i | Ping interval in seconds | 5 |
| --timeout | -t | Request timeout in seconds | 15 |
| --json | -j | JSON output | table |
| --tier | | Filter by tier family (S, A, B, C) | |
| --sort | | Sort: avg, p95, stability, tier, name, uptime | avg |
| --opencode | | Use models from opencode.json | |
| --rounds | -r | Benchmark rounds for recommend | 3 |
| --apply | | Apply recommendations to oh-my-opencode.json | |
| --rollback | | Rollback oh-my-opencode.json from backup | |
| --quiet | -q | Suppress stderr status messages | |
| --no-history | | Don't write to history file | |
| --dry-run | | Preview recommend changes without applying | |
| --rec-history | | Show recommendation history | |
| --throughput | | Measure output token throughput | |
| --alert-threshold | | Alert threshold for watch mode | 50 |
| --fail-if-degraded | | Exit 1 if any model is degraded (check mode) | |
| --days | -d | Days of history to show | 7 |
| --profile | | Load named profile from config | |
| --help | -h | Show help | |
| --version | -v | Show version | |
Optional config at ~/.config/nimakai/config.json:
{
"interval": 5,
"timeout": 15,
"thresholds": {
"perfect_avg": 400,
"perfect_p95": 800,
"normal_avg": 1000,
"normal_p95": 2000,
"spike_ms": 3000
},
"profiles": {
"work": { "interval": 10, "tier_filter": "S", "rounds": 5 },
"fast": { "timeout": 5 }
},
"favorites": []
}

Use profiles with nimakai --profile work to load pre-configured settings.
Custom models can be added via ~/.config/nimakai/models.json to extend the built-in catalog.
History is persisted to ~/.local/share/nimakai/history.jsonl (30-day auto-prune).
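The append-then-prune cycle for JSONL history might look like the following Python sketch (nimakai's real logic is in `history.nim`; the record shape and function name here are assumptions):

```python
import json
import time

PRUNE_SECONDS = 30 * 24 * 3600  # 30-day retention, matching the auto-prune

def append_and_prune(path: str, record: dict) -> None:
    """Append one benchmark record, then rewrite the file without stale rows."""
    record = {"ts": time.time(), **record}  # caller-supplied "ts" wins, for tests
    try:
        with open(path) as f:
            rows = [json.loads(line) for line in f if line.strip()]
    except FileNotFoundError:
        rows = []
    rows.append(record)
    cutoff = time.time() - PRUNE_SECONDS
    with open(path, "w") as f:
        for row in rows:
            if row["ts"] >= cutoff:  # drop anything older than 30 days
                f.write(json.dumps(row) + "\n")
```

One JSON object per line keeps appends cheap and makes the prune a single linear rewrite.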
src/
nimakai.nim Entry point, main loop, SIGINT handler
nimakai/
types.nim Types, enums, constants
cli.nim CLI argument parsing with profiles
metrics.nim Pure metric functions (avg, p50, p95, p99, jitter, stability)
ping.nim HTTP ping + throughput measurement
catalog.nim 80-model catalog with SWE-bench tiers, O(1) index
display.nim Table/JSON rendering, ANSI helpers
config.nim Config file persistence + profile loading
history.nim JSONL history persistence + trend detection
opencode.nim OpenCode + oh-my-opencode integration
recommend.nim Recommendation engine (categories + agents + uptime)
rechistory.nim Recommendation history tracking (JSONL)
sync.nim Backup, apply, rollback for OMO config
watch.nim Watch mode alerting (down/recovered/degraded)
discovery.nim Live model discovery from NVIDIA API
tests/
test_types.nim 9 tests
test_metrics.nim 41 tests
test_display.nim 32 tests
test_ping.nim 15 tests
test_catalog.nim 22 tests
test_config.nim 12 tests
test_opencode.nim 5 tests
test_recommend.nim 34 tests
test_sync.nim 17 tests
test_history.nim 28 tests
test_rechistory.nim 9 tests
test_watch.nim 8 tests
test_integration.nim 12 tests
test_discovery.nim 9 tests
test_cli.nim 72 tests
nimaproxy/
Cargo.toml lib + bin + tests
nimaproxy.toml Config (NOT committed - contains API keys)
nimaproxy.toml.example Template for users
.gitignore Excludes nimaproxy.toml
src/
lib.rs Exports modules + AppState
main.rs Binary entry point
config.rs TOML config parsing
key_pool.rs Key rotation, rate-limit tracking
model_stats.rs Per-model latency tracking
model_router.rs Latency-aware model selection
proxy.rs HTTP handlers
tests/
integration.rs 12 integration tests
e2e_live.rs 6 E2E tests with real NVIDIA API
stress_test.rs 25-turn live stress test
Standalone Rust binary for production use. Provides OpenAI-compatible API with key rotation and latency-aware routing.
cd nimaproxy
cargo build --release
# Copy and edit config
cp nimaproxy.toml.example nimaproxy.toml
# Edit nimaproxy.toml with your NVIDIA API keys
# Run
./target/release/nimaproxy --config nimaproxy.toml

Endpoints:
- GET /health — Key pool status
- GET /stats — Per-model latency stats
- GET /v1/models — Passthrough to NVIDIA
- POST /v1/chat/completions — Proxy with key rotation
Features:
- Round-robin key rotation across multiple API keys
- Automatic 429 handling with per-key cooldown
- Latency-aware model routing ("model": "auto")
- Per-model stats tracking (TTFC, success rate, degradation detection)
- x-key-label response header: tracks which key was used, for rotation debugging
Model Routing (V2):
[routing]
strategy = "latency_aware"
spike_threshold_ms = 3000
models = [
"nvidia/meta/llama-3.3-70b-instruct",
"nvidia/qwen/qwen2.5-coder-32b-instruct",
"nvidia/moonshotai/kimi-k2-instruct",
"nvidia/mistralai/devstral-2-123b-instruct-2512",
]

When a request arrives with "model": "auto", the proxy picks the best model from this list. Untried models (< 3 samples) get priority. Degraded models (≥3 consecutive failures or avg > spike_threshold_ms) are skipped.
Model Racing (Speculative Execution):
[racing]
enabled = true
models = [
"moonshotai/kimi-k2",
"minimaxai/minimax-m2.5",
"qwen/qwen3-next-80b-a3b-thinking",
"stepfun-ai/step-3.5-flash",
"qwen/qwen3.5-122b-a10b",
"qwen/qwen3-coder-480b-a35b-instruct",
"google/gemma-3-27b-it",
"qwen/qwen2.5-coder-32b-instruct",
]
max_parallel = 8
timeout_ms = 15000
strategy = "complete"

Fires N parallel requests to N models and returns the first response, trading an N× token budget for min(P50 latency). Keys are pre-allocated per race task to avoid 429 rate-limit collisions. Models are selected in round-robin order via racing_cursor to prevent a single fast model from dominating and breaking inference loops. Dead models (≥20 consecutive failures or 0 samples) are filtered out automatically.
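The race itself is a scatter/gather pattern. nimaproxy implements it in Rust; a minimal asyncio sketch, with a stubbed request function standing in for the real proxied call, might look like:

```python
import asyncio
import random

async def call_model(model: str) -> str:
    """Stand-in for one proxied chat-completion request (real code does HTTP)."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"response from {model}"

async def race(models: list[str], timeout_s: float = 15.0) -> str:
    """Fire one request per model; return the first completed response."""
    tasks = [asyncio.create_task(call_model(m)) for m in models]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout_s, return_when=asyncio.FIRST_COMPLETED
    )
    for t in pending:
        t.cancel()  # losers are cancelled so they stop consuming budget
    return next(iter(done)).result()

print(asyncio.run(race(["kimi-k2", "minimax-m2.5", "step-3.5-flash"])))
```

This sketch cancels the losing requests; note that with a streaming upstream the tokens already generated before cancellation are still billed, which is the N× budget trade-off described above.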
Model Compatibility (Developer Role Transformation):
[model_compat]
# Models that support the 'developer' role (don't need transformation)
# All models NOT in this list will have 'developer' role transformed to 'user'
supports_developer_role = []
# Models that support tool messages (don't need transformation)
# All models NOT in this list will have 'tool' role transformed to 'assistant'
supports_tool_messages = []

Transforms OpenAI-style developer and tool roles to user and assistant for models that don't support them. This fixes 400 "Unknown message role" errors when using OMP or other agents that send developer role messages. By default, all models have roles transformed (empty lists = transform all).
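A minimal Python sketch of that transformation (the set arguments mirror the two config lists above; the function name is hypothetical, and the real version is in nimaproxy's Rust code):

```python
def transform_roles(messages: list[dict], model: str,
                    supports_developer: frozenset = frozenset(),
                    supports_tool: frozenset = frozenset()) -> list[dict]:
    """Downgrade roles the target model can't accept (empty sets = transform all)."""
    out = []
    for msg in messages:
        role = msg["role"]
        if role == "developer" and model not in supports_developer:
            role = "user"
        elif role == "tool" and model not in supports_tool:
            role = "assistant"
        out.append({**msg, "role": role})  # copy; original messages untouched
    return out
```

With both allow-lists empty, every model gets the downgrade, which matches the default-transform-all behavior described above.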
MIT