nimakai

NVIDIA NIM model latency benchmarker, written in Nim.

nimakai (నిమ్మకాయి) = lemon in Telugu. NIM + Nim = nimakai.


A focused, single-binary tool that continuously pings NVIDIA NIM models and reports latency metrics. Includes an 80-model tiered catalog, recommendation engine for oh-my-opencode routing, watch mode with alerts, CI health checks, live model discovery, and full sync mode. No bloat, no TUI framework, no telemetry. Just latency numbers.

Also includes nimaproxy — a Rust-based key-rotation proxy for production use.

Metrics

  • Latest — most recent round-trip time
  • Avg — rolling average (ring buffer, last 100 samples)
  • P50 — median latency
  • P95 — 95th percentile (tail spikes)
  • P99 — 99th percentile (worst case)
  • Jitter — standard deviation (consistency)
  • Stability — composite score 0-100 (P95 + jitter + spike rate + reliability)
  • Health — UP / TIMEOUT / OVERLOADED / ERROR / NO_KEY / NOT_FOUND
  • Verdict — Perfect / Normal / Slow / Spiky / Very Slow / Unstable / Not Active / Not Found
  • Up% — uptime percentage
  • Tier — S+ / S / A+ / A / A- / B+ / B / C (based on SWE-bench Verified scores)
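The core metrics above are standard rolling-window statistics. The following is an illustrative Python sketch, assuming nearest-rank percentiles over the 100-sample window; it is not nimakai's Nim implementation, and the composite stability formula is nimakai's own, so it is omitted here:

```python
import statistics

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a sorted copy of the samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

def summarize(samples: list[float]) -> dict:
    """Rolling summary over the last 100 samples, like nimakai's ring buffer."""
    window = samples[-100:]  # ring buffer keeps only the newest 100
    return {
        "latest": window[-1],
        "avg": sum(window) / len(window),
        "p50": percentile(window, 50),
        "p95": percentile(window, 95),
        "p99": percentile(window, 99),
        "jitter": statistics.pstdev(window),  # std deviation = consistency
    }

m = summarize([120.0, 130.0, 125.0, 900.0, 128.0])
```

Note how a single 900 ms spike leaves P50 untouched but drags Avg, P95, and jitter up, which is exactly why the tail percentiles are reported separately.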

Install

git clone https://github.com/dirmacs/nimakai.git
cd nimakai
nimble build

Requires Nim 2.0+ and OpenSSL.

Usage

export NVIDIA_API_KEY="nvapi-..."

# Continuous monitoring (S+ and S tier models by default)
nimakai

# Single round, then exit
nimakai --once

# Specific models only
nimakai -m qwen/qwen3.5-122b-a10b,qwen/qwen3.5-397b-a17b

# Filter by tier
nimakai --tier A --once

# Sort by stability score
nimakai --sort stability

# Benchmark models from opencode.json
nimakai --opencode --once

# JSON output
nimakai --once --json

Commands

nimakai                    Continuous benchmark (default)
nimakai catalog            List all 80 known models with tiers and metadata
nimakai recommend          Benchmark and recommend routing changes
nimakai watch              Monitor OMO-routed models with alerts
nimakai check              CI health check with exit codes
nimakai discover           Compare API models against catalog
nimakai history            Show historical benchmark data
nimakai trends             Show latency trend analysis (improving/degrading/stable)
nimakai opencode           Show models from opencode.json + OMO routing
nimakai proxy start        Start nimaproxy daemon (FFI integration)
nimakai proxy stop         Stop nimaproxy daemon
nimakai proxy status       Show nimaproxy live stats

Recommendation Engine

nimakai can benchmark models and recommend optimal routing for oh-my-opencode categories:

# Advisory: show recommendations
nimakai recommend --rounds 3

# Full sync: backup -> diff -> apply to oh-my-opencode.json
nimakai recommend --rounds 5 --apply

# Rollback to previous config
nimakai recommend --rollback

Each OMO category is scored using weighted criteria:

Category (need)              SWE weight  Speed weight  Stability weight
Speed (quick)                0.15        0.55          0.20
Quality (deep, artistry)     0.45        0.10          0.20
Reliability (ultrabrain)     0.25        0.20          0.40
Vision (visual-engineering)  0.30        0.20          0.30
Balance (writing, default)   0.30        0.30          0.25
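The scoring itself is a weighted sum. A minimal Python sketch, assuming each criterion is normalized to 0-1 (the table's rows don't sum to 1.0, so the real engine presumably assigns the remaining weight to another criterion such as uptime; that part is not shown here):

```python
def score(model: dict, weights: dict) -> float:
    """Weighted sum of normalized 0-1 criteria; higher is better."""
    return (weights["swe"] * model["swe"]
            + weights["speed"] * model["speed"]
            + weights["stability"] * model["stability"])

# Weights for the Speed (quick) category, from the table above.
quick = {"swe": 0.15, "speed": 0.55, "stability": 0.20}

# Hypothetical normalized model profiles for illustration.
fast_model = {"swe": 0.6, "speed": 0.9, "stability": 0.8}
slow_model = {"swe": 0.9, "speed": 0.3, "stability": 0.8}
```

Under the quick weights the fast model wins (0.745 vs 0.46) even though the slow model has the better SWE-bench score, which is the point of per-category weighting.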

Interactive Keys (continuous mode)

Key   Action
A     Sort by average latency
P     Sort by P95 latency
S     Sort by stability score
T     Sort by tier
N     Sort by model name
U     Sort by uptime %
1-9   Toggle favorite on Nth model
Q     Quit

Proxy Commands (FFI Integration)

nimakai v0.10.0 includes FFI integration with nimaproxy, allowing you to start/stop/query the Rust key-rotation proxy directly from the Nim CLI:

# Start the proxy daemon
nimakai proxy start --proxy-config /path/to/nimaproxy.toml --proxy-port 8080

# Check live status
nimakai proxy status

# Stop the daemon
nimakai proxy stop

Requirements:

  • libnimaproxy.so must be in the same directory as the nimakai binary, or LD_LIBRARY_PATH must point to it
  • nimaproxy config file with API keys (see nimaproxy section below)

Status output shows:

  • Overall health status
  • Active key count
  • Routing and racing configuration
  • Per-key status (active/cooldown, key hint)
  • Per-model latency stats (avg, P95, success rate, degradation)

Options

Flag                Short  Description                                    Default
--once              -1     Single round, then exit                        continuous
--models            -m     Comma-separated model IDs                      S/S+ tier
--interval          -i     Ping interval in seconds                       5
--timeout           -t     Request timeout in seconds                     15
--json              -j     JSON output                                    table
--tier                     Filter by tier family (S, A, B, C)
--sort                     Sort: avg, p95, stability, tier, name, uptime  avg
--opencode                 Use models from opencode.json
--rounds            -r     Benchmark rounds for recommend                 3
--apply                    Apply recommendations to oh-my-opencode.json
--rollback                 Rollback oh-my-opencode.json from backup
--quiet             -q     Suppress stderr status messages
--no-history               Don't write to history file
--dry-run                  Preview recommend changes without applying
--rec-history              Show recommendation history
--throughput               Measure output token throughput
--alert-threshold          Alert threshold for watch mode                 50
--fail-if-degraded         Exit 1 if any model is degraded (check mode)
--days              -d     Days of history to show                        7
--profile                  Load named profile from config
--help              -h     Show help
--version           -v     Show version

Configuration

Optional config at ~/.config/nimakai/config.json:

{
  "interval": 5,
  "timeout": 15,
  "thresholds": {
    "perfect_avg": 400,
    "perfect_p95": 800,
    "normal_avg": 1000,
    "normal_p95": 2000,
    "spike_ms": 3000
  },
  "profiles": {
    "work": { "interval": 10, "tier_filter": "S", "rounds": 5 },
    "fast": { "timeout": 5 }
  },
  "favorites": []
}

Use profiles with nimakai --profile work to load pre-configured settings.

Custom models can be added via ~/.config/nimakai/models.json to extend the built-in catalog.

History is persisted to ~/.local/share/nimakai/history.jsonl (30-day auto-prune).
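The 30-day auto-prune amounts to filtering the JSONL file by record age. A Python sketch of the idea, assuming each record carries an epoch timestamp field (the field name `ts` is an assumption, not nimakai's actual schema):

```python
import json
import time

def prune_history(path: str, max_age_days: int = 30) -> None:
    """Rewrite a JSONL file, dropping records older than max_age_days.
    Assumes each line is a JSON object with an epoch-seconds 'ts' field."""
    cutoff = time.time() - max_age_days * 86400
    with open(path) as f:
        kept = [line for line in f
                if line.strip() and json.loads(line).get("ts", 0) >= cutoff]
    with open(path, "w") as f:
        f.writelines(kept)
```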

Architecture

nimakai (Nim)

src/
  nimakai.nim              Entry point, main loop, SIGINT handler
  nimakai/
    types.nim              Types, enums, constants
    cli.nim                CLI argument parsing with profiles
    metrics.nim            Pure metric functions (avg, p50, p95, p99, jitter, stability)
    ping.nim               HTTP ping + throughput measurement
    catalog.nim            80-model catalog with SWE-bench tiers, O(1) index
    display.nim            Table/JSON rendering, ANSI helpers
    config.nim             Config file persistence + profile loading
    history.nim            JSONL history persistence + trend detection
    opencode.nim           OpenCode + oh-my-opencode integration
    recommend.nim          Recommendation engine (categories + agents + uptime)
    rechistory.nim         Recommendation history tracking (JSONL)
    sync.nim               Backup, apply, rollback for OMO config
    watch.nim              Watch mode alerting (down/recovered/degraded)
    discovery.nim          Live model discovery from NVIDIA API
tests/
    test_types.nim         9 tests
    test_metrics.nim       41 tests
    test_display.nim       32 tests
    test_ping.nim          15 tests
    test_catalog.nim       22 tests
    test_config.nim        12 tests
    test_opencode.nim      5 tests
    test_recommend.nim     34 tests
    test_sync.nim          17 tests
    test_history.nim       28 tests
    test_rechistory.nim    9 tests
    test_watch.nim         8 tests
    test_integration.nim   12 tests
    test_discovery.nim     9 tests
    test_cli.nim           72 tests

nimaproxy (Rust)

nimaproxy/
  Cargo.toml               lib + bin + tests
  nimaproxy.toml           Config (NOT committed - contains API keys)
  nimaproxy.toml.example   Template for users
  .gitignore               Excludes nimaproxy.toml
  src/
    lib.rs                 Exports modules + AppState
    main.rs                Binary entry point
    config.rs              TOML config parsing
    key_pool.rs            Key rotation, rate-limit tracking
    model_stats.rs         Per-model latency tracking
    model_router.rs        Latency-aware model selection
    proxy.rs               HTTP handlers
  tests/
    integration.rs         12 integration tests
    e2e_live.rs            6 E2E tests with real NVIDIA API
    stress_test.rs         25-turn live stress test

nimaproxy — Key-Rotation Proxy

Standalone Rust binary for production use. Provides OpenAI-compatible API with key rotation and latency-aware routing.

cd nimaproxy
cargo build --release

# Copy and edit config
cp nimaproxy.toml.example nimaproxy.toml
# Edit nimaproxy.toml with your NVIDIA API keys

# Run
./target/release/nimaproxy --config nimaproxy.toml

Endpoints:

  • GET /health — Key pool status
  • GET /stats — Per-model latency stats
  • GET /v1/models — Passthrough to NVIDIA
  • POST /v1/chat/completions — Proxy with key rotation

Features:

  • Round-robin key rotation across multiple API keys
  • Automatic 429 handling with per-key cooldown
  • Latency-aware model routing ("model": "auto")
  • Per-model stats tracking (TTFC, success rate, degradation detection)
  • x-key-label response header: tracks which key was used for rotation debugging

Model Routing (V2):

[routing]
strategy = "latency_aware"
spike_threshold_ms = 3000
models = [
  "nvidia/meta/llama-3.3-70b-instruct",
  "nvidia/qwen/qwen2.5-coder-32b-instruct",
  "nvidia/moonshotai/kimi-k2-instruct",
  "nvidia/mistralai/devstral-2-123b-instruct-2512",
]

When a request arrives with "model": "auto", the proxy picks the best model from this list. Untried models (< 3 samples) get priority. Degraded models (≥3 consecutive failures or avg > spike_threshold_ms) are skipped.

Model Racing (Speculative Execution):

[racing]
enabled = true
models = [
  "moonshotai/kimi-k2",
  "minimaxai/minimax-m2.5",
  "qwen/qwen3-next-80b-a3b-thinking",
  "stepfun-ai/step-3.5-flash",
  "qwen/qwen3.5-122b-a10b",
  "qwen/qwen3-coder-480b-a35b-instruct",
  "google/gemma-3-27b-it",
  "qwen/qwen2.5-coder-32b-instruct",
]
max_parallel = 8
timeout_ms = 15000
strategy = "complete"

Fires N parallel requests to N models and returns the first response, trading an N× token budget for the minimum latency across the racing set. Keys are pre-allocated per race task to avoid 429 rate-limit collisions. Models are selected in round-robin order via racing_cursor so that a single fast model cannot dominate and break inference loops. Dead models (≥20 consecutive failures or 0 samples) are filtered out automatically.
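The race itself is first-completed-wins with cancellation of the losers. A minimal asyncio sketch of that pattern; `call_model` is a stand-in for one proxied request, not nimaproxy's actual (Rust) request path:

```python
import asyncio
import random

async def call_model(model: str) -> tuple[str, str]:
    """Stand-in for one proxied request; real code would hit the NIM API."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated latency
    return model, f"response from {model}"

async def race(models: list[str], timeout_s: float = 15.0):
    """Fire all requests in parallel, return the first to complete,
    and cancel the rest."""
    tasks = [asyncio.create_task(call_model(m)) for m in models]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout_s, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()  # losers are abandoned; their tokens are the cost
    return next(iter(done)).result()

winner, text = asyncio.run(race(["kimi-k2", "minimax-m2.5", "step-3.5-flash"]))
```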

Model Compatibility (Developer Role Transformation):

[model_compat]
# Models that support the 'developer' role (don't need transformation)
# All models NOT in this list will have 'developer' role transformed to 'user'
supports_developer_role = []

# Models that support tool messages (don't need transformation)
# All models NOT in this list will have 'tool' role transformed to 'assistant'
supports_tool_messages = []

Transforms OpenAI-style developer and tool roles to user and assistant for models that don't support them. This fixes 400 "Unknown message role" errors when using OMP or other agents that send developer role messages. By default, all models have roles transformed (empty lists = transform all).
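The transformation is a simple per-message role rewrite gated on the whitelist. An illustrative Python sketch of the behavior described above (function and parameter names are invented for the example):

```python
def transform_roles(messages: list[dict], model: str = "",
                    supports_developer: tuple = (),
                    supports_tool: tuple = ()) -> list[dict]:
    """Downgrade 'developer' -> 'user' and 'tool' -> 'assistant'
    unless the model is in the corresponding whitelist.
    Empty whitelists (the default) mean every model gets transformed."""
    out = []
    for msg in messages:
        role = msg["role"]
        if role == "developer" and model not in supports_developer:
            role = "user"
        elif role == "tool" and model not in supports_tool:
            role = "assistant"
        out.append({**msg, "role": role})
    return out
```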

License

MIT
