
M2: prefix cache (radix trie) + /v1/prefix_cache_stats endpoint #2

Merged: AIdevsmartdata merged 6 commits into main from m2-prefix-cache-rebased (Apr 25, 2026)

@AIdevsmartdata
Owner

Summary

  • Cherry-picks the 5 M2 commits from m2-j2-wiring, rebased cleanly onto current main.
  • Adds the missing /v1/prefix_cache_stats endpoint (J6) — closes the "not landed" gap flagged at ARCHITECTURE.md:511-512.

What's in

  • prefix_cache.rs (775L) — PATRICIA radix trie with byte-budget LRU eviction, CacheStats (atomic counters), KVBlock (token + KV handle), CacheConfig::from_env().
  • slot_scheduler.rs — NativeScheduler now optionally owns an Arc<RwLock<PrefixTrie>>; admission attempts a longest-prefix lookup before scheduling, and a reap runs on slot return.
  • llama_backend.rs — public FFI aliases for prefix cache (KV block reuse).
  • chimere-server.rs main() — reads CacheConfig::from_env(), builds trie, passes to NativeScheduler::with_prefix_cache(...).
  • NEW server.rs — prefix_cache_stats_handler returning JSON (hits, misses, hit_rate, len, cached_bytes, evictions, etc.). Non-blocking via try_read.
  • NEW route /v1/prefix_cache_stats registered in build_router.
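To make the admission-time lookup concrete, here is a minimal sketch of longest-prefix matching over token IDs with lock-free hit/miss counters. It is deliberately simpler than prefix_cache.rs: one token per edge (no PATRICIA path compression), no byte-budget LRU eviction, and a plain usize standing in for the KV handle. TokenTrie, Node, insert and longest_prefix are hypothetical names, not the crate's API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical subset of the stats: counters bumped without taking a lock.
#[derive(Default)]
pub struct CacheStats {
    pub hits: AtomicU64,
    pub misses: AtomicU64,
}

/// One node per token (uncompressed). `block` is a hypothetical stand-in
/// for a KV handle covering the prefix that ends at this node.
#[derive(Default)]
struct Node {
    children: HashMap<u32, Node>,
    block: Option<usize>,
}

/// Hypothetical simplified trie; the real one is a PATRICIA trie with LRU eviction.
#[derive(Default)]
pub struct TokenTrie {
    root: Node,
    pub stats: CacheStats,
}

impl TokenTrie {
    /// Record that `block` holds KV state for every prefix of `tokens`.
    pub fn insert(&mut self, tokens: &[u32], block: usize) {
        let mut node = &mut self.root;
        for &t in tokens {
            node = node.children.entry(t).or_default();
            node.block = Some(block);
        }
    }

    /// Walk the trie as far as `tokens` matches; return the length of the
    /// longest cached prefix and its block, counting a hit or a miss.
    pub fn longest_prefix(&self, tokens: &[u32]) -> Option<(usize, usize)> {
        let mut node = &self.root;
        let mut best = None;
        for (i, t) in tokens.iter().enumerate() {
            match node.children.get(t) {
                Some(child) => {
                    node = child;
                    if let Some(b) = node.block {
                        best = Some((i + 1, b));
                    }
                }
                None => break,
            }
        }
        let counter = if best.is_some() { &self.stats.hits } else { &self.stats.misses };
        counter.fetch_add(1, Ordering::Relaxed);
        best
    }
}
```

On a hit, the scheduler would only need to prefill the tokens past the matched length. A PATRICIA trie stores runs of tokens per edge to reduce the node count, but the lookup walk follows the same idea.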

Why a separate branch (not direct merge of m2-j2-wiring)

The original m2-j2-wiring branch diverged from main before the recent docs/benchmarks landed (~28k lines have been added on main since), so merging it directly would delete journey-2026-04-24.md, theoretical-ceiling-2026-04-24.md, perf-tuning.md, sweep-bench.sh, stop_token_imend.rs (162 lines of tests) and other files. This branch cherry-picks only the 5 M2 commits.

Validation

  • cargo check --release --features server: clean (8 pre-existing warnings unchanged).
  • 22 prefix-cache unit tests inherited from M2 (pass on the source branch; re-run once merged).
  • Endpoint behaviour matrix:
    • Native scheduler absent → {enabled: false, reason: "native scheduler not active"}
    • CHIMERE_PREFIX_CACHE=0 → {enabled: false, reason: "prefix_trie absent"}
    • Kill switch → {enabled: false, reason: "kill switch"}
    • Trie write-locked → {enabled: true, busy: true}
    • Normal → full snapshot
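The behaviour matrix maps onto a handler shaped roughly like the sketch below. It assumes a hypothetical SchedulerView struct and a three-field Snapshot (the real endpoint returns more fields), and builds the JSON by hand to stay dependency-free. The key call is std's RwLock::try_read, which returns an error immediately while a writer holds the lock instead of waiting for it.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical snapshot; the real handler also reports len, evictions, etc.
pub struct Snapshot {
    pub hits: u64,
    pub misses: u64,
    pub cached_bytes: u64,
}

/// Hypothetical stand-in for the scheduler state the handler inspects.
pub struct SchedulerView {
    pub native_active: bool,
    pub kill_switch: bool,
    pub trie: Option<Arc<RwLock<Snapshot>>>,
}

/// Build the response body following the behaviour matrix, in the listed order.
pub fn prefix_cache_stats_json(s: &SchedulerView) -> String {
    if !s.native_active {
        return r#"{"enabled":false,"reason":"native scheduler not active"}"#.to_string();
    }
    let Some(trie) = &s.trie else {
        return r#"{"enabled":false,"reason":"prefix_trie absent"}"#.to_string();
    };
    if s.kill_switch {
        return r#"{"enabled":false,"reason":"kill switch"}"#.to_string();
    }
    // try_read never waits: if a writer holds the lock we answer "busy".
    // This sketch also maps a poisoned lock to "busy".
    match trie.try_read() {
        Ok(snap) => {
            let total = snap.hits + snap.misses;
            let hit_rate = if total == 0 { 0.0 } else { snap.hits as f64 / total as f64 };
            format!(
                r#"{{"enabled":true,"hits":{},"misses":{},"hit_rate":{:.3},"cached_bytes":{}}}"#,
                snap.hits, snap.misses, hit_rate, snap.cached_bytes
            )
        }
        Err(_) => r#"{"enabled":true,"busy":true}"#.to_string(),
    }
}
```

Checking the gates in this order means a disabled cache never touches the lock at all, and the read guard is held only long enough to copy a few integers.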

Test plan

  • Build with cargo build --release --features server.
  • Run with CHIMERE_PREFIX_CACHE=0 (default) → endpoint returns disabled.
  • Run with CHIMERE_PREFIX_CACHE=1 CHIMERE_MULTISLOT=4 CHIMERE_MULTISLOT_NATIVE=1 → endpoint returns live stats.
  • Send 4 identical prompts back-to-back → expect hits >= 3 after the first.
  • Run cargo test --features server prefix_cache (22 tests).
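The CHIMERE_PREFIX_CACHE gate exercised in the test plan could be parsed along these lines. This sketch covers only the on/off part of CacheConfig::from_env() and assumes that an unset variable or any value other than "1" means disabled, which matches the "=0 (default)" step but not necessarily every spelling the real parser accepts. The from_value helper is hypothetical; splitting it out keeps the parsing testable without mutating the process environment.

```rust
use std::env;

/// Hypothetical subset of CacheConfig: only the enable flag.
#[derive(Debug, PartialEq)]
pub struct CacheConfig {
    pub enabled: bool,
}

impl CacheConfig {
    /// Parse a raw CHIMERE_PREFIX_CACHE value. Assumption: only "1" enables
    /// the cache; unset, "0", or anything else leaves it off.
    pub fn from_value(raw: Option<&str>) -> Self {
        CacheConfig { enabled: matches!(raw.map(str::trim), Some("1")) }
    }

    /// Read the variable from the process environment.
    pub fn from_env() -> Self {
        let raw = env::var("CHIMERE_PREFIX_CACHE").ok();
        Self::from_value(raw.as_deref())
    }
}
```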

Follow-up (not in this PR)

  • Wire the Prometheus exporter to also surface chimere_prefix_cache_* metrics from the same trie snapshot.
  • Add a focused BENCHMARKS.md row: prefix-cache OFF vs ON for repeated-prefix workloads.
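For the Prometheus follow-up, rendering the same snapshot in the Prometheus text exposition format is mostly string formatting. The metric suffixes below (hits_total, misses_total, evictions_total, cached_bytes) are hypothetical names under the chimere_prefix_cache_* prefix, and Snapshot is a stand-in for whatever the trie actually exposes.

```rust
use std::fmt::Write;

/// Hypothetical snapshot mirroring fields of the JSON endpoint.
pub struct Snapshot {
    pub hits: u64,
    pub misses: u64,
    pub cached_bytes: u64,
    pub evictions: u64,
}

/// Render HELP/TYPE/sample lines for each metric (hypothetical metric names).
pub fn render_prometheus(s: &Snapshot) -> String {
    let metrics: [(&str, &str, &str, u64); 4] = [
        ("chimere_prefix_cache_hits_total", "counter", "Lookups that found a cached prefix.", s.hits),
        ("chimere_prefix_cache_misses_total", "counter", "Lookups with no cached prefix.", s.misses),
        ("chimere_prefix_cache_evictions_total", "counter", "Entries evicted to stay within the byte budget.", s.evictions),
        ("chimere_prefix_cache_cached_bytes", "gauge", "Bytes currently held by the prefix cache.", s.cached_bytes),
    ];
    let mut out = String::new();
    for (name, kind, help, value) in metrics {
        writeln!(out, "# HELP {name} {help}").unwrap();
        writeln!(out, "# TYPE {name} {kind}").unwrap();
        writeln!(out, "{name} {value}").unwrap();
    }
    out
}
```

Monotonic counts use the counter type with a _total suffix, while cached_bytes can drop on eviction and is therefore a gauge. hit_rate is left out because it can be derived in PromQL from the two counters.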

🤖 Generated with Claude Code

AIdevsmartdata and others added 6 commits April 25, 2026 19:47
Adds the missing observability endpoint for the M2 prefix cache. Wires
NativeScheduler::prefix_cache_stats_json() to a non-blocking GET handler
on /v1/prefix_cache_stats.

Behaviour:
- Native scheduler not active   → {enabled: false, reason: "..."}
- prefix_trie absent (env gate) → {enabled: false, reason: "..."}
- kill switch on                → {enabled: false, reason: "..."}
- trie write-locked             → {enabled: true, busy: true}
- normal                        → full snapshot (hits, misses, hit_rate,
                                  len, cached_bytes, total_query_tokens,
                                  avg_hit_tokens)

Uses try_read on the RwLock so the driver thread never blocks on the
endpoint. Closes the gap flagged in the audit
(ARCHITECTURE.md:511-512 "not landed" → now landed on this branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AIdevsmartdata AIdevsmartdata marked this pull request as ready for review April 25, 2026 19:40
@AIdevsmartdata AIdevsmartdata merged commit ec9fccf into main Apr 25, 2026
1 check failed
@AIdevsmartdata AIdevsmartdata deleted the m2-prefix-cache-rebased branch April 25, 2026 19:40

1 participant