M2: prefix cache (radix trie) + /v1/prefix_cache_stats endpoint #2
Merged
AIdevsmartdata merged 6 commits into main on Apr 25, 2026
…) [cherry-picked from 08141da]
…rt [cherry-picked from f457e36]
…erry-picked from 79f8e5c]
… reap [cherry-picked from c848e11]
…NativeScheduler [cherry-picked from c0c8931]
Adds the missing observability endpoint for the M2 prefix cache. Wires
NativeScheduler::prefix_cache_stats_json() to a non-blocking GET handler
on /v1/prefix_cache_stats.
Behaviour:
- Native scheduler not active → {enabled: false, reason: "..."}
- prefix_trie absent (env gate) → {enabled: false, reason: "..."}
- kill switch on → {enabled: false, reason: "..."}
- trie write-locked → {enabled: true, busy: true}
- normal → full snapshot (hits, misses, hit_rate, len, cached_bytes, total_query_tokens, avg_hit_tokens)
Uses try_read on the RwLock so the driver thread never blocks on the
endpoint. Closes the gap flagged in the audit
(ARCHITECTURE.md:511-512 "not landed" → now landed on this branch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
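The state handling described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from server.rs: it assumes `std::sync::RwLock` (the source does not say which RwLock is used), omits the "native scheduler not active" check, and the `PrefixTrie` fields, the `StatsResponse` enum, the `prefix_cache_stats` function, the "lock poisoned" reason, and the definition of hit_rate as hits / (hits + misses) are all hypothetical.

```rust
use std::sync::{Arc, RwLock, TryLockError};

/// Hypothetical stand-in for the real trie: only two counters.
pub struct PrefixTrie {
    pub hits: u64,
    pub misses: u64,
}

/// Hypothetical response type mirroring the endpoint states listed above.
#[derive(Debug, PartialEq)]
pub enum StatsResponse {
    Disabled { reason: &'static str },
    Busy,
    Snapshot { hits: u64, misses: u64, hit_rate: f64 },
}

/// Builds a stats response without ever waiting for the lock.
pub fn prefix_cache_stats(
    trie: Option<&Arc<RwLock<PrefixTrie>>>,
    kill_switch: bool,
) -> StatsResponse {
    // Env gate: no trie was built at startup.
    let trie = match trie {
        Some(t) => t,
        None => return StatsResponse::Disabled { reason: "prefix_trie absent" },
    };
    if kill_switch {
        return StatsResponse::Disabled { reason: "kill switch" };
    }
    // try_read returns immediately; a writer holding the lock yields WouldBlock.
    match trie.try_read() {
        Ok(guard) => {
            let total = guard.hits + guard.misses;
            // Assumed definition: hits / (hits + misses), 0.0 when nothing was queried.
            let hit_rate = if total == 0 {
                0.0
            } else {
                guard.hits as f64 / total as f64
            };
            StatsResponse::Snapshot { hits: guard.hits, misses: guard.misses, hit_rate }
        }
        Err(TryLockError::WouldBlock) => StatsResponse::Busy,
        // Assumption: a poisoned lock is reported as disabled rather than read.
        Err(TryLockError::Poisoned(_)) => StatsResponse::Disabled { reason: "lock poisoned" },
    }
}
```

Because the handler only ever calls `try_read`, a long write on the driver thread produces the busy response instead of stalling the HTTP request, and the request never queues behind the writer.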
Summary
- m2-j2-wiring rebased cleanly on current main.
- /v1/prefix_cache_stats endpoint (J6): closes the "not landed" gap flagged at ARCHITECTURE.md:511-512.

What's in
- prefix_cache.rs (775L): PATRICIA radix trie with byte-budget LRU eviction, CacheStats (atomic counters), KVBlock (token + KV handle), CacheConfig::from_env().
- slot_scheduler.rs: NativeScheduler now optionally owns Arc<RwLock<PrefixTrie>>; admission attempts longest-prefix lookup before scheduling; reap on slot return.
- llama_backend.rs: public FFI aliases for prefix cache (KV block reuse).
- chimere-server.rs main(): reads CacheConfig::from_env(), builds trie, passes to NativeScheduler::with_prefix_cache(...).
- server.rs: prefix_cache_stats_handler returning JSON (hits, misses, hit_rate, len, cached_bytes, evictions, etc.). Non-blocking via try_read. /v1/prefix_cache_stats registered in build_router.

Why a separate branch (not direct merge of m2-j2-wiring)
The original m2-j2-wiring branch diverged from main before the recent docs/benchmarks landed (~28k lines added on main since), so a brute merge would delete journey-2026-04-24.md, theoretical-ceiling-2026-04-24.md, perf-tuning.md, sweep-bench.sh, stop_token_imend.rs (162L tests) and others. This branch cherry-picks only the 5 M2 commits.

Validation
- cargo check --release --features server: clean (8 pre-existing warnings unchanged).
- {enabled: false, reason: "native scheduler not active"}
- CHIMERE_PREFIX_CACHE=0 → {enabled: false, reason: "prefix_trie absent"}
- {enabled: false, reason: "kill switch"}
- {enabled: true, busy: true}

Test plan
- cargo build --release --features server.
- CHIMERE_PREFIX_CACHE=0 (default) → endpoint returns disabled.
- CHIMERE_PREFIX_CACHE=1 CHIMERE_MULTISLOT=4 CHIMERE_MULTISLOT_NATIVE=1 → endpoint returns live stats.
- hits >= 3 after the first.
- cargo test --features server prefix_cache (22 tests).

Follow-up (not in this PR)
- chimere_prefix_cache_* metrics from the same trie snapshot.
- BENCHMARKS.md row: prefix-cache OFF vs ON for repeated-prefix workloads.

🤖 Generated with Claude Code
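For readers unfamiliar with the longest-prefix lookup that admission performs, and with the env gate exercised in the test plan, here is a simplified sketch. It is not the code in prefix_cache.rs. It uses a plain uncompressed trie keyed by token id rather than a PATRICIA trie, has no byte-budget LRU eviction, and the names `Node`, `TokenTrie`, `kv_handle`, and `build_trie` are hypothetical. Treating only the value "1" as enabled is also an assumption; the source states only that 0 is the default and disables the cache.

```rust
use std::collections::HashMap;

/// One trie node per token. A PATRICIA trie would collapse
/// single-child chains into one edge; that is omitted here.
#[derive(Default)]
pub struct Node {
    children: HashMap<u32, Node>,
    /// Hypothetical handle to a cached KV block ending at this node.
    kv_handle: Option<u64>,
}

#[derive(Default)]
pub struct TokenTrie {
    root: Node,
}

impl TokenTrie {
    /// Records that `tokens` has a reusable KV block identified by `kv_handle`.
    pub fn insert(&mut self, tokens: &[u32], kv_handle: u64) {
        let mut node = &mut self.root;
        for &t in tokens {
            node = node.children.entry(t).or_default();
        }
        node.kv_handle = Some(kv_handle);
    }

    /// Returns (matched_len, handle) for the longest cached prefix of `tokens`,
    /// or None when no cached sequence is a prefix of the query.
    pub fn longest_prefix(&self, tokens: &[u32]) -> Option<(usize, u64)> {
        let mut node = &self.root;
        let mut best = None;
        for (i, t) in tokens.iter().enumerate() {
            match node.children.get(t) {
                Some(next) => {
                    node = next;
                    // Remember the deepest node that carries a handle.
                    if let Some(h) = node.kv_handle {
                        best = Some((i + 1, h));
                    }
                }
                None => break,
            }
        }
        best
    }
}

/// Assumed env gate: build a trie only when the raw value is "1".
/// Unset, "0", or any other value leaves the cache absent.
pub fn build_trie(raw: Option<&str>) -> Option<TokenTrie> {
    match raw.map(str::trim) {
        Some("1") => Some(TokenTrie::default()),
        _ => None,
    }
}
```

At startup the gate could be fed with `build_trie(std::env::var("CHIMERE_PREFIX_CACHE").ok().as_deref())`. Taking the raw value as a parameter keeps the parsing rule testable without mutating the process environment.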