blog/2025-09-24_kvcache-wins-you-can-see.md (+3 -3)
@@ -120,9 +120,9 @@ This is precisely what llm-d provides (pun intended). It creates a **global view
### **How It Works: A Global Cache View via KVEvents**
-The global cache view is built upon a continuous stream of [**`KVEvents`**](https://docs.vllm.ai/en/latest/api/vllm/config/kv_events.html) from each vLLM pod, which are processed efficiently by the open-source [**`llm-d-kv-cache-manager`**](https://github.com/llm-d/llm-d-kv-cache-manager) library.
+The global cache view is built upon a continuous stream of [**`KVEvents`**](https://docs.vllm.ai/en/latest/api/vllm/config/kv_events.html) from each vLLM pod, which are processed efficiently by the open-source [**`llm-d-kv-cache`**](https://github.com/llm-d/llm-d-kv-cache) library.
-The `KVEvents` provide a live feed of all physical cache changes across the cluster, firing every time a cache block is created or evicted. This stream is then ingested and organized by the llm-d-kv-cache-manager library's components:
+The `KVEvents` provide a live feed of all physical cache changes across the cluster, firing every time a cache block is created or evicted. This stream is then ingested and organized by the llm-d-kv-cache library's components:
1. **`kvevents.Pool`**: This component consumes the high-throughput stream of events. As it digests them, it continuously updates a low-level **KV-Block Index**, which maintains a simple, real-time map from each block-hash to the pod and memory medium (GPU/CPU) where that block resides.
2. **`kvcache.Index`**: This is the higher-level index used by the scheduler. It uses the underlying KV-Block Index to map logical sequences of tokens (i.e., prefixes) to the pods that hold them. This provides the direct answer to the question, "what percentage of this request's prefix is on the accessible Pods?"
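
To make the two-level design concrete, here is a minimal Go sketch of how an event-fed KV-Block Index could answer the prefix-coverage question. All type, method, and variable names below are illustrative assumptions for this post, not the actual llm-d-kv-cache API.

```go
package main

import "fmt"

// Medium is the memory tier a cached KV block resides on.
type Medium string

const (
	GPU Medium = "gpu"
	CPU Medium = "cpu"
)

// BlockLocation names one pod holding a block, and on which medium.
type BlockLocation struct {
	Pod    string
	Medium Medium
}

// BlockIndex is the low-level map from block-hash to locations,
// kept current by the event consumer (the kvevents.Pool role).
type BlockIndex struct {
	locations map[uint64][]BlockLocation
}

func NewBlockIndex() *BlockIndex {
	return &BlockIndex{locations: make(map[uint64][]BlockLocation)}
}

// OnBlockStored handles a "block created" event from a pod.
func (ix *BlockIndex) OnBlockStored(hash uint64, loc BlockLocation) {
	ix.locations[hash] = append(ix.locations[hash], loc)
}

// OnBlockEvicted handles a "block evicted" event from a pod.
func (ix *BlockIndex) OnBlockEvicted(hash uint64, loc BlockLocation) {
	kept := ix.locations[hash][:0]
	for _, l := range ix.locations[hash] {
		if l != loc {
			kept = append(kept, l)
		}
	}
	if len(kept) == 0 {
		delete(ix.locations, hash)
	} else {
		ix.locations[hash] = kept
	}
}

// PrefixCoverage plays the kvcache.Index role for one request: given the
// request's block hashes in prefix order, it reports how many leading
// blocks each pod already holds (a contiguous-prefix match count).
func (ix *BlockIndex) PrefixCoverage(prefixHashes []uint64) map[string]int {
	coverage := make(map[string]int)
	alive := make(map[string]bool) // pods that have matched every block so far
	for i, h := range prefixHashes {
		next := make(map[string]bool)
		for _, l := range ix.locations[h] {
			if i == 0 || alive[l.Pod] {
				if !next[l.Pod] {
					next[l.Pod] = true
					coverage[l.Pod]++
				}
			}
		}
		alive = next
		if len(alive) == 0 {
			break // no pod holds the full prefix up to this block
		}
	}
	return coverage
}

func main() {
	ix := NewBlockIndex()
	ix.OnBlockStored(0xa1, BlockLocation{Pod: "pod-0", Medium: GPU})
	ix.OnBlockStored(0xb2, BlockLocation{Pod: "pod-0", Medium: GPU})
	ix.OnBlockStored(0xa1, BlockLocation{Pod: "pod-1", Medium: CPU})

	// pod-0 holds both leading blocks; pod-1 holds only the first.
	fmt.Println(ix.PrefixCoverage([]uint64{0xa1, 0xb2})) // map[pod-0:2 pod-1:1]
}
```

Dividing each pod's match count by the request's total block count gives the "percentage of the prefix already cached" signal the scheduler can score against.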
@@ -316,7 +316,7 @@ For this workload, in an ideal state, caching the shared prefixes for all active
This benchmark, therefore, tests the scheduler's ability to efficiently manage the disaggregated KV-cache. In a real-world scenario, if the total cache demand were to exceed the cluster's capacity, an autoscaling system would be responsible for spinning up more replicas to maintain SLOs. Here, we focus on **maximizing the performance of the existing hardware** - a task where cache-blind configurations create massive queues and high latency.
-The tools and specifics of the experiment are captured in this [llm-d-kv-cache-manager benchmarking report](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/benchmarking/73-capacity/README.md).
+The tools and specifics of the experiment are captured in this [llm-d-kv-cache benchmarking report](https://github.com/llm-d/llm-d-kv-cache/blob/main/benchmarking/73-capacity/README.md).
remote-content/remote-sources/components-data.yaml (+1 -0)
@@ -37,6 +37,7 @@ components:
    sidebarPosition: 5
    version: v1.3.4
  - name: llm-d-kv-cache-manager
+    # note: this is renamed to llm-d-kv-cache in > v0.4.0
    org: llm-d
    sidebarLabel: KV Cache Manager
    description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.