
Commit a792cc2

Rename llm-d-kv-cache-manager (#132)
Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>
1 parent 39288a4 commit a792cc2

File tree

2 files changed: +4 -3 lines changed


blog/2025-09-24_kvcache-wins-you-can-see.md

Lines changed: 3 additions & 3 deletions
@@ -120,9 +120,9 @@ This is precisely what llm-d provides (pun intended). It creates a **global view
 
 ### **How It Works: A Global Cache View via KVEvents**
 
-The global cache view is built upon a continuous stream of [**`KVEvents`**](https://docs.vllm.ai/en/latest/api/vllm/config/kv_events.html) from each vLLM pod, which are processed efficiently by the open-source [**`llm-d-kv-cache-manager`**](https://github.com/llm-d/llm-d-kv-cache-manager) library.
+The global cache view is built upon a continuous stream of [**`KVEvents`**](https://docs.vllm.ai/en/latest/api/vllm/config/kv_events.html) from each vLLM pod, which are processed efficiently by the open-source [**`llm-d-kv-cache`**](https://github.com/llm-d/llm-d-kv-cache) library.
 
-The `KVEvents` provide a live feed of all physical cache changes across the cluster, firing every time a cache block is created or evicted. This stream is then ingested and organized by the llm-d-kv-cache-manager library's components:
+The `KVEvents` provide a live feed of all physical cache changes across the cluster, firing every time a cache block is created or evicted. This stream is then ingested and organized by the llm-d-kv-cache library's components:
 
 1. **`kvevents.Pool`**: This component consumes the high-throughput stream of events. As it digests them, it continuously updates a low-level **KV-Block Index**, which maintains a simple, real-time map of block-hashes to the pod and memory-medium (GPU/CPU) it resides on.
 2. **`kvcache.Index`**: This is the higher-level index used by the scheduler. It uses the underlying KV-Block Index to map logical sequences of tokens (i.e., prefixes) to the pods that hold them. This provides the direct answer to the question, "what percentage of this request's prefix is on the accessible Pods?"
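The two-level lookup described in this hunk (an event-fed KV-Block Index, queried per request prefix) can be sketched roughly as follows. This is a hypothetical illustration, not the actual llm-d-kv-cache API: the `BlockIndex` and `Location` types, the event-handler methods, and the scoring function are all assumptions made for the sketch.

```go
package main

import "fmt"

// Location records where a cache block physically resides.
// Hypothetical type, not part of the real llm-d-kv-cache API.
type Location struct {
	Pod    string
	Medium string // "GPU" or "CPU"
}

// BlockIndex is a toy stand-in for the low-level KV-Block Index:
// a map from block hash to the locations currently holding it.
type BlockIndex struct {
	blocks map[uint64][]Location
}

func NewBlockIndex() *BlockIndex {
	return &BlockIndex{blocks: make(map[uint64][]Location)}
}

// OnBlockStored ingests a "block created" event from the event stream.
func (idx *BlockIndex) OnBlockStored(hash uint64, loc Location) {
	idx.blocks[hash] = append(idx.blocks[hash], loc)
}

// OnBlockRemoved ingests a "block evicted" event.
func (idx *BlockIndex) OnBlockRemoved(hash uint64, loc Location) {
	locs := idx.blocks[hash]
	for i, l := range locs {
		if l == loc {
			idx.blocks[hash] = append(locs[:i], locs[i+1:]...)
			return
		}
	}
}

// PrefixHitRatio answers the scheduler's question: per pod, what
// fraction of the request's leading blocks is already resident?
// A prefix match must be contiguous from the first block.
func (idx *BlockIndex) PrefixHitRatio(prefixHashes []uint64) map[string]float64 {
	hits := make(map[string]int)
	for i, h := range prefixHashes {
		for _, loc := range idx.blocks[h] {
			// Only extend a pod's run if it matched every earlier block.
			if hits[loc.Pod] == i {
				hits[loc.Pod] = i + 1
			}
		}
	}
	out := make(map[string]float64)
	for pod, n := range hits {
		out[pod] = float64(n) / float64(len(prefixHashes))
	}
	return out
}

func main() {
	idx := NewBlockIndex()
	idx.OnBlockStored(1, Location{"pod-a", "GPU"})
	idx.OnBlockStored(2, Location{"pod-a", "GPU"})
	idx.OnBlockStored(1, Location{"pod-b", "CPU"})

	// pod-a holds the first two of three prefix blocks; pod-b only the first.
	scores := idx.PrefixHitRatio([]uint64{1, 2, 3})
	fmt.Printf("pod-a: %.2f pod-b: %.2f\n", scores["pod-a"], scores["pod-b"])
}
```

The contiguity check matters: a pod holding block 2 but not block 1 gets no credit, mirroring how prefix caching only helps when the match starts at the beginning of the sequence.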
@@ -316,7 +316,7 @@ For this workload, in an ideal state, caching the shared prefixes for all active
 
 This benchmark, therefore, tests the scheduler's ability to efficiently manage the disaggregated KV-cache. In a real-world scenario, if the total cache demand were to exceed the cluster's capacity, an autoscaling system would be responsible for spinning up more replicas to maintain SLOs. Here, we focus on **maximizing the performance of the existing hardware** \- a task where cache-blind configurations create massive queues and high latency.
 
-The tools and specifics of the experiment are captured in this [llm-d-kv-cache-manager benchmarking report](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/benchmarking/73-capacity/README.md).
+The tools and specifics of the experiment are captured in this [llm-d-kv-cache benchmarking report](https://github.com/llm-d/llm-d-kv-cache/blob/main/benchmarking/73-capacity/README.md).
 
 ### **A.3: Indexing Scale Analysis**
 

remote-content/remote-sources/components-data.yaml

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ components:
     sidebarPosition: 5
     version: v1.3.4
   - name: llm-d-kv-cache-manager
+    # note: this is renamed to llm-d-kv-cache in > v0.4.0
     org: llm-d
     sidebarLabel: KV Cache Manager
     description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.
