Skip to content

Fix BatchRotatingKVCache.merge() OOB when prompt exceeds max_size#1052

Open
LxYuan0420 wants to merge 3 commits intoml-explore:mainfrom
LxYuan0420:fix/batch-rotating-kv-cache-merge-oob
Open

Fix BatchRotatingKVCache.merge() OOB when prompt exceeds max_size#1052
LxYuan0420 wants to merge 3 commits intoml-explore:mainfrom
LxYuan0420:fix/batch-rotating-kv-cache-merge-oob

Conversation

@LxYuan0420
Copy link
Copy Markdown

This PR is:

  • To fix BatchRotatingKVCache.merge() writing past the merged buffer when the prompt length exceeds max_size, and add regression tests covering the prefill and autoregressive wrap-around paths

Context:
merge() used c._idx (raw prompt length) to size the destination slice, but the merged buffer is shaped to max_size. When prompt_length > max_size, this caused an out-of-bounds write. The fix uses c.size() (min(offset, max_size)) and takes the trailing n entries from _temporal_order() output.

BatchRotatingKVCache.merge() crashes when a constituent RotatingKVCache
received a prompt longer than max_size on its first fill.

_update_concat stores every token without trimming on first fill (trimming
is deferred to the next call), leaving _idx == prompt_len > max_size.
merge() used _idx as the output-slice width, writing prompt_len tokens
into a max_size-wide buffer -> out-of-bounds write.

Reproducer: RotatingKVCache(max_size=70) fed a 128-token prefill raises
an index error when merge() is called.

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
_update_concat defers trimming on first fill, so after a prompt longer than
max_size, _idx == prompt_len > max_size. Using _idx as the output-slice width
writes past the end of the max_size-wide buffer.

c.size() (= min(offset, max_size)) is the correct width. The slice is taken
from the tail because _temporal_order returns tokens oldest-first; the sliding
window must retain the most-recent n, not the oldest.

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
Three tests cover the fix:
- Shape is capped at max_size when prompt exceeds max_size
- Most-recent tokens land in the merged cache, not the oldest
- Ring buffer is rolled into temporal order after autoregressive wrap-around

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant