Fix BatchRotatingKVCache.merge() OOB when prompt exceeds max_size #1052
Open
LxYuan0420 wants to merge 3 commits into ml-explore:main from
BatchRotatingKVCache.merge() crashes when a constituent RotatingKVCache received a prompt longer than max_size on its first fill.

_update_concat stores every token without trimming on the first fill (trimming is deferred to the next call), leaving _idx == prompt_len > max_size. merge() used _idx as the output-slice width, so it wrote prompt_len tokens into a max_size-wide buffer: an out-of-bounds write.

Reproducer: a RotatingKVCache(max_size=70) fed a 128-token prefill raises an index error when merge() is called.

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
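The deferred trimming described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRotatingCache` and `update_concat` are hypothetical stand-ins using plain Python lists, not the mlx-lm classes, and the trimming rule is simplified. Only the relationship between `_idx`, `offset`, and `size()` follows the commit message.

```python
class ToyRotatingCache:
    """Hypothetical stand-in for a rotating KV cache (not the mlx-lm class)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.tokens = []  # stands in for the stored key/value entries
        self.offset = 0   # total number of tokens ever seen
        self._idx = 0     # number of entries currently stored

    def update_concat(self, new_tokens):
        new_tokens = list(new_tokens)
        # Simplified assumption: previously stored entries are trimmed to the
        # window *before* appending, so a first fill is never trimmed.
        if self.tokens:
            self.tokens = self.tokens[-self.max_size:]
        self.tokens = self.tokens + new_tokens
        self.offset += len(new_tokens)
        self._idx = len(self.tokens)

    def size(self):
        # As stated in the PR: size() == min(offset, max_size)
        return min(self.offset, self.max_size)


cache = ToyRotatingCache(max_size=70)
cache.update_concat(range(128))  # 128-token prefill on first fill

print(cache._idx)    # 128: every prompt token is still stored
print(cache.size())  # 70: the width a max_size-wide buffer can hold
```

In this toy, `_idx` exceeds `max_size` immediately after the first fill, which is the state that merge() has to tolerate.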
_update_concat defers trimming on first fill, so after a prompt longer than max_size, _idx == prompt_len > max_size. Using _idx as the output-slice width writes past the end of the max_size-wide buffer. c.size() (= min(offset, max_size)) is the correct width.

The slice is taken from the tail because _temporal_order returns tokens oldest-first; the sliding window must retain the most-recent n, not the oldest.

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
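A minimal sketch of the width and tail-slice reasoning, using plain Python lists in place of tensors. The helper `copy_into_row` is hypothetical and exists only for illustration; right-aligning the entries in the destination row is also an assumption of this sketch, not something the commit message states.

```python
MAX_SIZE = 70
PROMPT_LEN = 128


def copy_into_row(row, ordered, n):
    """Copy the n most recent entries of `ordered` (oldest-first) into `row`.

    Hypothetical helper: raises IndexError when n exceeds the row width,
    mimicking an out-of-bounds write into a fixed-size buffer.
    """
    if n > len(row):
        raise IndexError(f"cannot write {n} entries into a row of width {len(row)}")
    start = len(ordered) - n                 # tail slice: keep the newest n
    row[len(row) - n:] = ordered[start:]
    return row


ordered = list(range(PROMPT_LEN))            # token 0 is oldest, 127 is newest

# Old behaviour: width taken from _idx (== prompt_len), which is too wide.
try:
    copy_into_row([None] * MAX_SIZE, ordered, n=PROMPT_LEN)
    old_width_failed = False
except IndexError:
    old_width_failed = True

# Fixed behaviour: width is min(offset, max_size), taken from the tail.
n = min(PROMPT_LEN, MAX_SIZE)
row = copy_into_row([None] * MAX_SIZE, ordered, n)

print(old_width_failed)  # True
print(row[0], row[-1])   # 58 127
```

Taking `ordered[:n]` instead would fit the buffer but keep tokens 0 to 69, the oldest ones, which is why the slice comes from the tail.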
Three tests cover the fix:
- Shape is capped at max_size when prompt exceeds max_size
- Most-recent tokens land in the merged cache, not the oldest
- Ring buffer is rolled into temporal order after autoregressive wrap-around

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
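The third case, rolling a wrapped ring buffer back into temporal order, can be sketched as follows. `ring_write` and `temporal_order` are hypothetical helpers over a plain list; the real cache has more state than this toy, so treat it as an illustration of the idea only.

```python
def ring_write(buf, idx, token):
    """Overwrite the slot at idx and return the next write position."""
    buf[idx] = token
    return (idx + 1) % len(buf)


def temporal_order(buf, idx):
    """Return entries oldest-first for a full buffer.

    Once the buffer has wrapped, the oldest entry sits at the write position.
    """
    return buf[idx:] + buf[:idx]


buf = [None] * 4
idx = 0
for token in range(6):  # six single-token steps into a window of four
    idx = ring_write(buf, idx, token)

print(buf)                       # [4, 5, 2, 3]
print(temporal_order(buf, idx))  # [2, 3, 4, 5]
```

A test in this spirit checks that the merged cache sees the rolled order rather than the raw storage order.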
This PR is:
Context:
merge() used c._idx (raw prompt length) to size the destination slice, but the merged buffer is shaped to max_size. When prompt_length > max_size, this caused an out-of-bounds write. The fix uses c.size() (min(offset, max_size)) and takes the trailing n entries from _temporal_order() output.