
[BugFix][KVCache] Add enc_dec_block_num to prefill_kvcache_block_num check#1

Closed
kevincheng2 wants to merge 2 commits into release/2.4 from
fix/prefill-kvcache-block-check

Conversation

@kevincheng2
Owner

Motivation

Cherry-pick from release/2.5: the original assertion only checked
`prefill_kvcache_block_num >= max_block_num_per_seq`, but for
encoder-decoder models the kvcache must also reserve blocks for the
encoder side (`enc_dec_block_num`). Without this check, the service
could silently allocate insufficient blocks for enc-dec sequences.

Modifications

  • `CacheConfig.postprocess`: tighten assertion to
    `prefill_kvcache_block_num >= max_block_num_per_seq + enc_dec_block_num`;
    the error message guides the user to reduce `max_model_len` or increase
    `num_gpu_blocks_override`
  • `CacheConfig.reset`: same tightening; the error message guides the user
    to reduce `max_model_len` or switch to larger GPU cards (the override
    is not applicable here)
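The tightened check can be sketched as follows. This is a minimal illustration using the field names from the description above; the class is a simplified stand-in, not FastDeploy's actual `CacheConfig` definition.

```python
# Simplified stand-in for CacheConfig, showing only the tightened assertion.
class CacheConfig:
    def __init__(self, prefill_kvcache_block_num, max_block_num_per_seq,
                 enc_dec_block_num):
        self.prefill_kvcache_block_num = prefill_kvcache_block_num
        self.max_block_num_per_seq = max_block_num_per_seq
        self.enc_dec_block_num = enc_dec_block_num

    def postprocess(self):
        # Old check: prefill_kvcache_block_num >= max_block_num_per_seq.
        # New check also reserves encoder-side blocks for enc-dec models.
        required = self.max_block_num_per_seq + self.enc_dec_block_num
        assert self.prefill_kvcache_block_num >= required, (
            f"prefill_kvcache_block_num ({self.prefill_kvcache_block_num}) "
            f"must be >= max_block_num_per_seq + enc_dec_block_num "
            f"({required}); reduce max_model_len or increase "
            f"num_gpu_blocks_override"
        )
```

With the old check, a config with 8 prefill blocks, `max_block_num_per_seq=6`, and `enc_dec_block_num=3` would pass (8 >= 6) yet leave the encoder side under-provisioned; the new check rejects it at startup instead.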

Usage or Command

No change to launch command. If the assertion fires, adjust:

```bash
# Option 1: reduce max_model_len
python -m fastdeploy.entrypoints.openai.api_server \
  --max-model-len <smaller_value> ...

# Option 2 (postprocess only): increase GPU block count
python -m fastdeploy.entrypoints.openai.api_server \
  --num-gpu-blocks-override <larger_value> ...
```

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. No new logic introduced, only assertion tightened.
  • Provide accuracy results. Not applicable.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

kevincheng2 and others added 2 commits March 30, 2026 20:47
…he_block_num check

## Motivation

Cherry-pick from release/2.5: the original assertion only checked
`prefill_kvcache_block_num >= max_block_num_per_seq`, but for
encoder-decoder models the kvcache must also reserve blocks for the
encoder side (`enc_dec_block_num`). Without this check, the service
could silently allocate insufficient blocks for enc-dec sequences.

## Modifications

- `CacheConfig.postprocess`: tighten assertion to
  `prefill_kvcache_block_num >= max_block_num_per_seq + enc_dec_block_num`
- `CacheConfig.reset`: same tightening
- Improve error message to guide users to reduce `max_model_len` or
  increase `num_gpu_blocks_override`

## Usage or Command

No change to launch command. If the assertion fires, adjust:

```bash
# Option 1: reduce max_model_len
python -m fastdeploy.entrypoints.openai.api_server \
  --max-model-len <smaller_value> ...

# Option 2: increase GPU block count
python -m fastdeploy.entrypoints.openai.api_server \
  --num-gpu-blocks-override <larger_value> ...
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…he_block_num check

## Motivation

Cherry-pick from release/2.5: the original assertion only checked
`prefill_kvcache_block_num >= max_block_num_per_seq`, but for
encoder-decoder models the kvcache must also reserve blocks for the
encoder side (`enc_dec_block_num`). Without this check, the service
could silently allocate insufficient blocks for enc-dec sequences.

## Modifications

- `CacheConfig.postprocess`: tighten assertion to
  `prefill_kvcache_block_num >= max_block_num_per_seq + enc_dec_block_num`,
  error message guides user to reduce `max_model_len` or increase
  `num_gpu_blocks_override`
- `CacheConfig.reset`: same tightening, error message guides user to
  reduce `max_model_len` or replace with larger GPU cards (override
  is not applicable here)

## Usage or Command

No change to launch command. If the assertion fires, adjust:

```bash
# Option 1: reduce max_model_len
python -m fastdeploy.entrypoints.openai.api_server \
  --max-model-len <smaller_value> ...

# Option 2 (postprocess only): increase GPU block count
python -m fastdeploy.entrypoints.openai.api_server \
  --num-gpu-blocks-override <larger_value> ...
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kevincheng2 kevincheng2 changed the title [Cherry-Pick][BugFix][KVCache] Add enc_dec_block_num to prefill_kvcache_block_num check [BugFix][KVCache] Add enc_dec_block_num to prefill_kvcache_block_num check Mar 30, 2026
@kevincheng2
Owner Author

Closing to resubmit with a clean branch.
