evict largest sequence on KV exhaustion and retry with deferred-commi…#211

Merged
mcharytoniuk merged 2 commits into main from kv-cache-pressure-eviction
Apr 16, 2026

@mcharytoniuk
Contributor

…t scheduler loop

Copilot AI review requested due to automatic review settings April 16, 2026 19:45
Contributor

Copilot AI left a comment

Pull request overview

Updates the continuous batch scheduler to handle KV cache exhaustion by evicting the largest sequence and retrying decode without unbounded recursion, and adjusts/streamlines related model tests and build targets.

Changes:

  • Reworked ContinuousBatchScheduler::execute_one_iteration to use a retry loop with “deferred commit” of per-request state until decode succeeds.
  • Added a KV-pressure eviction regression test and reformatted several existing continuous-batch tests.
  • Tweaked build ergonomics in the Makefile (new build / build.cuda targets) and simplified KV-cache-dtype tests.
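
The retry-with-deferred-commit pattern described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToyKvCache`, `ToyScheduler`, the `DecodeError` enum, and the capacity rule are hypothetical stand-ins, and only the names `execute_one_iteration` and `NoKvCacheSlot` come from the summary above. The sketch assumes "largest" means the sequence holding the most tokens in the cache.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type; only the `NoKvCacheSlot` name appears in the PR summary.
#[derive(Debug, PartialEq)]
enum DecodeError {
    NoKvCacheSlot,
}

/// Toy stand-in for a KV cache with a fixed number of token slots (assumption).
struct ToyKvCache {
    capacity: usize,
}

impl ToyKvCache {
    /// Succeeds only if every active sequence, plus one new token each, fits.
    fn decode(&self, sequences: &BTreeMap<u32, usize>) -> Result<(), DecodeError> {
        let needed: usize = sequences.values().map(|len| len + 1).sum();
        if needed > self.capacity {
            Err(DecodeError::NoKvCacheSlot)
        } else {
            Ok(())
        }
    }
}

/// Toy scheduler: maps a sequence id to the number of tokens it holds in the cache.
struct ToyScheduler {
    cache: ToyKvCache,
    sequences: BTreeMap<u32, usize>,
    evicted: Vec<u32>,
}

impl ToyScheduler {
    /// Retry in a loop instead of recursing; commit state only after decode succeeds.
    fn execute_one_iteration(&mut self) -> Result<(), DecodeError> {
        loop {
            // Stage the post-decode state without mutating `self.sequences` yet.
            let staged: BTreeMap<u32, usize> = self
                .sequences
                .iter()
                .map(|(id, len)| (*id, len + 1))
                .collect();

            match self.cache.decode(&self.sequences) {
                Ok(()) => {
                    // Deferred commit: state changes only once decode has succeeded.
                    self.sequences = staged;
                    return Ok(());
                }
                Err(DecodeError::NoKvCacheSlot) => {
                    // Pick the sequence that currently holds the most tokens.
                    let largest = self
                        .sequences
                        .iter()
                        .max_by_key(|(_, len)| **len)
                        .map(|(id, _)| *id);
                    match largest {
                        Some(id) => {
                            // Evict it, discard the staged state, and retry.
                            self.sequences.remove(&id);
                            self.evicted.push(id);
                        }
                        // Nothing left to evict: surface the error to the caller.
                        None => return Err(DecodeError::NoKvCacheSlot),
                    }
                }
            }
        }
    }
}
```

In this toy model, each failed attempt removes one sequence, so the loop terminates after at most as many retries as there are active sequences. Because the staged state is rebuilt on every attempt, a failed decode leaves no partially applied per-request updates behind.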

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| paddler_model_tests/tests/continuous_batch_shutdown_during_generation.rs | Removed a shutdown-during-generation test. |
| paddler_model_tests/tests/continuous_batch_clean_shutdown.rs | Removed a clean-shutdown-after-inference test. |
| paddler_model_tests/tests/continuous_batch_partial_offload.rs | Formatting-only changes to the partial-offload test. |
| paddler_model_tests/tests/continuous_batch_mixed_multimodal_plain.rs | Formatting-only changes to mixed plain/multimodal test. |
| paddler_model_tests/tests/continuous_batch_concurrent_multimodal.rs | Formatting-only changes to concurrent multimodal test. |
| paddler_model_tests/tests/continuous_batch_kv_pressure_eviction.rs | Added regression test targeting KV pressure eviction behavior. |
| paddler_model_tests/tests/continuous_batch_kv_cache_dtypes.rs | Reduced KV-cache-dtype coverage to a single test case. |
| paddler_integration_tests/tests/agent_cuda_clean_shutdown.rs | Formatting-only assertion change. |
| paddler/src/agent/continuous_batch_scheduler.rs | Core scheduler logic change: retry loop + deferred state commit after successful decode; eviction on NoKvCacheSlot. |
| paddler/src/agent/continuous_batch_embedding_processor.rs | Removed a redundant KV cache clear (still cleared per embedding-batch decode). |
| Makefile | Added build/build.cuda targets; adjusted target placement. |

Comment thread paddler_model_tests/tests/continuous_batch_kv_cache_dtypes.rs
Comment thread paddler_model_tests/tests/continuous_batch_kv_pressure_eviction.rs
Comment thread Makefile Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.


Comment thread paddler/src/agent/continuous_batch_scheduler.rs
Comment thread paddler_model_tests/tests/continuous_batch_kv_pressure_eviction.rs
@mcharytoniuk mcharytoniuk merged commit 9ab21ff into main Apr 16, 2026
14 checks passed
@mcharytoniuk mcharytoniuk deleted the kv-cache-pressure-eviction branch April 16, 2026 20:49
2 participants