
Conversation

@wooway777 (Collaborator)

resolves #125

Copilot AI left a comment

Pull request overview

This PR introduces a cache interface abstraction (issue #125) to enable support for multiple cache implementations (Dynamic, Paged, etc.) through a common interface. The refactoring extracts the existing DynamicCache implementation into a separate directory and establishes a factory pattern for cache creation.

Key Changes:

  • Introduced CacheInterface abstract base class with factory method for polymorphic cache creation
  • Refactored DynamicCache from a standalone class to implement CacheInterface
  • Updated cache references throughout the codebase to use the interface type instead of concrete types
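For readers skimming the diff, a minimal sketch of the shape these changes introduce is below. The names mirror the PR description, but the actual signatures in csrc/cache/cache_interface.hpp and csrc/cache/cache_factory.cpp may differ; infinicore::Tensor is stubbed out and the string-based create() parameter is an assumption.

```cpp
// Minimal sketch of the interface + factory described in this PR (assumed names;
// see cache_interface.hpp / cache_factory.cpp for the real definitions).
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace infinicore {
struct Tensor {};  // placeholder for the real infinicore::Tensor
}  // namespace infinicore

class CacheInterface {
public:
    virtual ~CacheInterface() = default;

    // Append new key/value tensors and return views over the full cached k/v.
    virtual std::pair<infinicore::Tensor, infinicore::Tensor>
    update(const infinicore::Tensor &k_new, const infinicore::Tensor &v_new) = 0;

    // Factory method: choose a concrete cache implementation from configuration.
    static std::unique_ptr<CacheInterface> create(const std::string &cache_type);
};

class DynamicCache final : public CacheInterface {
public:
    std::pair<infinicore::Tensor, infinicore::Tensor>
    update(const infinicore::Tensor &k_new, const infinicore::Tensor &v_new) override {
        // The real implementation (dynamic_cache.cpp) grows the cache and copies
        // k_new/v_new in at the current position; this stub just echoes the inputs.
        return {k_new, v_new};
    }
};

std::unique_ptr<CacheInterface> CacheInterface::create(const std::string &cache_type) {
    if (cache_type == "dynamic") {
        return std::make_unique<DynamicCache>();
    }
    // Additional implementations (e.g. a paged cache) would be dispatched here.
    throw std::runtime_error("Unknown cache type: " + cache_type);
}
```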

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Summary per file:

csrc/cache/cache_interface.hpp - New abstract interface defining the contract for all cache implementations
csrc/cache/cache_factory.cpp - Factory implementation for creating cache instances based on configuration
csrc/cache/cache.hpp - Updated to include both the interface and implementation headers
csrc/cache/dynamic_cache/dynamic_cache.hpp - Refactored DynamicCache header to inherit from CacheInterface
csrc/cache/dynamic_cache/dynamic_cache.cpp - Implementation moved from inline definitions to a separate compilation unit
csrc/cache/kv_cache.hpp - Removed (functionality moved to dynamic_cache/)
csrc/models/llama/llama_model.hpp - Updated the external_cache_ member to use the interface pointer
csrc/models/llama/llama_attention.hpp - Updated include to use the new cache.hpp header
csrc/engine/rank_worker.hpp - Updated cache_ptr_ to use the interface type
csrc/engine/rank_worker.cpp - Updated to create the cache through the factory method, with logging
csrc/pybind11/models/llama.hpp - Updated includes and reformatted method definitions
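As a complement to the sketch above, this hypothetical snippet shows how a consumer such as RankWorker might hold the cache through the interface pointer, per the rank_worker rows in the summary; the constructor shape, configuration parameter, and log line are illustrative assumptions, not the real csrc/engine/rank_worker.cpp code.

```cpp
// Hypothetical consumer-side sketch, built on the CacheInterface sketch above;
// the real RankWorker in csrc/engine/rank_worker.cpp will differ in detail.
#include <iostream>
#include <memory>
#include <string>

class RankWorker {
public:
    explicit RankWorker(const std::string &cache_type)
        : cache_ptr_(CacheInterface::create(cache_type)) {
        // Holding the interface pointer means switching from the dynamic cache to a
        // future paged cache is a configuration change, not a code change here.
        std::cout << "RankWorker: created cache of type " << cache_type << std::endl;
    }

private:
    std::unique_ptr<CacheInterface> cache_ptr_;  // interface type, not DynamicCache*
};
```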



// Copy new k/v into cache at current position
bool all_equal = cache_positions.empty() || std::equal(cache_positions.begin() + 1, cache_positions.end(), cache_positions.begin());
if (all_equal) {
Copilot AI Dec 12, 2025

If cache_positions is empty after ensure_capacity (which could happen if initial_batch_size is 0), accessing cache_positions[0] on line 143 would cause undefined behavior. Consider adding a check to ensure cache_positions is not empty before accessing it, or validate that initial_batch_size is at least 1 in ensure_capacity.

Suggested change:

  - if (all_equal) {
  + if (all_equal) {
  +     if (cache_positions.empty()) {
  +         throw std::runtime_error("KVCache update: cache_positions is empty after ensure_capacity, cannot update cache.");
  +     }
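For context, here is an illustrative, self-contained reconstruction of how the suggested guard sits relative to the cache_positions[0] access the comment is concerned about; only the all_equal computation comes from the excerpt above, the surrounding function and names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative reconstruction: only cache_positions and the all_equal check come
// from the excerpt above; everything else is assumed.
void copy_into_cache_at_common_position(const std::vector<std::size_t> &cache_positions) {
    bool all_equal = cache_positions.empty() ||
                     std::equal(cache_positions.begin() + 1, cache_positions.end(),
                                cache_positions.begin());
    if (all_equal) {
        if (cache_positions.empty()) {
            // Without this guard, cache_positions[0] below is undefined behavior
            // whenever ensure_capacity leaves the vector empty.
            throw std::runtime_error(
                "KVCache update: cache_positions is empty after ensure_capacity, cannot update cache.");
        }
        const std::size_t pos = cache_positions[0];  // safe: emptiness ruled out above
        (void)pos;  // ... copy k_new / v_new into the cache starting at `pos` ...
    }
}
```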

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.



*/
std::pair<infinicore::Tensor, infinicore::Tensor> update(
const infinicore::Tensor &k_new,
const infinicore::Tensor &v_new);
Copilot AI Dec 12, 2025

The update method that takes two parameters should be marked with the override keyword for consistency and clarity. The base class CacheInterface provides a virtual implementation of this method, and marking overrides explicitly helps prevent errors and improves code maintainability.

Suggested change:

  -     const infinicore::Tensor &v_new);
  +     const infinicore::Tensor &v_new) override;
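To illustrate the general C++ point behind this suggestion (not code from this PR, and with infinicore::Tensor stubbed out): with override, a later change to the base-class signature becomes a compile error instead of silently declaring an unrelated overload in the derived class.

```cpp
// General C++ illustration, not code from this PR; infinicore::Tensor is a stub.
#include <utility>

namespace infinicore {
struct Tensor {};  // placeholder for the real tensor type
}  // namespace infinicore

struct CacheInterface {
    virtual ~CacheInterface() = default;
    // Per the comment above, the base class provides a virtual default implementation.
    virtual std::pair<infinicore::Tensor, infinicore::Tensor>
    update(const infinicore::Tensor &k_new, const infinicore::Tensor &v_new) {
        return {k_new, v_new};
    }
};

struct DynamicCache : CacheInterface {
    // `override` asserts this really overrides a base-class virtual; if the base
    // signature ever changes, this fails to compile instead of silently becoming
    // a new, unrelated overload.
    std::pair<infinicore::Tensor, infinicore::Tensor>
    update(const infinicore::Tensor &k_new, const infinicore::Tensor &v_new) override {
        return {k_new, v_new};
    }
};
```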


Development

Successfully merging this pull request may close these issues.

[DEV] Unified Cache 2.0
