Conversation
📝 WalkthroughWalkthroughThis pull request introduces CXL (Compute Express Link) memory allocation support to the Mooncake system. It adds a new Changes
Sequence DiagramsequenceDiagram
participant Client
participant MasterService
participant SegmentManager
participant CxlAllocationStrategy
participant Allocator
Client->>MasterService: Create (with protocol="cxl")
MasterService->>MasterService: Load CXL config (enable_cxl, cxl_path, cxl_size)
MasterService->>SegmentManager: Initialize (enable_cxl=true)
SegmentManager->>Allocator: initializeCxlAllocator(cxl_path, cxl_size)
Allocator->>Allocator: Create CXL BufferAllocator
Client->>Client: setup_internal (protocol="cxl")
Client->>Client: GetBaseAddr() from transfer_engine
Client->>SegmentManager: MountSegment(ptr, size, protocol="cxl")
SegmentManager->>SegmentManager: Detect CXL protocol
SegmentManager->>Allocator: Use cxl_global_allocator_
Allocator->>Allocator: Register allocator for segment
Client->>MasterService: Put/BatchPut with CXL segment
MasterService->>CxlAllocationStrategy: Allocate (preferred_segments=[cxl_segment])
CxlAllocationStrategy->>Allocator: Allocate from CXL allocator
Allocator->>Allocator: Allocate buffer, mark as CXL
CxlAllocationStrategy->>MasterService: Return replica with CXL buffer
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In `@mooncake-store/include/allocation_strategy.h`:
- Around line 362-419: The CxlAllocationStrategy::Allocate currently always uses
preferred_segments[0] and returns one replica regardless of replica_num; update
Allocate to (1) validate that preferred_segments is non-empty and choose the
first preferred segment that is not in excluded_segments (if none available
return ErrorCode::INVALID_PARAMS or NO_AVAILABLE_HANDLE), (2) enforce
replica_num by attempting to allocate replica_num buffers from the selected
allocator (use allocator_manager.getAllocators(cxl_segment_name) and
BufferAllocatorBase::allocate in a loop), pushing each buffer after calling
change_to_cxl into replicas, and (3) on any per-replica allocation failure,
release/free already-allocated buffers and return
tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); keep existing logging and
error returns otherwise.
In `@mooncake-store/src/master_service.cpp`:
- Around line 85-91: The code sets allocation_strategy_ to CxlAllocationStrategy
then calls segment_manager_.initializeCxlAllocator(cxl_path_, cxl_size_) which
can throw, so wrap that initialization in a try-catch that catches
std::exception (and optionally ...) and handles failures by logging the error
(include e.what()), falling back to creating a RandomAllocationStrategy (or
disabling enable_cxl_), and ensuring allocation_strategy_ is left in a valid
state; alternatively modify initializeCxlAllocator to return a success/failure
boolean and handle the false case by logging and switching to
RandomAllocationStrategy; reference enable_cxl_, allocation_strategy_,
CxlAllocationStrategy, segment_manager_.initializeCxlAllocator, cxl_path_, and
cxl_size_ when making the change.
In `@mooncake-store/src/real_client.cpp`:
- Around line 265-334: MC_CXL_DEV_SIZE is not validated and
client_->GetBaseAddr() is not checked: ensure the parsed MC_CXL_DEV_SIZE yields
a positive non-zero cxl_dev_size (reject invalid/zero values and return
tl::unexpected(ErrorCode::INVALID_PARAMS) with a logged error) and check that
client_->GetBaseAddr() != nullptr before calling client_->MountSegment; if base
addr is null log an error and return tl::unexpected(ErrorCode::INVALID_PARAMS).
Update the cxl branch around the getenv/strtoull parsing and the pointer
retrieval (symbols: MC_CXL_DEV_SIZE, cxl_dev_size, client_->GetBaseAddr(),
client_->MountSegment) to perform these validations and early returns.
In `@mooncake-store/src/segment.cpp`:
- Around line 221-236: The unmount path skips calling
MasterMetricManager::instance().dec_total_mem_capacity when segment.protocol ==
"cxl", causing capacity to be incremented on mount but not decremented on
unmount; make the accounting symmetric by always decrementing total mem capacity
for the segment you found. In practice, ensure you only call
dec_total_mem_capacity when the segment was located (use the same found
condition used to set segment_name/is_cxl), and remove the is_cxl short-circuit
so that MasterMetricManager::instance().dec_total_mem_capacity(segment_name,
metrics_dec_capacity) is invoked for CXL segments as well (alternatively move
CXL capacity tracking to a one-time initializer if you intend per-system
accounting).
- Around line 12-38: The CXL branch in the mounting logic skips the duplicate
segment-id and size validation and can accept size==0, causing retries to
overwrite state and double-count metrics; before using segment.id/segment.name
and adding allocators/metrics in the CXL path (the block guarded by
segment_manager_->enable_cxl_ && segment.protocol == "cxl" and
BufferAllocatorType::CACHELIB), add the same guardrails used by the non‑CXL
path: verify size > 0, check that segment_manager_->mounted_segments_ does not
already contain segment.id (and/or segment_manager_->client_by_name_ does not
already map the name), and return the appropriate error code instead of
proceeding; only after those checks pass should you initialize allocator, call
allocator_manager_.addAllocator(...), update client_segments_, mounted_segments_
and client_by_name_, and increment metrics.
In `@mooncake-store/tests/client_integration_test.cpp`:
- Around line 145-146: Update the help text for the FLAGS_protocol flag to
explicitly include "cxl" (e.g., add "cxl" to the list of supported protocols in
the flag description) so it reflects that FLAGS_protocol is passed into
MountSegment; change both occurrences of the flag description associated with
FLAGS_protocol referenced in the tests to mention "cxl" to avoid confusion when
running CXL tests (search for FLAGS_protocol and the DEFINE_string/flag
declaration near its uses and update the help string).
In `@mooncake-store/tests/cxl_client_integration_test.cpp`:
- Around line 265-276: The test allocates 1 * 1024 * 1024 bytes via
client_buffer_allocator_->allocate but deallocates only test_data.size(),
causing a size mismatch; change the deallocation to match the allocated size
(e.g., store size_t allocated_size = 1 * 1024 * 1024 or use the same expression)
and call client_buffer_allocator_->deallocate(buffer, allocated_size) (instead
of deallocating test_data.size()) so allocate(...) and deallocate(...) use the
same size for buffer.
- Around line 90-104: The CreateClient function currently uses a hardcoded
master server address ("10.130.5.132:50051"); add a gflags string flag (e.g.,
DEFINE_string(master_server_address, "10.130.5.132:50051", "Master server
address for testing")) near the top of the test file and replace the literal in
CreateClient with FLAGS_master_server_address so tests can be configured per
environment (refer to CreateClient and the hardcoded address string to locate
the change).
- Around line 300-329: The test leaks memory because buffers allocated via
client_buffer_allocator_->allocate for each Slice stored in batched_slices are
never deallocated; after the BatchPut call completes (test_client_->BatchPut),
iterate the batched_slices (and any target_buffer if used) and call the matching
deallocation on the allocator for each Slice pointer (e.g., deallocate the void*
stored in each Slice) to free all allocated buffers before the test ends; ensure
you handle all entries in batched_slices and clear the container afterward.
- Around line 27-31: The test hardcodes an environment-specific IP in the
FLAGS/DEFINE defaults (DEFINE_string cxl_device_name and
transfer_engine_metadata_url) which breaks portability; update the DEFINE_string
defaults to use localhost or read from an environment variable (e.g.,
getenv("CXL_DEVICE_NAME") / getenv("TRANSFER_ENGINE_METADATA_URL")) instead of
"10.130.5.132", and in the test suite SetUpTestSuite (or equivalent test fixture
initialization) validate that the transfer_engine_metadata_url is present and
fail fast with a clear message if it is missing or empty; reference the
DEFINE_string symbols cxl_device_name and transfer_engine_metadata_url and the
SetUpTestSuite test fixture to locate where to change defaults and add the
validation.
- Around line 129-134: TearDownTestSuite is calling master_.Stop() although the
static InProcMaster master_ is never started; either remove the unused master_
member and its master_.Stop() call in TearDownTestSuite (also remove master_
from the class static members) or ensure the master is actually started by
adding a master_.Start() (and any necessary configuration) in SetUpTestSuite;
locate references to master_, InProcMaster, TearDownTestSuite and SetUpTestSuite
to apply the chosen fix consistently.
🧹 Nitpick comments (9)
mooncake-store/include/master_service.h (1)
915-918: Consider makingenable_cxl_const for consistency.
cxl_path_andcxl_size_are declaredconst, butenable_cxl_is not. If this is a configuration value set at construction and not modified afterwards, consider making itconst bool enable_cxl_;for consistency with the other CXL-related members.♻️ Suggested change
const std::string cxl_path_; const size_t cxl_size_; - bool enable_cxl_; + const bool enable_cxl_;mooncake-store/include/allocator.h (2)
75-76: Passclient_segment_nameby const reference.The
change_to_cxlmethod takesstd::stringby value. For efficiency, consider passing byconst std::string&to avoid unnecessary string copies.♻️ Suggested change
- void change_to_cxl(std::string client_segment_name); + void change_to_cxl(const std::string& client_segment_name);
80-83: Inconsistent member naming:protocolshould beprotocol_.The new
protocolmember at line 83 doesn't follow the naming convention used by other private members in this class (allocator_,buffer_ptr_,size_,offset_handle_,segment_name_). Renaming toprotocol_would maintain consistency.♻️ Suggested change
std::string segment_name_; void* buffer_ptr_{nullptr}; std::size_t size_{0}; - std::string protocol{"tcp"}; + std::string protocol_{"tcp"};mooncake-store/src/segment.cpp (1)
310-321: Guard against re-initializing the CXL allocatorRe-running initialization could replace the allocator while segments still use it. Consider an idempotency guard to prevent accidental reinit.
♻️ Suggested refinement
void SegmentManager::initializeCxlAllocator(const std::string& cxl_path, const size_t cxl_size) { + if (cxl_global_allocator_) { + LOG(WARNING) << "CXL global allocator already initialized"; + return; + } VLOG(1) << "Init CXL global allocator.";mooncake-store/src/allocator.cpp (1)
49-59: Add defensive checks before converting to CXL offset
change_to_cxlsubtractsDEFAULT_CXL_BASEunconditionally; if called twice or if the address is below the base, it can underflow and later free the wrong address. A small guard makes this safer.♻️ Suggested refinement
void AllocatedBuffer::change_to_cxl(std::string client_segment_name) { uint64_t offset_raw = reinterpret_cast<uintptr_t>(buffer_ptr_); + if (protocol == "cxl") { + LOG(WARNING) << "change_to_cxl called twice; ignoring"; + return; + } + if (offset_raw < DEFAULT_CXL_BASE) { + LOG(ERROR) << "CXL base underflow: " << offset_raw; + return; + } buffer_ptr_ = reinterpret_cast<void*>(offset_raw - DEFAULT_CXL_BASE); protocol = "cxl"; segment_name_ = client_segment_name; }mooncake-store/src/client_service.cpp (1)
1616-1617: Consider adding a null check for defensive programming.
GetBaseAddr()directly dereferencestransfer_engine_without verifying it's initialized. While the transfer engine is typically set duringClient::Create, calling this method before initialization completes could cause undefined behavior.🛡️ Optional defensive null check
-void* Client::GetBaseAddr() { return transfer_engine_->getBaseAddr(); } +void* Client::GetBaseAddr() { + if (!transfer_engine_) { + LOG(ERROR) << "GetBaseAddr called before transfer engine initialized"; + return nullptr; + } + return transfer_engine_->getBaseAddr(); +}mooncake-store/tests/cxl_client_integration_test.cpp (2)
172-195: Consider using RAII for log sink management.The manual
new/deletepattern fortest_client_sinkis error-prone. If an exception occurs between allocation and deletion, memory will leak. Usingstd::unique_ptrwould be safer.♻️ Suggested RAII-based approach
static void InitializeClients() { // This client is used for testing purposes. // Capture test_client_ client_id from logs - ClientIdCaptureSink* test_client_sink = new ClientIdCaptureSink(); + auto test_client_sink = std::make_unique<ClientIdCaptureSink>(); - google::AddLogSink(test_client_sink); + google::AddLogSink(test_client_sink.get()); test_client_ = CreateClient("localhost:17813"); ASSERT_TRUE(test_client_ != nullptr); // Wait for logs to flush std::this_thread::sleep_for(std::chrono::milliseconds(200)); - google::RemoveLogSink(test_client_sink); + google::RemoveLogSink(test_client_sink.get()); if (!test_client_sink->captured_client_id.empty()) { UUID extracted_id = ParseClientId(test_client_sink->captured_client_id); if (extracted_id.first != 0 || extracted_id.second != 0) { test_client_id_ = extracted_id; LOG(INFO) << "Captured test_client_id: " << FormatClientId(test_client_id_); } } - delete test_client_sink; + // unique_ptr automatically handles cleanup }
220-229: Remove unused static members.Several static members are declared but never used:
segment_ptr_,ram_buffer_size_,master_address_,metadata_url_, andis_cxl. This is dead code that adds confusion.♻️ Remove unused declarations
static std::unique_ptr<SimpleAllocator> client_buffer_allocator_; - static void* segment_ptr_; - static size_t ram_buffer_size_; static void* test_client_segment_ptr_; static size_t test_client_ram_buffer_size_; static uint64_t default_kv_lease_ttl_; - static InProcMaster master_; - static std::string master_address_; - static std::string metadata_url_; static UUID test_client_id_; - static inline bool is_cxl = false; }; // Static members initialization std::shared_ptr<Client> ClientIntegrationTestCxl::test_client_ = nullptr; -void* ClientIntegrationTestCxl::segment_ptr_ = nullptr; void* ClientIntegrationTestCxl::test_client_segment_ptr_ = nullptr; std::unique_ptr<SimpleAllocator> ClientIntegrationTestCxl::client_buffer_allocator_ = nullptr; -size_t ClientIntegrationTestCxl::ram_buffer_size_ = 0; size_t ClientIntegrationTestCxl::test_client_ram_buffer_size_ = 0; uint64_t ClientIntegrationTestCxl::default_kv_lease_ttl_ = 0; -InProcMaster ClientIntegrationTestCxl::master_; -std::string ClientIntegrationTestCxl::master_address_; -std::string ClientIntegrationTestCxl::metadata_url_; UUID ClientIntegrationTestCxl::test_client_id_{0, 0};mooncake-store/include/master_config.h (1)
531-547: Clarify intent with a comment for the CXL allocator override.The logic at lines 546-547 silently overrides the configured
memory_allocatortoCACHELIBwhen CXL is enabled. While this is likely intentional (CXL requires CACHELIB), a brief comment would help future maintainers understand this constraint.📝 Add explanatory comment
- auto cxl_allocator_type = BufferAllocatorType::CACHELIB; - + // CXL mode requires CACHELIB allocator for proper memory management + constexpr auto cxl_allocator_type = BufferAllocatorType::CACHELIB; default_kv_lease_ttl = config.default_kv_lease_ttl; ... memory_allocator = config.enable_cxl ? cxl_allocator_type : config.memory_allocator;
| class CxlAllocationStrategy : public AllocationStrategy { | ||
| public: | ||
| CxlAllocationStrategy() = default; | ||
| tl::expected<std::vector<Replica>, ErrorCode> Allocate( | ||
| const AllocatorManager& allocator_manager, const size_t slice_length, | ||
| const size_t replica_num = 1, | ||
| const std::vector<std::string>& preferred_segments = | ||
| std::vector<std::string>(), | ||
| const std::set<std::string>& excluded_segments = | ||
| std::set<std::string>()) { | ||
| if (slice_length == 0 || replica_num == 0) { | ||
| return tl::make_unexpected(ErrorCode::INVALID_PARAMS); | ||
| } | ||
|
|
||
| if (preferred_segments.empty()) { | ||
| LOG(ERROR) << "Preferred_segments is empty."; | ||
| return tl::make_unexpected(ErrorCode::INVALID_PARAMS); | ||
| } | ||
|
|
||
| const std::string& cxl_segment_name = preferred_segments[0]; | ||
|
|
||
| VLOG(1) << "Do cxl allocate, overwritten segment=" << cxl_segment_name; | ||
|
|
||
| const auto cxl_allocators = | ||
| allocator_manager.getAllocators(cxl_segment_name); | ||
|
|
||
| if (cxl_allocators == nullptr || cxl_allocators->size() == 0) { | ||
| return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); | ||
| } | ||
| std::shared_ptr<BufferAllocatorBase> cxl_allocator = | ||
| (*cxl_allocators)[0]; | ||
| if (!cxl_allocator) { | ||
| LOG(ERROR) << "No CXL allocator in preferred_segment"; | ||
| return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); | ||
| } | ||
|
|
||
| std::vector<Replica> replicas; | ||
| replicas.reserve(replica_num); | ||
|
|
||
| auto buffer = cxl_allocator->allocate(slice_length); | ||
| if (!buffer) { | ||
| return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); | ||
| } | ||
|
|
||
| buffer->change_to_cxl(cxl_segment_name); | ||
| replicas.emplace_back(std::move(buffer), ReplicaStatus::PROCESSING); | ||
|
|
||
| VLOG(1) << "Successfully allocated " << replicas.size() | ||
| << " CXL replica."; | ||
| return replicas; | ||
| } | ||
|
|
||
| tl::expected<Replica, ErrorCode> AllocateFrom( | ||
| const AllocatorManager& allocator_manager, const size_t slice_length, | ||
| const std::string& segment_name) { | ||
| return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); | ||
| } | ||
| }; |
There was a problem hiding this comment.
CXL allocation ignores exclusions and replica count
The strategy always allocates from preferred_segments[0] even if it’s excluded, and it silently returns a single replica even when replica_num > 1. That violates caller constraints and replication expectations. Validate exclusions and explicitly reject/handle replica counts > 1.
🔧 Suggested fix
const std::string& cxl_segment_name = preferred_segments[0];
+if (excluded_segments.contains(cxl_segment_name)) {
+ return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
+}
+if (replica_num != 1) {
+ LOG(ERROR) << "CXL allocation supports only 1 replica, replica_num="
+ << replica_num;
+ return tl::make_unexpected(ErrorCode::INVALID_PARAMS);
+}
VLOG(1) << "Do cxl allocate, overwritten segment=" << cxl_segment_name;🤖 Prompt for AI Agents
In `@mooncake-store/include/allocation_strategy.h` around lines 362 - 419, The
CxlAllocationStrategy::Allocate currently always uses preferred_segments[0] and
returns one replica regardless of replica_num; update Allocate to (1) validate
that preferred_segments is non-empty and choose the first preferred segment that
is not in excluded_segments (if none available return ErrorCode::INVALID_PARAMS
or NO_AVAILABLE_HANDLE), (2) enforce replica_num by attempting to allocate
replica_num buffers from the selected allocator (use
allocator_manager.getAllocators(cxl_segment_name) and
BufferAllocatorBase::allocate in a loop), pushing each buffer after calling
change_to_cxl into replicas, and (3) on any per-replica allocation failure,
release/free already-allocated buffers and return
tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE); keep existing logging and
error returns otherwise.
| if (enable_cxl_) { | ||
| allocation_strategy_ = std::make_shared<CxlAllocationStrategy>(); | ||
| segment_manager_.initializeCxlAllocator(cxl_path_, cxl_size_); | ||
| VLOG(1) << "action=start_cxl_global_allocator"; | ||
| } else { | ||
| allocation_strategy_ = std::make_shared<RandomAllocationStrategy>(); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -nP 'initializeCxlAllocator' --type=cpp --type=hppRepository: XucSh/Mooncake
Length of output: 85
🏁 Script executed:
rg -nP 'initializeCxlAllocator' --type=cppRepository: XucSh/Mooncake
Length of output: 375
🏁 Script executed:
# Read the function declaration
sed -n '220,230p' mooncake-store/include/segment.h
# Read the function implementation
sed -n '310,330p' mooncake-store/src/segment.cppRepository: XucSh/Mooncake
Length of output: 1088
🏁 Script executed:
# Read more context around the call in master_service.cpp
sed -n '75,100p' mooncake-store/src/master_service.cppRepository: XucSh/Mooncake
Length of output: 979
🏁 Script executed:
# Search for cxl_path_ initialization and validation
rg -nP 'cxl_path_\s*=' mooncake-store/src/master_service.cpp | head -20
# Search for CachelibBufferAllocator constructor
rg -nP 'CachelibBufferAllocator' --type=cpp -A 5 | head -40Repository: XucSh/Mooncake
Length of output: 3234
🏁 Script executed:
# Look for try-catch or other error handling in the initialization function
rg -nP 'try|catch|throw' mooncake-store/src/master_service.cpp | head -20
# Get the full context of the Init function
rg -nB 30 'if \(enable_cxl_\)' mooncake-store/src/master_service.cpp | grep -A 30 'void MasterService::Init'Repository: XucSh/Mooncake
Length of output: 518
🏁 Script executed:
# Find the complete Init function to see if it has try-catch
rg -nP 'void MasterService::Init' mooncake-store/src/master_service.cpp -A 60 | head -80Repository: XucSh/Mooncake
Length of output: 40
🏁 Script executed:
# Try finding Init function with different pattern
sed -n '1,100p' mooncake-store/src/master_service.cpp | tail -50Repository: XucSh/Mooncake
Length of output: 1982
Add error handling for CXL allocator initialization.
The initializeCxlAllocator() function has a void return type and contains no try-catch, yet it calls CachelibBufferAllocator which can throw exceptions if the size or base address is invalid (as documented in segment.cpp:73-74 and shown by the try-catch in addSegment()). If initialization fails, the service will proceed with CxlAllocationStrategy but a potentially uninitialized allocator. Either wrap the call in try-catch, change the return type to report errors, or validate cxl_path_ and cxl_size_ before calling the function.
🤖 Prompt for AI Agents
In `@mooncake-store/src/master_service.cpp` around lines 85 - 91, The code sets
allocation_strategy_ to CxlAllocationStrategy then calls
segment_manager_.initializeCxlAllocator(cxl_path_, cxl_size_) which can throw,
so wrap that initialization in a try-catch that catches std::exception (and
optionally ...) and handles failures by logging the error (include e.what()),
falling back to creating a RandomAllocationStrategy (or disabling enable_cxl_),
and ensuring allocation_strategy_ is left in a valid state; alternatively modify
initializeCxlAllocator to return a success/failure boolean and handle the false
case by logging and switching to RandomAllocationStrategy; reference
enable_cxl_, allocation_strategy_, CxlAllocationStrategy,
segment_manager_.initializeCxlAllocator, cxl_path_, and cxl_size_ when making
the change.
| if (protocol == "cxl") { | ||
| size_t cxl_dev_size = 0; | ||
| const char *env = std::getenv("MC_CXL_DEV_SIZE"); | ||
| if (env) { | ||
| char *end = nullptr; | ||
| unsigned long long val = strtoull(env, &end, 10); | ||
| if (end != env && *end == '\0') | ||
| cxl_dev_size = static_cast<size_t>(val); | ||
| } else { | ||
| ptr = | ||
| allocate_buffer_allocator_memory(segment_size, this->protocol); | ||
| } | ||
|
|
||
| if (!ptr) { | ||
| LOG(ERROR) << "Failed to allocate segment memory"; | ||
| LOG(FATAL) << "MC_CXL_DEV_SIZE not set"; | ||
| return tl::unexpected(ErrorCode::INVALID_PARAMS); | ||
| } | ||
| if (this->protocol == "ascend") { | ||
| ascend_segment_ptrs_.emplace_back(ptr); | ||
| } else if (should_use_hugepage) { | ||
| hugepage_segment_ptrs_.emplace_back( | ||
| ptr, HugepageSegmentDeleter{mapped_size}); | ||
| } else { | ||
| segment_ptrs_.emplace_back(ptr); | ||
| } | ||
| auto mount_result = client_->MountSegment(ptr, mapped_size); | ||
|
|
||
| void *ptr = client_->GetBaseAddr(); | ||
| LOG(INFO) << "Mounting CXL segment: " << cxl_dev_size << " bytes, " | ||
| << ptr; | ||
| auto mount_result = client_->MountSegment(ptr, cxl_dev_size, protocol); | ||
| if (!mount_result.has_value()) { | ||
| LOG(ERROR) << "Failed to mount segment: " | ||
| << toString(mount_result.error()); | ||
| return tl::unexpected(mount_result.error()); | ||
| } | ||
| } | ||
| if (total_glbseg_size == 0) { | ||
| LOG(INFO) << "Global segment size is 0, skip mounting segment"; | ||
|
|
||
| } else { | ||
| auto max_mr_size = globalConfig().max_mr_size; // Max segment size | ||
| uint64_t total_glbseg_size = global_segment_size; // For logging | ||
| uint64_t current_glbseg_size = 0; // For logging | ||
| while (global_segment_size > 0) { | ||
| size_t segment_size = std::min(global_segment_size, max_mr_size); | ||
| global_segment_size -= segment_size; | ||
| current_glbseg_size += segment_size; | ||
| LOG(INFO) << "Mounting segment: " << segment_size << " bytes, " | ||
| << current_glbseg_size << " of " << total_glbseg_size; | ||
|
|
||
| size_t mapped_size = segment_size; | ||
| void *ptr = nullptr; | ||
| if (should_use_hugepage) { | ||
| mapped_size = | ||
| align_up(segment_size, get_hugepage_size_from_env()); | ||
| ptr = allocate_buffer_mmap_memory(mapped_size, | ||
| get_hugepage_size_from_env()); | ||
| } else { | ||
| ptr = allocate_buffer_allocator_memory(segment_size, | ||
| this->protocol); | ||
| } | ||
|
|
||
| if (!ptr) { | ||
| LOG(ERROR) << "Failed to allocate segment memory"; | ||
| return tl::unexpected(ErrorCode::INVALID_PARAMS); | ||
| } | ||
| if (this->protocol == "ascend") { | ||
| ascend_segment_ptrs_.emplace_back(ptr); | ||
| } else if (should_use_hugepage) { | ||
| hugepage_segment_ptrs_.emplace_back( | ||
| ptr, HugepageSegmentDeleter{mapped_size}); | ||
| } else { | ||
| segment_ptrs_.emplace_back(ptr); | ||
| } | ||
| auto mount_result = | ||
| client_->MountSegment(ptr, mapped_size, protocol); | ||
| if (!mount_result.has_value()) { | ||
| LOG(ERROR) << "Failed to mount segment: " | ||
| << toString(mount_result.error()); | ||
| return tl::unexpected(mount_result.error()); | ||
| } | ||
| } | ||
| if (total_glbseg_size == 0) { | ||
| LOG(INFO) << "Global segment size is 0, skip mounting segment"; | ||
| } | ||
| } |
There was a problem hiding this comment.
Validate MC_CXL_DEV_SIZE and base address before mounting
If MC_CXL_DEV_SIZE is set but invalid (or zero), cxl_dev_size stays 0 and a zero-sized segment gets mounted. Also, GetBaseAddr() could return null. Validate the env var and size, and guard the base address.
🔧 Suggested fix
if (protocol == "cxl") {
- size_t cxl_dev_size = 0;
const char *env = std::getenv("MC_CXL_DEV_SIZE");
- if (env) {
- char *end = nullptr;
- unsigned long long val = strtoull(env, &end, 10);
- if (end != env && *end == '\0')
- cxl_dev_size = static_cast<size_t>(val);
- } else {
- LOG(FATAL) << "MC_CXL_DEV_SIZE not set";
- return tl::unexpected(ErrorCode::INVALID_PARAMS);
- }
+ if (!env) {
+ LOG(ERROR) << "MC_CXL_DEV_SIZE not set";
+ return tl::unexpected(ErrorCode::INVALID_PARAMS);
+ }
+ char *end = nullptr;
+ unsigned long long val = strtoull(env, &end, 10);
+ if (end == env || *end != '\0' || val == 0) {
+ LOG(ERROR) << "Invalid MC_CXL_DEV_SIZE: " << env;
+ return tl::unexpected(ErrorCode::INVALID_PARAMS);
+ }
+ size_t cxl_dev_size = static_cast<size_t>(val);
void *ptr = client_->GetBaseAddr();
+ if (!ptr) {
+ LOG(ERROR) << "CXL base address is null";
+ return tl::unexpected(ErrorCode::INTERNAL_ERROR);
+ }
LOG(INFO) << "Mounting CXL segment: " << cxl_dev_size << " bytes, "
<< ptr;🤖 Prompt for AI Agents
In `@mooncake-store/src/real_client.cpp` around lines 265 - 334, MC_CXL_DEV_SIZE
is not validated and client_->GetBaseAddr() is not checked: ensure the parsed
MC_CXL_DEV_SIZE yields a positive non-zero cxl_dev_size (reject invalid/zero
values and return tl::unexpected(ErrorCode::INVALID_PARAMS) with a logged error)
and check that client_->GetBaseAddr() != nullptr before calling
client_->MountSegment; if base addr is null log an error and return
tl::unexpected(ErrorCode::INVALID_PARAMS). Update the cxl branch around the
getenv/strtoull parsing and the pointer retrieval (symbols: MC_CXL_DEV_SIZE,
cxl_dev_size, client_->GetBaseAddr(), client_->MountSegment) to perform these
validations and early returns.
| // Check if cxl storage is enable | ||
| if (segment_manager_->enable_cxl_ && segment.protocol == "cxl") { | ||
| LOG(INFO) << "Start Mounting CXL Segment."; | ||
| if (segment_manager_->memory_allocator_ == | ||
| BufferAllocatorType::CACHELIB) { | ||
| auto allocator = segment_manager_->cxl_global_allocator_; | ||
| if (segment_manager_->cxl_global_allocator_ == nullptr) { | ||
| LOG(ERROR) << "Cxl global allocator has not been initialized."; | ||
| return ErrorCode::INTERNAL_ERROR; | ||
| } | ||
| segment_manager_->allocator_manager_.addAllocator(segment.name, | ||
| allocator); | ||
| segment_manager_->client_segments_[client_id].push_back(segment.id); | ||
| segment_manager_->mounted_segments_[segment.id] = { | ||
| segment, SegmentStatus::OK, allocator}; | ||
| segment_manager_->client_by_name_[segment.name] = client_id; | ||
| MasterMetricManager::instance().inc_total_mem_capacity(segment.name, | ||
| size); | ||
|
|
||
| LOG(INFO) << "[CXL Segment Mounted Successfully] Segment name: " | ||
| << segment.name | ||
| << ", Mount size: " << (size / 1024 / 1024 / 1024) | ||
| << " GB"; | ||
| return ErrorCode::OK; | ||
| } | ||
| return ErrorCode::INTERNAL_ERROR; | ||
| } |
There was a problem hiding this comment.
CXL MountSegment skips duplicate/size validation
The CXL branch bypasses the existing segment-id check and accepts size=0, so retries can overwrite state and double-count metrics. Add the same guardrails used by the non‑CXL path.
🔧 Suggested fix
if (segment_manager_->enable_cxl_ && segment.protocol == "cxl") {
+ if (size == 0) {
+ LOG(ERROR) << "size=0 is invalid for CXL segment";
+ return ErrorCode::INVALID_PARAMS;
+ }
+ auto exist_segment_it =
+ segment_manager_->mounted_segments_.find(segment.id);
+ if (exist_segment_it != segment_manager_->mounted_segments_.end()) {
+ auto& exist_segment = exist_segment_it->second;
+ if (exist_segment.status == SegmentStatus::OK) {
+ LOG(WARNING) << "segment_name=" << segment.name
+ << ", warn=segment_already_exists";
+ return ErrorCode::SEGMENT_ALREADY_EXISTS;
+ } else {
+ LOG(ERROR) << "segment_name=" << segment.name
+ << ", error=segment_already_exists_but_not_ok"
+ << ", status=" << exist_segment.status;
+ return ErrorCode::UNAVAILABLE_IN_CURRENT_STATUS;
+ }
+ }
LOG(INFO) << "Start Mounting CXL Segment.";
if (segment_manager_->memory_allocator_ ==
BufferAllocatorType::CACHELIB) {🤖 Prompt for AI Agents
In `@mooncake-store/src/segment.cpp` around lines 12 - 38, The CXL branch in the
mounting logic skips the duplicate segment-id and size validation and can accept
size==0, causing retries to overwrite state and double-count metrics; before
using segment.id/segment.name and adding allocators/metrics in the CXL path (the
block guarded by segment_manager_->enable_cxl_ && segment.protocol == "cxl" and
BufferAllocatorType::CACHELIB), add the same guardrails used by the non‑CXL
path: verify size > 0, check that segment_manager_->mounted_segments_ does not
already contain segment.id (and/or segment_manager_->client_by_name_ does not
already map the name), and return the appropriate error code instead of
proceeding; only after those checks pass should you initialize allocator, call
allocator_manager_.addAllocator(...), update client_segments_, mounted_segments_
and client_by_name_, and increment metrics.
| bool is_cxl = false; | ||
| auto&& segment = segment_manager_->mounted_segments_.find(segment_id); | ||
| if (segment != segment_manager_->mounted_segments_.end()) { | ||
| segment_name = segment->second.segment.name; | ||
| // Also remove from segment_name_client_id_map_ | ||
| segment_manager_->client_by_name_.erase(segment_name); | ||
| is_cxl = (segment->second.segment.protocol == "cxl"); | ||
| } | ||
| // Remove from mounted_segments_ | ||
| segment_manager_->mounted_segments_.erase(segment_id); | ||
|
|
||
| // Decrease the total capacity | ||
| MasterMetricManager::instance().dec_total_mem_capacity( | ||
| segment_name, metrics_dec_capacity); | ||
| if (!is_cxl) { | ||
| MasterMetricManager::instance().dec_total_mem_capacity( | ||
| segment_name, metrics_dec_capacity); | ||
| } |
There was a problem hiding this comment.
CXL capacity metrics become unbalanced on unmount
CXL mount increments total capacity, but unmount now skips decrement. Re-mounting (or multiple CXL segments) will inflate capacity metrics. Either keep accounting symmetric or move CXL capacity tracking to a one-time initializer.
🔧 Suggested fix (symmetry)
- if (!is_cxl) {
- MasterMetricManager::instance().dec_total_mem_capacity(
- segment_name, metrics_dec_capacity);
- }
+ MasterMetricManager::instance().dec_total_mem_capacity(
+ segment_name, metrics_dec_capacity);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bool is_cxl = false; | |
| auto&& segment = segment_manager_->mounted_segments_.find(segment_id); | |
| if (segment != segment_manager_->mounted_segments_.end()) { | |
| segment_name = segment->second.segment.name; | |
| // Also remove from segment_name_client_id_map_ | |
| segment_manager_->client_by_name_.erase(segment_name); | |
| is_cxl = (segment->second.segment.protocol == "cxl"); | |
| } | |
| // Remove from mounted_segments_ | |
| segment_manager_->mounted_segments_.erase(segment_id); | |
| // Decrease the total capacity | |
| MasterMetricManager::instance().dec_total_mem_capacity( | |
| segment_name, metrics_dec_capacity); | |
| if (!is_cxl) { | |
| MasterMetricManager::instance().dec_total_mem_capacity( | |
| segment_name, metrics_dec_capacity); | |
| } | |
| bool is_cxl = false; | |
| auto&& segment = segment_manager_->mounted_segments_.find(segment_id); | |
| if (segment != segment_manager_->mounted_segments_.end()) { | |
| segment_name = segment->second.segment.name; | |
| // Also remove from segment_name_client_id_map_ | |
| segment_manager_->client_by_name_.erase(segment_name); | |
| is_cxl = (segment->second.segment.protocol == "cxl"); | |
| } | |
| // Remove from mounted_segments_ | |
| segment_manager_->mounted_segments_.erase(segment_id); | |
| // Decrease the total capacity | |
| MasterMetricManager::instance().dec_total_mem_capacity( | |
| segment_name, metrics_dec_capacity); |
🤖 Prompt for AI Agents
In `@mooncake-store/src/segment.cpp` around lines 221 - 236, The unmount path
skips calling MasterMetricManager::instance().dec_total_mem_capacity when
segment.protocol == "cxl", causing capacity to be incremented on mount but not
decremented on unmount; make the accounting symmetric by always decrementing
total mem capacity for the segment you found. In practice, ensure you only call
dec_total_mem_capacity when the segment was located (use the same found
condition used to set segment_name/is_cxl), and remove the is_cxl short-circuit
so that MasterMetricManager::instance().dec_total_mem_capacity(segment_name,
metrics_dec_capacity) is invoked for CXL segments as well (alternatively move
CXL capacity tracking to a one-time initializer if you intend per-system
accounting).
| DEFINE_string(cxl_device_name, "/dev/dax0.0", "Device name for cxl"); | ||
| DEFINE_uint64(cxl_device_size, 8589934592, "Device Size for cxl"); | ||
| DEFINE_bool(auto_disc, false, "Auto discover tcp devices"); | ||
| DEFINE_string(transfer_engine_metadata_url, "etcd://10.130.5.132:2379", | ||
| "Metadata connection string for transfer engine"); |
There was a problem hiding this comment.
Hardcoded IP addresses reduce test portability.
The hardcoded IP address 10.130.5.132 in cxl_device_name default and transfer_engine_metadata_url will fail in environments where this specific infrastructure isn't available. Consider using environment variables or localhost for default values.
🔧 Suggested fix for portability
-DEFINE_string(cxl_device_name, "/dev/dax0.0", "Device name for cxl");
-DEFINE_uint64(cxl_device_size, 8589934592, "Device Size for cxl");
-DEFINE_bool(auto_disc, false, "Auto discover tcp devices");
-DEFINE_string(transfer_engine_metadata_url, "etcd://10.130.5.132:2379",
- "Metadata connection string for transfer engine");
+DEFINE_string(cxl_device_name, "/dev/dax0.0", "Device name for cxl");
+DEFINE_uint64(cxl_device_size, 8589934592, "Device Size for cxl");
+DEFINE_bool(auto_disc, false, "Auto discover tcp devices");
+DEFINE_string(transfer_engine_metadata_url, "",
+ "Metadata connection string for transfer engine (required)");Then validate in SetUpTestSuite that the URL is provided.
🤖 Prompt for AI Agents
In `@mooncake-store/tests/cxl_client_integration_test.cpp` around lines 27 - 31,
The test hardcodes an environment-specific IP in the FLAGS/DEFINE defaults
(DEFINE_string cxl_device_name and transfer_engine_metadata_url) which breaks
portability; update the DEFINE_string defaults to use localhost or read from an
environment variable (e.g., getenv("CXL_DEVICE_NAME") /
getenv("TRANSFER_ENGINE_METADATA_URL")) instead of "10.130.5.132", and in the
test suite SetUpTestSuite (or equivalent test fixture initialization) validate
that the transfer_engine_metadata_url is present and fail fast with a clear
message if it is missing or empty; reference the DEFINE_string symbols
cxl_device_name and transfer_engine_metadata_url and the SetUpTestSuite test
fixture to locate where to change defaults and add the validation.
| static std::shared_ptr<Client> CreateClient(const std::string& host_name) { | ||
| auto client_opt = Client::Create( | ||
| host_name, // Local hostname | ||
| FLAGS_transfer_engine_metadata_url, // Metadata connection string | ||
| FLAGS_protocol, // Transfer protocol | ||
| std::nullopt, // RDMA device names (auto-discovery) | ||
| "10.130.5.132:50051"); | ||
|
|
||
| EXPECT_TRUE(client_opt.has_value()) | ||
| << "Failed to create client with host_name: " << host_name; | ||
| if (!client_opt.has_value()) { | ||
| return nullptr; | ||
| } | ||
| return client_opt.value(); | ||
| } |
There was a problem hiding this comment.
Hardcoded master server address limits test flexibility.
The master server address "10.130.5.132:50051" at line 96 is hardcoded. This should be configurable to allow running tests in different environments.
🔧 Suggested fix
Add a flag at the top of the file:
DEFINE_string(master_server_address, "10.130.5.132:50051",
"Master server address for testing");Then use it in CreateClient:
auto client_opt = Client::Create(
host_name, // Local hostname
FLAGS_transfer_engine_metadata_url, // Metadata connection string
FLAGS_protocol, // Transfer protocol
std::nullopt, // RDMA device names (auto-discovery)
- "10.130.5.132:50051");
+ FLAGS_master_server_address);🤖 Prompt for AI Agents
In `@mooncake-store/tests/cxl_client_integration_test.cpp` around lines 90 - 104,
The CreateClient function currently uses a hardcoded master server address
("10.130.5.132:50051"); add a gflags string flag (e.g.,
DEFINE_string(master_server_address, "10.130.5.132:50051", "Master server
address for testing")) near the top of the test file and replace the literal in
CreateClient with FLAGS_master_server_address so tests can be configured per
environment (refer to CreateClient and the hardcoded address string to locate
the change).
| static void TearDownTestSuite() { | ||
| CleanupSegment(); | ||
| CleanupClients(); | ||
| master_.Stop(); | ||
| google::ShutdownGoogleLogging(); | ||
| } |
There was a problem hiding this comment.
master_.Stop() called but master never started.
The static InProcMaster master_ is never started via a Start() call, yet Stop() is called in teardown. This suggests either incomplete test setup or dead code. If the test relies on an external master (given the hardcoded IP), master_ shouldn't be a member at all.
🔧 Suggested fix
Either remove the unused master_ member and its Stop() call:
static void TearDownTestSuite() {
CleanupSegment();
CleanupClients();
- master_.Stop();
google::ShutdownGoogleLogging();
}And remove from static members:
-static InProcMaster master_;Or properly start the master in SetUpTestSuite if it should be used.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static void TearDownTestSuite() { | |
| CleanupSegment(); | |
| CleanupClients(); | |
| master_.Stop(); | |
| google::ShutdownGoogleLogging(); | |
| } | |
| static void TearDownTestSuite() { | |
| CleanupSegment(); | |
| CleanupClients(); | |
| google::ShutdownGoogleLogging(); | |
| } |
🤖 Prompt for AI Agents
In `@mooncake-store/tests/cxl_client_integration_test.cpp` around lines 129 - 134,
TearDownTestSuite is calling master_.Stop() although the static InProcMaster
master_ is never started; either remove the unused master_ member and its
master_.Stop() call in TearDownTestSuite (also remove master_ from the class
static members) or ensure the master is actually started by adding a
master_.Start() (and any necessary configuration) in SetUpTestSuite; locate
references to master_, InProcMaster, TearDownTestSuite and SetUpTestSuite to
apply the chosen fix consistently.
| buffer = client_buffer_allocator_->allocate(1 * 1024 * 1024); | ||
| slices.clear(); | ||
| slices.emplace_back(Slice{buffer, test_data.size()}); | ||
| // Verify data through Get operation | ||
| auto get_result = test_client_->Get(key, slices); | ||
| ASSERT_TRUE(get_result.has_value()) | ||
| << "Get operation failed: " << toString(get_result.error()); | ||
| ASSERT_EQ(slices.size(), 1); | ||
| ASSERT_EQ(slices[0].size, test_data.size()); | ||
| ASSERT_EQ(slices[0].ptr, buffer); | ||
| ASSERT_EQ(memcmp(slices[0].ptr, test_data.data(), test_data.size()), 0); | ||
| client_buffer_allocator_->deallocate(buffer, test_data.size()); |
There was a problem hiding this comment.
Allocation/deallocation size mismatch.
At line 265, 1 * 1024 * 1024 bytes are allocated, but at line 276, only test_data.size() bytes are deallocated. This mismatch may cause memory tracking issues in SimpleAllocator or memory leaks depending on the allocator implementation.
🐛 Fix the deallocation size
- buffer = client_buffer_allocator_->allocate(1 * 1024 * 1024);
+ const size_t get_buffer_size = 1 * 1024 * 1024;
+ buffer = client_buffer_allocator_->allocate(get_buffer_size);
slices.clear();
slices.emplace_back(Slice{buffer, test_data.size()});
// Verify data through Get operation
auto get_result = test_client_->Get(key, slices);
ASSERT_TRUE(get_result.has_value())
<< "Get operation failed: " << toString(get_result.error());
ASSERT_EQ(slices.size(), 1);
ASSERT_EQ(slices[0].size, test_data.size());
ASSERT_EQ(slices[0].ptr, buffer);
ASSERT_EQ(memcmp(slices[0].ptr, test_data.data(), test_data.size()), 0);
- client_buffer_allocator_->deallocate(buffer, test_data.size());
+ client_buffer_allocator_->deallocate(buffer, get_buffer_size);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| buffer = client_buffer_allocator_->allocate(1 * 1024 * 1024); | |
| slices.clear(); | |
| slices.emplace_back(Slice{buffer, test_data.size()}); | |
| // Verify data through Get operation | |
| auto get_result = test_client_->Get(key, slices); | |
| ASSERT_TRUE(get_result.has_value()) | |
| << "Get operation failed: " << toString(get_result.error()); | |
| ASSERT_EQ(slices.size(), 1); | |
| ASSERT_EQ(slices[0].size, test_data.size()); | |
| ASSERT_EQ(slices[0].ptr, buffer); | |
| ASSERT_EQ(memcmp(slices[0].ptr, test_data.data(), test_data.size()), 0); | |
| client_buffer_allocator_->deallocate(buffer, test_data.size()); | |
| const size_t get_buffer_size = 1 * 1024 * 1024; | |
| buffer = client_buffer_allocator_->allocate(get_buffer_size); | |
| slices.clear(); | |
| slices.emplace_back(Slice{buffer, test_data.size()}); | |
| // Verify data through Get operation | |
| auto get_result = test_client_->Get(key, slices); | |
| ASSERT_TRUE(get_result.has_value()) | |
| << "Get operation failed: " << toString(get_result.error()); | |
| ASSERT_EQ(slices.size(), 1); | |
| ASSERT_EQ(slices[0].size, test_data.size()); | |
| ASSERT_EQ(slices[0].ptr, buffer); | |
| ASSERT_EQ(memcmp(slices[0].ptr, test_data.data(), test_data.size()), 0); | |
| client_buffer_allocator_->deallocate(buffer, get_buffer_size); |
🤖 Prompt for AI Agents
In `@mooncake-store/tests/cxl_client_integration_test.cpp` around lines 265 - 276,
The test allocates 1 * 1024 * 1024 bytes via client_buffer_allocator_->allocate
but deallocates only test_data.size(), causing a size mismatch; change the
deallocation to match the allocated size (e.g., store size_t allocated_size = 1
* 1024 * 1024 or use the same expression) and call
client_buffer_allocator_->deallocate(buffer, allocated_size) (instead of
deallocating test_data.size()) so allocate(...) and deallocate(...) use the same
size for buffer.
| for (int i = 0; i < batch_sz; i++) { | ||
| keys.push_back("test_key_batch_put_" + std::to_string(i)); | ||
| test_data_list.push_back("test_data_" + std::to_string(i)); | ||
| } | ||
| void* buffer = nullptr; | ||
| void* target_buffer = nullptr; | ||
| batched_slices.reserve(batch_sz); | ||
| for (int i = 0; i < batch_sz; i++) { | ||
| std::vector<Slice> slices; | ||
| buffer = client_buffer_allocator_->allocate(test_data_list[i].size()); | ||
| memcpy(buffer, test_data_list[i].data(), test_data_list[i].size()); | ||
| slices.emplace_back(Slice{buffer, test_data_list[i].size()}); | ||
| batched_slices.push_back(std::move(slices)); | ||
| } | ||
| // Test Batch Put operation | ||
| ReplicateConfig config; | ||
| config.replica_num = 1; | ||
| auto start = std::chrono::high_resolution_clock::now(); | ||
| auto batch_put_results = | ||
| test_client_->BatchPut(keys, batched_slices, config); | ||
| // Check that all operations succeeded | ||
| for (const auto& result : batch_put_results) { | ||
| ASSERT_TRUE(result.has_value()) << "BatchPut operation failed"; | ||
| } | ||
| auto end = std::chrono::high_resolution_clock::now(); | ||
| LOG(INFO) << "Time taken for BatchPut: " | ||
| << std::chrono::duration_cast<std::chrono::microseconds>(end - | ||
| start) | ||
| .count() | ||
| << "us"; |
There was a problem hiding this comment.
Memory leak: batch put buffers are never deallocated.
The buffers allocated in the loop at lines 307-313 for batched_slices are never freed after the batch put operation completes. This causes a memory leak.
🐛 Add cleanup for batch put buffers
auto batch_put_results =
test_client_->BatchPut(keys, batched_slices, config);
// Check that all operations succeeded
for (const auto& result : batch_put_results) {
ASSERT_TRUE(result.has_value()) << "BatchPut operation failed";
}
auto end = std::chrono::high_resolution_clock::now();
LOG(INFO) << "Time taken for BatchPut: "
<< std::chrono::duration_cast<std::chrono::microseconds>(end -
start)
.count()
<< "us";
+
+ // Clean up batch put buffers
+ for (int i = 0; i < batch_sz; i++) {
+ client_buffer_allocator_->deallocate(batched_slices[i][0].ptr,
+ test_data_list[i].size());
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (int i = 0; i < batch_sz; i++) { | |
| keys.push_back("test_key_batch_put_" + std::to_string(i)); | |
| test_data_list.push_back("test_data_" + std::to_string(i)); | |
| } | |
| void* buffer = nullptr; | |
| void* target_buffer = nullptr; | |
| batched_slices.reserve(batch_sz); | |
| for (int i = 0; i < batch_sz; i++) { | |
| std::vector<Slice> slices; | |
| buffer = client_buffer_allocator_->allocate(test_data_list[i].size()); | |
| memcpy(buffer, test_data_list[i].data(), test_data_list[i].size()); | |
| slices.emplace_back(Slice{buffer, test_data_list[i].size()}); | |
| batched_slices.push_back(std::move(slices)); | |
| } | |
| // Test Batch Put operation | |
| ReplicateConfig config; | |
| config.replica_num = 1; | |
| auto start = std::chrono::high_resolution_clock::now(); | |
| auto batch_put_results = | |
| test_client_->BatchPut(keys, batched_slices, config); | |
| // Check that all operations succeeded | |
| for (const auto& result : batch_put_results) { | |
| ASSERT_TRUE(result.has_value()) << "BatchPut operation failed"; | |
| } | |
| auto end = std::chrono::high_resolution_clock::now(); | |
| LOG(INFO) << "Time taken for BatchPut: " | |
| << std::chrono::duration_cast<std::chrono::microseconds>(end - | |
| start) | |
| .count() | |
| << "us"; | |
| for (int i = 0; i < batch_sz; i++) { | |
| keys.push_back("test_key_batch_put_" + std::to_string(i)); | |
| test_data_list.push_back("test_data_" + std::to_string(i)); | |
| } | |
| void* buffer = nullptr; | |
| void* target_buffer = nullptr; | |
| batched_slices.reserve(batch_sz); | |
| for (int i = 0; i < batch_sz; i++) { | |
| std::vector<Slice> slices; | |
| buffer = client_buffer_allocator_->allocate(test_data_list[i].size()); | |
| memcpy(buffer, test_data_list[i].data(), test_data_list[i].size()); | |
| slices.emplace_back(Slice{buffer, test_data_list[i].size()}); | |
| batched_slices.push_back(std::move(slices)); | |
| } | |
| // Test Batch Put operation | |
| ReplicateConfig config; | |
| config.replica_num = 1; | |
| auto start = std::chrono::high_resolution_clock::now(); | |
| auto batch_put_results = | |
| test_client_->BatchPut(keys, batched_slices, config); | |
| // Check that all operations succeeded | |
| for (const auto& result : batch_put_results) { | |
| ASSERT_TRUE(result.has_value()) << "BatchPut operation failed"; | |
| } | |
| auto end = std::chrono::high_resolution_clock::now(); | |
| LOG(INFO) << "Time taken for BatchPut: " | |
| << std::chrono::duration_cast<std::chrono::microseconds>(end - | |
| start) | |
| .count() | |
| << "us"; | |
| // Clean up batch put buffers | |
| for (int i = 0; i < batch_sz; i++) { | |
| client_buffer_allocator_->deallocate(batched_slices[i][0].ptr, | |
| test_data_list[i].size()); | |
| } |
🤖 Prompt for AI Agents
In `@mooncake-store/tests/cxl_client_integration_test.cpp` around lines 300 - 329,
The test leaks memory because buffers allocated via
client_buffer_allocator_->allocate for each Slice stored in batched_slices are
never deallocated; after the BatchPut call completes (test_client_->BatchPut),
iterate the batched_slices (and any target_buffer if used) and call the matching
deallocation on the allocator for each Slice pointer (e.g., deallocate the void*
stored in each Slice) to free all allocated buffers before the test ends; ensure
you handle all entries in batched_slices and clear the container afterward.
Description
Type of Change
How Has This Been Tested?
Checklist
./scripts/code_format.shbefore submitting.Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.