Skip to content

add force flag for remove api#12

Closed
XucSh wants to merge 1 commit intomainfrom
ci-test-pr
Closed

add force flag for remove api#12
XucSh wants to merge 1 commit intomainfrom
ci-test-pr

Conversation

@XucSh
Copy link
Owner

@XucSh XucSh commented Jan 27, 2026

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added force removal capability to all removal operations, allowing items to be removed regardless of lease status or processing state, providing users with greater control over data management.
  • Tests

    • Expanded test coverage for force-removal scenarios across various conditions.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

The pull request extends remove operations across the mooncake store API by adding a force parameter to three key methods: Remove, RemoveByRegex, and RemoveAll. The force parameter (default false) bypasses lease and replication safety checks, threading through Python bindings, client implementations, service layers, and RPC wrappers down to the master service logic.

Changes

Cohort / File(s) Summary
Python Bindings
mooncake-integration/store/store_py.cpp
Extended remove, removeByRegex, and removeAll Python bindings to accept a force parameter and pass it through to underlying store methods
Client Service
mooncake-store/include/client_service.h, mooncake-store/src/client_service.cpp
Updated Remove, RemoveByRegex, RemoveAll signatures to include bool force parameter; delegates to master_client_ with force argument
Dummy Client
mooncake-store/include/dummy_client.h, mooncake-store/src/dummy_client.cpp
Added force parameter to remove, removeByRegex, removeAll; forwards to corresponding _internal RPC methods
Master Client
mooncake-store/include/master_client.h, mooncake-store/src/master_client.cpp
Extended Remove, RemoveByRegex, RemoveAll with force parameter; updated logging and RPC invocations to include force
Master Service
mooncake-store/include/master_service.h, mooncake-store/src/master_service.cpp
Added force parameter to Remove, RemoveByRegex, RemoveAll; conditionally skips lease, replica, and replication-task guards when force=true
PyClient Interface
mooncake-store/include/pyclient.h
Updated abstract virtual methods to include bool force = false parameter
Real Client
mooncake-store/include/real_client.h, mooncake-store/src/real_client.cpp
Extended public and internal remove methods (remove, removeByRegex, removeAll) with force parameter; threads through to client_-> calls
RPC Service
mooncake-store/include/rpc_service.h, mooncake-store/src/rpc_service.cpp
Updated WrappedMasterService Remove, RemoveByRegex, RemoveAll signatures with force parameter; forwards to master_service_ and updates logging
Tests
mooncake-store/tests/master_service_test.cpp
Added comprehensive test suite for force-removal scenarios: ForceRemoveLeasedObject, ForceRemoveByRegexLeasedObjects, ForceRemoveAllLeasedObjects, ForceRemoveProcessingObject, validating bypass of lease and processing constraints

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With whiskers twitching and a hop so bright,
We thread the force through code with might,
From Python bindings down so deep,
Past clients, masters, leases we sweep,
Force removal flows—no lease to keep!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is a blank template with no actual content filled in—no description of changes, no type selection, no testing information, and no checklist completion. Complete the description template by providing: a clear description of the changes and related issues, selecting the appropriate change type (likely 'New feature' under 'Mooncake Store'), explaining how changes were tested, and checking off completed checklist items.
Docstring Coverage ⚠️ Warning Docstring coverage is 19.61% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'add force flag for remove api' clearly and concisely describes the main change: adding a force parameter to remove-related methods across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mooncake-store/src/master_service.cpp (1)

1260-1381: Force removals orphan replication/processing state, causing logged errors in background cleanup.

When force=true, Remove/RemoveByRegex/RemoveAll erase metadata without clearing replication_tasks or processing_keys. This leaves stale task entries that background cleanup encounters and logs as errors ("Key X was removed with ongoing replication task"). Clear these structures alongside metadata erasure when force is enabled.

For Remove(), call accessor.EraseReplicationTask() and accessor.EraseFromProcessing() when force=true before erasing metadata. For RemoveByRegex() and RemoveAll(), explicitly erase keys from both shard->replication_tasks and shard->processing_keys before erasing metadata.

🧹 Nitpick comments (1)
mooncake-store/src/master_client.cpp (1)

530-539: Inconsistent log parameter name.

The log message uses key= but the parameter represents a regex pattern (str). This should be regex= for consistency with the WrappedMasterService::RemoveByRegex logging shown in the relevant snippets.

Suggested fix
 tl::expected<long, ErrorCode> MasterClient::RemoveByRegex(
     const std::string& str, bool force) {
     ScopedVLogTimer timer(1, "MasterClient::RemoveByRegex");
-    timer.LogRequest("key=", str, ", force=", force);
+    timer.LogRequest("regex=", str, ", force=", force);
 
     auto result =
         invoke_rpc<&WrappedMasterService::RemoveByRegex, long>(str, force);

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a force flag to the remove APIs (Remove, RemoveByRegex, and RemoveAll) in the Mooncake Store. The force flag allows bypassing safety checks (lease expiration, replica completion, and replication task checks) to enable forceful removal of objects even when they are in active use.

Changes:

  • Added a boolean force parameter (default=false) to Remove, RemoveByRegex, and RemoveAll methods across all API layers
  • Added comprehensive test coverage for force removal scenarios including leased objects and processing objects
  • Updated Python bindings to expose the force parameter with proper documentation

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
mooncake-store/tests/master_service_test.cpp Adds 4 new test cases covering force removal scenarios: leased objects, regex removal, remove all, and processing objects
mooncake-store/src/master_service.cpp Core implementation: adds force parameter and conditional checks that bypass safety validations when force=true
mooncake-store/src/rpc_service.cpp RPC wrapper: propagates force parameter and adds it to request logging
mooncake-store/src/master_client.cpp Master client: propagates force parameter and adds it to request logging
mooncake-store/src/client_service.cpp Client service: propagates force parameter through to master client
mooncake-store/src/real_client.cpp Real client implementation: adds force parameter to internal and public methods
mooncake-store/src/dummy_client.cpp Dummy client implementation: adds force parameter and propagates to real client
mooncake-store/include/master_service.h Header: adds force parameter with default value and updates documentation
mooncake-store/include/rpc_service.h Header: adds force parameter with default value to RPC service interface
mooncake-store/include/master_client.h Header: adds force parameter with default value and updates documentation
mooncake-store/include/client_service.h Header: adds force parameter with default value and updates documentation
mooncake-store/include/real_client.h Header: adds force parameter with default value to both public and internal methods
mooncake-store/include/dummy_client.h Header: adds force parameter with default value
mooncake-store/include/pyclient.h Header: adds force parameter with default value to virtual interface
mooncake-integration/store/store_py.cpp Python bindings: exposes force parameter with proper argument naming and documentation strings

Comment on lines +1323 to 1327
if (!force &&
metadata_shards_[i].replication_tasks.contains(it->first)) {
LOG(WARNING) << "key=" << it->first
<< ", matched by regex, but has replication "
"task. Skipping removal.";
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force removal bypasses safety checks in RemoveByRegex, consider adding logging to indicate this occurred for each object. The current logging at line 1332-1333 doesn't distinguish between normal and force removals. Adding information about which safety checks were bypassed (e.g., active lease, incomplete replicas, or replication tasks) would aid in troubleshooting and auditing.

Copilot uses AI. Check for mistakes.
Comment on lines +1363 to 1368
if (force || (it->second.IsLeaseExpired(now) &&
it->second.AllReplicas(&Replica::fn_is_completed) &&
!shard->replication_tasks.contains(it->first))) {
auto mem_rep_count =
it->second.CountReplicas(&Replica::fn_is_memory_replica);
total_freed_size += it->second.size * mem_rep_count;
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force=true is used in RemoveAll, consider logging additional information about the operation. The current log at line 1377-1379 doesn't indicate whether force was used or how many objects were removed despite having active leases or incomplete states. Adding a force parameter to the log would help with auditing and troubleshooting: VLOG(1) << "action=remove_all_objects, force=" << force << ", removed_count=" << removed_count << ", total_freed_size=" << total_freed_size;

Copilot uses AI. Check for mistakes.
Comment on lines +1280 to 1283
if (!force && accessor.HasReplicationTask()) {
LOG(ERROR) << "key=" << key << ", error=object_has_replication_task";
return tl::make_unexpected(ErrorCode::OBJECT_HAS_REPLICATION_TASK);
}
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force=true, the Remove method erases the object metadata without cleaning up any associated replication task (line 1286). This can leave orphaned replication tasks in the system that reference non-existent objects. Consider adding if (accessor.HasReplicationTask()) { accessor.EraseReplicationTask(); } before calling accessor.Erase() to properly clean up replication tasks even during force removal. This would prevent workers from fetching and attempting to execute tasks for objects that no longer exist.

Copilot uses AI. Check for mistakes.
Comment on lines +1323 to 1327
if (!force &&
metadata_shards_[i].replication_tasks.contains(it->first)) {
LOG(WARNING) << "key=" << it->first
<< ", matched by regex, but has replication "
"task. Skipping removal.";
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force=true, RemoveByRegex skips the replication task check but doesn't clean up any associated replication tasks before removing objects. This can leave orphaned replication tasks in the system. Consider adding cleanup code to erase replication tasks for matched keys when force=true: if (metadata_shards_[i].replication_tasks.contains(it->first)) { metadata_shards_[i].replication_tasks.erase(it->first); } before erasing the metadata. This would prevent workers from fetching tasks for objects that no longer exist.

Copilot uses AI. Check for mistakes.
!shard->replication_tasks.contains(it->first))) {
auto mem_rep_count =
it->second.CountReplicas(&Replica::fn_is_memory_replica);
total_freed_size += it->second.size * mem_rep_count;
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force=true, RemoveAll skips the replication task check but doesn't clean up any associated replication tasks before removing objects. This can leave orphaned replication tasks in the system. Consider adding cleanup code to erase replication tasks for objects being removed when force=true: if (shard->replication_tasks.contains(it->first)) { shard->replication_tasks.erase(it->first); } before erasing the metadata. This would prevent workers from fetching tasks for objects that no longer exist.

Suggested change
total_freed_size += it->second.size * mem_rep_count;
total_freed_size += it->second.size * mem_rep_count;
// When force=true, we may be removing objects that still have
// outstanding replication tasks. Clean them up to avoid
// leaving orphaned replication tasks.
if (force && shard->replication_tasks.contains(it->first)) {
shard->replication_tasks.erase(it->first);
}

Copilot uses AI. Check for mistakes.
Comment on lines +4276 to +4305
// Test force Remove with processing replicas (not complete)
TEST_F(MasterServiceTest, ForceRemoveProcessingObject) {
std::unique_ptr<MasterService> service_(new MasterService());
[[maybe_unused]] const auto context = PrepareSimpleSegment(*service_);
const UUID client_id = generate_uuid();

// Start a put but don't end it - object will be in PROCESSING state
std::string key = "processing_key";
uint64_t slice_length = 1024;
ReplicateConfig config;
config.replica_num = 1;
auto put_start_result =
service_->PutStart(client_id, key, slice_length, config);
ASSERT_TRUE(put_start_result.has_value());

// Normal remove should fail because replica is not ready
auto remove_result_no_force = service_->Remove(key, false);
EXPECT_FALSE(remove_result_no_force.has_value());
EXPECT_EQ(ErrorCode::REPLICA_IS_NOT_READY, remove_result_no_force.error());

// Force remove should succeed even with processing replica
auto remove_result_force = service_->Remove(key, true);
EXPECT_TRUE(remove_result_force.has_value());

// Verify object is removed
auto get_result = service_->GetReplicaList(key);
EXPECT_FALSE(get_result.has_value());
EXPECT_EQ(ErrorCode::OBJECT_NOT_FOUND, get_result.error());
}

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite is missing coverage for force removal of objects with active replication tasks. Consider adding a test case that creates a copy or move task for an object, then attempts to remove it with force=true. This would verify that force removal works correctly (or expose the orphaned task issue mentioned in other comments) when replication tasks are present. The test should verify both that the object is removed and check the state of the replication task afterward.

Copilot uses AI. Check for mistakes.
@XucSh XucSh closed this Jan 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments