Skip to content

Ci test pr#11

Closed
XucSh wants to merge 1 commit intomainfrom
ci-test-pr
Closed

Ci test pr#11
XucSh wants to merge 1 commit intomainfrom
ci-test-pr

Conversation

@XucSh
Copy link
Owner

@XucSh XucSh commented Jan 27, 2026

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

Summary by CodeRabbit

New Features

  • Added an optional "force" parameter to removal operations (remove, removeByRegex, removeAll) that bypasses safety checks for lease expiration and replica readiness validation
  • Enables immediate deletion of objects in any state without waiting for normal conditions to be met
  • Available across single-object, pattern-based, and bulk removal operations

Tests

  • Added test coverage for forced removal scenarios including leased objects and objects in processing states

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 27, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This PR adds a force boolean parameter (defaulting to false) to removal methods across the mooncake store system. The parameter propagates through Python bindings, client interfaces, RPC services, and master service implementations, enabling callers to bypass lease and replica readiness checks during object removal operations.

Changes

Cohort / File(s) Summary
Python Bindings
mooncake-integration/store/store_py.cpp
Updated remove(), removeByRegex(), and removeAll() bindings to accept optional force parameter (default false), release GIL, and forward the flag to underlying C++ methods.
Client Interface Headers
mooncake-store/include/client_service.h, dummy_client.h, master_client.h, real_client.h, rpc_service.h
Extended removal method signatures across all client interfaces with optional force parameter (default false) to support bypassing lease/replica checks.
Service Interface Header
mooncake-store/include/master_service.h
Added force parameter to Remove(), RemoveByRegex(), and RemoveAll() declarations with documentation; minor formatting adjustments to other method signatures.
Client Service Implementation
mooncake-store/src/client_service.cpp
Updated Client::Remove(), Client::RemoveByRegex(), and Client::RemoveAll() to accept and forward force parameter to master_client_ calls with explicit error handling.
Dummy Client Implementation
mooncake-store/src/dummy_client.cpp
Added force parameter to public methods and wired them to forward through RealClient internal calls.
Master Client Implementation
mooncake-store/src/master_client.cpp
Extended Remove(), RemoveByRegex(), and RemoveAll() with force parameter; updated RPC invocations and logging to include force flag.
Master Service Implementation
mooncake-store/src/master_service.cpp
Added control flow logic to skip lease expiration, replica readiness, and replication-task checks when force=true; otherwise preserves original validations for all three removal methods.
Real Client Implementation
mooncake-store/src/real_client.cpp
Updated public and internal methods (remove, removeByRegex, removeAll variants) to accept and propagate force parameter through call chain.
RPC Service Implementation
mooncake-store/src/rpc_service.cpp
Extended WrappedMasterService removal methods with force parameter; updated RPC execution paths and logging to include force flag propagation.
Master Service Tests
mooncake-store/tests/master_service_test.cpp
Added 6 test cases validating force removal: ForceRemoveLeasedObject, ForceRemoveByRegexLeasedObjects, ForceRemoveAllLeasedObjects, and ForceRemoveProcessingObject scenarios with assertions for lease and replica state bypassing.

Sequence Diagram(s)

sequenceDiagram
    participant Python as Python Caller
    participant PyBinding as Python Binding<br/>(store_py.cpp)
    participant Client as Client Service<br/>(client_service.cpp)
    participant RealClient as Real Client<br/>(real_client.cpp)
    participant MasterClient as Master Client<br/>(master_client.cpp)
    participant RPC as RPC Service<br/>(rpc_service.cpp)
    participant MasterSvc as Master Service<br/>(master_service.cpp)

    Python->>PyBinding: remove(key, force=False)
    PyBinding->>PyBinding: Release GIL
    PyBinding->>Client: Remove(key, force)
    Client->>MasterClient: Remove(key, force)
    MasterClient->>RPC: Remove(key, force)
    RPC->>MasterSvc: Remove(key, force)
    
    alt force == true
        MasterSvc->>MasterSvc: Skip lease/replica checks
        MasterSvc->>MasterSvc: Erase object directly
    else force == false
        MasterSvc->>MasterSvc: Validate lease expired
        MasterSvc->>MasterSvc: Validate replicas ready
        MasterSvc->>MasterSvc: Validate no replication tasks
        MasterSvc->>MasterSvc: Erase object if valid
    end
    
    MasterSvc-->>RPC: Result
    RPC-->>MasterClient: Result
    MasterClient-->>Client: Result
    Client-->>PyBinding: Result
    PyBinding->>PyBinding: Reacquire GIL
    PyBinding-->>Python: Return value
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A force flag hops through the store,
Bypassing checks that seemed before,
Leases and replicas step aside,
When forced removal leads the ride!
Tests validate the paths so new,
The rabbit grins—constraints we flew! 🌟

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request appears to be a comprehensive CI test and infrastructure improvement update with significant refactoring across multiple components.

Changes:

  • Split EP (Elastic Parallelism) module into separate PG (Process Group) module for better organization
  • Add support for PyTorch 2.10.0 and CUDA 13.0 build configurations
  • Enhance store service with force removal capabilities and retry mechanisms
  • Add new transport protocols (UBSHMEM, IntraNode NVLink) and improve protocol documentation
  • Improve build system with better glibc version detection and platform tag handling
  • Refactor mooncake-pg as a standalone module with version-specific builds

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Show a summary per file
File Description
.github/workflows/* Add CUDA 13 CI workflow, update PyTorch versions to include 2.10.0, improve integration test timeout handling
scripts/build_wheel.sh Enhance glibc version detection with multiple fallback methods for better platform compatibility
mooncake-pg/ New standalone process group module extracted from mooncake-ep for version-specific PyTorch builds
mooncake-wheel/mooncake/pg.py New module for PyTorch distributed backend with dynamic version-specific loading
mooncake-store/src/* Add force parameter to Remove/RemoveAll operations, improve file storage with retry logic and GC
mooncake-transfer-engine/ Add UBSHMEM and IntraNode NVLink transport support, improve RDMA endpoint handling
docs/source/getting_started/supported-protocols.md New comprehensive protocol documentation (325 lines)
mooncake-ep/src/mooncake_ep_buffer.cpp Improve GID index discovery for RDMA, add async stream synchronization
scripts/code_format.sh Require clang-format-20, add thirdparty to exclusions, extend file type coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments