Caution: Review failed. The pull request is closed.

📝 Walkthrough

This PR adds a …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Python as Python Caller
    participant PyBinding as Python Binding<br/>(store_py.cpp)
    participant Client as Client Service<br/>(client_service.cpp)
    participant RealClient as Real Client<br/>(real_client.cpp)
    participant MasterClient as Master Client<br/>(master_client.cpp)
    participant RPC as RPC Service<br/>(rpc_service.cpp)
    participant MasterSvc as Master Service<br/>(master_service.cpp)

    Python->>PyBinding: remove(key, force=False)
    PyBinding->>PyBinding: Release GIL
    PyBinding->>Client: Remove(key, force)
    Client->>MasterClient: Remove(key, force)
    MasterClient->>RPC: Remove(key, force)
    RPC->>MasterSvc: Remove(key, force)
    alt force == true
        MasterSvc->>MasterSvc: Skip lease/replica checks
        MasterSvc->>MasterSvc: Erase object directly
    else force == false
        MasterSvc->>MasterSvc: Validate lease expired
        MasterSvc->>MasterSvc: Validate replicas ready
        MasterSvc->>MasterSvc: Validate no replication tasks
        MasterSvc->>MasterSvc: Erase object if valid
    end
    MasterSvc-->>RPC: Result
    RPC-->>MasterClient: Result
    MasterClient-->>Client: Result
    Client-->>PyBinding: Result
    PyBinding->>PyBinding: Reacquire GIL
    PyBinding-->>Python: Return value
```
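The `alt` branch in the diagram can be modelled in a few lines. This is a minimal Python sketch, not the C++ in `master_service.cpp`. The `ObjectMeta` fields (`lease_expiry`, `replicas_ready`, `has_replication_task`), the `RemoveError` exception, and the `now` parameter are hypothetical stand-ins for the three validations the diagram lists.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ObjectMeta:
    # Hypothetical fields standing in for the three checks in the diagram.
    lease_expiry: float         # time (seconds) at which the lease ends
    replicas_ready: bool        # True once every replica has been written
    has_replication_task: bool  # True while a copy/move task is in flight


class RemoveError(Exception):
    """Raised when a removal fails (missing key or failed validation)."""


def remove(store: Dict[str, ObjectMeta], key: str, force: bool = False,
           now: Optional[float] = None) -> None:
    """Erase `key` from `store`, validating first unless `force` is True."""
    if key not in store:
        raise RemoveError(f"object not found: {key}")
    if not force:
        meta = store[key]
        current = time.time() if now is None else now
        if current < meta.lease_expiry:
            raise RemoveError("lease not expired")
        if not meta.replicas_ready:
            raise RemoveError("replicas not ready")
        if meta.has_replication_task:
            raise RemoveError("replication task in progress")
    # With force == True, execution reaches here without any of the checks above.
    del store[key]
```

In this model, `remove(store, key, force=True)` deletes an object even while its lease is still held. That matches the "Skip lease/replica checks" path in the diagram and shows why forced removal is opt-in.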
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Pull request overview
This pull request appears to be a broad update to CI testing and build infrastructure, with significant refactoring across multiple components.
Changes:
- Split the EP (Elastic Parallelism) module into a separate PG (Process Group) module for better organization
- Add support for PyTorch 2.10.0 and CUDA 13.0 build configurations
- Enhance the store service with forced removal and retry mechanisms
- Add new transport protocols (UBSHMEM, IntraNode NVLink) and improve protocol documentation
- Improve build system with better glibc version detection and platform tag handling
- Refactor mooncake-pg as a standalone module with version-specific builds
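The summary says only that mooncake-pg ships version-specific builds and that `pg.py` loads them dynamically. One way such a loader could choose a build at import time is sketched below. The module naming scheme (`mooncake.pg_<major>_<minor>`) and both function names are hypothetical, not taken from the PR.

```python
import importlib
import re
from typing import Optional


def pg_module_name(torch_version: str, package: str = "mooncake") -> str:
    """Map a version string such as '2.10.0+cu130' to a hypothetical
    extension module name such as 'mooncake.pg_2_10'."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    major, minor = match.groups()
    return f"{package}.pg_{major}_{minor}"


def load_pg(torch_version: Optional[str] = None):
    """Import the extension built for the running PyTorch version."""
    if torch_version is None:
        import torch  # only needed when no version string is passed in
        torch_version = str(torch.__version__)
    name = pg_module_name(torch_version)
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"no mooncake-pg build for PyTorch {torch_version} (expected {name})"
        ) from exc
```

The sketch parses major and minor as separate numbers rather than comparing string prefixes. A check like `version.startswith("2.1")` would wrongly match 2.10.0, which matters now that 2.10.0 is in the supported set.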
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `.github/workflows/*` | Add CUDA 13 CI workflow, update PyTorch versions to include 2.10.0, improve integration test timeout handling |
| `scripts/build_wheel.sh` | Enhance glibc version detection with multiple fallback methods for better platform compatibility |
| `mooncake-pg/` | New standalone process group module extracted from mooncake-ep for version-specific PyTorch builds |
| `mooncake-wheel/mooncake/pg.py` | New module providing a PyTorch distributed backend with dynamic version-specific loading |
| `mooncake-store/src/*` | Add force parameter to Remove/RemoveAll operations, improve file storage with retry logic and GC |
| `mooncake-transfer-engine/` | Add UBSHMEM and IntraNode NVLink transport support, improve RDMA endpoint handling |
| `docs/source/getting_started/supported-protocols.md` | New comprehensive protocol documentation (325 lines) |
| `mooncake-ep/src/mooncake_ep_buffer.cpp` | Improve GID index discovery for RDMA, add async stream synchronization |
| `scripts/code_format.sh` | Require clang-format-20, add thirdparty to exclusions, extend file type coverage |
Description
Type of Change
How Has This Been Tested?
Checklist
./scripts/code_format.sh before submitting.

Summary by CodeRabbit
New Features
Tests