Summary
PTOAS can already lower local routing primitives like TGATHER / TSCATTER, but MC2 routing ports still have no MLIR-level surface for the distributed communication context itself.
Current state
In the local 910B migration workspace:
- moe_distribute_dispatch and moe_distribute_combine now compile through PTO-DSL -> PTOAS -> bisheng for the real A2 contract (epWorldSize=8, H=7168).
- PTO source is explicit-sync-free and PTOAS insert-sync is active.
- The actual 8-rank routing benchmark still times out before any rank report is emitted, so failures are currently only visible at the outer harness level.
Gaps exposed by bring-up
- No MLIR-level op or abstraction for HCCL / parallel-group / window context used by MC2 collectives.
- No direct way to represent the communication boundary in PTO IR, so MC2 ports fall back to host-managed HCCL steps outside PTOAS.
- Diagnostics for multi-rank routing failures are too indirect; once ranks stall, PTOAS provides no comm-specific breadcrumbs to distinguish context/setup issues from lowered-kernel issues.
Requested work
- Add PTO IR / lowering surface for MC2 collective context and collective-style operations.
- Improve diagnostics around comm-lowered kernels so 8-rank routing failures can be attributed earlier and more precisely.
- Keep A2/A3 autosync behavior explicit for scalar-pipe comm instructions such as TSCATTER.
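
To make the diagnostics request concrete, here is a minimal host-side sketch of the kind of per-rank breadcrumb that would help attribute a stall. Nothing here is an existing PTOAS or HCCL API: the phase names, the RankBreadcrumbs class, and the classification strings are all hypothetical and only illustrate separating context/setup stalls from lowered-kernel stalls.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical phase names, ordered from earliest to latest.
# They are illustrative only and do not correspond to real PTOAS or HCCL stages.
PHASES = ["context_setup", "window_init", "kernel_launch", "kernel_done"]


@dataclass
class RankBreadcrumbs:
    """Records the last phase each rank reached (assumed design, not a PTOAS API)."""

    world_size: int
    last_phase: Dict[int, str] = field(default_factory=dict)

    def mark(self, rank: int, phase: str) -> None:
        """Record that `rank` reached `phase`."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world size {self.world_size}")
        self.last_phase[rank] = phase

    def classify(self) -> Dict[int, str]:
        """Attribute each rank's state after a timeout, based on its last breadcrumb."""
        report: Dict[int, str] = {}
        for rank in range(self.world_size):
            phase = self.last_phase.get(rank)
            if phase is None:
                report[rank] = "no breadcrumb: stalled before context setup"
            elif phase == "kernel_done":
                report[rank] = "completed"
            elif phase == "kernel_launch":
                report[rank] = "stalled in lowered kernel"
            else:
                report[rank] = "stalled in comm context/setup"
        return report


if __name__ == "__main__":
    crumbs = RankBreadcrumbs(world_size=8)
    for r in range(8):
        crumbs.mark(r, "context_setup")
    for r in range(6):
        crumbs.mark(r, "kernel_launch")
    # After the harness timeout fires, print one line per rank.
    for rank, status in crumbs.classify().items():
        print(rank, status)
```

With something like this emitted by the lowered kernels or the launch wrapper, the outer harness timeout would at least report which ranks never left setup and which stalled inside the kernel.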