InsertSync: ping/pong multibuffer support #196
TaoTao-real wants to merge 7 commits into zhangstevenunity:main from
Remote NPU validation (Ascend910, CANN 8.5.0, pto-isa 08a50c5):
Generated C++ now selects between the ping/pong addresses (int64) and builds a loop-local tile via
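The address selection described above can be pictured with a minimal C++ sketch. It assumes that loop-iteration parity picks the buffer. The base addresses `kPingAddr`/`kPongAddr` and the helpers `LoopLocalTile`/`makeLoopLocalTile` are hypothetical illustrations. In the real flow, PlanMemory assigns the addresses and the generated code constructs a pto-isa tile, not this stand-in struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base addresses; in the real flow PlanMemory assigns these.
constexpr int64_t kPingAddr = 0;
constexpr int64_t kPongAddr = 512;

// Hypothetical stand-in for the tile that generated code builds per
// iteration; here it only records which address it is bound to.
struct LoopLocalTile {
  int64_t addr;
};

// Even iterations use ping and odd iterations use pong, so two consecutive
// iterations never touch the same buffer.
inline LoopLocalTile makeLoopLocalTile(int64_t iv) {
  bool usePing = (iv % 2 == 0);
  int64_t addr = usePing ? kPingAddr : kPongAddr;  // int64 ternary select
  return LoopLocalTile{addr};
}
```

This sketch selects a plain integer address first and only then builds the tile from it. That ordering reflects the PR's stated aim of avoiding casts from tiles to raw pointers in the generated code.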
Checked and reconciled local multibuffer work with this PR.

- The PR head branch is already at the latest local commit (no extra local multibuffer commits are pending integration).
- The subset-based handwritten multibuffer regression is already included ().
- The local sample check passed for the two multibuffer Sync cases:
  - 
  - 

No additional code update is required for PR #196 at this point.
Follow-up (fixed formatting): Checked and reconciled local multibuffer work with this PR.
No additional code update is required for PR #196 at this point.
- Add `pto.multi_buffer=2` attr plumbing into PlanMemory (`alloc_tile -> memref.alloc`).
- Detect ping/pong via the planned-address overlap matrix and emit dynamic event-id set/wait.
- Add an EnableMultiBuffer pass to materialize loop-local ping/pong selection.
- Add a Sync sample + runop guard; fix the PTOViewToMemref typed-accessor crash for bitcast/treshape.
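The overlap-matrix detection can be sketched as follows. This assumes each planned buffer is a half-open byte range `[addr, addr + size)`. `Buffer`, `rangesOverlap`, `isPingPong`, and `eventIdNum` are hypothetical names, not PTOAS APIs.

Following the PR description, ping/pong is accepted only when two conditions hold:
- each buffer overlaps itself (`i == j`);
- no two distinct buffers overlap (`i != j`).

Otherwise the sketch falls back to a single event id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A planned buffer occupying the half-open range [addr, addr + size).
struct Buffer {
  int64_t addr;
  int64_t size;
};

// True when the two half-open ranges share at least one byte.
inline bool rangesOverlap(const Buffer &a, const Buffer &b) {
  return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

// Accept ping/pong only if the overlap matrix is exactly the identity:
// diagonal entries overlap, off-diagonal entries do not.
inline bool isPingPong(const std::vector<Buffer> &bufs) {
  if (bufs.size() != 2) return false;  // only multi_buffer = 2 is handled
  for (std::size_t i = 0; i < bufs.size(); ++i) {
    for (std::size_t j = 0; j < bufs.size(); ++j) {
      bool overlap = rangesOverlap(bufs[i], bufs[j]);
      if ((i == j) != overlap) return false;
    }
  }
  return true;
}

// Two event ids for a proven ping/pong pair, otherwise degrade to one.
inline int eventIdNum(const std::vector<Buffer> &bufs) {
  return isPingPong(bufs) ? 2 : 1;
}
```

Degrading to one event id when the proof fails keeps synchronization correct, at the cost of losing overlap between iterations.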
`emitc.call_opaque` requires an `IntegerAttr` placeholder to print SSA operands. Add the operand placeholder for `event_id` so `set_flag`/`wait_flag` receive the dynamic event argument, and extend the Sync multibuf runop guard to catch a missing third argument.
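To show why the third argument matters, the sketch below models the synchronization calls with host-side stubs that only record their arguments. The stubs (`setFlag`, `waitFlag`, `Pipe`, `FlagLog`, `runPingPongLoop`) are hypothetical and are not the pto-isa or CANN intrinsics. The event id is computed at runtime from the loop index, which only works if the lowered call actually receives it as an operand.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pipe identifiers; real code uses the toolchain's pipe enums.
enum class Pipe { MTE2, V };

struct FlagCall {
  Pipe src;
  Pipe dst;
  int64_t eventId;  // dynamic event id, passed as the third argument
};

// Records calls instead of synchronizing hardware pipes.
struct FlagLog {
  std::vector<FlagCall> sets;
  std::vector<FlagCall> waits;
};

inline void setFlag(FlagLog &log, Pipe src, Pipe dst, int64_t eventId) {
  log.sets.push_back({src, dst, eventId});
}

inline void waitFlag(FlagLog &log, Pipe src, Pipe dst, int64_t eventId) {
  log.waits.push_back({src, dst, eventId});
}

// Each iteration signals and waits on event id i % 2, so the ping and pong
// buffers are tracked by separate events.
inline FlagLog runPingPongLoop(int64_t iterations) {
  FlagLog log;
  for (int64_t i = 0; i < iterations; ++i) {
    int64_t eventId = i % 2;
    setFlag(log, Pipe::MTE2, Pipe::V, eventId);   // producer signals
    waitFlag(log, Pipe::MTE2, Pipe::V, eventId);  // consumer waits on same id
  }
  return log;
}
```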
- Track view-like alias closures (`bind_tile`/`subview`/casts) from the multi-address `pointer_cast`.
- Build the ping/pong selector in the LCA loop and rematerialize loop-local alias ops so tile allocations also switch between ping and pong.
- Update the multibuf pingpong sample to use `alloc_tile` + `TLOAD`/`TSTORE` so the generated C++ builds on A2/A3 pto-isa.
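Tracking the alias closure amounts to a worklist walk over uses. Starting from the multi-address `pointer_cast` result, every view-like user (bind_tile, subview, casts) joins the closure, and its result is walked in turn. These are the ops that then get rematerialized inside the LCA loop.

The sketch uses a hypothetical toy IR (`Op`, `OpKind`, integer value ids, `aliasClosure`) instead of MLIR's `Operation`/`Value` classes.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Toy op kinds; only view-like ops propagate the alias.
enum class OpKind { BindTile, Subview, Cast, Load, Store };

struct Op {
  OpKind kind;
  int operand;  // value id consumed
  int result;   // value id produced (-1 if none)
};

inline bool isViewLike(OpKind k) {
  return k == OpKind::BindTile || k == OpKind::Subview || k == OpKind::Cast;
}

// Returns the indices of view-like ops that transitively alias `root`.
inline std::set<int> aliasClosure(const std::vector<Op> &ops, int root) {
  std::set<int> closure;
  std::set<int> seenValues{root};
  std::queue<int> worklist;
  worklist.push(root);
  while (!worklist.empty()) {
    int v = worklist.front();
    worklist.pop();
    for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
      const Op &op = ops[i];
      if (op.operand != v || !isViewLike(op.kind)) continue;
      closure.insert(i);
      // Follow the view's result so views of views are also collected.
      if (op.result >= 0 && seenValues.insert(op.result).second)
        worklist.push(op.result);
    }
  }
  return closure;
}
```

Non-view users such as loads stop the walk. They consume the alias but do not create a new name for the buffer.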
Materialize ping/pong by selecting between i64 addresses and building a loop-local `PointerCastOp`. This keeps `bind_tile` lowering able to trace the defining `PointerCastOp` and avoids generating C++ that casts `Tile<>` to `__ubuf__` pointers (which breaks A2/A3 compilation). Update the multibuf runop guard accordingly.
The ping/pong base addresses are an implementation detail of PlanMemory. Check for at least two distinct int64 constants plus a ternary address selection, rather than hard-coding 0/512.
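The guard's idea can be expressed as a small text check over the generated C++. The actual guard lives in `runop.sh`. This C++ version is only an illustration.

It assumes the generated code declares addresses as `int64_t name = <literal>;` and selects between them with a `cond ? a : b` ternary. Those exact spellings, and the `looksLikePingPong` helper, are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// True if the text declares at least two distinct int64_t integer constants
// and contains a ternary selection between identifiers.
inline bool looksLikePingPong(const std::string &cpp) {
  // Matches e.g. "int64_t v2 = 512;" and captures the literal value.
  static const std::regex constRe(R"(int64_t\s+\w+\s*=\s*(-?\d+)\s*;)");
  // Matches e.g. "v0 ? v1 : v2".
  static const std::regex ternaryRe(R"(\w+\s*\?\s*\w+\s*:\s*\w+)");

  std::set<std::string> values;
  for (std::sregex_iterator it(cpp.begin(), cpp.end(), constRe), last;
       it != last; ++it) {
    values.insert((*it)[1].str());
  }
  return values.size() >= 2 && std::regex_search(cpp, ternaryRe);
}
```

Counting distinct constants, rather than matching specific values, keeps the check valid if PlanMemory later chooses different offsets.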
Force-pushed from a15da46 to 4183bf4.
      `[` $src_pipe `,` $dst_pipe `,` $event_id `]` attr-dict
    }];
    }
When adding a new op, be sure to also update the documentation in docs/PTO_IR_manual.md.
Summary
This change adds first-class double-buffer (ping/pong) support to PTOAS so that:
What’s included
- A `pto.multi_buffer = 2 : i32` attribute to request double buffering.
- Lowering `pto.alloc_tile -> memref.alloc + pto.bind_tile` propagates `pto.multi_buffer` onto the created `memref.alloc` so PlanMemory can observe it.
- PlanMemory reads `pto.multi_buffer` on `memref.alloc` and emits a multi-address `pto.pointer_cast(addrs=[ping, pong])`. Currently only the value `2` is supported; other values raise an error to avoid a silent miscompile.
- InsertSync enables `eventIdNum=2` only when the planned addresses satisfy a ping/pong overlap-matrix proof (`i==j` overlaps, `i!=j` does not overlap). Otherwise it safely degrades to single-buffer synchronization.
- New `pto.set_flag_dyn` / `pto.wait_flag_dyn` ops (the event id is an SSA `index`) with EmitC lowering.
- An EnableMultiBuffer pass materializes a loop-local `pto.pointer_cast`, so the runtime actually alternates between ping and pong. This also avoids emitting Tile->pointer casts that are not guaranteed to compile on A2/A3 toolchains.
- `test/samples/Sync/test_inject_sync_multibuf_pingpong.py` + `runop.sh` regression guard locking in:
- A fix for the typed-accessor crash on `pto.bitcast`/`pto.treshape` after `alloc_tile` lowering in `PTOViewToMemref`.

Testing
- `PYTHONPATH=... bash test/samples/runop.sh -t Sync`
- `PYTHONPATH=... bash test/samples/runop.sh all`

Notes
- Only `pto.multi_buffer = 2` is supported in the first version.