InsertSync: ping/pong multibuffer support #196

Open
TaoTao-real wants to merge 7 commits into zhangstevenunity:main from TaoTao-real:codex/multibuffer-pingpong

TaoTao-real commented Mar 5, 2026

Summary

This change adds first-class double-buffer (ping/pong) support to PTOAS so that:

  • frontends can explicitly mark a local buffer as multi-buffered,
  • PlanMemory can plan two addresses for it,
  • InsertSync can reliably detect true ping/pong (instead of heuristics) and emit dynamic event-id synchronization,
  • and the emitted C++ will actually switch between ping/pong buffers inside loops.

What’s included

  • IR contract: alloc-like buffers may carry pto.multi_buffer = 2 : i32.
  • Lowering plumbing: pto.alloc_tile -> memref.alloc + pto.bind_tile propagates pto.multi_buffer onto the created memref.alloc so PlanMemory can observe it.
  • PlanMemory: reads pto.multi_buffer on memref.alloc and emits a multi-address pto.pointer_cast(addrs=[ping, pong]). Only the value 2 is currently supported; any other value is rejected with an error to avoid a silent miscompile.
  • InsertSync memory modeling: pointer-cast base addresses are tracked as a set (all planned addresses), and alias/overlap checks for local memory are based on planned address ranges (conservative when unknown).
  • InsertSync multibuffer detection: for loop back-edge deps, eventIdNum=2 is enabled only when the planned addresses satisfy a ping/pong overlap-matrix proof (i==j overlaps, i!=j does not overlap). Otherwise it safely degrades to single-buffer.
  • Dynamic event-id sync ops: new pto.set_flag_dyn / pto.wait_flag_dyn (event-id is an SSA index) + EmitC lowering.
  • EnableMultiBuffer pass: materializes ping/pong by selecting between the planned i64 addresses in-loop and building a loop-local pto.pointer_cast (so runtime actually alternates ping/pong). This also avoids emitting Tile->pointer casts that are not guaranteed to compile on A2/A3 toolchains.
  • Tests: a new Python sample test/samples/Sync/test_inject_sync_multibuf_pingpong.py plus a runop.sh regression guard that locks in:
    • dynamic event-id usage,
    • ping/pong address selection in-loop,
    • and no Tile->pointer cast signature.
  • Crash fix: avoid ODS typed-accessor asserts when lowering pto.bitcast / pto.treshape after alloc_tile lowering in PTOViewToMemref.
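
The overlap-matrix condition from the InsertSync multibuffer detection item above can be sketched as follows. This is a minimal illustration, not PTOAS code: the function names, the (base, size) range representation, and the example addresses are assumptions made for the sketch.

```python
from typing import List, Tuple

# Hypothetical representation: (base address, size in bytes) of one planned slot.
Range = Tuple[int, int]


def ranges_overlap(a: Range, b: Range) -> bool:
    """Half-open ranges [base, base + size) overlap iff each starts before the other ends."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def is_ping_pong(src: List[Range], dst: List[Range]) -> bool:
    """Overlap-matrix proof: slot i of src must overlap slot j of dst exactly when i == j."""
    if len(src) != 2 or len(dst) != 2:
        return False
    return all(
        ranges_overlap(src[i], dst[j]) == (i == j)
        for i in range(2)
        for j in range(2)
    )


def event_id_num(src: List[Range], dst: List[Range]) -> int:
    """Use two event ids only when ping/pong is proven; otherwise degrade to one."""
    return 2 if is_ping_pong(src, dst) else 1
```

With example slots at 0 and 512 of 512 bytes each, `event_id_num` returns 2. If the two slots overlap each other, or only one address is planned, it returns 1, which corresponds to the single-buffer fallback.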

Testing

  • PYTHONPATH=... bash test/samples/runop.sh -t Sync
  • PYTHONPATH=... bash test/samples/runop.sh all

Notes

  • This PR intentionally only supports pto.multi_buffer = 2 for the first version.
  • If static address/size info is insufficient to prove ping/pong, InsertSync conservatively falls back to single-buffer synchronization.
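
The "only 2 is supported" rule in the notes above could look roughly like the sketch below. It is an assumption-laden model, not the PlanMemory implementation: `plan_addresses` is a hypothetical helper, and placing pong directly after ping is chosen only for illustration, since the real base addresses are an implementation detail of PlanMemory.

```python
from typing import List, Optional


def plan_addresses(next_free: int, size: int,
                   multi_buffer: Optional[int] = None) -> List[int]:
    """Return the planned base addresses for one allocation (hypothetical helper)."""
    if multi_buffer is None:
        # No pto.multi_buffer attribute: plan a single address.
        return [next_free]
    if multi_buffer != 2:
        # Reject instead of silently planning something the later passes cannot handle.
        raise ValueError(f"unsupported pto.multi_buffer value: {multi_buffer}")
    # Assumption for this sketch only: ping and pong are placed back to back.
    return [next_free, next_free + size]
```

Failing loudly on unsupported values keeps a frontend from believing it got N-way buffering when only one or two addresses were planned.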

TaoTao-real changed the title from "InsertSync: ping/pong multibuffer support" to "[WIP] InsertSync: ping/pong multibuffer support" on Mar 5, 2026
@TaoTao-real

Remote NPU validation (Ascend910, CANN 8.5.0, pto-isa 08a50c5):

  • testcase test_inject_sync_multibuf_pingpong: OK (ran 3 times; each run does 2 executions + strict compare)

Generated C++ now selects between ping/pong addresses (int64) and builds a loop-local tile via TASSIGN, avoiding Tile->pointer casts that can fail to compile on A2/A3 toolchains.
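
The in-loop selection described above can be modeled in a few lines. This is a sketch under stated assumptions: the PR does not show how the selector is computed, so using the parity of the normalized iteration index is a guess, and `iteration_slot` and `schedule` are hypothetical names.

```python
from typing import List, Tuple


def iteration_slot(iv: int, lower: int, step: int) -> int:
    """Assumed selector: parity of the normalized iteration index."""
    return ((iv - lower) // step) % 2


def schedule(lower: int, upper: int, step: int,
             ping: int, pong: int) -> List[Tuple[int, int, int]]:
    """List (induction value, slot, selected address) for each loop iteration."""
    result = []
    for iv in range(lower, upper, step):
        slot = iteration_slot(iv, lower, step)
        # Mirrors a ternary select between the two planned int64 addresses.
        addr = pong if slot else ping
        result.append((iv, slot, addr))
    return result
```

In this model the slot value could also serve as the dynamic event id, so that synchronization and buffer selection alternate together.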

@TaoTao-real

Checked and reconciled local multibuffer work with this PR.

  • PR head branch is already at local latest commit (no extra local multibuffer commits pending integration).
  • The subset-based handwritten multibuffer regression is already included ().
  • Local sample check passed for the two multibuffer Sync cases:
    •
    •

No additional code update is required for PR #196 at this point.

@TaoTao-real

Follow-up (fixed formatting):

Checked and reconciled local multibuffer work with this PR.

  • PR head branch codex/multibuffer-pingpong is already at local latest commit a15da46 (no extra local multibuffer commits pending integration).
  • The subset-based handwritten multibuffer regression is already included (test_inject_sync_multibuf_subset_pingpong.py).
  • Local sample check passed for the two multibuffer Sync cases:
    • test_inject_sync_multibuf_pingpong.py
    • test_inject_sync_multibuf_subset_pingpong.py

No additional code update is required for PR #196 at this point.

TaoTao-real changed the title from "[WIP] InsertSync: ping/pong multibuffer support" to "InsertSync: ping/pong multibuffer support" on Mar 7, 2026
- Add pto.multi_buffer=2 attr plumbing into PlanMemory (alloc_tile -> memref.alloc).
- Detect ping/pong via planned address overlap-matrix and emit dynamic event-id set/wait.
- Add EnableMultiBuffer pass to materialize loop-local ping/pong selection.
- Add Sync sample + runop guard; fix PTOViewToMemref typed accessor crash for bitcast/treshape.

emitc.call_opaque requires an IntegerAttr placeholder to print SSA operands.
Add the operand placeholder for event_id so set_flag/wait_flag receive the dynamic event argument, and extend the Sync multibuf runop guard to catch a missing 3rd argument.

- Track view-like alias closures (bind_tile/subview/casts) from multi-address pointer_cast.
- Build ping/pong selector in the LCA loop and rematerialize loop-local alias ops so tile allocations also switch ping/pong.
- Update multibuf pingpong sample to use alloc_tile + TLOAD/TSTORE so the generated C++ builds on A2/A3 pto-isa.
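
The alias-closure tracking mentioned in the commit message above can be sketched as a worklist traversal. This is an illustration only: the tuple-based op model, the `VIEW_LIKE` set, and the rule that operand 0 is the viewed source are assumptions, not the pass's actual data structures.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical op model: (op name, result value, operand values).
Op = Tuple[str, str, List[str]]

# Assumed set of view-like ops; the commit message names bind_tile, subview and casts.
VIEW_LIKE = {"pto.bind_tile", "memref.subview", "memref.cast"}


def alias_closure(root: str, ops: List[Op]) -> Set[str]:
    """Collect every value reachable from root through view-like ops."""
    # Index view-like ops by their source, assumed to be operand 0.
    views_of: Dict[str, List[str]] = {}
    for name, result, operands in ops:
        if name in VIEW_LIKE and operands:
            views_of.setdefault(operands[0], []).append(result)

    closure = {root}
    worklist = deque([root])
    while worklist:
        value = worklist.popleft()
        for view in views_of.get(value, []):
            if view not in closure:
                closure.add(view)
                worklist.append(view)
    return closure
```

Every value in the closure would need a loop-local rematerialization so that its users observe the per-iteration ping/pong choice rather than a single fixed address.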

Materialize ping/pong by selecting between i64 addresses and building a loop-local PointerCastOp.
This keeps bind_tile lowering able to trace the defining PointerCastOp and avoids generating C++ that casts Tile<> to __ubuf__ pointers (which breaks A2/A3 compilation).
Update the multibuf runop guard accordingly.

The ping/pong base addresses are an implementation detail of PlanMemory.
Check for >=2 distinct int64 constants + ternary address selection, rather than hard-coding 0/512.
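
A guard of the kind described above might be approximated as below. The real guard lives in runop.sh and its patterns are not shown in this PR, so both regular expressions and the assumed shape of the generated C++ (plain `int64_t name = literal;` and `int64_t name = cond ? a : b;` statements) are hypothetical.

```python
import re

# Assumed shape of a constant definition: `int64_t v1 = 512;`
CONST_RE = re.compile(r"int64_t\s+\w+\s*=\s*(-?\d+)\s*;")
# Assumed shape of the address select: `int64_t v3 = cond ? a : b;`
TERNARY_RE = re.compile(r"int64_t\s+\w+\s*=\s*[^;?]+\?[^;:]+:[^;]+;")


def looks_like_ping_pong(cpp_source: str) -> bool:
    """True if the text has >= 2 distinct int64 constants and a ternary int64 select."""
    constants = {int(m) for m in CONST_RE.findall(cpp_source)}
    has_select = TERNARY_RE.search(cpp_source) is not None
    return len(constants) >= 2 and has_select
```

Checking for distinct constants rather than the literal values 0 and 512 keeps the guard valid if PlanMemory later chooses different base addresses.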
TaoTao-real force-pushed the codex/multibuffer-pingpong branch from a15da46 to 4183bf4 on March 7, 2026 07:58
`[` $src_pipe `,` $dst_pipe `,` $event_id `]` attr-dict
}];
}


When adding a new op, be sure to also update the documentation in docs/PTO_IR_manual.md.
