refactor(pass): rewrite insert_sync_pass#250
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR redesigns the insert_sync pass from a single-pass algorithm to a four-phase approach: dependency collection, scope adjustment, event ID allocation, and AST construction. It introduces new path-based position tracking, refactors event ID management with scope awareness, expands synchronization patterns across multiple pipeline types, and rewrites tests to use structural IR comparison instead of assertion counting. Changes
Sequence Diagram(s)sequenceDiagram
participant User as InsertSyncPass
participant Phase1 as Dependency<br/>Collector
participant Phase2 as Scope<br/>Adjuster
participant Phase3 as Event ID<br/>Manager
participant Phase4 as AST<br/>Constructor
User->>Phase1: collect_sync_pairs()
activate Phase1
Phase1->>Phase1: Walk IR tree
Phase1->>Phase1: Track RAW/WAR/WAW deps
Phase1->>Phase1: Handle loop unrolling
Phase1-->>User: SyncPair records
deactivate Phase1
User->>Phase2: adjust_scope_crossings()
activate Phase2
Phase2->>Phase2: Check IfStmt/ForStmt bounds
Phase2->>Phase2: Relocate sync_src/sync_dst
Phase2->>Phase2: Handle cross-iteration moves
Phase2-->>User: Adjusted SyncPair positions
deactivate Phase2
User->>Phase3: alloc_event_ids()
activate Phase3
Phase3->>Phase3: Scope-aware allocation
Phase3->>Phase3: Track set/wait positions
Phase3->>Phase3: Reuse where possible
Phase3-->>User: Populated event IDs
deactivate Phase3
User->>Phase4: build_insertion_plan()
activate Phase4
Phase4->>Phase4: Create InsertionPlan
Phase4->>Phase4: Map sync calls & barriers
Phase4-->>User: InsertionPlan
deactivate Phase4
User->>Phase4: mutate_ast()
activate Phase4
Phase4->>Phase4: Insert sync_src/sync_dst
Phase4->>Phase4: Insert bar_v/bar_m
Phase4-->>User: Modified AST
deactivate Phase4
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Summary of ChangesHello @zhangqi-chen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive refactor of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is an impressive and thorough refactoring of the insert_sync_pass. The new position-based algorithm is a significant improvement, providing a much more robust way to handle complex control flow like nested IfStmt and ForStmt. The four-phase approach (Collect, Adjust, Assign, Build) is well-structured and makes the logic easier to follow. The updates to the documentation and tests are also excellent; the new tests using structural comparison are far more reliable than the old counting-based tests.
I have found one critical issue in the implementation of Position::IsBefore, which could lead to incorrect sync placement. Please see the detailed comment. Once that is addressed, this will be a fantastic improvement to the codebase.
634c4d8 to
3346c77
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/dev/passes/06-insert_sync.md (1)
181-187:⚠️ Potential issue | 🟠 MajorUpdate doc example to use the correct backend configuration API.
The doc uses
backend.set_backend(backend.Ascend910B()), but the actual API isbackend.set_backend_type()with aBackendTypeenum. Replace with:from pypto import backend from pypto.backend import BackendType # Set backend before running InsertSync backend.set_backend_type(BackendType.CCE) program_with_sync = passes.insert_sync()(program)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dev/passes/06-insert_sync.md` around lines 181 - 187, The docs example uses the removed API backend.set_backend(backend.Ascend910B()); update it to call backend.set_backend_type(...) using the BackendType enum: import BackendType from pypto.backend and call backend.set_backend_type(BackendType.CCE) before running passes.insert_sync()(program) so the example matches the real API (replace references to backend.set_backend and Ascend910B with backend.set_backend_type and BackendType.CCE).
🧹 Nitpick comments (3)
tests/ut/ir/transforms/test_insert_sync.py (3)
16-45: Consider addingmake_bar_m()(and optionally “private” helpers) for future coverage.
You already insertbar_min the pass; having a symmetric helper now avoids copy/paste when tests expand to M-pipe intra-barriers. Also, prefixing helpers with_can reduce accidental imports/reuse from other test modules.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 16 - 45, Add a symmetric helper make_bar_m() that mirrors make_bar_v() but creates an op call to "system.bar_m" (returning ir.EvalStmt(ir.create_op_call("system.bar_m", [], {}, _span), _span)), and consider renaming or adding underscore-prefixed variants (e.g., _make_sync_src, _make_sync_dst, _make_bar_v/_make_bar_m) so helpers are treated as private; update any test usages to call make_bar_m or the new private helpers as appropriate.
104-108: Deduplicate repeated backend setup via a helper/fixture.
The repeatedbackend.reset_for_testing()+backend.set_backend_type(BackendType.CCE)pattern across tests is easy to drift over time (e.g., if a new required backend init step is added). A smallrun_insert_sync(program)helper or anautousepytest fixture would keep this consistent.Also applies to: 180-184, 275-279, 381-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 104 - 108, Tests repeat backend.reset_for_testing() and backend.set_backend_type(BackendType.CCE) before invoking passes.insert_sync()(Before); create a small helper (e.g., run_insert_sync(program)) or an autouse pytest fixture that performs backend.reset_for_testing(), backend.set_backend_type(BackendType.CCE) and then returns passes.insert_sync()(program), and replace the repeated setup lines in tests (locations around the calls to passes.insert_sync()(Before) and similar at lines ~180-184, 275-279, 381-385) with a single call to that helper/fixture to ensure consistent backend initialization.
48-139: Add a regression test for event-id reuse within the same pipe-pair across multiple regions.
Right now tests cover “multiple event IDs needed” (CUBE) but not “same event ID reused later”. A minimal case like two disjointMTE2 -> Vproducer/consumer regions in the same scope should verify that bothsync_src/sync_dstpairs are emitted even if the event id is reused (oftenevent=0twice). This will catch insertion-plan de-dup mistakes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 48 - 139, Add a regression test to ensure event-id reuse across disjoint regions emits both sync pairs: create a new test (e.g., test_insert_sync_reuse_event_id) in tests/ut/ir/transforms/test_insert_sync.py modeled after test_insert_sync_cross_pipe that constructs two separate MTE2 -> V producer/consumer regions in the same scope (two independent load/add/store sequences) so the insert_sync pass may reuse event id (event=0) for both; run passes.insert_sync() on the program and assert the resulting IR contains both make_sync_src(MTE2, V, 0)/make_sync_dst(MTE2, V, 0) pairs for each region (use ir.assert_structural_equal with enable_auto_mapping=True), ensuring you reference the same vars/tile names as the Before IR so auto-mapping handles matching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/dev/passes/06-insert_sync.md`:
- Around line 41-57: Update the wording to reflect implementation: replace
“Global barrier” with “Intra-pipeline barrier” and state that bar_v / bar_m are
intra-pipe ordering barriers (not whole-program/global fences); also soften the
OpStmts merging claim from “No standalone sync OpStmts” to something like “Sync
operations are merged into adjacent OpStmts when possible; in some control-flow
edge cases (insertions between non-OpStmts like IfStmt/ForStmt boundaries) the
pass may still emit OpStmts that contain only sync/bar calls.” Keep references
to symbols bar_v, bar_m, and OpStmts so readers can locate the behavior in the
implementation.
In `@src/ir/transforms/insert_sync_pass.cpp`:
- Around line 327-363: ProcessLeafStmt currently does O(N*M) scans using
IsSameMem across state.last_writers/last_readers; change it to use direct map
lookups by MemRefPtr (assume stable identity) instead of iterating and
IsSameMem: in ProcessLeafStmt, replace the sync_against_writers loop and the WAR
loop to lookup state.last_writers.find(mr) and state.last_readers.find(w)
respectively (use MemRefPtr as the exact key), and adjust updates to
state.last_writers and state.last_readers to maintain only the minimal reader
set (e.g., latest reader per pipe or a bounded small collection) rather than
appending all readers; touch functions/symbols ProcessLeafStmt,
state.last_writers, state.last_readers, GetLeafMemRefs, CreateSyncPair and
remove IsSameMem-based scans to avoid quadratic pair creation.
- Around line 507-540: RemoveTransitiveRedundantPairs currently does an O(n³)
scan over ctx_.sync_pairs to find transitive A→B and B→C patterns and mark
SyncPair entries redundant; replace the triple nested search by building two
hash maps (or unordered_multimap) keyed by set_position and by wait_position
that index indices (or pointers) into ctx_.ync_pairs so you can, for each
pair_ac in RemoveTransitiveRedundantPairs, iterate candidates pair_ab =
lookup_by_set_position[pair_ac.set_position] and then check pair_bc existence
via lookup_by_set_position or lookup_by_wait_position (as appropriate) to test
pair_ab.wait_position == pair_bc.set_position and pair_bc.wait_position ==
pair_ac.wait_position; update is_redundant[i] when a match is found and finally
filter ctx_.sync_pairs as before. Ensure the maps are built once at the start of
RemoveTransitiveRedundantPairs and update lookups using SyncPair.set_position
and SyncPair.wait_position to reduce complexity to roughly O(n²) or better.
- Around line 704-753: BuildInsertionPlans() currently de-duplicates syncs in
scope_sync_src/scope_sync_dst using only (producer_pipe, consumer_pipe,
event_id) which drops required syncs when the same event_id is reused at
different positions; change the dedup key to include the insertion position (use
set_key/wait_key or the Position/PosKey) so uniqueness is (position,
producer_pipe, consumer_pipe, event_id). Concretely, update scope_sync_src and
scope_sync_dst to map/set using a key that includes the PosKey (or pair of
PosKey and the existing tuple) and use set_key/wait_key when inserting/checking
before pushing CreateSyncCall in BuildInsertionPlans().
- Around line 123-129: GetPipeForCall currently dereferences
pypto::backend::GetBackend() without checking for null, which can crash; update
GetPipeForCall to call pypto::backend::GetBackend(), check the returned pointer
for nullptr, and if null fail fast with a clear diagnostic (e.g., throw a
descriptive runtime_error or use the project's CHECK/LOG/error-reporting
utility) before calling backend->GetOpInfo; keep the existing behavior when
backend is non-null by then querying GetOpInfo(call->op_->name_) and returning
info->pipe or PipeType::S as before.
---
Outside diff comments:
In `@docs/dev/passes/06-insert_sync.md`:
- Around line 181-187: The docs example uses the removed API
backend.set_backend(backend.Ascend910B()); update it to call
backend.set_backend_type(...) using the BackendType enum: import BackendType
from pypto.backend and call backend.set_backend_type(BackendType.CCE) before
running passes.insert_sync()(program) so the example matches the real API
(replace references to backend.set_backend and Ascend910B with
backend.set_backend_type and BackendType.CCE).
---
Nitpick comments:
In `@tests/ut/ir/transforms/test_insert_sync.py`:
- Around line 16-45: Add a symmetric helper make_bar_m() that mirrors
make_bar_v() but creates an op call to "system.bar_m" (returning
ir.EvalStmt(ir.create_op_call("system.bar_m", [], {}, _span), _span)), and
consider renaming or adding underscore-prefixed variants (e.g., _make_sync_src,
_make_sync_dst, _make_bar_v/_make_bar_m) so helpers are treated as private;
update any test usages to call make_bar_m or the new private helpers as
appropriate.
- Around line 104-108: Tests repeat backend.reset_for_testing() and
backend.set_backend_type(BackendType.CCE) before invoking
passes.insert_sync()(Before); create a small helper (e.g.,
run_insert_sync(program)) or an autouse pytest fixture that performs
backend.reset_for_testing(), backend.set_backend_type(BackendType.CCE) and then
returns passes.insert_sync()(program), and replace the repeated setup lines in
tests (locations around the calls to passes.insert_sync()(Before) and similar at
lines ~180-184, 275-279, 381-385) with a single call to that helper/fixture to
ensure consistent backend initialization.
- Around line 48-139: Add a regression test to ensure event-id reuse across
disjoint regions emits both sync pairs: create a new test (e.g.,
test_insert_sync_reuse_event_id) in tests/ut/ir/transforms/test_insert_sync.py
modeled after test_insert_sync_cross_pipe that constructs two separate MTE2 -> V
producer/consumer regions in the same scope (two independent load/add/store
sequences) so the insert_sync pass may reuse event id (event=0) for both; run
passes.insert_sync() on the program and assert the resulting IR contains both
make_sync_src(MTE2, V, 0)/make_sync_dst(MTE2, V, 0) pairs for each region (use
ir.assert_structural_equal with enable_auto_mapping=True), ensuring you
reference the same vars/tile names as the Before IR so auto-mapping handles
matching.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/dev/passes/06-insert_sync.mdsrc/ir/transforms/insert_sync_pass.cpptests/ut/ir/transforms/test_insert_sync.py
3346c77 to
c4fb7c2
Compare
Rewrite insert_sync_pass to handle OpStmts and scope crossings (IfStmt/ForStmt) correctly. The new algorithm uses a position-based IR addressing scheme (SeqIndex + OpIndex path) and processes sync insertion in four phases: 1. Collect: walk IR tree, build producer-consumer dependency pairs 2. AdjustScopeCrossings: relocate sync points that cross IfStmt/ForStmt boundaries, including cross-iteration dependencies via loop unrolling 3. AssignEventIds: allocate and reuse event IDs across sync pairs 4. BuildAST: flatten-insert-regroup strategy for final IR construction Also rewrites tests to use before/after structural comparison pattern and updates pass documentation to reflect the new algorithm.
c4fb7c2 to
828c2f6
Compare
Rewrite insert_sync_pass to handle OpStmts and scope crossings (IfStmt/ForStmt) correctly. The new algorithm uses a position-based IR addressing scheme (SeqIndex + OpIndex path) and processes sync insertion in four phases:
Also rewrites tests to use before/after structural comparison pattern and updates pass documentation to reflect the new algorithm.