Skip to content

Comments

refactor(pass): rewrite insert_sync_pass#250

Merged
Hzfengsy merged 1 commit intohw-native-sys:mainfrom
zhangqi-chen:insertsync
Feb 24, 2026
Merged

refactor(pass): rewrite insert_sync_pass#250
Hzfengsy merged 1 commit intohw-native-sys:mainfrom
zhangqi-chen:insertsync

Conversation

@zhangqi-chen
Copy link
Contributor

Rewrite insert_sync_pass to handle OpStmts and scope crossings (IfStmt/ForStmt) correctly. The new algorithm uses a position-based IR addressing scheme (SeqIndex + OpIndex path) and processes sync insertion in four phases:

  1. Collect: walk IR tree, build producer-consumer dependency pairs
  2. AdjustScopeCrossings: relocate sync points that cross IfStmt/ForStmt boundaries, including cross-iteration dependencies via loop unrolling
  3. AssignEventIds: allocate and reuse event IDs across sync pairs
  4. BuildAST: flatten-insert-regroup strategy for final IR construction

Also rewrites tests to use before/after structural comparison pattern and updates pass documentation to reflect the new algorithm.

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR redesigns the insert_sync pass from a single-pass algorithm to a four-phase approach: dependency collection, scope adjustment, event ID allocation, and AST construction. It introduces new path-based position tracking, refactors event ID management with scope awareness, expands synchronization patterns across multiple pipeline types, and rewrites tests to use structural IR comparison instead of assertion counting.

Changes

Cohort / File(s) Summary
Documentation
docs/dev/passes/06-insert_sync.md
Comprehensive algorithm redesign documentation: single-pass replaced with 4-phase approach (Dependency Collection, Scope Adjustment, Event ID Allocation, AST Construction), explicit synchronization patterns for scope/iteration crossings, expanded examples across MTE2, V, M, CUBE pipelines, updated backend integration notes.
Core Implementation
src/ir/transforms/insert_sync_pass.cpp
Major refactor introducing PathElement/Position structures for IR location tracking, MemRefSummary for dependency analysis, EventIdManager API changes (Allocate→Alloc with Position param, Release→Free with wait_position), new InsertionPlan/AnalysisContext structures for synchronization planning, and replacement of monolithic mutation with phased approach.
Test Suite
tests/ut/ir/transforms/test_insert_sync.py
Complete test rewrite: replaces assertion-based verification with structural IR equality checks; adds builder helpers (make_sync_src, make_sync_dst, make_bar_v) and pipe-type aliases (MTE1, MTE2, MTE3, M, V); expands test coverage for cross-pipe, intra-pipe, CUBE, conditionals, and loop patterns; removes count_syncs utility.

Sequence Diagram(s)

sequenceDiagram
    participant User as InsertSyncPass
    participant Phase1 as Dependency<br/>Collector
    participant Phase2 as Scope<br/>Adjuster
    participant Phase3 as Event ID<br/>Manager
    participant Phase4 as AST<br/>Constructor

    User->>Phase1: collect_sync_pairs()
    activate Phase1
    Phase1->>Phase1: Walk IR tree
    Phase1->>Phase1: Track RAW/WAR/WAW deps
    Phase1->>Phase1: Handle loop unrolling
    Phase1-->>User: SyncPair records
    deactivate Phase1

    User->>Phase2: adjust_scope_crossings()
    activate Phase2
    Phase2->>Phase2: Check IfStmt/ForStmt bounds
    Phase2->>Phase2: Relocate sync_src/sync_dst
    Phase2->>Phase2: Handle cross-iteration moves
    Phase2-->>User: Adjusted SyncPair positions
    deactivate Phase2

    User->>Phase3: alloc_event_ids()
    activate Phase3
    Phase3->>Phase3: Scope-aware allocation
    Phase3->>Phase3: Track set/wait positions
    Phase3->>Phase3: Reuse where possible
    Phase3-->>User: Populated event IDs
    deactivate Phase3

    User->>Phase4: build_insertion_plan()
    activate Phase4
    Phase4->>Phase4: Create InsertionPlan
    Phase4->>Phase4: Map sync calls & barriers
    Phase4-->>User: InsertionPlan
    deactivate Phase4

    User->>Phase4: mutate_ast()
    activate Phase4
    Phase4->>Phase4: Insert sync_src/sync_dst
    Phase4->>Phase4: Insert bar_v/bar_m
    Phase4-->>User: Modified AST
    deactivate Phase4
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hop along the four-phase way,
Dependencies collected, scopes at bay,
Event IDs dance with spatial grace,
While barriers guard each AST place!
✨🔄

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor(pass): rewrite insert_sync_pass' directly and clearly summarizes the main change—a complete rewrite of the insert_sync_pass component.
Description check ✅ Passed The description is directly related to the changeset, detailing the four-phase algorithm redesign, position-based addressing scheme, and documentation/test updates that match the file changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @zhangqi-chen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive refactor of the insert_sync_pass to improve its accuracy and robustness in handling complex control flow structures and inter-pipeline dependencies. The rewrite provides a more structured and efficient approach to inserting synchronization primitives, ensuring correct program execution across different hardware pipelines, particularly for operations within conditional statements and loops.

Highlights

  • Algorithm Rewrite: The insert_sync_pass algorithm has been completely rewritten into a new 4-phase pipeline: Dependency Collection, Scope Adjustment, Event ID Allocation, and AST Construction. This replaces the previous 7-step process.
  • Enhanced Scope Handling: The pass now correctly handles synchronization across control flow scopes, including IfStmt and ForStmt boundaries, and detects cross-iteration dependencies via loop unrolling.
  • Position-Based IR Addressing: A new position-based IR addressing scheme (SeqIndex + OpIndex path) has been introduced for precise tracking and insertion of synchronization points.
  • Optimized Sync Insertion: The new algorithm includes logic for deduplicating sync pairs, removing transitive and linear redundant pairs, and merging sync operations into adjacent OpStmts.
  • Updated Test Suite: The test suite has been rewritten to use before/after structural comparison patterns and expanded to cover new scenarios like CUBE pipeline, various IfStmt scope crossings, ForStmt scope crossings, and cross-iteration dependencies.
Changelog
  • docs/dev/passes/06-insert_sync.md
    • Updated algorithm description to reflect the new 4-phase approach.
    • Revised synchronization patterns to include new scope-aware behaviors and cross-iteration handling.
    • Replaced existing examples with new, more illustrative scenarios for cross-pipeline, intra-pipeline, and CUBE pipeline dependencies.
  • src/ir/transforms/insert_sync_pass.cpp
    • Rewrote the SyncInserter class to implement the new 4-phase algorithm.
    • Introduced PathElement and Position structs for precise IR addressing.
    • Refactored EventIdManager to use position-aware allocation and freeing of event IDs.
    • Implemented logic for deduplication and redundancy removal of sync pairs.
    • Modified AST construction to flatten, insert, and regroup statements, merging sync operations into OpStmts.
  • tests/ut/ir/transforms/test_insert_sync.py
    • Removed count_syncs utility and adopted ir.assert_structural_equal for precise IR comparison.
    • Added helper functions make_sync_src, make_sync_dst, and make_bar_v for building expected IR structures.
    • Expanded test cases to cover new scenarios including CUBE pipeline, various IfStmt and ForStmt scope crossings, and cross-iteration dependencies.
Activity
  • Initial commit for the refactored insert_sync_pass.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is an impressive and thorough refactoring of the insert_sync_pass. The new position-based algorithm is a significant improvement, providing a much more robust way to handle complex control flow like nested IfStmt and ForStmt. The four-phase approach (Collect, Adjust, Assign, Build) is well-structured and makes the logic easier to follow. The updates to the documentation and tests are also excellent; the new tests using structural comparison are far more reliable than the old counting-based tests.

I have found one critical issue in the implementation of Position::IsBefore, which could lead to incorrect sync placement. Please see the detailed comment. Once that is addressed, this will be a fantastic improvement to the codebase.

@zhangqi-chen zhangqi-chen force-pushed the insertsync branch 2 times, most recently from 634c4d8 to 3346c77 Compare February 24, 2026 06:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/dev/passes/06-insert_sync.md (1)

181-187: ⚠️ Potential issue | 🟠 Major

Update doc example to use the correct backend configuration API.

The doc uses backend.set_backend(backend.Ascend910B()), but the actual API is backend.set_backend_type() with a BackendType enum. Replace with:

from pypto import backend
from pypto.backend import BackendType

# Set backend before running InsertSync
backend.set_backend_type(BackendType.CCE)
program_with_sync = passes.insert_sync()(program)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/dev/passes/06-insert_sync.md` around lines 181 - 187, The docs example
uses the removed API backend.set_backend(backend.Ascend910B()); update it to
call backend.set_backend_type(...) using the BackendType enum: import
BackendType from pypto.backend and call
backend.set_backend_type(BackendType.CCE) before running
passes.insert_sync()(program) so the example matches the real API (replace
references to backend.set_backend and Ascend910B with backend.set_backend_type
and BackendType.CCE).
🧹 Nitpick comments (3)
tests/ut/ir/transforms/test_insert_sync.py (3)

16-45: Consider adding make_bar_m() (and optionally “private” helpers) for future coverage.
You already insert bar_m in the pass; having a symmetric helper now avoids copy/paste when tests expand to M-pipe intra-barriers. Also, prefixing helpers with _ can reduce accidental imports/reuse from other test modules.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 16 - 45, Add a
symmetric helper make_bar_m() that mirrors make_bar_v() but creates an op call
to "system.bar_m" (returning ir.EvalStmt(ir.create_op_call("system.bar_m", [],
{}, _span), _span)), and consider renaming or adding underscore-prefixed
variants (e.g., _make_sync_src, _make_sync_dst, _make_bar_v/_make_bar_m) so
helpers are treated as private; update any test usages to call make_bar_m or the
new private helpers as appropriate.

104-108: Deduplicate repeated backend setup via a helper/fixture.
The repeated backend.reset_for_testing() + backend.set_backend_type(BackendType.CCE) pattern across tests is easy to drift over time (e.g., if a new required backend init step is added). A small run_insert_sync(program) helper or an autouse pytest fixture would keep this consistent.

Also applies to: 180-184, 275-279, 381-385

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 104 - 108, Tests
repeat backend.reset_for_testing() and backend.set_backend_type(BackendType.CCE)
before invoking passes.insert_sync()(Before); create a small helper (e.g.,
run_insert_sync(program)) or an autouse pytest fixture that performs
backend.reset_for_testing(), backend.set_backend_type(BackendType.CCE) and then
returns passes.insert_sync()(program), and replace the repeated setup lines in
tests (locations around the calls to passes.insert_sync()(Before) and similar at
lines ~180-184, 275-279, 381-385) with a single call to that helper/fixture to
ensure consistent backend initialization.

48-139: Add a regression test for event-id reuse within the same pipe-pair across multiple regions.
Right now tests cover “multiple event IDs needed” (CUBE) but not “same event ID reused later”. A minimal case like two disjoint MTE2 -> V producer/consumer regions in the same scope should verify that both sync_src/sync_dst pairs are emitted even if the event id is reused (often event=0 twice). This will catch insertion-plan de-dup mistakes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_insert_sync.py` around lines 48 - 139, Add a
regression test to ensure event-id reuse across disjoint regions emits both sync
pairs: create a new test (e.g., test_insert_sync_reuse_event_id) in
tests/ut/ir/transforms/test_insert_sync.py modeled after
test_insert_sync_cross_pipe that constructs two separate MTE2 -> V
producer/consumer regions in the same scope (two independent load/add/store
sequences) so the insert_sync pass may reuse event id (event=0) for both; run
passes.insert_sync() on the program and assert the resulting IR contains both
make_sync_src(MTE2, V, 0)/make_sync_dst(MTE2, V, 0) pairs for each region (use
ir.assert_structural_equal with enable_auto_mapping=True), ensuring you
reference the same vars/tile names as the Before IR so auto-mapping handles
matching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/dev/passes/06-insert_sync.md`:
- Around line 41-57: Update the wording to reflect implementation: replace
“Global barrier” with “Intra-pipeline barrier” and state that bar_v / bar_m are
intra-pipe ordering barriers (not whole-program/global fences); also soften the
OpStmts merging claim from “No standalone sync OpStmts” to something like “Sync
operations are merged into adjacent OpStmts when possible; in some control-flow
edge cases (insertions between non-OpStmts like IfStmt/ForStmt boundaries) the
pass may still emit OpStmts that contain only sync/bar calls.” Keep references
to symbols bar_v, bar_m, and OpStmts so readers can locate the behavior in the
implementation.

In `@src/ir/transforms/insert_sync_pass.cpp`:
- Around line 327-363: ProcessLeafStmt currently does O(N*M) scans using
IsSameMem across state.last_writers/last_readers; change it to use direct map
lookups by MemRefPtr (assume stable identity) instead of iterating and
IsSameMem: in ProcessLeafStmt, replace the sync_against_writers loop and the WAR
loop to lookup state.last_writers.find(mr) and state.last_readers.find(w)
respectively (use MemRefPtr as the exact key), and adjust updates to
state.last_writers and state.last_readers to maintain only the minimal reader
set (e.g., latest reader per pipe or a bounded small collection) rather than
appending all readers; touch functions/symbols ProcessLeafStmt,
state.last_writers, state.last_readers, GetLeafMemRefs, CreateSyncPair and
remove IsSameMem-based scans to avoid quadratic pair creation.
- Around line 507-540: RemoveTransitiveRedundantPairs currently does an O(n³)
scan over ctx_.sync_pairs to find transitive A→B and B→C patterns and mark
SyncPair entries redundant; replace the triple nested search by building two
hash maps (or unordered_multimap) keyed by set_position and by wait_position
that index indices (or pointers) into ctx_.ync_pairs so you can, for each
pair_ac in RemoveTransitiveRedundantPairs, iterate candidates pair_ab =
lookup_by_set_position[pair_ac.set_position] and then check pair_bc existence
via lookup_by_set_position or lookup_by_wait_position (as appropriate) to test
pair_ab.wait_position == pair_bc.set_position and pair_bc.wait_position ==
pair_ac.wait_position; update is_redundant[i] when a match is found and finally
filter ctx_.sync_pairs as before. Ensure the maps are built once at the start of
RemoveTransitiveRedundantPairs and update lookups using SyncPair.set_position
and SyncPair.wait_position to reduce complexity to roughly O(n²) or better.
- Around line 704-753: BuildInsertionPlans() currently de-duplicates syncs in
scope_sync_src/scope_sync_dst using only (producer_pipe, consumer_pipe,
event_id) which drops required syncs when the same event_id is reused at
different positions; change the dedup key to include the insertion position (use
set_key/wait_key or the Position/PosKey) so uniqueness is (position,
producer_pipe, consumer_pipe, event_id). Concretely, update scope_sync_src and
scope_sync_dst to map/set using a key that includes the PosKey (or pair of
PosKey and the existing tuple) and use set_key/wait_key when inserting/checking
before pushing CreateSyncCall in BuildInsertionPlans().
- Around line 123-129: GetPipeForCall currently dereferences
pypto::backend::GetBackend() without checking for null, which can crash; update
GetPipeForCall to call pypto::backend::GetBackend(), check the returned pointer
for nullptr, and if null fail fast with a clear diagnostic (e.g., throw a
descriptive runtime_error or use the project's CHECK/LOG/error-reporting
utility) before calling backend->GetOpInfo; keep the existing behavior when
backend is non-null by then querying GetOpInfo(call->op_->name_) and returning
info->pipe or PipeType::S as before.

---

Outside diff comments:
In `@docs/dev/passes/06-insert_sync.md`:
- Around line 181-187: The docs example uses the removed API
backend.set_backend(backend.Ascend910B()); update it to call
backend.set_backend_type(...) using the BackendType enum: import BackendType
from pypto.backend and call backend.set_backend_type(BackendType.CCE) before
running passes.insert_sync()(program) so the example matches the real API
(replace references to backend.set_backend and Ascend910B with
backend.set_backend_type and BackendType.CCE).

---

Nitpick comments:
In `@tests/ut/ir/transforms/test_insert_sync.py`:
- Around line 16-45: Add a symmetric helper make_bar_m() that mirrors
make_bar_v() but creates an op call to "system.bar_m" (returning
ir.EvalStmt(ir.create_op_call("system.bar_m", [], {}, _span), _span)), and
consider renaming or adding underscore-prefixed variants (e.g., _make_sync_src,
_make_sync_dst, _make_bar_v/_make_bar_m) so helpers are treated as private;
update any test usages to call make_bar_m or the new private helpers as
appropriate.
- Around line 104-108: Tests repeat backend.reset_for_testing() and
backend.set_backend_type(BackendType.CCE) before invoking
passes.insert_sync()(Before); create a small helper (e.g.,
run_insert_sync(program)) or an autouse pytest fixture that performs
backend.reset_for_testing(), backend.set_backend_type(BackendType.CCE) and then
returns passes.insert_sync()(program), and replace the repeated setup lines in
tests (locations around the calls to passes.insert_sync()(Before) and similar at
lines ~180-184, 275-279, 381-385) with a single call to that helper/fixture to
ensure consistent backend initialization.
- Around line 48-139: Add a regression test to ensure event-id reuse across
disjoint regions emits both sync pairs: create a new test (e.g.,
test_insert_sync_reuse_event_id) in tests/ut/ir/transforms/test_insert_sync.py
modeled after test_insert_sync_cross_pipe that constructs two separate MTE2 -> V
producer/consumer regions in the same scope (two independent load/add/store
sequences) so the insert_sync pass may reuse event id (event=0) for both; run
passes.insert_sync() on the program and assert the resulting IR contains both
make_sync_src(MTE2, V, 0)/make_sync_dst(MTE2, V, 0) pairs for each region (use
ir.assert_structural_equal with enable_auto_mapping=True), ensuring you
reference the same vars/tile names as the Before IR so auto-mapping handles
matching.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7201a97 and a2e17f0.

📒 Files selected for processing (3)
  • docs/dev/passes/06-insert_sync.md
  • src/ir/transforms/insert_sync_pass.cpp
  • tests/ut/ir/transforms/test_insert_sync.py

@zhangqi-chen
Copy link
Contributor Author

@Hzfengsy

Rewrite insert_sync_pass to handle OpStmts and scope crossings
(IfStmt/ForStmt) correctly. The new algorithm uses a position-based
IR addressing scheme (SeqIndex + OpIndex path) and processes sync
insertion in four phases:

1. Collect: walk IR tree, build producer-consumer dependency pairs
2. AdjustScopeCrossings: relocate sync points that cross IfStmt/ForStmt
   boundaries, including cross-iteration dependencies via loop unrolling
3. AssignEventIds: allocate and reuse event IDs across sync pairs
4. BuildAST: flatten-insert-regroup strategy for final IR construction

Also rewrites tests to use before/after structural comparison pattern
and updates pass documentation to reflect the new algorithm.
@Hzfengsy Hzfengsy merged commit 81383e1 into hw-native-sys:main Feb 24, 2026
5 checks passed
@zhangqi-chen zhangqi-chen deleted the insertsync branch February 24, 2026 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants