[InsertSync] scalar autosync + addptr alias-through (scalar-only scope)#206

Open
TaoTao-real wants to merge 4 commits into zhangstevenunity:main from TaoTao-real:codex/load-store-scalar-pipe-s-sync

TaoTao-real (Contributor) commented Mar 6, 2026

Summary

  • make pto.load_scalar / pto.store_scalar participate in InsertSync as PIPE_S ops
  • add pto.addptr alias-through in InsertSync translator, including constant-offset byte folding and conservative handling for dynamic offsets
  • keep scalar sync behavior aligned with AscendNPU-IR semantics:
    • skip PIPE_S -> PIPE_S local sync insertion
    • keep scalar cross-pipe dependencies as normal set_flag/wait_flag sync (no forced PIPE_ALL fallback)
  • add scalar sync regressions and runop structural checks:
    • test_inject_sync_scalar_cross_pipe.py
    • test_inject_sync_scalar_intra_pipe_barrier.py
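
The pipe-level rule summarized above can be pictured with a small model. This is a minimal sketch, not the InsertSync implementation: `Pipe`, `SyncAction`, and `decide_sync` are hypothetical names, and the non-scalar pipes are an illustrative subset. Only the PIPE_S behavior (skip same-pipe, use set_flag/wait_flag across pipes) is taken from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class Pipe(Enum):
    # Illustrative subset of pipes; only PIPE_S behavior comes from this PR.
    PIPE_S = "PIPE_S"
    PIPE_V = "PIPE_V"
    PIPE_MTE2 = "PIPE_MTE2"


@dataclass(frozen=True)
class SyncAction:
    kind: str   # "none", "set_wait", or "out_of_scope"
    src: Pipe
    dst: Pipe


def decide_sync(producer: Pipe, consumer: Pipe) -> SyncAction:
    """Hypothetical decision rule for one dependent producer/consumer pair."""
    if producer is Pipe.PIPE_S and consumer is Pipe.PIPE_S:
        # The scalar pipeline is in-order, so no local sync is inserted.
        return SyncAction("none", producer, consumer)
    if producer is not consumer:
        # Cross-pipe dependency: a normal set_flag/wait_flag pair,
        # with no forced PIPE_ALL fallback.
        return SyncAction("set_wait", producer, consumer)
    # Same-pipe, non-scalar pairs are not modeled in this sketch.
    return SyncAction("out_of_scope", producer, consumer)
```

For example, `decide_sync(Pipe.PIPE_MTE2, Pipe.PIPE_S)` yields a `set_wait` action, while a scalar store followed by a scalar load yields `none`.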

Scope

This PR is intentionally scalar-sync only.
Layout-inference changes (e.g. rowmin/rowsum/rowmax related fixes) are explicitly excluded from this PR.

Motivation

InsertSync could previously miss synchronization around scalar load/store paths and addptr alias chains, risking under-synchronization for dependencies that involve scalar memory accesses.

Design

  • LoadScalarOp / StoreScalarOp
    • add OpPipeInterface
    • return PIPE_S in getPipe()
  • InsertSync analysis
    • same-pipe PIPE_S is skipped (in-order scalar pipeline)
    • scalar cross-pipe dependencies are synchronized via set_flag/wait_flag
  • alias propagation
    • add pto.addptr to alias chain updates
    • support constant addptr offsets in byte-range analysis
    • dynamic addptr offsets use conservative alias modeling (prefer over-sync to under-sync)
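
The alias-propagation bullets can be sketched as a toy byte-range model. This is a sketch under stated assumptions, not the translator's code: `ByteRange`, `addptr`, and `may_alias` are hypothetical helpers, and it assumes the addptr offset is counted in elements and folded to bytes by multiplying with the element size. A dynamic offset is represented as `None` and treated as possibly overlapping anything on the same root buffer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByteRange:
    root: str              # underlying buffer the pointer derives from
    start: Optional[int]   # byte offset from the root; None means unknown
    size: int              # number of bytes touched by the access


def addptr(base: ByteRange, offset_elems: Optional[int], elem_bytes: int) -> ByteRange:
    """Alias through an addptr: same root, shifted start (assumed element offset)."""
    if base.start is None or offset_elems is None:
        # Dynamic offset: the start is no longer statically known.
        return ByteRange(base.root, None, base.size)
    # Constant offset: fold it into a byte offset.
    return ByteRange(base.root, base.start + offset_elems * elem_bytes, base.size)


def may_alias(a: ByteRange, b: ByteRange) -> bool:
    """Return True if the two accesses might touch the same bytes."""
    if a.root != b.root:
        return False
    if a.start is None or b.start is None:
        # Conservative: prefer over-sync to under-sync.
        return True
    # Half-open interval overlap test.
    return a.start < b.start + b.size and b.start < a.start + a.size
```

With a 4-byte access at offset 0, a constant addptr of 4 elements of 4 bytes each lands at byte 16 and does not overlap the base, so no dependency is reported; a dynamic offset on the same root always reports a possible alias.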

Testing

Local regression:

source /Users/lishengtao/Workspace/PTO/env.sh
export PTO_PYTHON_ROOT=/Users/lishengtao/Documents/PTO/PTOAS_upstream_main_main/build-main/python
pto_export_runtime
PTOAS_BIN=/Users/lishengtao/Documents/PTO/PTOAS_pr206_scalar_only/build-main/tools/ptoas/ptoas \
  bash test/samples/runop.sh -t Sync

Result: pass (A5-only sample skipped as expected).

A3 remote board validation (latest, commit 6c8c44e):

  • Host: 101.245.68.6 (DEVICE_ID=10, SOC_VERSION=Ascend910)
  • Flow: CI-equivalent test/npu_validation/scripts/run_remote_npu_validation.sh
  • Result: OK=19, FAIL=0, SKIP=1
  • Key scalar cases:
    • test_inject_sync_scalar_cross_pipe: OK
    • test_inject_sync_scalar_intra_pipe_barrier: OK

Risk / Rollback

  • Risk: sync coverage now extends to scalar ops that InsertSync previously did not track
  • Mitigation: same-pipe PIPE_S remains skipped; cross-pipe behavior follows existing set/wait model and has local + remote regression coverage
  • Rollback: revert this PR (changes are localized to InsertSync + Sync tests)

TaoTao-real (Contributor, Author) commented

Added commit 269dedb for the long-term row-reduction layout fix and validated it on an A3 board before PR submission.

A3 remote validation (CI script flow):

  • Script: test/npu_validation/scripts/run_remote_npu_validation.sh
  • Env: STAGE=run, RUN_MODE=npu, SOC_VERSION=Ascend910, DEVICE_ID=2
  • Run only: rowmin,rowsum,rowmax

Result:

  • rowmin: OK
  • rowsum: OK
  • rowmax: OK
  • Summary: OK=3 FAIL=0

Extra note: when additionally running tensor_view_layout_dn, it fails with a pto-isa static assert (DN GlobalTensor with RowMajor MatTile) on A3; this case is orthogonal to the row-reduction fix above.

TaoTao-real changed the title from "[InsertSync] support scalar PIPE_S autosync and addptr alias-through" to "[WIP] [InsertSync] support scalar PIPE_S autosync and addptr alias-through" on Mar 6, 2026
TaoTao-real force-pushed the codex/load-store-scalar-pipe-s-sync branch from 269dedb to 40cee3b on March 6, 2026 08:30
TaoTao-real changed the title from "[WIP] [InsertSync] support scalar PIPE_S autosync and addptr alias-through" to "[InsertSync] scalar autosync + addptr alias-through (scalar-only scope)" on Mar 6, 2026