feat(vops): restore VF/VOPS vtile implementation (from session logs) #201

Open
zhoubot wants to merge 1 commit into zhangstevenunity:main from zhoubot:restore/vops-vtile-from-jsonl-20260305

zhoubot commented on Mar 5, 2026

This PR restores the VF/VOPS (vector tile) implementation that was previously discussed/implemented but not landed, based on the OpenClaw session log:

  • /home/zhouruoyu/.openclaw/agents/main/sessions/9e67b5d2-2667-4ef7-a9a9-45376c4cb63f.jsonl

What

  • Add VOPS types:
    • !pto.preg
    • !pto.uscalar<elem>
    • !pto.vtile<lanes x elem>
      with stable custom assembly formats.
  • Add target config attr:
    • #pto.target_config<arch=..., isa=..., variant=..., repeat_bytes=..., block_bytes=..., caps={...}>
  • Add core VOPS ops:
    • pto.vf.scope
    • predicate ops: pto.vpred.all, pto.vpred.tail
    • loads/stores: pto.vload/vstore and pto.vload_tail/vstore_tail
    • helpers: pto.vdup, pto.uload_row, pto.vload_block, pto.vlane_adapt
    • common binops: vadd/vsub/vmul/vmin/vmax/vand/vor/vxor
  • Add -pto-canonicalize-vops pass:
    • tail rewrite (pred.tail -> tail ops)
    • pred propagation for binops
    • count == lanes simplification
    • conservative loop-invariant hoisting
  • Add a design note doc: docs/ir/PTO-IR-vf-vops-design.md
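As a rough illustration of the `count == lanes` simplification listed above, here is a minimal sketch on a toy model. It assumes that a tail load whose constant element count equals the lane count covers the whole vector and can therefore become a full load; `ToyLoad`, `ToyLoadKind`, and `simplifyTailLoad` are hypothetical names for illustration only and are not part of the PTO dialect or of this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy model of a vector load; not an MLIR or PTO type.
enum class ToyLoadKind { Full, Tail };

struct ToyLoad {
  ToyLoadKind kind;
  std::int64_t lanes;                 // number of lanes in the vector tile
  std::optional<std::int64_t> count;  // tail element count; nullopt if not a known constant
};

// Assumed rule: a tail load with a constant count equal to the lane count
// touches every lane, so it can be replaced by a full load.
// Any other load is returned unchanged.
ToyLoad simplifyTailLoad(const ToyLoad &op) {
  assert(op.lanes > 0 && "lane count must be positive");
  if (op.kind == ToyLoadKind::Tail && op.count && *op.count == op.lanes)
    return ToyLoad{ToyLoadKind::Full, op.lanes, std::nullopt};
  return op;
}
```

In this sketch a load with an unknown (non-constant) count is deliberately left alone, since nothing can be proven about it at rewrite time.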

Files

  • include/PTO/IR/PTOTypeDefs.td
  • include/PTO/IR/PTOAttrs.td
  • include/PTO/IR/PTOOps.td
  • include/PTO/Transforms/Passes.td
  • lib/PTO/IR/PTO.cpp
  • lib/PTO/Transforms/PTOCanonicalizeVops.cpp
  • lib/PTO/Transforms/CMakeLists.txt
  • docs/ir/PTO-IR-vf-vops-design.md

Notes

This is a restored baseline; follow-up PRs can tighten verifiers (target_config requirement/capability gating) and add FileCheck coverage.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8381b3d708

//===----------------------------------------------------------------------===//

// pto.vf.scope { ... }
def VFScopeOp : PTO_Op<"vf.scope", [IsolatedFromAbove, NoRegionArguments, SingleBlockImplicitTerminator<"YieldOp">]> {

P1: Allow vf.scope to capture surrounding SSA values

Removing data flow in and out of pto.vf.scope makes the op effectively unusable for real kernels: with IsolatedFromAbove, no region arguments, and no yielded values, the body cannot read function arguments or previously computed tiles/pointers, so typical vload/vstore sequences cannot operate on surrounding state. This is a behavioral regression for any lowering that expects vf.scope to wrap vectorized code over existing operands.
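To make the concern concrete, here is a minimal sketch on a toy model rather than on MLIR itself. It relies on the general rule that the region of an isolated-from-above op may not use values defined outside it; `ToyRegion`, `capturedValues`, and `isValidIsolatedRegion` are hypothetical names, and the value names in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy region: the values it defines and the values its ops use.
struct ToyRegion {
  std::set<std::string> defined;
  std::vector<std::string> used;
};

// Returns every used value that is not defined inside the region,
// i.e. every value the region would have to capture from above.
std::vector<std::string> capturedValues(const ToyRegion &region) {
  std::vector<std::string> captured;
  for (const std::string &value : region.used)
    if (region.defined.count(value) == 0)
      captured.push_back(value);
  return captured;
}

// An isolated region is valid only if it captures nothing from above.
bool isValidIsolatedRegion(const ToyRegion &region) {
  return capturedValues(region).empty();
}
```

A body that loads from an outer `%src` and stores to an outer `%dst` captures both values, so under this model it is rejected. With no region arguments either, such a body has no other way to reach those operands.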

Comment on lines +23 to +25
for (auto &op : *rewriter.getInsertionBlock()) {
  if (auto all = dyn_cast<mlir::pto::VPredAllOp>(&op))
    return all.getPred();

P1: Reuse only dominating vpred.all in rewrite helper

getOrCreatePredAll scans the whole block and returns the first pto.vpred.all it finds, even when that op is located after the rewrite insertion point. In that case, VLoadTailToVLoad/VStoreTailToVStore build new ops that use a value defined later in the block. This is an SSA dominance violation, and it causes verifier/pass failure on any block where vpred.all is emitted below the rewritten op.
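One possible shape of the fix, sketched on a toy model of a single block rather than on the real rewriter API: only ops strictly before the insertion point are candidates for reuse, and a new op is created at the insertion point otherwise. `ToyOp`, `findDominatingPredAll`, and `getOrCreatePredAllBefore` are hypothetical names, and a block is modelled as a plain vector of op names, which is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation in a single block; not an MLIR type.
struct ToyOp {
  std::string name;
};

// Returns the index of the first "pto.vpred.all" located strictly before
// insertionPoint, or std::nullopt if no such op precedes that position.
std::optional<std::size_t> findDominatingPredAll(const std::vector<ToyOp> &block,
                                                 std::size_t insertionPoint) {
  assert(insertionPoint <= block.size() && "insertion point out of range");
  for (std::size_t i = 0; i < insertionPoint; ++i)
    if (block[i].name == "pto.vpred.all")
      return i;
  return std::nullopt;
}

// Reuses a preceding vpred.all when one exists; otherwise inserts a new one
// at the insertion point and returns its index.
std::size_t getOrCreatePredAllBefore(std::vector<ToyOp> &block,
                                     std::size_t insertionPoint) {
  if (auto found = findDominatingPredAll(block, insertionPoint))
    return *found;
  block.insert(block.begin() + static_cast<std::ptrdiff_t>(insertionPoint),
               ToyOp{"pto.vpred.all"});
  return insertionPoint;
}
```

Within a single block, "defined earlier in the block" is enough to guarantee dominance, which is why the scan stops at the insertion point instead of walking the whole block.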