
feat(codegen): Add PTO codegen for memory, reduction, layout and utility ops#263

Merged
Hzfengsy merged 2 commits into hw-native-sys:main from lwDavid:asm
Feb 26, 2026
Conversation

@lwDavid
Contributor

@lwDavid lwDavid commented Feb 25, 2026

feat(backend): Add PTO codegen for memory, reduction, layout, and utility ops

Register codegen handlers for previously unimplemented block-level operations
on the 910B PTO backend, including:

  • Memory: store_fp, mgather, mscatter, getval, setval
  • Tile arithmetic: lrelu, cmps, addsc, subsc, selc
  • Axis reduction/expansion: row/col sum/max/min/expand variants
  • Data movement: move, move_fp, extract, reshape, assign
  • Complex: ci, gather, gatherb, scatter, sort32, mrgsort, partadd/max/min
  • Utility: print

Also improve CHECK error messages across all arity helpers to include the
actual argument count, and fix an incorrect arity message in MakeFullCodegenPTO
(was "3 arguments", should be "2 arguments").

@gemini-code-assist
Contributor

Summary of Changes

Hello @lwDavid, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant enhancements to the 910B PTO backend by implementing codegen handlers for previously unsupported block-level operations. These additions cover a wide range of functionalities, including memory operations, tile arithmetic, axis manipulation, data movement, and complex number operations. Furthermore, the PR improves the robustness and user-friendliness of the codebase by refining error messages and fixing an existing bug.

Highlights

  • PTO Codegen Support: Added PTO codegen handlers for various block-level operations on the 910B PTO backend, including memory, tile arithmetic, axis reduction/expansion, data movement, and complex operations.
  • Improved Error Messages: Enhanced CHECK error messages across all arity helpers to include the actual argument count, providing more informative error diagnostics.
  • Bug Fix: Fixed an incorrect arity message in MakeFullCodegenPTO, correcting the expected number of arguments.
Changelog
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp
    • Implemented codegen handlers for memory operations (store_fp, mgather, mscatter, getval, setval).
    • Implemented codegen handlers for tile arithmetic operations (lrelu, cmps, addsc, subsc, selc).
    • Implemented codegen handlers for axis reduction/expansion operations (row/col sum/max/min/expand variants).
    • Implemented codegen handlers for data movement operations (move, move_fp, extract, reshape, assign).
    • Implemented codegen handlers for complex operations (ci, gather, gatherb, scatter, sort32, mrgsort, partadd/max/min).
    • Implemented codegen handlers for utility operations (print).
    • Improved CHECK error messages across all arity helpers to include the actual argument count.
    • Fixed an incorrect arity message in MakeFullCodegenPTO.
Activity
  • lwDavid added PTO codegen for memory, reduction, layout and utility ops

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Expanded Backend910B_PTO with many new PTO codegen helpers and roughly 50 new or updated block.* operation registrations; enhanced argument-count error messages and multi-line emission patterns for several load/store and tile operations.

Changes

Cohort / File(s) | Summary

  • Core implementation — src/backend/910B_PTO/backend_910b_pto_ops.cpp: Large additions: new PTO codegen helpers (StoreFP, Cmps, Assign, Ci, Sort32, MrgSort, Print), improved argument-count error messages, and expanded emission logic (including multi-line block.load/store with subviews).
  • Backend registrations — src/backend/910B_PTO/...backend_910b_pto_ops.cpp: Added/updated ~50 Backend910B_PTO registration entries mapping block.* ops to PTO opcodes (getval/setval, load/store/alloc, move/move_fp, transpose/extract/reshape/assign/ci, gather/scatter/mgather/mscatter, sort32/mrgsort, print, part*/row/col reductions/expansions, cmps/addsc/subsc/selc, etc.).
  • Tile / Tiled variants & data-layout ops — src/backend/910B_PTO/...backend_910b_pto_ops.cpp: Introduced tiled PTO op mappings and handlers (tgetval/tsetval, tload/tstore, treshape/textract/tassign/tci/tgather/tscatter/tmrgsort, tcmps/taddsc/tsels/tprint and many t* elementwise ops).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • Hzfengsy

Poem

🐰 I hopped through headers, helpers, and ops,
I stitched new PTOs in tidy crops,
Loads, sorts, prints — a hopping spree,
Tiles and blocks now sing with me! 🥕

🚥 Pre-merge checks: ✅ 3 passed

  • Title check — Passed: The title directly and accurately describes the main changes: adding PTO codegen support for memory, reduction, layout, and utility operations.
  • Description check — Passed: The description comprehensively details the implemented operation groups and improvements, clearly relating to the changeset of new PTO codegen handlers and error message enhancements.
  • Docstring Coverage — Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.





Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly expands the capabilities of the 910B PTO backend by adding codegen support for a wide range of memory, arithmetic, reduction, and other block-level operations. The improvements to the arity check error messages are also a welcome addition for enhancing debuggability. My review identified a critical, recurring issue with missing spaces in the generated MLIR for several new helper functions, which would cause syntax errors. I also found some unused variables in placeholder functions that should be cleaned up to avoid compiler warnings. All identified issues remain relevant and are included in the comments.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Line 125: The helpers currently emit malformed MLIR by concatenating
pto_op_name directly with "ins(...)" (e.g., codegen.Emit(pto_op_name + "ins(" +
src + ", " + fp + ") outs(" + mem + ")")); update those emits to include a
separating space and reuse the existing pattern: replace the direct concat with
pto_op_name + " " + GenerateInsOutsClause(src, fp, mem) (or at minimum
pto_op_name + " " + "ins(...)" if you must keep the inline format); apply the
same change to all four sites that use pto_op_name + "ins(...)" (also where
similar "tassignins", "tciins", "tprintins" are emitted).
- Around line 228-235: MakePrintCodegenPTO currently emits a hardcoded
placeholder type string "!pto.partition_tensor_view<MxNxdtype>" which is invalid
MLIR; instead, inspect the operand type (op->args_[0]) like
MakeBlockLoadCodegenPTO does: cast to TensorType (or the appropriate MLIR tensor
wrapper), extract the tile height, width and element dtype strings, build the
concrete partition_tensor_view type string (e.g. "<height>x<width>x<dtype>" or
the same format used by MakeBlockLoadCodegenPTO) and pass that into codegen.Emit
in place of the literal MxNxdtype; update the code in MakePrintCodegenPTO and
any helper calls to use those extracted values so the emitted MLIR contains the
real dimensions and dtype.
- Around line 203-211: In MakeSort32CodegenPTO remove the unused local variables
src and dst and emit the PTO op with its operand and result target instead of a
bare name: use codegen.GetExprAsCode(op->args_[0]) and
codegen.GetCurrentResultTarget() directly when calling codegen.Emit (or pass
them into the existing Emit overload that accepts operands/results) so the
emitted MLIR includes the input and result; keep the CHECK on arg count and do
not leave an empty return.
- Around line 215-225: The function MakeMrgSortCodegenPTO currently computes
src, blockLen, and dst but never uses them and returns an empty string; update
it to mirror MakeSort32CodegenPTO by emitting a PTO invocation that includes
those values and returning the destination target. Specifically, use
codegen.Emit to output a formatted emission that references dst, src, and
blockLen (constructed from codegen.GetCurrentResultTarget() and
codegen.GetExprAsCode(op->args_[0/1])) so the variables are used, and then
return dst instead of an empty string.
- Around line 678-724: The PTO backend registrations for column operations
mismatch the IR arity: change the codegen calls for block.col_expand from
MakeUnaryCodegenPTO to MakeBinaryCodegenPTO to match its IR signature (arguments
"target" and "col_tile"), and verify whether block.col_sum, block.col_max, and
block.col_min should be two-argument ops like their row counterparts—if so, add
corresponding two-argument IR definitions or switch their backend calls from
MakeUnaryCodegenPTO to MakeBinaryCodegenPTO to ensure arity matches (look for
the REGISTER_BACKEND_OP entries and the MakeUnaryCodegenPTO/MakeBinaryCodegenPTO
usages to update).
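The recurring missing-separator finding (pto_op_name concatenated directly with "ins(...)") can be illustrated with a small, self-contained sketch. Note that `EmitOp` and the operand strings below are hypothetical, not the repository's actual codegen API:

```cpp
#include <string>

// Hypothetical sketch: joining an opcode and its operand clause. Concatenating
// without a separator produces "pto.tstore.fpins(...)", which the MLIR parser
// rejects; the explicit " " keeps the emitted instruction well-formed.
std::string EmitOp(const std::string& op, const std::string& ins,
                   const std::string& outs) {
  return op + " ins(" + ins + ") outs(" + outs + ")";
}
```

For example, `EmitOp("pto.tstore.fp", "%src, %fp", "%mem")` yields `pto.tstore.fp ins(%src, %fp) outs(%mem)`, with the space between opcode and clause intact.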

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a64644e and b8a1d09.

📒 Files selected for processing (1)
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp

Comment on lines 228 to 235
static std::string MakePrintCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                       codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 1) << "Operation:[pto.tprint] requires 1 argument, but got " << op->args_.size();
  std::string src = codegen.GetExprAsCode(op->args_[0]);
  codegen.Emit(pto_op_name + "ins(" + src + " | !pto.partition_tensor_view<MxNxdtype>)");
  return "";
}

@coderabbitai coderabbitai bot Feb 25, 2026


⚠️ Potential issue | 🟠 Major

Hardcoded !pto.partition_tensor_view<MxNxdtype> is not valid MLIR.

The type annotation uses the literal string MxNxdtype as a placeholder instead of the actual tile dimensions and element type. The emitted code will be rejected by the MLIR parser at runtime. The dimensions and dtype need to be derived from the operand's type, the same way MakeBlockLoadCodegenPTO derives height, width, and dtype_str from TensorType.
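To make the issue concrete, a type string with real dimensions could be assembled along these lines. `PartitionViewType` is purely illustrative; in the actual helper, the height, width, and dtype would be extracted from the operand's TensorType, as MakeBlockLoadCodegenPTO does:

```cpp
#include <string>

// Hypothetical sketch: building a concrete !pto.partition_tensor_view type
// string from tile height, width, and element dtype, instead of emitting the
// literal "MxNxdtype" placeholder that the MLIR parser would reject.
std::string PartitionViewType(int height, int width, const std::string& dtype) {
  return "!pto.partition_tensor_view<" + std::to_string(height) + "x" +
         std::to_string(width) + "x" + dtype + ">";
}
```

With this shape, a 16x32 half-precision tile would emit `!pto.partition_tensor_view<16x32xf16>` rather than the invalid placeholder.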

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 228 - 235,
MakePrintCodegenPTO currently emits a hardcoded placeholder type string
"!pto.partition_tensor_view<MxNxdtype>" which is invalid MLIR; instead, inspect
the operand type (op->args_[0]) like MakeBlockLoadCodegenPTO does: cast to
TensorType (or the appropriate MLIR tensor wrapper), extract the tile height,
width and element dtype strings, build the concrete partition_tensor_view type
string (e.g. "<height>x<width>x<dtype>" or the same format used by
MakeBlockLoadCodegenPTO) and pass that into codegen.Emit in place of the literal
MxNxdtype; update the code in MakePrintCodegenPTO and any helper calls to use
those extracted values so the emitted MLIR contains the real dimensions and
dtype.

Contributor Author


WIP: Tile/Tensor attribute support.



@lwDavid, understood! I'll look forward to the tile/tensor attribute support being completed. That will properly resolve the type annotation issue in MakePrintCodegenPTO and likely benefit other operations as well. Let me know if you need any assistance when implementing it.

Comment on lines 678 to 724
REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_sum")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowsum", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_max")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowmax", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_min")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowmin", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_expand")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.trowexpand", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_sum")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolsum", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_max")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolmax", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_min")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolmin", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_expand")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolexpand", op, codegen);
    });

@coderabbitai coderabbitai bot Feb 25, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for IR definitions and operation declarations
rg -n "row_sum|col_sum|row_max|col_max|row_min|col_min|row_expand|col_expand" --type cpp -B 2 -A 2 | head -100

Repository: hw-native-sys/pypto

Length of output: 7478


🏁 Script executed:

#!/bin/bash
# Find MakeBinaryCodegenPTO and MakeUnaryCodegenPTO implementations
rg -n "MakeBinaryCodegenPTO|MakeUnaryCodegenPTO" --type cpp -B 3 -A 10

Repository: hw-native-sys/pypto

Length of output: 43507


🏁 Script executed:

#!/bin/bash
# Search for ir:: namespace definitions of these operations
fd -e h -e hpp | xargs rg -l "row_sum|col_sum" 2>/dev/null | head -5

Repository: hw-native-sys/pypto

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for IR definitions in header files
find . -name "*.h" -o -name "*.hpp" | xargs grep -l "row_sum\|col_sum" 2>/dev/null | head -10

Repository: hw-native-sys/pypto

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for tests or usage of these operations
rg -n "block\.row_sum|block\.col_sum|block\.row_max|block\.col_max" --type cpp --type python -C 2 | head -150

Repository: hw-native-sys/pypto

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Look for any documentation or comments about these operations
rg -n "row_sum|col_sum|reduction|trowsum|tcolsum" --type cpp -B 3 -A 3 | grep -E "(//|/\*|\*\/)" -A 2 -B 2 | head -80

Repository: hw-native-sys/pypto

Length of output: 4510


🏁 Script executed:

#!/bin/bash
# Read the reduction.cpp file to understand the arity requirements
cat -n src/ir/op/block_ops/reduction.cpp | head -250

Repository: hw-native-sys/pypto

Length of output: 10987


🏁 Script executed:

#!/bin/bash
# Search for col reduction type deduction
rg -n "DeduceBlockCol|col_sum|col_max|col_min" src/ir/op/block_ops/reduction.cpp -B 3 -A 10

Repository: hw-native-sys/pypto

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for col_sum, col_max, col_min in the entire codebase
rg -n "block\.col_sum|block\.col_max|block\.col_min|\"col_sum\"|\"col_max\"|\"col_min\"" --type cpp | head -50

Repository: hw-native-sys/pypto

Length of output: 374


🏁 Script executed:

#!/bin/bash
# Search for any IR registration of col_sum, col_max, col_min, col_expand, col_fill
rg -rn "REGISTER_OP.*col_" --type cpp | head -20

Repository: hw-native-sys/pypto

Length of output: 250


🏁 Script executed:

#!/bin/bash
# Check if there's another file defining col reduction operations
find . -name "*.cpp" -o -name "*.h" | xargs grep -l "col_sum\|col_max\|col_min" | grep -v backend | head -10

Repository: hw-native-sys/pypto

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check the entire reduction.cpp file for col operations
cat -n src/ir/op/block_ops/reduction.cpp | tail -100

Repository: hw-native-sys/pypto

Length of output: 5528


🏁 Script executed:

#!/bin/bash
# Search across all files for REGISTER_OP of col operations
rg "REGISTER_OP.*col" --type cpp

Repository: hw-native-sys/pypto

Length of output: 338


🏁 Script executed:

#!/bin/bash
# Search the backend file to see if col operations have any explanation
cat -n src/backend/910B_PTO/backend_910b_pto_ops.cpp | sed -n '670,750p'

Repository: hw-native-sys/pypto

Length of output: 3903


🏁 Script executed:

#!/bin/bash
# Check if col_sum, col_max, col_min are registered anywhere in IR
rg "REGISTER_OP.*col_sum|REGISTER_OP.*col_max|REGISTER_OP.*col_min" --type cpp

Repository: hw-native-sys/pypto

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check broadcast.cpp for col_expand registration to understand the pattern
cat -n src/ir/op/block_ops/broadcast.cpp | grep -A 10 "col_expand"

Repository: hw-native-sys/pypto

Length of output: 2951


🏁 Script executed:

#!/bin/bash
# Verify if there are IR definitions for col_sum, col_max, col_min anywhere
find . -name "*.cpp" -o -name "*.h" | xargs grep -l "block\.col_sum\|block\.col_max\|block\.col_min" | grep -v backend

Repository: hw-native-sys/pypto

Length of output: 45


Arity mismatch: block.col_expand IR definition vs. PTO backend codegen.

The IR definition for block.col_expand in src/ir/op/block_ops/broadcast.cpp (line 182) declares 2 arguments ("target" and "col_tile"), but the PTO backend codegen at line 723 uses MakeUnaryCodegenPTO, which enforces exactly 1 argument. This will cause a CHECK failure at runtime. The row reduction operations (block.row_sum, block.row_max, block.row_min) correctly use MakeBinaryCodegenPTO for their 2-argument signatures defined in the IR.

Additionally, block.col_sum, block.col_max, and block.col_min have no IR definitions—only backend registrations using MakeUnaryCodegenPTO. If these are intended to parallel the row reductions (which take 2 arguments), they need IR definitions with matching 2-argument signatures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 678 - 724, The
PTO backend registrations for column operations mismatch the IR arity: change
the codegen calls for block.col_expand from MakeUnaryCodegenPTO to
MakeBinaryCodegenPTO to match its IR signature (arguments "target" and
"col_tile"), and verify whether block.col_sum, block.col_max, and block.col_min
should be two-argument ops like their row counterparts—if so, add corresponding
two-argument IR definitions or switch their backend calls from
MakeUnaryCodegenPTO to MakeBinaryCodegenPTO to ensure arity matches (look for
the REGISTER_BACKEND_OP entries and the MakeUnaryCodegenPTO/MakeBinaryCodegenPTO
usages to update).

Contributor Author


Fixed. But there is a mismatch on col_expand between PTO-ISA and block IR.



🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current IR definition for col_expand
cat src/ir/op/block_ops/broadcast.cpp | grep -A 20 "col_expand"

Length of output: 2602


🏁 Script executed:

#!/bin/bash
# Check the current backend registration for col_expand
rg -n "block\.col_expand" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 5

Length of output: 259


🏁 Script executed:

#!/bin/bash
# Check row_expand for comparison
rg -n "block\.row_expand" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 5

Length of output: 1066


🏁 Script executed:

#!/bin/bash
# Look for any TODOs or comments about col_expand mismatch
rg -n "col_expand|tcolexpand" --type cpp -B 2 -A 2 | grep -E "(TODO|FIXME|NOTE|mismatch|arity)" -B 3 -A 3

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for PTO ISA documentation or definitions of tcolexpand
rg -n "tcolexpand" --type cpp -B 3 -A 3

Length of output: 646


🏁 Script executed:

#!/bin/bash
# Check the IR definition for row_expand to compare with col_expand
cat src/ir/op/block_ops/broadcast.cpp | grep -B 2 -A 15 "\"block.row_expand\""

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for any examples or tests showing how col_expand and row_expand are used
rg -n "col_expand|row_expand" --type py -C 2 | head -80

Length of output: 6766


🏁 Script executed:

#!/bin/bash
# Check if there are any helper implementations for row/col expand that show which arg is used
rg -n "trowexpand|tcolexpand" --type cpp -C 5 | head -100

Length of output: 3565


🏁 Script executed:

#!/bin/bash
# Search more broadly for row_expand IR definition
rg -n "REGISTER_OP.*row_expand" --type cpp -A 10

Length of output: 3614


🏁 Script executed:

#!/bin/bash
# Check if there are any comments explaining the target argument usage
rg -n "target.*shape|output shape" src/ir/op/block_ops/broadcast.cpp -B 2 -A 2

Length of output: 1198


🏁 Script executed:

#!/bin/bash
# Look at the implementation of GenerateInsOutsClause to understand how args are processed
rg -n "GenerateInsOutsClause" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 15

Length of output: 5979


@lwDavid, I understand the issue now. The mismatch you're referring to is an architectural difference between the block IR level and the PTO instruction set:

Block IR (block.col_expand): Takes 2 arguments:

  • target - only used for shape inference at the IR level (determines the output dimensions)
  • col_tile - the actual column data to expand [1, cols]

PTO-ISA (pto.tcolexpand): Takes only 1 operand:

  • The column tile to expand; the output shape is determined by the result tile buffer allocated

The current backend code uses MakeUnaryCodegenPTO, which would fail the arity CHECK since the IR passes 2 arguments. If you changed it to MakeBinaryCodegenPTO, the CHECK would pass but it would incorrectly emit both arguments to the PTO instruction via GenerateInsOutsClause, which iterates through all args.

The proper solution is to create a custom codegen helper similar to MakeFullCodegenPTO (lines 153-163) that:

  1. Accepts 2 arguments to match the IR signature
  2. Only emits args_[1] (the col_tile) to the PTO instruction
  3. Ignores args_[0] (target) since it was only needed for type deduction

Would you like me to help create this custom helper function for col_expand (and row_expand if it has the same pattern)?
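The proposed dedicated helper could take the following shape. Names and signature here are illustrative only (simple strings stand in for the repository's CallPtr/codegen machinery): accept the IR's two arguments, but forward only the col_tile operand to the unary ISA instruction:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of a dedicated col_expand emitter: validate the IR's
// 2-argument signature (target, col_tile) but emit only col_tile, since the
// target argument exists solely for shape deduction at the IR level.
std::string MakeColExpandEmission(const std::string& pto_op_name,
                                  const std::vector<std::string>& args,
                                  const std::string& result) {
  if (args.size() != 2) {
    throw std::runtime_error("block.col_expand requires 2 arguments");
  }
  // args[0] (target) never reaches the emitted instruction.
  return pto_op_name + " ins(" + args[1] + ") outs(" + result + ")";
}
```

For example, `MakeColExpandEmission("pto.tcolexpand", {"%target", "%col"}, "%dst")` emits `pto.tcolexpand ins(%col) outs(%dst)`, satisfying the unary ISA form while keeping the binary IR arity check.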

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

/gemini review

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly expands the capabilities of the 910B PTO backend by adding codegen support for a wide range of block-level operations, including memory, arithmetic, reduction, and data movement. The changes are well-structured, with new operations logically grouped. I appreciate the improvements to the error messages for arity checks, which will make debugging easier. My review includes one suggestion to enhance the maintainability of the new helper functions.

Comment on lines 120 to 121
CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
                             << op->args_.size();
Contributor


medium

For better maintainability and consistency, consider using the pto_op_name parameter in the error message instead of hardcoding the operation name.

This would make the helper function slightly more robust if the operation name passed from the registration point changes. This suggestion also applies to the other new helper functions in this file (MakeCmpsCodegenPTO, MakeAssignCodegenPTO, MakeCiCodegenPTO, etc.).

Suggested change
-CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
-                             << op->args_.size();
+CHECK(op->args_.size() == 3) << "Operation:[" << pto_op_name << "] requires 3 arguments, but got "
+                             << op->args_.size();


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (3)

203-225: ⚠️ Potential issue | 🔴 Critical

sort32/mrgsort still emit incomplete MLIR instructions.

Line 209 and Line 223 emit only the opcode name, without ins(...)/outs(...). That produces invalid/incomplete codegen for registered ops.

🐛 Suggested fix
 static std::string MakeSort32CodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                         codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 1) << "Operation:[pto.tsort32] requires 1 argument, but got " << op->args_.size();
-  // std::string src = codegen.GetExprAsCode(op->args_[0]);
-  // std::string dst = codegen.GetCurrentResultTarget();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }
 
 static std::string MakeMrgSortCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 2) << "Operation:[pto.tmrgsort] requires 2 arguments, but got "
                                << op->args_.size();
-  // std::string src = codegen.GetExprAsCode(op->args_[0]);
-  // std::string blockLen = codegen.GetExprAsCode(op->args_[1]);
-  // std::string dst = codegen.GetCurrentResultTarget();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 203 - 225,
MakeSort32CodegenPTO and MakeMrgSortCodegenPTO currently emit only the opcode
name (via codegen.Emit(pto_op_name)) and produce invalid MLIR; update both
functions to construct and emit full instruction strings including ins(...) and
outs(...). Use codegen.GetExprAsCode(op->args_[i]) to build the input operands
(for sort32 use the single arg, for MrgSort use both src and blockLen), and use
codegen.GetCurrentResultTarget() for the output(s); then call codegen.Emit with
the assembled string "pto_op_name ins(<inputs>) outs(<outputs>)" so the
generated MLIR includes proper ins/outs metadata.

228-234: ⚠️ Potential issue | 🟠 Major

block.print emits a placeholder type that is not concrete.

Line 233 hardcodes MxNxdtype, so emitted MLIR is not type-concrete. The type should be derived from the operand/result metadata before emission.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 228 - 234,
MakePrintCodegenPTO: Replace the hardcoded placeholder "MxNxdtype" in the
emitted string with the concrete tensor type derived from the operand/result
metadata for op->args_[0]; use the existing codegen/context helpers (e.g., call
into codegen (the dynamic_cast result) to obtain the operand's concrete type
string via the appropriate method (similar to GetExprAsCode - e.g.,
GetTypeAsCode or querying op->args_[0]->type/metadata) and then emit pto_op_name
+ " ins(" + src + " | !pto.partition_tensor_view<" + concreteType + ">)" instead
of the current literal, ensuring the emitted MLIR uses the concrete dtype/shape
from the operand metadata.
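The type-string construction this prompt calls for can be sketched as below. `TileTypeInfo` and the accessor path are assumptions; the real emitter would read rows/cols/dtype from `op->args_[0]`'s type metadata via a codegen helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical shape/dtype metadata for illustration only.
struct TileTypeInfo {
  int rows;
  int cols;
  std::string dtype;
};

// Build the concrete !pto.partition_tensor_view<...> type string that
// block.print should emit in place of the "MxNxdtype" placeholder.
std::string MakePartitionTensorViewType(const TileTypeInfo& info) {
  return "!pto.partition_tensor_view<" + std::to_string(info.rows) + "x" +
         std::to_string(info.cols) + "x" + info.dtype + ">";
}
```

A 64x128 f16 tile would then print as `ins(%src | !pto.partition_tensor_view<64x128xf16>)` rather than the literal placeholder.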

720-724: ⚠️ Potential issue | 🟠 Major

col_expand likely needs a custom emitter instead of generic binary forwarding.

If IR keeps 2 args (target, col_tile) but PTO tcolexpand consumes only the data operand, MakeBinaryCodegenPTO will emit a semantically wrong ins(...) list. This should be a dedicated helper that validates 2 args and emits only the ISA-required operand.

Use this read-only check to confirm IR-vs-ISA operand expectations:

#!/bin/bash
set -euo pipefail

echo "=== IR definition for block.col_expand ==="
rg -n "block\\.col_expand|col_expand" src/ir/op/block_ops -C 4

echo
echo "=== PTO backend registration/emission path ==="
rg -n "block\\.col_expand|tcolexpand|MakeBinaryCodegenPTO|MakeUnaryCodegenPTO" \
  src/backend/910B_PTO/backend_910b_pto_ops.cpp -C 4

echo
echo "=== Any other tcolexpand references (signature hints) ==="
rg -n "tcolexpand" --type cpp --type h -C 3

Expected confirmation: whether tcolexpand is unary at ISA level while IR remains binary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 720 - 724, The
current REGISTER_BACKEND_OP for block.col_expand uses MakeBinaryCodegenPTO which
forwards two IR operands, but the PTO instruction "pto.tcolexpand" is ISA-unary
and should only receive the data operand; replace the MakeBinaryCodegenPTO call
with a dedicated emitter (e.g., MakeColExpandCodegenPTO or a validated
MakeUnaryCodegenPTO) that (1) checks the ir::CallPtr op has exactly 2 args
(target, col_tile) and (2) emits only the single data operand required by
pto.tcolexpand, updating the backend registration in
REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_expand") to call this new helper
instead of MakeBinaryCodegenPTO.
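The operand-filtering idea behind the proposed dedicated emitter can be sketched as follows. All names and the argument ordering are assumptions pending the IR-vs-ISA verification above; the real helper would validate `op->args_` and use `codegen.GetExprAsCode` / `codegen.GetCurrentResultTarget` like the other emitters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: the IR call carries two args (target, col_tile), but
// pto.tcolexpand consumes only the data operand, so the shape-only
// target is validated and then dropped before emission.
std::string EmitColExpand(const std::vector<std::string>& ir_args,
                          const std::string& result_target) {
  assert(ir_args.size() == 2 && "block.col_expand expects (target, col_tile)");
  // Only the data operand reaches the ISA-level instruction.
  return "pto.tcolexpand ins(" + ir_args[1] + ") outs(" + result_target + ")";
}
```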
🧹 Nitpick comments (1)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (1)

181-193: Use emitted opcode names in CHECK text for consistency.

Line 181 and Line 192 report pto.assign / pto.ci, but the emitted ops are pto.tassign / pto.tci. This makes runtime diagnostics misleading.

🔧 Suggested tweak
-  CHECK(op->args_.size() == 2) << "Operation:[pto.assign] requires 2 arguments, but got " << op->args_.size();
+  CHECK(op->args_.size() == 2) << "Operation:[pto.tassign] requires 2 arguments, but got " << op->args_.size();

-  CHECK(op->args_.size() == 1) << "Operation:[pto.ci] requires 1 argument, but got " << op->args_.size();
+  CHECK(op->args_.size() == 1) << "Operation:[pto.tci] requires 1 argument, but got " << op->args_.size();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 181 - 193, The
CHECK messages currently hardcode "pto.assign" and "pto.ci" which do not match
the actual emitted op names ("pto.tassign"/"pto.tci"); update the CHECK
diagnostics to reference the real emitted names by either using the pto_op_name
variable (e.g., CHECK(... ) << "Operation: [" << pto_op_name << "] requires X
arguments, but got " << ...) in the function that emits using pto_op_name, and
similarly change the CHECK in MakeCiCodegenPTO to mention "pto.tci" or use a
local pto_op_name for consistency so runtime error messages match the emitted op
names.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8a1d09 and cb1bf7e.

📒 Files selected for processing (1)
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

@coderabbitai help

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Full review triggered.

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

@coderabbitai pause

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (3)

228-235: ⚠️ Potential issue | 🟠 Major

block.print still hardcodes placeholder type text.

!pto.partition_tensor_view<MxNxdtype> is a literal placeholder, so emitted MLIR cannot be semantically correct for real tensors. The type needs to be derived from the operand shape/dtype.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 228 - 235, The
emitted type in MakePrintCodegenPTO is a hardcoded placeholder; replace the
literal "!pto.partition_tensor_view<MxNxdtype>" with the actual partition tensor
view type derived from the operand (op->args_[0]) using the codegen helper APIs
(e.g. use codegen to query the operand's shape and dtype or a
GetType/TypeAsString-like method and format it into the
partition_tensor_view<rowsxcolsxdtype> string), then emit pto_op_name with that
concrete type; update calls around codegen.GetExprAsCode(op->args_[0]) and
codegen.Emit(...) to build the correct type string before emitting.

203-225: ⚠️ Potential issue | 🔴 Critical

sort32 / mrgsort still emit malformed stub ops (no ins/outs).

Both helpers emit only pto_op_name, ignoring validated args and result target. This produces non-actionable MLIR and effectively leaves these ops unimplemented.

🐛 Minimal safe fix (until multi-output support lands)
 static std::string MakeSort32CodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                         codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 1) << "Operation:[pto.tsort32] requires 1 argument, but got " << op->args_.size();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }

 static std::string MakeMrgSortCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 2) << "Operation:[pto.tmrgsort] requires 2 arguments, but got "
                                << op->args_.size();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 203 - 225,
MakeSort32CodegenPTO and MakeMrgSortCodegenPTO currently only call
codegen.Emit(pto_op_name) which emits ops with no inputs/outputs; instead build
the PTO stub using the validated arguments and the current result target: call
codegen.GetExprAsCode on op->args_[i] to obtain operand strings, call
codegen.GetCurrentResultTarget() for the destination, then invoke codegen.Emit
with a properly formatted op string that includes the input(s) and result (e.g.
"pto_op_name %src : type -> %dst" or the project's PTO emission convention).
Update both functions (MakeSort32CodegenPTO and MakeMrgSortCodegenPTO) to use
these helpers and pass operands/results to codegen.Emit so emitted ops include
ins/outs even until multi-output support is added.

696-724: ⚠️ Potential issue | 🟠 Major

Verify row/col expand backend arity against IR + PTO-ISA before merge.

Current mapping uses unary for block.row_expand and binary for block.col_expand. If col_expand IR includes a shape-only target arg while PTO instruction accepts only data input, MakeBinaryCodegenPTO will emit an extra operand.

#!/bin/bash
set -euo pipefail

echo "== IR signatures for row/col reduce-expand ops =="
rg -n 'REGISTER_OP\("block\.(row_expand|col_expand|row_sum|col_sum|col_max|col_min)"' src/ir/op/block_ops -A12 -B2

echo
echo "== PTO backend mappings =="
rg -n 'REGISTER_BACKEND_OP\(Backend910B_PTO, "block\.(row_expand|col_expand|row_sum|col_sum|col_max|col_min)"' src/backend/910B_PTO/backend_910b_pto_ops.cpp -A6 -B1

echo
echo "== PTO op docs/definitions references (operand expectations) =="
rg -n 'trowexpand|tcolexpand|trowsum|tcolsum|tcolmax|tcolmin' --type cpp --type h --type md --type txt

Expected verification outcome:

  • If IR arity is 2 but ISA arity is 1, add a custom helper that validates 2 IR args and emits only the ISA-required operand.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 696 - 724, The
backend registers use MakeUnaryCodegenPTO for "block.row_expand" (producing
"pto.trowexpand") but MakeBinaryCodegenPTO for "block.col_expand" (producing
"pto.tcolexpand"); verify the IR signatures for block.row_expand and
block.col_expand and the PTO-ISA operand arity, and if the IR includes a
shape-only target argument while the PTO ISA expects only the data input,
replace the direct MakeBinaryCodegenPTO usage (or add a small helper used by
REGISTER_BACKEND_OP for "block.col_expand") that validates both IR args but
emits only the ISA-required operand (dropping the shape-only target) so emitted
operands match the PTO instruction; use the identifiers MakeUnaryCodegenPTO,
MakeBinaryCodegenPTO, "block.row_expand", "block.col_expand", "pto.trowexpand",
and "pto.tcolexpand" to locate and update the code.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a64644e and cb1bf7e.

📒 Files selected for processing (1)
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp

Comment on lines 120 to 121
CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
<< op->args_.size();

⚠️ Potential issue | 🟡 Minor

Use pto_op_name in CHECK messages to prevent stale op labels.

A couple of helpers hardcode labels that already drift from emitted op names (pto.assign vs pto.tassign, pto.ci vs pto.tci). This makes runtime failures misleading.

🔧 Proposed fix
-  CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
+  CHECK(op->args_.size() == 3) << "Operation:[" << pto_op_name << "] requires 3 arguments, but got "
                                << op->args_.size();

-  CHECK(op->args_.size() == 2) << "Operation:[pto.tcmps] requires 2 arguments, but got " << op->args_.size();
+  CHECK(op->args_.size() == 2) << "Operation:[" << pto_op_name << "] requires 2 arguments, but got "
+                               << op->args_.size();

-  CHECK(op->args_.size() == 2) << "Operation:[pto.assign] requires 2 arguments, but got " << op->args_.size();
+  CHECK(op->args_.size() == 2) << "Operation:[" << pto_op_name << "] requires 2 arguments, but got "
+                               << op->args_.size();

-  CHECK(op->args_.size() == 1) << "Operation:[pto.ci] requires 1 argument, but got " << op->args_.size();
+  CHECK(op->args_.size() == 1) << "Operation:[" << pto_op_name << "] requires 1 argument, but got "
+                               << op->args_.size();

Also applies to: 169-169, 181-181, 192-192

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 120 - 121,
Several CHECK() calls hardcode operation labels (e.g., "pto.tstore.fp") which
can drift; replace these hardcoded strings with the pto_op_name variable so
runtime messages reflect the actual emitted op name. Locate the CHECK(...) calls
that validate op->args_.size() (the one currently using
"Operation:[pto.tstore.fp] requires 3 arguments..." and the similar checks for
tassign/tci/etc.), and change the message to use << pto_op_name << " requires X
arguments, but got " << op->args_.size(); keep the CHECK condition and op/args_
references unchanged and apply the same replacement for the other occurrences
that currently hardcode op labels.
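The message-building part of this fix can be factored into a small helper so every arity check stays in sync with `pto_op_name`. The helper itself is illustrative; in the backend the stream would feed the CHECK() macro directly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build the arity diagnostic from the emitted op name so the message can
// never drift from it (the pto.assign vs pto.tassign mismatch flagged above).
std::string ArityError(const std::string& pto_op_name, size_t expected,
                       size_t got) {
  std::ostringstream msg;
  msg << "Operation:[" << pto_op_name << "] requires " << expected
      << (expected == 1 ? " argument" : " arguments") << ", but got " << got;
  return msg.str();
}
```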

@coderabbitai

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Reviews paused.

@lwDavid
Contributor Author

lwDavid commented Feb 25, 2026

@Hzfengsy Request review.

@lwDavid
Contributor Author

lwDavid commented Feb 26, 2026

@Hzfengsy Conflicts resolved, request review.

@Hzfengsy Hzfengsy merged commit 53b5493 into hw-native-sys:main Feb 26, 2026
6 checks passed
@lwDavid lwDavid deleted the asm branch February 26, 2026 07:08