feat(codegen): Add PTO codegen for memory, reduction, layout and utility ops #263
Hzfengsy merged 2 commits into hw-native-sys:main from
Conversation
Summary of Changes

Hello @lwDavid, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces significant enhancements to the 910B PTO backend by implementing codegen handlers for previously unsupported block-level operations. These additions cover a wide range of functionality, including memory operations, tile arithmetic, axis manipulation, data movement, and complex-number operations. The PR also improves the robustness and user-friendliness of the codebase by refining error messages and fixing an existing bug.
📝 Walkthrough

Expanded Backend910B_PTO with many new PTO codegen helpers and roughly 50 new or updated block.* operation registrations; enhanced argument-count error messages and multi-line emission patterns for several load/store and tile operations.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request significantly expands the capabilities of the 910B PTO backend by adding codegen support for a wide range of memory, arithmetic, reduction, and other block-level operations. The improvements to the arity check error messages are also a welcome addition for enhancing debuggability. My review identified a critical, recurring issue with missing spaces in the generated MLIR for several new helper functions, which would cause syntax errors. I also found some unused variables in placeholder functions that should be cleaned up to avoid compiler warnings. All identified issues remain relevant and are included in the comments.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Line 125: The helpers currently emit malformed MLIR by concatenating
pto_op_name directly with "ins(...)" (e.g., codegen.Emit(pto_op_name + "ins(" +
src + ", " + fp + ") outs(" + mem + ")")); update those emits to include a
separating space and reuse the existing pattern: replace the direct concat with
pto_op_name + " " + GenerateInsOutsClause(src, fp, mem) (or at minimum
pto_op_name + " " + "ins(...)" if you must keep the inline format); apply the
same change to all four sites that use pto_op_name + "ins(...)" (also where
similar "tassignins", "tciins", "tprintins" are emitted).
- Around line 228-235: MakePrintCodegenPTO currently emits a hardcoded
placeholder type string "!pto.partition_tensor_view<MxNxdtype>" which is invalid
MLIR; instead, inspect the operand type (op->args_[0]) like
MakeBlockLoadCodegenPTO does: cast to TensorType (or the appropriate MLIR tensor
wrapper), extract the tile height, width and element dtype strings, build the
concrete partition_tensor_view type string (e.g. "<height>x<width>x<dtype>" or
the same format used by MakeBlockLoadCodegenPTO) and pass that into codegen.Emit
in place of the literal MxNxdtype; update the code in MakePrintCodegenPTO and
any helper calls to use those extracted values so the emitted MLIR contains the
real dimensions and dtype.
- Around line 203-211: In MakeSort32CodegenPTO remove the unused local variables
src and dst and emit the PTO op with its operand and result target instead of a
bare name: use codegen.GetExprAsCode(op->args_[0]) and
codegen.GetCurrentResultTarget() directly when calling codegen.Emit (or pass
them into the existing Emit overload that accepts operands/results) so the
emitted MLIR includes the input and result; keep the CHECK on arg count and do
not leave an empty return.
- Around line 215-225: The function MakeMrgSortCodegenPTO currently computes
src, blockLen, and dst but never uses them and returns an empty string; update
it to mirror MakeSort32CodegenPTO by emitting a PTO invocation that includes
those values and returning the destination target. Specifically, use
codegen.Emit to output a formatted emission that references dst, src, and
blockLen (constructed from codegen.GetCurrentResultTarget() and
codegen.GetExprAsCode(op->args_[0/1])) so the variables are used, and then
return dst instead of an empty string.
- Around line 678-724: The PTO backend registrations for column operations
mismatch the IR arity: change the codegen calls for block.col_expand from
MakeUnaryCodegenPTO to MakeBinaryCodegenPTO to match its IR signature (arguments
"target" and "col_tile"), and verify whether block.col_sum, block.col_max, and
block.col_min should be two-argument ops like their row counterparts—if so, add
corresponding two-argument IR definitions or switch their backend calls from
MakeUnaryCodegenPTO to MakeBinaryCodegenPTO to ensure arity matches (look for
the REGISTER_BACKEND_OP entries and the MakeUnaryCodegenPTO/MakeBinaryCodegenPTO
usages to update).
```cpp
static std::string MakePrintCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                       codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 1) << "Operation:[pto.tprint] requires 1 argument, but got " << op->args_.size();
  std::string src = codegen.GetExprAsCode(op->args_[0]);
  codegen.Emit(pto_op_name + "ins(" + src + " | !pto.partition_tensor_view<MxNxdtype>)");
  return "";
}
```
Hardcoded !pto.partition_tensor_view<MxNxdtype> is not valid MLIR.
The type annotation uses the literal string MxNxdtype as a placeholder instead of the actual tile dimensions and element type. The emitted code will be rejected by the MLIR parser at runtime. The dimensions and dtype need to be derived from the operand's type, the same way MakeBlockLoadCodegenPTO derives height, width, and dtype_str from TensorType.
WIP: Tile/Tensor attribute support.
@lwDavid, understood! I'll look forward to the tile/tensor attribute support being completed. That will properly resolve the type annotation issue in MakePrintCodegenPTO and likely benefit other operations as well. Let me know if you need any assistance when implementing it.
```cpp
REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_sum")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowsum", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_max")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowmax", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_min")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeBinaryCodegenPTO("pto.trowmin", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.row_expand")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.trowexpand", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_sum")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolsum", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_max")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolmax", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_min")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolmin", op, codegen);
    });

REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_expand")
    .set_pipe(ir::PipeType::V)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeUnaryCodegenPTO("pto.tcolexpand", op, codegen);
    });
```
🧩 Analysis chain

🏁 Scripts executed (repository: hw-native-sys/pypto):

```bash
#!/bin/bash
# Search for IR definitions and operation declarations
rg -n "row_sum|col_sum|row_max|col_max|row_min|col_min|row_expand|col_expand" --type cpp -B 2 -A 2 | head -100

# Find MakeBinaryCodegenPTO and MakeUnaryCodegenPTO implementations
rg -n "MakeBinaryCodegenPTO|MakeUnaryCodegenPTO" --type cpp -B 3 -A 10

# Search for ir:: namespace definitions of these operations
fd -e h -e hpp | xargs rg -l "row_sum|col_sum" 2>/dev/null | head -5

# Search for IR definitions in header files
find . -name "*.h" -o -name "*.hpp" | xargs grep -l "row_sum\|col_sum" 2>/dev/null | head -10

# Search for tests or usage of these operations
rg -n "block\.row_sum|block\.col_sum|block\.row_max|block\.col_max" --type cpp --type python -C 2 | head -150

# Look for any documentation or comments about these operations
rg -n "row_sum|col_sum|reduction|trowsum|tcolsum" --type cpp -B 3 -A 3 | grep -E "(//|/\*|\*\/)" -A 2 -B 2 | head -80

# Read the reduction.cpp file to understand the arity requirements
cat -n src/ir/op/block_ops/reduction.cpp | head -250

# Search for col reduction type deduction
rg -n "DeduceBlockCol|col_sum|col_max|col_min" src/ir/op/block_ops/reduction.cpp -B 3 -A 10

# Search for col_sum, col_max, col_min in the entire codebase
rg -n "block\.col_sum|block\.col_max|block\.col_min|\"col_sum\"|\"col_max\"|\"col_min\"" --type cpp | head -50

# Search for any IR registration of col_sum, col_max, col_min, col_expand, col_fill
rg -n "REGISTER_OP.*col_" --type cpp | head -20

# Check if there's another file defining col reduction operations
find . -name "*.cpp" -o -name "*.h" | xargs grep -l "col_sum\|col_max\|col_min" | grep -v backend | head -10

# Check the entire reduction.cpp file for col operations
cat -n src/ir/op/block_ops/reduction.cpp | tail -100

# Search across all files for REGISTER_OP of col operations
rg "REGISTER_OP.*col" --type cpp

# Search the backend file to see if col operations have any explanation
cat -n src/backend/910B_PTO/backend_910b_pto_ops.cpp | sed -n '670,750p'

# Check if col_sum, col_max, col_min are registered anywhere in IR
rg "REGISTER_OP.*col_sum|REGISTER_OP.*col_max|REGISTER_OP.*col_min" --type cpp

# Check broadcast.cpp for col_expand registration to understand the pattern
cat -n src/ir/op/block_ops/broadcast.cpp | grep -A 10 "col_expand"

# Verify if there are IR definitions for col_sum, col_max, col_min anywhere
find . -name "*.cpp" -o -name "*.h" | xargs grep -l "block\.col_sum\|block\.col_max\|block\.col_min" | grep -v backend
```
Arity mismatch: block.col_expand IR definition vs. PTO backend codegen.
The IR definition for block.col_expand in src/ir/op/block_ops/broadcast.cpp (line 182) declares 2 arguments ("target" and "col_tile"), but the PTO backend codegen at line 723 uses MakeUnaryCodegenPTO, which enforces exactly 1 argument. This will cause a CHECK failure at runtime. The row reduction operations (block.row_sum, block.row_max, block.row_min) correctly use MakeBinaryCodegenPTO for their 2-argument signatures defined in the IR.
Additionally, block.col_sum, block.col_max, and block.col_min have no IR definitions—only backend registrations using MakeUnaryCodegenPTO. If these are intended to parallel the row reductions (which take 2 arguments), they need IR definitions with matching 2-argument signatures.
Fixed. But there is a mismatch on col_expand between PTO-ISA and block IR.
🧩 Analysis chain

🏁 Scripts executed:

```bash
#!/bin/bash
# Check the current IR definition for col_expand
cat src/ir/op/block_ops/broadcast.cpp | grep -A 20 "col_expand"

# Check the current backend registration for col_expand
rg -n "block\.col_expand" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 5

# Check row_expand for comparison
rg -n "block\.row_expand" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 5

# Look for any TODOs or comments about col_expand mismatch
rg -n "col_expand|tcolexpand" --type cpp -B 2 -A 2 | grep -E "(TODO|FIXME|NOTE|mismatch|arity)" -B 3 -A 3

# Search for PTO ISA documentation or definitions of tcolexpand
rg -n "tcolexpand" --type cpp -B 3 -A 3

# Check the IR definition for row_expand to compare with col_expand
cat src/ir/op/block_ops/broadcast.cpp | grep -B 2 -A 15 "\"block.row_expand\""

# Look for any examples or tests showing how col_expand and row_expand are used
rg -n "col_expand|row_expand" --type py -C 2 | head -80

# Check if there are any helper implementations for row/col expand that show which arg is used
rg -n "trowexpand|tcolexpand" --type cpp -C 5 | head -100

# Search more broadly for row_expand IR definition
rg -n "REGISTER_OP.*row_expand" --type cpp -A 10

# Check if there are any comments explaining the target argument usage
rg -n "target.*shape|output shape" src/ir/op/block_ops/broadcast.cpp -B 2 -A 2

# Look at the implementation of GenerateInsOutsClause to understand how args are processed
rg -n "GenerateInsOutsClause" src/backend/910B_PTO/backend_910b_pto_ops.cpp -A 15
```
@lwDavid, I understand the issue now. The mismatch you're referring to is an architectural difference between the block IR level and the PTO instruction set:

Block IR (`block.col_expand`) takes 2 arguments:
- `target`: only used for shape inference at the IR level (determines the output dimensions)
- `col_tile`: the actual column data to expand, shaped [1, cols]

PTO-ISA (`pto.tcolexpand`) takes only 1 operand: the column tile to expand; the output shape is determined by the result tile buffer allocated.

The current backend code uses MakeUnaryCodegenPTO, which would fail the arity CHECK since the IR passes 2 arguments. If you changed it to MakeBinaryCodegenPTO, the CHECK would pass, but it would incorrectly emit both arguments to the PTO instruction via GenerateInsOutsClause, which iterates through all args.

The proper solution is to create a custom codegen helper similar to MakeFullCodegenPTO (lines 153-163) that:
- accepts 2 arguments to match the IR signature;
- only emits `args_[1]` (the col_tile) to the PTO instruction;
- ignores `args_[0]` (target), since it was only needed for type deduction.

Would you like me to help create this custom helper function for col_expand (and row_expand if it has the same pattern)?
/gemini review

@coderabbitai review
✅ Actions performed: Review triggered.
Code Review
This pull request significantly expands the capabilities of the 910B PTO backend by adding codegen support for a wide range of block-level operations, including memory, arithmetic, reduction, and data movement. The changes are well-structured, with new operations logically grouped. I appreciate the improvements to the error messages for arity checks, which will make debugging easier. My review includes one suggestion to enhance the maintainability of the new helper functions.
```cpp
CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
                             << op->args_.size();
```
For better maintainability and consistency, consider using the pto_op_name parameter in the error message instead of hardcoding the operation name.
This would make the helper function slightly more robust if the operation name passed from the registration point changes. This suggestion also applies to the other new helper functions in this file (MakeCmpsCodegenPTO, MakeAssignCodegenPTO, MakeCiCodegenPTO, etc.).
```diff
-CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
-                             << op->args_.size();
+CHECK(op->args_.size() == 3) << "Operation:[" << pto_op_name << "] requires 3 arguments, but got "
+                             << op->args_.size();
```
♻️ Duplicate comments (3)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (3)
203-225: ⚠️ Potential issue | 🔴 Critical

`sort32`/`mrgsort` still emit incomplete MLIR instructions. Line 209 and line 223 emit only the opcode name, without `ins(...)`/`outs(...)`. That produces invalid/incomplete codegen for registered ops.

🐛 Suggested fix

```diff
 static std::string MakeSort32CodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                         codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 1) << "Operation:[pto.tsort32] requires 1 argument, but got " << op->args_.size();
-  std::string src = codegen.GetExprAsCode(op->args_[0]);
-  std::string dst = codegen.GetCurrentResultTarget();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }

 static std::string MakeMrgSortCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 2) << "Operation:[pto.tmrgsort] requires 2 arguments, but got " << op->args_.size();
-  std::string src = codegen.GetExprAsCode(op->args_[0]);
-  std::string blockLen = codegen.GetExprAsCode(op->args_[1]);
-  std::string dst = codegen.GetCurrentResultTarget();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 203 - 225, MakeSort32CodegenPTO and MakeMrgSortCodegenPTO currently emit only the opcode name (via codegen.Emit(pto_op_name)) and produce invalid MLIR; update both functions to construct and emit full instruction strings including ins(...) and outs(...). Use codegen.GetExprAsCode(op->args_[i]) to build the input operands (for sort32 use the single arg, for MrgSort use both src and blockLen), and use codegen.GetCurrentResultTarget() for the output(s); then call codegen.Emit with the assembled string "pto_op_name ins(<inputs>) outs(<outputs>)" so the generated MLIR includes proper ins/outs metadata.
228-234:⚠️ Potential issue | 🟠 Major
block.printemits a placeholder type that is not concrete.Line 233 hardcodes
MxNxdtype, so emitted MLIR is not type-concrete. The type should be derived from the operand/result metadata before emission.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 228 - 234, MakePrintCodegenPTO: Replace the hardcoded placeholder "MxNxdtype" in the emitted string with the concrete tensor type derived from the operand/result metadata for op->args_[0]; use the existing codegen/context helpers (e.g., call into codegen (the dynamic_cast result) to obtain the operand's concrete type string via the appropriate method (similar to GetExprAsCode - e.g., GetTypeAsCode or querying op->args_[0]->type/metadata) and then emit pto_op_name + " ins(" + src + " | !pto.partition_tensor_view<" + concreteType + ">)" instead of the current literal, ensuring the emitted MLIR uses the concrete dtype/shape from the operand metadata.
720-724: ⚠️ Potential issue | 🟠 Major

`col_expand` likely needs a custom emitter instead of generic binary forwarding. If the IR keeps 2 args (`target`, `col_tile`) but PTO `tcolexpand` consumes only the data operand, `MakeBinaryCodegenPTO` will emit a semantically wrong `ins(...)` list. This should be a dedicated helper that validates 2 args and emits only the ISA-required operand.

Use this read-only check to confirm IR-vs-ISA operand expectations:

```bash
#!/bin/bash
set -euo pipefail

echo "=== IR definition for block.col_expand ==="
rg -n "block\\.col_expand|col_expand" src/ir/op/block_ops -C 4
echo
echo "=== PTO backend registration/emission path ==="
rg -n "block\\.col_expand|tcolexpand|MakeBinaryCodegenPTO|MakeUnaryCodegenPTO" \
  src/backend/910B_PTO/backend_910b_pto_ops.cpp -C 4
echo
echo "=== Any other tcolexpand references (signature hints) ==="
rg -n "tcolexpand" --type cpp --type h -C 3
```

Expected confirmation: whether `tcolexpand` is unary at the ISA level while the IR remains binary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 720 - 724, The current REGISTER_BACKEND_OP for block.col_expand uses MakeBinaryCodegenPTO which forwards two IR operands, but the PTO instruction "pto.tcolexpand" is ISA-unary and should only receive the data operand; replace the MakeBinaryCodegenPTO call with a dedicated emitter (e.g., MakeColExpandCodegenPTO or a validated MakeUnaryCodegenPTO) that (1) checks the ir::CallPtr op has exactly 2 args (target, col_tile) and (2) emits only the single data operand required by pto.tcolexpand, updating the backend registration in REGISTER_BACKEND_OP(Backend910B_PTO, "block.col_expand") to call this new helper instead of MakeBinaryCodegenPTO.
🧹 Nitpick comments (1)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (1)
181-193: Use emitted opcode names in CHECK text for consistency. Line 181 and line 192 report `pto.assign`/`pto.ci`, but the emitted ops are `pto.tassign`/`pto.tci`. This makes runtime diagnostics misleading.

🔧 Suggested tweak

```diff
-  CHECK(op->args_.size() == 2) << "Operation:[pto.assign] requires 2 arguments, but got " << op->args_.size();
+  CHECK(op->args_.size() == 2) << "Operation:[pto.tassign] requires 2 arguments, but got " << op->args_.size();

-  CHECK(op->args_.size() == 1) << "Operation:[pto.ci] requires 1 argument, but got " << op->args_.size();
+  CHECK(op->args_.size() == 1) << "Operation:[pto.tci] requires 1 argument, but got " << op->args_.size();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 181 - 193, The CHECK messages currently hardcode "pto.assign" and "pto.ci" which do not match the actual emitted op names ("pto.tassign"/"pto.tci"); update the CHECK diagnostics to reference the real emitted names by either using the pto_op_name variable (e.g., CHECK(... ) << "Operation: [" << pto_op_name << "] requires X arguments, but got " << ...) in the function that emits using pto_op_name, and similarly change the CHECK in MakeCiCodegenPTO to mention "pto.tci" or use a local pto_op_name for consistency so runtime error messages match the emitted op names.
@coderabbitai help
@coderabbitai full review

✅ Actions performed: Full review triggered.
@coderabbitai pause
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/backend/910B_PTO/backend_910b_pto_ops.cpp (3)
228-235: ⚠️ Potential issue | 🟠 Major

`block.print` still hardcodes placeholder type text. `!pto.partition_tensor_view<MxNxdtype>` is a literal placeholder, so the emitted MLIR cannot be semantically correct for real tensors. The type needs to be derived from the operand shape/dtype.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 228 - 235, The emitted type in MakePrintCodegenPTO is a hardcoded placeholder; replace the literal "!pto.partition_tensor_view<MxNxdtype>" with the actual partition tensor view type derived from the operand (op->args_[0]) using the codegen helper APIs (e.g. use codegen to query the operand's shape and dtype or a GetType/TypeAsString-like method and format it into the partition_tensor_view<rowsxcolsxdtype> string), then emit pto_op_name with that concrete type; update calls around codegen.GetExprAsCode(op->args_[0]) and codegen.Emit(...) to build the correct type string before emitting.
203-225: ⚠️ Potential issue | 🔴 Critical

`sort32`/`mrgsort` still emit malformed stub ops (no ins/outs). Both helpers emit only `pto_op_name`, ignoring the validated args and result target. This produces non-actionable MLIR and effectively leaves these ops unimplemented.

🐛 Minimal safe fix (until multi-output support lands)

```diff
 static std::string MakeSort32CodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                         codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 1) << "Operation:[pto.tsort32] requires 1 argument, but got "
                                << op->args_.size();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }

 static std::string MakeMrgSortCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
   auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
   CHECK(op->args_.size() == 2) << "Operation:[pto.tmrgsort] requires 2 arguments, but got "
                                << op->args_.size();
-  codegen.Emit(pto_op_name);
+  codegen.Emit(pto_op_name + " " + GenerateInsOutsClause(op, codegen));
   return "";
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 203 - 225, MakeSort32CodegenPTO and MakeMrgSortCodegenPTO currently only call codegen.Emit(pto_op_name) which emits ops with no inputs/outputs; instead build the PTO stub using the validated arguments and the current result target: call codegen.GetExprAsCode on op->args_[i] to obtain operand strings, call codegen.GetCurrentResultTarget() for the destination, then invoke codegen.Emit with a properly formatted op string that includes the input(s) and result (e.g. "pto_op_name %src : type -> %dst" or the project's PTO emission convention). Update both functions (MakeSort32CodegenPTO and MakeMrgSortCodegenPTO) to use these helpers and pass operands/results to codegen.Emit so emitted ops include ins/outs even until multi-output support is added.
696-724: ⚠️ Potential issue | 🟠 Major

Verify `row/col expand` backend arity against IR + PTO-ISA before merge.

Current mapping uses unary for `block.row_expand` and binary for `block.col_expand`. If the `col_expand` IR includes a shape-only `target` arg while the PTO instruction accepts only the data input, `MakeBinaryCodegenPTO` will emit an extra operand.

```bash
#!/bin/bash
set -euo pipefail

echo "== IR signatures for row/col reduce-expand ops =="
rg -n 'REGISTER_OP\("block\.(row_expand|col_expand|row_sum|col_sum|col_max|col_min)"' src/ir/op/block_ops -A12 -B2
echo
echo "== PTO backend mappings =="
rg -n 'REGISTER_BACKEND_OP\(Backend910B_PTO, "block\.(row_expand|col_expand|row_sum|col_sum|col_max|col_min)"' src/backend/910B_PTO/backend_910b_pto_ops.cpp -A6 -B1
echo
echo "== PTO op docs/definitions references (operand expectations) =="
rg -n 'trowexpand|tcolexpand|trowsum|tcolsum|tcolmax|tcolmin' --type cpp --type h --type md --type txt
```

Expected verification outcome:
- If IR arity is 2 but ISA arity is 1, add a custom helper that validates 2 IR args and emits only the ISA-required operand.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 696 - 724, The backend registers use MakeUnaryCodegenPTO for "block.row_expand" (producing "pto.trowexpand") but MakeBinaryCodegenPTO for "block.col_expand" (producing "pto.tcolexpand"); verify the IR signatures for block.row_expand and block.col_expand and the PTO-ISA operand arity, and if the IR includes a shape-only target argument while the PTO ISA expects only the data input, replace the direct MakeBinaryCodegenPTO usage (or add a small helper used by REGISTER_BACKEND_OP for "block.col_expand") that validates both IR args but emits only the ISA-required operand (dropping the shape-only target) so emitted operands match the PTO instruction; use the identifiers MakeUnaryCodegenPTO, MakeBinaryCodegenPTO, "block.row_expand", "block.col_expand", "pto.trowexpand", and "pto.tcolexpand" to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Around line 120-121: Several CHECK() calls hardcode operation labels (e.g.,
"pto.tstore.fp") which can drift; replace these hardcoded strings with the
pto_op_name variable so runtime messages reflect the actual emitted op name.
Locate the CHECK(...) calls that validate op->args_.size() (the one currently
using "Operation:[pto.tstore.fp] requires 3 arguments..." and the similar checks
for tassign/tci/etc.), and change the message to use << pto_op_name << "
requires X arguments, but got " << op->args_.size(); keep the CHECK condition
and op/args_ references unchanged and apply the same replacement for the other
occurrences that currently hardcode op labels.
```cpp
CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
                             << op->args_.size();
```
Use pto_op_name in CHECK messages to prevent stale op labels.
A couple of helpers hardcode labels that already drift from emitted op names (pto.assign vs pto.tassign, pto.ci vs pto.tci). This makes runtime failures misleading.
🔧 Proposed fix

```diff
- CHECK(op->args_.size() == 3) << "Operation:[pto.tstore.fp] requires 3 arguments, but got "
+ CHECK(op->args_.size() == 3) << "Operation:[" << pto_op_name << "] requires 3 arguments, but got "
                               << op->args_.size();

- CHECK(op->args_.size() == 2) << "Operation:[pto.tcmps] requires 2 arguments, but got " << op->args_.size();
+ CHECK(op->args_.size() == 2) << "Operation:[" << pto_op_name << "] requires 2 arguments, but got "
+                              << op->args_.size();

- CHECK(op->args_.size() == 2) << "Operation:[pto.assign] requires 2 arguments, but got " << op->args_.size();
+ CHECK(op->args_.size() == 2) << "Operation:[" << pto_op_name << "] requires 2 arguments, but got "
+                              << op->args_.size();

- CHECK(op->args_.size() == 1) << "Operation:[pto.ci] requires 1 argument, but got " << op->args_.size();
+ CHECK(op->args_.size() == 1) << "Operation:[" << pto_op_name << "] requires 1 argument, but got "
+                              << op->args_.size();
```

Also applies to: 169-169, 181-181, 192-192
✅ Actions performed: Reviews paused.
@Hzfengsy Request review.

@Hzfengsy Conflicts resolved, request review.
feat(backend): Add PTO codegen for memory, reduction, layout, and utility ops
Register codegen handlers for previously unimplemented block-level operations
on the 910B PTO backend, including:
Also improve CHECK error messages across all arity helpers to include the
actual argument count, and fix an incorrect arity message in MakeFullCodegenPTO
(was "3 arguments", should be "2 arguments").