[WIP] feat(codegen): orchestration tensor APIs, scalar min/max, and if/view codegen #204
zhusy54 wants to merge 4 commits into hw-native-sys:main from
Conversation
… codegen

- Add GetRuntimeDataTypeString for runtime-compatible dtype name mapping
- Refactor tensor operations to use shape/dtype-based APIs instead of raw strings
- Add codegen support for IfOp and ViewOp in orchestration layer
- Extract common codegen helpers (EmitTensorDecl, EmitFunctionCall, etc.)
- Add scalar min/max operations (pl.min, pl.max) for orchestration
- Update tests to cover new codegen patterns and scalar ops
Summary of Changes

Hello @zhusy54, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the code generation capabilities by modernizing tensor API usage, introducing support for fundamental scalar operations, and enabling control flow and tensor view operations within the generated code. The changes streamline the orchestration codegen, making it more robust and expressive by moving to shape/dtype-based tensor handling and integrating new IR constructs like IfOp and ViewOp.

Highlights
Changelog
Code Review
This pull request introduces significant and valuable refactoring to the orchestration codegen, switching to shape/dtype-based tensor APIs and simplifying scope management. The addition of scalar min/max operations and codegen for IfOp/ViewOp are also great enhancements. My review focuses on improving maintainability by reducing code duplication, fixing a potential runtime error in the parser, and ensuring the Python-level API for min/max is robust.
ir_func = getattr(ir, ir_func_name)
return ir_func(lhs, rhs, call_span)

if op_name in self._SCALAR_UNARY_OPS:
inline std::string DataTypeToString(const DataType& dtype) {
  if (dtype == DataType::BOOL) return "BOOL";
  if (dtype == DataType::INT4) return "INT4";
  if (dtype == DataType::INT8) return "INT8";
  if (dtype == DataType::INT16) return "INT16";
  if (dtype == DataType::INT32) return "INT32";
  if (dtype == DataType::INT64) return "INT64";
  if (dtype == DataType::UINT4) return "UINT4";
  if (dtype == DataType::UINT8) return "UINT8";
  if (dtype == DataType::UINT16) return "UINT16";
  if (dtype == DataType::UINT32) return "UINT32";
  if (dtype == DataType::UINT64) return "UINT64";
  if (dtype == DataType::FP4) return "FP4";
  if (dtype == DataType::FP8E4M3FN) return "FP8E4M3FN";
  if (dtype == DataType::FP8E5M2) return "FP8E5M2";
  if (dtype == DataType::FP16) return "FP16";
  if (dtype == DataType::FP32) return "FP32";
  if (dtype == DataType::BF16) return "BFLOAT16";
  if (dtype == DataType::HF4) return "HF4";
  if (dtype == DataType::HF8) return "HF8";
  return "UnknownType";
}
The DataTypeToString function is implemented as a long chain of if statements. While correct, this can be slightly less efficient and harder to maintain than a switch statement on the underlying type code. Using dtype.Code() with a switch would be a more idiomatic and performant C++ pattern for this kind of dispatch.
inline std::string DataTypeToString(const DataType& dtype) {
  switch (dtype.Code()) {
    case DataType::kBoolCode: return "BOOL";
    case DataType::kInt4Code: return "INT4";
    case DataType::kInt8Code: return "INT8";
    case DataType::kInt16Code: return "INT16";
    case DataType::kInt32Code: return "INT32";
    case DataType::kInt64Code: return "INT64";
    case DataType::kUInt4Code: return "UINT4";
    case DataType::kUInt8Code: return "UINT8";
    case DataType::kUInt16Code: return "UINT16";
    case DataType::kUInt32Code: return "UINT32";
    case DataType::kUInt64Code: return "UINT64";
    case DataType::kFp4Code: return "FP4";
    case DataType::kFp8e4m3fnCode: return "FP8E4M3FN";
    case DataType::kFp8e5m2Code: return "FP8E5M2";
    case DataType::kFp16Code: return "FP16";
    case DataType::kFp32Code: return "FP32";
    case DataType::kBf16Code: return "BFLOAT16";
    case DataType::kHf4Code: return "HF4";
    case DataType::kHf8Code: return "HF8";
    default: return "UnknownType";
  }
}

References
- When mapping enums to strings, prefer using an explicit map (like a switch statement) over a long chain of if-else if statements for robustness and future extensibility.
 def max(tile: Tile | Scalar, axis: int | Scalar = 0, keepdim: bool = False) -> Tile | Scalar:
     """Max reduction along specified axis, or scalar max of two values.

     Args:
-        tile: Input tile
-        axis: Reduction axis (0 for rows, 1 for columns, -1 for last)
-        keepdim: Whether to keep the reduced dimension as 1
+        tile: Input tile or first scalar operand
+        axis: Reduction axis (for tiles) or second scalar operand
+        keepdim: Whether to keep the reduced dimension as 1 (tiles only)

     Returns:
-        Tile wrapping the max operation
+        Tile or Scalar wrapping the max operation
     """
-    call_expr = _ir_ops.max(tile.unwrap(), axis, keepdim)
+    call_expr = _ir_ops.max(tile.unwrap(), axis, keepdim)  # type: ignore[reportArgumentType]
     return Tile(expr=call_expr)


-def min(tile: Tile, axis: int, keepdim: bool = False) -> Tile:
-    """Min reduction along specified axis.
+@overload
+def min(tile: Tile, axis: int, keepdim: bool = False) -> Tile: ...
+
+
+@overload
+def min(tile: Scalar, axis: Scalar | int, keepdim: bool = False) -> Scalar: ...
+
+
+def min(tile: Tile | Scalar, axis: int | Scalar = 0, keepdim: bool = False) -> Tile | Scalar:
+    """Min reduction along specified axis, or scalar min of two values.

     Args:
-        tile: Input tile
-        axis: Reduction axis (0 for rows, 1 for columns, -1 for last)
-        keepdim: Whether to keep the reduced dimension as 1
+        tile: Input tile or first scalar operand
+        axis: Reduction axis (for tiles) or second scalar operand
+        keepdim: Whether to keep the reduced dimension as 1 (tiles only)

     Returns:
-        Tile wrapping the min operation
+        Tile or Scalar wrapping the min operation
     """
-    call_expr = _ir_ops.min(tile.unwrap(), axis, keepdim)
+    call_expr = _ir_ops.min(tile.unwrap(), axis, keepdim)  # type: ignore[reportArgumentType]
     return Tile(expr=call_expr)
The implementations for max and min are overloaded to support both Tile and Scalar types, which is great for the DSL's usability. However, the current implementation only handles the Tile reduction case by calling _ir_ops.max (the block operation). If these functions are called directly with Scalar arguments (i.e., not inside a @pl.function that gets parsed), they will fail at runtime.
While the parser correctly dispatches to the scalar IR ops, making the Python implementation robust for direct calls would improve consistency and prevent unexpected errors for users who might use these functions outside a parsed context. A runtime type check on the tile argument could be used to dispatch to the correct IR operation (ir.max_ for scalars).
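A minimal sketch of that runtime dispatch, assuming simplified stand-ins for the DSL's `Scalar`/`Tile` wrappers and the IR ops (the classes `Scalar`/`Tile` and helpers `_scalar_max`/`_block_max` here are illustrative placeholders, not the project's actual API):

```python
# Hypothetical stand-ins for the DSL wrapper types.
class Scalar:
    def __init__(self, value):
        self.value = value

    def unwrap(self):
        return self.value


class Tile:
    def __init__(self, expr):
        self.expr = expr

    def unwrap(self):
        return self.expr


def _scalar_max(lhs, rhs):
    # Stands in for the scalar IR op (ir.max_ in the review's terms).
    return max(lhs, rhs)


def _block_max(expr, axis, keepdim):
    # Stands in for the block reduction op (_ir_ops.max).
    return ("max", expr, axis, keepdim)


def dsl_max(tile, axis=0, keepdim=False):
    """Dispatch on the wrapper type so direct Scalar calls don't hit the block op."""
    if isinstance(tile, Scalar):
        if not isinstance(axis, (Scalar, int)):
            raise TypeError("scalar max expects a scalar second operand")
        rhs = axis.unwrap() if isinstance(axis, Scalar) else axis
        return Scalar(_scalar_max(tile.unwrap(), rhs))
    if not isinstance(axis, int):
        raise TypeError("tile reduction expects an integer axis")
    return Tile(_block_max(tile.unwrap(), axis, keepdim))
```

The same shape applies symmetrically to `min`; the point is only that the concrete implementation should branch on the argument type before unwrapping.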
raise InvalidOperationError(
    f"Operation '{op_name}' is not supported for scalar arguments",
    span=call_span,
    hint="Supported scalar ops: min, max",
The error hint for unsupported scalar operations is hardcoded to min, max. Since the logic also supports unary operations via _SCALAR_UNARY_OPS, the hint should be updated to dynamically include all supported scalar operations. This will provide more accurate feedback to the user.
- hint="Supported scalar ops: min, max",
+ hint=f"Supported scalar ops: {', '.join(list(self._SCALAR_BINARY_OPS.keys()) + list(getattr(self, '_SCALAR_UNARY_OPS', {}).keys()))}",
std::string CodegenBase::GetRuntimeDataTypeString(const DataType& dtype) const {
  if (dtype == DataType::FP16) return "DataType::FLOAT16";
  if (dtype == DataType::FP32) return "DataType::FLOAT32";
  if (dtype == DataType::INT32) return "DataType::INT32";
  if (dtype == DataType::INT16) return "DataType::INT16";
  if (dtype == DataType::INT8) return "DataType::INT8";
  if (dtype == DataType::UINT8) return "DataType::UINT8";
  if (dtype == DataType::BF16) return "DataType::BFLOAT16";
  if (dtype == DataType::INT64) return "DataType::INT64";
  if (dtype == DataType::UINT64) return "DataType::UINT64";
  return "DataType::UNKNOWN";
}
The implementation of GetRuntimeDataTypeString duplicates logic from the new DataTypeToString function and is incomplete, as it doesn't cover all data types. This could lead to maintenance issues and bugs when new types are added.
To improve maintainability and correctness, this function should be refactored to leverage DataTypeToString for the general case, while only handling the special mappings where the runtime enum name differs from the canonical name (e.g., FP16 -> FLOAT16).
std::string CodegenBase::GetRuntimeDataTypeString(const DataType& dtype) const {
  // Handle special cases where runtime name differs from canonical enum name
  if (dtype == DataType::FP16) return "DataType::FLOAT16";
  if (dtype == DataType::FP32) return "DataType::FLOAT32";
  std::string s = DataTypeToString(dtype);
  if (s == "UnknownType") {
    return "DataType::UNKNOWN";
  }
  return "DataType::" + s;
}

References
- To improve maintainability and correctness in enum-to-string mapping, leverage existing explicit mapping functions and handle special cases, rather than duplicating logic, for robustness and future extensibility.
auto saved = current_return_var_names_;
current_return_var_names_.clear();
for (const auto& rv : if_stmt->return_vars_) {
  current_return_var_names_.push_back(rv->name_);
}
VisitStmt(if_stmt->then_body_);
current_return_var_names_ = saved;

indent_ -= 4;
code_ << Indent() << "}\n";
indent_ -= 4;

if (if_stmt->else_body_.has_value()) {
  code_ << Indent() << "} else {\n";
  indent_ += 4;
  code_ << Indent() << "PTO2_SCOPE(rt) {\n";
  indent_ += 4;

  auto saved2 = current_return_var_names_;
  current_return_var_names_.clear();
  for (const auto& rv : if_stmt->return_vars_) {
    current_return_var_names_.push_back(rv->name_);
  }
  VisitStmt(*if_stmt->else_body_);
  current_return_var_names_ = saved2;
The logic for setting up current_return_var_names_ before visiting the then and else bodies is duplicated. This code block could be extracted into a helper function or, for a more robust solution, a RAII class could be used to manage the saving and restoring of current_return_var_names_. This would make the code cleaner and less prone to errors if the logic needs to be updated in the future.
📝 Walkthrough

Adds min/max scalar operators to the IR layer, enhances the parser to dispatch scalar operations, improves code generation infrastructure with virtual methods for naming and type handling, refactors orchestration code generation to track SSA base names, and introduces a paged attention example demonstrating the orchestration workflow via the PyPTO DSL.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 17
🧹 Nitpick comments (1)
examples/ir_parser/paged_attention_example.py (1)
204-214: Inconsistent `# type: ignore` suppression across kernel call sites.

Line 210 explicitly suppresses type errors with `# type: ignore[reportArgumentType]`, but lines 204 and 214 (which call kernel_qk_matmul and kernel_pv_matmul with mismatched shapes/types just the same) have no such annotation. Add matching suppression comments for consistency.

♻️ Proposed fix

- sij = self.kernel_qk_matmul(qi, kj, sij)
+ sij = self.kernel_qk_matmul(qi, kj, sij)  # type: ignore[reportArgumentType]

- oi_tmp = self.kernel_pv_matmul(pij, vj, oi_tmp)
+ oi_tmp = self.kernel_pv_matmul(pij, vj, oi_tmp)  # type: ignore[reportArgumentType]
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ir_parser/paged_attention_example.py`:
- Around line 105-107: The kernel function signature includes unused parameters
mi, li, and oi_tmp which trigger ARG002; update the kernel (the function that
declares mi, li, oi_tmp in its signature) to either reference them with a short
explanatory inline comment or add a linter suppression (e.g., append “# noqa:
ARG002” to the parameter list or to the function definition line) to indicate
these are intentionally unused in this simplified example and silence the static
analyzer.
- Around line 112-113: The kernel signature for the paging kernel declares
parameters is_first and is_last as pl.Scalar[pl.INT64] but the orchestration
sets/passes them as pl.Scalar[pl.UINT64]; change the kernel parameter types to
pl.Scalar[pl.UINT64] to match the orchestration (or vice versa if you prefer
signed everywhere) so the types are consistent—update the kernel function
signature where is_first and is_last are declared to use pl.UINT64 and ensure
any related annotations/reference to those symbols (is_first, is_last, kernel
signature) reflect the same UINT64 type.
- Line 167: The variable block_num is assigned via pl.tensor.read(config, [5])
in paged_attention_example.py but never used (Ruff F841); remove the unused
assignment or replace its usage: either delete the line "block_num:
pl.Scalar[pl.UINT64] = pl.tensor.read(config, [5])" if it's vestigial, or use
the value where intended (e.g., pass block_num to the function that needs the
block index or incorporate it into the paging logic). Locate the assignment by
the symbol block_num and the call pl.tensor.read and either remove it or wire it
into the downstream code that expects the block number.
- Around line 132-144: The function kernel_init_inplace has a shape mismatch:
parameter oi is annotated as pl.Tensor[[16, 16], pl.FP32] but the function
returns oi as the first element with return type pl.Tensor[[16, 128], pl.FP32]
and the call site passes an oi of shape [q_tile, head_dim] (16,128). Fix this by
updating the oi parameter annotation in kernel_init_inplace to pl.Tensor[[16,
128], pl.FP32] so it matches the declared return type and the call site; verify
the other parameters (li, mi) and the return tuple remain unchanged.
- Around line 87-99: The local variable annotation for out in kernel_pv_matmul
is incorrect: change out's declared shape from pl.Tensor[[16, 128], pl.FP32] to
pl.Tensor[[16, 16], pl.FP32] so it matches the output parameter and the function
return type; ensure the pl.l0c_store call remains the same but that out's
annotation and any uses reflect the [16, 16] shape.
In `@include/pypto/core/dtype.h`:
- Around line 336-369: DataTypeToString currently maps DataType::BF16 to
"BFLOAT16", producing invalid symbols; change the BF16 branch in
DataTypeToString to return "BF16" instead of "BFLOAT16" and adjust the comment
examples that show how callers compose names (the Python printer and C++ codegen
examples) so they reflect the canonical enum-style "BF16" suffix; keep all other
mappings unchanged and ensure DataTypeToString still returns "UnknownType" for
unknown values.
In `@python/pypto/language/op/block_ops.py`:
- Around line 755-775: The implementations of max and min always call
_ir_ops.max/_ir_ops.min and return a Tile even when overloads expect Scalar
results; update both functions (max and min) to dispatch based on input types:
if tile is a Scalar, call ir.max_ / ir.min_ from pypto.pypto_core.ir (import
them) with the two scalar operands and return a Scalar; if tile is a Tile,
validate that axis is an int (raise TypeError if not), call
_ir_ops.max/_ir_ops.min with tile.unwrap(), axis, keepdim and return a Tile;
ensure imports for ir.max_/ir.min_ are added and no Tile is created for scalar
paths.
In `@python/pypto/language/parser/ast_parser.py`:
- Around line 1518-1551: In _parse_scalar_op, enforce exact arity and reject
keywords: for binary ops (self._SCALAR_BINARY_OPS) require exactly 2 positional
args and no keywords on the ast.Call (check len(call.args) == 2 and not
call.keywords), and for unary ops (self._SCALAR_UNARY_OPS) require exactly 1
positional arg and no keywords (len(call.args) == 1 and not call.keywords); when
the checks fail raise InvalidOperationError with a clear message and
span=call_span (same pattern used currently), then continue to parse via
self.parse_expression and call the IR constructor retrieved via getattr(ir,
ir_func_name).
In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp`:
- Around line 131-133: The access to offsets_tuple->elements_[0] is unguarded
and can crash for zero-length tuples; add the same size guard used for
col_offset before reading elements_[0] in all three functions
(MakeBlockStoreCodegenCCE, MakeBlockL0CStoreCodegenCCE and the third identical
block), e.g., CHECK(offsets_tuple->elements_.size() > 0) (or equivalent) and
only call codegen.GetExprAsCode(offsets_tuple->elements_[0]) after that check so
both row_offset and col_offset are protected for 0D/1D tuples.
In `@src/codegen/cce/cce_codegen.cpp`:
- Around line 452-453: The generated loop uses an unsigned uint64_t for the loop
index (see emitter_.EmitLine call that constructs the for with loop_var_name,
start, stop, step), which will wrap for negative/descending ranges; change the
emitted loop index type to a signed type (e.g., int64_t) and ensure comparisons
and increments use the same signed type and expressions (loop_var_name, start,
stop, step) so negative values behave correctly; alternatively, if you choose to
keep unsigned, add a precondition/assertion or runtime check before code
generation that start/stop/step are non-negative and refuse generation
otherwise—update any places that assume unsigned loop semantics to use the
signed index or the new precondition.
In `@src/codegen/orchestration/orchestration_codegen.cpp`:
- Around line 44-73: GetSSABaseName is over-eagerly treating any trailing
"_<digits>" as SSA and will collapse real user names like "foo_1"; change the
heuristic to require an explicit SSA marker (e.g., "_ssa_" / "_iter_ssa_")
instead of plain underscores before digits: update GetSSABaseName to check for
"_iter_ssa_" and "_ssa_" (and only strip those markers plus trailing digits),
and update the other analogous spots called out in the review (the code blocks
at ~357-360, 491-495, 714-720, 879-883) so SSA generation/consumption uses the
new reserved marker consistently rather than stripping any "_<digits>" suffix.
Ensure callers that emit SSA names are changed to emit the reserved "_ssa_" form
so original user identifiers ending in "_<digits>" are preserved.
- Around line 420-435: The emitted shape array uses a zero-length declaration
when ndim==0, which is invalid C++; update GenerateMakeTensorExternal and the
inline intermediate tensor creation that emits "uint64_t <name>_shapes[ndim]" to
allocate at least one element: compute a shape storage length like shape_len =
std::max<size_t>(1, ndim) and declare "uint64_t <var_name>_shapes[shape_len]"
(or equivalent), fill only the first ndim entries (no-op when ndim==0), but
continue to pass the actual ndim to make_tensor_external (and any runtime call)
so the runtime still sees a 0-rank tensor; adjust uses of ndim in the
initializer/loop accordingly and reference GenerateMakeTensorExternal, the
generated "<var_name>_shapes" array, and the make_tensor_external call to locate
changes.
- Around line 671-675: The current dedup branch that returns when (op_name ==
"tensor.create" && declared_vars_.count(result_var)) silently drops subsequent
tensor.create allocations after SSA collapse; update the logic in
orchestration_codegen.cpp to handle repeated tensor.create for the same
result_var by either (A) emitting an assignment statement (e.g., result_var =
make_tensor(...)) instead of returning when declared_vars_ contains result_var,
or (B) enforce/transform to a unique allocation name before codegen; locate the
branch checking op_name == "tensor.create", declared_vars_, and result_var and
replace the early return with code that generates an assignment to result_var
(or performs a unique rename) so each create results in a proper allocation or
re-assignment.
- Around line 732-739: The scalar-arg path in the loop over call->args_
(TryGetVarName, As<ScalarType>, make_scalar_param) emits
make_scalar_param(var_name) unconditionally, which skips dtype-aware packing for
float scalars; update the branch to inspect the ScalarType's dtype and, for
float (and any other types that require special packing), emit
make_scalar_param(float_to_u64(var_name)) (or the appropriate packing function
for that dtype) while preserving the existing behavior for integer/bool scalars;
adjust the logic around As<ScalarType>(arg->GetType()) and the params push to
choose the packed expression based on scalar->dtype() so scalar variables are
packed identically to scalar constants.
In `@src/codegen/tensor_op_codegen.cpp`:
- Around line 99-125: The generated index code must handle rank-0 (empty
indices) and validate indices vs shape length: before building idx_expr check if
indices.empty() and set idx_expr to "0" (or generate a direct scalar load) to
avoid emitting an empty assignment; also check indices.size() == shape.size()
(or emit a runtime/compile-time assertion or throw std::runtime_error) and fail
fast if they differ to avoid silently wrong linear index calculation. Update the
block that builds idx_oss / idx_expr (references: indices, shape,
codegen.GenerateExprString, idx_expr, result_var, ptr_expr, cpp_type) to perform
these validations and then proceed with the existing simple/complex-index
emission logic.
- Around line 55-70: The tensor.create code emits a zero-length C array and
tensor.read emits an empty index for rank-0 tensors; change the shapes array
allocation to use at least one element (e.g., size = std::max<size_t>(1, ndim))
and when ndim==0 emit a single dummy dimension value of 1 in the initializer for
result_var_shapes, and in tensor.read ensure the computed index (idx_var) is "0"
for ndim==0 instead of an empty expression; update the logic around
result_var_shapes, the make_tensor call, and the index computation (referencing
result_var, result_var_shapes, CalculateTensorSizeExpr and the idx_var/index
expression generation) to handle ndim==0 accordingly.
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 930-935: The parameter `flag` in function kernel_process is unused
and triggers ARG002; rename it to `_flag` (or `_flag: pl.Scalar[pl.INT64]`) or
annotate/ignore it with a noqa to mark it intentional. Locate the kernel_process
definition and update the parameter name or add the noqa inline so the linter no
longer reports ARG002 while preserving the function signature semantics.
---
Nitpick comments:
In `@examples/ir_parser/paged_attention_example.py`:
- Around line 204-214: The calls to the low-level kernels are inconsistent in
type-ignore suppression: kernel_softmax_prepare already has "# type:
ignore[reportArgumentType]" but kernel_qk_matmul and kernel_pv_matmul do not,
causing linter/type-check noise; add the same "# type:
ignore[reportArgumentType]" suppression to the kernel_qk_matmul(...) and
kernel_pv_matmul(...) call sites so all three kernel invocations
(kernel_qk_matmul, kernel_softmax_prepare, kernel_pv_matmul) consistently
suppress the reported argument-type errors.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- examples/ir_parser/paged_attention_example.py
- include/pypto/codegen/codegen_base.h
- include/pypto/core/dtype.h
- include/pypto/ir/scalar_expr.h
- python/bindings/modules/ir.cpp
- python/pypto/language/op/block_ops.py
- python/pypto/language/parser/ast_parser.py
- python/pypto/pypto_core/ir.pyi
- src/backend/910B_CCE/backend_910b_cce_ops.cpp
- src/codegen/cce/cce_codegen.cpp
- src/codegen/codegen_base.cpp
- src/codegen/orchestration/orchestration_codegen.cpp
- src/codegen/tensor_op_codegen.cpp
- src/ir/transforms/python_printer.cpp
- tests/ut/codegen/test_cce_codegen.py
- tests/ut/codegen/test_orchestration_codegen.py
- tests/ut/ir/expressions/test_scalar_ops.py
- tests/ut/language/parser/test_scalar_dispatch.py
@pl.function(type=pl.FunctionType.InCore)
def kernel_pv_matmul(
    self,
    pij: pl.Tensor[[16, 128], pl.BF16],
    vj: pl.Tensor[[128, 128], pl.BF16],
    output: pl.Tensor[[16, 16], pl.FP32],
) -> pl.Tensor[[16, 16], pl.FP32]:
    """PV matmul: oi_tmp = pij @ vj (CUBE)."""
    p_tile: pl.Tile[[16, 16], pl.BF16] = pl.load(pij, [0, 0], [16, 16])
    v_tile: pl.Tile[[16, 16], pl.BF16] = pl.load(vj, [0, 0], [16, 16])
    result: pl.Tile[[16, 16], pl.FP32] = pl.matmul(p_tile, v_tile)
    out: pl.Tensor[[16, 128], pl.FP32] = pl.l0c_store(result, [0, 0], [16, 16], output)
    return out
out local annotation [16, 128] contradicts the output parameter and return type [16, 16].
Line 98 annotates out as pl.Tensor[[16, 128], pl.FP32], but both the output parameter (line 92) and the return type (line 93) are [16, 16]. The annotation should be [16, 16] to be consistent.
🐛 Proposed fix
- out: pl.Tensor[[16, 128], pl.FP32] = pl.l0c_store(result, [0, 0], [16, 16], output)
+ out: pl.Tensor[[16, 16], pl.FP32] = pl.l0c_store(result, [0, 0], [16, 16], output)
mi: pl.Tensor[[16], pl.FP32],
li: pl.Tensor[[16], pl.FP32],
oi_tmp: pl.Tensor[[16, 128], pl.FP32],
Unused kernel parameters mi, li, oi_tmp flagged by static analysis.
These parameters are accepted by the kernel but never referenced in the body (Ruff ARG002). Since the docstring notes that this is a simplified example body, consider adding a brief inline comment or suppression (e.g., # noqa: ARG002) to signal intent and silence the linter.
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 105-105: Unused method argument: mi (ARG002)
[warning] 106-106: Unused method argument: li (ARG002)
[warning] 107-107: Unused method argument: oi_tmp (ARG002)
is_first: pl.Scalar[pl.INT64],
is_last: pl.Scalar[pl.INT64],
is_first/is_last scalar type mismatches between kernel signature (INT64) and orchestration usage (UINT64).
The kernel declares is_first and is_last as pl.Scalar[pl.INT64] (lines 112–113), but the orchestration assigns and passes them as pl.Scalar[pl.UINT64] (lines 218 and 222). Pick one type consistently; since pl.yield_(1) naturally produces an unsigned value here, prefer pl.UINT64 in the kernel signature.
🐛 Proposed fix
- is_first: pl.Scalar[pl.INT64],
- is_last: pl.Scalar[pl.INT64],
+ is_first: pl.Scalar[pl.UINT64],
+ is_last: pl.Scalar[pl.UINT64],
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
is_first: pl.Scalar[pl.UINT64],
is_last: pl.Scalar[pl.UINT64],
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 112-112: Unused method argument: is_first
(ARG002)
[warning] 113-113: Unused method argument: is_last
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ir_parser/paged_attention_example.py` around lines 112 - 113, The
kernel signature for the paging kernel declares parameters is_first and is_last
as pl.Scalar[pl.INT64] but the orchestration sets/passes them as
pl.Scalar[pl.UINT64]; change the kernel parameter types to pl.Scalar[pl.UINT64]
to match the orchestration (or vice versa if you prefer signed everywhere) so
the types are consistent—update the kernel function signature where is_first and
is_last are declared to use pl.UINT64 and ensure any related
annotations/reference to those symbols (is_first, is_last, kernel signature)
reflect the same UINT64 type.
@pl.function(type=pl.FunctionType.InCore)
def kernel_init_inplace(
    self,
    oi: pl.Tensor[[16, 16], pl.FP32],
    li: pl.Tensor[[16], pl.FP32],
    mi: pl.Tensor[[16], pl.FP32],
) -> tuple[
    pl.Tensor[[16, 128], pl.FP32],
    pl.Tensor[[16], pl.FP32],
    pl.Tensor[[16], pl.FP32],
]:
    """Initialize inplace accumulators to zero (VECTOR)."""
    return oi, li, mi
oi parameter shape conflicts with the return type annotation and call site.
oi is declared [16, 16] (line 135) but the first return type is [16, 128] (line 139). Since the body directly returns oi, these must match. The call site (line 185) also passes an oi created as [q_tile, head_dim] (i.e., [16, 128]), confirming the parameter annotation should be [16, 128].
🐛 Proposed fix
- oi: pl.Tensor[[16, 16], pl.FP32],
+ oi: pl.Tensor[[16, 128], pl.FP32],
📝 Committable suggestion
@pl.function(type=pl.FunctionType.InCore)
def kernel_init_inplace(
    self,
    oi: pl.Tensor[[16, 128], pl.FP32],
    li: pl.Tensor[[16], pl.FP32],
    mi: pl.Tensor[[16], pl.FP32],
) -> tuple[
    pl.Tensor[[16, 128], pl.FP32],
    pl.Tensor[[16], pl.FP32],
    pl.Tensor[[16], pl.FP32],
]:
    """Initialize inplace accumulators to zero (VECTOR)."""
    return oi, li, mi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ir_parser/paged_attention_example.py` around lines 132 - 144, The
function kernel_init_inplace has a shape mismatch: parameter oi is annotated as
pl.Tensor[[16, 16], pl.FP32] but the function returns oi as the first element
with return type pl.Tensor[[16, 128], pl.FP32] and the call site passes an oi of
shape [q_tile, head_dim] (16,128). Fix this by updating the oi parameter
annotation in kernel_init_inplace to pl.Tensor[[16, 128], pl.FP32] so it matches
the declared return type and the call site; verify the other parameters (li, mi)
and the return tuple remain unchanged.
num_heads: pl.Scalar[pl.UINT64] = pl.tensor.read(config, [1])
head_dim: pl.Scalar[pl.UINT64] = pl.tensor.read(config, [3])
block_size: pl.Scalar[pl.UINT64] = pl.tensor.read(config, [4])
block_num: pl.Scalar[pl.UINT64] = pl.tensor.read(config, [5])
block_num is assigned but never used — Ruff F841.
🐛 Proposed fix
- block_num: pl.Scalar[pl.UINT64] = pl.tensor.read(config, [5])
🧰 Tools
🪛 Ruff (0.15.2)
[error] 167-167: Local variable block_num is assigned to but never used
Remove assignment to unused variable block_num
(F841)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ir_parser/paged_attention_example.py` at line 167, The variable
block_num is assigned via pl.tensor.read(config, [5]) in
paged_attention_example.py but never used (Ruff F841); remove the unused
assignment or replace its usage: either delete the line "block_num:
pl.Scalar[pl.UINT64] = pl.tensor.read(config, [5])" if it's vestigial, or use
the value where intended (e.g., pass block_num to the function that needs the
block index or incorporate it into the paging logic). Locate the assignment by
the symbol block_num and the call pl.tensor.read and either remove it or wire it
into the downstream code that expects the block number.
// Dedup: skip if this tensor was already declared (SSA name collapse)
if (op_name == "tensor.create" && declared_vars_.count(result_var)) {
  return;
}
Don't silently skip tensor.create when the same base name appears multiple times after SSA collapse.
In loops and conditionals, multiple tensor.create operations can map to the same base name after SSA name collapse (e.g., mi_0, mi_1 → mi). The current dedup check skips codegen for all but the first occurrence, causing subsequent allocations to be dropped and the old tensor to be silently reused. Either emit an assignment (mi = make_tensor(...)) for each create, or enforce that result variable names are unique per allocation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/orchestration/orchestration_codegen.cpp` around lines 671 - 675,
The current dedup branch that returns when (op_name == "tensor.create" &&
declared_vars_.count(result_var)) silently drops subsequent tensor.create
allocations after SSA collapse; update the logic in orchestration_codegen.cpp to
handle repeated tensor.create for the same result_var by either (A) emitting an
assignment statement (e.g., result_var = make_tensor(...)) instead of returning
when declared_vars_ contains result_var, or (B) enforce/transform to a unique
allocation name before codegen; locate the branch checking op_name ==
"tensor.create", declared_vars_, and result_var and replace the early return
with code that generates an assignment to result_var (or performs a unique
rename) so each create results in a proper allocation or re-assignment.
// Input args
for (const auto& arg : call->args_) {
  std::string var_name = TryGetVarName(arg);
  if (!var_name.empty()) {
    // Check if this is a scalar variable (not a tensor) -> make_scalar_param
    if (As<ScalarType>(arg->GetType())) {
      params.push_back({"make_scalar_param", var_name});
      continue;
Scalar variables need dtype-aware packing for make_scalar_param.
The scalar variable path at lines 732-739 emits make_scalar_param(var_name) directly, creating an inconsistency with how scalar constants are handled. For float constants, the code correctly uses float_to_u64() conversion, but scalar float variables bypass this. If var_name refers to a float type variable, it should be packed identically to float constants.
Suggested fix
- if (As<ScalarType>(arg->GetType())) {
- params.push_back({"make_scalar_param", var_name});
+ if (auto scalar_type = As<ScalarType>(arg->GetType())) {
+ std::string cpp_type = scalar_type->dtype_.ToCTypeString();
+ if (cpp_type == "float") {
+ params.push_back({"make_scalar_param", "float_to_u64(" + var_name + ")"});
+ } else {
+ params.push_back({"make_scalar_param", var_name});
+ }
continue;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/orchestration/orchestration_codegen.cpp` around lines 732 - 739,
The scalar-arg path in the loop over call->args_ (TryGetVarName, As<ScalarType>,
make_scalar_param) emits make_scalar_param(var_name) unconditionally, which
skips dtype-aware packing for float scalars; update the branch to inspect the
ScalarType's dtype and, for float (and any other types that require special
packing), emit make_scalar_param(float_to_u64(var_name)) (or the appropriate
packing function for that dtype) while preserving the existing behavior for
integer/bool scalars; adjust the logic around As<ScalarType>(arg->GetType()) and
the params push to choose the packed expression based on scalar->dtype() so
scalar variables are packed identically to scalar constants.
// tensor.create -> uint64_t var_shapes[N] = {...}; Tensor var = make_tensor(var_shapes, N, DataType::XX);
auto result_type = As<TensorType>(op->GetType());
CHECK(result_type) << "tensor.create must return TensorType";

std::string result_var = codegen.GetCurrentResultTarget();
size_t ndim = result_type->shape_.size();

std::ostringstream oss;
oss << "uint64_t " << result_var << "_shapes[" << ndim << "] = {";
for (size_t i = 0; i < ndim; ++i) {
  if (i > 0) oss << ", ";
  oss << codegen.GenerateExprString(result_type->shape_[i]);
}
oss << "};\n";
oss << "Tensor " << result_var << " = make_tensor(" << result_var << "_shapes, " << ndim << ", "
    << codegen.GetRuntimeDataTypeString(result_type->dtype_) << ");";
Fix zero‑rank tensor handling in tensor.create and tensor.read code generation.
The current implementation emits invalid C++ for rank‑0 tensors. When shape_ is empty, line 63 generates uint64_t <name>_shapes[0] = {}, which is ill‑formed in standard C++17 (zero‑length arrays are a non‑standard compiler extension). Additionally, tensor.read at lines 99‑125 produces an empty index expression (size_t idx_var = ;) for rank‑0 tensors, which is a syntax error.
Allocate a minimum of 1 element for the shapes array and provide a dummy value when the shape is empty, following the pattern used in CalculateTensorSizeExpr (lines 41‑44). The codebase has test coverage for rank‑0 tensors (test_scalar_shape_dimensions), so this must be supported.
🔧 Suggested fix for tensor.create
- size_t ndim = result_type->shape_.size();
+ size_t ndim = result_type->shape_.size();
+ size_t shape_len = (ndim == 0) ? 1 : ndim;
std::ostringstream oss;
- oss << "uint64_t " << result_var << "_shapes[" << ndim << "] = {";
- for (size_t i = 0; i < ndim; ++i) {
- if (i > 0) oss << ", ";
- oss << codegen.GenerateExprString(result_type->shape_[i]);
- }
+ oss << "uint64_t " << result_var << "_shapes[" << shape_len << "] = {";
+ if (ndim == 0) {
+ oss << "1";
+ } else {
+ for (size_t i = 0; i < ndim; ++i) {
+ if (i > 0) oss << ", ";
+ oss << codegen.GenerateExprString(result_type->shape_[i]);
+ }
+ }
oss << "};\n";
  oss << "Tensor " << result_var << " = make_tensor(" << result_var << "_shapes, " << ndim << ", "
      << codegen.GetRuntimeDataTypeString(result_type->dtype_) << ");";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/tensor_op_codegen.cpp` around lines 55 - 70, The tensor.create
code emits a zero-length C array and tensor.read emits an empty index for rank-0
tensors; change the shapes array allocation to use at least one element (e.g.,
size = std::max<size_t>(1, ndim)) and when ndim==0 emit a single dummy dimension
value of 1 in the initializer for result_var_shapes, and in tensor.read ensure
the computed index (idx_var) is "0" for ndim==0 instead of an empty expression;
update the logic around result_var_shapes, the make_tensor call, and the index
computation (referencing result_var, result_var_shapes, CalculateTensorSizeExpr
and the idx_var/index expression generation) to handle ndim==0 accordingly.
  // Build linear index expression
  std::ostringstream idx_oss;
  for (size_t i = 0; i < indices.size(); ++i) {
-   if (i > 0) oss << " + ";
-   oss << codegen.GenerateExprString(indices[i]);
+   if (i > 0) idx_oss << " + ";
+   idx_oss << codegen.GenerateExprString(indices[i]);
    for (size_t j = i + 1; j < shape.size(); ++j) {
-     oss << " * " << codegen.GenerateExprString(shape[j]);
+     idx_oss << " * " << codegen.GenerateExprString(shape[j]);
    }
  }
- oss << ";\n";
- oss << cpp_type << " " << result_var << " = static_cast<" << cpp_type << "*>(" << ptr_expr << ")[idx_"
-     << result_var << "];";
+ std::string idx_expr = idx_oss.str();
+
+ // Check if the index expression is a simple constant (all digits)
+ bool is_simple = !idx_expr.empty() && std::all_of(idx_expr.begin(), idx_expr.end(), ::isdigit);
+
+ std::ostringstream oss;
+ if (is_simple) {
+   // Inline constant index directly
+   oss << cpp_type << " " << result_var << " = static_cast<" << cpp_type << "*>(" << ptr_expr << ")["
+       << idx_expr << "];";
+ } else {
+   // Use intermediate variable for complex index expressions
+   oss << "size_t idx_" << result_var << " = " << idx_expr << ";\n";
+   oss << cpp_type << " " << result_var << " = static_cast<" << cpp_type << "*>(" << ptr_expr << ")[idx_"
+       << result_var << "];";
+ }
+
  return oss.str();
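Restated as a Python sketch (with a hypothetical emit_read helper, for illustration only), the inline-vs-intermediate decision in the hunk above is the following. Note that an empty idx_expr still falls through to the broken size_t idx_... = ; form, which is exactly the rank-0 gap flagged in the review comment.

```python
def emit_read(cpp_type: str, result_var: str, ptr_expr: str, idx_expr: str) -> str:
    """Mirror the hunk: inline an all-digit constant index, otherwise
    hoist the expression into an intermediate idx_<var> variable."""
    if idx_expr and idx_expr.isdigit():  # analogue of the ::isdigit check
        return (f"{cpp_type} {result_var} = "
                f"static_cast<{cpp_type}*>({ptr_expr})[{idx_expr}];")
    return (f"size_t idx_{result_var} = {idx_expr};\n"
            f"{cpp_type} {result_var} = "
            f"static_cast<{cpp_type}*>({ptr_expr})[idx_{result_var}];")
```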
Handle empty index tuples and validate rank.
If indices is empty (rank‑0 tensor), idx_expr becomes empty and the generated size_t idx_... = ; is invalid. Also, mismatched indices.size() vs shape.size() will silently compute a wrong linear index. Please add validation and a scalar fallback.
🔧 Suggested fix
const auto& indices = indices_tuple->elements_;
const auto& shape = input_type->shape_;
+ CHECK(indices.size() == shape.size())
+ << "tensor.read indices size must match tensor rank";
- // Build linear index expression
- std::ostringstream idx_oss;
- for (size_t i = 0; i < indices.size(); ++i) {
- if (i > 0) idx_oss << " + ";
- idx_oss << codegen.GenerateExprString(indices[i]);
- for (size_t j = i + 1; j < shape.size(); ++j) {
- idx_oss << " * " << codegen.GenerateExprString(shape[j]);
- }
- }
- std::string idx_expr = idx_oss.str();
+ std::string idx_expr;
+ if (indices.empty()) {
+ idx_expr = "0";
+ } else {
+ std::ostringstream idx_oss;
+ for (size_t i = 0; i < indices.size(); ++i) {
+ if (i > 0) idx_oss << " + ";
+ idx_oss << codegen.GenerateExprString(indices[i]);
+ for (size_t j = i + 1; j < shape.size(); ++j) {
+ idx_oss << " * " << codegen.GenerateExprString(shape[j]);
+ }
+ }
+ idx_expr = idx_oss.str();
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/tensor_op_codegen.cpp` around lines 99 - 125, The generated index
code must handle rank-0 (empty indices) and validate indices vs shape length:
before building idx_expr check if indices.empty() and set idx_expr to "0" (or
generate a direct scalar load) to avoid emitting an empty assignment; also check
indices.size() == shape.size() (or emit a runtime/compile-time assertion or
throw std::runtime_error) and fail fast if they differ to avoid silently wrong
linear index calculation. Update the block that builds idx_oss / idx_expr
(references: indices, shape, codegen.GenerateExprString, idx_expr, result_var,
ptr_expr, cpp_type) to perform these validations and then proceed with the
existing simple/complex-index emission logic.
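For reference, the row-major linearization this hunk implements, together with the rank-0 fallback and the rank check the review asks for, can be sketched in Python (hypothetical helper, illustration only):

```python
def linear_index(indices: list[int], shape: list[int]) -> int:
    """Row-major linear index: sum(indices[i] * prod(shape[i+1:]))."""
    if len(indices) != len(shape):
        raise ValueError("indices must match tensor rank")
    if not indices:
        return 0  # rank-0 tensor: the single element lives at offset 0
    idx = 0
    for i, ind in enumerate(indices):
        stride = 1
        for dim in shape[i + 1:]:  # product of the trailing dimensions
            stride *= dim
        idx += ind * stride
    return idx
```

For example, linear_index([1, 2], [3, 4]) is 1 * 4 + 2 = 6, and a rank mismatch fails fast instead of silently computing a wrong offset.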
def kernel_process(
    self,
    a: pl.Tensor[[16, 16], pl.FP32],
    flag: pl.Scalar[pl.INT64],
    output: pl.Tensor[[16, 16], pl.FP32],
) -> pl.Tensor[[16, 16], pl.FP32]:
Avoid unused parameter in kernel_process.
flag is unused and will trigger ARG002. Consider prefixing with _ or adding a noqa.
🔧 Suggested fix
- def kernel_process(
- self,
- a: pl.Tensor[[16, 16], pl.FP32],
- flag: pl.Scalar[pl.INT64],
- output: pl.Tensor[[16, 16], pl.FP32],
- ) -> pl.Tensor[[16, 16], pl.FP32]:
+ def kernel_process(
+ self,
+ a: pl.Tensor[[16, 16], pl.FP32],
+ _flag: pl.Scalar[pl.INT64],
+ output: pl.Tensor[[16, 16], pl.FP32],
+ ) -> pl.Tensor[[16, 16], pl.FP32]:
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 933-933: Unused method argument: flag
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 930 - 935, The
parameter `flag` in function kernel_process is unused and triggers ARG002;
rename it to `_flag` (or `_flag: pl.Scalar[pl.INT64]`) or annotate/ignore it
with a noqa to mark it intentional. Locate the kernel_process definition and
update the parameter name or add the noqa inline so the linter no longer reports
ARG002 while preserving the function signature semantics.
…n codegen

Multiple tuple-returning kernel calls (e.g. kernel_init_inplace, kernel_softmax_prepare, kernel_online_update) shared each other's output params because GetSSABaseName() collapsed all _tuple_tmp_N names to _tuple_tmp. Use per-call unique keys with Call* pointer coordination to isolate each call's tuple elements.
Summary

Refactor orchestration codegen to use shape/dtype-based tensor APIs, add scalar min/max support to the DSL, and implement codegen for IfOp and ViewOp.

Changes

Orchestration Codegen Refactor
- Replace make_tensor(bytes) / make_tensor_external(ptr, size) with shape/dtype-based APIs: make_tensor(shapes, ndim, dtype) / make_tensor_external(ptr, shapes, ndim, dtype), providing the runtime with richer tensor metadata.
- Remove the legacy task-tracking machinery (TaskRecord, scope analysis, re-indentation). Scope management is now handled structurally by the ForStmt and IfStmt visitors, which emit PTO2_SCOPE(rt) blocks directly.
- Merge non_task_code_stream into the main code_stream: all generated code now flows through a single output path.

Scalar Min/Max Operations
- Add MakeMin/MakeMax factory functions in scalar_expr.h with automatic type promotion.
- Expose ir.min_() / ir.max_() in Python bindings and type stubs.
- pl.min(scalar, scalar) and pl.max(scalar, scalar) now route to scalar IR ops via _parse_scalar_op(), while pl.min(tile, axis=...) continues to work as a tile reduction.
- Add @overload type signatures for pl.min / pl.max in block_ops.py to satisfy pyright.
- Handle Min/Max expressions in GenerateExprString (emits std::min() / std::max()).

New Codegen: IfOp and ViewOp
- IfStmt codegen: emits if/else blocks with PTO2_SCOPE(rt) wrapping, pre-declares return variables, and handles both branches.
- tensor.view codegen: emits input.view({shapes}, {offsets}) for tensor slicing.

Shared Utilities
- Move DataTypeToString() into dtype.h as a shared utility, replacing the duplicated DataTypeToPythonString() in python_printer.cpp.
- Add GetRuntimeDataTypeString() and GetExternalTensorName() as virtual methods on CodegenBase for subclass customization.
- Extend GenerateExprString with support for FloorMod, comparison ops (Eq/Ne/Lt/Le/Gt/Ge), Neg, ConstFloat, and ConstBool.

Type Stubs
- Add stubs for eq, ne, lt, le, gt, ge, neg, truediv, floordiv, mod, min_, max_.
- Make the span parameter optional (default = ...) across all scalar op stubs.

Tests
- Add test_scalar_ops.py for IR-level scalar min/max node creation and type promotion.
- Add test_scalar_dispatch.py for parser-level pl.min / pl.max scalar dispatch.
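The scalar-vs-tile routing described above can be sketched as follows. This is a minimal illustration, not the real parser: dispatch_min and its isinstance check are hypothetical stand-ins for _parse_scalar_op() and the actual pl.Scalar/tile type tests.

```python
def dispatch_min(*args, axis=None):
    """Sketch of pl.min dispatch: two scalar arguments route to the
    scalar IR op, anything else falls back to the tile reduction."""
    def is_scalar(x):
        # Stand-in for the real pl.Scalar check in the parser.
        return isinstance(x, (int, float))

    if axis is None and len(args) == 2 and all(is_scalar(a) for a in args):
        return "scalar_min"   # would lower to ir.min_() and emit std::min()
    return "tile_reduce_min"  # existing pl.min(tile, axis=...) path
```

The same shape applies to pl.max; only the emitted IR node differs.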