feat(codegen): Add 910B PTO backend op support for paged attention#195

Merged
Hzfengsy merged 3 commits into hw-native-sys:main from lwDavid:pa
Feb 24, 2026
Conversation

@lwDavid
Contributor

@lwDavid lwDavid commented Feb 13, 2026

feat(codegen): Add 910B PTO backend op support for paged attention

  • Add codegen for row reduction ops (row_sum, row_max, row_min)
  • Add codegen for row expansion ops (row_expand_div, row_expand_mul, row_expand_sub)
  • Add codegen for fillpad and full operation
  • Fix block.cast parameter name from target_dtype to target_type
  • Add paged attention test demo
  • Update test to use OptimizationStrategy.PTOAS and improve formatting
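
The row reduction and expansion ops listed above are the building blocks of the online-softmax update used in paged attention: row_max finds the running maximum, row_expand_sub subtracts it before exponentiation, row_sum accumulates the normalizer, and row_expand_mul/div rescale when the maximum changes. The following is a minimal plain-Python sketch of that math only; function names are illustrative and this is not the pypto API:

```python
import math

def softmax_block_update(state, scores):
    """Fold one block of attention scores into the running (row_max, row_sum) state.

    Mirrors the op pattern: row_max -> row_expand_sub (subtract the max) ->
    exp -> row_sum, with the old normalizer rescaled (row_expand_mul) when
    the running maximum changes.
    """
    m_old, l_old = state
    m_new = max(m_old, max(scores))              # row_max over the new block
    l_new = l_old * math.exp(m_old - m_new)      # rescale the old normalizer
    l_new += sum(math.exp(s - m_new) for s in scores)  # row_sum of exp(s - m)
    return (m_new, l_new)

def online_softmax(blocks):
    """Compute softmax over all blocks while streaming only (max, sum) state."""
    state = (float("-inf"), 0.0)
    for block in blocks:
        state = softmax_block_update(state, block)
    m, l = state
    return [math.exp(s - m) / l for block in blocks for s in block]
```

Streaming the (max, sum) pair this way gives results identical to a softmax over the concatenated scores, which is what lets paged attention process the KV cache one page at a time.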

@gemini-code-assist
Contributor

Summary of Changes

Hello @lwDavid, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the 910B PTO backend's capabilities by integrating support for several new tensor operations critical for optimizing deep learning workloads, particularly paged attention. It introduces codegen for various row-wise reduction and expansion operations, as well as a fillpad operation. A minor but important fix to a parameter name in the block.cast operation ensures correctness, and a comprehensive new test case validates the functionality and demonstrates the use of these new features within a paged attention context.

Highlights

  • New Backend Support: Added 910B PTO backend support for various axis reduction operations, including row_sum, row_max, and row_min.
  • Row Expansion Operations: Implemented codegen for row expansion operations such as row_expand_div, row_expand_mul, and row_expand_sub.
  • Fillpad Operation: Introduced codegen for the fillpad operation, enhancing padding capabilities.
  • Parameter Name Fix: Corrected a parameter name in block.cast from target_dtype to target_type for consistency.
  • Paged Attention Test Demo: Added a new test demo specifically for paged attention, utilizing the newly supported operations and OptimizationStrategy.PTOAS.
Changelog
  • python/pypto/ir/op/block_ops.py
    • Fixed parameter name from 'target_dtype' to 'target_type' in the 'block.cast' operation.
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp
    • Added helper functions MakeFillPadCodegenPTO and MakeBinaryAxisCodegenPTO for new operations.
    • Registered backend support for block.row_sum, block.row_max, block.row_min operations.
    • Registered backend support for block.row_expand_div, block.row_expand_mul, block.row_expand_sub operations.
    • Registered backend support for the block.fillpad operation.
  • tests/ut/codegen/test_pto_codegen_pa.py
    • Added a new unit test file for paged attention codegen.
    • Implemented PagedAttention program with qk_matmul, pv_matmul, softmax_prepare, and online_update functions.
    • Demonstrated usage of new block.row_max, block.row_sum, block.row_expand_sub, block.row_expand_mul, block.row_expand_div, and block.fillpad operations.
    • Configured the test to use OptimizationStrategy.PTOAS.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for new axis reduction and padding operations to the 910B PTO backend, primarily for paged attention. The changes include new codegen logic in C++ and a comprehensive Python test case demonstrating the new functionality. Overall, the implementation is good, but I've identified a critical bug in the test file's casting logic that could lead to incorrect behavior. I've also noted some minor issues in the C++ code, such as an incorrect error message and swapped comment headers, which impact maintainability. Additionally, the new test could be strengthened by adding assertions to verify the generated code's correctness instead of just printing it.

@lwDavid lwDavid force-pushed the pa branch 3 times, most recently from 8705c00 to 059de30, on February 13, 2026 09:27
@lwDavid lwDavid marked this pull request as draft February 13, 2026 09:57
@lwDavid lwDavid changed the title from "feat(codegen): Add 910B PTO backend support for axis reduction and padding ops for paged attention" to "feat(codegen): Add 910B PTO backend op support for paged attention" on Feb 14, 2026
@lwDavid lwDavid marked this pull request as ready for review February 14, 2026 02:17
@lwDavid
Contributor Author

lwDavid commented Feb 14, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for several new operations for the 910B PTO backend, primarily to support paged attention. This includes row reduction and expansion ops, fillpad, and full. The changes span from the Python IR and language layers down to the C++ backend implementations for both CCE and PTO. A new test for paged attention codegen is also added.

My review focuses on improving the correctness and maintainability of the new C++ backend code and the new Python test. I've found a few confusing or incorrect error messages in the C++ code. More importantly, I've identified a bug in the new paged attention test related to data types and variable usage, along with opportunities to improve the test's effectiveness by adding assertions. I've provided suggestions to fix these issues.

Comment on lines +116 to +117
CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                             << op->args_.size();  // Actually 2 args, two of them are conbined!
Contributor


medium

The error message in this CHECK is confusing. It states that full op requires 3 arguments, but the check is for op->args_.size() == 2. The comment also clarifies there are 2 arguments. The error message should be updated to reflect that 2 arguments are expected.

  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, but got " << op->args_.size();

feat(codegen): Add ops for paged attention
@lwDavid
Contributor Author

lwDavid commented Feb 14, 2026

@Hzfengsy Request review.

@coderabbitai

coderabbitai bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces a new block.fillpad operation across the IR, language, and backend layers. The IR operation is registered with type deduction logic, wrapped in a language-level API, and implemented in both CCE and PTO backends. Additionally, several new PTO codegen helpers support existing operations (full, transpose, row operations), and a new test module validates PTO code generation.

Changes

Cohort / File(s): Summary
  • IR and Language API (python/pypto/ir/op/block_ops.py, python/pypto/language/op/block_ops.py): Added a fillpad function at the IR level to emit the block.fillpad operation; added a public language-level wrapper; updated the __all__ export. Also changed the cast operation's keyword parameter from "target_dtype" to "target_type".
  • IR Operation Registration (src/ir/op/block_ops/elementwise.cpp): Registered the block.fillpad operation with type deduction that validates a single TileType input and returns a TileType with matching shape and dtype.
  • Backend CCE Implementation (src/backend/910B_CCE/backend_910b_cce_ops.cpp): Registered the block.fillpad operation for the CCE backend using the vertical pipeline and a unary codegen handler emitting TFILLPAD.
  • Backend PTO Implementation (src/backend/910B_PTO/backend_910b_pto_ops.cpp): Added new PTO codegen helpers (MakeFullCodegenPTO, MakeFillPadCodegenPTO, MakeTernaryDataMoveLayoutCodegenPTO, MakeBinaryAxisCodegenPTO) and registered multiple block operations (full, transpose, fillpad, row operations).
  • Test Coverage (tests/ut/codegen/test_pto_codegen_pa.py): New test module introducing a PagedAttention class with kernel methods and a Test910BBlockOpsCodegen class validating MLIR generation for PTO block operations.
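
The walkthrough above says block.fillpad returns a tile with the same shape and dtype as its input, but does not spell out the padding semantics. As a rough illustration only, under the assumption that the op fills everything outside a valid top-left region with a constant (a common reading of pad-fill ops like TFILLPAD), a plain-Python sketch could look like this; the function signature here is hypothetical, not the pypto API:

```python
def fillpad(tile, valid_rows, valid_cols, pad_value):
    """Hypothetical sketch: keep the valid top-left region, overwrite the rest.

    Output has the same shape as the input, matching the type deduction rule
    described for block.fillpad (same shape, same dtype). The valid-region
    parameters are assumptions for illustration.
    """
    rows, cols = len(tile), len(tile[0])
    return [
        [tile[r][c] if r < valid_rows and c < valid_cols else pad_value
         for c in range(cols)]
        for r in range(rows)
    ]
```

In a paged-attention kernel, this kind of fill is what keeps the padded tail of a partially filled KV page from contaminating the softmax reductions.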

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A new fillpad hops into the stack,
From IR down to backends, no turning back,
CCE and PTO both lend a hand,
With codegen helpers perfectly planned!
Tests bloom bright with paged attention's grace,
Completing this operation's embrace. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 73.68%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The pull request title accurately describes the main objective: adding 910B PTO backend operation support for paged attention operations.
  • Description check ✅ Passed: The pull request description is directly related to the changeset, detailing all major additions including row reduction/expansion ops, fillpad, full operation, the block.cast fix, and the paged attention test.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/ut/codegen/test_pto_codegen_pa.py (1)

11-184: ⚠️ Potential issue | 🟡 Minor

Switch this test module to pytest instead of unittest.

This repo’s tests are pytest-based; using unittest.TestCase/unittest.main() bypasses pytest plugins and conventions.

🔧 Suggested update
-import unittest
+import pytest
@@
-class Test910BBlockOpsCodegen(unittest.TestCase):
+class Test910BBlockOpsCodegen:
@@
-if __name__ == "__main__":
-    unittest.main()
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])
#!/bin/bash
# Inspect existing test runner conventions.
rg -n "pytest\.main|unittest\.main" tests/ut
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/codegen/test_pto_codegen_pa.py` around lines 11 - 184, The test uses
unittest.TestCase and unittest.main which bypasses pytest; convert the
Test910BBlockOpsCodegen.test_block_ops_codegen into a pytest-style test
function. Remove "import unittest", replace the Test910BBlockOpsCodegen class
and its method with a top-level function named test_block_ops_codegen that calls
backend.reset_for_testing(), backend.set_backend_type(BackendType.PTO), builds
optimized_program via
PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention),
constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints
MLIR as before; also remove the if __name__ == "__main__": unittest.main() block
so pytest will discover the test. Ensure function and symbol names
(test_block_ops_codegen, PagedAttention, PassManager.get_strategy,
codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/op/block_ops.py`:
- Around line 282-293: The type annotation for the parameter span in the
function fillpad uses Optional but Optional is not imported, causing a NameError
on import; update the annotation to use Python 3.10+ union syntax (Span | None)
or import Optional from typing, e.g., change the signature of fillpad (and any
other occurrences) from span: Optional[Span] to span: Span | None and keep
_get_span_or_capture(span) usage unchanged so the module imports cleanly.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Around line 208-215: The CHECK message in MakeBinaryAxisCodegenPTO incorrectly
references "Fill pad" — update the CHECK in MakeBinaryAxisCodegenPTO (which
validates op->args_.size() == 2) to use a correct, descriptive message for
binary axis ops (e.g., reference pto_op_name or say "Binary axis op requires 2
arguments") so the error context is accurate when the check fails; ensure you
modify the CHECK call in this function (and not other helpers) to reflect the
proper operation name or generic "Binary axis op" text.
- Around line 115-125: The CHECK in MakeFullCodegenPTO is logging the wrong
expected count ("full op requires 3 arguments.") while the code actually expects
2; update the CHECK message associated with op->args_.size() in
MakeFullCodegenPTO (and/or its inline comment) to accurately state "full op
requires 2 arguments." (keep the existing size output op->args_.size() so the
runtime will still show the actual value).

---

Duplicate comments:
In `@tests/ut/codegen/test_pto_codegen_pa.py`:
- Around line 11-184: The test uses unittest.TestCase and unittest.main which
bypasses pytest; convert the Test910BBlockOpsCodegen.test_block_ops_codegen into
a pytest-style test function. Remove "import unittest", replace the
Test910BBlockOpsCodegen class and its method with a top-level function named
test_block_ops_codegen that calls backend.reset_for_testing(),
backend.set_backend_type(BackendType.PTO), builds optimized_program via
PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention),
constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints
MLIR as before; also remove the if __name__ == "__main__": unittest.main() block
so pytest will discover the test. Ensure function and symbol names
(test_block_ops_codegen, PagedAttention, PassManager.get_strategy,
codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4c99f8 and d920754.

📒 Files selected for processing (6)
  • python/pypto/ir/op/block_ops.py
  • python/pypto/language/op/block_ops.py
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp
  • src/ir/op/block_ops/elementwise.cpp
  • tests/ut/codegen/test_pto_codegen_pa.py

Comment on lines +115 to +125
// Helper function for full op
static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                      codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                               << op->args_.size();  // Actually 2 args, two of them are conbined!
  std::string scalar = codegen.GetExprAsCode(op->args_[1]);
  std::string dst = codegen.GetCurrentResultTarget();
  codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
  return "";
}

⚠️ Potential issue | 🟡 Minor

Fix misleading argument-count message in MakeFullCodegenPTO.

The CHECK message says “3 arguments” even though the code expects 2, which will confuse debugging.

✏️ Suggested fix
-  CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
-                               << op->args_.size();  // Actually 2 args, two of them are conbined!
+  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Helper function for full op
static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                      codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();
  std::string scalar = codegen.GetExprAsCode(op->args_[1]);
  std::string dst = codegen.GetCurrentResultTarget();
  codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
  return "";
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 115 - 125, The
CHECK in MakeFullCodegenPTO is logging the wrong expected count ("full op
requires 3 arguments.") while the code actually expects 2; update the CHECK
message associated with op->args_.size() in MakeFullCodegenPTO (and/or its
inline comment) to accurately state "full op requires 2 arguments." (keep the
existing size output op->args_.size() so the runtime will still show the actual
value).

@lwDavid lwDavid force-pushed the pa branch 2 times, most recently from d9143cc to b24ae44, on February 24, 2026 01:21
@lwDavid
Contributor Author

lwDavid commented Feb 24, 2026

@Hzfengsy Request review.

Member


do not use pa as a shortening; write paged_attn

Contributor Author


@Hzfengsy Fixed.

@Hzfengsy Hzfengsy merged commit 7201a97 into hw-native-sys:main Feb 24, 2026
5 checks passed