feat(codegen): Add 910B PTO backend op support for paged attention#195

Merged
Hzfengsy merged 3 commits into hw-native-sys:main from lwDavid:pa
Feb 24, 2026
Conversation

@lwDavid
Contributor

@lwDavid lwDavid commented Feb 13, 2026

feat(codegen): Add 910B PTO backend op support for paged attention

  • Add codegen for row reduction ops (row_sum, row_max, row_min)
  • Add codegen for row expansion ops (row_expand_div, row_expand_mul, row_expand_sub)
  • Add codegen for fillpad and full operation
  • Fix block.cast parameter name from target_dtype to target_type
  • Add paged attention test demo
  • Update test to use OptimizationStrategy.PTOAS and improve formatting
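
The row reduction and expansion ops listed above are the building blocks of the online-softmax update used in paged attention: row_max finds the running maximum, row_expand_sub subtracts it before exponentiation, row_sum accumulates the normalizer, and row_expand_mul/div rescale when the maximum changes. The following is a minimal plain-Python sketch of that math only; function names are illustrative and this is not the pypto API:

```python
import math

def softmax_block_update(state, scores):
    """Fold one block of attention scores into the running (row_max, row_sum) state.

    Mirrors the op pattern: row_max -> row_expand_sub (subtract the max) ->
    exp -> row_sum, with the old normalizer rescaled (row_expand_mul) when
    the running maximum changes.
    """
    m_old, l_old = state
    m_new = max(m_old, max(scores))              # row_max over the new block
    l_new = l_old * math.exp(m_old - m_new)      # rescale the old normalizer
    l_new += sum(math.exp(s - m_new) for s in scores)  # row_sum of exp(s - m)
    return (m_new, l_new)

def online_softmax(blocks):
    """Compute softmax over all blocks while streaming only (max, sum) state."""
    state = (float("-inf"), 0.0)
    for block in blocks:
        state = softmax_block_update(state, block)
    m, l = state
    return [math.exp(s - m) / l for block in blocks for s in block]
```

Streaming the (max, sum) pair this way gives results identical to a softmax over the concatenated scores, which is what lets paged attention process the KV cache one page at a time.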

@gemini-code-assist
Contributor

Summary of Changes

Hello @lwDavid, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the 910B PTO backend's capabilities by integrating support for several new tensor operations critical for optimizing deep learning workloads, particularly paged attention. It introduces codegen for various row-wise reduction and expansion operations, as well as a fillpad operation. A minor but important fix to a parameter name in the block.cast operation ensures correctness, and a comprehensive new test case validates the functionality and demonstrates the use of these new features within a paged attention context.

Highlights

  • New Backend Support: Added 910B PTO backend support for various axis reduction operations, including row_sum, row_max, and row_min.
  • Row Expansion Operations: Implemented codegen for row expansion operations such as row_expand_div, row_expand_mul, and row_expand_sub.
  • Fillpad Operation: Introduced codegen for the fillpad operation, enhancing padding capabilities.
  • Parameter Name Fix: Corrected a parameter name in block.cast from target_dtype to target_type for consistency.
  • Paged Attention Test Demo: Added a new test demo specifically for paged attention, utilizing the newly supported operations and OptimizationStrategy.PTOAS.
Changelog
  • python/pypto/ir/op/block_ops.py
    • Fixed parameter name from 'target_dtype' to 'target_type' in the 'block.cast' operation.
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp
    • Added helper functions MakeFillPadCodegenPTO and MakeBinaryAxisCodegenPTO for new operations.
    • Registered backend support for block.row_sum, block.row_max, block.row_min operations.
    • Registered backend support for block.row_expand_div, block.row_expand_mul, block.row_expand_sub operations.
    • Registered backend support for the block.fillpad operation.
  • tests/ut/codegen/test_pto_codegen_pa.py
    • Added a new unit test file for paged attention codegen.
    • Implemented PagedAttention program with qk_matmul, pv_matmul, softmax_prepare, and online_update functions.
    • Demonstrated usage of new block.row_max, block.row_sum, block.row_expand_sub, block.row_expand_mul, block.row_expand_div, and block.fillpad operations.
    • Configured the test to use OptimizationStrategy.PTOAS.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for new axis reduction and padding operations to the 910B PTO backend, primarily for paged attention. The changes include new codegen logic in C++ and a comprehensive Python test case demonstrating the new functionality. Overall, the implementation is good, but I've identified a critical bug in the test file's casting logic that could lead to incorrect behavior. I've also noted some minor issues in the C++ code, such as an incorrect error message and swapped comment headers, which impact maintainability. Additionally, the new test could be strengthened by adding assertions to verify the generated code's correctness instead of just printing it.

@lwDavid lwDavid force-pushed the pa branch 3 times, most recently from 8705c00 to 059de30, on February 13, 2026 09:27
@lwDavid lwDavid marked this pull request as draft February 13, 2026 09:57
@lwDavid lwDavid changed the title from "feat(codegen): Add 910B PTO backend support for axis reduction and padding ops for paged attention" to "feat(codegen): Add 910B PTO backend op support for paged attention" on Feb 14, 2026
@lwDavid lwDavid marked this pull request as ready for review February 14, 2026 02:17
@lwDavid
Contributor Author

lwDavid commented Feb 14, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for several new operations for the 910B PTO backend, primarily to support paged attention. This includes row reduction and expansion ops, fillpad, and full. The changes span from the Python IR and language layers down to the C++ backend implementations for both CCE and PTO. A new test for paged attention codegen is also added.

My review focuses on improving the correctness and maintainability of the new C++ backend code and the new Python test. I've found a few confusing or incorrect error messages in the C++ code. More importantly, I've identified a bug in the new paged attention test related to data types and variable usage, along with opportunities to improve the test's effectiveness by adding assertions. I've provided suggestions to fix these issues.

Comment on lines +116 to +117
CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                             << op->args_.size();  // Actually 2 args, two of them are conbined!
Contributor


medium

The error message in this CHECK is confusing. It states that full op requires 3 arguments, but the check is for op->args_.size() == 2. The comment also clarifies there are 2 arguments. The error message should be updated to reflect that 2 arguments are expected.

  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, but got " << op->args_.size();

feat(codegen): Add ops for paged attention
@lwDavid
Contributor Author

lwDavid commented Feb 14, 2026

@Hzfengsy Request review.

@coderabbitai

coderabbitai bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces a new block.fillpad operation across the IR, language, and backend layers. The IR operation is registered with type deduction logic, wrapped in a language-level API, and implemented in both CCE and PTO backends. Additionally, several new PTO codegen helpers support existing operations (full, transpose, row operations), and a new test module validates PTO code generation.

Changes

Cohort / File(s): Summary
  • IR and Language API (python/pypto/ir/op/block_ops.py, python/pypto/language/op/block_ops.py): Added a fillpad function at the IR level to emit the block.fillpad operation; added a public language-level wrapper; updated the __all__ export. Also changed the cast operation's keyword parameter from "target_dtype" to "target_type".
  • IR Operation Registration (src/ir/op/block_ops/elementwise.cpp): Registered the block.fillpad operation with type deduction that validates a single TileType input and returns a TileType with matching shape and dtype.
  • Backend CCE Implementation (src/backend/910B_CCE/backend_910b_cce_ops.cpp): Registered the block.fillpad operation for the CCE backend using the vertical pipeline and a unary codegen handler emitting TFILLPAD.
  • Backend PTO Implementation (src/backend/910B_PTO/backend_910b_pto_ops.cpp): Added new PTO codegen helpers (MakeFullCodegenPTO, MakeFillPadCodegenPTO, MakeTernaryDataMoveLayoutCodegenPTO, MakeBinaryAxisCodegenPTO) and registered multiple block operations (full, transpose, fillpad, row operations).
  • Test Coverage (tests/ut/codegen/test_pto_codegen_pa.py): New test module introducing a PagedAttention class with kernel methods and a Test910BBlockOpsCodegen class validating MLIR generation for PTO block operations.
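
The walkthrough above says block.fillpad returns a tile with the same shape and dtype as its input, but does not spell out the padding semantics. As a rough illustration only, under the assumption that the op fills everything outside a valid top-left region with a constant (a common reading of pad-fill ops like TFILLPAD), a plain-Python sketch could look like this; the function signature here is hypothetical, not the pypto API:

```python
def fillpad(tile, valid_rows, valid_cols, pad_value):
    """Hypothetical sketch: keep the valid top-left region, overwrite the rest.

    Output has the same shape as the input, matching the type deduction rule
    described for block.fillpad (same shape, same dtype). The valid-region
    parameters are assumptions for illustration.
    """
    rows, cols = len(tile), len(tile[0])
    return [
        [tile[r][c] if r < valid_rows and c < valid_cols else pad_value
         for c in range(cols)]
        for r in range(rows)
    ]
```

In a paged-attention kernel, this kind of fill is what keeps the padded tail of a partially filled KV page from contaminating the softmax reductions.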

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A new fillpad hops into the stack,
From IR down to backends, no turning back,
CCE and PTO both lend a hand,
With codegen helpers perfectly planned!
Tests bloom bright with paged attention's grace,
Completing this operation's embrace. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 73.68%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The pull request title accurately describes the main objective: adding 910B PTO backend operation support for paged attention operations.
  • Description check ✅ Passed: The pull request description is directly related to the changeset, detailing all major additions including row reduction/expansion ops, fillpad, full operation, the block.cast fix, and the paged attention test.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/ut/codegen/test_pto_codegen_pa.py (1)

11-184: ⚠️ Potential issue | 🟡 Minor

Switch this test module to pytest instead of unittest.

This repo’s tests are pytest-based; using unittest.TestCase/unittest.main() bypasses pytest plugins and conventions.

🔧 Suggested update
-import unittest
+import pytest
@@
-class Test910BBlockOpsCodegen(unittest.TestCase):
+class Test910BBlockOpsCodegen:
@@
-if __name__ == "__main__":
-    unittest.main()
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])
#!/bin/bash
# Inspect existing test runner conventions.
rg -n "pytest\.main|unittest\.main" tests/ut
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/codegen/test_pto_codegen_pa.py` around lines 11 - 184, The test uses
unittest.TestCase and unittest.main which bypasses pytest; convert the
Test910BBlockOpsCodegen.test_block_ops_codegen into a pytest-style test
function. Remove "import unittest", replace the Test910BBlockOpsCodegen class
and its method with a top-level function named test_block_ops_codegen that calls
backend.reset_for_testing(), backend.set_backend_type(BackendType.PTO), builds
optimized_program via
PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention),
constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints
MLIR as before; also remove the if __name__ == "__main__": unittest.main() block
so pytest will discover the test. Ensure function and symbol names
(test_block_ops_codegen, PagedAttention, PassManager.get_strategy,
codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/op/block_ops.py`:
- Around line 282-293: The type annotation for the parameter span in the
function fillpad uses Optional but Optional is not imported, causing a NameError
on import; update the annotation to use Python 3.10+ union syntax (Span | None)
or import Optional from typing, e.g., change the signature of fillpad (and any
other occurrences) from span: Optional[Span] to span: Span | None and keep
_get_span_or_capture(span) usage unchanged so the module imports cleanly.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Around line 208-215: The CHECK message in MakeBinaryAxisCodegenPTO incorrectly
references "Fill pad" — update the CHECK in MakeBinaryAxisCodegenPTO (which
validates op->args_.size() == 2) to use a correct, descriptive message for
binary axis ops (e.g., reference pto_op_name or say "Binary axis op requires 2
arguments") so the error context is accurate when the check fails; ensure you
modify the CHECK call in this function (and not other helpers) to reflect the
proper operation name or generic "Binary axis op" text.
- Around line 115-125: The CHECK in MakeFullCodegenPTO is logging the wrong
expected count ("full op requires 3 arguments.") while the code actually expects
2; update the CHECK message associated with op->args_.size() in
MakeFullCodegenPTO (and/or its inline comment) to accurately state "full op
requires 2 arguments." (keep the existing size output op->args_.size() so the
runtime will still show the actual value).

---

Duplicate comments:
In `@tests/ut/codegen/test_pto_codegen_pa.py`:
- Around line 11-184: The test uses unittest.TestCase and unittest.main which
bypasses pytest; convert the Test910BBlockOpsCodegen.test_block_ops_codegen into
a pytest-style test function. Remove "import unittest", replace the
Test910BBlockOpsCodegen class and its method with a top-level function named
test_block_ops_codegen that calls backend.reset_for_testing(),
backend.set_backend_type(BackendType.PTO), builds optimized_program via
PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention),
constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints
MLIR as before; also remove the if __name__ == "__main__": unittest.main() block
so pytest will discover the test. Ensure function and symbol names
(test_block_ops_codegen, PagedAttention, PassManager.get_strategy,
codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4c99f8 and d920754.

📒 Files selected for processing (6)
  • python/pypto/ir/op/block_ops.py
  • python/pypto/language/op/block_ops.py
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
  • src/backend/910B_PTO/backend_910b_pto_ops.cpp
  • src/ir/op/block_ops/elementwise.cpp
  • tests/ut/codegen/test_pto_codegen_pa.py

Comment on lines +115 to +125
// Helper function for full op
static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                      codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                               << op->args_.size();  // Actually 2 args, two of them are conbined!
  std::string scalar = codegen.GetExprAsCode(op->args_[1]);
  std::string dst = codegen.GetCurrentResultTarget();
  codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
  return "";
}

⚠️ Potential issue | 🟡 Minor

Fix misleading argument-count message in MakeFullCodegenPTO.

The CHECK message says “3 arguments” even though the code expects 2, which will confuse debugging.

✏️ Suggested fix
-  CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
-                               << op->args_.size();  // Actually 2 args, two of them are conbined!
+  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Helper function for full op
static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                      codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();
  std::string scalar = codegen.GetExprAsCode(op->args_[1]);
  std::string dst = codegen.GetCurrentResultTarget();
  codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
  return "";
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 115 - 125, The
CHECK in MakeFullCodegenPTO is logging the wrong expected count ("full op
requires 3 arguments.") while the code actually expects 2; update the CHECK
message associated with op->args_.size() in MakeFullCodegenPTO (and/or its
inline comment) to accurately state "full op requires 2 arguments." (keep the
existing size output op->args_.size() so the runtime will still show the actual
value).

@lwDavid lwDavid force-pushed the pa branch 2 times, most recently from d9143cc to b24ae44, on February 24, 2026 01:21
@lwDavid
Contributor Author

lwDavid commented Feb 24, 2026

@Hzfengsy Request review.

Member


do not use pa as a shortening; write paged_attn

Contributor Author


@Hzfengsy Fixed.

@Hzfengsy Hzfengsy merged commit 7201a97 into hw-native-sys:main Feb 24, 2026
5 checks passed