feat(codegen): Add 910B PTO backend op support for paged attention #195
Hzfengsy merged 3 commits into hw-native-sys:main
Conversation
Summary of Changes

Hello @lwDavid, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the 910B PTO backend's capabilities by integrating support for several new tensor operations critical for optimizing deep learning workloads, particularly paged attention. It introduces codegen for various row-wise reduction and expansion operations, as well as a fillpad operation, along with a minor but important fix to a parameter name.

Highlights
Changelog
Activity
Code Review
This pull request adds support for new axis reduction and padding operations to the 910B PTO backend, primarily for paged attention. The changes include new codegen logic in C++ and a comprehensive Python test case demonstrating the new functionality. Overall, the implementation is good, but I've identified a critical bug in the test file's casting logic that could lead to incorrect behavior. I've also noted some minor issues in the C++ code, such as an incorrect error message and swapped comment headers, which impact maintainability. Additionally, the new test could be strengthened by adding assertions to verify the generated code's correctness instead of just printing it.
Force-pushed 8705c00 to 059de30
…dding ops for paged attention
/gemini review
Code Review
This pull request adds support for several new operations for the 910B PTO backend, primarily to support paged attention. This includes row reduction and expansion ops, fillpad, and full. The changes span from the Python IR and language layers down to the C++ backend implementations for both CCE and PTO. A new test for paged attention codegen is also added.
My review focuses on improving the correctness and maintainability of the new C++ backend code and the new Python test. I've found a few confusing or incorrect error messages in the C++ code. More importantly, I've identified a bug in the new paged attention test related to data types and variable usage, along with opportunities to improve the test's effectiveness by adding assertions. I've provided suggestions to fix these issues.
    CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                                 << op->args_.size(); // Actually 2 args, two of them are conbined!
The error message in this CHECK is confusing. It states that full op requires 3 arguments, but the check is for op->args_.size() == 2. The comment also clarifies there are 2 arguments. The error message should be updated to reflect that 2 arguments are expected.
    CHECK(op->args_.size() == 2) << "full op requires 2 arguments, but got " << op->args_.size();

feat(codegen): Add ops for paged attention
@Hzfengsy Requesting review.
Important: Review skipped. Auto incremental reviews are disabled on this repository; please check the settings in the CodeRabbit UI.
📝 Walkthrough

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/ut/codegen/test_pto_codegen_pa.py (1)
11-184: ⚠️ Potential issue | 🟡 Minor
Switch this test module to pytest instead of unittest.

This repo's tests are pytest-based; using unittest.TestCase/unittest.main() bypasses pytest plugins and conventions.

🔧 Suggested update

    -import unittest
    +import pytest
     @@
    -class Test910BBlockOpsCodegen(unittest.TestCase):
    +class Test910BBlockOpsCodegen:
     @@
    -if __name__ == "__main__":
    -    unittest.main()
    +if __name__ == "__main__":
    +    pytest.main([__file__, "-v"])

Verification script:

    #!/bin/bash
    # Inspect existing test runner conventions.
    rg -n "pytest\.main|unittest\.main" tests/ut

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ut/codegen/test_pto_codegen_pa.py` around lines 11 - 184, The test uses unittest.TestCase and unittest.main which bypasses pytest; convert the Test910BBlockOpsCodegen.test_block_ops_codegen into a pytest-style test function. Remove "import unittest", replace the Test910BBlockOpsCodegen class and its method with a top-level function named test_block_ops_codegen that calls backend.reset_for_testing(), backend.set_backend_type(BackendType.PTO), builds optimized_program via PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention), constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints MLIR as before; also remove the if __name__ == "__main__": unittest.main() block so pytest will discover the test. Ensure function and symbol names (test_block_ops_codegen, PagedAttention, PassManager.get_strategy, codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.
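The unittest-to-pytest conversion requested above follows a standard pattern. A minimal, self-contained sketch of that pattern, using a stand-in `run_codegen` function in place of the real pipeline (the actual test would call `backend.reset_for_testing()`, run `PassManager` over `PagedAttention`, and print MLIR via `codegen.PTOCodegen()`, none of which is available here):

```python
# Stand-in for the real codegen pipeline; purely illustrative.
def run_codegen(program: str) -> str:
    return f"mlir-for-{program}"


# unittest style (roughly what the file currently has):
#
#     class Test910BBlockOpsCodegen(unittest.TestCase):
#         def test_block_ops_codegen(self):
#             self.assertIn("mlir", run_codegen("PagedAttention"))
#
#     if __name__ == "__main__":
#         unittest.main()
#
# pytest style: a bare module-level function named test_*; pytest discovers
# it automatically, so no TestCase class and no unittest.main() block.
def test_block_ops_codegen():
    result = run_codegen("PagedAttention")
    assert "mlir" in result  # a plain assert replaces self.assertIn
```

With this shape, `pytest tests/ut/codegen/test_pto_codegen_pa.py` picks the test up through normal discovery, and repo-wide pytest plugins and fixtures apply.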
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/pypto/ir/op/block_ops.py`:
- Around line 282-293: The type annotation for the parameter span in the
function fillpad uses Optional but Optional is not imported, causing a NameError
on import; update the annotation to use Python 3.10+ union syntax (Span | None)
or import Optional from typing, e.g., change the signature of fillpad (and any
other occurrences) from span: Optional[Span] to span: Span | None and keep
_get_span_or_capture(span) usage unchanged so the module imports cleanly.
In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp`:
- Around line 208-215: The CHECK message in MakeBinaryAxisCodegenPTO incorrectly
references "Fill pad" — update the CHECK in MakeBinaryAxisCodegenPTO (which
validates op->args_.size() == 2) to use a correct, descriptive message for
binary axis ops (e.g., reference pto_op_name or say "Binary axis op requires 2
arguments") so the error context is accurate when the check fails; ensure you
modify the CHECK call in this function (and not other helpers) to reflect the
proper operation name or generic "Binary axis op" text.
- Around line 115-125: The CHECK in MakeFullCodegenPTO is logging the wrong
expected count ("full op requires 3 arguments.") while the code actually expects
2; update the CHECK message associated with op->args_.size() in
MakeFullCodegenPTO (and/or its inline comment) to accurately state "full op
requires 2 arguments." (keep the existing size output op->args_.size() so the
runtime will still show the actual value).
---
Duplicate comments:
In `@tests/ut/codegen/test_pto_codegen_pa.py`:
- Around line 11-184: The test uses unittest.TestCase and unittest.main which
bypasses pytest; convert the Test910BBlockOpsCodegen.test_block_ops_codegen into
a pytest-style test function. Remove "import unittest", replace the
Test910BBlockOpsCodegen class and its method with a top-level function named
test_block_ops_codegen that calls backend.reset_for_testing(),
backend.set_backend_type(BackendType.PTO), builds optimized_program via
PassManager.get_strategy(OptimizationStrategy.PTOAS).run_passes(PagedAttention),
constructs codegen.PTOCodegen(), iterates optimized_program.functions and prints
MLIR as before; also remove the if __name__ == "__main__": unittest.main() block
so pytest will discover the test. Ensure function and symbol names
(test_block_ops_codegen, PagedAttention, PassManager.get_strategy,
codegen.PTOCodegen, backend.reset_for_testing) remain referenced exactly.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
python/pypto/ir/op/block_ops.py
python/pypto/language/op/block_ops.py
src/backend/910B_CCE/backend_910b_cce_ops.cpp
src/backend/910B_PTO/backend_910b_pto_ops.cpp
src/ir/op/block_ops/elementwise.cpp
tests/ut/codegen/test_pto_codegen_pa.py
    // Helper function for full op
    static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
      auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
      CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
                                   << op->args_.size(); // Actually 2 args, two of them are conbined!
      std::string scalar = codegen.GetExprAsCode(op->args_[1]);
      std::string dst = codegen.GetCurrentResultTarget();
      codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
      return "";
    }
Fix misleading argument-count message in MakeFullCodegenPTO.
The CHECK message says “3 arguments” even though the code expects 2, which will confuse debugging.
✏️ Suggested fix
    -  CHECK(op->args_.size() == 2) << "full op requires 3 arguments."
    -                               << op->args_.size(); // Actually 2 args, two of them are conbined!
    +  CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    // Helper function for full op
    static std::string MakeFullCodegenPTO(const std::string& pto_op_name, const CallPtr& op,
                                          codegen::CodegenBase& codegen_base) {
      auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
      CHECK(op->args_.size() == 2) << "full op requires 2 arguments, got " << op->args_.size();
      std::string scalar = codegen.GetExprAsCode(op->args_[1]);
      std::string dst = codegen.GetCurrentResultTarget();
      codegen.Emit(pto_op_name + " " + "ins(" + scalar + ") outs(" + dst + ")");
      return "";
    }
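The principle behind this fix carries over to any language: a failing check should state the real expectation and echo the observed value, so the failure explains itself. A minimal Python analogue (the helper name and sample arguments are hypothetical, not from the repo):

```python
def check_full_op_args(args: list) -> None:
    # Mirrors the corrected C++ CHECK: state the expectation (2) and
    # echo the actual count so a failing check is self-explanatory.
    if len(args) != 2:
        raise ValueError(f"full op requires 2 arguments, got {len(args)}")


# The two args of `full`: a destination tile and a fill scalar (hypothetical
# example values), matching the "two of them are combined" comment in the code.
check_full_op_args(["dst_tile", 0.0])  # passes silently
```

A mismatched count now produces a message that agrees with the code it guards, instead of claiming a different argument count than the one actually checked.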
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/910B_PTO/backend_910b_pto_ops.cpp` around lines 115 - 125, The
CHECK in MakeFullCodegenPTO is logging the wrong expected count ("full op
requires 3 arguments.") while the code actually expects 2; update the CHECK
message associated with op->args_.size() in MakeFullCodegenPTO (and/or its
inline comment) to accurately state "full op requires 2 arguments." (keep the
existing size output op->args_.size() so the runtime will still show the actual
value).
Force-pushed d9143cc to b24ae44
@Hzfengsy Requesting review.
Do not use pa as a shortening; write paged_attn.