feat(codegen): Add kernel wrapper generation for PTO backend #272

Merged
Hzfengsy merged 1 commit into hw-native-sys:main from zhangqi-chen:ptoas
Feb 26, 2026
Conversation

@zhangqi-chen
Contributor

Extract PTO codegen logic into a dedicated pto_codegen module with kernel wrapper generation, config file generation, and PTOAS output preprocessing. Expose OrchestrationResult, GenerateOrchestration, and InferFunctionCoreType through C++ bindings and type stubs. Add the ConvertToSSA, FlattenCallExpr, and RunVerifier passes to the PTOAS strategy in the pass manager.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Walkthrough

The pull request refactors the PTO backend code generation and compilation pipeline by extracting PTOAS handling from compile.py into a dedicated pto_codegen.py module, adding orchestration code generation support to C++ bindings, updating the pass manager with new optimization passes, and providing comprehensive test coverage for the new implementation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Examples: docs/dev/codegen/00-pto_codegen.md, examples/ir_parser/vector_example_dag.py | Added documentation describing kernel wrapper generation pipeline for PTO backend; updated example to use PTO backend and PTOAS optimization strategy. |
| PTO Code Generation: python/pypto/ir/pto_codegen.py | New module implementing PTO backend code generation with entry point generate(), PTOAS invocation, output preprocessing, kernel wrapper generation with argument unpacking, and error handling. |
| Compilation Pipeline: python/pypto/ir/compile.py | Refactored to consolidate PTOAS handling by removing direct subprocess invocation and replacing it with a call to generate() from pto_codegen; introduced unified _write_files() helper for output persistence. |
| Optimization Passes: python/pypto/ir/pass_manager.py | Added ConvertToSSA, FlattenCallExpr, and RunVerifier passes to PTOAS strategy before InitMemRef, expanding pass sequence from 3 to 6 passes. |
| C++ Bindings & API Surface: python/bindings/modules/codegen.cpp, python/pypto/pypto_core/codegen.pyi | Exposed orchestration code generation API with new class OrchestrationResult and free functions generate_orchestration() and infer_function_core_type() for backend-agnostic orchestration support. |
| Tests: tests/ut/codegen/test_pto_codegen.py, tests/ut/ir/transforms/test_pass_manager.py | Added extensive test coverage for PTOAS output preprocessing, argument unpacking generation, kernel wrapper construction; updated pass manager tests to validate expanded PTOAS strategy. |
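
The pass-order change in the Optimization Passes row can be pictured with a minimal sketch. It does not use the real pass manager API: the passes are plain strings, `insert_before` is a hypothetical helper, and `PlaceholderPassA` / `PlaceholderPassB` stand in for the two pre-existing passes that this page does not name. The position of InitMemRef in the original list is also an assumption.

```python
def insert_before(passes, anchor, new_passes):
    """Return a new list with new_passes placed immediately before anchor."""
    idx = passes.index(anchor)  # raises ValueError if anchor is absent
    return passes[:idx] + list(new_passes) + passes[idx:]


# Assumption: the old strategy had three passes, one of which is InitMemRef.
# The other two names are placeholders, not real PyPTO passes.
old_ptoas = ["InitMemRef", "PlaceholderPassA", "PlaceholderPassB"]

new_ptoas = insert_before(
    old_ptoas,
    "InitMemRef",
    ["ConvertToSSA", "FlattenCallExpr", "RunVerifier"],
)

print(len(old_ptoas), "->", len(new_ptoas))
print(new_ptoas)
```

Placing RunVerifier directly before InitMemRef matches the summary above, which says the new passes run before InitMemRef; anything beyond that ordering is illustrative only.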

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Compile as compile.py
    participant PtoCg as pto_codegen.py
    participant PTO as PTOCodegen
    participant Ptoas as PTOAS Tool
    participant Orch as orchestration<br/>(shared codegen)
    participant Output as Output Files

    App->>Compile: compile with PTO backend
    Compile->>PtoCg: generate(program, output_dir)
    
    PtoCg->>PTO: for each InCore function
    PTO->>PTO: create .pto representation
    
    PtoCg->>Ptoas: invoke with .pto
    Ptoas->>Ptoas: compile to C++
    Ptoas-->>PtoCg: return .cpp output
    
    PtoCg->>PtoCg: _preprocess_ptoas_output()
    PtoCg->>PtoCg: _generate_arg_unpacking()
    PtoCg->>PtoCg: _generate_kernel_wrapper()
    
    PtoCg->>Orch: generate_orchestration(program, func)
    Orch-->>PtoCg: orchestration code + metadata
    
    PtoCg->>Output: _write_files() all artifacts
    Output-->>Compile: files dict
    
    Compile->>Output: persist to ptoas/, kernels/aiv/
    Output-->>App: build output
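
The `_generate_arg_unpacking()` step in the diagram emits C++ that turns a packed argument list into typed locals before the wrapper forwards them to the ptoas-generated function. The real convention lives in docs/dev/codegen/00-pto_codegen.md and is not reproduced on this page, so the following is only a sketch under stated assumptions: one 64-bit slot per argument in an array called `args`, tensors recovered with `reinterpret_cast`, scalars with `static_cast`. `Param`, `generate_arg_unpacking`, and `generate_forward_call` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str        # C++ identifier of the parameter
    ctype: str       # element type for tensors, value type for scalars
    is_tensor: bool  # True for tensor (pointer) parameters


def generate_arg_unpacking(params):
    """Emit one C++ statement per parameter, reading slot i of `args`.

    Assumption: every argument occupies one 64-bit slot of an array named
    `args`. This layout is illustrative, not the documented convention.
    """
    lines = []
    for i, p in enumerate(params):
        if p.is_tensor:
            lines.append(
                f"{p.ctype}* {p.name} = reinterpret_cast<{p.ctype}*>(args[{i}]);"
            )
        else:
            lines.append(
                f"{p.ctype} {p.name} = static_cast<{p.ctype}>(args[{i}]);"
            )
    return lines


def generate_forward_call(func_name, params):
    """Emit the call that forwards the unpacked locals to the kernel body."""
    return f"{func_name}({', '.join(p.name for p in params)});"


params = [Param("a", "float", True), Param("n", "int32_t", False)]
unpack = generate_arg_unpacking(params)
call = generate_forward_call("vector_add", params)
print("\n".join(unpack + [call]))
```

The mixed tensor/scalar input mirrors one of the three scenarios the PR's tests are reported to cover (tensor-only, mixed, scalar-only).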

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • Hzfengsy

Poem

🐰 The PTO paths now shine so bright,
With orchestration wired just right,
Kernel wrappers dance in ptoas delight,
Passes multiply—three becomes six tonight!
From compile to codegen, we hop with might! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: kernel wrapper generation for the PTO backend, which is the primary focus across multiple modified files. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing specific details about the extracted pto_codegen module, exposed bindings, and added PTOAS strategy passes. |


@gemini-code-assist
Contributor

Summary of Changes

Hello @zhangqi-chen, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the PyPTO IR compiler's PTO backend by introducing a comprehensive kernel wrapper generation mechanism and refactoring its code generation logic into a dedicated module. These changes streamline the compilation process for PTOAS, ensure compatibility with the CCE/orchestration calling convention, and provide clearer separation of concerns. Additionally, new C++ bindings for orchestration functions and an enriched PTOAS optimization strategy contribute to a more robust and flexible compilation framework.

Highlights

  • PTO Backend Kernel Wrapper Generation: Implemented automatic kernel wrapper generation for InCore functions when compiling with the PTO backend, bridging ptoas output to the CCE/orchestration calling convention. This includes preprocessing ptoas code and creating a kernel_entry wrapper for argument unpacking.
  • Dedicated PTO Codegen Module: Extracted the PTO backend code generation logic into a new dedicated Python module (python/pypto/ir/pto_codegen.py), centralizing the generation of kernel wrappers, orchestration code, and configuration files.
  • C++ Bindings for Orchestration: Exposed OrchestrationResult, generate_orchestration, and infer_function_core_type through C++ bindings and Python type stubs, allowing for backend-agnostic orchestration code generation.
  • Enhanced PTOAS Optimization Strategy: Added three new passes (ConvertToSSA, FlattenCallExpr, and RunVerifier) to the OptimizationStrategy.PTOAS in the pass manager, improving the IR transformation pipeline for the PTO backend.
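
The preprocessing mentioned in the first highlight can be sketched as a small text transform. Going by the test descriptions in this PR, the step strips `#include` lines and `using namespace` directives and rewrites `__global__ AICORE` so the ptoas function can be called from the wrapper. The function below is a hypothetical stand-in for `_preprocess_ptoas_output`, not its actual implementation, and the choice of `static` as the replacement qualifier is an assumption inferred from the test names.

```python
import re

# Matches a whole-line `using namespace foo;` or `using namespace foo::bar;`.
USING_NAMESPACE = re.compile(r"using\s+namespace\s+[\w:]+\s*;")


def preprocess_ptoas_output(code: str) -> str:
    """Drop includes and using-directives, and demote the kernel qualifier.

    Assumption: `__global__ AICORE` becomes `static`, so that the wrapper's
    own entry point is the only externally visible kernel symbol.
    """
    kept = []
    for line in code.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            continue  # the wrapper supplies its own includes
        if USING_NAMESPACE.fullmatch(stripped):
            continue
        kept.append(line.replace("__global__ AICORE", "static"))
    return "\n".join(kept)


sample = "\n".join(
    [
        "#include <cstdint>",
        "using namespace pto;",
        "__global__ AICORE void vector_add(float* a, float* b) {",
        "  // function body is preserved untouched",
        "}",
    ]
)

result = preprocess_ptoas_output(sample)
print(result)
```

Working line by line keeps the function body and any helper functions intact, which is the behaviour the PR's preprocessing tests are described as checking.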
Changelog
  • docs/dev/codegen/00-pto_codegen.md
    • Added a new section detailing Kernel Wrapper Generation for the PTO Backend, including its pipeline, output structure, argument unpacking conventions, and implementation module.
    • Updated the 'See Also' section to reflect the new content.
  • examples/ir_parser/vector_example_dag.py
    • Updated the example to use OptimizationStrategy.PTOAS and BackendType.PTO for compilation, demonstrating the new PTO backend integration.
  • python/bindings/modules/codegen.cpp
    • Included the orchestration_codegen.h header.
    • Bound the OrchestrationResult class, exposing its code, func_name_to_id, and func_name_to_core_type members.
    • Added Python definitions for generate_orchestration and infer_function_core_type functions, making them accessible from Python.
  • python/pypto/ir/compile.py
    • Removed shutil and subprocess imports, as direct ptoas execution logic was moved.
    • Deleted the _run_ptoas helper function.
    • Introduced a new _write_files helper function to handle writing multiple files to the output directory.
    • Refactored the compile function to import and utilize the new pto_codegen.generate function for PTO backend compilation, simplifying the main compilation flow.
  • python/pypto/ir/pass_manager.py
    • Added ConvertToSSA, FlattenCallExpr, and RunVerifier passes to the OptimizationStrategy.PTOAS pass list.
  • python/pypto/ir/pto_codegen.py
    • Added a new module for PTO backend code generation.
    • Implemented _run_ptoas to execute the ptoas tool.
    • Created _preprocess_ptoas_output to modify ptoas generated C++ code for wrapper integration.
    • Developed _generate_arg_unpacking to create C++ code for unpacking kernel arguments.
    • Implemented _generate_kernel_wrapper to assemble the final CCE-compatible kernel wrapper.
    • Added _generate_config_file to produce kernel_config.py.
    • Provided the main generate function to orchestrate the creation of all PTO backend output files.
  • python/pypto/pypto_core/codegen.pyi
    • Imported CoreType and Function from pypto.pypto_core.ir.
    • Defined the OrchestrationResult class with code, func_name_to_id, and func_name_to_core_type properties.
    • Added type hints for generate_orchestration and infer_function_core_type functions.
    • Updated __all__ to include the new OrchestrationResult, generate_orchestration, and infer_function_core_type symbols.
  • tests/ut/codegen/test_pto_codegen.py
    • Imported DataType, ir, IRBuilder, and block for new test utilities.
    • Imported specific functions from pypto.ir.pto_codegen for unit testing.
    • Added SAMPLE_PTOAS_OUTPUT string for testing preprocessing.
    • Created _make_func helper to easily construct ir.Function objects for testing.
    • Introduced TestPreprocessPtoasOutput class with tests for stripping includes, using namespace, replacing __global__ AICORE, and preserving function body/helpers.
    • Added TestGenerateArgUnpacking class with tests for tensor-only, mixed tensor/scalar, and scalar-only argument unpacking scenarios.
    • Implemented TestGenerateKernelWrapper class with tests for the presence of kernel_entry, includes, forward calls, static modification of ptoas code, and no duplicate includes.
  • tests/ut/ir/transforms/test_pass_manager.py
    • Updated test_pass_manager_get_strategy_ptoas to assert the correct number and names of passes (now 6) for the PTOAS strategy.
    • Modified the comment in test_run_passes_on_program_with_ptoa_strategy to reflect the newly added passes for the PTOAS strategy.
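
The `_write_files` helper listed under compile.py is described only as writing multiple files to the output directory. A minimal sketch of that idea, assuming the generator returns a dict that maps relative paths to file contents (the sequence diagram calls it a "files dict"), might look like this. `write_files` and the sample file contents are hypothetical.

```python
import os
import tempfile


def write_files(files, output_dir):
    """Write each {relative_path: content} entry under output_dir.

    Parent directories are created on demand; returns the absolute-or-joined
    paths that were written, in insertion order.
    """
    written = []
    for rel_path, content in files.items():
        full_path = os.path.join(output_dir, rel_path)
        parent = os.path.dirname(full_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(full_path, "w", encoding="utf-8") as f:
            f.write(content)
        written.append(full_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_files(
        {
            # Illustrative contents only; not real generator output.
            os.path.join("kernels", "aiv", "vector_add.cpp"): "// wrapper\n",
            "kernel_config.py": "# config placeholder\n",
        },
        tmp,
    )
    all_exist = all(os.path.isfile(p) for p in paths)
    print(all_exist, len(paths))
```

Returning a dict from the generator and persisting it in one place keeps the code generation itself free of filesystem side effects, which also makes it easier to unit test.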
Activity
  • No specific pull request activity (comments, reviews, progress updates) was provided in the context.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-structured refactoring for the PTO backend codegen. Extracting the logic into a new pto_codegen.py module is a good design choice that improves modularity. The new kernel wrapper generation is a crucial feature that bridges the gap between ptoas output and the CCE calling convention. The added C++ bindings and corresponding tests are also valuable additions.

I've identified one critical issue regarding a hardcoded path that could lead to runtime failures, along with a couple of medium-severity suggestions for improving the robustness of the new module. Overall, this is a solid feature addition, and addressing these points will enhance its reliability.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/ir_parser/vector_example_dag.py (1)

136-142: Minor: Inconsistent step numbers in warning messages.

The warning messages use [5] but appear under different logical steps:

  • Line 137 is checking the kernel directory (Step 4), so should use [4]
  • Line 142 is checking the orchestration file (Step 5), so [5] is correct
📝 Suggested fix
     kernel_dir = os.path.join(output_dir, "kernels")
     if not os.path.isdir(kernel_dir):
-        print(f"\n[5] Warning: {kernel_dir} not found")
+        print(f"\n[4] Warning: {kernel_dir} not found")

     # Step 5: Show orchestration code
     orch_file = os.path.join(output_dir, "orchestration", "orch_vector.cpp")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ir_parser/vector_example_dag.py` around lines 136 - 142, The warning
for the kernel directory uses the wrong step number: update the print for the
kernel_dir existence check to use "[4]" instead of "[5]" so it matches the
logical Step 4 (locate the kernel_dir variable in the block that checks
os.path.isdir(kernel_dir)); leave the orch_file check and its "[5]" warning
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/pto_codegen.py`:
- Around line 279-281: The wrapper emission always writes to "kernels/aiv/..."
in _generate_kernel_wrapper handling (assignment to kernel_rel and
result_files), but _generate_config_file uses func_name_to_core_type to resolve
source paths (which may be "aic"), causing path mismatches; change the
kernel_rel computation to pick the core type for this function from
func_name_to_core_type (e.g., core = func_name_to_core_type.get(func.name,
<sane-default>)) and join os.path.join("kernels", core, f"{func.name}.cpp")
before assigning result_files[kernel_rel] so emitted wrapper paths match the
config lookup used by _generate_config_file.
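
The suggested fix can be sketched in isolation. The snippet assumes `func_name_to_core_type` maps function names to plain strings such as "aic" or "aiv" (in the real bindings the values are CoreType objects), and that "aiv" is a reasonable fallback because it is the directory the wrapper is currently written to. `kernel_rel_path` is a hypothetical helper, not a function from pto_codegen.py.

```python
import os

DEFAULT_CORE = "aiv"  # assumption: mirrors the currently hardcoded directory


def kernel_rel_path(func_name, func_name_to_core_type, default_core=DEFAULT_CORE):
    """Build the wrapper's relative path from the function's core type."""
    core = func_name_to_core_type.get(func_name, default_core)
    return os.path.join("kernels", core, f"{func_name}.cpp")


# Illustrative mapping; the function names are made up for this sketch.
core_types = {"matmul_kernel": "aic", "vector_add": "aiv"}

paths = {
    name: kernel_rel_path(name, core_types)
    for name in ["matmul_kernel", "vector_add", "unlisted"]
}
for name, path in paths.items():
    print(name, "->", path)
```

Deriving the path from the same mapping that `_generate_config_file` consults means the emitted wrapper location and the config entry cannot drift apart for "aic" functions.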

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a207464 and 33f166b.

📒 Files selected for processing (9)
  • docs/dev/codegen/00-pto_codegen.md
  • examples/ir_parser/vector_example_dag.py
  • python/bindings/modules/codegen.cpp
  • python/pypto/ir/compile.py
  • python/pypto/ir/pass_manager.py
  • python/pypto/ir/pto_codegen.py
  • python/pypto/pypto_core/codegen.pyi
  • tests/ut/codegen/test_pto_codegen.py
  • tests/ut/ir/transforms/test_pass_manager.py

Hzfengsy merged commit 382d787 into hw-native-sys:main Feb 26, 2026
6 checks passed
zhangqi-chen deleted the ptoas branch February 26, 2026 08:32