[WIP] feat(codegen): Add runtime stride-based tensor offset computation and… #197

Open
YunjiQin wants to merge 1 commit into hw-native-sys:main from YunjiQin:codegen_v2

Conversation

@YunjiQin
Contributor

Summary

  • Add Tensor struct pointer tracking in CodeContext
  • Add runtime stride-based tensor offset computation for block.load, block.store, block.l0c_store codegen
  • Implement codegen for tensor.dim operations
  • Support dynamic strides (-1) in GenerateStrideType
  • Update tests to reflect dynamic stride behavior

… codegen for tensor.dim

Implements dynamic stride-based offset computation for CCE codegen,
replacing compile-time row-major stride calculation with runtime
stride access from Tensor struct.

Key changes:
- Add Tensor struct pointer tracking in CodeContext
- Update block.load, block.store, block.l0c_store to use runtime strides
- Add codegen function for tensor.dim op
- Change GenerateStrideType to emit dynamic strides (-1)
- Update tests to reflect dynamic stride behavior
@gemini-code-assist
Contributor

Summary of Changes

Hello @YunjiQin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the CCE code generation framework by introducing robust support for dynamic tensor operations. It enables the system to handle variable tensor shapes and memory layouts more effectively at runtime through stride-based offset computations and the ability to query tensor dimensions. This change improves the flexibility and adaptability of the generated code for various tensor processing scenarios.

Highlights

  • Dynamic Stride-Based Tensor Offset Computation: Implemented runtime stride-based offset calculation for block.load, block.store, and block.l0c_store operations, enabling more flexible tensor memory access.
  • Tensor Struct Pointer Tracking: Introduced a mechanism within CodeContext to track and manage Tensor struct pointers, which encapsulate buffer address, shape, and stride information.
  • Tensor Dimension Query Codegen: Added code generation support for the tensor.dim operation, allowing runtime querying of tensor dimensions.
  • Dynamic Stride Generation: Modified GenerateStrideType to produce dynamic stride placeholders (-1) for runtime resolution, rather than fixed compile-time values.
Changelog
  • include/pypto/codegen/cce/cce_codegen.h
    • Added declarations for GetTensorStruct and RegisterOutputTensorStruct methods.
    • Modified GenerateGlobalTensorTypeDeclaration to accept an optional tensor_struct_ptr.
  • include/pypto/codegen/cce/code_context.h
    • Declared RegisterTensorStruct and GetTensorStruct methods.
    • Added tensor_to_struct_pointer_ map to track tensor struct pointers.
  • include/pypto/codegen/cce/type_converter.h
    • Removed the private CalculateRowMajorStrides helper method.
    • Updated the GenerateStrideType signature.
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
    • Introduced a static helper function ComputeStrideBasedOffset for dynamic offset calculation.
    • Refactored MakeBlockLoadCodegenCCE, MakeBlockStoreCodegenCCE, and MakeBlockL0CStoreCodegenCCE to use the new stride-based offset computation and Tensor struct access.
    • Added RegisterOutputTensorStruct calls in MakeBlockStoreCodegenCCE and MakeBlockL0CStoreCodegenCCE.
    • Implemented MakeTensorDimCodegenCCE and registered it for the tensor.dim operation.
  • src/codegen/cce/cce_codegen.cpp
    • Updated GeneratePrologue to pass the tensor struct pointer during global tensor declaration.
    • Modified VisitStmt_ for IfStmt and ForStmt to ensure tensor struct pointer mappings are inherited.
    • Implemented GetTensorStruct and RegisterOutputTensorStruct methods.
    • Updated GenerateGlobalTensorTypeDeclaration to initialize GlobalTensor with dynamic strides from the tensor_struct_ptr.
  • src/codegen/cce/code_context.cpp
    • Implemented RegisterTensorStruct and GetTensorStruct methods.
    • Cleared the tensor_to_struct_pointer_ map in the Clear method.
  • src/codegen/cce/type_converter.cpp
    • Removed the implementation of CalculateRowMajorStrides.
    • Modified GenerateStrideType to generate -1 for dynamic strides instead of computed values.
  • tests/ut/codegen/test_type_converter.py
    • Updated test_generate_stride to assert expected dynamic stride patterns (-1) for various tensor shapes.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant feature for runtime stride-based tensor offset computation, which is crucial for handling dynamic tensor shapes in the CCE codegen. The changes are well-organized across the codegen engine, backend operator implementations, and context management. The implementation of tensor.dim and the updates to block.load/store to use the new dynamic offset calculation are solid. My review has identified a few minor areas for improvement, including removing unused variables/parameters, correcting documentation to match implementation, and addressing a potential signed/unsigned comparison issue. All comments align with existing guidelines or are not covered by specific rules, and thus no modifications or removals were necessary.

Comment on lines +45 to +46
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
                                            ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {

medium

The tensor_type parameter is unused within this function. It's good practice to remove unused parameters to keep the function signature clean and improve code clarity.

Suggested change
- static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
-                                             ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
+ static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
+                                             ir::MakeTuplePtr offsets) {

static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
    auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
    std::string target_var = codegen.GetCurrentResultTarget();
    std::string input_var = codegen.GetExprAsCode(op->args_[0]);

medium

The variable input_var is initialized but never used. It should be removed to avoid confusion and clean up the code.

}
if (tensor_struct_ptr.has_value()) {
    global_instance << ", {}, {";
    for (int i = 0; i < shape_dims.size(); i++) {

medium

The loop counter i is of type int while shape_dims.size() returns size_t. This can lead to signed/unsigned comparison warnings from the compiler and is generally not safe. It's best practice to use size_t for loop counters that iterate over container sizes.

Suggested change
- for (int i = 0; i < shape_dims.size(); i++) {
+ for (size_t i = 0; i < shape_dims.size(); i++) {

Comment on lines +101 to +103
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. If no mapping exists, returns the
* input tensor_var_name itself (for compatibility).

medium

The documentation for GetTensorStruct is inconsistent with its implementation. The documentation states that it returns the input tensor_var_name if no mapping exists, but the implementation in code_context.cpp throws an error using CHECK. The documentation should be updated to match the implementation's fail-fast behavior, which is safer.

Suggested change
- * Returns the Tensor struct pointer name that should be used for accessing
- * buffer address and stride information. If no mapping exists, returns the
- * input tensor_var_name itself (for compatibility).
+ * Returns the Tensor struct pointer name that should be used for accessing
+ * buffer address and stride information. Throws an error if no mapping exists.
