Skip to content

Conversation

@guosran
Copy link
Collaborator

@guosran guosran commented Nov 25, 2025

Summary

Reduced compiled II for the nested_loop mapping from 17 to 11.

Key change

  • Fixed the reachability pre-check so it accounts for register-based waiting. This aligns canReachLocInTime with the actual routing logic used by tryRouteDataMove.
  • Changed backtrack-config to customized (5,3).

Results:

  • nested_loop mapping: The first change reduced compiled_II from 17 → 11 (rec_mii=9, res_mii=6). Using backtrack-config=customized (5,3) further reduced it to 11.
  • (5,3) is a good compromise. Larger backtrack settings increase compile time without further lowering II on this benchmark.

@ShangkunLi
Copy link
Collaborator

ShangkunLi commented Nov 26, 2025

Modifying this logic is supposed to improve all the other mapping results, but why did you only change the mapping result of a nested loop? This may not pass the llvm-lit test.

@guosran
Copy link
Collaborator Author

guosran commented Nov 26, 2025

Modifying this logic is supposed to improve all the other mapping results, but why did you only change the mapping result of a nested loop? This may not pass the llvm-lit test.

Done fixing.

// CTRL2DATA-NEXT: }

// MAPPING: func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 10 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} {
// MAPPING: func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 11 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiled II is abnormally larger, I will check the reasons.

@guosran guosran marked this pull request as draft November 27, 2025 04:29

// Makes sure the link is not occupied.
if (!mapping_state.isAvailableAcrossTime(current_loc_out_link)) {
// Check if link is available at current time step.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Check -> // Checks

@tancheng
Copy link
Contributor

Any update on this PR?

@guosran guosran marked this pull request as ready for review December 18, 2025 04:15
Copilot AI review requested due to automatic review settings December 18, 2025 04:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the spatial-temporal mapping by fixing a reachability pre-check bug and adjusting backtracking parameters. The core fix ensures canReachLocInTime properly accounts for register-based waiting, aligning it with the actual routing logic. The backtrack-config is changed from "simple" to "customized" (5,3 by default).

  • Fixed canReachLocInTime to use BFS with register-based waiting support
  • Updated backtrack-config from "simple" to "customized" in test configurations
  • Updated test expectations to reflect improved compiled II values across multiple benchmarks

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
lib/NeuraDialect/Mapping/mapping_util.cpp Refactored canReachLocInTime BFS to consider both link traversal and register-based waiting, matching the logic in tryRouteDataMove
test/c2llvm2mlir/nested_loop/test.mlir Changed backtrack-config to customized; updated expected compiled_ii from 17 to 11
test/controflow_fuse/perfect_nested/perfect_nested.mlir Changed backtrack-config to customized; updated expected compiled_ii from 10 to 11
test/mapping_quality/branch_for.mlir Updated test comments and YAML/ASM checks for structural stability; updated expected compiled_ii from 5 to 4
test/neura/fusion/test.mlir Updated expected compiled_ii from 14 to 13
test/neura/for_loop/relu_test.mlir Updated expected mapping locations to reflect improved routing paths
test/e2e/bicg/bicg_kernel.mlir Updated expected register assignments and instruction timestamps in YAML and ASM checks
.gitignore Removed trailing "# misc" comment line
Comments suppressed due to low confidence (1)

test/controflow_fuse/perfect_nested/perfect_nested.mlir:197

  • The compiled_ii increased from 10 to 11 in this test, which appears to be a regression rather than an improvement. The PR description mentions that the nested_loop mapping improved from 17 to 11, but this perfect_nested test shows the opposite direction. Please verify if this regression is expected or if there's an issue with the changes. If this is an expected tradeoff, it should be documented in the PR description.
// MAPPING:      func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 10 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@guosran
Copy link
Collaborator Author

guosran commented Dec 18, 2025

Any update on this PR?

Just updated the calculateAward function to resolve the unexpected compiled_ii increment.

@guosran guosran marked this pull request as draft December 18, 2025 04:31
@guosran
Copy link
Collaborator Author

guosran commented Dec 18, 2025

Identified some minor problems to be fixed.

}
int award = 2 * mapping_state.getII();

// === Tile-based award (independent of time) ===
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you summarize what you changed and the corresponding effect? no need to provide concrete example, but just description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redesign the award calculation algorithm to fix potential negative award
values and improve mapping quality.

Changes:

  • Split award into tile-based and time-based components
  • Add proximity bonus to producers (always non-negative)
  • Add proximity bonus to backward users for better recurrence routing
  • Grant critical ops additional II bonus and routing flexibility bonus
  • Use time_bonus = latest_end_time_step - t for earlier scheduling

Now I am working with different sets of parameters for the award formula, and will update later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is latest_end_time_step determined?

I am working with different sets of parameters for the award formula

Can we do this in another PR? Or it has to be done here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may refer to the following code:

    int latest_end_time_step = earliest_start_time_step + mapping_state.getII();
    ...
    for (Operation *user : backward_users) {
      ...
      latest_end_time_step =
          std::min(latest_end_time_step,
                   backward_user_loc.time_step + mapping_state.getII());
      ...
    }

I have updated the new changes in PR #215 .

@guosran
Copy link
Collaborator Author

guosran commented Dec 18, 2025

Progress: Update the award function with register allocation consideration; regenerate test files with further improved II.
Will finish updating the test files tmr.

@guosran
Copy link
Collaborator Author

guosran commented Dec 19, 2025

Further changes are presented in PR #215 .

@guosran guosran closed this Dec 19, 2025
@guosran
Copy link
Collaborator Author

guosran commented Dec 19, 2025

As updating all the test files for each PR is super time-consuming, could we merge the changes altogether in PR #215 ? thx : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants