-
Notifications
You must be signed in to change notification settings - Fork 14
Fix canReachLocInTime and Customize backtrack-config to Reduce II #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Modifying this logic is supposed to improve all the other mapping results, but why did you only change the mapping result of a nested loop? This may not pass the |
Done fixing. |
| // CTRL2DATA-NEXT: } | ||
|
|
||
| // MAPPING: func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 10 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} { | ||
| // MAPPING: func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 11 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiled II is abnormally larger, I will check the reasons.
|
|
||
| // Makes sure the link is not occupied. | ||
| if (!mapping_state.isAvailableAcrossTime(current_loc_out_link)) { | ||
| // Check if link is available at current time step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check -> // Checks
|
Any update on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves the spatial-temporal mapping by fixing a reachability pre-check bug and adjusting backtracking parameters. The core fix ensures canReachLocInTime properly accounts for register-based waiting, aligning it with the actual routing logic. The backtrack-config is changed from "simple" to "customized" (5,3 by default).
- Fixed
canReachLocInTimeto use BFS with register-based waiting support - Updated backtrack-config from "simple" to "customized" in test configurations
- Updated test expectations to reflect improved compiled II values across multiple benchmarks
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/NeuraDialect/Mapping/mapping_util.cpp | Refactored canReachLocInTime BFS to consider both link traversal and register-based waiting, matching the logic in tryRouteDataMove |
| test/c2llvm2mlir/nested_loop/test.mlir | Changed backtrack-config to customized; updated expected compiled_ii from 17 to 11 |
| test/controflow_fuse/perfect_nested/perfect_nested.mlir | Changed backtrack-config to customized; updated expected compiled_ii from 10 to 11 |
| test/mapping_quality/branch_for.mlir | Updated test comments and YAML/ASM checks for structural stability; updated expected compiled_ii from 5 to 4 |
| test/neura/fusion/test.mlir | Updated expected compiled_ii from 14 to 13 |
| test/neura/for_loop/relu_test.mlir | Updated expected mapping locations to reflect improved routing paths |
| test/e2e/bicg/bicg_kernel.mlir | Updated expected register assignments and instruction timestamps in YAML and ASM checks |
| .gitignore | Removed trailing "# misc" comment line |
Comments suppressed due to low confidence (1)
test/controflow_fuse/perfect_nested/perfect_nested.mlir:197
- The compiled_ii increased from 10 to 11 in this test, which appears to be a regression rather than an improvement. The PR description mentions that the nested_loop mapping improved from 17 to 11, but this perfect_nested test shows the opposite direction. Please verify if this regression is expected or if there's an issue with the changes. If this is an expected tradeoff, it should be documented in the PR description.
// MAPPING: func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", dataflow_mode = "predicate", llvm.linkage = #llvm.linkage<external>, mapping_info = {compiled_ii = 10 : i32, mapping_mode = "spatial-temporal", mapping_strategy = "heuristic", rec_mii = 8 : i32, res_mii = 2 : i32, x_tiles = 4 : i32, y_tiles = 4 : i32}} {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Just updated the |
|
Identified some minor problems to be fixed. |
| } | ||
| int award = 2 * mapping_state.getII(); | ||
|
|
||
| // === Tile-based award (independent of time) === |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you summarize what you changed and the corresponding effect? no need to provide concrete example, but just description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redesign the award calculation algorithm to fix potential negative award
values and improve mapping quality.
Changes:
- Split award into tile-based and time-based components
- Add proximity bonus to producers (always non-negative)
- Add proximity bonus to backward users for better recurrence routing
- Grant critical ops additional II bonus and routing flexibility bonus
- Use time_bonus = latest_end_time_step - t for earlier scheduling
Now I am working with different sets of parameters for the award formula, and will update later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is latest_end_time_step determined?
I am working with different sets of parameters for the award formula
Can we do this in another PR? Or it has to be done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may refer to the following code:
int latest_end_time_step = earliest_start_time_step + mapping_state.getII();
...
for (Operation *user : backward_users) {
...
latest_end_time_step =
std::min(latest_end_time_step,
backward_user_loc.time_step + mapping_state.getII());
...
}
I have updated the new changes in PR #215 .
|
Progress: Update the award function with register allocation consideration; regenerate test files with further improved II. |
…ck-config to 'customized'
d7d52e3 to
3be9e46
Compare
|
Further changes are presented in PR #215 . |
|
As updating all the test files for each PR is super time-consuming, could we merge the changes altogether in PR #215 ? thx : ) |
Summary
Reduced compiled II for the nested_loop mapping from 17 to 11.
Key change
canReachLocInTimewith the actual routing logic used bytryRouteDataMove.backtrack-configto customized (5,3).Results:
nested_loop mapping: The first change reducedcompiled_IIfrom 17 → 11 (rec_mii=9, res_mii=6). Using backtrack-config=customized (5,3) further reduced it to 11.