refactor[next-dace]: New Optimization Scheme in Intra-Map Optimization #2457
NOT WORKING: 5.7612245082855225
DOES NOT WORK: 5.77982020
NOT WORKING: 5.89192s
SEEMS WORKING: 4.57165s
…tion. But it also has an additional simplify that was present when TF was run in stage 1, but not in the other version. PERFORMANCE: 4.5106589s
…n_map_dataflow_optimization_order_public
cscs-ci run
cscs-ci run
src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py
By default it is off.
edopao left a comment
LGTM
I would be very surprised if this PR had a performance impact. It only changes the order in which LoopBlocking is applied and adds TaskletFusion, but both transformations are disabled in blueline.


This adds TaskletFusion to the intra-Map dataflow optimization stage of the optimizer. This transformation merges Tasklets, which reduces the number of Tasklets and transient scalars inside a Map. Empirical observations showed that this is highly beneficial to the Graupel implementation in ICON4Py. However, the transformation is not run by default; it has to be turned on explicitly.
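To illustrate what merging Tasklets means, here is a toy sketch in plain Python. It does not use the DaCe or GT4Py API; the `Tasklet` class and the `fuse_tasklets` helper are hypothetical names invented for this example. It assumes the Tasklets are given in dataflow order and that each one is a single expression writing one scalar.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tasklet:
    """Toy stand-in for a Tasklet: one output scalar and one expression."""
    output: str
    expr: str


def fuse_tasklets(tasklets, transients):
    """Inline every transient scalar that is read exactly once into its consumer.

    Hypothetical helper for illustration only, not the actual transformation.
    """
    # Count how often each transient name occurs in any expression.
    tokens = [tok for tl in tasklets for tok in re.findall(r"[A-Za-z_]\w*", tl.expr)]
    reads = {name: tokens.count(name) for name in transients}

    inlined = {}  # transient name -> expression that produces it
    fused = []
    for tl in tasklets:
        expr = tl.expr
        # Replace reads of already inlined transients by their producer expression.
        for name, producer in inlined.items():
            expr = re.sub(
                rf"\b{re.escape(name)}\b",
                lambda _match, p=producer: f"({p})",
                expr,
            )
        if tl.output in transients and reads[tl.output] == 1:
            inlined[tl.output] = expr  # the Tasklet and its transient disappear
        else:
            fused.append(Tasklet(tl.output, expr))
    return fused


# Three small Tasklets connected through the transient scalars t0 and t1.
chain = [
    Tasklet("t0", "a + b"),
    Tasklet("t1", "t0 * c"),
    Tasklet("out", "t1 - d"),
]
fused = fuse_tasklets(chain, transients={"t0", "t1"})
```

In this toy model the three Tasklets collapse into a single one that writes `out`, and both transient scalars vanish. A transient that is read more than once is kept, so no computation is duplicated.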