@philip-paul-mueller
Contributor

@philip-paul-mueller philip-paul-mueller commented Jan 23, 2026

This adds TaskletFusion to the intra-Map dataflow optimization stage of the optimizer. The transformation merges Tasklets, which reduces the number of Tasklets and transient scalars inside a Map. Empirical observations showed that this is highly beneficial to the Graupel implementation in ICON4Py. However, the transformation is not run by default; it has to be turned on explicitly.
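
To illustrate what merging Tasklets means, here is a minimal sketch in plain Python. It is a toy model and does not use the DaCe API: the `Tasklet` class, `fuse()` and `run()` are hypothetical helpers invented for this example, and each Tasklet is assumed to be a single expression. Fusing a producer into its consumer inlines the producer's expression, so the transient scalar between them (`tmp` below) disappears.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Tasklet:
    """Toy stand-in for a Tasklet: computes `output = expr` from `inputs`."""
    inputs: tuple
    output: str
    expr: str  # a Python expression over the names in `inputs`


class _Inline(ast.NodeTransformer):
    """Replace every use of one variable name by a given expression tree."""

    def __init__(self, name, replacement):
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node):
        return self.replacement if node.id == self.name else node


def fuse(producer, consumer):
    """Inline `producer` into `consumer`, removing the scalar between them."""
    if producer.output not in consumer.inputs:
        raise ValueError("consumer does not read the producer's output")
    replacement = ast.parse(producer.expr, mode="eval").body
    tree = ast.parse(consumer.expr, mode="eval")
    tree = ast.fix_missing_locations(_Inline(producer.output, replacement).visit(tree))
    # The fused Tasklet reads the producer's inputs plus the consumer's other inputs.
    remaining = tuple(n for n in consumer.inputs if n != producer.output)
    inputs = tuple(dict.fromkeys(producer.inputs + remaining))
    return Tasklet(inputs, consumer.output, ast.unparse(tree))


def run(tasklets, values):
    """Evaluate the Tasklets in order and return every value that was stored."""
    env = dict(values)
    for t in tasklets:
        env[t.output] = eval(t.expr, {}, {k: env[k] for k in t.inputs})
    return env


t1 = Tasklet(("a", "b"), "tmp", "a + b")     # writes the transient scalar `tmp`
t2 = Tasklet(("tmp", "c"), "out", "tmp * c")  # reads `tmp`
fused = fuse(t1, t2)

values = {"a": 1.0, "b": 2.0, "c": 3.0}
unfused_env = run([t1, t2], values)
fused_env = run([fused], values)
```

In this toy model the fused version computes the same `out` with one Tasklet instead of two and never materializes `tmp`. Whether that pays off for a real kernel is an empirical question.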

NOT WORKING: 5.7612245082855225
DOES NOT WORK: 5.77982020
…tion.

But it also has an additional simplify that was present when TF was run in stage 1, but not in the other version.

PERFORMANCE: 4.5106589s
@philip-paul-mueller philip-paul-mueller changed the title DO NOT MERGE: refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization Jan 26, 2026
@philip-paul-mueller philip-paul-mueller marked this pull request as ready for review January 26, 2026 10:17
@philip-paul-mueller
Contributor Author

It seems that this change introduces a 5% performance penalty in compute_advection_in_horizontal_momentum_equation().
To make sure this is not non-deterministic behaviour, I lowered it twice, but the penalty persisted.
Because that kernel is so important, we have to handle this in some way, for example by adding an option such as compact_tasklets.

bench_blueline_stencil_compute

@philip-paul-mueller
Contributor Author

cscs-ci run

@edopao edopao changed the title refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization refactor[next-dace]: New Optimization Scheme in Intra-Map Optimization Jan 29, 2026
@philip-paul-mueller
Contributor Author

cscs-ci run

Contributor

@edopao edopao left a comment

LGTM

@philip-paul-mueller
Contributor Author

Here are the latest results.
It looks like the performance is okay and there are no regressions.

bench_blueline_stencil_compute

@edopao
Contributor

edopao commented Jan 30, 2026

> Here are the latest results. It looks like the performance is okay and there are no regressions.

I would be very surprised if this PR had a performance impact. It only changes the order of applying LoopBlocking and it adds TaskletFusion, but both transformations are disabled in blueline.
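
The reasoning can be sketched in a few lines of plain Python. Everything here is hypothetical: `intra_map_passes()`, the flag names, the placeholder `SomeOtherPass` and both orderings are invented for illustration and are not the actual optimizer code. The point is only that reordering or adding passes that are switched off leaves the effective pass list unchanged.

```python
def intra_map_passes(order, *, loop_blocking=False, tasklet_fusion=False):
    """Return the passes that actually run, in order (hypothetical helper)."""
    enabled = {"LoopBlocking": loop_blocking, "TaskletFusion": tasklet_fusion}
    # Passes without a flag always run; optional ones only when switched on.
    return [name for name in order if enabled.get(name, True)]


# Assumed orderings, purely for illustration.
old_order = ["LoopBlocking", "SomeOtherPass"]
new_order = ["SomeOtherPass", "TaskletFusion", "LoopBlocking"]

# Both optional passes disabled: the effective pipelines are identical.
disabled_old = intra_map_passes(old_order)
disabled_new = intra_map_passes(new_order)

# With LoopBlocking enabled, the reordering becomes visible.
blocking_old = intra_map_passes(old_order, loop_blocking=True)
blocking_new = intra_map_passes(new_order, loop_blocking=True)
```

Under these assumptions, a configuration that leaves both transformations disabled runs the same passes before and after the change.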

@philip-paul-mueller philip-paul-mueller merged commit 3807bca into GridTools:main Jan 30, 2026
22 of 23 checks passed