Add low precision support to conditional SFNO #800
Conversation
| """ | ||
| wc = torch.view_as_complex(weight) | ||
| return torch.einsum("bixy,iox->boxy", xc, wc) | ||
| r0 = torch.einsum("bixy,iox->boxy", x[..., 0], w[..., 0]) |
There is definitely some performance left on the table by computing these pieces separately, but it shouldn't significantly affect the memory reductions from bfloat16/float16.
I would leave optimizing this function until profiling tools are in place; doing it properly likely requires manual permutes and matrix multiplications, which could hurt performance if done wrong.
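For reference, here is a minimal sketch of the real/imaginary decomposition, assuming the shape convention implied by the einsum string; the function name, docstring, and exact shapes are illustrative rather than the PR's actual code:

```python
import torch

def complex_mul_einsum(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch (not the PR's code) of the real/imaginary split.

    x: real tensor of shape [batch, in_chan, nx, ny, 2], last dim = (real, imag)
    w: real tensor of shape [in_chan, out_chan, nx, 2], last dim = (real, imag)

    (a + bi)(c + di) = (ac - bd) + (ad + bc)i is expanded into four real
    einsums so the contraction can stay in bfloat16/float16, where
    complex-valued matmuls are not supported.
    """
    r0 = torch.einsum("bixy,iox->boxy", x[..., 0], w[..., 0])  # a*c
    r1 = torch.einsum("bixy,iox->boxy", x[..., 1], w[..., 1])  # b*d
    i0 = torch.einsum("bixy,iox->boxy", x[..., 0], w[..., 1])  # a*d
    i1 = torch.einsum("bixy,iox->boxy", x[..., 1], w[..., 0])  # b*c
    return torch.stack([r0 - r1, i0 + i1], dim=-1)
```

A fused implementation would combine these into fewer contractions, but as noted above that optimization is better done with profiling in hand.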
```diff
      with torch.amp.autocast("cuda", enabled=False):
-         x = self.forward_transform(x.float())
+         x = self.forward_transform(x)
```
Float casting is now handled exactly where it is needed, inside the forward transform.
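For illustration, a minimal sketch of what "casting inside the transform" could look like, assuming the transform wraps a full-precision spherical harmonic module; the class name and `sht` attribute are hypothetical, not the PR's implementation:

```python
import torch

class LowPrecisionForwardTransform(torch.nn.Module):
    """Hypothetical wrapper: casts to float32 only for the spectral op,
    so callers no longer need x.float() or an autocast-disabled region."""

    def __init__(self, sht: torch.nn.Module):
        super().__init__()
        self.sht = sht  # assumed to require float32 input and return complex output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        coeffs = self.sht(x.float())  # cast up only where the transform needs it
        # Return a real tensor with a trailing [2] (real, imag) dim,
        # cast back to the caller's (possibly bf16/fp16) dtype.
        return torch.view_as_real(coeffs).to(input_dtype)
```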
```diff
@@ -0,0 +1,222 @@
+# flake8: noqa
```
This file was added (copy-pasted from sht_fix.py) in one commit and edited in the second; I recommend reviewing the second commit separately.
```diff
      with amp.autocast(device_type="cuda", enabled=False):
-         x = self.forward_transform(x).contiguous()
+         x = torch.view_as_complex(self.forward_transform(x)).contiguous()
```
These changes were required by the API change to the forward/inverse transforms, which now take real arrays with a trailing [2] dimension; that is needed for them to support low precision.
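To make the new contract concrete, here is a small self-contained sketch using `torch.fft.rfft` as a stand-in for the actual forward transform; the function name and shapes are hypothetical:

```python
import torch

def forward_transform_stub(x: torch.Tensor) -> torch.Tensor:
    """Stand-in illustrating the new contract: real input in, real output
    with a trailing [2] (real, imag) dimension out."""
    coeffs = torch.fft.rfft(x, dim=-1)  # placeholder for the spectral transform
    return torch.view_as_real(coeffs)   # complex [..., n] -> real [..., n, 2]

x = torch.randn(2, 4, 8)
out = forward_transform_stub(x)  # shape [2, 4, 5, 2], real dtype
# Call sites that still need complex values rebuild them explicitly,
# as in the diff above:
out_c = torch.view_as_complex(out.contiguous())
```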
Short description of why the PR is needed and how it satisfies those requirements, in sentence form.

Changes:

- symbol (e.g. `fme.core.my_function`) or script and concise description of changes or added feature. Can group multiple related symbols on a single bullet.
- Tests added
- If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Resolves # (delete if none)