Replace the previous ADAPTIVE coordinate with the flux-based AG implementation by angus-g · Pull Request #6 · ACCESS-NRI/MOM6

angus-g · 2025-03-12T22:45:25Z

This changes the old ADAPTIVE coordinate that blended between grids with a new flux-based implementation called AG (Adaptive Grid). In its current state, there are a few things that may need to be changed:

dt gets passed into regridding
There's an extra diagnostic registration stage so we can get information from within regridding (for debugging the adaptivity)
T/S group pass halos have to grow

Due to the use of a stencil for determining the smoothing/adaptivity flux, there's a non-reproducibility between symmetric and non-symmetric memory modes that I haven't yet worked out. This is raised in the tc2.b case that I added for regression testing in the GitHub Actions suite.

This has been a part-time endeavour for ages, so I don't completely remember all the implementation specifics. Please criticise as necessary!

I will update with some of the mathematical justification, and a parameter description, which should go into the documentation.

marshallward · 2025-03-15T16:43:59Z

I was able to replicate the asymmetric grid fail. I compiled (in GCC) with -ffpe-trap=invalid,zero,overflow -finit-real=snan

663	        hdi_sig_v = 0.25 * ((hdi_sig(I,j,K)**2 + hdi_sig(I-1,j+1,K)**2) + &
664	             (hdi_sig(I,j+1,K)**2 + hdi_sig(I-1,j,K)**2))

(gdb) p hdi_sig(i,j,k)
$18 = -3.1731591122687031e-15
(gdb) p hdi_sig(i-1,j+1,k)
$19 = nan(0x4000000000000)
(gdb) p hdi_sig(i,j+1,k)
$20 = -2.4083547071577094e-15
(gdb) p hdi_sig(i-1,j,k)
$21 = nan(0x4000000000000)
(gdb) p i-1
$23 = 3
(gdb) p hdi_sig(i-1,:,:)
$26 = ((nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>))

Left boundary seems to be unset.

You might need either a halo update for nonsymmetric builds. Or change your bounds in L487 to G%IscB-1 to G%IscB-2 (or G%isc-2)? Your alpha_int appears defined on G%isc-2 so maybe that would work.

Anyway I think you get the idea. hdi_sig(3,:,:) needs to be defined somehow.

Feels like you might have a similar issue with hdj_sig.

marshallward · 2025-03-15T16:53:07Z

As for the other errors:

Doxygen is straightforward. Some lines are way too long.
I can't reproduce the OpenMP error, it could be a GCC bug related to code blocks. Perhaps that has been fixed?

Maybe you could either move the entire !$omp parallel block into the Fortran code block? Or just remove the code block altogether.
The regressions may be related to the symmetric boundary. Let's fix that first then come back to this one.

marshallward · 2025-03-15T16:57:25Z

The regressions appear to be due to your changes to tc2.b. Great that we are testing the scheme, but probably not the best way to handle this. Let's fix the bugs, but then revert tc2.b to its original form. Maybe we want to introduce a tc2.c (or preferably create some kind of unit tests).

angus-g · 2025-03-18T01:56:27Z

The regressions appear to be due to your changes to tc2.b. Great that we are testing the scheme, but probably not the best way to handle this. Let's fix the bugs, but then revert tc2.b to its original form. Maybe we want to introduce a tc2.c (or preferably create some kind of unit tests).

Ah yeah, I forgot that I just completely changed an existing config for testing. Definitely not intended to go in that way!

Left boundary seems to be unset.

You might need either a halo update for nonsymmetric builds. Or change your bounds in L487 to G%IscB-1 to G%IscB-2 (or G%isc-2)? Your alpha_int appears defined on G%isc-2 so maybe that would work.

Anyway I think you get the idea. hdi_sig(3,:,:) needs to be defined somehow.

Feels like you might have a similar issue with hdj_sig.

I do vaguely remember messing around with the loop bounds a whole bunch, and it seemed to break one or the other memory configuration no matter what I did...

dougiesquire · 2025-04-23T00:00:18Z

Hey @angus-g, just checking in here. Is there anything you need from us (ACCESS-NRI) here? My impression from the comments above is that a review would not be particularly helpful at this point as there are still changes to come, but let me know if I've misunderstood.

dougiesquire · 2025-08-13T05:45:20Z

Hi @angus-g. As we just discussed in person, for your testing could you please rebase this onto the 2025.02.001 tag?

Then we can create a prerelease of ACCESS-OM3 that's otherwise identical to what was used in the recent ACCESS-OM3 25km beta release config so you can use the control run from that for your z-star comparison.

This is the adaptive coordinate from the Gibson PhD thesis. It involves a flux-based calculation of density and smoothing adaptivity terms, which are weighted depending on the local isopycnal slope. Because these two terms alone aren't a sufficient constraint, interfaces can be nudged toward a pre-defined grid using the filtered_grid_motion mechanism. Additionally, to prevent convective instabilities resulting purely from grid motion, water masses can be "pushed" through vanished layers. As a structural change, the filtered_grid_motion subroutine and its controlling parameters are lifted out to the new ALE/filter_utils module, so that the main AG code can reside in ALE/coord_adapt without a circular dependency back on MOM_regridding.

We can pull out the contributing terms in the coordinate from within the regridding routine, and the framework is set up to pull arbitrary information out from the regridding step. This should be useful for coordinate debugging.

This probably needs slightly more careful analysis to make sure we're really doing the right thing.

These variables are only set on land points, but the diagnostics are including them regardless. While it shouldn't affect the regular operation of the code, the diagnostics could end up containing junk data.

The two timescales (adaptivity diffusivity and restoring) needed to be annotated with the s_to_T scaling in order to pass the dimensional consistency test for T.

We need to pass through the US structure.

We were failing the grid rotation test because of some floating point differences. It makes much more sense to compare to epsilon anyway.

There was a lot of duplicated code between these sections, making it very hard to verify they were even written exactly the same.

Seems like we were bringing in junk data for slopes which was causing differences across restarts.

It's useful to be able to dump the diagnostics out, but we need this weird "attach the diag CS later" method, which we might not always want to do. Instead, let's just make diagnostics conditional.

This filter only acts on points, if their masks are valid. Shouldn't require any halo extension.

A couple of issues here. First, there was a typo in the V-direction filter, so it was using the wrong mask. More importantly, the stencil was reaching through uncalculated cells and past the edges of the domain, leading to FPE.

angus-g · 2025-08-13T23:31:43Z

Thanks! That was a smooth rebase. I guess I forgot that OM3 configurations are still using asymmetric memory, which I hadn't been focusing on. As Marshall pointed out, there is still an issue there (or at least a non-reproducibility across memory layouts). Perhaps that should be a priority so I can run asymmetric memory for a like-for-like comparison with the existing run of the beta config.

I did spend a while playing with loop bounds and group passes, but I could never work out a way to unify the memory layouts on the stencil operation...

dougiesquire · 2025-08-13T23:53:32Z

Up to you.

One alternative is that we deploy ACCESS-OM3 with your changes using symmetric memory. That will mean:

your ACCESS-OM3 runs will not be reproducible across restarts (at least that used to be the case - this is why we are currently using asymmetric, but I haven't checked recently)
you will not be able to use the 25km beta control run for your z-star comparison. But you could easily generate the z-star data you need.

Or if the current issue with using asymmetric with your code changes is only affecting repo across layouts, we could just push ahead with your test runs. We won't be changing the layout within a run anyway.

dougiesquire · 2025-08-14T00:04:07Z

Oh actually, I think the 25km configurations are reproducible across restarts with symmetric memory. It's only the 100km configs that aren't. So I suggest:

deploy ACCESS-OM3 with your changes using symmetric memory
simultaneously check whether we can reproduce the existing 25km beta release config using a build that is identical to what was released, except uses symmetric memory.
- If we can, use the existing 25km beta control run for your z-star comparison
- If we can't, rerun the 25km beta release config using the symmetric build to create the z-star data you need.

Does that make sense?

angus-g · 2025-08-14T00:10:05Z

Perfect, that sounds like a good way forward!

dougiesquire · 2025-08-14T04:37:06Z

Ugh, unfortunately I can't use our prerelease infrastructure to build from code on forks. Would it be a massive pain to reopen this from a branch on this repo?

angus-g · 2025-08-14T05:35:16Z

Nope, that's fine. Should I do it?

dougiesquire · 2025-08-14T05:41:13Z

Yes please - it will make the deployment easier

angus-g · 2025-08-14T05:45:04Z

Superseded by #25.

Adds analogous code to MOM_dynamics_split_RK2b.F90 that is identical to the code that was added to MOM_dynamics_split_RK2.F90 to avoid readjusting the barotropic mass source to try to reconcile the baroclinic and barotropic sea surface heights. By default all answers are bitwise identical, but the runtime parameter BT_ADJ_CORR_MASS_SRC will not appear in the MOM_parameter_doc files for cases with SPLIT_RK2B set to True.

angus-g added 22 commits August 14, 2025 09:23

Instrument the AG coordinate regridding with diagnostics

0e4ef26

We can pull out the contributing terms in the coordinate from within the regridding routine, and the framework is set up to pull arbitrary information out from the regridding step. This should be useful for coordinate debugging.

Bump up halos in tracer group pass

f45a049

This probably needs slightly more careful analysis to make sure we're really doing the right thing.

Add AG to DOME2d

2ba6061

Fix u/v loops in AG

e29c87b

Initialize dz_s and dz_p to 0 to avoid uninitialised diagnostics

4df0927

These variables are only set on land points, but the diagnostics are including them regardless. While it shouldn't affect the regular operation of the code, the diagnostics could end up containing junk data.

Add T scaling to AG parameters

11b0ef9

The two timescales (adaptivity diffusivity and restoring) needed to be annotated with the s_to_T scaling in order to pass the dimensional consistency test for T.

Reorder build_grid_adapt params

37be480

We need to pass through the US structure.

Use epsilon for zero-denominator checks

6b8007b

We were failing the grid rotation test because of some floating point differences. It makes much more sense to compare to epsilon anyway.

Factor out calculation of hd{i,j}_sig and hd{i,j}_sig_phys

d604829

There was a lot of duplicated code between these sections, making it very hard to verify they were even written exactly the same.

Add L and H scaling to AG

bd08a06

add tc2.b case for AG coordinate

a14fdcb

Fix style errors to satisfy Actions

0805679

Use velocity point mask for AG derivatives

7e78b38

Seems like we were bringing in junk data for slopes which was causing differences across restarts.

Remove unused remapCS from build_grid_adaptive

dda2f75

Remove diag_cs requirement for AG

96c3d42

It's useful to be able to dump the diagnostics out, but we need this weird "attach the diag CS later" method, which we might not always want to do. Instead, let's just make diagnostics conditional.

Add OpenMP acceleration to build_adapt_grid

fd0c979

Explicitly set do_diag for adapt coord

f7ab2e0

Implement a 7-point horizontal filter for adapt weights

b2b3b90

This filter only acts on points, if their masks are valid. Shouldn't require any halo extension.

Ignore slope if density gradient inverts

b577015

Fix adapt weight filter calculation

3c0bff5

A couple of issues here. First, there was a typo in the V-direction filter, so it was using the wrong mask. More importantly, the stencil was reaching through uncalculated cells and past the edges of the domain, leading to FPE.

Add a group pass to fix horizontal filter consistency

97f2f08

angus-g force-pushed the angus-g/ag branch from 114c27c to 97f2f08 Compare August 13, 2025 23:29

dougiesquire mentioned this pull request Aug 14, 2025

Prereleases with Angus' MOM6 adaptive grid coordinate ACCESS-NRI/ACCESS-OM3#140

Draft

angus-g mentioned this pull request Aug 14, 2025

Replace the previous ADAPTIVE coordinate with the flux-based AG implementation #25

Open

angus-g closed this Aug 14, 2025

Conversation

angus-g commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marshallward commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marshallward commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marshallward commented Mar 15, 2025

Uh oh!

angus-g commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dougiesquire commented Apr 23, 2025

Uh oh!

dougiesquire commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

angus-g commented Aug 13, 2025

Uh oh!

dougiesquire commented Aug 13, 2025

Uh oh!

dougiesquire commented Aug 14, 2025

Uh oh!

angus-g commented Aug 14, 2025

Uh oh!

dougiesquire commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

angus-g commented Aug 14, 2025

Uh oh!

dougiesquire commented Aug 14, 2025

Uh oh!

angus-g commented Aug 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

angus-g commented Mar 12, 2025 •

edited

Loading

marshallward commented Mar 15, 2025 •

edited

Loading

marshallward commented Mar 15, 2025 •

edited

Loading

angus-g commented Mar 18, 2025 •

edited

Loading

dougiesquire commented Aug 13, 2025 •

edited

Loading

dougiesquire commented Aug 14, 2025 •

edited

Loading