Skip to content

Replace the previous ADAPTIVE coordinate with the flux-based AG implementation#6

Closed
angus-g wants to merge 22 commits intoACCESS-NRI:2025.02from
angus-g:angus-g/ag
Closed

Replace the previous ADAPTIVE coordinate with the flux-based AG implementation#6
angus-g wants to merge 22 commits intoACCESS-NRI:2025.02from
angus-g:angus-g/ag

Conversation

@angus-g
Copy link
Collaborator

@angus-g angus-g commented Mar 12, 2025

This changes the old ADAPTIVE coordinate that blended between grids with a new flux-based implementation called AG (Adaptive Grid). In its current state, there are a few things that may need to be changed:

  • dt gets passed into regridding
  • There's an extra diagnostic registration stage so we can get information from within regridding (for debugging the adaptivity)
  • T/S group pass halos have to grow

Due to the use of a stencil for determining the smoothing/adaptivity flux, there's a non-reproducibility between symmetric and non-symmetric memory modes that I haven't yet worked out. This is raised in the tc2.b case that I added for regression testing in the GitHub Actions suite.

This has been a part-time endeavour for ages, so I don't completely remember all the implementation specifics. Please criticise as necessary!

I will update with some of the mathematical justification, and a parameter description, which should go into the documentation.

@marshallward
Copy link

marshallward commented Mar 15, 2025

I was able to replicate the asymmetric grid fail. I compiled (in GCC) with -ffpe-trap=invalid,zero,overflow -finit-real=snan

663	        hdi_sig_v = 0.25 * ((hdi_sig(I,j,K)**2 + hdi_sig(I-1,j+1,K)**2) + &
664	             (hdi_sig(I,j+1,K)**2 + hdi_sig(I-1,j,K)**2))
(gdb) p hdi_sig(i,j,k)
$18 = -3.1731591122687031e-15
(gdb) p hdi_sig(i-1,j+1,k)
$19 = nan(0x4000000000000)
(gdb) p hdi_sig(i,j+1,k)
$20 = -2.4083547071577094e-15
(gdb) p hdi_sig(i-1,j,k)
$21 = nan(0x4000000000000)
(gdb) p i-1
$23 = 3
(gdb) p hdi_sig(i-1,:,:)
$26 = ((nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>) (nan(0x4000000000000), <repeats 16 times>))

Left boundary seems to be unset.

You might need either a halo update for nonsymmetric builds. Or change your bounds in L487 to G%IscB-1 to G%IscB-2 (or G%isc-2)? Your alpha_int appears defined on G%isc-2 so maybe that would work.

Anyway I think you get the idea. hdi_sig(3,:,:) needs to be defined somehow.

Feels like you might have a similar issue with hdj_sig.

@marshallward
Copy link

marshallward commented Mar 15, 2025

As for the other errors:

  • Doxygen is straightforward. Some lines are way too long.

  • I can't reproduce the OpenMP error, it could be a GCC bug related to code blocks. Perhaps that has been fixed?

    Maybe you could either move the entire !$omp parallel block into the Fortran code block? Or just remove the code block altogether.

  • The regressions may be related to the symmetric boundary. Let's fix that first then come back to this one.

@marshallward
Copy link

The regressions appear to be due to your changes to tc2.b. Great that we are testing the scheme, but probably not the best way to handle this. Let's fix the bugs, but then revert tc2.b to its original form. Maybe we want to introduce a tc2.c (or preferably create some kind of unit tests).

@angus-g
Copy link
Collaborator Author

angus-g commented Mar 18, 2025

The regressions appear to be due to your changes to tc2.b. Great that we are testing the scheme, but probably not the best way to handle this. Let's fix the bugs, but then revert tc2.b to its original form. Maybe we want to introduce a tc2.c (or preferably create some kind of unit tests).

Ah yeah, I forgot that I just completely changed an existing config for testing. Definitely not intended to go in that way!

Left boundary seems to be unset.

You might need either a halo update for nonsymmetric builds. Or change your bounds in L487 to G%IscB-1 to G%IscB-2 (or G%isc-2)? Your alpha_int appears defined on G%isc-2 so maybe that would work.

Anyway I think you get the idea. hdi_sig(3,:,:) needs to be defined somehow.

Feels like you might have a similar issue with hdj_sig.

I do vaguely remember messing around with the loop bounds a whole bunch, and it seemed to break one or the other memory configuration no matter what I did...

@dougiesquire
Copy link
Collaborator

Hey @angus-g, just checking in here. Is there anything you need from us (ACCESS-NRI) here? My impression from the comments above is that a review would not be particularly helpful at this point as there are still changes to come, but let me know if I've misunderstood.

@dougiesquire
Copy link
Collaborator

dougiesquire commented Aug 13, 2025

Hi @angus-g. As we just discussed in person, for your testing could you please rebase this onto the 2025.02.001 tag?

Then we can create a prerelease of ACCESS-OM3 that's otherwise identical to what was used in the recent ACCESS-OM3 25km beta release config so you can use the control run from that for your z-star comparison.

angus-g added 22 commits August 14, 2025 09:23
This is the adaptive coordinate from the Gibson PhD thesis. It
involves a flux-based calculation of density and smoothing adaptivity
terms, which are weighted depending on the local isopycnal
slope. Because these two terms alone aren't a sufficient constraint,
interfaces can be nudged toward a pre-defined grid using the
filtered_grid_motion mechanism. Additionally, to prevent convective
instabilities resulting purely from grid motion, water masses can be
"pushed" through vanished layers.

As a structural change, the filtered_grid_motion subroutine and its
controlling parameters are lifted out to the new ALE/filter_utils
module, so that the main AG code can reside in ALE/coord_adapt without
a circular dependency back on MOM_regridding.
We can pull out the contributing terms in the coordinate from within
the regridding routine, and the framework is set up to pull arbitrary
information out from the regridding step. This should be useful for
coordinate debugging.
This probably needs slightly more careful analysis to make sure we're
really doing the right thing.
These variables are only set on land points, but the diagnostics are
including them regardless. While it shouldn't affect the regular
operation of the code, the diagnostics could end up containing junk
data.
The two timescales (adaptivity diffusivity and restoring) needed to be
annotated with the s_to_T scaling in order to pass the dimensional
consistency test for T.
We need to pass through the US structure.
We were failing the grid rotation test because of some floating point
differences. It makes much more sense to compare to epsilon anyway.
There was a lot of duplicated code between these sections, making it
very hard to verify they were even written exactly the same.
Seems like we were bringing in junk data for slopes which was causing
differences across restarts.
It's useful to be able to dump the diagnostics out, but we need this
weird "attach the diag CS later" method, which we might not always
want to do. Instead, let's just make diagnostics conditional.
This filter only acts on points, if their masks are valid. Shouldn't
require any halo extension.
A couple of issues here. First, there was a typo in the V-direction
filter, so it was using the wrong mask. More importantly, the stencil
was reaching through uncalculated cells and past the edges of the
domain, leading to FPE.
@angus-g
Copy link
Collaborator Author

angus-g commented Aug 13, 2025

Thanks! That was a smooth rebase. I guess I forgot that OM3 configurations are still using asymmetric memory, which I hadn't been focusing on. As Marshall pointed out, there is still an issue there (or at least a non-reproducibility across memory layouts). Perhaps that should be a priority so I can run asymmetric memory for a like-for-like comparison with the existing run of the beta config.

I did spend a while playing with loop bounds and group passes, but I could never work out a way to unify the memory layouts on the stencil operation...

@dougiesquire
Copy link
Collaborator

Up to you.

One alternative is that we deploy ACCESS-OM3 with your changes using symmetric memory. That will mean:

  • your ACCESS-OM3 runs will not be reproducible across restarts (at least that used to be the case - this is why we are currently using asymmetric, but I haven't checked recently)
  • you will not be able to use the 25km beta control run for your z-star comparison. But you could easily generate the z-star data you need.

Or if the current issue with using asymmetric with your code changes is only affecting repo across layouts, we could just push ahead with your test runs. We won't be changing the layout within a run anyway.

@dougiesquire
Copy link
Collaborator

Oh actually, I think the 25km configurations are reproducible across restarts with symmetric memory. It's only the 100km configs that aren't. So I suggest:

  • deploy ACCESS-OM3 with your changes using symmetric memory
  • simultaneously check whether we can reproduce the existing 25km beta release config using a build that is identical to what was released, except uses symmetric memory.
    • If we can, use the existing 25km beta control run for your z-star comparison
    • If we can't, rerun the 25km beta release config using the symmetric build to create the z-star data you need.

Does that make sense?

@angus-g
Copy link
Collaborator Author

angus-g commented Aug 14, 2025

Perfect, that sounds like a good way forward!

@dougiesquire
Copy link
Collaborator

dougiesquire commented Aug 14, 2025

Ugh, unfortunately I can't use our prerelease infrastructure to build from code on forks. Would it be a massive pain to reopen this from a branch on this repo?

@angus-g
Copy link
Collaborator Author

angus-g commented Aug 14, 2025

Nope, that's fine. Should I do it?

@dougiesquire
Copy link
Collaborator

Yes please - it will make the deployment easier

@angus-g
Copy link
Collaborator Author

angus-g commented Aug 14, 2025

Superseded by #25.

@angus-g angus-g closed this Aug 14, 2025
dougiesquire pushed a commit that referenced this pull request Nov 27, 2025
Adds analogous code to MOM_dynamics_split_RK2b.F90 that is identical to the
code that was added to MOM_dynamics_split_RK2.F90 to avoid readjusting the
barotropic mass source to try to reconcile the baroclinic and barotropic sea
surface heights.  By default all answers are bitwise identical, but the runtime
parameter BT_ADJ_CORR_MASS_SRC will not appear in the MOM_parameter_doc files
for cases with SPLIT_RK2B set to True.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants