Update Utilities/mean_nemo to handle time-varying masks and make other improvements #28
+900
−232
PR Summary
Code Reviewer:
Add support for time-varying masks to mean_nemo, as well as other improvements relating to performance and testing.
Closes #26; see that issue for more details on the changes.
Code Quality Checklist
(Some checks are automatically carried out via the CI pipeline)
readability of the code
Testing
If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
tests, unit tests, etc.)
simple_test_data.py) has been added to generate NetCDF files containing testing datasets intended as input to mean_nemo, as well as separate files containing the expected results from mean_nemo for these input files
Other testing
Regression testing
Data from several models were used as input to mean_nemo, and the results before and after the changes were compared using nccmp -F -f -c 1 -d -m -w format:
GloSea ORCA025 data
The first daily mean of:
moose:devfc/rosie_u-cw134_ERA5/field.nc.file/prodh_rs_sfc_o1d_T_18_20161101_20161207_001.nc
was extracted and passed as input 30/31 times to simulate 30/31-day averages. The purpose of this was to test that the missing data issues described by MOSRS #652 did not recur.
GOSI9 ORCA025 data
The following files were passed as input (separately for each grid type):
moose:/crum/u-ct401/onm.nc.file/*1m_19770[678]*
GOSI10 ORCA025 data
The following files were passed as input (separately for each grid type):
moose:/crum/u-dc531/ony.nc.file/*1y_200[012]*
UKESM1.3 ORCA1 data
The following files were passed as input (separately for each grid and file type):
moose:/crum/u-dw112/onm.nc.file/*_1m_32{47,57,74}0101-*
mean_nemo produced identical results for all data, except for thickness-weighted diagnostics (those with cell_methods=time: mean (thickness weighted)), which have bit-level differences. This is an expected consequence of the following change to the calculation of the average:
The average of input data $D$ weighted by cell thickness $C$ is $\frac{\sum{C D}}{\sum{C}}$.
The meancellthick_4d* variables were previously calculated as exactly that, the average cell thickness, so $\sum{C}$ had to be recovered by multiplying by the number of time records used to calculate the average.
meancellthick_4d* now represents $\sum{C}$, so the * ntimes calculation is not necessary.
This means the floating-point arithmetic produces slightly different results, but since fewer calculations are performed the result should be more accurate (less accumulation of round-off error).
Correctness testing
The new simple_test_data.py script generates input files for mean_nemo, as well as files containing the expected results from running mean_nemo with these input files. The variables in these files use different precisions, shapes, etc. and test much more mean_nemo functionality than standard model data.
The files produced by mean_nemo were compared with the expected results files generated by this script, using the same nccmp command as the regression testing.
Only one byte-type variable fails this test, because mean_nemo calculates this average using byte-precision arithmetic and the result overflows:
DIFFER : VARIABLE : byte_4d_unweighted_time_unmasked : POSITION : [10,10,2,1] : VALUES : 0xFFFFFFEB <> 0x15
The calculation should instead be performed using a higher precision and the result cast back to byte type. This is not addressed here, since we rarely use byte data types.
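For reviewers, the two numerical effects described in this testing section can be sketched in NumPy. This is an illustration only and not the mean_nemo implementation: the function names, array shapes and values are hypothetical, and Python floor division stands in for whatever integer division mean_nemo uses. The first pair of functions compares a thickness-weighted mean that uses an accumulated $\sum{C}$ directly against one that recovers it from a mean thickness multiplied by ntimes. The second pair compares byte-precision accumulation against higher-precision accumulation cast back to byte.

```python
import numpy as np


def weighted_mean_from_sum(data, thick):
    """Thickness-weighted time mean using the accumulated sum of thicknesses."""
    sum_cd = np.sum(thick * data, axis=0)
    sum_c = np.sum(thick, axis=0)
    return sum_cd / sum_c


def weighted_mean_from_mean_thickness(data, thick):
    """Same mean, but recovering sum(C) as mean(C) * ntimes (extra operations)."""
    ntimes = data.shape[0]
    sum_cd = np.sum(thick * data, axis=0)
    mean_c = np.sum(thick, axis=0) / np.float32(ntimes)
    return sum_cd / (mean_c * np.float32(ntimes))


def byte_mean_narrow(data):
    """Time mean accumulated in int8: the sum wraps silently on overflow."""
    ntimes = data.shape[0]
    return np.sum(data, axis=0, dtype=np.int8) // np.int8(ntimes)


def byte_mean_wide(data):
    """Time mean accumulated in int64, then cast back to int8."""
    ntimes = data.shape[0]
    total = np.sum(data, axis=0, dtype=np.int64)
    return (total // ntimes).astype(np.int8)


# Hypothetical float32 data: 12 time records at 1000 points.
rng = np.random.default_rng(0)
thick = rng.uniform(0.5, 50.0, size=(12, 1000)).astype(np.float32)
field = rng.normal(size=(12, 1000)).astype(np.float32)

direct = weighted_mean_from_sum(field, thick)
recovered = weighted_mean_from_mean_thickness(field, thick)
# The two agree to rounding error, but need not be bit-identical.
n_bit_differences = int(np.count_nonzero(direct != recovered))

# Hypothetical byte data: 3 time records at 2 points.
bytes_in = np.array([[100, 10], [100, 20], [100, 30]], dtype=np.int8)
narrow = byte_mean_narrow(bytes_in)  # [14, 20]: 300 wraps to 44 in int8
wide = byte_mean_wide(bytes_in)      # [100, 20]
```

In this sketch the first point of the byte data overflows because 100 + 100 + 100 does not fit in a signed byte, whereas the second point (sum 60) is unaffected, which is consistent with only some positions of a byte variable differing.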
Security Considerations
Sensitive data is properly handled (if applicable)
Authentication and authorisation are properly implemented (if applicable)
Performance Impact
performance measurements have been conducted
In addition to the timing calipers added to the code, the overall elapsed wallclock time and memory usage as given by qstat -fxw were monitored (although the memory usage fluctuates too much to be useful). The changes have no significant impact on performance.
AI Assistance and Attribution
Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise,
Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the
Simulation Systems AI policy
(including attribution labels)
Documentation
Where appropriate I have updated documentation related to this change and confirmed that it builds correctly
Code Review