
Feat/cmip7 awiesm3 veg hr #266

Open
JanStreffing wants to merge 46 commits into prep-release from feat/cmip7-awiesm3-veg-hr

@JanStreffing JanStreffing commented Apr 2, 2026

CMIP7 cmorization for AWI-ESM3-VEG-HR

Adds full CMIP7 support targeting AWI-ESM3-VEG-HR, including a native compound-name
architecture that replaces the legacy cmip6-table-based data request lookup.

Key changes

CMIP7 data request

  • Load DataRequest from CMIP7_DReq_metadata JSON instead of cmip6 tables
  • Refactor to native compound-name architecture (ocean.tos.tavg-u-hxy-sea.mon.GLB)
  • Fix JSON key mismatch: cmip6_table -> cmip6_cmor_table in vendored metadata
  • Improve compound_name matching against cmip6_compound_name and cmip7_compound_name attributes
  • Derive table_id from compound name when not set explicitly
  • Strict ValueError on zero DRV matches (instead of silent skip)
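
To illustrate the compound-name handling listed above, here is a minimal sketch. It is not pycmor's implementation: the five-field split (realm, variable, branding label, frequency, region) is inferred from the example name `ocean.tos.tavg-u-hxy-sea.mon.GLB`, and `parse_compound_name` and `match_exactly` are hypothetical helpers introduced only for this example.

```python
from typing import List, NamedTuple


class CompoundName(NamedTuple):
    """Assumed layout of a CMIP7 compound name (field names are illustrative)."""
    realm: str
    variable: str
    branding: str
    frequency: str
    region: str


def parse_compound_name(name: str) -> CompoundName:
    """Split a dot-separated compound name into its five assumed fields."""
    parts = name.split(".")
    if len(parts) != 5:
        raise ValueError(
            f"Expected 5 dot-separated fields, got {len(parts)}: {name!r}"
        )
    return CompoundName(*parts)


def match_exactly(rule_name: str, requested: List[str]) -> List[str]:
    """Return exact matches; raise instead of silently skipping on zero matches."""
    matches = [n for n in requested if n == rule_name]
    if not matches:
        raise ValueError(f"No data request variable matches {rule_name!r}")
    return matches
```

Raising on zero matches mirrors the "strict ValueError" behaviour described in the list: a rule that matches nothing stops the run instead of being dropped without feedback.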

Pipeline

  • Add generic vertical_integrate custom pipeline step
  • Remove duplicate convert() step from DefaultPipeline
  • Fix Prefect State objects not being unwrapped to actual results in parallel runs
  • Propagate pipeline/flow errors instead of silently logging them

Standard library

  • Add time bounds support (src/pycmor/std_lib/time_bounds.py)
  • Fix dimension mapping to use getattr + _pycmor_cfg fallback
  • Fix global_attributes to derive table_id from CMIP6/CMIP7 compound names

Xarray accessor API

  • Lazy accessor registration and StdLibAccessor with .process()

Test infrastructure

  • Modernize with entry-point model discovery (pycmor.fixtures.model_runs)
  • Add pycmor.tutorial dataset system (xarray.tutorial-style API)
  • Fix stub generator to use monotonic coordinate values for multi-file datasets

Misc fixes

  • Python 3.9 entry_points() compatibility
  • Guard pyfesom2 imports for environments without it
  • Fix tarball double-nesting extraction on Python 3.12+
  • Rename non-standard time dimension on load (OpenIFS support)

Test plan

  • Unit tests pass: pytest tests/unit/
  • pycmor process examples/awiesm3-cmip7-minimal.yaml runs successfully on Levante
  • core_atm runs successfully on Levante
  • core_land runs successfully on Levante
  • core_ocean runs successfully on Levante
  • core_seaice runs successfully on Levante
  • cap7_atm runs successfully on Levante
  • cap7_ocean runs successfully on Levante
  • cap7_seaice runs successfully on Levante
  • veg_atm runs successfully on Levante
  • veg_land runs successfully on Levante
  • veg_seaice runs successfully on Levante
  • extra_atm runs successfully on Levante
  • extra_land runs successfully on Levante

The entry_points() API changed between Python 3.9 and 3.10:
- Python 3.9: entry_points() returns dict-like object
- Python 3.10+: entry_points(group='name') with keyword argument

Use try/except to detect the API version at runtime.

Fixes TypeError: entry_points() got an unexpected keyword argument 'group'
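
A minimal sketch of the try/except approach described above, assuming the standard-library `importlib.metadata`. The helper name `get_entry_points` is hypothetical and not necessarily what the commit uses.

```python
from importlib.metadata import entry_points


def get_entry_points(group: str) -> list:
    """Return the entry points registered under `group` on Python 3.9 and 3.10+."""
    try:
        # Python 3.10+: selection by keyword argument.
        return list(entry_points(group=group))
    except TypeError:
        # Python 3.9: entry_points() takes no arguments and returns a
        # dict-like mapping of group name -> entry points.
        return list(entry_points().get(group, []))
```

The fallback branch is only reached on Python 3.9, where passing `group=` raises the `TypeError` quoted above.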
…6_table-based approach

- Use user-specified CMIP7_DReq_metadata file for DataRequest loading
- Fix cmip6_cmor_table -> cmip6_table key mismatch in table.py
- Extract table IDs from cmip6_table values not compound name prefix
- Add warning when rules have no matching data_request_variables
- Add debug logging to find_matching_rule for troubleshooting

This partially addresses the architectural issue where CMIP7 is forced
into CMIP6's table-based structure. Full compound name matching still
needs implementation (see CMIP7_ARCHITECTURE_ISSUE.md).

Fixes silent failure where rules were dropped with no user feedback.
Add step-by-step failure scenario showing:
- Silent failure symptoms
- Root cause discovery process (3 layered bugs)
- Log output at each debugging stage
- Key symptoms and workarounds
The branch fixes immediate bugs (silent failure, config ignored) but
architectural issues persist (cmip6_table dependency, partial matching).
- Index variables by full compound name instead of cmip6_table
- Implement exact compound name matching for CMIP7 (find_matching_rule_cmip7)
- Generate synthetic table headers from variable metadata
- Remove dependency on cmip6_table field for CMIP7 data loading
- Add comprehensive unit tests for synthetic header generation
- Maintain full backward compatibility with CMIP6 and existing CMIP7 metadata

Resolves critical AttributeError for table_header in CMIP7 processing.
Addresses architectural issues identified in CMIP7_ARCHITECTURE_ISSUE.md.

Tests: 15 passed, 1 skipped
@JanStreffing JanStreffing changed the base branch from main to prep-release April 2, 2026 12:55
Fixes trailing whitespace on blank lines in cmorizer.py and reformats
several other files to be consistent with black when run from root.
@JanStreffing JanStreffing force-pushed the feat/cmip7-awiesm3-veg-hr branch from 1a42875 to 5617a18 Compare April 2, 2026 13:18
Resolves merge conflicts in cmorizer.py and global_attributes.py,
keeping CMIP7_DReq_metadata feature and integrating prep-release
compound_name table_id derivation logic.
…ected

The PycmorConfigManager applies a 'pycmor' namespace, so it looks for
keys like 'pycmor_dask_cluster'. But the YAML 'pycmor:' section provides
unprefixed keys like 'dask_cluster', which were silently ignored and
fell back to defaults (e.g. dask_cluster defaulted to 'local' instead
of 'slurm'). Fix by prefixing dict keys in _create_environments.
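
The key-prefixing fix can be pictured roughly as below. This is a sketch only: `prefix_keys` is a hypothetical helper, and the actual change lives in `_create_environments`, whose code is not shown here.

```python
def prefix_keys(section: dict, namespace: str = "pycmor") -> dict:
    """Prefix unprefixed keys so a namespaced config lookup can find them."""
    prefix = f"{namespace}_"
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in section.items()
    }


# Keys from the YAML 'pycmor:' section, as a user would write them.
yaml_section = {"dask_cluster": "slurm"}
namespaced = prefix_keys(yaml_section)
```

With the prefix applied, a lookup for `pycmor_dask_cluster` finds `'slurm'` instead of falling back to the default.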

Also adds custom_steps.py with vertical_integrate pipeline step and
fixes grid_file path and max_jobs in the minimal example.
Fix two bugs where pipelines didn't get the Dask cluster assigned:
1. _post_init_create_pipelines appended a new Pipeline.from_dict(p)
   instead of the one that had cluster assigned
2. DefaultPipeline created at rule init time bypassed CMORizer cluster
   assignment — now handled in _match_pipelines_in_rules
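
The first bug follows a common pattern, sketched below with a stand-in `Pipeline` class. The class and the two builder functions are hypothetical; only the name `Pipeline.from_dict` comes from the commit message, and this is not its real implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pipeline:
    """Stand-in for the real pipeline class (illustration only)."""
    name: str
    cluster: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Pipeline":
        return cls(name=data["name"])


def build_buggy(configs: List[dict], cluster: str) -> List[Pipeline]:
    pipelines = []
    for p in configs:
        pl = Pipeline.from_dict(p)
        pl.cluster = cluster
        # Bug: a fresh object is appended, so the cluster assignment is lost.
        pipelines.append(Pipeline.from_dict(p))
    return pipelines


def build_fixed(configs: List[dict], cluster: str) -> List[Pipeline]:
    pipelines = []
    for p in configs:
        pl = Pipeline.from_dict(p)
        pl.cluster = cluster
        # Fix: append the object that actually had the cluster assigned.
        pipelines.append(pl)
    return pipelines
```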

Switch example config from adaptive to fixed SLURM scaling to avoid
race condition where adaptive scaler kills workers before .compute()
submits the real Dask graph.
@JanStreffing
Contributor Author

I was able to run both tos and the more complex absscint with the latest commit on this branch. We may want to work on #267, and certainly need to work on #265. But neither should block us from starting to build up more rules for the picontrol variables.

…ntested)

- Rules for 20 of 28 core ocean variables in cmip7_awiesm3-veg-hr_ocean.yaml
- New custom steps: load_gridfile (generic), compute_deptho, compute_sftof,
  compute_thkcello_fx, compute_masscello_fx (FESOM mesh-derived)
- 6 new pipeline definitions for Ofx variables (fx_extract, fx_deptho, etc.)
- namelist.io: vec_autorotate=.true., hnode output, daily sst/sss/ssh
- Todo tracking and missing.md for variables FESOM cannot output
- Research: FESOM uses potential temp (no bigthetao), MLD3 for mlotst,
  velocities need rotation, u/v on elem grid -> use unod/vnod

Not yet tested — pipelines and rules need validation against actual data.
NOT TESTED — pipelines and custom steps need validation against real data.

- New steps: compute_density (gsw/TEOS-10), compute_mass_transport
  (Boussinesq rho_0*dz), compute_zostoga (global thermosteric SL)
- mass_transport_pipeline for umo/vmo/wmo
- zostoga_pipeline using gsw for EOS computation
- Rules for umo, vmo, wmo, zostoga in ocean rules file
- gsw package installed in pycmor_py312 environment

masscello(Omon) still needs density x hnode pipeline.
…(untested)

- 8 sea ice rules (simass, siu, siv, sithick, snd, ts, siconc, sitimefrac)
- siconc_pipeline (fraction_to_percent) and sitimefrac_pipeline (binary ice presence)
- fraction_to_percent and compute_sitimefrac custom steps
- Runnable sea ice config (examples/awiesm3-cmip7-seaice.yaml)
- namelist.io: added h_ice, h_snow, ist (monthly) and a_ice (daily)
- Moved missing.md and namelist.io up one level per user request
- Removed old awiesm3-cmip7-example.yaml (superseded by ocean/seaice configs)
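
As a rough picture of what the two custom steps compute, here is a sketch on plain Python lists. The real steps presumably operate on xarray objects. The function names below, the `threshold` parameter, and the averaging into a time fraction are assumptions made for illustration.

```python
from typing import List


def fraction_to_percent(values: List[float]) -> List[float]:
    """Convert a 0-1 fraction (e.g. sea ice concentration) to percent."""
    return [100.0 * v for v in values]


def ice_presence(concentration: List[float], threshold: float = 0.0) -> List[float]:
    """Binary ice presence: 1.0 where concentration exceeds the threshold."""
    return [1.0 if c > threshold else 0.0 for c in concentration]


def time_fraction(presence: List[float]) -> float:
    """Fraction of time steps with ice present at one grid point."""
    return sum(presence) / len(presence)
```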
…th inherit

- Add 45 CAP7 sea ice variable rules (direct mapping, scale, multi-variable
  compute, melt ponds, hemisphere integrals, stress tensor)
- Add custom pipeline steps: scale_by_constant, integrate_over_hemisphere,
  compute_sispeed, compute_ice_mass_transport, compute_sistressave/max,
  compute_siflcondtop, compute_sihc, compute_sisnhc, compute_sitempbot,
  compute_sifb, compute_constant_field, compute_simpeffconc
- Restructure all rules YAMLs into full runnable configs with general,
  pycmor, jobqueue, pipelines, and inherit sections
- Move data_path into inherit section with YAML anchor for reuse in
  inputs.path across all rules
- Update namelist.io with new monthly/daily diagnostics for CAP7 variables
- Add CAP7 sea ice variables todo tracking (~89 variables, 45 done)
- Add 28 CAP7 ocean variable rules covering easy (pbo, volo, global
  means, squaring, wfo), medium (tob, sob, pso, phcint, scint,
  difvho/difvso, difmxylo, masso), decadal (7 variables), and hard
  (opottemptend) categories
- Add custom pipeline steps: compute_square, extract_bottom,
  compute_surface_pressure
- Full runnable config with inherit section (data_path anchor)
- Comprehensive todo tracking ~147 CAP7 ocean variables
  (28 done, ~20 skipped, rest blocked or need model re-run)
…llo_dec, opottempmint, somint)

Second pass over CAP7 ocean variables to identify what can be computed
purely in pycmor post-processing. Adds volcello_fx and volcello_time
custom steps and pipelines, plus rules for virtual salt flux,
static/decadal cell volume, decadal cell mass, and yearly depth-
integrated temperature and salinity.
…vsfcorr, mlotst_day, uos, vos)

Add evap and relaxsalt to monthly output in namelist.io, and MLD3,
unod, vnod to daily output. Write corresponding pycmor rules with
scale_pipeline, surface_extract_pipeline, and direct mappings.
Add extract_surface custom step for daily surface velocity extraction.
Note: daily 3D unod/vnod output is very storage-heavy.
@esm-tools esm-tools deleted a comment from github-actions bot Apr 6, 2026

mandresm commented Apr 7, 2026

@pgierz, will you review this or should I go for it?

pgierz commented Apr 7, 2026

I will look

…fire emissions

- 6x 3hr radiation/flux (hfls, hfss, rlds, rlus, rsds, rsus)
- 5x 3hr plev6 instant (ta, ua, va, wap, hus) with new plev6 axis
- 3hr boundary layer depth (bldep), 3hr surface pressure (ps)
- 6hr snowfall flux (prsn), monthly net radiation (rls, rss), lwp
- 2x daily snow diagnostics (tsns, snmsl)
- 7x fire emission species (BC, CH4, CO, DMS, OA, SO2, NMVOC) from
  LPJ-GUESS fFireAll via Andreae (2019) emission factors
- Custom LPJ-GUESS .out file loader and fire emission pipeline steps
- XIOS field definitions, file_def output sections, plev6 axis/grid
- README and todo updates across all realms
veg_land: 88 variables classified, 58 implemented:
- 22 IFS/HTESSEL rules: 3hr hydrology (mrro, mrros, esn, srfrad, hfdsl,
  tslsi, mrsol), daily (evspsblpot, mrrob, sbl, snm, tsn, snd, dgw, dsn,
  dsw, mrtws), monthly (evspsblpot, sbl)
- 36 LPJ-GUESS rules: 5 yearly fractions, 7 yearly Lut, 9 monthly Lut,
  16 monthly N-cycle/carbon (fBNF, fNgas, nLand, nVeg, etc.)
- 30 blocked (no permafrost/groundwater/river routing, no daily PFT output)

Custom loaders for 3 additional LPJ-GUESS file formats:
- load_lpjguess_yearly (Lon/Lat/Year/Total)
- load_lpjguess_yearly_lut (Lon/Lat/Year/psl/crp/pst/urb)
- load_lpjguess_monthly_lut (Lon/Lat/Year/Mth/psl/crp/pst/urb)

Custom computation steps:
- compute_temporal_diff (dgw, dsn, dsw storage changes)
- compute_mrtws (terrestrial water storage sum)
- compute_snd (physical snow depth from SWE/density)
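
A sketch of two of these computations for single grid-point values. The formulas are standard physical relations, not code taken from the PR. The function names, the assumption that SWE is given in metres of water equivalent, and the zero-density guard are all assumptions.

```python
from typing import List

RHO_WATER = 1000.0  # kg m-3, density of liquid water


def snow_depth_from_swe(swe_m: float, snow_density: float) -> float:
    """Physical snow depth (m) from snow water equivalent (m) and snow density (kg m-3)."""
    if snow_density <= 0.0:
        # Assumed guard: no meaningful density, so report zero depth.
        return 0.0
    return swe_m * RHO_WATER / snow_density


def temporal_diff(storage: List[float]) -> List[float]:
    """Change in a storage term between consecutive time steps."""
    return [later - earlier for earlier, later in zip(storage, storage[1:])]
```

For example, 0.1 m of water equivalent at a snow density of 250 kg m-3 corresponds to about 0.4 m of snow.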

XIOS deaccumulation fix: all divisors changed from /10800 or /21600 to
/3600 assuming 1-hourly IFS-to-XIOS send frequency (NFRHIS). All freq_op
changed to 1h so XIOS samples at the IFS output rate.
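
The arithmetic behind the divisor change, as a sketch. It assumes the accumulated field is reset at every IFS-to-XIOS send, so each value covers exactly one send interval. `deaccumulate` is a hypothetical helper, not an XIOS function.

```python
def deaccumulate(accumulated: float, send_interval_s: float = 3600.0) -> float:
    """Convert a quantity accumulated over one send interval into a mean rate."""
    return accumulated / send_interval_s


# A constant flux of 100 W m-2 accumulates 360000 J m-2 in one hour.
accumulated_1h = 100.0 * 3600.0

correct_rate = deaccumulate(accumulated_1h, 3600.0)   # 1-hourly send frequency
wrong_rate = deaccumulate(accumulated_1h, 10800.0)    # old 3-hourly divisor
```

With hourly sends, dividing by 10800 underestimates the flux by a factor of three, and dividing by 21600 by a factor of six.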

New field_def entries: pev, esn, srfrad, hfdsl, evspsblpot, mrrob, sbl,
snm, mrsol (top 1m). New file_def sections: _3h_land, _day_land, _mon_land.

README: added ice sheet note, updated veg_land entry.
1 of 4 VEG sea ice variables producible (3 blocked: 2 ITD, 1 missing physics).
Daily sisnhc derived from daily m_snow and a_ice since h_snow is monthly-only.
New compute_sisnhc_from_msnow custom step with zero-division protection.
…gy, and regional subsets

19 variables classified: 13 implemented (6 LPJ-GUESS PFT fractions, LAI monthly,
areacellr, orog/tas southern hemisphere, dcw/dslw temporal diff, mrsow soil wetness),
6 blocked (4 irrigation, 1 river routing, 1 root zone moisture).
New custom steps: sum_lpjguess_monthly_files, compute_mrsow, select_southern_hemisphere.
1hr tas output file added to file_def.
…tmosphere fields

43 variables classified: 21 implemented (1hr fluxes/radiation, 30S-90S regional
subsets for clt/hurs/pr/ps/rlds/rsds/sfcWind, 3hr hurs/ts, daily cl/pfull/rls/rss/
evspsbl, monthly 10m wind gust), 22 blocked (aerosol/chemistry, crop tiles, heat
index, WBGT, lightning, CH4 emissions, 100m gust).
New XIOS: ts field (skt), 1hr surface output, daily model-level output, monthly gust.
Note: model has no interactive O3 (prescribed climatology).
@esm-tools esm-tools deleted a comment from github-actions bot Apr 8, 2026
…g, mrfso

Previously deferred as needing IFS source code changes, but LPJ-GUESS outputs
these directly as monthly .out files. Uses existing load_lpjguess_monthly loader.
3 variables remain blocked (rootd, mrsofc, sftgif: need offline IFS derivation).
sftgif: glacier fraction from vegetation type 12 (Ice Caps and Glaciers)
mrsofc: field capacity from soil type + HTESSEL Van Genuchten lookup table
rootd: effective root depth from vegetation-type-weighted Zeng et al. (1998) values
All 6 deferred core_land variables now implemented (3 LPJ-GUESS + 3 IFS-derived).
New XIOS: slt (soil type) field, monthly static land output file.
@esm-tools esm-tools deleted a comment from github-actions bot Apr 8, 2026
Parses all YAML rule files and CSVs to compute per-rule annual storage,
broken down by realm (with coverage %), frequency, grid type, and top-20
largest rules. Reports ~1.8 TB/year uncompressed, ~740 GB compressed.
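
For orientation, a per-rule estimate of this kind can be approximated as grid points × levels × time steps per year × bytes per value. The sketch below is not the script added in this commit: the frequency table, the 4-byte default, and the 365-day year are assumptions.

```python
# Assumed time steps per 365-day year for each output frequency.
STEPS_PER_YEAR = {"1hr": 8760, "3hr": 2920, "6hr": 1460, "day": 365, "mon": 12, "yr": 1}


def annual_bytes(n_points: int, n_levels: int, frequency: str,
                 bytes_per_value: int = 4) -> int:
    """Uncompressed bytes per year for one output variable."""
    return n_points * n_levels * STEPS_PER_YEAR[frequency] * bytes_per_value


# Example: a 2D monthly field on a hypothetical 1000-point grid.
monthly_2d = annual_bytes(n_points=1000, n_levels=1, frequency="mon")
```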
These pressure-level derived fields (geopotential_height__zg, relative_humidity_pct__hur)
were placed in the 2D_physical field_group (grid_ref="reduced_sfc") but reference
source fields (z_pl, r_pl) on reduced_pl. XIOS fails at context close_definition
because reduced_sfc has 1 grid element while reduced_pl has 2 (domain + pressure axis).
…h, 1hr instant, monthly ml

New XIOS output: daily CMOR surface fields (_day_cap7), 6hr model-level
and plev7h instantaneous, 1hr instant surface (psl/uas/vas), 1hr ts avg,
1hr max gust, 3hr prsn, monthly model-level (ta/hus/hur/pfull).

New infrastructure: plev7h axis (7 levels: 1000-100 hPa) in axis_def
and grid_def, model-level zg/hur CMOR fields in field_def, prw/clivi/prsn
3hr field definitions.

New custom steps: compute_rtmt (net radiative flux at model top),
extract_single_plevel (ta@700hPa, wap@500hPa from plev19).
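
A sketch of what these two steps compute, written for single columns as plain Python. The rtmt expression follows the usual CMIP definition (net downward radiative flux at the model top = rsdt - rsut - rlut). How the PR derives it from IFS output is not shown here, and the function signatures below are assumptions.

```python
from typing import List


def compute_rtmt(rsdt: float, rsut: float, rlut: float) -> float:
    """Net downward radiative flux at model top (W m-2)."""
    return rsdt - rsut - rlut


def extract_single_plevel(profile: List[float], plev_pa: List[float],
                          target_pa: float) -> float:
    """Pick the value at one pressure level (Pa) from a vertical profile."""
    idx = plev_pa.index(target_pa)  # raises ValueError if the level is absent
    return profile[idx]


# Example: temperature at 700 hPa (70000 Pa) from a three-level profile.
ta = [280.0, 270.0, 250.0]
plev = [100000.0, 85000.0, 70000.0]
ta700 = extract_single_plevel(ta, plev, 70000.0)
```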

92 variables blocked: 17 COSP, 21 tendencies, 9 aerosol, 5 CO2,
4 effective radii, ~40 need IFS source changes.