Secofs 2d ufs #77

Open
mansurjisan wants to merge 100 commits into ufs-ens from secofs-2d-ufs

@mansurjisan
Owner

No description provided.

15-member ensemble (1 GFS control + 14 GEFS) using the same horizontal
grid as 3D SECOFS but with 2 vertical levels (barotropic). Fits within
the same 8,400-core budget as the 7-member 3D ensemble
(15 members x 560 cores/member; each member: 60 DATM + 500 SCHISM).

New files:
- parm/systems/secofs_2d_ufs.yaml: 2D config (nvrt=2, ibc=1, no tracers)
- fix/secofs_2d_ufs/: param.nml, vgrid.in, ufs.configure, templates
- pbs/launch_secofs_2d_ufs_ensemble.sh: 15-member launcher
- pbs/jnos_secofs_2d_ufs_ensemble_member.pbs: 5-node, 1h walltime
- pbs/jnos_secofs_2d_ufs_gefs_prep.pbs: 14 GEFS member atmos prep
- docs/secofs_2d_ufs_ensemble_plan.md: implementation plan
- param.nml: Fix ibc=1 -> ibc=0 (nvrt=2 makes it barotropic, ibc must stay 0)
- PBS scripts: Replace hardcoded mansur.jisan with $LOGNAME in output paths
- datm_in.template: Hardcode datm_esmf_mesh.nc filename, add note that
  nx_global/ny_global are patched at runtime by ensemble_run.sh
- launcher: Fix control member (i=0) to use GEFS_ENSEMBLE=false with no
  GEFS_MEMBER_ID, preventing confusion with GEFS member 00
- YAML: Align ensemble member walltime to 01:00:00 (was 03:00:00)
- Cherry-pick schout_*.nc support from ufs-ens (convert_schout_to_split)
  for ensemble post-processing with combined SCHISM output format
The prep PBS script hardcodes OFS=secofs_ufs, causing it to use 3D
fix files and COMOUT path. Pass OFS via qsub -v so the launcher's
OFS=secofs_2d_ufs takes precedence.
YAML null becomes Python None, and None.upper() raises
AttributeError: 'NoneType' object has no attribute 'upper'.
Guard all .upper() calls with (value or '').upper() to handle
null/None gracefully. Fixes the 2D barotropic config, where
ts_source is null.
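
The guard described above can be sketched as follows. The `config` dict and the `obc_source` key are hypothetical stand-ins for whatever the YAML loader returns; only `ts_source` being null comes from the commit message. Note that `value or ''` also maps other falsy values (0, False, empty string) to an empty string, which is fine for string-valued settings.

```python
def normalized_upper(value):
    """Upper-case a config value, treating YAML null (Python None) as empty."""
    return (value or "").upper()


# Hypothetical config, as a YAML loader would return it for `ts_source: null`.
config = {"ts_source": None, "obc_source": "rtofs"}

ts_source = normalized_upper(config.get("ts_source"))    # no AttributeError
obc_source = normalized_upper(config.get("obc_source"))
```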
OCN_petlist_bounds was 60-559 (560 total) but PBS allocates 600 tasks
(5 nodes x 120 mpiprocs). Extra MPI ranks with no PET assignment cause
UFS-Coastal to crash. Updated to 60-599 (600 total = 60 MED/ATM + 540 OCN).
…yout

YAML config was overriding PBS TOTAL_TASKS with 560 (old value).
Must be 600 to match ufs.configure OCN_petlist_bounds 60-599
and PBS select=5:ncpus=128:mpiprocs=120 (5x120=600).
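
A quick way to catch this kind of mismatch before submitting is to compare the PBS task count with the PET ranges. This is a minimal sketch, not part of the workflow; the function name is hypothetical, and the ATM/MED range 0-59 is an assumption inferred from "60 MED/ATM".

```python
def unassigned_ranks(nodes, mpiprocs_per_node, petlist_bounds):
    """Return (total_tasks, ranks not covered by any inclusive PET range)."""
    total_tasks = nodes * mpiprocs_per_node
    assigned = set()
    for first, last in petlist_bounds:
        assigned.update(range(first, last + 1))
    return total_tasks, sorted(set(range(total_tasks)) - assigned)


# Old layout: OCN 60-559 leaves ranks 560-599 without a PET assignment.
old_total, old_missing = unassigned_ranks(5, 120, [(0, 59), (60, 559)])
# Fixed layout: OCN 60-599 covers all 600 tasks.
new_total, new_missing = unassigned_ranks(5, 120, [(0, 59), (60, 599)])
```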
A 3D hotstart (63 sigma levels) is incompatible with the 2D vgrid
(2 levels) and causes a segfault. Set ihot=0 in param.nml and skip
hotstart/restart file staging when BAROTROPIC=true.
start_year/month/day/hour had *_value placeholders that cause
Fortran namelist parse errors. Replaced with real defaults (2026-03-11 12Z).
These get patched to actual values at runtime by ensemble_run.sh sed commands.
ivcor=2 (SZ) fails on deep grids because of the h_c constraint. Use
ivcor=1 (LSC2) instead: the same format as 3D SECOFS but with nvrt=3
(3 uniform sigma levels). The generator reads the node count from
hgrid.gr3 and creates the per-node sigma file (~87MB for SECOFS grid).

Usage on WCOSS2:
  cd $FIXofs
  python3 gen_vgrid_2d.py secofs_2d_ufs.hgrid.gr3 secofs_2d_ufs.vgrid.in
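
The two inputs the generator needs can be sketched as below. This is not gen_vgrid_2d.py itself: the function names are hypothetical, the header layout (comment line, then "n_elements n_nodes") is the usual .gr3 convention, and writing the actual vgrid.in layout is deliberately left out.

```python
def read_gr3_counts(lines):
    """Return (n_elements, n_nodes) from the second line of a .gr3 file."""
    n_elements, n_nodes = lines[1].split()[:2]
    return int(n_elements), int(n_nodes)


def uniform_sigma(nvrt):
    """nvrt evenly spaced sigma values from -1 (bottom) to 0 (surface)."""
    return [-1.0 + k / (nvrt - 1) for k in range(nvrt)]


# Tiny made-up header; a real hgrid.gr3 would be read with open(...).
header = ["example grid", "2 4"]
n_elements, n_nodes = read_gr3_counts(header)
sigma = uniform_sigma(3)
```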
Instead of always forcing ihot=0 for barotropic, check if a 2D
hotstart file exists in COMOUT (produced by a 2D nowcast run).
If found, use ihot=1 for hot start — eliminates cold start spinup.
Falls back to ihot=0 only when no 2D hotstart is available.
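
The selection logic amounts to a file-existence check. A minimal sketch, assuming a hypothetical hotstart file name (`hotstart_2d.nc`); the real name and COMOUT layout are not given here.

```python
import tempfile
from pathlib import Path


def choose_ihot(comout_dir, hotstart_name="hotstart_2d.nc"):
    """Return (ihot, path): (1, file) if a 2D hotstart exists, else (0, None)."""
    candidate = Path(comout_dir) / hotstart_name
    if candidate.is_file():
        return 1, candidate
    return 0, None


# Demonstration in a throwaway directory standing in for COMOUT.
with tempfile.TemporaryDirectory() as comout:
    cold = choose_ihot(comout)                      # nothing staged yet
    (Path(comout) / "hotstart_2d.nc").touch()       # pretend a 2D nowcast ran
    warm = choose_ihot(comout)
```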
Python-generated ESMF meshes have different node/element counts than
what CDEPS expects, causing scrambled atmospheric forcing fields.
ESMF_Scrip2Unstruct adds boundary nodes that ESMF's internal regridding
requires.

- Add create_esmf_mesh_from_forcing.sh: creates SCRIP file from any
  datm_forcing.nc, then runs ESMF_Scrip2Unstruct + adds elementMask
- JNOS_OFS_ENSEMBLE_ATMOS_PREP: replace inline Python mesh generation
  for both GEFS and control meshes with calls to the new script
- Works for both 2D and 3D (same atmospheric grid, same mesh)
- scale_roughness_2d.py: scales rough.gr3 z0 values for 2D barotropic
  SCHISM to compensate for over-damped tides (depth-averaged velocity
  in log-law gives ~50-60% higher effective friction than 3D)
- deploy_to_wcoss2.sh: add GROUP 9 for secofs_2d_ufs fix files,
  auto-generates scaled rough.gr3 (default scale=0.05, configurable
  via ROUGH_SCALE env var)
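
Scaling a .gr3 file touches only the per-node value column. The following is a sketch of that operation, not scale_roughness_2d.py itself; it assumes the usual .gr3 layout (comment line, "n_elements n_nodes", then one "id x y value" line per node, then element connectivity).

```python
def scale_gr3_values(lines, scale):
    """Multiply the node value column of a .gr3 file by `scale`."""
    n_nodes = int(lines[1].split()[1])
    out = list(lines[:2])
    for line in lines[2:2 + n_nodes]:
        node_id, x, y, value = line.split()[:4]
        out.append(f"{node_id} {x} {y} {float(value) * scale:.6e}")
    out.extend(lines[2 + n_nodes:])   # element table is copied unchanged
    return out


# Made-up three-node example with the default scale of 0.05.
rough = [
    "rough.gr3 example",
    "1 3",
    "1 -80.0 30.0 0.001",
    "2 -80.1 30.0 0.002",
    "3 -80.0 30.1 0.004",
    "1 3 1 2 3",
]
scaled = scale_gr3_values(rough, 0.05)
scaled_values = [float(line.split()[3]) for line in scaled[2:5]]
```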
The 5 3 0 0 ocean boundary (subtidal SSH + tidal-only velocity) caused
a monotonic domain-wide drawdown of ~1m over 2 days. The boundary
imposed low-frequency sea level changes via elev2D.th.nc without
matching subtidal transport, creating a mass imbalance.

Fix: always downgrade both iettype and ifltype from 5 to 3 (tidal only)
for dynamical consistency. Remove --with-elev2d flag from converter
and both calling sites (exnos_ofs_prep.sh, nos_ofs_model_run.sh).
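
The downgrade rule can be expressed on the boundary flag tuple alone. A minimal sketch: the function name is hypothetical, and the flag order (iettype, ifltype, followed by the tracer flags) is assumed from the "5 3 0 0" notation above.

```python
def downgrade_boundary_flags(flags):
    """Replace type 5 (tides + external series) with type 3 (tides only)
    for the elevation and flux flags; leave the remaining flags untouched."""
    iettype, ifltype, *rest = flags
    iettype = 3 if iettype == 5 else iettype
    ifltype = 3 if ifltype == 5 else ifltype
    return (iettype, ifltype, *rest)


ocean_boundary = downgrade_boundary_flags((5, 3, 0, 0))
```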
Now that iettype=3 (tidal only), elev2D.th.nc is unused. Remove:
- RTOFS SSH extraction in prep (gen_elev2d_th.py call)
- elev2D.th.nc staging in barotropic STOFS path
Saves wall time and avoids confusion about dead inputs.
gen_regional_drag.py generates spatially-varying drag.gr3 for nchi=0.
West Florida shelf / Gulf gets higher drag (default 3x) to damp
over-energetic tides (amplitude ratio 1.5-3.5x vs 3D). Atlantic
coast / Chesapeake left at baseline (already well-matched at ~0.94x).

Smooth quadratic taper between Gulf and Atlantic to avoid
discontinuities. South Florida tip also scaled for Keys/Miami.

To use: switch param.nml nchi=1 → nchi=0, deploy drag.gr3.
Replaces gen_regional_drag.py approach (nchi=0) with a cleaner
experiment that preserves the existing log-law formulation (nchi=1).

Key design decisions from review:
- West Florida shelf only (lon <= -82, lat 24-31), NOT South Florida
- South Florida/Keys excluded: good amplitude ratios (~1.0-1.15),
  residual is mean bias (-1.2m), not friction
- Quadratic taper at region boundary (2° wide)
- Conservative default scale (2.0x), recommend testing 5-10x
  due to log-law nonlinearity (10x z0 → only 1.6x Cd)
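
The regional weighting and the log-law nonlinearity can be illustrated together. This is a sketch under stated assumptions, not the actual script: the taper shape `(1 - d/width)^2`, the blending `1 + (scale - 1) * w`, the von Karman constant 0.4, and the 30 m reference height with z0 = 0.5 mm are all illustrative choices.

```python
import math

KAPPA = 0.4  # von Karman constant (assumed value)


def taper_weight(lon, lat, lon_max=-82.0, lat_min=24.0, lat_max=31.0, width=2.0):
    """1 inside the West Florida shelf box, quadratic falloff to 0 over `width` degrees."""
    d_lon = max(0.0, lon - lon_max)
    d_lat = max(0.0, lat_min - lat, lat - lat_max)
    d = max(d_lon, d_lat)
    return 0.0 if d >= width else (1.0 - d / width) ** 2


def scaled_z0(z0, lon, lat, scale=2.0):
    """Blend from scale*z0 inside the region to z0 outside it."""
    return z0 * (1.0 + (scale - 1.0) * taper_weight(lon, lat))


def log_law_cd(z0, height):
    """Drag coefficient from the log law at reference height `height` (m)."""
    return (KAPPA / math.log(height / z0)) ** 2


# A 10x increase in z0 raises Cd by far less than 10x.
cd_ratio = log_law_cd(0.005, 30.0) / log_law_cd(0.0005, 30.0)
```

With these illustrative numbers the ratio comes out close to the 1.6x quoted above, which is why large z0 scale factors are needed to move Cd appreciably.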
Post-processing correction that combines 3D deterministic with scaled
2D ensemble anomalies:

  WL_final(t) = WL_3d_det(t) + a_i * (WL_2d_member(t) - WL_2d_control(t))

3D det carries the full baroclinic physics (mean SSH, density currents).
2D ensemble contributes only the spread, scaled by per-station amplitude
ratio a_i = std(3D_det) / std(2D_ctl).

Two-step workflow:
  train: compute a_i from 2D control vs 3D det (single or multi-cycle)
  apply: correct each ensemble member with same coefficients

Coefficients clipped to [0.1, 5.0] to avoid blowup at weak-signal
stations. For production, train on many past cycles.
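
For a single station, the train and apply steps reduce to a few lines. This is a minimal sketch using plain lists and the population standard deviation; the function names mirror the workflow steps but are not the script's actual API, and the fallback to 1.0 for a flat control series is an assumption.

```python
from statistics import pstdev

A_MIN, A_MAX = 0.1, 5.0  # clipping range from the commit message


def train_coefficient(wl_3d_det, wl_2d_ctl):
    """a_i = std(3D det) / std(2D control), clipped to [A_MIN, A_MAX]."""
    std_2d = pstdev(wl_2d_ctl)
    if std_2d == 0.0:
        return 1.0  # assumption: no scaling when the control has no signal
    return min(max(pstdev(wl_3d_det) / std_2d, A_MIN), A_MAX)


def apply_correction(wl_3d_det, wl_2d_member, wl_2d_ctl, a_i):
    """WL_final(t) = WL_3d_det(t) + a_i * (WL_2d_member(t) - WL_2d_control(t))."""
    return [d + a_i * (m - c)
            for d, m, c in zip(wl_3d_det, wl_2d_member, wl_2d_ctl)]


det = [0.0, 1.0, 0.0, -1.0]
ctl = [0.0, 2.0, 0.0, -2.0]            # 2D tides twice as energetic as 3D
member = [0.2, 2.2, 0.2, -1.8]         # control plus a 0.2 m anomaly
a_i = train_coefficient(det, ctl)
corrected = apply_correction(det, member, ctl, a_i)
```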
1. CSV output uses csv.writer with QUOTE_MINIMAL to handle station
   names containing commas (e.g., "Palatka, St Johns River")
2. apply_correction validates station counts match across 3D det,
   2D control, member, and coefficients — raises ValueError on mismatch
3. Low-correlation stations (r < corr_floor, default 0.3) get a_i=1.0
   instead of fitting std ratio from noise. Adds --corr-floor CLI arg
   and gate_reason field to coefficients JSON.
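
The correlation gate in item 3 can be sketched as below. The returned dict layout and the `"low_correlation"` label are hypothetical; only the threshold (0.3), the fallback a_i=1.0, and the existence of a gate_reason field come from the commit message.

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def gated_coefficient(wl_3d_det, wl_2d_ctl, corr_floor=0.3):
    """Use the std ratio only when 2D control and 3D det are correlated."""
    r = pearson_r(wl_3d_det, wl_2d_ctl)
    if r < corr_floor:
        return {"a_i": 1.0, "r": r, "gate_reason": "low_correlation"}
    ratio = pstdev(wl_3d_det) / pstdev(wl_2d_ctl)
    return {"a_i": min(max(ratio, 0.1), 5.0), "r": r, "gate_reason": None}


good = gated_coefficient([0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0])
noisy = gated_coefficient([0.0, 1.0, 0.0, -1.0], [1.0, 0.0, -1.0, 0.0])
```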
1. Training-time validation: raises ValueError if 2D control and 3D det
   station counts differ. Warns (not errors) if station.in label count
   doesn't match data, since model output can have extra stations.
2. Station identity validation at apply time: coefficients JSON now
   stores station_order list. apply --station-in cross-checks labels
   against trained order, raising ValueError on identity mismatch.
3. corr_floor saved in training metadata for reproducibility.
…--station-in

- Training: hard fail if station.in label count != data station count
  (no more silent truncation to min)
- Apply: --station-in now required (was optional), always validates
  station identity against trained order
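
The two checks (count, then identity) might look like the following. A sketch only: the function name, argument names, and error wording are hypothetical.

```python
def validate_station_order(trained_order, station_labels, n_data_stations):
    """Raise ValueError unless labels match the data count and the trained order."""
    if len(station_labels) != n_data_stations:
        raise ValueError(
            f"station.in has {len(station_labels)} labels but data has "
            f"{n_data_stations} stations"
        )
    if list(station_labels) != list(trained_order):
        mismatched = [
            i for i, (a, b) in enumerate(zip(trained_order, station_labels)) if a != b
        ]
        raise ValueError(
            f"station identity mismatch: {len(trained_order)} trained vs "
            f"{len(station_labels)} given; differing indices {mismatched}"
        )


def raises_value_error(*args):
    """Small helper for the demonstration below."""
    try:
        validate_station_order(*args)
    except ValueError:
        return True
    return False


same_order_ok = not raises_value_error(["a", "b"], ["a", "b"], 2)
swapped_fails = raises_value_error(["a", "b"], ["b", "a"], 2)
count_fails = raises_value_error(["a", "b"], ["a", "b"], 3)
```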
After archive, generates CO-OPS standard station timeseries NetCDF
from staout text files using schism_combine_outputs.py. Produces
{prefix}.t{cyc}z.{PDY}.stations.{nowcast|forecast}.nc in COMOUT.

Runs for both nowcast and forecast phases. Non-fatal on failure
(station NC is a product, not a prerequisite for the next stage).
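
The naming pattern can be captured in a small helper. This is a sketch; the function name is hypothetical, and the prefix value and two-digit zero padding of the cycle are assumptions.

```python
def station_nc_name(prefix, cyc, pdy, phase):
    """Build {prefix}.t{cyc}z.{PDY}.stations.{nowcast|forecast}.nc."""
    if phase not in ("nowcast", "forecast"):
        raise ValueError(f"unknown phase: {phase!r}")
    return f"{prefix}.t{int(cyc):02d}z.{pdy}.stations.{phase}.nc"


example_name = station_nc_name("secofs_2d_ufs", "12", "20260311", "forecast")
```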
- Create scripts/nosofs/exnos_ofs_post.sh: COMF SCHISM post-processing
  that generates CO-OPS standard station NetCDF from staout text files
  for both nowcast and forecast phases
- Wire into jobs/JNOS_OFS_POST comf case (replaces placeholder)
- model_run.sh: revert station NC generation, keep only staout archival
  (nowcast → restart_outputs, forecast → forecast_outputs)
- Forecast staout files now archived to COMOUT for post-processing access
For 2D barotropic runs (BAROTROPIC=true), the post job now:
1. Generates station NetCDF from staout (existing step)
2. Trains bias correction: a_i from 2D control vs 3D det
3. Applies correction to each perturbed ensemble member:
   WL_final(t) = WL_3d_det(t) + a_i * (WL_2d_member(t) - WL_2d_control(t))

Produces bias_coefficients.json and per-member corrected_wl.csv
in COMOUT. Skips gracefully if 3D det or ensemble outputs are missing.

3D det OFS auto-derived from 2D name (secofs_2d_ufs → secofs_ufs).
Configurable via DET_OFS and DET_COMOUT env vars.
ENSEMBLE_COMOUT changed from ensemble/member_{ID} to
ensemble/{cycle}/member_{ID} so multiple cycles per day
don't clobber each other's staout files.

Updated both JNOS_OFS_ENSEMBLE_MEMBER and exnos_ofs_post.sh
to use the cycle-specific path.
- Step 1.5: bias correction between output combining and statistics
  Trains a_i from 2D control vs 3D det, applies to each perturbed
  member. Only runs when BAROTROPIC=true and 3D det is available.
- Member dirs: check cycle-specific path (ensemble/${cycle}/member_*)
  first, fall back to legacy flat layout for backward compatibility.
- Coefficients saved to ensemble/${cycle}/bias_coefficients.json.
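
The member directory lookup with legacy fallback can be sketched as follows. The function name and the cycle string form (`t12z`) are assumptions; only the two directory layouts come from the commit messages.

```python
import tempfile
from pathlib import Path


def resolve_member_dir(comout, cycle, member_id):
    """Prefer ensemble/{cycle}/member_{ID}; fall back to ensemble/member_{ID}."""
    root = Path(comout) / "ensemble"
    for candidate in (root / cycle / f"member_{member_id}",
                      root / f"member_{member_id}"):
        if candidate.is_dir():
            return candidate
    return None


# Demonstration in a throwaway directory standing in for COMOUT.
with tempfile.TemporaryDirectory() as comout:
    legacy = Path(comout) / "ensemble" / "member_001"
    legacy.mkdir(parents=True)
    found_legacy = resolve_member_dir(comout, "t12z", "001")

    scoped = Path(comout) / "ensemble" / "t12z" / "member_001"
    scoped.mkdir(parents=True)
    found_scoped = resolve_member_dir(comout, "t12z", "001")

    found_missing = resolve_member_dir(comout, "t18z", "002")
```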
The 3D operational COMOUT is secofs.{PDY}, not secofs_ufs.{PDY}.
Fixed sed pattern in both JNOS_OFS_ENSEMBLE_POST and exnos_ofs_post.sh.
Also handles stofs_2d_atl → stofs_3d_atl correctly.
Prevents 18z stats from overwriting 12z stats, consistent with
the cycle-scoped member dirs fix.
- Launcher: add BAROTROPIC=true to enspost qsub -v so bias
  correction step actually runs
- Post PBS: python 3.8.6 → 3.12.0 (netCDF4 required for bias
  correction to read 3D det station files)
- Prep PBS: hardcoded cyc=00 → ${CYC:-00} so it accepts any cycle
- Launcher: UFS mode uses _00.pbs for all cycles (cycle via -v CYC=)
  Fixed in 3 locations: prep, det-only, and --with-det paths
- Prep qsub now passes CYC and OFS along with PDY
Covers the full pipeline: why 2D differs from 3D, the boundary fix
(iettype 5→3), friction tuning experiments, the anomaly-based
correction architecture, implementation details, CLI usage,
results, and limitations/future work.
schism_combine_outputs.py crashed on 2D runs because:
1. sigma.dat was either missing or inherited from 3D (63 levels vs 3)
2. Empty staout_5-9 files were not handled

Fix: auto-detect n_levels from staout_5 column count when sigma.dat
is missing. Skip empty 3D staout files. Generate synthetic sigma
array for NetCDF output.
SCHISM writes floats like '-0.281012-220' (missing 'E') when the
exponent needs three digits, i.e. for very small values. Added
_parse_fortran_float() to insert the missing 'E' exponent marker
before the exponent sign.
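
A helper of this kind might look like the sketch below. It is not the actual _parse_fortran_float() from schism_combine_outputs.py; the regular expression is one possible way to detect a mantissa followed directly by a signed exponent.

```python
import re

# Mantissa immediately followed by a signed exponent, with no 'E' in between.
_MISSING_E = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))([+-]\d+)$")


def parse_fortran_float(token):
    """Parse a float, restoring the 'E' that Fortran drops for 3-digit exponents."""
    token = token.strip()
    match = _MISSING_E.match(token)
    if match:
        token = f"{match.group(1)}E{match.group(2)}"
    return float(token)


tiny = parse_fortran_float("-0.281012-220")
normal = parse_fortran_float("1.5E-03")
```

Tokens that already contain an exponent marker do not match the pattern and are passed to float() unchanged.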
…match

The 3D sigma.dat (63 levels) was being copied for 2D runs via the
_base fallback, causing reshape errors (data has 3 levels but
sigma.dat says 63). Fix: skip sigma.dat copy when BAROTROPIC=true,
let schism_combine_outputs.py auto-detect n_levels from staout_5.
Previous post runs left 3D sigma.dat (63 levels) in the outputs
directory. The _FIX_EXTS skip logic doesn't re-copy, but the
stale file persists. Add explicit rm for barotropic runs before
the copy step.
YAML config loader sets BAROTROPIC=1, not BAROTROPIC=true.
The sigma.dat skip and rm guards only checked for 'true',
causing the 3D sigma.dat to persist and crash station NC
generation with wrong level count.
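
The same mismatch can be avoided in any consumer of the flag by accepting both spellings. A minimal Python sketch (the affected guards are shell code; the function name here is hypothetical):

```python
import os


def is_enabled(value):
    """True for '1' or 'true' (any case); False for None, '', '0', etc."""
    return str(value or "").strip().lower() in {"1", "true"}


# Reads the flag the way a Python post-processing step might.
barotropic = is_enabled(os.environ.get("BAROTROPIC"))
```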
nos_ofs_create_forcing_obc.sh was called twice in the non-barotropic
else block, wasting ~30 minutes of prep time. The second call just
overwrites with identical output.
…f 18

The launcher passed lowercase 'cyc' via qsub -v but PBS scripts read
uppercase 'CYC' with default 00. PBS -v is case-sensitive, so the cycle
was silently ignored for nowcast and forecast jobs.

Also adds OFS= to the det-only and ensemble nowcast/forecast qsub calls
for consistency with the prep job.
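
The failure mode is easy to reproduce with a plain dictionary standing in for the job environment. This sketch is illustrative only; `resolve_cycle` is a hypothetical name.

```python
def resolve_cycle(env):
    """Mimic a job script that reads CYC with a default of '00'."""
    return env.get("CYC", "00")


# Lowercase 'cyc' is a different variable, so the default wins silently.
wrong_case = resolve_cycle({"cyc": "18"})
right_case = resolve_cycle({"CYC": "18"})
```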
- Create jnos_secofs_ufs_post_00.pbs for UFS det post-processing
- Create jnos_secofs_post_00.pbs for standalone det post-processing
- Add post step to launcher det-only path: prep → nowcast → forecast → post
- Post depends on forecast completion, calls JNOS_OFS_POST which runs
  exnos_ofs_post.sh to generate station NetCDF from staout files
JNOS_OFS_POST uses nos_ofs_ver to build COMOUT via compath.py, but
the PBS scripts only set nosofs_ver (from run.ver). Without nos_ofs_ver,
COMOUTroot is empty and staout files can't be found.