Skip to content

DataWrangling cleanup: standardize download_dataset returns, drop ERA5 dead code#208

Open
ewquon wants to merge 4 commits intomainfrom
eq/datawrangling_cleanup
Open

DataWrangling cleanup: standardize download_dataset returns, drop ERA5 dead code#208
ewquon wants to merge 4 commits intomainfrom
eq/datawrangling_cleanup

Conversation

@ewquon
Copy link
Copy Markdown
Collaborator

@ewquon ewquon commented May 1, 2026

Summary

  • Standardize download_dataset return types across all ~17 methods: String for single-file Metadatum, Vector{String} for multi-file Metadata. Drops the previous mix of String / Vector{String} / nothing returns. Closes Inconsistent return types across download_dataset implementations #201.
  • Refactor ERA5 helpers (download_era5_day, download_era5_multivar_day) to return their already-computed path lists instead of nothing, and to construct paths via joinpath(dir, metadata_filename(...)) directly rather than instantiating throwaway Metadatum objects.
  • Remove dead code in the ERA5 module: unused ERA5_single_level_accumulated_variables Set, unused Day import, redundant metadata_path import, and 1-arg metadata_prefix overloads that duplicated the 4-arg form (tests updated to call 4-arg form directly).

Test plan

  • test_cds_downloading.jl passes locally (skip-existing short-circuit assertions updated to expect path vectors)
  • CI test suite passes
  • Spot-check that download_dataset(...) callers receive paths (previously discarded)

🤖 Generated with Claude Code

ewquon and others added 3 commits May 1, 2026 13:53
- Drop the unused `ERA5_single_level_accumulated_variables` Set in
  `ERA5_single_levels.jl` — it was defined with a comment about CDS
  step types but never wired up to anything (no references in src/,
  ext/, test/, or examples/).
- Drop the unused `Day` from the `using Dates: ...` import in `ERA5.jl`
  (only `DateTime`, `Month`, `Hour` are referenced).
- Drop the redundant `metadata_path` from the `import` block — already
  brought in via `using NumericalEarth.DataWrangling: metadata_path`,
  and the ERA5 module never extends it (the `import` form is for
  extending only).
- Delete the 1-arg `metadata_prefix(::ERA5Metadata)` /
  `(::ERA5PressureMetadata)` overloads. They duplicate the logic of
  the 4-arg form `metadata_prefix(dataset, name, date, region)` (which
  is what `metadata_filename` actually calls). Update the
  metadata_prefix unit tests to call the 4-arg form directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make download_dataset consistently return paths instead of nothing:
String for single-file Metadatum, Vector{String} for multi-file Metadata.
ERA5 helpers now return their already-computed path lists rather than
re-deriving them via throwaway Metadatum construction.

Closes #201.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
download_era5_day and download_era5_multivar_day now return path
vectors instead of nothing on the skip-existing short-circuit; update
the assertions accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ewquon ewquon added clean up 🧹 technical debt is real people labels May 1, 2026
The existing skip-existing tests only exercised single-calendar-day
helper calls. Add tests that span two calendar days for the three
parent dispatches (ERA5Metadata, ERA5PressureMetadata + names, and
names + dataset + datetimes) so a regression that drops paths from
one day group would fail loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
ext/NumericalEarthCDSAPIExt.jl 88.23% 2 Missing ⚠️
...c/DataWrangling/OSPapa/OSPapa_flux_observations.jl 0.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@ewquon ewquon requested a review from simone-silvestri May 1, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clean up 🧹 technical debt is real people

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent return types across download_dataset implementations

1 participant