Add DataSource (CAMS) and ForecastSource (CAMS_FX) for Copernicus Atmosphere Monitoring Service data via the CDS API.
CAMS provides atmospheric composition / air quality data not currently available in earth2studio, complementing the existing weather-focused data sources (GFS, IFS, ERA5, etc.).
Data sources:
- CAMS: EU air quality analysis (0.1 deg, 9 pollutants, 10 height levels)
- CAMS_FX: EU + Global forecasts (EU 0.1 deg up to 96h, Global 0.4 deg up to 120h)
Variables include: dust, PM2.5, PM10, SO2, NO2, O3, CO, NH3, NO (EU surface and multi-level), plus AOD and total column products (Global).
Lexicon: 101 entries covering all 9 pollutants at all 9 EU altitude levels (50-5000m), plus surface and 11 global column/AOD variables.
Implementation follows upstream conventions:
- Protocol-compliant __call__ and async fetch methods
- Badges section for API doc filtering
- Time validation, available() classmethod
- Lazy CDS client initialization
- pathlib-based caching with SHA256 keys
- Tests with @pytest.mark.xfail for CI without CDS credentials
Requires: cdsapi (already in the 'data' optional dependency group)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… coordinate-based lead-time selection
P1: Use atomic write-then-rename in _download_cams_netcdf to prevent
corrupt partial files from being cached on interrupted downloads.
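A minimal sketch of the write-then-rename pattern described above, using only the standard library. The helper name `download_atomically` and the `fetch` callback are hypothetical stand-ins; this is not the body of `_download_cams_netcdf`.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def download_atomically(fetch: Callable[[Path], None], target: Path) -> Path:
    """Write to a temporary file beside `target`, then rename it into place.

    `fetch` is a hypothetical callback that writes the download to the path
    it is given (for example, a wrapper around a CDS retrieve call).
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Keep the temp file in the same directory so the rename never crosses
    # a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        fetch(tmp_path)
        # os.replace is atomic when source and destination share a filesystem,
        # so readers see either no file or the complete file.
        os.replace(tmp_path, target)
    except BaseException:
        # An interrupted download leaves nothing behind under the cache name.
        tmp_path.unlink(missing_ok=True)
        raise
    return target
```

With this shape, a cache lookup that only checks whether `target` exists cannot pick up a half-written file.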
P1: Fix TypeError in CAMS.available() and CAMS_FX.available() when
called with timezone-aware datetimes (strip tzinfo before comparing
against naive min-time constants, matching _validate_cams_time).
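The failure mode is that Python refuses to compare a timezone-aware datetime with a naive one. A small sketch of the fix follows; `MIN_TIME`, `to_naive_utc` and `is_available` are illustrative names, and converting to UTC before dropping `tzinfo` is an assumption on my part (the commit only says that tzinfo is stripped).

```python
from datetime import datetime, timedelta, timezone

# Naive minimum time; 2015-01-01 is the Global minimum quoted in this PR.
MIN_TIME = datetime(2015, 1, 1)


def to_naive_utc(t: datetime) -> datetime:
    """Return a naive datetime, converting aware inputs to UTC first."""
    if t.tzinfo is not None:
        t = t.astimezone(timezone.utc).replace(tzinfo=None)
    return t


def is_available(t: datetime) -> bool:
    # Comparing an aware datetime with MIN_TIME directly raises TypeError;
    # normalising first keeps the comparison between two naive values.
    return to_naive_utc(t) >= MIN_TIME
```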
P2: Replace positional lead-time indexing in _extract_field with
coordinate-based selection via forecast_period dimension values,
avoiding silent data misassignment if API reorders slices.
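To illustrate why coordinate-based selection is safer than positional indexing, here is a pure-Python sketch. It is not the `_extract_field` implementation (which presumably works on the NetCDF dataset); `select_by_lead_hours` and its arguments are hypothetical, and it uses exact matching only.

```python
from typing import Any, Sequence


def select_by_lead_hours(
    forecast_period: Sequence[float],
    slices: Sequence[Any],
    lead_hours: Sequence[int],
) -> list:
    """Pick slices by their forecast_period value, not by their position."""
    # Map each coordinate value to the index where the file stores it.
    index_of = {float(h): i for i, h in enumerate(forecast_period)}
    selected = []
    for hour in lead_hours:
        key = float(hour)
        if key not in index_of:
            # Fail loudly rather than falling back to a neighbouring slice.
            raise KeyError(f"lead time {hour}h not present in forecast_period")
        selected.append(slices[index_of[key]])
    return selected
```

If the file returns slices in the order 24h, 0h, 12h, a request for `[0, 12, 24]` still gets the right fields, whereas `slices[0]` would silently return the 24h field.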
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CAMS to analysis datasources and CAMS_FX to forecast datasources.
Add region:europe and product:airquality to badge filters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deduplicate api_vars via dict.fromkeys() to avoid duplicate variable names in CDS API requests (CAMS and CAMS_FX)
- Use dataset-specific min-time validation in CAMS_FX (EU: 2019-07-01, Global: 2015-01-01) instead of global minimum for all datasets
- Sort lead_hours in CAMS_FX cache key so identical lead times in different order produce the same cache hit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
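Two of the fixes above are small enough to sketch directly: order-preserving deduplication with `dict.fromkeys()`, and a SHA256 cache key that ignores the order of the lead times. The function names, the key layout and the variable strings are assumptions for illustration, not the format used in `earth2studio/data/cams.py`.

```python
import hashlib
from datetime import datetime
from typing import Iterable, Sequence


def dedupe(names: Iterable[str]) -> list:
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(names))


def cache_key(time: datetime, variables: Sequence[str], lead_hours: Iterable[int]) -> str:
    """Build a SHA256 key that is independent of lead-time ordering."""
    parts = [
        time.isoformat(),
        ",".join(variables),
        # Sorting makes [24, 0] and [0, 24] hash to the same key.
        ",".join(str(h) for h in sorted(lead_hours)),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

A `set` would also remove duplicates, but it would not keep the request order stable, which is why `dict.fromkeys()` is the usual choice here.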
Per reviewer feedback (NickGeneva):
- Remove CAMS analysis class (no ML models need it currently)
- Remove EU dataset support from CAMS_FX (1:1 mapping with remote store)
- Reduce CAMSLexicon to 11 Global variables (AOD, column products)
- Update docs and tests accordingly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/blossom-ci
Greptile Summary
This PR adds a new …
| Filename | Overview |
|---|---|
| earth2studio/data/cams.py | New CAMS_FX data source; contains a P1 bug where fractional-hour lead times are silently truncated instead of rejected, and two P2 concerns around available() inconsistency and silent nearest-neighbor matching in _extract_field. |
| earth2studio/lexicon/cams.py | New CAMSGlobalLexicon; VOCAB entries are consistent, tco3 maps to nc_key "gtco3" which matches ECMWF conventions. Surface "z" and pressure-level "z*" both use nc_key "z" but live in separate datasets so no collision. |
| test/data/test_cams.py | New test file with mocked unit tests and slow/xfail integration tests; covers surface, pressure-level, mixed fetches, deduplication, cache behaviour, and lead-time validation. |
| earth2studio/lexicon/base.py | Adds 10 new CAMS variable descriptions to E2STUDIO_VOCAB; no conflicts with existing entries. |
| test/data/test_cds.py | Adds autouse fixture to point cdsapi at the CDS endpoint, preventing test modules from accidentally hitting the ADS endpoint. |
| test/lexicon/test_cams_lexicon.py | New lexicon tests covering all VOCAB entries and the four-part key format. |
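The P1 item in the table (fractional-hour lead times truncated instead of rejected) can be illustrated with a short validation sketch. `lead_time_to_hours` is a hypothetical helper, not the code under review; the 120-hour default is the Global forecast horizon quoted in the commit log.

```python
from datetime import timedelta


def lead_time_to_hours(lead: timedelta, max_hours: int = 120) -> int:
    """Convert a lead time to whole hours, rejecting fractional values."""
    seconds = lead.total_seconds()
    # int(seconds / 3600) would quietly turn 1.5h into 1h; check first.
    if seconds % 3600 != 0:
        raise ValueError(f"lead time {lead} is not a whole number of hours")
    hours = int(seconds // 3600)
    if not 0 <= hours <= max_hours:
        raise ValueError(f"lead time {hours}h is outside 0..{max_hours}h")
    return hours
```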
Reviews (2): Last reviewed commit: "Fix"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* feat: add CAMS atmospheric composition data source and lexicon
Add DataSource (CAMS) and ForecastSource (CAMS_FX) for Copernicus
Atmosphere Monitoring Service data via the CDS API.
CAMS provides atmospheric composition / air quality data not currently
available in earth2studio — complementing the existing weather-focused
data sources (GFS, IFS, ERA5, etc.).
Data sources:
- CAMS: EU air quality analysis (0.1 deg, 9 pollutants, 10 height levels)
- CAMS_FX: EU + Global forecasts (EU 0.1 deg up to 96h, Global 0.4 deg up to 120h)
Variables include: dust, PM2.5, PM10, SO2, NO2, O3, CO, NH3, NO (EU surface
and multi-level), plus AOD and total column products (Global).
Lexicon: 101 entries covering all 9 pollutants at all 9 EU altitude levels
(50-5000m), plus surface and 11 global column/AOD variables.
Implementation follows upstream conventions:
- Protocol-compliant __call__ and async fetch methods
- Badges section for API doc filtering
- Time validation, available() classmethod
- Lazy CDS client initialization
- pathlib-based caching with SHA256 keys
- Tests with @pytest.mark.xfail for CI without CDS credentials
Requires: cdsapi (already in the 'data' optional dependency group)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address review findings — atomic download, tz-aware available(), coordinate-based lead-time selection
P1: Use atomic write-then-rename in _download_cams_netcdf to prevent
corrupt partial files from being cached on interrupted downloads.
P1: Fix TypeError in CAMS.available() and CAMS_FX.available() when
called with timezone-aware datetimes (strip tzinfo before comparing
against naive min-time constants, matching _validate_cams_time).
P2: Replace positional lead-time indexing in _extract_field with
coordinate-based selection via forecast_period dimension values,
avoiding silent data misassignment if API reorders slices.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add CAMS and CAMS_FX to datasource documentation pages
Add CAMS to analysis datasources and CAMS_FX to forecast datasources.
Add region:europe and product:airquality to badge filters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address P2 review findings in CAMS data source
- Deduplicate api_vars via dict.fromkeys() to avoid duplicate variable
names in CDS API requests (CAMS and CAMS_FX)
- Use dataset-specific min-time validation in CAMS_FX (EU: 2019-07-01,
Global: 2015-01-01) instead of global minimum for all datasets
- Sort lead_hours in CAMS_FX cache key so identical lead times in
different order produce the same cache hit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: decouple CAMS to Global-only forecast source
Per reviewer feedback (NickGeneva):
- Remove CAMS analysis class (no ML models need it currently)
- Remove EU dataset support from CAMS_FX (1:1 mapping with remote store)
- Reduce CAMSLexicon to 11 Global variables (AOD, column products)
- Update docs and tests accordingly
* Changelog
* Fix
---------
Co-authored-by: Claude Sonnet 4.5 <claude@anthropic.com>
Co-authored-by: Nicholas Geneva <5533524+NickGeneva@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Earth2Studio Pull Request
Description
Cleans up PR #780 and gets its tests working.
Sample script:
Checklist
Dependencies