CAMs Global FX Datasource #790

Merged
NickGeneva merged 17 commits into NVIDIA:main from NickGeneva:feat/cams-datasource
Apr 2, 2026

@NickGeneva NickGeneva commented Apr 2, 2026

Earth2Studio Pull Request

Description

Cleans up PR #780 and gets its tests working.

Sample script:

#!/usr/bin/env python
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
CAMS Global Forecast – fetch a variable and plot it on a map.

Usage:
    python cams_visualization.py                    # default: aod550 at lead_time=0h
    python cams_visualization.py u500 6             # u-wind at 500 hPa, 6h lead
    python cams_visualization.py t2m 0 --no-show    # save only, no GUI window

Requires:
    pip install 'earth2studio[data]' matplotlib cartopy

Note:
    You need ADS API credentials configured.
    See: https://ads.atmosphere.copernicus.eu/how-to-api
"""

import argparse
import datetime

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

from earth2studio.data import CAMS_FX


def main() -> None:
    parser = argparse.ArgumentParser(description="Plot a CAMS Global forecast field.")
    parser.add_argument(
        "variable",
        nargs="?",
        default="aod550",
        help="Variable name from CAMSGlobalLexicon (default: aod550)",
    )
    parser.add_argument(
        "lead_hours",
        nargs="?",
        type=int,
        default=0,
        help="Forecast lead time in hours (default: 0)",
    )
    parser.add_argument(
        "--date",
        type=str,
        default=None,
        help="Init date YYYY-MM-DD (default: yesterday 00Z)",
    )
    parser.add_argument(
        "--no-show",
        action="store_true",
        help="Save the figure but do not open a GUI window",
    )
    args = parser.parse_args()

    # Resolve init time
    if args.date:
        init_time = datetime.datetime.strptime(args.date, "%Y-%m-%d").replace(
            tzinfo=datetime.timezone.utc
        )
    else:
        init_time = datetime.datetime.now(datetime.timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        ) - datetime.timedelta(days=1)

    lead_time = datetime.timedelta(hours=args.lead_hours)
    variable = args.variable

    print(f"Fetching CAMS: var={variable}  init={init_time}  lead={lead_time}")

    ds = CAMS_FX(cache=True, verbose=True)
    data = ds(init_time, lead_time, variable)

    # Extract the 2-D field (time=0, lead_time=0, variable=0)
    field = data.values[0, 0, 0]
    lat = data.coords["lat"].values
    lon = data.coords["lon"].values

    valid_time = init_time + lead_time
    title = (
        f"CAMS Global – {variable}\n"
        f"Init: {init_time:%Y-%m-%d %H}Z   "
        f"Valid: {valid_time:%Y-%m-%d %H}Z  (T+{args.lead_hours}h)"
    )

    # Plot
    fig, ax = plt.subplots(
        figsize=(14, 7),
        subplot_kw={"projection": ccrs.Robinson()},
    )
    ax.set_global()
    ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
    ax.add_feature(cfeature.BORDERS, linewidth=0.3, linestyle="--")

    im = ax.pcolormesh(
        lon,
        lat,
        field,
        transform=ccrs.PlateCarree(),
        cmap="turbo",
        shading="auto",
    )

    cbar = fig.colorbar(im, ax=ax, orientation="horizontal", pad=0.05, shrink=0.7)
    cbar.set_label(variable)
    ax.set_title(title, fontsize=13)

    out_path = f"cams_{variable}_T{args.lead_hours:03d}.png"
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    print(f"Saved → {out_path}")

    if not args.no_show:
        plt.show()


if __name__ == "__main__":
    main()

[Image: cams_u500_T006]

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

claude and others added 13 commits March 29, 2026 19:22
Add DataSource (CAMS) and ForecastSource (CAMS_FX) for Copernicus
Atmosphere Monitoring Service data via the CDS API.

CAMS provides atmospheric composition / air quality data not currently
available in earth2studio — complementing the existing weather-focused
data sources (GFS, IFS, ERA5, etc.).

Data sources:
- CAMS: EU air quality analysis (0.1 deg, 9 pollutants, 10 height levels)
- CAMS_FX: EU + Global forecasts (EU 0.1 deg up to 96h, Global 0.4 deg up to 120h)

Variables include: dust, PM2.5, PM10, SO2, NO2, O3, CO, NH3, NO (EU surface
and multi-level), plus AOD and total column products (Global).

Lexicon: 101 entries covering all 9 pollutants at all 9 EU altitude levels
(50-5000m), plus surface and 11 global column/AOD variables.

Implementation follows upstream conventions:
- Protocol-compliant __call__ and async fetch methods
- Badges section for API doc filtering
- Time validation, available() classmethod
- Lazy CDS client initialization
- pathlib-based caching with SHA256 keys
- Tests with @pytest.mark.xfail for CI without CDS credentials

Requires: cdsapi (already in the 'data' optional dependency group)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… coordinate-based lead-time selection

P1: Use atomic write-then-rename in _download_cams_netcdf to prevent
    corrupt partial files from being cached on interrupted downloads.

P1: Fix TypeError in CAMS.available() and CAMS_FX.available() when
    called with timezone-aware datetimes (strip tzinfo before comparing
    against naive min-time constants, matching _validate_cams_time).

P2: Replace positional lead-time indexing in _extract_field with
    coordinate-based selection via forecast_period dimension values,
    avoiding silent data misassignment if API reorders slices.
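
The first P1 item above describes write-then-rename. The sketch below shows that general pattern using only the standard library. It is not the PR's `_download_cams_netcdf`; the helper name `atomic_write_bytes` and its signature are hypothetical, chosen for illustration.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Write payload to target so readers never observe a partial file.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_name, target)
    except BaseException:
        # On any failure, remove the temp file and leave the cache path untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the download is interrupted, only the temporary file is affected, so a later cache lookup on `target` either finds a complete file or nothing.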

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CAMS to analysis datasources and CAMS_FX to forecast datasources.
Add region:europe and product:airquality to badge filters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deduplicate api_vars via dict.fromkeys() to avoid duplicate variable
  names in CDS API requests (CAMS and CAMS_FX)
- Use dataset-specific min-time validation in CAMS_FX (EU: 2019-07-01,
  Global: 2015-01-01) instead of global minimum for all datasets
- Sort lead_hours in CAMS_FX cache key so identical lead times in
  different order produce the same cache hit
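
A minimal sketch of the two ideas above, order-preserving deduplication with `dict.fromkeys()` and an order-insensitive cache key, might look like the following. The function names, the key layout, and the `"|"` separator are assumptions for illustration; the commit does not show how the real key is assembled.

```python
import hashlib


def dedupe(names: list[str]) -> list[str]:
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(names))


def cache_key(
    dataset: str, variables: list[str], init_time: str, lead_hours: list[int]
) -> str:
    """Build a SHA256 key that ignores the order of lead_hours (hypothetical layout)."""
    parts = [
        dataset,
        ",".join(dedupe(variables)),
        init_time,
        # Sorting makes [6, 0] and [0, 6] hash to the same key.
        ",".join(str(h) for h in sorted(lead_hours)),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

With this layout, requests that differ only in the order of their lead times resolve to the same cached file.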

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per reviewer feedback (NickGeneva):
- Remove CAMS analysis class (no ML models need it currently)
- Remove EU dataset support from CAMS_FX (1:1 mapping with remote store)
- Reduce CAMSLexicon to 11 Global variables (AOD, column products)
- Update docs and tests accordingly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva changed the title from "CAMs Datasource" to "CAMs Global FX Datasource" Apr 2, 2026
@NickGeneva NickGeneva added the "external" label (An awesome external contributor PR) Apr 2, 2026
@greptile-apps
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR adds a new CAMS_FX forecast data source backed by the CAMS Global Atmospheric Composition Forecasts API (cdsapi), along with a CAMSGlobalLexicon covering surface, aerosol, trace-gas, and pressure-level variables. It also adds an autouse fixture to test_cds.py to ensure the CDS tests hit the correct API endpoint.

  • P1 (silent lead-time truncation): np.timedelta64(lt, "h").astype(int) truncates fractional hours before _validate_leadtime is called, so the guard for non-whole-hour inputs never fires and the wrong lead time is silently requested from the API.
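
One way to avoid this class of bug is to reject non-whole-hour values before any integer conversion. The sketch below illustrates that ordering with the standard library's `datetime.timedelta` rather than NumPy. `lead_time_to_hours` is a hypothetical name, not a function in `earth2studio/data/cams.py`.

```python
from datetime import timedelta

ONE_HOUR = timedelta(hours=1)


def lead_time_to_hours(lead_time: timedelta) -> int:
    """Convert a lead time to whole hours, rejecting fractional hours.

    Hypothetical helper: the check runs before the conversion, so a value
    such as 90 minutes raises instead of being truncated to 1 hour.
    """
    if lead_time % ONE_HOUR != timedelta(0):
        raise ValueError(f"Lead time {lead_time} is not a whole number of hours")
    # Floor division of two timedeltas yields an int.
    return lead_time // ONE_HOUR
```

Range checks, such as a maximum forecast horizon, would be applied to the returned integer afterwards.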

Confidence Score: 4/5

Safe to merge after addressing the sub-hour lead-time truncation bug; P2 findings are non-blocking.

One P1 defect remains: fractional-hour lead times are silently truncated rather than rejected, causing the wrong data to be fetched without any error. The two P2 findings (available() inconsistency, nearest-neighbor without tolerance) are quality improvements but do not block users on the primary path.

earth2studio/data/cams.py — specifically the lead_hours conversion and _extract_field nearest-neighbor logic.
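
For the nearest-neighbor concern, a tolerance guard is one possible remedy. The sketch below is a generic illustration and does not reproduce `_extract_field`, whose internals are not shown in this conversation. The function name `nearest_index` and the default tolerance of 0.5 hours are assumptions.

```python
def nearest_index(
    available_hours: list[float], requested_hour: float, tolerance: float = 0.5
) -> int:
    """Return the index of the closest available lead hour.

    Hypothetical helper: raises if the closest value is farther away than
    `tolerance`, instead of silently returning a distant match.
    """
    if not available_hours:
        raise ValueError("No lead times available")
    idx = min(
        range(len(available_hours)),
        key=lambda i: abs(available_hours[i] - requested_hour),
    )
    if abs(available_hours[idx] - requested_hour) > tolerance:
        raise ValueError(
            f"No lead time within {tolerance} h of requested {requested_hour} h"
        )
    return idx
```

Without the tolerance check, a request for a lead hour missing from the file would still return the closest slice, which is the silent mismatch the review flags.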

Important Files Changed

| Filename | Overview |
| --- | --- |
| earth2studio/data/cams.py | New CAMS_FX data source; contains a P1 bug where fractional-hour lead times are silently truncated instead of rejected, and two P2 concerns around available() inconsistency and silent nearest-neighbor matching in _extract_field. |
| earth2studio/lexicon/cams.py | New CAMSGlobalLexicon; VOCAB entries are consistent, tco3 maps to nc_key "gtco3" which matches ECMWF conventions. Surface "z" and pressure-level "z*" both use nc_key "z" but live in separate datasets so no collision. |
| test/data/test_cams.py | New test file with mocked unit tests and slow/xfail integration tests; covers surface, pressure-level, mixed fetches, deduplication, cache behaviour, and lead-time validation. |
| earth2studio/lexicon/base.py | Adds 10 new CAMS variable descriptions to E2STUDIO_VOCAB; no conflicts with existing entries. |
| test/data/test_cds.py | Adds autouse fixture to point cdsapi at the CDS endpoint, preventing test modules from accidentally hitting the ADS endpoint. |
| test/lexicon/test_cams_lexicon.py | New lexicon tests covering all VOCAB entries and the four-part key format. |

Reviews (2): Last reviewed commit: "Fix"

NickGeneva and others added 3 commits April 1, 2026 21:21
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva closed this Apr 2, 2026
@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva reopened this Apr 2, 2026
@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva requested a review from pzharrington April 2, 2026 15:11
@NickGeneva NickGeneva merged commit 3ab2ada into NVIDIA:main Apr 2, 2026
8 checks passed
NickGeneva added a commit to NickGeneva/earth2studio that referenced this pull request Apr 2, 2026
* feat: add CAMS atmospheric composition data source and lexicon

Add DataSource (CAMS) and ForecastSource (CAMS_FX) for Copernicus
Atmosphere Monitoring Service data via the CDS API.

CAMS provides atmospheric composition / air quality data not currently
available in earth2studio — complementing the existing weather-focused
data sources (GFS, IFS, ERA5, etc.).

Data sources:
- CAMS: EU air quality analysis (0.1 deg, 9 pollutants, 10 height levels)
- CAMS_FX: EU + Global forecasts (EU 0.1 deg up to 96h, Global 0.4 deg up to 120h)

Variables include: dust, PM2.5, PM10, SO2, NO2, O3, CO, NH3, NO (EU surface
and multi-level), plus AOD and total column products (Global).

Lexicon: 101 entries covering all 9 pollutants at all 9 EU altitude levels
(50-5000m), plus surface and 11 global column/AOD variables.

Implementation follows upstream conventions:
- Protocol-compliant __call__ and async fetch methods
- Badges section for API doc filtering
- Time validation, available() classmethod
- Lazy CDS client initialization
- pathlib-based caching with SHA256 keys
- Tests with @pytest.mark.xfail for CI without CDS credentials

Requires: cdsapi (already in the 'data' optional dependency group)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — atomic download, tz-aware available(), coordinate-based lead-time selection

P1: Use atomic write-then-rename in _download_cams_netcdf to prevent
    corrupt partial files from being cached on interrupted downloads.

P1: Fix TypeError in CAMS.available() and CAMS_FX.available() when
    called with timezone-aware datetimes (strip tzinfo before comparing
    against naive min-time constants, matching _validate_cams_time).

P2: Replace positional lead-time indexing in _extract_field with
    coordinate-based selection via forecast_period dimension values,
    avoiding silent data misassignment if API reorders slices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CAMS and CAMS_FX to datasource documentation pages

Add CAMS to analysis datasources and CAMS_FX to forecast datasources.
Add region:europe and product:airquality to badge filters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address P2 review findings in CAMS data source

- Deduplicate api_vars via dict.fromkeys() to avoid duplicate variable
  names in CDS API requests (CAMS and CAMS_FX)
- Use dataset-specific min-time validation in CAMS_FX (EU: 2019-07-01,
  Global: 2015-01-01) instead of global minimum for all datasets
- Sort lead_hours in CAMS_FX cache key so identical lead times in
  different order produce the same cache hit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: decouple CAMS to Global-only forecast source

Per reviewer feedback (NickGeneva):
- Remove CAMS analysis class (no ML models need it currently)
- Remove EU dataset support from CAMS_FX (1:1 mapping with remote store)
- Reduce CAMSLexicon to 11 Global variables (AOD, column products)
- Update docs and tests accordingly
* Changelog
* Fix

---------

Co-authored-by: Claude Sonnet 4.5 <claude@anthropic.com>
Co-authored-by: Nicholas Geneva <5533524+NickGeneva@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>