Find and load data from the Brightband AI-ready mirror of the NOAA NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis

brightbandtech/nnja-ai


nnja-ai: multi-modal, AI-ready weather observations

This is the companion Python SDK to the Brightband AI-ready reprocessing of the NOAA NASA Joint Archive (NNJA). It is meant to serve as a helpful interface between a user and the underlying NNJA datasets (which currently consist of parquet files on GCS).

The V1 release of the NNJA-AI dataset and SDK represents a major increase in the availability of NNJA data, with ~50 TiB of observations available in parquet form, along with a data catalog and code examples in this SDK.

Background

The NNJA archive project is a curated archive of Earth system data from 1979 to present. It is a rich trove of observational data for use in AI weather modelling; however, the archival format in which the data is originally available (BUFR) is cumbersome to work with. In partnership with NOAA, Brightband is processing that data to make it more accessible to the community.

Data

NNJA datasets are organized by sensor/source (e.g. all-sky radiances from the GOES ABI). The list of all NNJA datasets can be found on the NNJA project page, while the subset currently available in the NNJA-AI archive can be found here or by exploring the data catalog (this subset will be expanding rapidly).

Getting Started

Official releases are available on PyPI, and you can install them directly into your working environment:

# Basic installation
pip install nnja-ai

# Installation with optional dependencies for complete functionality
pip install "nnja-ai[complete]>=1.0.0"

# Add to an existing `uv` workspace
uv add nnja-ai

# Launch an interactive IPython session with the package installed
uv run --with nnja-ai ipython

To install bleeding-edge versions of this package directly from the GitHub repository, you can use the following pip command:

pip install git+https://github.com/brightbandtech/nnja-ai.git
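After installing by any of the routes above, it can be handy to confirm which version ended up in the environment. The following is a minimal sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if absent.

    Hypothetical helper for illustration; it is not provided by nnja-ai.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "nnja-ai" is the distribution name used in the pip commands above.
found = installed_version("nnja-ai")
print("nnja-ai version:", found if found else "not installed")
```

If this prints "not installed", the package most likely went into a different environment than the one running the interpreter.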

You can find an example notebook here showing the basics of opening the data catalog, finding a dataset, subsetting, and finally loading the data to pandas.

To get started, you can open the data catalog like so:

from nnja_ai import DataCatalog
catalog = DataCatalog()
print("datasets in catalog:", catalog.list_datasets())
datasets in catalog:

['amsua-1bamua-NC021023',
 'atms-atms-NC021203',
 'mhs-1bmhs-NC021027',
 'cris-crisf4-NC021206',
 ...]
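Since the catalog listing is a plain list of names, ordinary Python is enough to narrow it down before opening a dataset. The sketch below works on the sample names shown above rather than on a live catalog, so it runs without network access. The helpers `find_datasets` and `group_by_prefix` are hypothetical, and treating the text before the first hyphen as the sensor/source is an assumption drawn from the sample names, not a documented naming rule.

```python
from collections import defaultdict

# Sample names copied from the listing above; in practice this list
# would come from catalog.list_datasets().
dataset_names = [
    "amsua-1bamua-NC021023",
    "atms-atms-NC021203",
    "mhs-1bmhs-NC021027",
    "cris-crisf4-NC021206",
]


def find_datasets(names, keyword):
    """Return the names that contain `keyword`, ignoring case (hypothetical helper)."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]


def group_by_prefix(names):
    """Group names by the text before the first hyphen (hypothetical helper).

    Assumption: the leading token identifies the sensor/source.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.split("-", 1)[0]].append(name)
    return dict(groups)


print(find_datasets(dataset_names, "atms"))
print(sorted(group_by_prefix(dataset_names)))
```

A name selected this way can then be used to look up the corresponding dataset in the catalog, as demonstrated in the example notebook.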

How to Cite

If you use this library or the Brightband reprocessed NNJA data, please cite it using the following DOI:

DOI

Additionally, please follow the citation guidance on the NNJA project page.

The NNJA-AI data is distributed with the same license as the original NNJA data, CC BY 4.0.
