nnja-ai: multi-modal, AI-ready weather observations

This is the companion Python SDK to the Brightband AI-ready reprocessing of the NOAA-NASA Joint Archive (NNJA). It serves as an interface between a user and the underlying NNJA datasets (which currently consist of Parquet files on GCS).

The V1 release of the NNJA-AI dataset and SDK represents a major increase in the availability of NNJA data, with ~50 TiB of observations made available in Parquet form along with a data catalog and code examples in this SDK.

Background

The NNJA project is a curated archive of Earth system data from 1979 to present. It is a rich trove of observations for use in AI weather modelling; however, the archival format in which the data is originally available (BUFR) is cumbersome to work with. In partnership with NOAA, Brightband is processing that data to make it more accessible to the community.

Data

NNJA datasets are organized by sensor/source (e.g. all-sky radiances from the GOES ABI). The list of all NNJA datasets can be found on the NNJA project page, while the subset that is currently available in the NNJA-AI archive can be found here or by exploring the data catalog (this subset is expanding rapidly).

Getting Started

Official releases are available on PyPI, and you can install them directly into your working environment:

# Basic installation
pip install nnja-ai

# Installation with optional dependencies for complete functionality
pip install "nnja-ai[complete]>=1.0.0"

# Add to an existing `uv` workspace
uv add nnja-ai

# Launch an interactive IPython session with the package installed
uv run --with nnja-ai ipython

To install bleeding-edge versions of this package directly from the GitHub repository, you can use the following pip command:

pip install git+https://github.com/brightbandtech/nnja-ai.git

You can find an example notebook here showing the basics of opening the data catalog, finding a dataset, subsetting, and finally loading the data to pandas.

To get started, you can open the data catalog like so:

from nnja_ai import DataCatalog
catalog = DataCatalog()
print("datasets in catalog:", catalog.list_datasets())

datasets in catalog:

['amsua-1bamua-NC021023',
 'atms-atms-NC021203',
 'mhs-1bmhs-NC021027',
 'cris-crisf4-NC021206',
 ...]
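Since datasets are organized by sensor/source, it can be handy to group the returned names by sensor before picking one. The following is a minimal sketch in plain Python, not part of the SDK: `group_by_sensor` is a hypothetical helper, and it assumes that names follow a `<sensor>-<source>-<code>` pattern, which is inferred only from the example output above. The names are hard-coded here so the snippet runs offline; in practice you would pass in `catalog.list_datasets()`.

```python
from collections import defaultdict

# Names copied from the example output above; a real catalog returns more.
dataset_names = [
    "amsua-1bamua-NC021023",
    "atms-atms-NC021203",
    "mhs-1bmhs-NC021027",
    "cris-crisf4-NC021206",
]


def group_by_sensor(names):
    """Group dataset names by the text before the first hyphen.

    Assumption: names look like '<sensor>-<source>-<code>'. This helper
    is hypothetical and is not provided by nnja_ai.
    """
    groups = defaultdict(list)
    for name in names:
        # partition() splits on the first hyphen only, so any further
        # hyphens in the name stay in the remainder.
        sensor, _, _rest = name.partition("-")
        groups[sensor].append(name)
    return dict(groups)


by_sensor = group_by_sensor(dataset_names)
print(sorted(by_sensor))
print(by_sensor["cris"])
```

If the naming pattern turns out to differ for some datasets, the grouping key is the only thing that needs to change.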

How to Cite

If you use this library or the Brightband reprocessed NNJA data, please cite it using the following DOI:

Additionally, please follow the citation guidance on the NNJA project page.

The NNJA-AI data is distributed with the same license as the original NNJA data, CC BY 4.0.
