
JOSS paper preparation #1249

Draft: danielfromearth wants to merge 15 commits into main from joss-paper

@danielfromearth (Contributor) commented Mar 5, 2026

Manuscript draft

This PR is intended for revisions and improvements to the manuscript draft being prepared for submission to the Journal of Open Source Software (JOSS).

Paper format: The manuscript is prepared as a Markdown (paper.md) file with references in a paper.bib file, following the JOSS formatting guidelines.

For a PDF preview: with Docker installed locally, a PDF preview of the draft manuscript can be generated by running the following from the earthaccess root directory (as described in the Docker section of the JOSS guidelines):

```shell
docker run --rm \
    --volume $PWD/paper:/data \
    --user $(id -u):$(id -g) \
    --env JOURNAL=joss \
    openjournals/inara
```

📚 Documentation preview 📚: https://earthaccess--1249.org.readthedocs.build/en/1249/

@github-actions bot commented Mar 5, 2026

Binder 👈 Launch a binder notebook on this branch for commit 16fb7b9

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 38cad6a

Binder 👈 Launch a binder notebook on this branch for commit 6af0701

Binder 👈 Launch a binder notebook on this branch for commit 767ad52

Binder 👈 Launch a binder notebook on this branch for commit dce192c

Binder 👈 Launch a binder notebook on this branch for commit ae74db7

Binder 👈 Launch a binder notebook on this branch for commit 05f7616

Binder 👈 Launch a binder notebook on this branch for commit bb5fd2f

Binder 👈 Launch a binder notebook on this branch for commit db3a969

Binder 👈 Launch a binder notebook on this branch for commit cf0f975

Binder 👈 Launch a binder notebook on this branch for commit 5852fa8

Binder 👈 Launch a binder notebook on this branch for commit 1b479c5

Binder 👈 Launch a binder notebook on this branch for commit 5029e59

Binder 👈 Launch a binder notebook on this branch for commit 2f8cab3

@jules32 (Contributor) left a comment

Hi! Great work on this Danny! A few commits and some suggestions to consider.


Several deliberate design decisions shape the library:

**Build on, don't replace, existing libraries.** `earthaccess` composes existing
Contributor

I know some decisions of what not to do were really important. If I remember correctly, it was features that were discussed then developed elsewhere rather than earthaccess. I'd suggest emphasizing this here, it's a big deal!

@danielfromearth (Contributor Author), Mar 6, 2026

I put in a blurb about this below ("Contribute upstream, don't accumulate") – could really use more eyes on it!

Member

We could symlink this into our docs!

@mfisher87 (Member)

> after the v1.0.0 release

I would say let's not wait. We've demonstrated impact and I think that matters more.

Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.

I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.

Co-authored-by: Matt Fisher <3608264+mfisher87@users.noreply.github.com>
@danielfromearth changed the title from "Joss paper" to "JOSS paper preparation" on Mar 6, 2026
@danielfromearth (Contributor Author)

> > after the v1.0.0 release
>
> I would say let's not wait. We've demonstrated impact and I think that matters more.
>
> Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.
>
> I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.

I'm fine with either, too. I also think the decision could be put on hold until one of the two things – (i) co-author reviews/revisions, (ii) development for v1.0.0 – is completely ready to go.

@itcarroll (Contributor) left a comment

Thanks @danielfromearth for all the excellent work on this!

# Acknowledgements

The development of `earthaccess` was supported by NASA's Earth Science Data Systems
(ESDS) program through the Openscapes project (NASA award **______**, PIs Julia
Contributor

don't forget to dig up the award number

Contributor Author

@jules32, would you be able to include this?

Co-authored-by: Ian Carroll <carroll.ian@gmail.com>
}

@software{virtualizarr,
title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {xarray} syntax},
Contributor

Suggested change
title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {xarray} syntax},
title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {Xarray} syntax},

Contributor Author

I think the capitalization is preferred as "xarray" (all lowercase) unless it's at the start of a sentence, based on this discussion and what I see in the xarray docs.

# Statement of need

NASA's Earth science data archive is one of the largest and most diverse collections of
Earth observation data in the world, used by tens of thousands of researchers, educators,
Contributor

Suggested change
Earth observation data in the world, used by tens of thousands of researchers, educators,
Earth observation data in the world, used by tens of millions of researchers, educators,

ESDS metrics, accessed 25 Mar 2026

Contributor Author

cool! seems to me like the reference should be added too.

Contributor

I was wondering about metrics as well; thanks for adding this reference, @JessicaS11. Though, is it more appropriate to say "... used by over ten million researchers..." instead of "tens of millions", since I only see 10 million listed on that page?

Comment on lines +144 to +147
`earthaccess` was created to address this gap: it provides uniform access to NASA
Earthdata regardless of data storage location, enabling researchers to focus on science
rather than data engineering.

@JessicaS11 (Contributor), Mar 25, 2026

Suggested change
`earthaccess` was created to address this gap: it provides uniform access to NASA
Earthdata regardless of data storage location, enabling researchers to focus on science
rather than data engineering.
`earthaccess` was created to address this gap: it provides uniform access to NASA
Earthdata regardless of data storage location and handles authentication, credentials, and tokening behind the scenes, enabling researchers to focus on science rather than data engineering.

Not sure what would be better ("data access and authentication"? "API engineering"?), but I'm wanting to change the "data engineering" at the end of the sentence. It might be pedantic, but to me data analysis doesn't necessarily exclude data engineering in the modern world (though I think that's the ideal we're working towards).

Contributor Author

maybe something like "...enabling researchers to focus more on scientific interpretation and discovery."?

Comment on lines +148 to +150
The target audience includes Earth scientists, remote sensing researchers, climate modelers,
hydrologists, ecologists, and any researcher, application developer, or educator who needs
to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a
Contributor

Suggested change
The target audience includes Earth scientists, remote sensing researchers, climate modelers,
hydrologists, ecologists, and any researcher, application developer, or educator who needs
to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a
The target audience includes Earth scientists, remote sensing researchers, modelers, and any researcher, data user, application developer, or educator who needs
to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a

We could add a more specific list of Earth scientists (hydrologists, ecologists, oceanographers, cryospheric scientists, climate modelers, some other broad categories, etc.) as another sentence (not sure how close to the word limit we are).

Contributor Author

sure, that sounds fine to me. How about flipping the order and expanding it to something like...

Suggested change
The target audience includes Earth scientists, remote sensing researchers, climate modelers,
hydrologists, ecologists, and any researcher, application developer, or educator who needs
to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a
The target audience spans the breadth of Earth system science, including atmospheric scientists, oceanographers, cryospheric researchers, hydrologists, ecologists, biogeochemists, land surface modelers, air quality researchers, natural hazards researchers, and agricultural scientists. It also serves remote sensing researchers, application developers, natural resource and environmental decision makers, and educators who work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a

Contributor

I know the list can go on and on, but I think another important audience to emphasize is people working in operational weather and natural disaster monitoring.

supporting environment variables, `.netrc` files, and interactive prompts. Once
authenticated, the library creates HTTP sessions that correctly handle NASA's
cross-domain redirects and retrieves temporary AWS S3 credentials for in-region
cloud access.
Contributor

Do the tokens still expire after an hour? And if so, I think earthaccess will renew them? If that's the case, I'd suggest adding this to the last sentence ("retrieves and renews temporary AWS...").

Contributor Author

That sounds good to me!
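For readers less familiar with this flow, the two behaviors discussed in this thread (falling back across credential sources, and renewing short-lived S3 credentials before they lapse) can be sketched as below. This is a hypothetical illustration, not `earthaccess` code: `resolve_credentials`, `TemporaryCredentials`, `RenewingCredentialCache`, and the five-minute renewal margin are placeholders, the `fetch` callable (which would call a DAAC's temporary-credentials endpoint) is left abstract, and the interactive-prompt fallback is omitted so the sketch stays non-interactive. The `EARTHDATA_USERNAME`/`EARTHDATA_PASSWORD` variable names and the `urs.earthdata.nasa.gov` `.netrc` host are assumed Earthdata Login conventions and should be checked against the library docs.

```python
import netrc
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical sketch for illustration; not the earthaccess implementation.
EDL_HOST = "urs.earthdata.nasa.gov"  # host name looked up in the .netrc file (assumed)


def resolve_credentials(
    env: Mapping[str, str] = os.environ,
    netrc_path: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (username, password) from the first source that provides both."""
    # 1. Environment variables (names assumed; check the library docs).
    user, password = env.get("EARTHDATA_USERNAME"), env.get("EARTHDATA_PASSWORD")
    if user and password:
        return user, password
    # 2. A .netrc entry for the Earthdata Login host (~/.netrc when path is None).
    try:
        entry = netrc.netrc(netrc_path).authenticators(EDL_HOST)
    except (OSError, netrc.NetrcParseError):
        entry = None  # missing or malformed .netrc: fall through
    if entry is not None:
        login, _account, netrc_password = entry
        if login and netrc_password:
            return login, netrc_password
    # 3. An interactive prompt would come next; omitted to keep this non-interactive.
    return None


@dataclass
class TemporaryCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # POSIX timestamp in seconds


class RenewingCredentialCache:
    """Hand out cached temporary credentials, fetching new ones near expiry."""

    def __init__(
        self,
        fetch: Callable[[], TemporaryCredentials],
        margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # e.g. a request to a DAAC's temporary S3 credentials endpoint
        self._margin_s = margin_s  # renew this many seconds before expiry
        self._clock = clock  # injectable so the renewal logic can be tested
        self._creds: Optional[TemporaryCredentials] = None

    def get(self) -> TemporaryCredentials:
        now = self._clock()
        if self._creds is None or now >= self._creds.expires_at - self._margin_s:
            self._creds = self._fetch()
        return self._creds
```

Checking expiry lazily on each `get()` keeps the sketch free of background threads, and the margin avoids handing out credentials that would expire in the middle of a long read.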

3. **Access**: Detects at runtime whether the process is running within AWS `us-west-2`
and automatically selects the optimal access path -- direct S3 reads for in-region
access or HTTPS downloads otherwise. Files can be opened as `fsspec`-compatible
file-like objects for streaming into libraries such as xarray [@xarray], or
Contributor

Suggested change
file-like objects for streaming into libraries such as xarray [@xarray], or
file-like objects for streaming into libraries such as Xarray [@xarray], or

Contributor Author

See above comment about "xarray" capitalization.
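The runtime decision in step 3 can be illustrated with a small, hypothetical sketch. It assumes the region is discovered through the AWS EC2 instance metadata service (IMDSv2: a `PUT` to `/latest/api/token`, then a `GET` of `/latest/meta-data/placement/region` with that token) and treats any failure to reach the service as "not in AWS". The function names are illustrative and do not describe how `earthaccess` actually performs its check.

```python
import http.client
import urllib.request
from typing import Callable, Optional

# Hypothetical sketch; not how earthaccess implements its check.
IMDS_BASE = "http://169.254.169.254/latest"  # EC2 instance metadata service
DATA_REGION = "us-west-2"  # region hosting NASA Earthdata Cloud


def detect_region(timeout_s: float = 1.0) -> Optional[str]:
    """Best-effort AWS region lookup via IMDSv2; returns None when not on EC2."""
    try:
        token_request = urllib.request.Request(
            f"{IMDS_BASE}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_request, timeout=timeout_s) as resp:
            token = resp.read().decode()
        region_request = urllib.request.Request(
            f"{IMDS_BASE}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_request, timeout=timeout_s) as resp:
            return resp.read().decode().strip()
    except (OSError, http.client.HTTPException):
        # Unreachable or misbehaving metadata service: assume we are outside AWS.
        return None


def choose_access_path(region: Optional[str]) -> str:
    """Direct S3 reads only when co-located with the data; HTTPS otherwise."""
    return "s3" if region == DATA_REGION else "https"


def select_access_path(probe: Callable[[], Optional[str]] = detect_region) -> str:
    """Probe the runtime once and pick an access path; the probe is injectable."""
    return choose_access_path(probe())
```

The short timeout matters: outside EC2 the link-local metadata address typically does not answer, so the probe should fail fast rather than stall the first data access. Keeping `choose_access_path` pure also means the calling code stays the same inside or outside `us-west-2`.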

hard-coded URLs and custom authentication logic.

**Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS,
University of New Hampshire), private industry (Coiled, Development Seed),
@JessicaS11 (Contributor), Mar 25, 2026

Suggested change
University of New Hampshire), private industry (Coiled, Development Seed),
Goddard, University of New Hampshire, University of Maryland), private industry (Coiled, Development Seed),

Contributor Author

@JessicaS11 Perhaps we need to be more specific than "Goddard" here? GES DISC is included in the DAAC listing, so is it the Ocean Ecology Lab that's not yet represented?

@JessicaS11 (Contributor)

@danielfromearth Love this - thank you so much for putting it together!

Sorry it looks like a lot of edits - most of them are pretty minor (grammatical or editorial), with a few suggestions for the text. All that said, none of them are non-starters for me.

danielfromearth and others added 4 commits March 26, 2026 10:40
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
@asteiker (Contributor) left a comment

Such great work @danielfromearth - Thank you for spearheading this!


`earthaccess` is an open-source Python library that simplifies the discovery, authentication,
and access of NASA Earth science data. NASA's Earth Observing System Data and Information System
(EOSDIS) distributes over 100 petabytes of data across 12 Distributed Active Archive Centers
Contributor

Suggested change
(EOSDIS) distributes over 100 petabytes of data across 12 Distributed Active Archive Centers
(EOSDIS) distributes over 100 petabytes of data across 11 Distributed Active Archive Centers

Earthdata Login (EDL) service [@nasa_edl], exposes NASA's Common Metadata Repository
(CMR) [@nasa_cmr] for data discovery, and transparently manages data retrieval via
either HTTPS download or direct S3 access when running in the Amazon Web Services (AWS)
`us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports
Contributor

Noting for now, but this may be described below: A key feature worth emphasizing is the fact that the code does not need to be modified depending on compute location, which was a notable Earthdata Cloud pain point based on direct user engagement.
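On the discovery side, a CMR granule search is an HTTP `GET` with query parameters, which `earthaccess` wraps behind its search functions. The hypothetical helper below only builds such a URL and sends no request; it uses the CMR search parameters `short_name`, `temporal`, `bounding_box` (west, south, east, north) and `page_size`, and the example values are placeholders rather than a recommended query.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical helper; only builds a URL, sends no request.
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def cmr_granule_query(
    short_name: str,
    temporal: Optional[Tuple[str, str]] = None,
    bounding_box: Optional[Tuple[float, float, float, float]] = None,
    page_size: int = 10,
) -> str:
    """Build a CMR granule-search URL from a few common search parameters."""
    params = [("short_name", short_name), ("page_size", str(page_size))]
    if temporal is not None:
        # ISO 8601 start and end times, comma-separated
        params.append(("temporal", f"{temporal[0]},{temporal[1]}"))
    if bounding_box is not None:
        # west, south, east, north in decimal degrees
        params.append(("bounding_box", ",".join(str(v) for v in bounding_box)))
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"
```

For example, `cmr_granule_query("ATL06", bounding_box=(-50, 60, -40, 70))` yields a URL whose `bounding_box` parameter decodes to `-50,60,-40,70`; `urlencode` percent-encodes the commas and colons, which decode back to the original values.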

Earthdata Login (EDL) service [@nasa_edl], exposes NASA's Common Metadata Repository
(CMR) [@nasa_cmr] for data discovery, and transparently manages data retrieval via
either HTTPS download or direct S3 access when running in the Amazon Web Services (AWS)
`us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports
Contributor

Suggested change
`us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports
`us-west-2` region -- where data within NASA's Earthdata Cloud reside. `earthaccess` also supports

`us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports
streaming data directly into analysis-ready formats using `fsspec` [@fsspec] and
constructing virtual Zarr stores from archival formats (e.g., HDF5 and NetCDF4) using
DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk].
Contributor

Suggested change
DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk].
DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk], enabling drastic improvements in access performance.
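The virtual-Zarr idea in this excerpt can be made concrete with a toy example. A virtual store copies no data: each Zarr key maps either to inline metadata or to a byte range inside the original archival file, so a reader fetches only the bytes it needs. The sketch assumes the kerchunk-style reference layout, in which a chunk entry is `[url, offset, length]` and metadata keys such as `.zarray` hold JSON strings; it ignores other entry forms (for example whole-file references or `base64:` inline values) and does not show how DMR++ is parsed. `resolve_key` and `read_local_range` are hypothetical names.

```python
from typing import Callable, Dict, List, Union

# Hypothetical sketch; real readers use fsspec's reference filesystem instead.
RefValue = Union[str, List]  # inline JSON text, or [url, offset, length]


def resolve_key(
    refs: Dict[str, RefValue],
    key: str,
    read_range: Callable[[str, int, int], bytes],
) -> bytes:
    """Return the bytes for one Zarr key from a kerchunk-style "refs" mapping."""
    value = refs[key]
    if isinstance(value, str):
        # Metadata such as ".zarray" is stored inline as a JSON string.
        return value.encode()
    url, offset, length = value
    # Chunk data stays in the archival file; fetch only its byte range.
    return read_range(url, offset, length)


def read_local_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` from a local file (stand-in for a range GET)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

In practice the reference mapping would be handed to `fsspec`'s reference filesystem and opened with Zarr or xarray rather than resolved by hand; the local-file reader here simply stands in for an HTTPS or S3 range request.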


**Peer-reviewed publications.** `earthaccess` has been used in published research,
including studies on multi-sensor drought observations in forested environments
[@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025].
Contributor

How did you perform this search, @danielfromearth ? I'm interested in a better way to surface research outcomes utilizing earthaccess (see #1216) so maybe your process could be applied somehow here.

been installed and used in cloud-hosted Jupyter environments provided by NASA and
partner organizations. As one example of downstream adoption, icepack -- a finite
element library for ice sheet and glacier modeling [@shapero2021] -- replaced its
hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating
Contributor

Suggested change
hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating
hand-written NSIDC DAAC data-fetching routines with `earthaccess` calls, eliminating

hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating
hard-coded URLs and custom authentication logic. `earthaccess` has replaced tens of lines of code across countless NASA data access tutorials, increasing user accessibility and reducing the amount of "getting started" overhead.

**Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS,
Contributor

Suggested change
**Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS,
**Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC DAAC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS,

and independent open-source contributors. This breadth reflects both the library's
relevance across domains and the health of its contributor community.

**Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA
Contributor

Suggested change
**Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA
**Integration with the NASA ecosystem.** `earthaccess` is featured in the official NASA

relevance across domains and the health of its contributor community.

**Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA
Earthdata tutorials, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of
Contributor

Suggested change
Earthdata tutorials, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of
Earthdata tools catalog, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, and other NASA Earthdata tutorials, and has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of
