jules32 left a comment
Hi! Great work on this, Danny! I've left a few commits and some suggestions to consider.
| Several deliberate design decisions shape the library: | ||
| **Build on, don't replace, existing libraries.** `earthaccess` composes existing |
I know some decisions about what not to do were really important. If I remember correctly, these were features that were discussed and then developed elsewhere rather than in earthaccess. I'd suggest emphasizing this here; it's a big deal!
I put in a blurb about this below ("Contribute upstream, don't accumulate") – could really use more eyes on it!
Co-authored-by: Julia Stewart Lowndes <julia@openscapes.org>
We could symlink this in to our docs!
I would say let's not wait. We've demonstrated impact and I think that matters more. Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes. I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.
Co-authored-by: Matt Fisher <3608264+mfisher87@users.noreply.github.com>
I'm fine with either too. I also think the decision could be on hold until one of the two things – (i) co-author reviews/revisions, (ii) development for v1.0.0 – is completely ready-to-go.
itcarroll left a comment
Thanks @danielfromearth for all the excellent work on this!
| # Acknowledgements | ||
| The development of `earthaccess` was supported by NASA's Earth Science Data Systems | ||
| (ESDS) program through the Openscapes project (NASA award **______**, PIs Julia |
don't forget to dig up the award number
@jules32, would you be able to include this?
Co-authored-by: Ian Carroll <carroll.ian@gmail.com>
| } | ||
| @software{virtualizarr, | ||
| title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {xarray} syntax}, |
| title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {xarray} syntax}, | |
| title = {{VirtualiZarr}: Create virtual {Zarr} stores from archival data using {Xarray} syntax}, |
I think the capitalization is preferred as "xarray" (all lowercase) unless it's at the start of a sentence, based on this discussion and what I see in the xarray docs.
| # Statement of need | ||
| NASA's Earth science data archive is one of the largest and most diverse collections of | ||
| Earth observation data in the world, used by tens of thousands of researchers, educators, |
| Earth observation data in the world, used by tens of thousands of researchers, educators, | |
| Earth observation data in the world, used by tens of millions of researchers, educators, |
ESDS metrics, accessed 25 Mar 2026
cool! seems to me like the reference should be added too.
I was wondering about metrics as well, thanks for adding this reference @JessicaS11. Though, is it more appropriate to say "... used by over ten million researchers..." instead of tens of millions, since I only see 10 mil listed on that page?
| `earthaccess` was created to address this gap: it provides uniform access to NASA | ||
| Earthdata regardless of data storage location, enabling researchers to focus on science | ||
| rather than data engineering. | ||
| `earthaccess` was created to address this gap: it provides uniform access to NASA | |
| Earthdata regardless of data storage location, enabling researchers to focus on science | |
| rather than data engineering. | |
| `earthaccess` was created to address this gap: it provides uniform access to NASA | |
| Earthdata regardless of data storage location and handles authentication, credentials, and tokening behind the scenes, enabling researchers to focus on science rather than data engineering. |
Not sure what would be better ("data access and authentication"? "API engineering"?), but I'm wanting to change the "data engineering" at the end of the sentence. It might be pedantic, but to me data analysis doesn't necessarily exclude data engineering in the modern world (though I think that's the ideal we're working towards).
maybe something like "...enabling researchers to focus more on scientific interpretation and discovery."?
| The target audience includes Earth scientists, remote sensing researchers, climate modelers, | ||
| hydrologists, ecologists, and any researcher, application developer, or educator who needs | ||
| to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a |
| The target audience includes Earth scientists, remote sensing researchers, climate modelers, | |
| hydrologists, ecologists, and any researcher, application developer, or educator who needs | |
| to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a | |
| The target audience includes Earth scientists, remote sensing researchers, modelers, and any researcher, data user, application developer, or educator who needs | |
| to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a |
We could add a more specific list of Earth scientists (hydrologists, ecologists, oceanographers, cryospheric scientists, climate modelers, some other broad categories, etc.) as another sentence (not sure how close to the word limit we are).
sure, that sounds fine to me. How about flipping the order and expanding it to something like...
| The target audience includes Earth scientists, remote sensing researchers, climate modelers, | |
| hydrologists, ecologists, and any researcher, application developer, or educator who needs | |
| to work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a | |
| The target audience spans the breadth of Earth system science, including atmospheric scientists, oceanographers, cryospheric researchers, hydrologists, ecologists, biogeochemists, land surface modelers, air quality researchers, natural hazards researchers, and agricultural scientists. It also serves remote sensing researchers, application developers, natural resource and environmental decision makers, and educators who work with NASA Earth science data. The library is designed to be approachable for those new to Python -- with a |
I know the list can go on and on, but I think another important audience to emphasize is those working in operational weather and natural disaster monitoring.
| supporting environment variables, `.netrc` files, and interactive prompts. Once | ||
| authenticated, the library creates HTTP sessions that correctly handle NASA's | ||
| cross-domain redirects and retrieves temporary AWS S3 credentials for in-region | ||
| cloud access. |
Do the tokens still expire after an hour? And if so, I think earthaccess will renew them? If that's the case, I'd suggest adding this to the last sentence ("retrieves and renews temporary AWS...").
That sounds good to me!
| 3. **Access**: Detects at runtime whether the process is running within AWS `us-west-2` | ||
| and automatically selects the optimal access path -- direct S3 reads for in-region | ||
| access or HTTPS downloads otherwise. Files can be opened as `fsspec`-compatible | ||
| file-like objects for streaming into libraries such as xarray [@xarray], or |
| file-like objects for streaming into libraries such as xarray [@xarray], or | |
| file-like objects for streaming into libraries such as Xarray [@xarray], or |
See above comment about "xarray" capitalization.
| hard-coded URLs and custom authentication logic. | ||
| **Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS, | ||
| University of New Hampshire), private industry (Coiled, Development Seed), |
| University of New Hampshire), private industry (Coiled, Development Seed), | |
| Goddard, University of New Hampshire, University of Maryland), private industry (Coiled, Development Seed), |
@JessicaS11 Perhaps we need to be more specific than "Goddard" here? GES DISC is included in the DAAC listing, so is it the Ocean Ecology Lab that's not yet represented?
@danielfromearth Love this - thank you so much for putting it together! Sorry it looks like a lot of edits - most of them are pretty minor (grammatical or editorial), with a few suggestions for the text. All that said, none of them are non-starters for me.
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
asteiker left a comment
Such great work @danielfromearth - Thank you for spearheading this!
| `earthaccess` is an open-source Python library that simplifies the discovery, authentication, | ||
| and access of NASA Earth science data. NASA's Earth Observing System Data and Information System | ||
| (EOSDIS) distributes over 100 petabytes of data across 12 Distributed Active Archive Centers |
| (EOSDIS) distributes over 100 petabytes of data across 12 Distributed Active Archive Centers | |
| (EOSDIS) distributes over 100 petabytes of data across 11 Distributed Active Archive Centers |
| Earthdata Login (EDL) service [@nasa_edl], exposes NASA's Common Metadata Repository | ||
| (CMR) [@nasa_cmr] for data discovery, and transparently manages data retrieval via | ||
| either HTTPS download or direct S3 access when running in the Amazon Web Services (AWS) | ||
| `us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports |
Noting for now, but this may be described below: A key feature worth emphasizing is the fact that the code does not need to be modified depending on compute location, which was a notable Earthdata Cloud pain point based on direct user engagement.
| Earthdata Login (EDL) service [@nasa_edl], exposes NASA's Common Metadata Repository | ||
| (CMR) [@nasa_cmr] for data discovery, and transparently manages data retrieval via | ||
| either HTTPS download or direct S3 access when running in the Amazon Web Services (AWS) | ||
| `us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports |
| `us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports | |
| `us-west-2` region -- where data within NASA's Earthdata Cloud reside. `earthaccess` also supports |
| `us-west-2` region -- where NASA's cloud-hosted data resides. `earthaccess` also supports | ||
| streaming data directly into analysis-ready formats using `fsspec` [@fsspec] and | ||
| constructing virtual Zarr stores from archival formats (e.g., HDF5 and NetCDF4) using | ||
| DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk]. |
| DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk]. | |
| DMR++ metadata [@dmrpp], powered by VirtualiZarr [@virtualizarr] and kerchunk [@kerchunk], enabling drastic improvements in access performance. |
| **Peer-reviewed publications.** `earthaccess` has been used in published research, | ||
| including studies on multi-sensor drought observations in forested environments | ||
| [@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025]. |
How did you perform this search, @danielfromearth ? I'm interested in a better way to surface research outcomes utilizing earthaccess (see #1216) so maybe your process could be applied somehow here.
| been installed and used in cloud-hosted Jupyter environments provided by NASA and | ||
| partner organizations. As one example of downstream adoption, icepack -- a finite | ||
| element library for ice sheet and glacier modeling [@shapero2021] -- replaced its | ||
| hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating |
| hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating | |
| hand-written NSIDC DAAC data-fetching routines with `earthaccess` calls, eliminating |
| hand-written NSIDC data-fetching routines with `earthaccess` calls, eliminating | ||
| hard-coded URLs and custom authentication logic. `earthaccess` has replaced tens of lines of code across countless NASA data access tutorials, increasing user accessibility and reducing the amount of "getting started" overhead. | ||
| **Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS, |
| **Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS, | |
| **Multi-institutional development.** Contributors span NASA's Distributed Active Archive Centers (DAACs) — including ASDC, ASF, GES DISC, LP DAAC, NSIDC DAAC, OB.DAAC, ORNL DAAC, and PO.DAAC — as well as other federal and academic institutions (USGS, |
| and independent open-source contributors. This breadth reflects both the library's | ||
| relevance across domains and the health of its contributor community. | ||
| **Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA |
| **Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA | |
| **Integration with the NASA ecosystem.** `earthaccess` is featured in the official NASA |
| relevance across domains and the health of its contributor community. | ||
| **Integration with the NASA ecosystem.** `earthaccess` is featured in official NASA | ||
| Earthdata tutorials, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of |
| Earthdata tutorials, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of | |
| Earthdata tools catalog, including <https://www.earthdata.nasa.gov/data/tools/earthaccess>, and other NASA Earthdata tutorials, and has been presented at multiple large professional meetings (including several American Geophysical Union Annual and Earth System Information Partnership (ESIP) meetings), and was the subject of |
Manuscript draft
This PR is intended for revisions and improvements to the manuscript draft being prepared for submission to the Journal of Open Source Software (JOSS).
Paper format: The manuscript is prepared as a Markdown (`paper.md`) file with references in a `paper.bib` file, following the JOSS formatting guidelines.
For a PDF preview: With Docker installed locally, a PDF preview of the draft manuscript can be generated by running the following from the earthaccess root directory (as described in the Docker section of the JOSS guidelines):
📚 Documentation preview 📚: https://earthaccess--1249.org.readthedocs.build/en/1249/