
H5Coro testing and integration#678

Draft
zachghiaccio wants to merge 10 commits into icesat2py:development from zachghiaccio:development

Conversation

@zachghiaccio
Contributor

Opening this to start discussion and development on the h5coro integration piece. Based on discussions with Rachel and Eric, we have several options for implementing h5coro in icepyx. However, it might require significant restructuring of the Read module to support the h5coro backend and (potentially) Xarray DataTrees.

JessicaS11 and others added 5 commits September 25, 2023 11:45
* Adding argo search and download script

* Create get_argo.py

Download the 'classic' argo data with physical variables only

* begin implementing argo dataset

* 1st draft implementing argo dataset

* implement search_data for physical argo

* doctests and general cleanup for physical argo query

* beginning of BGC Argo download

* parse BGC profiles into DF

* plan to query BGC profiles

* validate BGC param input function

* order BGC params in order in which they should be queried

* fix bug in parse_into_df() - init blank df to take in union of params from all profiles

* identify profiles from initial API request containing all required params

* creates df with only profiles that contain all user specified params
Need to dload additional params

* modified to populate prof df by querying individual profiles

* finished up BGC argo download!

* assert bounding box type in Argo init, begin framework for unit tests


* need to confirm spatial extent is bbox

* begin test case for available profiles

* add tests for argo.py

* add typing, add example json, and use it to test parsing

* update argo to submit successful api request (update keys and values submitted)

* first pass at porting argo over to metadata+per profile download (WIP)

* basic working argo script

* simplify parameter validation (ordered list no longer needed)

* add option to delete existing data before new download

* continue cleaning up argo.py

* fix download_by_profile to properly store all downloaded data

* remove old get_argo.py script

* remove _filter_profiles function in favor of submitting data kwarg in request

* start filling in docstrings

* clean up nearly duplicate functions

* add more docstrings

* get a few minimal argo tests working

* add bgc argo params. begin adding merge for second download runs

* some changes

* WIP test commit to see if can push to GH

* WIP handling argo merge issue

* update profile to df to return df and move merging to get_dataframe

* merge profiles with existing df

* clean up docstrings and code

* add test_argo.py

* add prelim test case for adding to Argo df

* remove sandbox files

* remove bgc argo test file

* update variables notebook from development

* simplify import statements

* quickfix for granules error

* draft subpage on available QUEST datasets

* small reference fix in text

* add reference to top of .rst file

* test argo df merge

* add functionality to Quest class to pass search criteria to all datasets

* add functionality to Quest class to pass search criteria to all datasets

* update dataset docstrings; reorder argo.py to match

* implement quest search+download for IS2

* move spatial and temporal properties from query to genquery

* add query docstring test for cycles,tracks to test file

* add quest test module

* standardize print outputs for quest search and download; is2 download needs auth updates

* remove extra files from this branch

* comment out argo portions of quest for PR

* remove argo-branch-only init file

* remove argo script from branch

* remove argo test file from branch

* comment out another line of argo stuff

* Update quest.py

Added Docstrings to functions within quest.py and edited the primary docstring for the QUEST class here.

Note I did not add Docstrings to the implicit __self__ function.

* Update test_quest.py

Added comments (not Docstrings) to test functions

* Update dataset.py

Minor edits to the doc strings

* Update quest.py

Edited docstrings

* catch error with downloading datasets in Quest; template test case for multi dataset query

---------

Co-authored-by: Kelsey Bisson <48059682+kelseybisson@users.noreply.github.com>
Co-authored-by: Romina <romina.piunno@gmail.com>
Co-authored-by: zachghiaccio <zhfair@umich.edu>
Co-authored-by: Zach Fair <48361714+zachghiaccio@users.noreply.github.com>
@github-actions

github-actions bot commented Apr 30, 2025

Binder 👈 Launch a binder notebook on this branch for commit 245fcac

I will automatically update this comment whenever this PR is modified


@rwegener2
Contributor

Exciting to see the progress, thanks for posting @zachghiaccio!

One question - does the Quest module have to change in order to use h5coro? I see several modified Quest files, which surprised me because I wouldn't have thought they'd need to change.

I am noticing that there is one commit on this branch from September 2023. Maybe that is the source of the Quest file modifications? (and perhaps also the merge conflicts?)

@zachghiaccio
Contributor Author

Yeah I'm a bit confused on those Quest changes as well, because I don't think there's anything in Quest right now that requires h5coro. The changes in said commit are already implemented into icepyx, so I must have forgotten to pull those changes.

@rwegener2
Contributor

rwegener2 commented May 7, 2025

Hey everyone! I played around with this a bit this afternoon. @zachghiaccio I removed everything from this commit that was related to quest, since it sounds like the h5coro work isn't related to the quest activity.

In terms of h5coro I'm starting to develop a short list of things that need to be done to allow for h5coro to be added to icepyx:

  1. specify `engine="h5coro"` in the `xr.open_dataset()` calls (@zachghiaccio did this one already)
  2. h5coro doesn't want to read an fsspec object; it wants a string s3 filepath (without the `s3://` at the beginning)
  3. because of the previous point, h5coro also needs the s3 access token to be specified using the `credentials` argument
  4. implement the `phony_dims` kwarg in h5coro
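A rough sketch of what points 2 and 3 might look like in code. The helper name `to_h5coro_path` and the shape of the credentials dict are assumptions for illustration, not icepyx or h5coro API; h5coro's actual argument names may differ.

```python
def to_h5coro_path(url: str) -> str:
    """Strip the 's3://' scheme, since h5coro wants a bare bucket/key path."""
    prefix = "s3://"
    return url[len(prefix):] if url.startswith(prefix) else url

# Temporary S3 credentials (the key names here are an assumption; check
# what h5coro's `credentials` argument actually expects).
credentials = {
    "aws_access_key_id": "...",
    "aws_secret_access_key": "...",
    "aws_session_token": "...",
}

# The open call might then look like (not executed here):
# ds = xr.open_dataset(to_h5coro_path(url), engine="h5coro",
#                      credentials=credentials)

print(to_h5coro_path("s3://some-bucket/ATL03/granule.h5"))
```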

The first 3 pieces above can be completed for the moment pretty easily (I made these changes in a fast way in the most recent commit). The bigger bump is point no. 4. Icepyx relies on the `phony_dims="access"` kwarg of the `xr.open_dataset()` function, which isn't implemented in h5coro. I don't understand well what this kwarg does, but it appears to be related to combining datasets of differing dimensions. If you try to read data in icepyx without that kwarg, an error is returned:

ValueError: conflicting sizes for dimension 'crossing_time': length 1 on 'cycle_number' and length 15 on {'crossing_time': 'bounding_polygon_lat1'}
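To make the error above concrete, here is a stdlib-only illustration (not xarray internals) of the check that fails: two variables claim the same dimension name with different lengths. Backends that support `phony_dims` (e.g. h5netcdf) assign placeholder names to unlabeled HDF5 dimensions so clashes like this can be avoided.

```python
def check_dim_sizes(variables: dict) -> dict:
    """Raise ValueError on conflicting dimension sizes, mirroring the
    kind of validation xarray performs when building a Dataset.

    `variables` maps variable name -> {dimension name: length}.
    """
    sizes = {}
    for var_name, dims in variables.items():
        for dim, length in dims.items():
            if dim in sizes and sizes[dim] != length:
                raise ValueError(
                    f"conflicting sizes for dimension {dim!r}: "
                    f"length {length} on {var_name!r} "
                    f"and length {sizes[dim]} elsewhere"
                )
            sizes[dim] = length
    return sizes

# Mirrors the clash from the traceback above (variable names taken from it):
vars_ = {
    "cycle_number": {"crossing_time": 1},
    "bounding_polygon_lat1": {"crossing_time": 15},
}
try:
    check_dim_sizes(vars_)
except ValueError as err:
    print(err)
```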

While I didn't get all the way to the end of implementing h5coro in icepyx, I did notice a recurring theme:

the amount that h5coro can speed up data reading in icepyx is limited by the number of times that icepyx splits up the reads

#676 goes into more depth on some profiling, which found that reading a single beam of data creates three separate read requests (one per group). As a parallelization library, h5coro would provide the most benefit if we could consolidate our reads into one and let h5coro do the work of splitting that up. These are all interesting points to consider in the broader conversation about what the Read module in icepyx should look like.
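One hypothetical way to consolidate the reads described above: collect every dataset path for a granule up front and hand the whole list to the reader in a single call, letting a parallel reader like h5coro split the work itself. The names here (`build_read_list`, the beam/group/variable lists) are illustrative, not icepyx API.

```python
def build_read_list(beams, groups, variables):
    """Flatten beams x groups x variables into one list of HDF5 dataset
    paths, so a single reader call can fetch them all."""
    return [
        f"{beam}/{group}/{var}"
        for beam in beams
        for group in groups
        for var in variables
    ]

paths = build_read_list(
    beams=["gt1l"],
    groups=["heights", "geolocation", "geophys_corr"],
    variables=["h_ph", "delta_time"],
)
print(len(paths))  # 1 beam x 3 groups x 2 variables = 6 datasets, one request
```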

@zachghiaccio
Contributor Author

Sorry for the delayed response, but thanks for looking into all of this, @rwegener2! I'm juggling two science team meetings today, but I'll finally have a bit of time this afternoon to check out your additions, and maybe look into the `phony_dims` kwarg, since I want to understand that better as well.
