NGWPC PI-7 PR#2

Open
dylanlee wants to merge 75 commits into NOAA-OWP:main from NGWPC:main
Conversation

@dylanlee
This PR adds the code from NGWPC's hand-index repository to OWP. A HAND index catalogs a directory of HAND outputs so that they can be queried spatially and efficiently. The repository contains a script to create HAND indexes, an example query, and a schema describing the files needed to inundate HAND REMs for the FIM100 version of HAND.

dylanlee and others added 30 commits January 23, 2025 12:13
… script and will be referenced from the Hydrotables table now
metrics table linked to hydrotable through benchmark ROI and catchment geometries instead of huc8s. This simplifies things.
…e and discharge are now arrays, which should save even more space since it's a big reduction in rows
…3 or local since only the portion of the path after the hand version is used to generate the uuid
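
As an illustration of the idea in the last commit message above, here is a minimal sketch of deriving an identifier from only the part of a path that follows the HAND version, so the same file gets the same UUID whether it is addressed on S3 or locally. The function name `asset_uuid`, the example paths, the use of `uuid5`, and the assumption that the version appears as its own path component are all hypothetical and are not taken from the repository.

```python
import uuid
from pathlib import PurePosixPath


def asset_uuid(path: str, hand_version: str) -> uuid.UUID:
    """Derive a stable UUID from the part of `path` after `hand_version`.

    Assumption: the HAND version is a distinct path component, e.g. ".../fim100/...".
    """
    # Drop any URI scheme so "s3://bucket/..." and "/data/..." are handled alike.
    _, _, rest = path.rpartition("://")
    parts = PurePosixPath(rest).parts
    if hand_version not in parts:
        raise ValueError(f"{hand_version!r} is not a component of {path!r}")
    # Keep only the components that follow the version directory.
    relative = "/".join(parts[parts.index(hand_version) + 1:])
    # uuid5 is deterministic: the same relative path always yields the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, relative)


if __name__ == "__main__":
    s3_path = "s3://example-bucket/hand/fim100/12090301/branches/0/rem.tif"
    local_path = "/data/hand/fim100/12090301/branches/0/rem.tif"
    print(asset_uuid(s3_path, "fim100") == asset_uuid(local_path, "fim100"))
```

Hashing the relative path rather than the full path keeps identifiers stable when the same HAND outputs are copied between storage locations.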
dylanlee and others added 26 commits June 16, 2025 14:29
…f strict data contract for hydrotable schema. We will need to meet with Fernando, Brad, and Rob about this
Refactor batch insert and data communication with the .ddb file. Simplified hydrotable handling. Code readability improvements. Improved memory management.
The h3 extension function that looks up the cells covering a polygon is
buggy, so getting rid of the lookup table. Spatially partitioning
catchments and hydrotables by h3 id still seems to achieve speedups
Revised load.py so that it passes through the schema defined in the
schema directory for a given version of HAND. It aggregates rows into
Hydrotable by hydroid depending on whether the column in the duckdb
Hydrotable schema has an array type.
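
The schema-driven aggregation described in this commit message can be sketched roughly as follows. This is a plain-Python illustration, not the DuckDB logic in load.py; the column names, the `SCHEMA_IS_ARRAY` mapping, and the choice to keep the first value for scalar columns are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical schema summary: column name -> True if the column is array-typed.
SCHEMA_IS_ARRAY = {"stage": True, "discharge_cms": True, "feature_id": False}


def aggregate_by_hydroid(rows, schema_is_array):
    """Collapse rows that share a HydroID into a single row.

    Array-typed columns collect every value in input order; scalar columns
    keep the first value (assumed constant within a HydroID).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["HydroID"]].append(row)

    aggregated = []
    for hydroid, members in groups.items():
        out = {"HydroID": hydroid}
        for column, is_array in schema_is_array.items():
            values = [member[column] for member in members]
            out[column] = values if is_array else values[0]
        aggregated.append(out)
    return aggregated


if __name__ == "__main__":
    rows = [
        {"HydroID": 1, "stage": 0.0, "discharge_cms": 0.0, "feature_id": 10},
        {"HydroID": 1, "stage": 0.5, "discharge_cms": 2.5, "feature_id": 10},
        {"HydroID": 2, "stage": 0.0, "discharge_cms": 0.0, "feature_id": 11},
    ]
    for record in aggregate_by_hydroid(rows, SCHEMA_IS_ARRAY):
        print(record)
```

Three input rows become two output rows here, which is the row reduction that the array columns are meant to provide.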

Updated Readme.

Created a script, analyze_hydrotable_columns.py, that should indicate
whether a column should become an array type
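
One plausible heuristic for that decision is sketched below: a column is a candidate for an array type if it takes more than one distinct value within any single HydroID. This is an assumption about what such an analysis might check, not the actual logic of analyze_hydrotable_columns.py, and the function name `suggest_array_columns` is hypothetical.

```python
from collections import defaultdict


def suggest_array_columns(rows, key="HydroID"):
    """Return {column: bool}; True means the column varies within some key group."""
    # distinct[column][key_value] holds the set of values seen for that group.
    distinct = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for column, value in row.items():
            if column != key:
                distinct[column][row[key]].add(value)
    return {
        column: any(len(values) > 1 for values in per_key.values())
        for column, per_key in distinct.items()
    }


if __name__ == "__main__":
    rows = [
        {"HydroID": 1, "stage": 0.0, "feature_id": 10},
        {"HydroID": 1, "stage": 0.5, "feature_id": 10},
        {"HydroID": 2, "stage": 0.0, "feature_id": 11},
    ]
    print(suggest_array_columns(rows))
```

Columns flagged `True` would lose information if reduced to a single value per HydroID, so they are the ones worth storing as arrays.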
* Update code to not leak memory

load.py has been updated to use duckdb for catchment gpkg processing and
for hydrotable aggregation. The query and schema have also been updated
in some places when necessary to work with minor differences in the
parquet files produced this way compared to the previous parquet files

* Remove example-hydrotable.csv

* optimize centroid calc

* Merge geometries

* Remove insertion order setting since closing connection now

* Reduce default batch size and document sizing guidelines

* Fix tolerance to be 100m

* Update gitignore

* Update load.py comments

* Add staging for hydrotables. Remove json columns.

* Add staging for catchments and modify raster table schema

* Replace WKT writing with WKB writing for catchment geometries

* Add more explanatory comments

* Remove broken batch insertion print statements and estimates

* Fix broken row insertion counts printout after table creation

* Add branch lookup table to improve performance

Create a branch lookup table that is used during hydrotable creation.
This table is indexed by branch so should be faster than using the
original catchment table.

* Aggregating before joining in hydrotables

* Flush staging hydrotable to avoid running out of memory

* Fix parquet index geometry handling to deal with wkb format

Fixed the geometry loading in the test query script and in the
visualization script

* Fix new partition_tables_to_parquet in load.py

partition_tables_to_parquet now partitions files one H3 index at a time
to avoid running out of memory. It was previously not actually writing a
file, so the query wasn't running successfully

* Squash merge feature/simplify-hydrotable into fix/mem-leak. This gets rid of complex hydrotable handling in favor of just passing refs to the existing hydrotable CSVs. This change reduced code complexity a lot in this repo while requiring only minor changes to the autoeval coordinator.
* Update README, remove unnecessary arguments from load.py

* Proofread README

* Update README.md

Minor syntax updates

* Update README.md

* update README to clarify .env creation

* Remove .env from tracking
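
The partition_tables_to_parquet change listed above writes one H3 index at a time so that only a single partition's rows are in flight at once. The sketch below illustrates that pattern with stand-ins from the standard library: `sqlite3` in place of DuckDB and CSV in place of Parquet. The function name `partition_table`, the table and column names, and the `key=value` directory layout are assumptions for illustration and are not taken from load.py.

```python
import csv
import sqlite3
from pathlib import Path


def partition_table(conn, table, key, out_dir):
    """Write one file per distinct `key` value, querying one partition at a time.

    `table` and `key` are interpolated into SQL, so they must be trusted names.
    """
    out_dir = Path(out_dir)
    keys = [r[0] for r in conn.execute(f'SELECT DISTINCT "{key}" FROM "{table}" ORDER BY 1')]
    written = []
    for value in keys:
        # Only the rows of the current partition are selected and streamed out.
        cursor = conn.execute(f'SELECT * FROM "{table}" WHERE "{key}" = ?', (value,))
        part_dir = out_dir / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "data.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([column[0] for column in cursor.description])
            writer.writerows(cursor)  # iterates the cursor row by row
        written.append(path)
    # Returning the paths lets the caller confirm that files were really written.
    return written


if __name__ == "__main__":
    import tempfile

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catchments (h3_index TEXT, hydro_id INTEGER)")
    conn.executemany(
        "INSERT INTO catchments VALUES (?, ?)",
        [("cell_a", 1), ("cell_a", 2), ("cell_b", 3)],
    )
    with tempfile.TemporaryDirectory() as tmp:
        for path in partition_table(conn, "catchments", "h3_index", tmp):
            print(path.relative_to(tmp))
```

Returning the list of written paths gives a cheap check against the earlier failure mode in which no file was produced.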

---------

Co-authored-by: Brad <bradford.bates@ertcorp.com>
Co-authored-by: Parallel Works app-run user <dylan.lee@mgmt-dylanlee-oefimbenchmarkstac-00067.optimizationuseast1-5.pw.local>
Merge staging main with hand index commit history into OWP main branch with template files
@dylanlee dylanlee marked this pull request as draft September 30, 2025 15:01
@dylanlee dylanlee marked this pull request as ready for review September 30, 2025 19:18
@DJackson2313
SWCM witness approval; release concurrence.