NGWPC PI-7 PR by dylanlee · Pull Request #2 · NOAA-OWP/hand-index

dylanlee · 2025-09-30T14:59:30Z

This PR adds the code from NGWPC's hand-index repository to OWP. A HAND index organizes a directory of HAND outputs so that they can be queried spatially in an efficient way. The repository contains a script to create HAND indexes, an example query, and a schema describing the files needed to inundate HAND REMs for the FIM100 version of HAND.


Revised load.py so that it passes data through the schema defined in the schema directory for a given version of HAND. It aggregates rows into the Hydrotable by hydroid depending on whether the corresponding column in the DuckDB Hydrotable schema has an array type. Updated README. Created a script, analyze_hydrotable_columns.py, that should indicate whether a column should become an array type.
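The array-versus-scalar aggregation described above can be illustrated with a minimal, stdlib-only Python sketch. It is not the load.py implementation (the PR notes say load.py uses DuckDB for hydrotable aggregation); the schema dict, the column names `HydroID`, `feature_id`, `stage` and `discharge_cms`, and the helper names are all hypothetical. It assumes a column counts as an array when its DuckDB type string ends in `[]` (DuckDB's LIST type notation) and that scalar columns are constant within a hydroid.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical schema: column name -> DuckDB type string.
# A trailing "[]" marks a DuckDB LIST (array) column.
EXAMPLE_SCHEMA: Dict[str, str] = {
    "HydroID": "BIGINT",
    "feature_id": "BIGINT",
    "stage": "FLOAT[]",
    "discharge_cms": "FLOAT[]",
}


def is_array_type(duckdb_type: str) -> bool:
    """Return True if the DuckDB type string denotes a LIST column."""
    return duckdb_type.strip().endswith("[]")


def aggregate_by_hydroid(
    rows: Iterable[Dict[str, Any]],
    schema: Dict[str, str],
    key: str = "HydroID",
) -> List[Dict[str, Any]]:
    """Collapse one-row-per-stage records into one record per hydroid (sketch)."""
    groups: Dict[Any, Dict[str, Any]] = {}  # dicts keep insertion order
    for row in rows:
        hid = row[key]
        if hid not in groups:
            # First row for this hydroid: scalar columns take their value
            # from it, array columns start as empty lists.
            groups[hid] = {
                col: [] if is_array_type(typ) else row[col]
                for col, typ in schema.items()
            }
        out = groups[hid]
        # Append every array column from the same row in one pass,
        # so all arrays for a hydroid end up the same length.
        for col, typ in schema.items():
            if is_array_type(typ):
                out[col].append(row[col])
    return list(groups.values())


rows = [
    {"HydroID": 1, "feature_id": 10, "stage": 0.0, "discharge_cms": 0.0},
    {"HydroID": 1, "feature_id": 10, "stage": 0.3, "discharge_cms": 1.5},
    {"HydroID": 2, "feature_id": 20, "stage": 0.0, "discharge_cms": 0.0},
]
result = aggregate_by_hydroid(rows, EXAMPLE_SCHEMA)
```

Because each input row contributes exactly one element to every array column, the arrays for a given hydroid cannot drift to different lengths, which is the kind of mismatch the aggregation bug fix in the commit list refers to.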

* Update code to not leak memory. load.py has been updated to use DuckDB for catchment gpkg processing and for hydrotable aggregation. The query and schema have also been updated where necessary to work with minor differences between the parquet files produced this way and the previous parquet files.
* Remove example-hydrotable.csv
* Optimize centroid calc
* Merge geometries
* Remove insertion order setting since closing connection now
* Reduce default batch size and document sizing guidelines
* Fix tolerance to be 100m
* Update gitignore
* Update load.py comments
* Add staging for hydrotables. Remove json columns.
* Add staging for catchments and modify raster table schema
* Replace WKT writing with WKB writing for catchment geometries
* Add more explanatory comments
* Remove broken batch insertion print statements and estimates
* Fix broken row insertion counts printout after table creation
* Add branch lookup table to improve performance. Create a branch lookup table that is used during hydrotable creation. This table is indexed by branch, so it should be faster than using the original catchment table.
* Aggregate before joining in hydrotables
* Flush staging hydrotable to avoid running out of memory
* Fix parquet index geometry handling to deal with WKB format. Fixed the geometry loading in the test query script and in the visualization script.
* Fix new partition_tables_to_parquet in load.py. partition_tables_to_parquet now partitions files one H3 index at a time to avoid running out of memory. It was previously not actually writing a file, so the query wasn't running successfully.
* Squash merge feature/simplify-hydrotable into fix/mem-leak. This gets rid of complex hydrotable handling in favor of just passing refs to the existing hydrotable CSVs. This change reduced code complexity a lot in this repo while requiring only minor changes to the autoeval coordinator.

* Update README, remove unnecessary arguments from load.py
* Proofread README
* Update README.md: minor syntax updates
* Update README.md
* Update README to clarify .env creation
* Remove .env from tracking

---------

Co-authored-by: Brad <bradford.bates@ertcorp.com>
Co-authored-by: Parallel Works app-run user <dylan.lee@mgmt-dylanlee-oefimbenchmarkstac-00067.optimizationuseast1-5.pw.local>

Merge staging main with hand index commit history into OWP main branch with template files

DJackson2313 · 2025-09-30T21:17:28Z

SWCM witness approval; release concurrence.

dylanlee and others added 30 commits January 23, 2025 12:13

Initial commit

ad1fa97

1st draft of schema

1fd5e68

add schema diagram

4680aab

revised schema. added draft loader script

7fc1ddb

working loading script and prototype query for hand inundate process

8d428e5

change name of inundate query

072ad9b

Merge pull request #1 from dylanlee/schema-draft

4dc579c

1st draft of schema

got rid of HAND_Versions table. HAND version will be supplied by load script and will be referenced from the Hydrotables table now

eb35356

version of script able to index a HAND version on s3. Added a table of HAND versions back in

54e24df

added requirements.txt

dfcac4e

changed nwm feature id type to bigint instead of int

17bb94f

finished casting all tables nwm_feature_id's to bigints in schema

86ed287

added transactions to deal with incomplete data in some branch directories

78f054b

tweaked inundate query

3e9043e

load script bug fix

840aad8

Merge pull request NOAA-OWP#2 from dylanlee/load-script

40a15fa

Load script

tweaked inundate-query

e20e514

query now emits lake ids since that is needed in inundate

33a9715

query now emits the feature id associated with each hydro id

ed1bdf9

changed raster_pairs to raster_pair in inundate query json output

d9c4914

Update hand-db.sql

5d79070

metrics table linked to hydrotable through benchmark ROI and catchment geometries instead of huc8s. This simplifies things.

Update hand-db.sql to tweak last schema update

576d834

only loading in absolutely necessary columns to hydrotables now. stage and discharge are now arrays, which should save even more space since it's a big reduction in rows

fcbc2e0

Merge branch 'main' of https://github.com/dylanlee/hand-metadata-db

7f1cb09

converted load script to making parquet tables validated off of a yaml file with table schemas

6ec64fd

multi-threaded loading. Bug fixes in creating geoparquet. Warnings about not finding files

979ee51

successful multiprocess indexing of the WBT 3m UAT HAND run. Rerunning because uuids weren't unique per row

a59f857

make it so that uuids for a HAND run are the same whether on s3 or local, since only the portion of the path after the HAND version is used to generate the uuid

b5ad792

added temp directory cleanup

e3cafd2

presort hydrotable rows by stage before aggregating

5f6a625
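The uuid behaviour described in commit b5ad792 (same uuid whether a file is read from s3 or locally, because only the path after the HAND version is hashed) can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `path_uuid`, the example paths, the version string `hand_v1`, and the use of `uuid.uuid5` with `uuid.NAMESPACE_URL` are all assumptions.

```python
import uuid


def path_uuid(path: str, hand_version: str) -> uuid.UUID:
    """Hypothetical helper: deterministic UUID for a HAND output file.

    Only the part of the path after the HAND version directory is hashed,
    so an s3:// URL and a local path to the same file give the same UUID.
    """
    # Splitting on "/" treats "s3://bucket/..." and "/local/..." alike.
    parts = path.split("/")
    if hand_version not in parts:
        raise ValueError(f"HAND version {hand_version!r} not found in {path!r}")
    idx = parts.index(hand_version)
    relative = "/".join(parts[idx + 1:])
    # uuid5 is name-based (SHA-1): the same input always yields the same
    # UUID, unlike the random uuid4. The namespace choice is arbitrary here.
    return uuid.uuid5(uuid.NAMESPACE_URL, relative)


s3_id = path_uuid("s3://example-bucket/hand/hand_v1/12040101/branches/0/rem.tif", "hand_v1")
local_id = path_uuid("/data/hand/hand_v1/12040101/branches/0/rem.tif", "hand_v1")
```

Hashing a per-file relative path keeps the identifier both location-independent and distinct per file. Note that, as described in the commit, the version itself is excluded from the hash, so identical relative paths under two different HAND versions would map to the same uuid unless the version is tracked separately.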

dylanlee and others added 26 commits June 16, 2025 14:29

fixed aggregation bug. Aggregated arrays could have been different lengths before

7846d47

fixed error loading geometry with execute many. Decided on strategy of strict data contract for hydrotable schema. We will need to meet with Fernando, Brad, and Rob about this

f4aa28f

add list of hydrotable columns to treat as numeric

5bfcf08

remove hydrotable row sorting in hydrotable prep function

e2bfbe0

some light performance optimizations. Stop doing some things every branch, use pandas for more table handling

cbd1206

more light performance optimizations

973b1fa

refactored partitioning. Added additional utility functions to shorten script and reduce repetition

c14cbf7

switch to loading wkb instead of wkt into catchments table geometry column

9077e34

fix broken table reference in partitioning sql

25d3f79

get rid of explicit garbage collection at branch level. Was very slow

0d1b60d

refix broken catchments table ref

ba27885

Merge pull request #3 from NGWPC/refactor/consumer-producer-pattern

c5d90eb

Refactor batch insert and data communication with the .ddb file. Simplified hydrotable handling. Code readability improvements. Improved memory management.

modified query. Not using the lookup table is still pretty fast on partitioned files

f98fdd4

remove lookup table creation

1b24464

h3 extension function to lookup cells covering polygon buggy so getting rid of lookup table. Spatial partitioning catchments and hydrotables by h3 id still seems to achieve speedups

Add calb argument to process calibrated hydrotables

a1567cf

Update README, remove unnecessary arguments from load.py

98cffcd

Update README, remove unnecessary arguments from load.py (#6)

8f08c63

Proofread README

2c36136

Merge branch 'main' into Readme

789a0e8

Add workflow to build hand-index container

df67b94

Merge branch 'main' into pi-7-deliverables

caf2374

Merge branch 'pi-7-deliverables'

c391828

Merge pull request #1 from NGWPC/main-for-pr

f438465

Merge staging main with hand index commit history into OWP main branch with template files

dylanlee marked this pull request as draft September 30, 2025 15:01

dylanlee marked this pull request as ready for review September 30, 2025 19:18

CarsonPruitt-NOAA added this to NGWPC Oct 6, 2025

dylanlee wants to merge 75 commits into NOAA-OWP:main from NGWPC:main

4 participants
