Watershed delineation tests by samn · Pull Request #31 · Upstream-Tech/delineator

samn · 2026-01-29T21:15:21Z

Backfill comprehensive tests for watershed delineation and fix Pandas 3.0 compatibility issues.

The existing codebase lacked tests, making it difficult to verify correct behavior and prevent regressions. This PR introduces a comprehensive test suite covering various hydrologic scenarios (single outlet, multi-subbasin, headwaters, consolidation, network topology, stream orders, geometry validity, data consistency) and uses snapshot testing for complex outputs. During test development, Pandas 3.0 compatibility issues were identified and fixed, specifically regarding polygon_area initialization and nextdown column dtype handling to support mixed integer COMIDs and string custom outlet IDs.

- Add conftest.py with fixtures for environment setup and test data
- Add delineation_test.py with tests for:
  - Basic watershed delineation functionality
  - Network topology verification (DAG, terminal nodes)
  - Stream order calculations (Strahler and Shreve)
  - Network consolidation functionality
  - Geometry validity tests
  - Data consistency tests
  - Snapshot tests for complex outputs using syrupy
- Add graph_tools_test.py with unit tests for:
  - Stream order calculation algorithms
  - Node insertion and pruning operations
  - Upstream node finding
  - River network creation from DataFrames
- Add validation_test.py with tests for:
  - Input CSV validation
  - Error handling for invalid data
  - Utility functions

Tests use remote hydrology data from public-hydrology-data.upstream.tech.

Co-authored-by: sam <sam@upstream.tech>
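
As a rough illustration of what the stream order tests exercise, the sketch below computes Strahler and Shreve orders on a small hand-built network. It assumes the network is a plain dict mapping each node to the nodes that drain directly into it. The function names and node labels are hypothetical and are not taken from graph_tools.

```python
from typing import Dict, List

# Hypothetical toy network: each key maps to the nodes that drain directly into it.
# "outlet" is the terminal node; "a", "b", and "d" are headwaters.
UPSTREAM: Dict[str, List[str]] = {
    "outlet": ["c", "d"],
    "c": ["a", "b"],
    "a": [],
    "b": [],
    "d": [],
}


def strahler_order(node: str, upstream: Dict[str, List[str]]) -> int:
    """Strahler order: headwaters are 1; the order increases by one only
    when two or more tributaries of the same highest order meet."""
    children = upstream.get(node, [])
    if not children:
        return 1
    orders = [strahler_order(child, upstream) for child in children]
    top = max(orders)
    return top + 1 if orders.count(top) >= 2 else top


def shreve_order(node: str, upstream: Dict[str, List[str]]) -> int:
    """Shreve order (magnitude): headwaters are 1; orders add at confluences."""
    children = upstream.get(node, [])
    if not children:
        return 1
    return sum(shreve_order(child, upstream) for child in children)
```

On this network the confluence "c" has Strahler order 2 and the outlet stays at 2, because its other tributary "d" is only order 1. The Shreve order at the outlet is 3, the number of headwaters upstream.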

Test Suite:
- Add conftest.py with environment setup and fixtures for remote data
- Add delineation_test.py with 20 integration tests covering:
  - Basic watershed delineation functionality
  - Network topology (DAG structure, terminal nodes)
  - Stream order calculations (Strahler and Shreve)
  - Network consolidation functionality
  - Geometry validity checks
  - Data consistency between graph and GeoDataFrames
  - Snapshot tests for reproducibility
- Add graph_tools_test.py with 17 unit tests for:
  - Stream order calculation algorithms
  - Node operations (insert, prune)
  - Upstream node traversal
  - River network graph creation
- Add validation_test.py with 19 unit tests for:
  - Input CSV validation
  - Error handling for invalid data
  - Utility functions

Bug Fixes:
- Fix pandas 3.0 compatibility: Use object dtype for nextdown column to support mixed int/string IDs (COMIDs are integers, custom outlet IDs can be strings)
- Fix polygon_area column initialization to use float instead of int

Total: 56 tests passing

Co-authored-by: sam <sam@upstream.tech>
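
The nextdown fix can be pictured with a small DataFrame. This is a minimal sketch, not the code in delineate.py: the comid column and the example IDs are made up. It only shows that a column initialised with 0 is inferred as an integer column, while an object-dtype column accepts both integer COMIDs and string outlet IDs.

```python
import pandas as pd

# Hypothetical frame standing in for the subbasins table.
df = pd.DataFrame({"comid": [101, 102, 103]})

# Initialising with the scalar 0 yields an integer column, so writing a
# string ID into it later is an incompatible-dtype assignment.
as_int = pd.Series(0, index=df.index)

# Initialising with None and casting to object keeps the column generic.
df["nextdown"] = None
df["nextdown"] = df["nextdown"].astype(object)

# Both an integer COMID and a string custom outlet ID can now be stored.
df.loc[0, "nextdown"] = 102
df.loc[1, "nextdown"] = "main_outlet"
```

Recent pandas releases warn about or reject writes that would silently change a column's dtype, which is presumably why the integer initialisation stopped working once string outlet IDs were assigned.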

cursor · 2026-01-29T21:15:22Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

- Update calc_area() and calc_length() in util.py to use modern pyproj Transformer API instead of deprecated pyproj.transform() and pyproj.Proj(init=...) syntax
- Fix set_crs() calls in delineate.py to use inplace=True so they actually modify the GeoDataFrame
- Remove unused 'partial' import from util.py

This eliminates FutureWarnings about deprecated pyproj syntax and UserWarnings about CRS not being set during GeoDataFrame concatenation.

Co-authored-by: sam <sam@upstream.tech>

tests/conftest.py

tests/delineation_test.py

samn · 2026-02-02T18:51:26Z

@cursoragent fix up the comments I left on this PR

cursor · 2026-02-02T18:51:33Z

I've addressed all the PR review comments. Here's a summary of the changes:

Changes Made

1. Added custom columns to CSV fixtures (conftest.py)

Added gage_id and priority columns to all CSV fixtures to verify custom attributes are passed through to the result.

2. Tests verify outlet node names match CSV definitions

All graph structure tests now verify that:

- The expected outlet node (e.g., main_outlet, outlet1) is the terminal node
- Custom nodes are marked with the custom=True attribute in the graph

3. Tests verify CSV attributes are present in output

Added test_outlet_node_attributes_from_csv which verifies:

- gage_id and priority columns are in the subbasins GeoDataFrame
- Custom outlet rows have the expected attribute values (e.g., main_outlet has gage_id="GAGE001")

4. Fixed geometry validity test

Changed from asserting 90% valid to verifying all geometries can be made valid using shapely.validation.make_valid(). The test now:

- Checks if geometries are fixable (they all are)
- Logs a warning about source data issues (3 invalid geometries in MERIT data)
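
A minimal sketch of the "fixable" check described above, assuming shapely is installed. The bowtie polygon and the is_fixable helper are illustrative only and are not the test's actual code or data.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid

# A self-intersecting "bowtie" ring, a common kind of invalid polygon.
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

# make_valid returns a valid geometry that covers the same area.
fixed = make_valid(bowtie)


def is_fixable(geom) -> bool:
    """Hypothetical helper: True if the geometry is already valid
    or becomes valid after make_valid()."""
    return geom.is_valid or make_valid(geom).is_valid
```

Asserting is_fixable for every geometry, rather than requiring a fixed share to be valid as-is, keeps the test strict without failing on known defects in the source data.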

5. Updated tests to use multi_subbasin_csv

Changed most tests from single_outlet_csv to multi_subbasin_csv for stronger coverage of the multi-outlet workflow.

6. Added TestDisconnectedBasins class

New test class with 2 tests for disconnected basin graphs:

- test_disconnected_basins_separate_systems: Verifies two separate outlets create two terminal nodes and two weakly connected components
- test_disconnected_basins_upstream_connectivity: Verifies upstream points are in the same component as their outlet
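
The invariants these tests check can be sketched without any graph library. The example assumes the network is given as a nextdown mapping in which 0 marks a terminal node. The node names and both helper functions are hypothetical and do not come from the test suite.

```python
from typing import Dict, Hashable, List, Set

# Hypothetical nextdown table for two disconnected river systems.
# A value of 0 marks a terminal node (nothing downstream).
NEXTDOWN: Dict[Hashable, Hashable] = {
    "up1a": "outlet1",
    "up1b": "outlet1",
    "outlet1": 0,
    "up2": "outlet2",
    "outlet2": 0,
}


def terminal_nodes(nextdown: Dict[Hashable, Hashable]) -> List[Hashable]:
    """Nodes with no downstream neighbour."""
    return sorted(node for node, down in nextdown.items() if down == 0)


def weak_components(nextdown: Dict[Hashable, Hashable]) -> List[Set[Hashable]]:
    """Group nodes into weakly connected components (edge direction ignored)
    using a small union-find structure."""
    parent = {node: node for node in nextdown}

    def find(node):
        # Walk to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for node, down in nextdown.items():
        if down != 0:
            parent[find(node)] = find(down)

    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in nextdown:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Because each node stores a single nextdown value, every node has an out-degree of at most 1 by construction, so each component contains exactly one terminal node. If the river network is held in a networkx DiGraph, nx.number_weakly_connected_components gives the same component count.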

Test Count: 59 tests (up from 56), all passing

- Add custom columns (gage_id, priority) to CSV fixtures and verify they're passed through to the result subbasins GeoDataFrame
- Add tests verifying outlet node names match CSV definitions (e.g., main_outlet is the terminal node)
- Add test verifying CSV attributes (gage_id, priority) are present in the subbasins output
- Update geometry validity test to use make_valid() - source MERIT data has some invalid geometries that can be fixed
- Update most tests to use multi_subbasin_csv for stronger coverage
- Add new TestDisconnectedBasins class with tests for:
  - Two separate outlets creating two disconnected river systems
  - Upstream points correctly connected to their respective outlets
  - Verifying weakly connected components match number of outlets
- Update nextdown consistency test to focus on graph structure invariants (out_degree <= 1, terminal nodes have nextdown=0)

Co-authored-by: sam <sam@upstream.tech>

melkurdy96 · 2026-02-03T10:11:44Z

upstream_delineator/delineator_utils/delineate.py

-        subbasins_gdf["nextdown"] = 0
+        # Use object dtype to support mixed int/string IDs
+        subbasins_gdf["nextdown"] = None
+        subbasins_gdf["nextdown"] = subbasins_gdf["nextdown"].astype(object)


I think we can remove the nextdown logic here entirely (from this gdf and myrivers_gdf). I think it's used to create the graph from the initial delineation (before we consolidate), but from here onward I don't think it's used. We don't use it outside the delineator, so if the other tests in this PR pass, we can remove them.

Can address in a followup

tests/conftest.py

cursoragent and others added 2 commits January 29, 2026 21:03

samn marked this pull request as ready for review January 29, 2026 21:36

samn commented Feb 2, 2026

melkurdy96 reviewed Feb 3, 2026

tests/conftest.py (resolved)

samn merged commit 6657f77 into master Feb 3, 2026
2 checks passed

samn deleted the cursor/watershed-delineation-tests-1884 branch February 3, 2026 16:06

samn merged 4 commits into master from cursor/watershed-delineation-tests-1884

3 participants