Skip to content

Watershed delineation tests#31

Merged
samn merged 4 commits intomasterfrom
cursor/watershed-delineation-tests-1884
Feb 3, 2026
Merged

Watershed delineation tests#31
samn merged 4 commits intomasterfrom
cursor/watershed-delineation-tests-1884

Conversation

@samn
Copy link
Member

@samn samn commented Jan 29, 2026

Backfill comprehensive tests for watershed delineation and fix Pandas 3.0 compatibility issues.

The existing codebase lacked tests, making it difficult to verify correct behavior and prevent regressions. This PR introduces a comprehensive test suite covering various hydrologic scenarios (single outlet, multi-subbasin, headwaters, consolidation, network topology, stream orders, geometry validity, data consistency) and uses snapshot testing for complex outputs. During test development, Pandas 3.0 compatibility issues were identified and fixed, specifically regarding polygon_area initialization and nextdown column dtype handling to support mixed integer COMIDs and string custom outlet IDs.


Open in Cursor Open in Web

cursoragent and others added 2 commits January 29, 2026 21:03
- Add conftest.py with fixtures for environment setup and test data
- Add delineation_test.py with tests for:
  - Basic watershed delineation functionality
  - Network topology verification (DAG, terminal nodes)
  - Stream order calculations (Strahler and Shreve)
  - Network consolidation functionality
  - Geometry validity tests
  - Data consistency tests
  - Snapshot tests for complex outputs using syrupy
- Add graph_tools_test.py with unit tests for:
  - Stream order calculation algorithms
  - Node insertion and pruning operations
  - Upstream node finding
  - River network creation from DataFrames
- Add validation_test.py with tests for:
  - Input CSV validation
  - Error handling for invalid data
  - Utility functions

Tests use remote hydrology data from public-hydrology-data.upstream.tech.

Co-authored-by: sam <sam@upstream.tech>
Test Suite:
- Add conftest.py with environment setup and fixtures for remote data
- Add delineation_test.py with 20 integration tests covering:
  - Basic watershed delineation functionality
  - Network topology (DAG structure, terminal nodes)
  - Stream order calculations (Strahler and Shreve)
  - Network consolidation functionality
  - Geometry validity checks
  - Data consistency between graph and GeoDataFrames
  - Snapshot tests for reproducibility
- Add graph_tools_test.py with 17 unit tests for:
  - Stream order calculation algorithms
  - Node operations (insert, prune)
  - Upstream node traversal
  - River network graph creation
- Add validation_test.py with 19 unit tests for:
  - Input CSV validation
  - Error handling for invalid data
  - Utility functions

Bug Fixes:
- Fix pandas 3.0 compatibility: Use object dtype for nextdown column
  to support mixed int/string IDs (COMIDs are integers, custom outlet
  IDs can be strings)
- Fix polygon_area column initialization to use float instead of int

Total: 56 tests passing

Co-authored-by: sam <sam@upstream.tech>
@cursor
Copy link

cursor bot commented Jan 29, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

- Update calc_area() and calc_length() in util.py to use modern pyproj
  Transformer API instead of deprecated pyproj.transform() and
  pyproj.Proj(init=...) syntax
- Fix set_crs() calls in delineate.py to use inplace=True so they
  actually modify the GeoDataFrame
- Remove unused 'partial' import from util.py

This eliminates FutureWarnings about deprecated pyproj syntax and
UserWarnings about CRS not being set during GeoDataFrame concatenation.

Co-authored-by: sam <sam@upstream.tech>
@samn samn marked this pull request as ready for review January 29, 2026 21:36
@samn
Copy link
Member Author

samn commented Feb 2, 2026

@cursoragent fix up the comments I left on this PR

@cursor
Copy link

cursor bot commented Feb 2, 2026

I've addressed all the PR review comments. Here's a summary of the changes:

Changes Made

1. Added custom columns to CSV fixtures (conftest.py)

Added gage_id and priority columns to all CSV fixtures to verify custom attributes are passed through to the result.

2. Tests verify outlet node names match CSV definitions

All graph structure tests now verify that:

  • The expected outlet node (e.g., main_outlet, outlet1) is the terminal node
  • Custom nodes are marked with custom=True attribute in the graph

3. Tests verify CSV attributes are present in output

Added test_outlet_node_attributes_from_csv which verifies:

  • gage_id and priority columns are in the subbasins GeoDataFrame
  • Custom outlet rows have the expected attribute values (e.g., main_outlet has gage_id="GAGE001")

4. Fixed geometry validity test

Changed from asserting 90% valid to verifying all geometries can be made valid using shapely.validation.make_valid(). The test now:

  • Checks if geometries are fixable (they all are)
  • Logs a warning about source data issues (3 invalid geometries in MERIT data)

5. Updated tests to use multi_subbasin_csv

Changed most tests from single_outlet_csv to multi_subbasin_csv for stronger coverage of the multi-outlet workflow.

6. Added TestDisconnectedBasins class

New test class with 2 tests for disconnected basin graphs:

  • test_disconnected_basins_separate_systems: Verifies two separate outlets create two terminal nodes and two weakly connected components
  • test_disconnected_basins_upstream_connectivity: Verifies upstream points are in the same component as their outlet

Test Count: 59 tests (up from 56), all passing

Open in Cursor Open in Web

- Add custom columns (gage_id, priority) to CSV fixtures and verify they're
  passed through to the result subbasins GeoDataFrame
- Add tests verifying outlet node names match CSV definitions (e.g., main_outlet
  is the terminal node)
- Add test verifying CSV attributes (gage_id, priority) are present in the
  subbasins output
- Update geometry validity test to use make_valid() - source MERIT data has some
  invalid geometries that can be fixed
- Update most tests to use multi_subbasin_csv for stronger coverage
- Add new TestDisconnectedBasins class with tests for:
  - Two separate outlets creating two disconnected river systems
  - Upstream points correctly connected to their respective outlets
  - Verifying weakly connected components match number of outlets
- Update nextdown consistency test to focus on graph structure invariants
  (out_degree <= 1, terminal nodes have nextdown=0)

Co-authored-by: sam <sam@upstream.tech>
subbasins_gdf["nextdown"] = 0
# Use object dtype to support mixed int/string IDs
subbasins_gdf["nextdown"] = None
subbasins_gdf["nextdown"] = subbasins_gdf["nextdown"].astype(object)
Copy link

@melkurdy96 melkurdy96 Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the nextdown logic here entirely (from this gdf and myrivers_gdf).. I think it's used to create the graph from the initial delineation (before we consolidate), but then from here onward I don't think it's used. We don't use it outside the delineator, so I think if the other tests in this PR pass, we can remove them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can address in a followup

@samn samn merged commit 6657f77 into master Feb 3, 2026
2 checks passed
@samn samn deleted the cursor/watershed-delineation-tests-1884 branch February 3, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants