Conversation
- Add conftest.py with fixtures for environment setup and test data - Add delineation_test.py with tests for: - Basic watershed delineation functionality - Network topology verification (DAG, terminal nodes) - Stream order calculations (Strahler and Shreve) - Network consolidation functionality - Geometry validity tests - Data consistency tests - Snapshot tests for complex outputs using syrupy - Add graph_tools_test.py with unit tests for: - Stream order calculation algorithms - Node insertion and pruning operations - Upstream node finding - River network creation from DataFrames - Add validation_test.py with tests for: - Input CSV validation - Error handling for invalid data - Utility functions Tests use remote hydrology data from public-hydrology-data.upstream.tech. Co-authored-by: sam <sam@upstream.tech>
Test Suite: - Add conftest.py with environment setup and fixtures for remote data - Add delineation_test.py with 20 integration tests covering: - Basic watershed delineation functionality - Network topology (DAG structure, terminal nodes) - Stream order calculations (Strahler and Shreve) - Network consolidation functionality - Geometry validity checks - Data consistency between graph and GeoDataFrames - Snapshot tests for reproducibility - Add graph_tools_test.py with 17 unit tests for: - Stream order calculation algorithms - Node operations (insert, prune) - Upstream node traversal - River network graph creation - Add validation_test.py with 19 unit tests for: - Input CSV validation - Error handling for invalid data - Utility functions Bug Fixes: - Fix pandas 3.0 compatibility: Use object dtype for nextdown column to support mixed int/string IDs (COMIDs are integers, custom outlet IDs can be strings) - Fix polygon_area column initialization to use float instead of int Total: 56 tests passing Co-authored-by: sam <sam@upstream.tech>
|
Cursor Agent can help with this pull request. Just |
- Update calc_area() and calc_length() in util.py to use modern pyproj Transformer API instead of deprecated pyproj.transform() and pyproj.Proj(init=...) syntax - Fix set_crs() calls in delineate.py to use inplace=True so they actually modify the GeoDataFrame - Remove unused 'partial' import from util.py This eliminates FutureWarnings about deprecated pyproj syntax and UserWarnings about CRS not being set during GeoDataFrame concatenation. Co-authored-by: sam <sam@upstream.tech>
|
@cursoragent fix up the comments I left on this PR |
|
I've addressed all the PR review comments. Here's a summary of the changes: Changes Made1. Added custom columns to CSV fixtures (conftest.py)Added 2. Tests verify outlet node names match CSV definitionsAll graph structure tests now verify that:
3. Tests verify CSV attributes are present in outputAdded
4. Fixed geometry validity testChanged from asserting 90% valid to verifying all geometries can be made valid using
5. Updated tests to use multi_subbasin_csvChanged most tests from 6. Added TestDisconnectedBasins classNew test class with 2 tests for disconnected basin graphs:
Test Count: 59 tests (up from 56), all passing |
- Add custom columns (gage_id, priority) to CSV fixtures and verify they're passed through to the result subbasins GeoDataFrame - Add tests verifying outlet node names match CSV definitions (e.g., main_outlet is the terminal node) - Add test verifying CSV attributes (gage_id, priority) are present in the subbasins output - Update geometry validity test to use make_valid() - source MERIT data has some invalid geometries that can be fixed - Update most tests to use multi_subbasin_csv for stronger coverage - Add new TestDisconnectedBasins class with tests for: - Two separate outlets creating two disconnected river systems - Upstream points correctly connected to their respective outlets - Verifying weakly connected components match number of outlets - Update nextdown consistency test to focus on graph structure invariants (out_degree <= 1, terminal nodes have nextdown=0) Co-authored-by: sam <sam@upstream.tech>
| subbasins_gdf["nextdown"] = 0 | ||
| # Use object dtype to support mixed int/string IDs | ||
| subbasins_gdf["nextdown"] = None | ||
| subbasins_gdf["nextdown"] = subbasins_gdf["nextdown"].astype(object) |
There was a problem hiding this comment.
I think we can remove the nextdown logic here entirely (from this gdf and myrivers_gdf).. I think it's used to create the graph from the initial delineation (before we consolidate), but then from here onward I don't think it's used. We don't use it outside the delineator, so I think if the other tests in this PR pass, we can remove them


Backfill comprehensive tests for watershed delineation and fix Pandas 3.0 compatibility issues.
The existing codebase lacked tests, making it difficult to verify correct behavior and prevent regressions. This PR introduces a comprehensive test suite covering various hydrologic scenarios (single outlet, multi-subbasin, headwaters, consolidation, network topology, stream orders, geometry validity, data consistency) and uses snapshot testing for complex outputs. During test development, Pandas 3.0 compatibility issues were identified and fixed, specifically regarding
polygon_areainitialization andnextdowncolumn dtype handling to support mixed integer COMIDs and string custom outlet IDs.