A Python implementation of watershed delineation algorithms with benchmarking infrastructure. FLOWFINDER provides reliable watershed boundary extraction from Digital Elevation Models (DEMs) using standard hydrological algorithms, along with performance monitoring and validation tools.
Current Status: Core watershed delineation functionality is complete and tested. Multi-tool benchmarking and research framework components are in development.
What does FLOWFINDER currently provide?
- Reliable Watershed Delineation: Fast, accurate watershed boundary extraction from DEM data
- Performance Monitoring: Built-in timing, memory usage, and accuracy tracking
- Validation Tools: Topology checking and quality assessment for generated watersheds
- Python & CLI Access: Both programmatic API and command-line interface
Future Development Goals:
- Multi-tool comparison framework (TauDEM, GRASS, WhiteboxTools integration)
- Systematic benchmarking across different terrain types
- Research-grade validation studies
- Geographic specialization optimization
FLOWFINDER implements standard hydrological algorithms using modern Python scientific libraries:
- Flow Direction: D8 algorithm with priority-flood depression filling
- Flow Accumulation: Topological sorting (Kahn's algorithm) for O(n) performance
- Watershed Extraction: Upstream tracing from pour points
- Polygon Creation: Morphological operations with boundary tracing
- Real-time performance monitoring (runtime, memory usage)
- Topology validation (geometry validity, containment checks)
- Accuracy assessment tools (when ground truth available)
- Comprehensive error handling and logging
- Python 3.8+
- DEM data in GeoTIFF format
- 4GB+ RAM recommended for processing large datasets
- Additional watershed tools for comparison:
  - TauDEM (requires Docker)
  - GRASS GIS
  - WhiteboxTools
- Ground truth watershed boundaries for validation
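The D8 flow direction and Kahn-style flow accumulation steps listed earlier can be illustrated on a tiny grid. The following is a minimal pure-Python sketch, not FLOWFINDER's actual implementation: the function names `d8_receivers` and `flow_accumulation` are hypothetical, the DEM is a made-up 3x3 grid, and priority-flood depression filling is omitted, so the input is assumed to have no pits other than the outlet.

```python
from collections import deque

# D8 neighbour offsets (row, col) for the eight surrounding cells.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """For each cell, return the (row, col) of its steepest downslope neighbour, or None."""
    rows, cols = len(dem), len(dem[0])
    receivers = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in D8_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = (dr * dr + dc * dc) ** 0.5  # 1 for edges, sqrt(2) for diagonals
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        receivers[r][c] = (nr, nc)
    return receivers


def flow_accumulation(receivers):
    """Count contributing cells (including the cell itself) using Kahn's algorithm."""
    rows, cols = len(receivers), len(receivers[0])
    # In-degree = number of neighbours that drain into each cell.
    indegree = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if receivers[r][c] is not None:
                nr, nc = receivers[r][c]
                indegree[nr][nc] += 1
    accumulation = [[1] * cols for _ in range(rows)]
    # Start from cells that receive no flow (ridges and local highs).
    queue = deque((r, c) for r in range(rows) for c in range(cols) if indegree[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        target = receivers[r][c]
        if target is None:
            continue  # outlet: nothing downstream
        nr, nc = target
        accumulation[nr][nc] += accumulation[r][c]
        indegree[nr][nc] -= 1
        if indegree[nr][nc] == 0:
            queue.append((nr, nc))
    return accumulation


# Made-up DEM sloping towards the bottom-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 4],
    [7, 4, 1],
]
acc = flow_accumulation(d8_receivers(dem))
```

Each cell enters the queue once and each flow link is followed once, which is where the O(n) bound comes from. In this example every cell drains to the bottom-right corner, so `acc[2][2]` is 9.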
# Clone the repository
git clone <repository-url>
cd flowfinder
# Install in editable mode with development dependencies
# (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"
# Install FLOWFINDER
pip install flowfinder
# Copy environment template (if it exists)
cp .env.example .env || echo "Create .env file with your data paths"
# Edit .env with your data paths and configuration

The system uses a hierarchical configuration architecture to manage complexity:
# Configuration structure is already set up:
# Environment-specific configurations
config/environments/development.yaml # Local dev (10 basins)
config/environments/testing.yaml # CI/testing (50 basins)
config/environments/production.yaml # Full-scale (500+ basins)
# Tool-specific configurations
config/tools/flowfinder.yaml # FLOWFINDER settings
config/tools/taudem.yaml # TauDEM MPI settings
config/tools/grass.yaml # GRASS r.watershed settings
config/tools/whitebox.yaml # WhiteboxTools settings

Place your input datasets in the data/ directory:

data/
├── huc12_mountain_west.shp    # HUC12 boundaries for Mountain West
├── nhd_hr_catchments.shp      # NHD+ HR catchment polygons
├── nhd_flowlines.shp          # NHD+ HR flowlines
└── dem_10m.tif                # 10m DEM mosaic or tiles
from flowfinder import FlowFinder

# Initialize with DEM
with FlowFinder("path/to/dem.tif") as ff:
    # Delineate watershed from a pour point
    watershed, metrics = ff.delineate_watershed(lat=40.0, lon=-105.0)
    print(f"Watershed area: {watershed.area:.6f} degrees²")

# Delineate watershed
python -m flowfinder.cli delineate --dem dem.tif --lat 40.0 --lon -105.0 --output watershed.geojson
# Validate DEM
python -m flowfinder.cli validate --dem dem.tif
# Get DEM info
python -m flowfinder.cli info --dem dem.tif

# Run basic benchmark with FLOWFINDER
python scripts/benchmark_runner.py \
--environment development \
--tools flowfinder \
--outdir results/

├── README.md                          # Project overview + setup
├── requirements.txt                   # Python dependencies
├── pyproject.toml                     # Modern Python project config
├── .env.example                       # Environment template
├── .gitignore                         # Standard Python gitignore
│
├── config/                            # Hierarchical configuration system
│   ├── base.yaml                      # Foundation configurations
│   ├── configuration_manager.py       # Configuration inheritance system
│   ├── schema.json                    # JSON Schema validation
│   ├── environments/                  # Environment-specific settings
│   │   ├── development.yaml           # Local development (10 basins)
│   │   ├── testing.yaml               # CI/testing (50 basins)
│   │   └── production.yaml            # Full-scale (500+ basins)
│   └── tools/                         # Tool-specific configurations
│       ├── flowfinder.yaml            # FLOWFINDER settings
│       ├── taudem.yaml                # TauDEM MPI settings
│       ├── grass.yaml                 # GRASS r.watershed settings
│       └── whitebox.yaml              # WhiteboxTools settings
│
├── scripts/                           # Core benchmark scripts
│   ├── basin_sampler.py               # Stratified basin sampling
│   ├── truth_extractor.py             # Truth polygon extraction
│   ├── benchmark_runner.py            # FLOWFINDER accuracy testing
│   ├── watershed_experiment_runner.py # Multi-tool comparison
│   ├── validation_tools.py            # Validation utilities
│   └── backup/                        # Deprecated scripts and files
│
├── data/                              # Input datasets (gitignored)
│   └── test_outputs/                  # Test result files
├── results/                           # Output directory (gitignored)
├── tests/                             # Unit and integration tests
│   └── integration/                   # Integration test suite
├── docs/                              # Research and technical documentation
│   ├── README.md                      # Documentation index
│   ├── PIPELINE.md                    # Pipeline orchestrator guide
│   ├── user_guide/                    # User documentation and guides
│   ├── architecture/                  # System architecture documentation
│   ├── test_coverage/                 # Test coverage documentation
│   └── development/                   # Development notes and status reports
│
└── notebooks/                         # Jupyter exploration
    └── benchmark_analysis.ipynb
The system uses a hierarchical configuration architecture to manage complexity across different tools and environments:
Base Configurations → Environment → Tool → Local Overrides
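The precedence chain above can be read as a left-to-right deep merge in which later layers win. A minimal sketch under that assumption follows; `deep_merge` and `resolve_config` are hypothetical helpers, not the API of `configuration_manager.py`, and the layer contents are invented for illustration.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where `override` wins over `base`, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(*layers):
    """Merge layers left to right, so later layers take precedence."""
    resolved = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, for illustration only.
base = {"basin_sampling": {"n_per_stratum": 10}, "benchmark": {"timeout_seconds": 120}}
environment = {"benchmark": {"timeout_seconds": 60}}
tool = {"tool": {"name": "flowfinder"}}
local_overrides = {"basin_sampling": {"n_per_stratum": 1}, "benchmark": {"timeout_seconds": 30}}

config = resolve_config(base, environment, tool, local_overrides)
```

Merging into copies keeps each layer unchanged, so the same base can be reused across environments.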
# Development FLOWFINDER experiment
inherits:
  - "base/regions.yaml#mountain_west_minimal"
  - "base/quality_standards.yaml#development_grade"
  - "environments/development.yaml"
  - "tools/flowfinder/base.yaml"
  - "experiments/accuracy_comparison.yaml"
overrides:
  basin_sampling:
    n_per_stratum: 1  # Minimal for dev
  benchmark:
    timeout_seconds: 30  # Quick timeout

from abc import ABC, abstractmethod
from typing import Dict, Tuple

from shapely.geometry import Point, Polygon  # assumed source of Point and Polygon


class ToolAdapter(ABC):
    @abstractmethod
    def delineate_watershed(self, pour_point: Point, dem_path: str) -> Tuple[Polygon, Dict]:
        """Delineate watershed and return polygon + performance metrics"""
        pass

    @abstractmethod
    def is_available(self) -> bool:
        """Check if tool is available on system"""
        pass

- Watershed Polygons: GeoJSON or Shapefile format
- Performance Metrics: Runtime, memory usage, processing rate
- Quality Assessment: Topology validation, accuracy scores
- Detailed Logs: Complete processing history and diagnostics

- benchmark_results.json: Detailed per-watershed metrics
- accuracy_summary.csv: Tabular results for analysis
- performance_comparison.csv: Runtime and memory comparisons
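The `ToolAdapter` interface shown earlier exposes `is_available()` so that a benchmark run can skip tools that are not installed. The sketch below shows one way that check might look for a tool driven through a command-line executable. `ExecutableAdapter` and `usable_adapters` are hypothetical names, the geometry types are left as `Any` to keep the example free of third-party imports, and the actual tool invocation is deliberately left out.

```python
import shutil
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class ToolAdapter(ABC):
    """Same shape as the interface above, with geometry types relaxed to Any."""

    @abstractmethod
    def delineate_watershed(self, pour_point: Any, dem_path: str) -> Tuple[Any, Dict]:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        ...


class ExecutableAdapter(ToolAdapter):
    """Hypothetical adapter for a tool that is run as an external executable."""

    def __init__(self, name: str, executable: str):
        self.name = name
        self.executable = executable

    def is_available(self) -> bool:
        # shutil.which returns None when the executable is not found on PATH.
        return shutil.which(self.executable) is not None

    def delineate_watershed(self, pour_point: Any, dem_path: str) -> Tuple[Any, Dict]:
        raise NotImplementedError("tool invocation is out of scope for this sketch")


def usable_adapters(adapters: List[ToolAdapter]) -> List[ToolAdapter]:
    """Keep only adapters whose tool is installed, so a run can skip the rest."""
    return [adapter for adapter in adapters if adapter.is_available()]


# Placeholder executable name that is assumed not to exist on any system.
candidates = [ExecutableAdapter("missing", "flowfinder-sketch-no-such-tool")]
available = usable_adapters(candidates)
```

Filtering up front lets a multi-tool comparison report which tools were skipped instead of failing partway through a run.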
| Feature | Status | Notes |
|---|---|---|
| Basic watershed delineation | ✅ Implemented | Tested and working |
| D8 flow direction | ✅ Implemented | With depression filling |
| Flow accumulation | ✅ Implemented | O(n) topological sorting |
| Performance monitoring | ✅ Implemented | Runtime, memory tracking |
| Python API | ✅ Implemented | Full functionality |
| CLI interface | ✅ Implemented | Basic commands available |
| Topology validation | ✅ Implemented | Geometry checking |
| Feature | Timeline | Status |
|---|---|---|
| Multi-tool integration | Future release | Planned |
| Batch processing | Next minor version | In Development |
| Advanced algorithms | Future release | Planned |
| Performance optimization | Ongoing | Continuous |
| Documentation improvements | Next patch | In Progress |
# Run unit tests
python -m pytest tests/
# Run integration tests
python -m pytest tests/integration/
# Test configuration system
python tests/integration/test_configuration_system.py
# Test multi-tool integration
python tests/integration/test_integration.py
# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=html

Use the Jupyter notebook for detailed analysis:
# Start Jupyter
jupyter lab notebooks/
# Open benchmark_analysis.ipynb for interactive exploration

- ✅ Core Algorithm Implementation: D8 flow direction, flow accumulation, watershed extraction
- ✅ Python API: Complete programmatic interface
- ✅ Basic CLI: Command-line watershed delineation
- ✅ Performance Monitoring: Runtime and memory tracking
- ✅ Validation Tools: Topology checking and quality assessment
- Batch Processing: Process multiple pour points efficiently
- Output Formats: Additional export options (KML, WKT)
- Configuration Improvements: Better parameter management
- Documentation: Comprehensive user guide and API reference
- Multi-tool Integration: TauDEM, GRASS, WhiteboxTools comparison (requires external tool installation)
- Advanced Algorithms: D-infinity, multiple flow direction methods
- Performance Optimization: Parallel processing, memory optimization
- Research Framework: Systematic validation studies (academic collaboration needed)
Complete Documentation - Full documentation index
- Setup Guide - Installation and environment setup
- Pipeline Guide - Running benchmarks and workflows
- Data Specification - Data sources and requirements
- Configuration Examples - Configuration system usage
- Architecture Overview - System design and architecture
- Configuration Architecture - Hierarchical configuration system
- Multi-Tool Framework - Multi-tool comparison framework
- Test Coverage - Comprehensive testing documentation
We welcome contributions from the research community:
- Fork the repository
- Create a feature branch (git checkout -b feature/research-improvement)
- Commit your changes (git commit -m 'Add research improvement')
- Push to the branch (git push origin feature/research-improvement)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- USGS for NHD+ HR and 3DEP data
- FLOWFINDER development team
- Open source geospatial community
- Academic research community for feedback and validation
For research questions and technical issues:
- Check the documentation
- Review the Research Roadmap
- See the Multi-Tool Integration Strategy
- Open an issue on GitHub
FLOWFINDER: Reliable Python watershed delineation with performance monitoring and validation tools.
