A research project exploring watershed delineation accuracy and developing systematic comparison methods for hydrological analysis tools.

rgdonohue/flowfinder

FLOWFINDER: Python Watershed Delineation Tool

A Python implementation of watershed delineation algorithms with benchmarking infrastructure. FLOWFINDER provides reliable watershed boundary extraction from Digital Elevation Models (DEMs) using standard hydrological algorithms, along with performance monitoring and validation tools.

Current Status: Core watershed delineation functionality is complete and tested. Multi-tool benchmarking and research framework components are in development.

🎯 Project Goals

What does FLOWFINDER currently provide?

  1. Reliable Watershed Delineation: Fast, accurate watershed boundary extraction from DEM data
  2. Performance Monitoring: Built-in timing, memory usage, and accuracy tracking
  3. Validation Tools: Topology checking and quality assessment for generated watersheds
  4. Python & CLI Access: Both programmatic API and command-line interface

Future Development Goals:

  • Multi-tool comparison framework (TauDEM, GRASS, WhiteboxTools integration)
  • Systematic benchmarking across different terrain types
  • Research-grade validation studies
  • Geographic specialization optimization

🔬 Technical Background

Core Implementation

FLOWFINDER implements standard hydrological algorithms using modern Python scientific libraries:

  • Flow Direction: D8 algorithm with priority-flood depression filling
  • Flow Accumulation: Topological sorting (Kahn's algorithm), which runs in O(n) time for a DEM of n cells
  • Watershed Extraction: Upstream tracing from pour points
  • Polygon Creation: Morphological operations with boundary tracing
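
The first three steps above can be illustrated on a toy grid. The following is a minimal pure-Python sketch, not FLOWFINDER's actual implementation: the function names (`d8_receivers`, `flow_accumulation`, `upstream_cells`) and the dictionary-based grid representation are assumptions made for illustration, and depression filling is skipped because the toy DEM has no pits other than its outlet.

```python
from collections import deque

# D8 neighbour offsets (row, col): the eight surrounding cells.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell to its steepest-descent neighbour (None for outlets and pits)."""
    rows, cols = len(dem), len(dem[0])
    receiver = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            receiver[(r, c)] = best
    return receiver


def flow_accumulation(receiver):
    """Count contributing cells (including the cell itself) with Kahn's algorithm."""
    indegree = {cell: 0 for cell in receiver}
    for downstream in receiver.values():
        if downstream is not None:
            indegree[downstream] += 1
    accumulation = {cell: 1 for cell in receiver}
    # Start from cells that receive no flow; each cell is visited exactly once.
    queue = deque(cell for cell, n in indegree.items() if n == 0)
    while queue:
        cell = queue.popleft()
        downstream = receiver[cell]
        if downstream is not None:
            accumulation[downstream] += accumulation[cell]
            indegree[downstream] -= 1
            if indegree[downstream] == 0:
                queue.append(downstream)
    return accumulation


def upstream_cells(receiver, pour_point):
    """Collect every cell that drains to pour_point by walking the graph upstream."""
    donors = {}
    for cell, downstream in receiver.items():
        if downstream is not None:
            donors.setdefault(downstream, []).append(cell)
    watershed, stack = {pour_point}, [pour_point]
    while stack:
        for donor in donors.get(stack.pop(), []):
            if donor not in watershed:
                watershed.add(donor)
                stack.append(donor)
    return watershed


# Toy 3x3 DEM that drains towards the bottom-centre cell.
dem = [
    [9, 8, 9],
    [8, 5, 8],
    [7, 1, 7],
]
receiver = d8_receivers(dem)
accumulation = flow_accumulation(receiver)
print(accumulation[(2, 1)], sorted(upstream_cells(receiver, (1, 1))))
```

Because every D8 cell has at most one receiver and flow is strictly downhill, the graph is acyclic and each cell enters the queue once, which is where the linear running time comes from.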

Validation & Quality Assurance

  • Real-time performance monitoring (runtime, memory usage)
  • Topology validation (geometry validity, containment checks)
  • Accuracy assessment tools (when ground truth available)
  • Comprehensive error handling and logging
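
Runtime and memory monitoring of the kind listed above can be sketched with the standard library alone. This is an illustrative assumption, not FLOWFINDER's monitoring code: the `monitor` helper and the metric key names are hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so memory held by native raster libraries would not be counted.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def monitor(metrics, label):
    """Record wall-clock runtime and peak Python memory for the wrapped block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield metrics
    finally:
        runtime = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        metrics[label] = {"runtime_s": runtime, "peak_memory_mb": peak_bytes / 1e6}


metrics = {}
with monitor(metrics, "toy_workload"):
    grid = [[0.0] * 500 for _ in range(500)]  # stand-in for a delineation step

print(metrics["toy_workload"])
```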

📋 Prerequisites

Required

  • Python 3.8+
  • DEM data in GeoTIFF format
  • 4GB+ RAM recommended for processing large datasets

Optional (for development/benchmarking)

  • Additional watershed tools for comparison:
    • TauDEM (requires Docker)
    • GRASS GIS
    • WhiteboxTools
  • Ground truth watershed boundaries for validation

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd flowfinder

# Install FLOWFINDER in editable mode with development dependencies
# (the quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Copy environment template (if it exists)
cp .env.example .env || echo "Create .env file with your data paths"
# Edit .env with your data paths and configuration

2. Configuration Setup

The system uses a hierarchical configuration architecture to manage complexity:

# Configuration structure is already set up:

# Environment-specific configurations
config/environments/development.yaml    # Local dev (10 basins)
config/environments/testing.yaml        # CI/testing (50 basins)
config/environments/production.yaml     # Full-scale (500+ basins)

# Tool-specific configurations
config/tools/flowfinder.yaml            # FLOWFINDER settings
config/tools/taudem.yaml               # TauDEM MPI settings
config/tools/grass.yaml                # GRASS r.watershed settings
config/tools/whitebox.yaml             # WhiteboxTools settings

3. Data Preparation

Place your input datasets in the data/ directory:

data/
├── huc12_mountain_west.shp    # HUC12 boundaries for Mountain West
├── nhd_hr_catchments.shp      # NHD+ HR catchment polygons
├── nhd_flowlines.shp          # NHD+ HR flowlines
└── dem_10m.tif                # 10m DEM mosaic or tiles

4. Basic Usage

Python API

from flowfinder import FlowFinder

# Initialize with DEM
with FlowFinder("path/to/dem.tif") as ff:
    # Delineate watershed from a pour point
    watershed, metrics = ff.delineate_watershed(lat=40.0, lon=-105.0)
    print(f"Watershed area: {watershed.area:.6f} degrees²")
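
Since the example above reports area in square degrees, a rough conversion to square kilometres can make the number easier to interpret. The sketch below is an assumption-laden approximation, not part of the FLOWFINDER API: `approx_area_km2` is a hypothetical helper, it treats one degree of arc as about 111.32 km, and it scales once by the cosine of the latitude. For anything beyond a sanity check, reprojecting the polygon to an equal-area coordinate system is the more reliable route.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of arc on the Earth


def approx_area_km2(area_deg2, latitude_deg):
    """Convert an area in square degrees to an approximate area in km^2.

    East-west distances shrink by cos(latitude), so the area is scaled once
    by that factor. Adequate for small watersheds only.
    """
    return area_deg2 * KM_PER_DEGREE ** 2 * math.cos(math.radians(latitude_deg))


# A 0.01 x 0.01 degree patch at the latitude used in the example (40.0).
print(f"{approx_area_km2(0.0001, 40.0):.3f} km^2")
```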

Command Line Interface

# Delineate watershed
python -m flowfinder.cli delineate --dem dem.tif --lat 40.0 --lon -105.0 --output watershed.geojson

# Validate DEM
python -m flowfinder.cli validate --dem dem.tif

# Get DEM info
python -m flowfinder.cli info --dem dem.tif

5. Benchmarking (Experimental)

⚠️ Note: Multi-tool comparison features are currently in development. Some components may use mock data for testing infrastructure.

# Run basic benchmark with FLOWFINDER
python scripts/benchmark_runner.py \
    --environment development \
    --tools flowfinder \
    --outdir results/

📁 Project Structure

├── README.md                    # Project overview + setup
├── requirements.txt             # Python dependencies
├── pyproject.toml               # Modern Python project config
├── .env.example                 # Environment template
├── .gitignore                   # Standard Python gitignore
│
├── config/                      # Hierarchical configuration system
│   ├── base.yaml                # Foundation configurations
│   ├── configuration_manager.py # Configuration inheritance system
│   ├── schema.json              # JSON Schema validation
│   ├── environments/            # Environment-specific settings
│   │   ├── development.yaml     # Local development (10 basins)
│   │   ├── testing.yaml         # CI/testing (50 basins)
│   │   └── production.yaml      # Full-scale (500+ basins)
│   └── tools/                   # Tool-specific configurations
│       ├── flowfinder.yaml      # FLOWFINDER settings
│       ├── taudem.yaml          # TauDEM MPI settings
│       ├── grass.yaml           # GRASS r.watershed settings
│       └── whitebox.yaml        # WhiteboxTools settings
│
├── scripts/                     # Core benchmark scripts
│   ├── basin_sampler.py         # Stratified basin sampling
│   ├── truth_extractor.py       # Truth polygon extraction
│   ├── benchmark_runner.py      # FLOWFINDER accuracy testing
│   ├── watershed_experiment_runner.py # Multi-tool comparison
│   ├── validation_tools.py      # Validation utilities
│   └── backup/                  # Deprecated scripts and files
│
├── data/                        # Input datasets (gitignored)
│   └── test_outputs/            # Test result files
├── results/                     # Output directory (gitignored)
├── tests/                       # Unit and integration tests
│   └── integration/             # Integration test suite
├── docs/                        # Research and technical documentation
│   ├── README.md                # Documentation index
│   ├── PIPELINE.md              # Pipeline orchestrator guide
│   ├── user_guide/              # User documentation and guides
│   ├── architecture/            # System architecture documentation
│   ├── test_coverage/           # Test coverage documentation
│   └── development/             # Development notes and status reports
│
└── notebooks/                   # Jupyter exploration
    └── benchmark_analysis.ipynb

🔧 Configuration Architecture

The system uses a hierarchical configuration architecture to manage complexity across different tools and environments:

Configuration Hierarchy

Base Configurations → Environment → Tool → Local Overrides

Example Configuration Composition

# Development FLOWFINDER experiment
inherits:
  - "base/regions.yaml#mountain_west_minimal"
  - "base/quality_standards.yaml#development_grade"
  - "environments/development.yaml"
  - "tools/flowfinder/base.yaml"
  - "experiments/accuracy_comparison.yaml"

overrides:
  basin_sampling:
    n_per_stratum: 1  # Minimal for dev
  benchmark:
    timeout_seconds: 30  # Quick timeout
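
One way to picture how such layers combine is a recursive dictionary merge in which later layers win. The sketch below is an assumption about the general technique, not the behaviour of `configuration_manager.py`: `deep_merge` and `compose` are hypothetical helpers, and the layer contents are invented apart from the two override values shown above.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where values in override win, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(*layers):
    """Apply layers left to right, so later layers take precedence."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layer contents; only the two override values come from the README.
base = {"benchmark": {"timeout_seconds": 120, "retries": 1}}
environment = {"basin_sampling": {"n_per_stratum": 5}}
tool = {"benchmark": {"retries": 2}}
overrides = {"basin_sampling": {"n_per_stratum": 1}, "benchmark": {"timeout_seconds": 30}}

config = compose(base, environment, tool, overrides)
print(config)
```

Merging nested dictionaries key by key, instead of replacing whole sections, lets an override change `timeout_seconds` without discarding sibling settings inherited from earlier layers.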

Tool Adapter Interface

from abc import ABC, abstractmethod
from typing import Dict, Tuple

from shapely.geometry import Point, Polygon  # assumed: Shapely geometry types


class ToolAdapter(ABC):
    @abstractmethod
    def delineate_watershed(self, pour_point: Point, dem_path: str) -> Tuple[Polygon, Dict]:
        """Delineate watershed and return polygon + performance metrics"""
        pass

    @abstractmethod
    def is_available(self) -> bool:
        """Check if tool is available on system"""
        pass
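
To show how an interface of this shape might be used, here is a self-contained sketch with two hypothetical adapters. It is not code from the repository: `SimpleAdapter` is a simplified stand-in that uses coordinate tuples instead of geometry objects so the example runs without third-party packages, and `MockAdapter`, `ExecutableAdapter`, and the executable name are invented for illustration.

```python
import shutil
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (lon, lat); stands in for a geometry Point


class SimpleAdapter(ABC):
    """Simplified stand-in for ToolAdapter that uses tuples instead of geometries."""

    name = "abstract"

    @abstractmethod
    def delineate_watershed(
        self, pour_point: Coordinate, dem_path: str
    ) -> Tuple[List[Coordinate], Dict]:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        ...


class MockAdapter(SimpleAdapter):
    """Always available; returns a fixed square around the pour point."""

    name = "mock"

    def delineate_watershed(self, pour_point, dem_path):
        lon, lat = pour_point
        d = 0.01
        ring = [(lon - d, lat - d), (lon + d, lat - d), (lon + d, lat + d), (lon - d, lat + d)]
        return ring, {"tool": self.name, "mock": True}

    def is_available(self):
        return True


class ExecutableAdapter(SimpleAdapter):
    """Available only if a given executable is found on PATH."""

    def __init__(self, name, executable):
        self.name, self.executable = name, executable

    def delineate_watershed(self, pour_point, dem_path):
        raise NotImplementedError("Invoking the external tool is out of scope for this sketch")

    def is_available(self):
        return shutil.which(self.executable) is not None


# The executable name below is deliberately fictitious, so it is filtered out.
adapters = [MockAdapter(), ExecutableAdapter("missing", "no-such-watershed-tool-xyz")]
usable = [adapter for adapter in adapters if adapter.is_available()]
print([adapter.name for adapter in usable])
```

Checking `is_available()` before running lets a benchmark skip tools that are not installed instead of failing partway through.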

📊 Outputs

FLOWFINDER Results

  • Watershed Polygons: GeoJSON or Shapefile format
  • Performance Metrics: Runtime, memory usage, processing rate
  • Quality Assessment: Topology validation, accuracy scores
  • Detailed Logs: Complete processing history and diagnostics
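
The README does not state which accuracy score is reported. Intersection over union (IoU) is one common choice when a ground-truth boundary is available, and the sketch below assumes it purely for illustration. The function name and the representation of each watershed as a set of raster cells are hypothetical.

```python
def intersection_over_union(predicted, truth):
    """IoU of two watersheds given as sets of (row, col) raster cells."""
    predicted, truth = set(predicted), set(truth)
    union = predicted | truth
    if not union:
        return 0.0  # two empty masks: define the score as 0 to avoid dividing by zero
    return len(predicted & truth) / len(union)


# Two 2x2 blocks that share one column of cells.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(intersection_over_union(predicted, truth))
```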

Benchmarking Outputs (When Available)

  • benchmark_results.json: Detailed per-watershed metrics
  • accuracy_summary.csv: Tabular results for analysis
  • performance_comparison.csv: Runtime and memory comparisons

⚠️ Note: Multi-tool comparison outputs are generated using mock data when external tools are not available.

🎯 Current Performance

Core FLOWFINDER Capabilities

| Feature | Status | Notes |
|---|---|---|
| Basic watershed delineation | ✅ Implemented | Tested and working |
| D8 flow direction | ✅ Implemented | With depression filling |
| Flow accumulation | ✅ Implemented | O(n) topological sorting |
| Performance monitoring | ✅ Implemented | Runtime, memory tracking |
| Python API | ✅ Implemented | Full functionality |
| CLI interface | ✅ Implemented | Basic commands available |
| Topology validation | ✅ Implemented | Geometry checking |

Development Roadmap

| Feature | Timeline | Status |
|---|---|---|
| Multi-tool integration | Future release | 📋 Planned |
| Batch processing | Next minor version | 🔄 In Development |
| Advanced algorithms | Future release | 📋 Planned |
| Performance optimization | Ongoing | 🔄 Continuous |
| Documentation improvements | Next patch | 🔄 In Progress |

🧪 Testing

# Run unit tests
python -m pytest tests/

# Run integration tests
python -m pytest tests/integration/

# Test configuration system
python tests/integration/test_configuration_system.py

# Test multi-tool integration
python tests/integration/test_integration.py

# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=html

📈 Analysis

Use the Jupyter notebook for detailed analysis:

# Start Jupyter
jupyter lab notebooks/

# Open benchmark_analysis.ipynb for interactive exploration

🎯 Development Roadmap

Current Release (v0.1.0) - COMPLETED

  • ✅ Core Algorithm Implementation: D8 flow direction, flow accumulation, watershed extraction
  • ✅ Python API: Complete programmatic interface
  • ✅ Basic CLI: Command-line watershed delineation
  • ✅ Performance Monitoring: Runtime and memory tracking
  • ✅ Validation Tools: Topology checking and quality assessment

Next Minor Release (v0.2.0) - PLANNED

  • 📋 Batch Processing: Process multiple pour points efficiently
  • 📋 Output Formats: Additional export options (KML, WKT)
  • 📋 Configuration Improvements: Better parameter management
  • 📋 Documentation: Comprehensive user guide and API reference

Future Development - UNDER CONSIDERATION

  • 📋 Multi-tool Integration: TauDEM, GRASS, WhiteboxTools comparison (requires external tool installation)
  • 📋 Advanced Algorithms: D-infinity, multiple flow direction methods
  • 📋 Performance Optimization: Parallel processing, memory optimization
  • 📋 Research Framework: Systematic validation studies (academic collaboration needed)

📚 Documentation

📖 Complete Documentation - Full documentation index

Quick Links

Technical Documentation

🀝 Contributing

We welcome contributions from the research community:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/research-improvement)
  3. Commit your changes (git commit -m 'Add research improvement')
  4. Push to the branch (git push origin feature/research-improvement)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • USGS for NHD+ HR and 3DEP data
  • FLOWFINDER development team
  • Open source geospatial community
  • Academic research community for feedback and validation

📞 Support

For research questions and technical issues:


FLOWFINDER: Reliable Python watershed delineation with performance monitoring and validation tools.
