Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
110 changes: 55 additions & 55 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,28 +3,28 @@
[![CI](https://github.com/Upstream-Tech/delineator/actions/workflows/ci.yml/badge.svg)](https://github.com/Upstream-Tech/delineator/actions/workflows/ci.yml)

A set of Python scripts for delineating watersheds or drainage basins
using data from [MERIT-Hydro](https://doi.org/10.1029/2019WR024873) and
[MERIT-Basins](https://doi.org/10.1029/2019WR025287).
The script outputs geospatial data for subbasins and river reaches,
and creates a river network graph representation,
which can be useful for watershed modeling or machine learning applications.
using data from [MERIT-Hydro](https://doi.org/10.1029/2019WR024873) and
[MERIT-Basins](https://doi.org/10.1029/2019WR025287).
The script outputs geospatial data for subbasins and river reaches,
and creates a river network graph representation,
which can be useful for watershed modeling or machine learning applications.

This script also lets you subdivide the watershed at specific locations, such as gages, if you provide
additional points that fall inside the main watershed, as shown in red in the image below.

![Subbasins illustration](img/subbasins_map.jpg)


These scripts are a heavily modified fork of [delineator.py](https://github.com/mheberger/delineator).
These scripts are a heavily modified fork of [delineator.py](https://github.com/mheberger/delineator).


# Outputs:

Geodata in a variety of formats -- shapefile, geopackage, GeoJSON, etc., for:

* sub-basin polygons
* sub-basin outlet points
* river reaches
* sub-basin polygons
* sub-basin outlet points
* river reaches

and, optionally:

Expand All @@ -33,18 +33,18 @@ and, optionally:
The network graph can be saved in a variety of formats -- Python NetworkX Graph object (in a pickle file),
JSON, GML, XML, etc.

You can also customize the size of the subbasins to make them larger; see *Outputing larger subbasins* below.
You can also customize the size of the subbasins to make them larger; see *Outputing larger subbasins* below.


# Using these scripts

This repository includes sample data covering Iceland. To delineate watersheds in other
parts of the world, you will need to download datasets from MERIT-Hydro and MERIT-Basins.
This repository includes sample data covering Iceland. To delineate watersheds in other
parts of the world, you will need to download datasets from MERIT-Hydro and MERIT-Basins.
Instructions on how to get the data and run the script are provided below.

To get started, download the latest release from this GitHub repository (or fork the repository).

These scripts require Python 3.12 or later.
These scripts require Python 3.11 or later.

## Setup

Expand Down Expand Up @@ -98,14 +98,14 @@ uv run ruff format .
To run tests (once added):
```bash
uv run pytest
```
```


# Overview of using `subbasins.py`

The major steps are the following, with more detailed instructions below.
To simply try the program with the sample data provided, you can skip to step 5.
When you are ready to try with your own locations, you will need to download additional
To simply try the program with the sample data provided, you can skip to step 5.
When you are ready to try with your own locations, you will need to download additional
data as described in steps 1 and 2.

1. [Export env vars pointing to raster and vector data](#step_env)
Expand All @@ -114,9 +114,9 @@ data as described in steps 1 and 2.
1. [Review output](#step_review)
1. [Run again to fix mistakes](#step_repeat)

Before you begin downloading the data in steps 1 and 2, determine which files you need based on your region of interest.
The data files are organized into continental-scale river basins, or Pfafstetter Level 2 basins.
There are 61 of these basins in total. Basins are identified by a 2-digit code, with values from 11 to 91.
Before you begin downloading the data in steps 1 and 2, determine which files you need based on your region of interest.
The data files are organized into continental-scale river basins, or Pfafstetter Level 2 basins.
There are 61 of these basins in total. Basins are identified by a 2-digit code, with values from 11 to 91.

![MERIT Level 2 Basins](img/merit_level2_basins.jpg)
MERIT Level 2 megabasins
Expand Down Expand Up @@ -145,36 +145,36 @@ export MEGABASINS_PATH="https://example.com/file.shp"

## <a name="step_csv">Create a CSV file with your desired watershed outlet points</a>

The script reads information about your desired watershed outlet points from a
plain-text comma-delimited (CSV) file. Edit this file carefully, as the script will
not run if this file is not formatted correctly.
The script reads information about your desired watershed outlet points from a
plain-text comma-delimited (CSV) file. Edit this file carefully, as the script will
not run if this file is not formatted correctly.

The CSV file **must** contain these 4 required fields or columns.

- **id** - _required_: a unique identifier for your watershed or outlet point,
an alphanumeric string. The id may be any length, but shorter is better.
The script uses the id as the filename for output, so avoid using any
forbidden characters. On Linux, do not use the forward slash /.
an alphanumeric string. The id may be any length, but shorter is better.
The script uses the id as the filename for output, so avoid using any
forbidden characters. On Linux, do not use the forward slash /.
On Windows, the list of forbidden characters is slightly longer (`\< \> : " / \ | ? \*`).
Also, do not use id = 0, since the convention is that 0 is reserved for discharging to
the ocean.

- **lat** - _required_: latitude in decimal degrees of the watershed outlet.
Avoid using a whole number without a decimal in the first row.
Avoid using a whole number without a decimal in the first row.
For example, use 23.0 instead of 23.

- **lng** - _required_: longitude in decimal degrees

- **is_outlet**: If the gage is a watershed outlet, enter `true` or `True`
- **is_outlet**: If the gage is a watershed outlet, enter `true` or `True`
(capitalization does not matter). For intermediate
upstream points that will be sub-basin outlets, enter `false` or `False`.

All latitude and longitude coordinates should be in decimal degrees
All latitude and longitude coordinates should be in decimal degrees
(EPSG: 4326, [https://spatialreference.org/ref/epsg/4326/](https://spatialreference.org/ref/epsg/4326/)).

- The order of the columns does not matter, but the names must be exactly as shown above.

- If you are delineating more than one main watershed, put any subbasin
- If you are delineating more than one main watershed, put any subbasin
outlets immediately after the main outlet, as in the following example:

| id | lat | lng | name | is_outlet |
Expand All @@ -186,15 +186,15 @@ All latitude and longitude coordinates should be in decimal degrees
| algoso | 41.455 | -6.591 | "EN 219 Crossing near Algoso" | false |

In this example, there are two *main* outlets. The first, "foz-tua," has two subbasin
outlets. The second, "baixo-sabor," has one subbasin outlet.
outlets. The second, "baixo-sabor," has one subbasin outlet.

## <a name="step_run">Delineate watersheds</a>

Delineation can be run from the command line, or from Python. Either way, you will need to specify a few arguments:

- `input_csv` (required) - Input CSV filename, for example `outlets.csv`
- `output_prefix` (required) - Output prefix, a string. The output files will start with this string. For
example,
- `output_prefix` (required) - Output prefix, a string. The output files will start with this string. For
example,
if you provide 'shasta', the script will produce `shasta_subbasins.shp`, `shasta_outlets.shp`, etc.
- `config_vals` (optional) - Override default settings. Eg to turn `VERBOSE` logging on or off.

Expand All @@ -213,36 +213,36 @@ Alternatively, you can call the delineation routine from Python:

## <a name="step_review">Review results</a>

The script can output several different geodata formats,
as long as the format is supported by `GeoPandas`. Shapefiles are popular,
but we recommend **GeoPackage**, as it is a more modern and open format.
The script can output several different geodata formats,
as long as the format is supported by `GeoPandas`. Shapefiles are popular,
but we recommend **GeoPackage**, as it is a more modern and open format.
**Feather** is another lightweight, portable data format.
To get a full list of available formats, follow the directions
To get a full list of available formats, follow the directions
[here](https://geopandas.org/en/stable/docs/user_guide/io.html#writing-spatial-data)
(see Supported Drivers).


## <a name="step_repeat">Run again to fix any mistakes</a>

Automated watershed delineation is often incorrect.
The good news is that errors can often be fixed by slightly moving the
Automated watershed delineation is often incorrect.
The good news is that errors can often be fixed by slightly moving the
location of your watershed outlets.

Repeat the above steps to create a new outlets CSV file, or modify your existing file,
Repeat the above steps to create a new outlets CSV file, or modify your existing file,
using revised coordinates. The script will automatically overwrite existing files
without any warning, so first make sure to back up anything you want to save.


# Simplification

The Python routine used to simplify the subbasin polygons (`topojson`) is not perfect.
Sometimes, the output will contain weird overlaps and slivers. If appearances matter,
Sometimes, the output will contain weird overlaps and slivers. If appearances matter,
we recommend setting `SIMPLIFY` to `False` and using external software for simplification.

*Mapshaper* works well. You can use the [web version](https://mapshaper.org/), or you can install and run it from the command line.
Note that mapshaper will only accept shapefiles or geojson as input, and not geopackages or feather files.
*Mapshaper* works well. You can use the [web version](https://mapshaper.org/), or you can install and run it from the command line.
Note that mapshaper will only accept shapefiles or geojson as input, and not geopackages or feather files.

As an alternative, GIS software like QGIS (free) or ArcGIS (commercial) do the job nicely.
As an alternative, GIS software like QGIS (free) or ArcGIS (commercial) do the job nicely.


# Outputting larger subbasins
Expand All @@ -252,22 +252,22 @@ larger subbasins, se `CONSOLIDATE` to `True`.
Then, set a value for `MAX_AREA` in km². This sets the upper limit on the size of subbasins.
The script will merge unit catchments such that the overall structure
and connectivity of the drainage network is maintained. This example shows the subbasins
for the Yellowstone River with different values of `MAX_AREA`.
for the Yellowstone River with different values of `MAX_AREA`.

![Subbasin consolidation illustration](img/consolidation.jpg)

By "rediscretizing" the subbasins with this option, you can reduce their number while increasing their size.
By "rediscretizing" the subbasins with this option, you can reduce their number while increasing their size.
This means that your hydrologic model will be smaller and simpler and probably run faster! 😊

The simplification routine also appears to
The simplification routine also appears to
make the subbasin sizes somewhat more homogeneous. The area of MERIT-Basins unit catchments
is highly variable, and highly skewed, with many very small unit catchments.
After consolidation, the distribution
of subbasin areas tends to be more tightly clustered around the mean,
of subbasin areas tends to be more tightly clustered around the mean,
as indicated by a lower coefficient
of variation (standard deviation divided by the mean). Here are the results of a
little experiment in using different values of `MAX_AREA` for the Yellowstone River
basin in North America. Statistics are for the subbasin areas in km².
basin in North America. Statistics are for the subbasin areas in km².


| MAX_AREA | count | median | mean | std. dev. | CV | skewness |
Expand All @@ -285,9 +285,9 @@ basin in North America. Statistics are for the subbasin areas in km².

More error handling of stress cases is forthcoming.
For example, the MERIT-Basins dataset
has some gaps and slivers of missing data; if your outlet points falls
into one of these locations, it may cause problems. In this case, simply nudge the
location by changing the latitude and/or longitude slightly.
has some gaps and slivers of missing data; if your outlet points falls
into one of these locations, it may cause problems. In this case, simply nudge the
location by changing the latitude and/or longitude slightly.

As another example, if you put two points in your input file that are the same, or very close
to one another, the script will fail and the error messages will not be very helpful. Please
Expand All @@ -300,10 +300,10 @@ make sure all of your points have a little space between them!

To report any bugs, you can create an Issue on this GitHub page.

This code is open source, so if you are motivated to make any
modifications, additions, or bug fixes, you can make a pull request on GitHub.
This code is open source, so if you are motivated to make any
modifications, additions, or bug fixes, you can make a pull request on GitHub.


# Acknowledgments
# Acknowledgments

Thanks to Matthew Heberger who wrote the [original code](https://github.com/mheberger/delineator) built upon here.
Thanks to Matthew Heberger who wrote the [original code](https://github.com/mheberger/delineator) built upon here.
6 changes: 3 additions & 3 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ dependencies = [
"graphviz==0.20.3",
"matplotlib==3.*",
"networkx~=3.0",
"numpy==1.*",
"numpy>=1.26,<3",
"pandas~=2.1",
"pyarrow",
"pyogrio",
Expand All @@ -22,7 +22,7 @@ dependencies = [
name = "upstream_delineator"
version = "0.1.1"
description = "Delineate basins"
requires-python = ">=3.12"
requires-python = ">=3.11"

[dependency-groups]
dev = [
Expand All @@ -39,7 +39,7 @@ addopts = "--strict-markers"
timeout = 90

[tool.ruff]
target-version = "py312"
target-version = "py311"

[tool.ruff.lint]
select = [
Expand Down
Loading