Welcome to my repository on AI-based modeling for Green Stormwater Infrastructure (GSI) performance prediction. This project implements a configurable, model-agnostic experiment pipeline for training, evaluating, and comparing multiple models under one structure.
Green Stormwater Infrastructure (GSI) plays a key role in reducing urban flooding, improving water quality, and supporting urban ecosystems. However, traditional process-based models are computationally expensive and hard to generalize. This project aims to overcome these limitations by providing a shared ML/DL framework with model configurability, reproducibility, and consistent diagnostics.
This repository supports training and testing of the following models:
- ✅ Random Forest (RF) — non-parametric ensemble learning
- ✅ Data-Driven LSTM (LSTM) — sequence modeling for time-series data
- ✅ Physics-Informed LSTM (PILSTM) — deep learning with physical constraints
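The physics-informed idea behind the PILSTM is to combine a standard data-fit loss with a penalty for violating a physical constraint. A minimal sketch of that idea follows; the simplified mass-balance constraint (outflow should not exceed inflow minus evapotranspiration) and all function names are illustrative, not the repository's actual implementation:

```python
import numpy as np

def physics_informed_loss(y_true, y_pred, inflow, et, weight=0.1):
    """Data-fit MSE plus a penalty on a (hypothetical) mass-balance residual.

    The physics term penalizes predictions where outflow exceeds
    inflow minus evapotranspiration, i.e. water appearing from nowhere.
    """
    data_loss = np.mean((y_true - y_pred) ** 2)
    # Residual of a simplified water balance: outflow <= inflow - ET
    violation = np.maximum(y_pred - (inflow - et), 0.0)
    physics_loss = np.mean(violation ** 2)
    return data_loss + weight * physics_loss

# Toy example: the first prediction violates the balance, the second does not
y_true = np.array([1.0, 2.0])
y_pred = np.array([2.0, 2.0])
inflow = np.array([2.0, 3.0])
et = np.array([0.2, 0.5])
loss = physics_informed_loss(y_true, y_pred, inflow, et)
```

In the actual PILSTM the equivalent penalty would be expressed with Keras tensor operations inside a custom loss, so it can be backpropagated through during training.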
Model selection is fully controlled via config/config.yaml or CLI arguments. The same pipeline now handles:
- data loading and validation
- storm-based train/test splitting
- feature scaling
- tabular or sequence feature generation
- model training / loading
- metrics, residuals, and bootstrap uncertainty
- artifact saving
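Model selection and pipeline options of this kind can be expressed in a single config file. The fragment below is illustrative only; the actual keys in config/config.yaml may differ:

```yaml
# Illustrative config/config.yaml fragment — actual keys may differ
model: lstm            # one of: rf, lstm, pilstm
data:
  path: data/raw/filtered_storms_df.csv
  split: storm         # storm-based train/test splitting
lstm:
  epochs: 50
  window: 24
bootstrap:
  n_samples: 200
```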
├── config/ # Experiment and model configuration
├── data/ # Raw and processed data (DO NOT COMMIT large files)
├── notebooks/ # Exploration notebooks; no longer the source of truth
├── results/
│ ├── experiments/ # Predictions, metrics, plots, and bootstrap outputs by model
│ └── models/ # Serialized trained models
├── src/
│ ├── analysis/ # Metrics and bootstrap uncertainty
│ ├── data/ # Dataset loading and storm-level splitting
│ ├── features/ # Tabular and sequence feature builders
│ ├── models/ # Model adapters and registry
│ ├── pipeline/ # Shared training / evaluation orchestration
│ ├── plots/ # Reusable diagnostics plots
│ ├── train.py # CLI entrypoint for training
│ └── test.py # CLI entrypoint for evaluation
└── README.md
This project should run in a project-local Python 3.10 virtual environment (Python 3.11 is also acceptable if it is a stable release build). The repo currently pins tensorflow==2.14.0, which does not support Python 3.12 or later.
```bash
bash scripts/setup_env.sh
```

That script will:

- Create `.venv` with `python3.10`
- Upgrade `pip`, `setuptools`, and `wheel`
- Install runtime and notebook dependencies
- Register a Jupyter kernel for VS Code and Jupyter
- Run an environment validation check
If your OS Python was installed without the standard venv module, the script falls back to virtualenv automatically.
```bash
python3.10 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements-dev.txt
python -m ipykernel install --user --name gsi-performance-prediction --display-name "Python (.venv) - GSI Performance Prediction"
python scripts/validate_env.py
```

Then, in VS Code / Jupyter:

- Open the repo in WSL
- Select the interpreter at `.venv/bin/python`
- Open a notebook and select the kernel `Python (.venv) - GSI Performance Prediction`
- Do not install project packages into `base`, system Python, or a Conda environment.
- Always use `python -m pip ...` so installs target the active interpreter.
- Keep runtime dependencies in `requirements.txt` and notebook/dev tooling in `requirements-dev.txt`.
- Recreate the virtual environment after major dependency changes, especially around TensorFlow, NumPy, scikit-learn, or Python version changes.
- Commit dependency files and setup scripts, but never commit `.venv/`.
- Retrain and resave serialized models in `results/models/` after changing scikit-learn or TensorFlow/Keras versions.
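As a guard against silently loading models under mismatched library versions, a small check along these lines can be run before deserializing artifacts. The pinned versions and the helper itself are illustrative, not part of the repository:

```python
from importlib.metadata import version, PackageNotFoundError

# Major.minor versions the serialized models were trained with
# (illustrative values — record the real ones alongside each artifact)
EXPECTED = {"scikit-learn": "1.3", "tensorflow": "2.14"}

def check_versions(expected=EXPECTED):
    """Return (package, wanted, installed) tuples for any version mismatch."""
    mismatches = []
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            continue  # package not installed; nothing to compare
        if not have.startswith(want):
            mismatches.append((pkg, want, have))
    return mismatches
```

A non-empty return value is a signal to retrain rather than trust the loaded model.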
The repo is no longer intended to be run primarily from notebooks. The notebook workflows have been extracted into reusable source modules so models can be trained and evaluated consistently from the CLI.
Sequence models are now built storm-by-storm, so LSTM windows do not cross StormID boundaries.
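That per-storm windowing can be sketched as follows; the function and argument names are illustrative, while the repository's real sequence builder lives under `src/features/`:

```python
import numpy as np

def build_windows_by_storm(values, storm_ids, window):
    """Build (window, n_features) LSTM inputs without crossing StormID boundaries."""
    X, y = [], []
    for storm in np.unique(storm_ids):
        v = values[storm_ids == storm]           # rows for one storm only
        for i in range(len(v) - window):
            X.append(v[i : i + window])          # input window
            y.append(v[i + window, 0])           # next-step target (first feature)
    return np.array(X), np.array(y)

# Two storms of 5 and 4 timesteps, one feature, window of 3
vals = np.arange(9, dtype=float).reshape(-1, 1)
ids = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2])
X, y = build_windows_by_storm(vals, ids, window=3)
# Storm 1 yields 2 windows, storm 2 yields 1 — none spanning the boundary
```

Because each storm is windowed independently, no training sequence mixes the tail of one storm with the head of the next.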
The notebooks were originally authored in Colab and still contain Google Drive paths such as /content/drive/....
They should now be treated as exploration layers on top of the shared pipeline rather than the place where the core workflow lives.
- `notebooks/LSTM(uni)_BTI_Storms.ipynb` and `notebooks/RF_BTI_Storms.ipynb` should use `data/raw/filtered_storms_df.csv`
- `notebooks/PILSTM_BTI_Storms.ipynb` expects `data/raw/filtered_df_ET_inf.csv`
- If the PILSTM dataset is not present, the `pilstm` CLI path will fail with a clear missing-file message
Train the Random Forest:

```bash
source .venv/bin/activate
python -m src.train --model rf
```

Train the LSTM:

```bash
source .venv/bin/activate
python -m src.train --model lstm
```

Evaluate the Random Forest:

```bash
source .venv/bin/activate
python -m src.test --model rf
```

Train all models:

```bash
source .venv/bin/activate
python -m src.train --all
```

Override config values from the CLI:

```bash
python -m src.train --model lstm --set lstm.epochs=5 --set bootstrap.n_samples=200
```

Skip bootstrap and plots:

```bash
python -m src.train --model rf --skip-bootstrap --skip-plots
```

Each model writes artifacts to `results/experiments/<model_name>/`:

- `predictions.csv`
- `metrics.json`
- `metrics.csv`
- `plots/`
- `bootstrap/`
- `resolved_config.yaml`
- `preprocessor.joblib`
Serialized models are written to `results/models/`.
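The bootstrap outputs above come from resampling predictions and recomputing metrics. A minimal sketch of the idea (the RMSE metric, function name, and defaults here are illustrative, not the repository's `src/analysis` code):

```python
import numpy as np

def bootstrap_rmse_ci(y_true, y_pred, n_samples=200, alpha=0.05, seed=0):
    """Percentile confidence interval for RMSE via paired bootstrap resampling."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_samples):
        idx = rng.integers(0, n, size=n)   # resample (truth, prediction) pairs
        err = y_true[idx] - y_pred[idx]
        stats.append(np.sqrt(np.mean(err ** 2)))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
lo, hi = bootstrap_rmse_ci(y_true, y_pred)
```

Resampling pairs (rather than residuals independently) keeps each bootstrap replicate a plausible draw from the original prediction set.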