An HDSI Capstone Project
Section B19-1: Aditya Surapaneni, Angela Hu, Subika Haider, Suhani Sharma
PREVAIL is a machine-learning pipeline that predicts power outages and crew dispatch requirements for the San Diego Gas & Electric (SDG&E) service territory. It fuses weather station observations with utility outage records, engineers spatio-temporal features using Uber H3 hexagons, and trains XGBoost and ensemble models. The dashboard is live in a separate repository: https://github.com/angela139/prevail-dashboard.
Notes for the TA
- To get an accurate picture of each member's contribution history, please look at the individual branches in the repository. We each developed different parts of the project on our own branches and merged them into main once finalized, so not all contributions are reflected in the main branch's history.
- Additionally, many of our data sources were given to us directly by our mentors at SDG&E; due to a data privacy agreement between SDG&E and UCSD, we are not able to publish them in this repo.
Thank you!
Electric utilities face significant operational challenges in anticipating and responding to weather-related grid disruptions. Traditional reactive approaches often result in inefficient crew deployment, increased standby costs, and prolonged restoration times during extreme weather events.
To address this operational gap, PREVAIL introduces a framework that forecasts potential grid vulnerabilities over a weekly planning window. Unlike conventional models, this project utilizes a two-stage predictive architecture:
- Outage Location Prediction: Identifies geographic areas (hexagonal grid cells) with probable outages due to extreme weather conditions
- Crew Size Optimization: Estimates the number of crew members required for restoration in affected areas
By engineering a novel spatio-temporal linkage between historical outage logs and crew dispatch records using a ZIP code proxy, we constructed a training dataset of over 1,500 verified adverse weather-related responses. This dataset enables:
- Proactive crew staging - Position crews before incidents occur
- Resource optimization - Reduce standby costs while enhancing grid reliability
- Data-driven decision making - Supply forecasts to ground operational planning
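The ZIP code proxy linkage above can be sketched as a nearest-centroid lookup. This is a minimal illustration, not the pipeline's actual code: the coordinates and column layout below are invented, and it assumes `scipy` and `numpy` are available (the real logic lives in `map_outages_to_zip_codes` in `utils/spatial_utils.py`).

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical ZCTA centroids (lat, lon), as the Gazetteer file provides.
zip_codes = ["92101", "92037", "92026"]
centroids = np.array([
    [32.7192, -117.1629],  # 92101 (downtown San Diego)
    [32.8328, -117.2713],  # 92037 (La Jolla)
    [33.1700, -117.1200],  # 92026 (Escondido, approximate)
])

# Hypothetical outage coordinates needing a ZIP assignment.
outages = np.array([
    [32.7200, -117.1600],
    [33.1500, -117.1100],
])

tree = cKDTree(centroids)          # index ZIP centroids once
_, idx = tree.query(outages, k=1)  # nearest centroid per outage
nearest_zip = [zip_codes[i] for i in idx]
print(nearest_zip)  # -> ['92101', '92026']
```

Euclidean distance on raw lat/lon degrees is only a rough proxy for ground distance, which is acceptable here because the lookup only needs the *nearest* of a few local centroids, not true distances.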
The system combines weather data, power outage records, and crew deployment information using:
- Spatial analysis with H3 hexagonal indexing for geographic granularity
- Time-series modeling with temporal lag features and rolling aggregations
- Ensemble machine learning including XGBoost, Random Forest, and Lasso regression
- Interactive visualization through a geospatial dashboard (see prevail-dashboard)
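The lag and rolling-aggregation idea can be sketched in a few lines of pandas. The column and feature names below are illustrative, not the pipeline's actual schema; the key detail is shifting before rolling so each feature uses only past hours:

```python
import pandas as pd

# Toy hex-hour series: one hex, six hourly wind observations.
df = pd.DataFrame({
    "hex_id": ["abc"] * 6,
    "hour": pd.date_range("2024-01-01", periods=6, freq="h"),
    "wind_mph": [5.0, 8.0, 12.0, 30.0, 9.0, 7.0],
})

# Lag feature: the value one hour earlier, computed per hex.
df["wind_lag_1h"] = df.groupby("hex_id")["wind_mph"].shift(1)

# History-only rolling mean: shift(1) first so the current hour is excluded,
# preventing target leakage when predicting the current/future hour.
df["wind_roll_mean_3h"] = (
    df.groupby("hex_id")["wind_mph"]
      .transform(lambda s: s.shift(1).rolling(3).mean())
)
```

The first rows of each hex are NaN by construction, since no history exists yet for them.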
data/all_weather.parquet
│
├──────────────────────────────────────────────────────────┐
▼ ▼
master_dataset_hourly_build.py outage_span_weather_10311_clean_v2.parquet
• station → H3 hex mapping (res=7) (outage events + nearest-station weather)
• hex-hour weather aggregation │
• extreme weather flags (q95/q05) │
• outage start labels per hex-hour ◄───────────┘
• future outage targets (1/3/6/12/24h)
• lag & rolling features (1/3/6/12/24h)
│
▼
data/master_dataset_hex_hour_v1.parquet ←─── primary ML dataset
│
├──► outage_prediction_model.py → XGBoost outage classifier
│
├──► outage_sort_merge.py → data/merged_outage_sort_data.csv
│ (ZIP code KDTree match + SORT crew dispatch join)
│
├──► crew_size_prediction_approach_1.py → api/models/trained/
│ LASSO → RandomForest → XGBoost(Poisson) → Stacking
│
└──► crew_size_final.py → data/predictions_final.csv
(generates predictions for the dashboard)
(dashboard → https://github.com/angela139/prevail-dashboard)
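The extreme weather flags (q95/q05) in the diagram amount to quantile thresholding with union logic. A minimal sketch with invented column names and toy data, not the pipeline's actual thresholds:

```python
import pandas as pd

weather = pd.DataFrame({
    "gust_mph":     [10, 12, 11, 13, 9, 14, 10, 11, 12, 55],  # one extreme gust
    "humidity_pct": [40, 42, 38, 41, 39, 43, 40, 5, 42, 41],  # one extreme dry hour
})

# Upper-tail threshold for gusts, lower-tail threshold for humidity.
q95_gust = weather["gust_mph"].quantile(0.95)
q05_hum = weather["humidity_pct"].quantile(0.05)

# Union logic: flag hours that are extremely windy OR extremely dry.
weather["extreme_flag"] = (
    (weather["gust_mph"] >= q95_gust) | (weather["humidity_pct"] <= q05_hum)
).astype(int)
```

On this toy data exactly two rows are flagged: the 55 mph gust and the 5% humidity hour.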
- Python 3.11+ is required
- All packages are pinned in `requirements.txt`
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

The following input files must be present before running the pipeline. They are not committed to the repository due to size and data-use agreements.
| File | Description |
|---|---|
| `data/all_weather.parquet` | Consolidated weather station observations (temp °F, wind/gust mph, humidity %) |
| `data/outage_span_weather_10311_clean_v2.parquet` | Outage events joined to nearest weather station observations; includes span/conductor/topology fields |
| `data/SORT/REP_ORD_ORDER.parquet` | SORT work orders |
| `data/SORT/REP_LAB_BUSINESS.parquet` | SORT business/crew type lookup |
| `data/SORT/REP_ASN_ASSIGNMENT.parquet` | SORT crew assignment resource counts |
| `data/2022_Gaz_zcta_national.txt` | USCB ZCTA ZIP code centroid coordinates (2022 Gazetteer) |
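Since these inputs are not committed, a quick preflight check can confirm they all exist before any pipeline script runs. This helper is a convenience sketch, not part of the repository:

```python
from pathlib import Path

# Required input files, relative to the repository root (from the table above).
REQUIRED = [
    "data/all_weather.parquet",
    "data/outage_span_weather_10311_clean_v2.parquet",
    "data/SORT/REP_ORD_ORDER.parquet",
    "data/SORT/REP_LAB_BUSINESS.parquet",
    "data/SORT/REP_ASN_ASSIGNMENT.parquet",
    "data/2022_Gaz_zcta_national.txt",
]

def missing_inputs(root: str = ".") -> list[str]:
    """Return the required input paths that are absent under `root`."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).exists()]

if __name__ == "__main__":
    gaps = missing_inputs()
    if gaps:
        print("Missing required inputs:")
        for p in gaps:
            print("  -", p)
    else:
        print("All required inputs present.")
```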
Run these scripts in order to reproduce all output datasets from scratch.
python master_v3_creation.py

Inputs: data/outage_span_weather_10311_clean_v2.parquet, data/all_weather.parquet
Outputs: data/master_v3.csv, data/master_v3_cause_thresholds.csv, data/master_v3_cause_thresholds_with_lags.csv
python outage_sort_merge.py

Inputs: data/outage_and_weather_data.parquet, data/2022_Gaz_zcta_national.txt, data/SORT/*.parquet
Outputs: data/merged_outage_sort_data.csv
python outage_prediction_model.py

Inputs: data/master_dataset_hex_hour_v1.parquet (via outage_prediction_feature_engineering.py)
Outputs: roc_curve_xgboost.png, console classification report
python crew_size_prediction_approach_1.py

Inputs: data/merged_outage_sort_data.csv, data/master_dataset_hex_hour_v1.parquet
Outputs: api/models/trained/ — serialized model artifacts
python crew_size_final.py

Inputs: data/merged_outage_sort_data.csv, data/master_dataset_hex_hour_v1.parquet
Outputs: data/predictions_final.csv — crew size predictions used by the dashboard
All unit tests live in the tests/ directory. No data files are required — every test uses small synthetic DataFrames defined in tests/conftest.py so the full suite can be run immediately after cloning, before any data is present.
# Run every test with verbose per-test output (recommended)
pytest tests/ -v
# Run only one module at a time
pytest tests/test_feature_engineering.py -v
pytest tests/test_spatial_utils.py -v
pytest tests/test_preprocessing.py -v

Expected: 67 tests pass in under 10 seconds.
| File | Source module tested | What it covers |
|---|---|---|
| `tests/conftest.py` | (shared fixtures) | Defines all synthetic DataFrames reused across the three test files. Read this first to understand the shape and values of every test input. |
| `tests/test_feature_engineering.py` | `master_dataset_hourly_build.py` | `mode_or_nan` (categorical aggregation edge cases); `add_extreme_weather_flags_hourly` (q95/q05 thresholds, binary output, union logic); `add_lag_and_rolling_features_hourly` (exact lag values, history-only rolling mean/max, column naming); `add_future_outage_targets` (forward-looking target correctness, no current-hour leakage) |
| `tests/test_spatial_utils.py` | `utils/spatial_utils.py` | `lat_lon_to_h3` (determinism, resolution separation); `add_hex_ids_to_df` (column creation, null safety, custom column names); `map_outages_to_zip_codes` (KDTree nearest-ZIP assignment, up/down lat-lon fallback); `clean_zip_codes` (ZIP+4 strip, float suffix strip, nan drop, non-5-digit drop) |
| `tests/test_preprocessing.py` | `preprocessing/crew_data_cleaning.py` | `remove_duplicates` (count, custom subset, idempotency); `infer_missing_durations` (90-min span filled, non-zero duration protected, temp column removed); `filter_weather_related_outages` (flag=True kept, flag=False dropped); `convert_numeric_columns` (coercion, unparsable → NaN); `convert_datetime_columns_full` (tz-aware output, NaT on bad input) |
pytest automatically loads conftest.py before any test file runs. Any function decorated with @pytest.fixture in that file is available as a parameter in any test method — pytest injects the return value automatically. For example, hourly_weather_df is a 12-row DataFrame (2 hexes × 6 hours) where the last row of each hex is deliberately set to extreme weather values so the threshold tests have a predictable ground truth.
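A hypothetical mini-version of this pattern (the fixture name, columns, and values below are invented for illustration; the project's real fixtures live in `tests/conftest.py`):

```python
# conftest.py (illustrative)
import pandas as pd
import pytest

@pytest.fixture
def tiny_weather_df():
    # Two hexes x two hours; the last row of each hex gets an extreme gust,
    # mirroring how the real fixtures plant predictable ground truth.
    return pd.DataFrame({
        "hex_id": ["a", "a", "b", "b"],
        "gust_mph": [10.0, 60.0, 12.0, 65.0],
    })

# test_example.py (illustrative)
def test_max_gust_per_hex(tiny_weather_df):
    # pytest matches the parameter name to the fixture and injects its
    # return value automatically -- no explicit call needed.
    maxes = tiny_weather_df.groupby("hex_id")["gust_mph"].max()
    assert maxes.loc["a"] == 60.0 and maxes.loc["b"] == 65.0
```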
PREVAIL/
├── master_v3_creation.py # Step 1: per-outage summary dataset
├── outage_prediction_data.py # Weekly/daily weather+outage joins
├── outage_prediction_feature_engineering.py
├── outage_prediction_model.py # XGBoost outage classifier
├── outage_sort_merge.py # Outage ↔ SORT crew dispatch merge
├── crew_size_prediction_approach_1.py # LASSO → RF → XGBoost → Stacking
├── crew_size_prediction_approach_2.py
├── crew_size_final.py # Generates predictions for the dashboard
├── crew_feature_engineering.py
├── requirements.txt
├── README.md
│
├── preprocessing/
│ ├── weather_preprocessing.py
│ ├── crew_data_cleaning.py
│ ├── crew_data_loading.py
│ └── sort_data_processing.py
│
├── utils/
│ ├── pipeline_utils.py
│ └── spatial_utils.py
│
├── tests/ # Unit tests (no data files required)
│ ├── conftest.py
│ ├── test_feature_engineering.py
│ ├── test_spatial_utils.py
│ └── test_preprocessing.py
│
├── data/ # Input + output datasets (not in git)
└── logs/ # Pipeline run logs (auto-created)
- Real-Time API Integration: The most immediate enhancement would be integrating a live weather API. By streaming real-time meteorological data directly into the inference pipeline, the model could generate dynamic, on-the-fly workforce forecasts as storms evolve, rather than relying on batch-processed historical telemetry.
- Logistical Refinement: The deployment pipeline could be further optimized by incorporating real-time traffic and road closure data. Integrating these variables would allow the model to adjust staging recommendations based on the actual travel time required for crews to reach an incident location during adverse conditions.
- Multimodal Failure Prediction: While the current framework focuses on personnel counts, the underlying Poisson architecture could be expanded to predict specific equipment failures, such as transformer blowouts versus vegetation-related line faults. Mapping specific hardware needs alongside crew sizes would provide a more holistic logistical solution for emergency response.