Probabilistic Wind Power Forecasting

Overview

This project implements a probabilistic wind power forecasting pipeline using real turbine SCADA data. The goal is to produce reliable short-term power predictions with quantified uncertainty, a critical requirement for modern power systems integrating large shares of wind energy.

The system combines physics-informed data validation, machine learning forecasting, and distribution-free uncertainty calibration to generate prediction intervals suitable for operational decision-making.

A live dashboard visualizing forecasts and uncertainty intervals is available here: 🔗 https://racem1000.github.io/wind-power-forecasting/dashboard/

Motivation

Wind energy is inherently variable and uncertain. Accurate forecasting helps:

- Improve grid stability and dispatch planning
- Reduce balancing costs in electricity markets
- Support renewable integration at scale

While most forecasting systems focus on point predictions, grid operators and energy companies require probabilistic forecasts that quantify uncertainty. This project demonstrates a lightweight yet robust pipeline combining physical constraints with machine learning to produce calibrated probabilistic forecasts.

Dataset

Source: Wind turbine SCADA data
Year: 2018
Resolution: 10-minute intervals
Size: ~50,000 observations
Variables used:
- Wind speed
- Power output
- Derived aerodynamic features
- Time-based features

Methodology

1. Physics-Based Data Cleaning

To ensure realistic training data, the pipeline applies aerodynamic and turbine operation constraints:

- Cut-in speed filtering (removal of invalid power generation below operational threshold)
- Cut-out speed filtering (removal of shutdown-region artifacts)
- Betz limit validation to detect physically impossible power values

This step ensures the model learns from physically plausible turbine behavior.
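The three filters above can be sketched in a few lines of pandas. This is a minimal illustration rather than the project's actual cleaning code: the column names (`wind_speed`, `power_kw`), the cut-in and cut-out thresholds, and the rotor diameter are all assumptions chosen for the example. The Betz check compares measured power against the theoretical ceiling 16/27 · ½ρAv³.

```python
import numpy as np
import pandas as pd

# All constants below are illustrative assumptions, not values from the project.
CUT_IN_MS = 3.0            # assumed cut-in wind speed [m/s]
CUT_OUT_MS = 25.0          # assumed cut-out wind speed [m/s]
AIR_DENSITY = 1.225        # standard sea-level air density [kg/m^3]
ROTOR_DIAMETER_M = 100.0   # hypothetical rotor diameter [m]
BETZ_LIMIT = 16.0 / 27.0   # maximum power coefficient, about 0.593


def clean_scada(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that violate cut-in, cut-out, or Betz-limit constraints."""
    v = df["wind_speed"]    # hypothetical column name, [m/s]
    p_kw = df["power_kw"]   # hypothetical column name, [kW]
    rotor_area = np.pi * (ROTOR_DIAMETER_M / 2.0) ** 2
    # Betz ceiling on extractable power, converted from W to kW.
    betz_kw = BETZ_LIMIT * 0.5 * AIR_DENSITY * rotor_area * v**3 / 1000.0

    below_cut_in = (v < CUT_IN_MS) & (p_kw > 0)  # power reported below cut-in
    above_cut_out = v > CUT_OUT_MS               # shutdown region
    exceeds_betz = p_kw > betz_kw                # physically impossible
    return df[~(below_cut_in | above_cut_out | exceeds_betz)].copy()


sample = pd.DataFrame({
    "wind_speed": [2.0, 2.0, 8.0, 8.0, 30.0],
    "power_kw": [0.0, 150.0, 1200.0, 5000.0, 0.0],
})
cleaned = clean_scada(sample)  # keeps only the rows at positions 0 and 2
```

In this toy sample, the second row reports power below cut-in, the fourth exceeds the Betz ceiling for the assumed rotor, and the fifth lies above cut-out, so only two rows survive.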

2. Feature Engineering

The forecasting model uses a feature set combining physics, temporal signals, and historical dynamics:

Base features

- Wind speed
- Cubed wind speed (v³), which is proportional to wind power density (½ρv³)

Temporal encoding

- Cyclical hour-of-day
- Cyclical day-of-year

Historical dynamics

- 19 lag features
- Rolling statistics

These features capture both short-term turbulence effects and daily seasonal patterns.
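One possible way to build such a feature set with pandas is sketched below. The source does not say which variable is lagged or which rolling windows are used, so lagging power output, the six-step window, and the column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_features(df: pd.DataFrame, n_lags: int = 19, window: int = 6) -> pd.DataFrame:
    """Build physics, cyclical-time, lag, and rolling features.

    Expects a DatetimeIndex and hypothetical columns `wind_speed` and `power_kw`.
    """
    out = pd.DataFrame(index=df.index)
    out["wind_speed"] = df["wind_speed"]
    out["wind_speed_cubed"] = df["wind_speed"] ** 3

    # Cyclical encoding maps time onto a circle so 23:50 sits next to 00:00.
    hour = df.index.hour.to_numpy() + df.index.minute.to_numpy() / 60.0
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24.0)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24.0)
    doy = df.index.dayofyear.to_numpy()
    out["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

    # Lagged power (assumed target of the lags): values from 1..n_lags steps back.
    for k in range(1, n_lags + 1):
        out[f"power_lag_{k}"] = df["power_kw"].shift(k)

    # Rolling statistics over past values only, to avoid leaking the target.
    past = df["power_kw"].shift(1)
    out["power_roll_mean"] = past.rolling(window).mean()
    out["power_roll_std"] = past.rolling(window).std()
    return out.dropna()


index = pd.date_range("2018-01-01", periods=60, freq="10min")
rng = np.random.default_rng(0)
speed = rng.uniform(3.0, 15.0, size=60)
demo = pd.DataFrame({"wind_speed": speed, "power_kw": 2.0 * speed**3}, index=index)
features = build_features(demo)
```

Shifting by one step before taking rolling statistics keeps every feature strictly in the past relative to the row it describes, which matters when the same column is also the forecast target.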

3. Probabilistic Forecasting Model

The core model is LightGBM quantile regression, trained to predict multiple conditional quantiles:

- q = 0.10
- q = 0.50 (median forecast)
- q = 0.90

Training is performed using pinball loss, enabling direct learning of conditional power distributions rather than a single deterministic value. The 0.10 and 0.90 quantiles together bound a nominal 80% prediction interval.
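For reference, the pinball (quantile) loss for a quantile level q penalizes under-prediction with weight q and over-prediction with weight 1 − q. The NumPy function below is a small standalone sketch of that definition, not code taken from the project; the function name is hypothetical.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q: float) -> float:
    """Mean pinball loss at quantile level q (0 < q < 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # diff > 0 means under-prediction (weight q); diff < 0 means over-prediction (weight 1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


y_true = [100.0, 200.0, 300.0]
y_pred = [150.0, 150.0, 150.0]
loss_q10 = pinball_loss(y_true, y_pred, 0.10)
loss_q50 = pinball_loss(y_true, y_pred, 0.50)  # equals half the mean absolute error
loss_q90 = pinball_loss(y_true, y_pred, 0.90)
```

In LightGBM, this objective is selected with `objective="quantile"` together with `alpha=q`, with one model fitted per quantile level.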

4. Uncertainty Calibration

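To calibrate the prediction intervals, the project applies:

MAPIE Split Conformal Regression

Split conformal prediction offers a distribution-free marginal coverage guarantee that does not require the underlying model to be well specified, provided the calibration and test data are exchangeable. Time-ordered SCADA data satisfies exchangeability only approximately, so achieved coverage should be checked empirically.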
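The idea behind split conformal regression can be shown without any library. The sketch below is a hand-rolled NumPy version using absolute residuals as conformity scores; it is not MAPIE's API and may differ from the variant the project uses. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def conformal_half_width(y_cal, pred_cal, alpha: float = 0.2) -> float:
    """Interval half-width from held-out calibration residuals.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    the usual finite-sample corrected quantile for split conformal prediction.
    """
    scores = np.sort(np.abs(np.asarray(y_cal, dtype=float) - np.asarray(pred_cal, dtype=float)))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[k - 1])


# Synthetic stand-in for model predictions and observed power [kW].
rng = np.random.default_rng(0)
pred_cal = rng.uniform(0.0, 3600.0, size=2000)
y_cal = pred_cal + rng.normal(0.0, 50.0, size=2000)
pred_test = rng.uniform(0.0, 3600.0, size=5000)
y_test = pred_test + rng.normal(0.0, 50.0, size=5000)

half_width = conformal_half_width(y_cal, pred_cal, alpha=0.2)
lower, upper = pred_test - half_width, pred_test + half_width
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
```

Because the synthetic calibration and test sets are drawn from the same distribution, empirical coverage here lands close to the 80% target. On real time-ordered data the gap can be larger.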

Results

Turbine rated power: 3600 kW

Metric	Value
Mean Absolute Error	51 kW
Relative Error	1.4% of rated capacity
Target Coverage	80%
Achieved Coverage	75.8%

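At 51 kW, the mean absolute error is about 1.4% of the 3600 kW rated capacity, which indicates accurate point forecasts. Achieved interval coverage of 75.8% falls 4.2 percentage points short of the 80% target, so the intervals are slightly too narrow on the evaluated data.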
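The metrics in the table can be computed along the following lines. This is an illustrative sketch with toy numbers, not the project's evaluation code: the function name and the arrays are hypothetical, and only the rated power and the 51 kW error come from the results above.

```python
import numpy as np

RATED_POWER_KW = 3600.0  # turbine rated power, from the results above


def interval_metrics(y, point, lower, upper, rated_kw: float = RATED_POWER_KW) -> dict:
    """Point-forecast error and interval coverage for one evaluation set."""
    y, point, lower, upper = (np.asarray(a, dtype=float) for a in (y, point, lower, upper))
    mae = float(np.mean(np.abs(y - point)))
    return {
        "mae_kw": mae,
        "relative_error_pct": 100.0 * mae / rated_kw,
        # Share of observations that fall inside their prediction interval.
        "coverage_pct": 100.0 * float(np.mean((y >= lower) & (y <= upper))),
        "mean_width_kw": float(np.mean(upper - lower)),
    }


# Reported MAE of 51 kW expressed relative to rated capacity (about 1.4%).
reported_relative_pct = 100.0 * 51.0 / RATED_POWER_KW

# Toy data: the third observation falls below its interval.
toy = interval_metrics(
    y=[1000, 2000, 3000, 3500],
    point=[1050, 1950, 3100, 3500],
    lower=[900, 1900, 3050, 3400],
    upper=[1100, 2100, 3200, 3600],
)
```

Reporting mean interval width alongside coverage helps distinguish intervals that are well calibrated from intervals that reach the target only by being very wide.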

Visualization Dashboard

An interactive dashboard was developed to visualize:

- Wind speed evolution
- Power forecasts
- Prediction intervals
- Forecast uncertainty

Live dashboard: https://racem1000.github.io/wind-power-forecasting/dashboard/

Technology Stack

Languages & Libraries

- Python
- LightGBM
- MAPIE
- Pandas / NumPy
- Scikit-learn

Visualization

- Chart.js
- HTML
- GitHub Pages

Development Environment

- Jupyter Notebooks
- Git / GitHub

Project Structure

wind-power-forecasting/
│
├── data/                # Processed SCADA dataset
├── notebooks/           # Exploratory analysis and modeling
├── models/              # Trained models
├── dashboard/           # Web visualization
├── src/                 # Data pipeline and forecasting scripts
└── README.md

Future Improvements

Potential extensions include:

- Multi-turbine farm forecasting
- Integration of numerical weather prediction (NWP) data
- Deep learning models (LSTM / Temporal Transformers)
- Advanced probabilistic scoring (CRPS, Winkler score)

Author

Racem Kamel

Renewable Energy Engineer

Focus areas:

- Wind energy analytics
- Probabilistic forecasting
- AI for power systems
