This project implements a probabilistic wind power forecasting pipeline using real turbine SCADA data. The goal is to produce reliable short-term power predictions with quantified uncertainty, a critical requirement for modern power systems integrating large shares of wind energy.
The system combines physics-informed data validation, machine learning forecasting, and distribution-free uncertainty calibration to generate prediction intervals suitable for operational decision-making.
A live dashboard visualizing forecasts and uncertainty intervals is available here: 🔗 https://racem1000.github.io/wind-power-forecasting/dashboard/
Wind energy is inherently variable and uncertain. Accurate forecasting helps:
- Improve grid stability and dispatch planning
- Reduce balancing costs in electricity markets
- Support renewable integration at scale
While most forecasting systems focus on point predictions, grid operators and energy companies require probabilistic forecasts that quantify uncertainty. This project demonstrates a lightweight yet robust pipeline combining physical constraints with machine learning to produce calibrated probabilistic forecasts.
- Source: Wind turbine SCADA data
- Year: 2018
- Resolution: 10-minute intervals
- Size: ~50,000 observations
- Variables used:
  - Wind speed
  - Power output
  - Derived aerodynamic features
  - Time-based features
To ensure realistic training data, the pipeline applies aerodynamic and turbine operation constraints:
- Cut-in speed filtering (removal of invalid power generation below operational threshold)
- Cut-out speed filtering (removal of shutdown-region artifacts)
- Betz limit validation to detect physically impossible power values
This step ensures the model learns from physically plausible turbine behavior.
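The three checks above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual filter: the column names (`wind_speed`, `power_kw`), the rotor diameter, and the cut-in and cut-out speeds are all hypothetical values chosen for the example. Only the Betz coefficient (16/27) is a physical constant.

```python
import numpy as np
import pandas as pd

# All constants below are illustrative assumptions, not values from the project.
AIR_DENSITY = 1.225        # kg/m^3, standard sea-level air density
ROTOR_DIAMETER = 120.0     # m, hypothetical rotor diameter
CUT_IN = 3.0               # m/s, hypothetical cut-in speed
CUT_OUT = 25.0             # m/s, hypothetical cut-out speed
BETZ_LIMIT = 16.0 / 27.0   # maximum extractable fraction of wind power


def betz_max_power_kw(wind_speed):
    """Upper bound on extractable power (kW) at a given wind speed (m/s)."""
    swept_area = np.pi * (ROTOR_DIAMETER / 2.0) ** 2
    available_w = 0.5 * AIR_DENSITY * swept_area * np.asarray(wind_speed) ** 3
    return BETZ_LIMIT * available_w / 1000.0


def physics_filter(df):
    """Drop rows that violate basic turbine physics."""
    v, p = df["wind_speed"], df["power_kw"]
    generating_below_cut_in = (v < CUT_IN) & (p > 0)
    above_cut_out = v >= CUT_OUT
    exceeds_betz = p > betz_max_power_kw(v)
    keep = ~(generating_below_cut_in | above_cut_out | exceeds_betz)
    return df[keep].reset_index(drop=True)


# Toy records: only the first and third rows are physically plausible here.
raw = pd.DataFrame({
    "wind_speed": [2.0, 2.0, 8.0, 8.0, 26.0],
    "power_kw":   [0.0, 150.0, 1200.0, 3500.0, 100.0],
})
clean = physics_filter(raw)
print(clean)
```

With the assumed rotor, the Betz bound at 8 m/s is roughly 2100 kW, so the 3500 kW reading is rejected while the 1200 kW reading is kept.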
The forecasting model uses a feature set combining physics, temporal signals, and historical dynamics:
Base features
- Wind speed
- Cubed wind speed (v³), proportional to wind power density
Temporal encoding
- Cyclical hour-of-day
- Cyclical day-of-year
Historical dynamics
- 19 lag features
- Rolling statistics
These features capture both short-term turbulence effects and daily and seasonal patterns.
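As a rough sketch of how such a feature set could be built with pandas: the column names, the choice of lagging the power signal, and the rolling window length are assumptions made for illustration. The source only states that 19 lags and rolling statistics are used.

```python
import numpy as np
import pandas as pd


def build_features(df, n_lags=19, window=6):
    """Add cubed wind speed, cyclical time encodings, lags, and rolling stats.

    Expects a DatetimeIndex plus 'wind_speed' and 'power_kw' columns
    (hypothetical names).
    """
    out = df.copy()
    out["wind_speed_cubed"] = out["wind_speed"] ** 3

    # Cyclical encodings keep 23:50 close to 00:00 and Dec 31 close to Jan 1.
    hour = out.index.hour.to_numpy() + out.index.minute.to_numpy() / 60.0
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24.0)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24.0)
    doy = out.index.dayofyear.to_numpy()
    out["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

    # Lagged power values (assumption: lags are taken on the power signal).
    for k in range(1, n_lags + 1):
        out[f"power_lag_{k}"] = out["power_kw"].shift(k)

    # Rolling statistics use only past values (shift by one step first)
    # so the current target never leaks into its own features.
    past = out["power_kw"].shift(1)
    out["power_roll_mean"] = past.rolling(window).mean()
    out["power_roll_std"] = past.rolling(window).std()

    # Rows without a full history are dropped.
    return out.dropna()


idx = pd.date_range("2018-01-01", periods=100, freq="10min")
demo = pd.DataFrame(
    {
        "wind_speed": np.linspace(3.0, 12.0, 100),
        "power_kw": np.linspace(0.0, 3000.0, 100),
    },
    index=idx,
)
features = build_features(demo)
print(features.shape)
```

The first 19 rows are dropped because they lack a complete lag history, which is worth keeping in mind when aligning features with targets.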
The core model is LightGBM quantile regression, trained to predict multiple conditional quantiles:
- q = 0.10
- q = 0.50 (median forecast)
- q = 0.90
Training is performed using pinball loss, enabling direct learning of conditional power distributions rather than a single deterministic value. The 0.10 and 0.90 quantiles together bound a nominal 80% prediction interval.
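For reference, the pinball loss can be written out directly in NumPy. The function below is an illustrative helper, not code from the project. In LightGBM the same loss is selected with `objective="quantile"` and the `alpha` parameter set to the target quantile.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by q,
    # over-prediction (diff < 0) by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


# For q = 0.90, missing low by 2 kW costs far more than missing high by 2 kW.
under = pinball_loss([10.0], [8.0], q=0.90)   # 0.9 * 2
over = pinball_loss([10.0], [12.0], q=0.90)   # 0.1 * 2
print(under, over)
```

The asymmetry is what pushes the q = 0.90 model toward predictions that are exceeded only about 10% of the time.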
To calibrate the prediction intervals, the project applies:
MAPIE Split Conformal Regression
This provides distribution-free marginal coverage guarantees, provided that calibration and test data are exchangeable, so forecast intervals remain valid even if the underlying model is imperfect. Time-ordered SCADA data satisfies that assumption only approximately.
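To show the idea behind split conformal calibration, here is a hand-rolled NumPy sketch in the conformalized-quantile-regression style. It is not MAPIE's API and may differ from the exact variant the project uses. The function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def conformal_offset(y_cal, lo_cal, hi_cal, alpha=0.2):
    """Widening (or shrinking) offset from a held-out calibration set.

    Scores measure how far each calibration target falls outside its
    raw interval (negative when it is inside).
    """
    y_cal, lo_cal, hi_cal = (
        np.asarray(a, dtype=float) for a in (y_cal, lo_cal, hi_cal)
    )
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = int(np.ceil((n + 1) * (1.0 - alpha)))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[rank - 1])


# Synthetic check: raw intervals that are deliberately too narrow.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=10_000)
lo = np.full_like(y, -0.5)
hi = np.full_like(y, 0.5)
cal, test = slice(0, 5000), slice(5000, None)

q_hat = conformal_offset(y[cal], lo[cal], hi[cal], alpha=0.2)
lo_adj, hi_adj = lo[test] - q_hat, hi[test] + q_hat

raw_cov = np.mean((y[test] >= lo[test]) & (y[test] <= hi[test]))
cal_cov = np.mean((y[test] >= lo_adj) & (y[test] <= hi_adj))
print(f"raw coverage: {raw_cov:.3f}, calibrated coverage: {cal_cov:.3f}")
```

On exchangeable data like this synthetic sample, the calibrated coverage lands near the 80% target. On time series, where calibration and test periods can differ, the realized coverage may drift from the nominal level.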
Turbine rated power: 3600 kW
| Metric | Value |
|---|---|
| Mean Absolute Error | 51 kW |
| Relative Error | 1.4% of rated capacity |
| Target Coverage | 80% |
| Achieved Coverage | 75.8% |
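The results indicate high point forecast accuracy. The prediction intervals under-cover slightly: achieved coverage is 4.2 percentage points below the 80% target, which leaves room for further calibration before the intervals are used in operational forecasting scenarios.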
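The metrics in the table can be computed with a few helpers like the ones below. This is a sketch with hypothetical function names. The toy arrays are made up to exercise the functions, and only the 3600 kW rating, the 51 kW MAE, and the coverage figures come from the results above.

```python
import numpy as np

RATED_POWER_KW = 3600.0  # turbine rated power stated above


def mean_absolute_error_kw(y_true, y_pred):
    """Mean absolute error in kW."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_error_pct(mae_kw, rated_kw=RATED_POWER_KW):
    """MAE expressed as a percentage of rated capacity."""
    return 100.0 * mae_kw / rated_kw


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their interval."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Reported figures: 51 kW MAE on a 3600 kW turbine.
rel = relative_error_pct(51.0)
gap_pp = round(75.8 - 80.0, 1)  # achieved minus target, in percentage points
print(f"relative error: {rel:.1f}% of rated capacity, coverage gap: {gap_pp} pp")

# Toy intervals (made-up numbers): 3 of 5 observations are covered.
y_obs = [100, 200, 300, 400, 500]
lower = [90, 210, 250, 390, 510]
upper = [110, 260, 350, 410, 600]
toy_cov = empirical_coverage(y_obs, lower, upper)
toy_mae = mean_absolute_error_kw([100.0, 200.0], [110.0, 180.0])
```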
An interactive dashboard was developed to visualize:
- Wind speed evolution
- Power forecasts
- Prediction intervals
- Forecast uncertainty
Live dashboard: https://racem1000.github.io/wind-power-forecasting/dashboard/
Languages & Libraries
- Python
- LightGBM
- MAPIE
- Pandas / NumPy
- Scikit-learn
Visualization
- Chart.js
- HTML
- GitHub Pages
Development Environment
- Jupyter Notebooks
- Git / GitHub
wind-power-forecasting/
│
├── data/ # Processed SCADA dataset
├── notebooks/ # Exploratory analysis and modeling
├── models/ # Trained models
├── dashboard/ # Web visualization
├── src/ # Data pipeline and forecasting scripts
└── README.md
Potential extensions include:
- Multi-turbine farm forecasting
- Integration of numerical weather prediction (NWP) data
- Deep learning models (LSTM / Temporal Transformers)
- Advanced probabilistic scoring (CRPS, Winkler score)
Racem Kamel
Renewable Energy Engineer
Focus areas:
- Wind energy analytics
- Probabilistic forecasting
- AI for power systems