
XGBoost + Attention-based RNN model to detect early-stage fires


CipherEnigma/fire_risk_prediction


Fire Risk Prediction System

A machine learning system for wildfire risk prediction using ensemble methods that combine XGBoost and PyTorch LSTM models. The system analyzes weather time-series data to forecast fire risk 6 hours in advance.

Project Overview

This system predicts wildfire risk in California using:

  • Weather station data from CIMIS (California Irrigation Management Information System)
  • Historical fire records from Wikipedia and CAL FIRE
  • Ensemble modeling combining XGBoost and attention-based LSTM
  • 6-hour prediction horizon using 24-hour weather sequences
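
The 24-hour input / 6-hour horizon framing can be sketched as a sliding window over one station's hourly series. This sketch assumes hourly readings in time order with no gaps, and labels a window positive if any fire is recorded in the 6 hours immediately after it. Both are assumptions, and `make_windows` is a hypothetical helper rather than code from the notebooks.

```python
import numpy as np

def make_windows(features, fire_flag, seq_len=24, horizon=6):
    """Slice one station's hourly series into (24-hour window, 6-hour-ahead label) pairs.

    features:  array of shape (T, F), hourly weather readings in time order
    fire_flag: array of shape (T,), 1 if a fire was recorded in that hour
    Assumption: a window is positive if any fire occurs in the `horizon`
    hours that follow it.
    """
    X, y = [], []
    # The last start index must leave room for the window plus the horizon
    for start in range(len(features) - seq_len - horizon + 1):
        end = start + seq_len
        X.append(features[start:end])                      # 24 hours of inputs
        y.append(int(fire_flag[end:end + horizon].any()))  # next 6 hours
    return np.asarray(X, dtype=np.float32), np.asarray(y, dtype=np.int64)
```

For 48 hourly rows this yields 48 - 24 - 6 + 1 = 19 windows, each of shape (24, F).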

Architecture

1. XGBoost Classifier

  • Gradient boosting for feature interactions
  • Station-based splitting to prevent data leakage
  • Cyclical time encoding for seasonal patterns (sin/cos)
  • Lag features (previous day weather conditions)
  • Class balancing for rare fire events (~5% positive class)
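
Cyclical encoding maps a periodic value onto a circle so that the end of a cycle sits next to its start (December next to January). A minimal pandas sketch, assuming a hypothetical `timestamp` column and a 365.25-day year for the day-of-year period:

```python
import numpy as np
import pandas as pd

def add_cyclical_time_features(df, time_col="timestamp"):
    """Encode month and day-of-year as sin/cos pairs (time_col name is hypothetical)."""
    ts = pd.to_datetime(df[time_col])
    out = df.copy()
    month = ts.dt.month        # 1..12
    doy = ts.dt.dayofyear      # 1..366
    # Shift to start at 0 so January maps to angle 0
    out["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
    # 365.25 is an assumed period that averages over leap years
    out["doy_sin"] = np.sin(2 * np.pi * (doy - 1) / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * (doy - 1) / 365.25)
    return out
```

Both sin and cos are kept because sin alone maps different months to the same value (January and July both give 0).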

2. Attention-based LSTM (PyTorch)

  • Bidirectional LSTM with custom attention mechanism
  • 24-hour input sequences → 6-hour prediction horizon
  • Focal loss for imbalanced data
  • Early stopping with validation monitoring
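
Focal loss down-weights easy, well-classified examples so training concentrates on the rare positive class: FL(p_t) = -alpha_t · (1 - p_t)^gamma · log(p_t). Below is a minimal PyTorch sketch of the standard binary form computed from logits; `alpha=0.25` and `gamma=2.0` are commonly used defaults, not values taken from the notebook, and `binary_focal_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss from raw logits (alpha/gamma defaults are assumptions)."""
    targets = targets.float()
    # Per-example BCE equals -log(p_t)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: probability the model assigns to the true class
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t: class weight (alpha for positives, 1 - alpha for negatives)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)**gamma shrinks the loss of confident, correct predictions
    loss = alpha_t * (1 - p_t) ** gamma * bce
    return loss.mean()
```

With `gamma=0` and `alpha=0.5` this reduces to half of ordinary binary cross-entropy, which is a quick sanity check.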

3. Ensemble Methods

  • Weighted Average: Grid search over the XGBoost/LSTM blend weight
  • Stacked Meta-Learning: Logistic regression meta-classifier
  • Calibration analysis and reliability curves
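
The two ensembling strategies can be sketched with scikit-learn, assuming `p_xgb` and `p_lstm` are each model's predicted fire probabilities on a held-out validation set. Choosing the blend weight by ROC AUC is an assumption (PR AUC is a reasonable alternative with ~5% positives), and both function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def grid_search_weight(p_xgb, p_lstm, y_val, n_grid=21):
    """Pick w in [0, 1] maximizing validation ROC AUC of w*p_xgb + (1-w)*p_lstm."""
    best_w, best_auc = 0.0, -np.inf
    for w in np.linspace(0.0, 1.0, n_grid):   # 0.00, 0.05, ..., 1.00 by default
        blended = w * p_xgb + (1.0 - w) * p_lstm
        auc = roc_auc_score(y_val, blended)
        if auc > best_auc:                     # keeps the first w on ties
            best_w, best_auc = float(w), auc
    return best_w, best_auc

def fit_stacker(p_xgb, p_lstm, y_val):
    """Logistic-regression meta-classifier over the two base-model probabilities."""
    meta_X = np.column_stack([p_xgb, p_lstm])
    return LogisticRegression().fit(meta_X, y_val)
```

To keep the meta-classifier honest, fit it on predictions the base models were not trained on (a validation split or out-of-fold predictions), and report its performance on a separate test split.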

Quick Start

Prerequisites

pip install xgboost joblib scikit-learn matplotlib seaborn
pip install torch pandas numpy

Usage

  1. Data Processing: dataset/colab-conditions-df-cleaning.ipynb
  2. XGBoost Training: fire_risk_xgboost_implementation.ipynb
  3. Ensemble Evaluation: ensemble_xgboost_lstm.ipynb

Project Structure

fire_risk_prediction/
├── fire_risk_xgboost_implementation.ipynb    # XGBoost model
├── ensemble_xgboost_lstm.ipynb               # PyTorch LSTM + Ensemble
├── dataset/
│   ├── colab-conditions-df-cleaning.ipynb   # Data preprocessing
│   └── conditions_df.csv                     # Processed weather/fire data
└── README.md

Key Features

Feature Engineering

  • Cyclical Encoding: Month/day-of-year as sin/cos pairs
  • Station Statistics: Historical fire rates per weather station
  • Lag Features: Previous 24-hour weather conditions
  • Region Encoding: Geographic information via one-hot encoding
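
A pandas sketch of the lag, station-statistic, and region features. The column names (`station_id`, `timestamp`, `region`, `fire`) are hypothetical, lags are a row shift within each station (so they assume gap-free hourly rows), and the station fire rate uses only earlier hours at the same station so the current label does not leak into its own feature. That last choice is an assumption about how "historical" is defined.

```python
import pandas as pd

def add_station_features(df, weather_cols, lag_hours=24):
    """Add per-station lags, past-only station fire rates, and one-hot regions.

    Assumed (hypothetical) columns: station_id, timestamp, region, fire (0/1).
    """
    out = df.sort_values(["station_id", "timestamp"]).copy()
    grouped = out.groupby("station_id")

    # Lag features: same variable `lag_hours` rows earlier at the same station
    for col in weather_cols:
        out[f"{col}_lag{lag_hours}h"] = grouped[col].shift(lag_hours)

    # Station statistics: running fire rate over strictly earlier hours
    out["station_fire_rate"] = grouped["fire"].transform(
        lambda s: s.shift(1).expanding().mean()
    )

    # Region encoding: one 0/1 indicator column per region
    out = pd.get_dummies(out, columns=["region"], prefix="region", dtype=int)
    return out
```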

Data Processing

  • Station-based splitting: 80% of stations for training, 20% for testing
  • Class balancing: scale_pos_weight for XGBoost, focal loss for LSTM
  • Sequence construction: 24-hour windows for LSTM input
  • Missing value imputation: Median imputation for numerical features
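
Splitting by station means no station contributes rows to both the training and test sets. One way to do this is scikit-learn's `GroupShuffleSplit`, where `test_size=0.2` is a fraction of groups (stations), not rows. The median imputer below is fit on training rows only. `station_split` and the `station_id`/`fire` column names are assumptions, not taken from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GroupShuffleSplit

def station_split(df, feature_cols, label_col="fire", test_size=0.2, seed=42):
    """Hold out ~20% of stations and median-impute using training rows only."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df["station_id"]))
    train, test = df.iloc[train_idx], df.iloc[test_idx]

    # Fit medians on training stations, then apply them to both splits
    imputer = SimpleImputer(strategy="median")
    X_train = imputer.fit_transform(train[feature_cols])
    X_test = imputer.transform(test[feature_cols])

    return (X_train, train[label_col].to_numpy(),
            X_test, test[label_col].to_numpy(),
            set(train["station_id"]), set(test["station_id"]))
```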

Model Performance

Model      Approach        Key Strength
XGBoost    Feature-based   Complex feature interactions
LSTM       Sequential      Temporal pattern recognition
Ensemble   Combined        Blends both models' predictions

Note: Run the notebooks to see actual performance metrics.

Technical Details

XGBoost Configuration

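# y_train: array of 0/1 training labels (fire / no fire).
# XGBoost expects a number for scale_pos_weight, not "auto";
# negatives / positives is a common way to set it.
n_pos = int((y_train == 1).sum())
n_neg = int((y_train == 0).sum())

xgb_params = {
    "n_estimators": 500,
    "max_depth": 5,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "tree_method": "hist",
    "scale_pos_weight": n_neg / n_pos,  # Handles class imbalance
}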

LSTM Architecture

  • Input: 24-hour weather sequences
  • Hidden: 64-unit bidirectional LSTM
  • Attention: Custom attention mechanism
  • Output: Single probability (fire risk in next 6 hours)
  • Loss: Focal loss for imbalanced classification
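
A compact PyTorch sketch of the described architecture. It assumes "64-unit" means 64 hidden units per direction (128-dimensional outputs) and uses additive (tanh) attention pooling over the 24 time steps as one possible custom attention mechanism. It returns a logit rather than a probability so it pairs with a logit-based loss such as focal loss; apply `torch.sigmoid` to obtain the fire-risk probability. The class name is hypothetical.

```python
import torch
import torch.nn as nn

class AttentionBiLSTM(nn.Module):
    """Bidirectional LSTM + additive attention pooling -> one fire-risk logit."""

    def __init__(self, n_features, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True, bidirectional=True)
        # Scores each time step's 2*hidden_size output with a small MLP
        self.attn = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1),
        )
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        # x: (batch, 24, n_features)
        h, _ = self.lstm(x)                                        # (batch, 24, 2*hidden)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # (batch, 24), rows sum to 1
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)    # weighted sum over time
        logit = self.head(context).squeeze(-1)                     # (batch,)
        return logit, weights
```

The returned `weights` (one per hour, summing to 1) are the values an attention-weight visualization would plot.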

Data Sources

  • Weather Data: CIMIS (California Irrigation Management Information System)
  • Fire Records: Wikipedia fire tables + CAL FIRE historical data
  • Time Period: 2018-2020
  • Geographic Coverage: California weather stations

Use Cases

  • Early Warning: 6-hour advance fire risk alerts
  • Resource Planning: Firefighting resource deployment
  • Research: Weather pattern analysis for fire prediction

Evaluation

The system is evaluated with:

  • ROC and Precision-Recall curves
  • Confusion matrices and classification reports
  • Feature importance analysis (XGBoost + SHAP)
  • Attention weight visualization (LSTM)
  • Model calibration analysis
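
Most of these metrics are available directly in scikit-learn. The sketch below summarizes one model's test-set probabilities, assuming a 0.5 decision threshold. The Brier score is an extra calibration summary not listed above, and `evaluate` is a hypothetical helper.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             classification_report, confusion_matrix, roc_auc_score)

def evaluate(y_true, proba, threshold=0.5, n_bins=10):
    """Ranking, thresholded, and calibration metrics for one model's probabilities."""
    y_pred = (proba >= threshold).astype(int)
    # Reliability curve: observed positive rate vs. mean predicted probability per bin
    frac_pos, mean_pred = calibration_curve(y_true, proba, n_bins=n_bins)
    return {
        "roc_auc": roc_auc_score(y_true, proba),
        "pr_auc": average_precision_score(y_true, proba),  # average precision (PR-curve summary)
        "brier": brier_score_loss(y_true, proba),
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "report": classification_report(y_true, y_pred, zero_division=0),
        "reliability": (mean_pred, frac_pos),
    }
```

With ~5% positives, the PR summary and the confusion matrix at a tuned threshold are usually more informative than ROC AUC alone.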

Future Enhancements

  • Real-time weather data integration
  • Extended prediction horizons (12-24 hours)
  • Additional weather variables (soil moisture, drought indices)
  • Spatial modeling for fire spread prediction
