Demand Forecasting System

Time-series demand forecasting pipeline comparing LSTM, XGBoost, and Prophet with ensemble learning, backtesting, and an interactive dashboard.

Overview

A production-ready demand forecasting system that combines three models (Prophet, LSTM, XGBoost) into a weighted ensemble. It includes automatic seasonality detection, anomaly handling, and an interactive Streamlit dashboard, and is built on the Kaggle Store Sales Time Series Forecasting dataset.

Key Features

Multi-model forecasting: Prophet, PyTorch LSTM, XGBoost with optimized ensemble
Anomaly detection: IQR + Isolation Forest with configurable handling strategies
Seasonality analysis: Fourier transform and ACF/PACF for pattern detection
Feature engineering: Lag features, rolling statistics, temporal indicators, holiday effects
Backtesting framework: Rolling window cross-validation with Diebold-Mariano tests
Interactive dashboard: Streamlit + Plotly visualizations with 5 pages
CI/CD: GitHub Actions with linting, testing, and Docker build
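
As an illustration of the IQR half of the anomaly detection step, here is a minimal sketch using only the standard library. The function name `iqr_flags`, the default multiplier of 1.5, and the sample numbers are assumptions for this example and are not taken from the project's code; the multiplier the project actually uses is set in its configuration.

```python
from statistics import quantiles


def iqr_flags(values, multiplier=1.5):
    """Return one boolean per value: True where it lies outside the IQR fences.

    Hypothetical helper for illustration; not the project's implementation.
    """
    # Quartiles computed with the inclusive method (endpoints count as data).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower = q1 - multiplier * spread
    upper = q3 + multiplier * spread
    return [v < lower or v > upper for v in values]


# Made-up daily sales with one obvious spike at the end.
daily_sales = [10, 12, 11, 13, 12, 11, 100]
flags = iqr_flags(daily_sales)
```

Only the last value falls outside the fences here. A larger multiplier widens the fences and flags fewer points, which is the trade-off the configurable threshold controls.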

Architecture

+-----------------------------------------------------------+
|                   Streamlit Dashboard                      |
|  +----------+ +----------+ +---------+ +---------------+  |
|  | Forecast | |  Model   | | Anomaly | | Backtesting   |  |
|  |  View    | | Compare  | | Explorer| | Results       |  |
|  +----+-----+ +----+-----+ +----+----+ +------+--------+  |
+-------+------------+------------+-------------+-----------+
        |            |            |             |
+-------+------------+------------+-------------+-----------+
|                      Model Layer                          |
|  +---------+    +---------+    +---------+                |
|  | Prophet |    |  LSTM   |    | XGBoost |                |
|  +----+----+    +----+----+    +----+----+                |
|       +              |              +                     |
|                 +----+----+                               |
|                 | Ensemble|                               |
|                 +---------+                               |
+-----------------------------------------------------------+
                          |
+-----------------------------------------------------------+
|                     Data Layer                            |
|  +------------+ +--------------+ +--------------------+   |
|  |   DuckDB   | |   Feature    | |     Anomaly        |   |
|  | raw/cleaned| |  Engineering | |     Detection      |   |
|  +------------+ +--------------+ +--------------------+   |
+-----------------------------------------------------------+

Quick Start

Prerequisites

Python 3.11+
Kaggle account (for dataset download)

1. Install

# Clone repository
git clone https://github.com/KarasiewiczStephane/demand-forecast.git
cd demand-forecast

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
make install

2. Download Data

The project uses the Kaggle Store Sales dataset. Set your Kaggle API credentials (see Kaggle API docs) and download:

export KAGGLE_USERNAME="your_username"
export KAGGLE_KEY="your_key"

python -m src.main download

This downloads and extracts all competition files into data/raw/.

3. Train Models & Run Analysis

# Run anomaly detection
python -m src.main detect-anomalies

# Train all models (Prophet, LSTM, XGBoost)
python -m src.main train --models all

# Run backtesting evaluation
python -m src.main backtest

You can also train a single model: python -m src.main train --models prophet

4. Launch Dashboard

make dashboard

Open http://localhost:8501 in your browser. The dashboard provides five interactive pages (see Dashboard Pages below).

Docker Deployment

# Build and run with docker-compose
docker-compose up -d

# Or manually
make docker-build
make docker-run

Dashboard Pages

Page	Description
Forecast View	Historical sales + ensemble forecast with confidence intervals, filterable by store/family
Model Comparison	Side-by-side metrics table, error distributions, rolling MAPE
Anomaly Explorer	Timeline of detected anomalies, filter by method/store/family
Seasonality Patterns	Seasonal decomposition, ACF/PACF analysis
Backtesting Results	Rolling accuracy, best-model-per-window, aggregate metrics
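
The rolling MAPE shown on the Model Comparison page can be pictured with the sketch below. It is an assumption about how such a metric might be computed, not the dashboard's code: the helper names, the 7-observation default window, and the choice to skip days with zero actual sales (where a percentage error is undefined) are all illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("nan")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def rolling_mape(actual, predicted, window=7):
    """MAPE over each trailing block of `window` consecutive observations."""
    return [
        mape(actual[i - window:i], predicted[i - window:i])
        for i in range(window, len(actual) + 1)
    ]


# Made-up values; the zero-sales day is ignored by mape().
actual = [100, 200, 0, 400]
predicted = [110, 180, 5, 400]
series = rolling_mape(actual, predicted, window=2)
```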

Project Structure

demand-forecast/
├── src/
│   ├── data/           # Data pipeline (download, validate, anomaly, features)
│   ├── models/         # Prophet, LSTM, XGBoost, Ensemble
│   ├── evaluation/     # Backtesting, metrics, report generation
│   ├── dashboard/      # Streamlit app with 5 pages
│   └── utils/          # Config loader, structured logging
├── tests/              # Unit tests (>80% coverage)
├── configs/            # YAML configuration
├── data/               # Raw, processed, and report data
├── models/             # Model checkpoints
├── .github/workflows/  # CI pipeline (lint, test, docker)
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
└── README.md

Configuration

All parameters are centralized in configs/config.yaml:

Forecast horizons: 7, 14, 28 days
LSTM lookback: 28 days
Backtesting: 12-month training window, 28-day test window, 7-day step
Anomaly thresholds: IQR multiplier, Isolation Forest contamination
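
To make the backtesting settings concrete, the sketch below generates fixed-length rolling windows from those three numbers. It is not the project's backtesting code: the function name is hypothetical, the 12-month training window is approximated as 365 days, and the date range is only an example.

```python
from datetime import date, timedelta


def rolling_windows(start, end, train_days=365, test_days=28, step_days=7):
    """Yield (train_start, train_end, test_start, test_end), all inclusive.

    Hypothetical helper; 365 days stands in for the 12-month train window.
    """
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days - 1)
        test_start = train_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_days - 1)
        if test_end > end:
            break  # the next test window would run past the available data
        yield train_start, train_end, test_start, test_end
        train_start += timedelta(days=step_days)


# Example date range chosen for illustration only.
windows = list(rolling_windows(date(2016, 1, 1), date(2017, 8, 15)))
first_train_start, first_train_end, first_test_start, first_test_end = windows[0]
```

Each window trains on the 365 days immediately before its 28-day test period, and consecutive windows start 7 days apart, so test periods overlap. That overlap is worth keeping in mind when aggregating per-window metrics.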

Models

Model	Description
Prophet	Additive time series model with Ecuadorian holidays and changepoint detection
LSTM	PyTorch sequence model with configurable lookback, early stopping, gradient clipping
XGBoost	Gradient boosting on engineered features with per-store training
Ensemble	Weighted average with SLSQP-optimized weights minimizing RMSE
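
The ensemble row can be sketched with SciPy's `minimize` and `method="SLSQP"`. This is a minimal illustration under stated assumptions, not the project's implementation: the function name, the bounds of 0 to 1 on each weight, the sum-to-one constraint, and the synthetic data are all choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def fit_ensemble_weights(predictions, actual):
    """Find weights in [0, 1] that sum to 1 and minimize RMSE.

    predictions: array-like of shape (n_models, n_samples)
    actual: array-like of shape (n_samples,)
    Hypothetical helper for illustration only.
    """
    predictions = np.asarray(predictions, dtype=float)
    actual = np.asarray(actual, dtype=float)
    n_models = predictions.shape[0]

    def rmse(weights):
        blended = weights @ predictions  # weighted average forecast
        return np.sqrt(np.mean((blended - actual) ** 2))

    result = minimize(
        rmse,
        x0=np.full(n_models, 1.0 / n_models),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    )
    return result.x


# Synthetic stand-ins for three models' validation forecasts.
rng = np.random.default_rng(0)
actual = rng.uniform(50, 150, size=200)
preds = np.vstack([
    actual + rng.normal(0, 5, 200),    # low-noise model
    actual + rng.normal(0, 15, 200),   # high-noise model
    actual + rng.normal(10, 10, 200),  # biased model
])
weights = fit_ensemble_weights(preds, actual)
```

Weights should be fitted on held-out predictions rather than on the training fit, otherwise the model that overfits most tends to receive the largest weight.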

Testing

# Run all tests
make test

# Run with coverage report
pytest tests/ -v --cov=src --cov-report=html

# Run local CI checks
make ci-local

Makefile Commands

Command	Description
`make install`	Install dependencies
`make run`	Print CLI help (`python -m src.main`)
`make test`	Run tests with coverage
`make lint`	Run ruff linter and formatter
`make dashboard`	Launch Streamlit dashboard on port 8501
`make docker-build`	Build Docker image
`make docker-run`	Run Docker container on port 8501
`make ci-local`	Run lint + format check + tests (80% coverage gate)
`make clean`	Remove caches

Author

Stéphane Karasiewicz — skarazdata.com | LinkedIn

License

MIT
