ShashwatBansal14/Inventory_forecast_model

End-to-End Inventory Forecasting System

From 12 Million Rows of Raw Sales Data to a Deployed Business Solution

Dashboard & Results Preview

(Screenshots: Dashboard Views 1–3 and an Actual vs. Predicted plot)

Read the Technical Breakdown

I documented the entire journey from handling 12 million rows of raw data to deploying this final system in a detailed article on Medium.


👉 Click here to read the full story



📍 Table of Contents

  1. Project Overview
  2. Technical Architecture (MLDLC)
  3. Phase 3 & 4: Data Integrity & EDA
  4. Phase 5: Feature Engineering
  5. Phase 6: Modeling & Results
  6. Phase 7 & 8: Deployment & Testing
  7. Installation & Usage

Project Overview

Inventory management is a high-stakes balancing act. This project addresses the challenge of predicting daily product demand to prevent stockouts (lost revenue) and minimize overstocking (high holding costs).

I built a complete Machine Learning pipeline that processes 1.4GB of raw grocery data (12M+ transactions) and delivers daily sales forecasts through an interactive Streamlit dashboard.

The Goal: Provide a tool where a user inputs a Date and a Product ID, and the system returns the predicted quantity to be sold on that specific day.


Technical Architecture (MLDLC)

To ensure a production-grade result, I followed the Machine Learning Development Life Cycle (MLDLC):

  1. Problem Framing: Defined the regression task for daily demand.
  2. Data Gathering: Handled 1.4GB of raw CSV data using chunking techniques.
  3. Pre-processing: Cleaned and validated data for physical logic.
  4. EDA: Identified seasonality and the "Top 10" products (Pareto Principle).
  5. Feature Engineering: Created lags and rolling windows to capture trends.
  6. Modeling: Trained 10 specialized models.
  7. Deployment: Containerized with Docker for Cloud deployment.
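The chunked loading in step 2 can be sketched as follows. This is a minimal illustration, assuming a pandas-based pipeline; the file path, chunk size, and any column names are assumptions, not taken from the repository:

```python
import pandas as pd

def load_sales_in_chunks(path, chunksize=500_000):
    # Stream the large CSV in fixed-size chunks instead of one
    # giant read, so the 1.4GB file never has to fit in memory
    # all at once before concatenation.
    parts = [chunk for chunk in pd.read_csv(path, chunksize=chunksize)]
    return pd.concat(parts, ignore_index=True)
```

In practice you would also pass `usecols=` and explicit dtypes to `read_csv` to cut memory further, but the chunked iteration is the core idea.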

Phase 3 & 4: Data Integrity & EDA

Data Cleaning & Logic Checks

  • Zero-Leakage Policy: Identified and removed 140,897 rows of "future data" (Dec 2025) that would have caused data leakage.
  • Integrity Check: Verified zero null values and duplicates across 12 million rows, ensuring high data integrity.
  • Physical Logic: Screened for and removed negative prices/quantities.

Strategic Sampling (Pareto Principle)

With 452 different products, I applied the 80/20 rule.

  • Insight: Found that the Top 10 selling products account for over 10% of total sales.
  • Action: Created an optimized subset of the data focused on these high-velocity items to improve model precision.
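Selecting that high-velocity subset is a simple ranking step. A sketch, assuming `product_id` and `quantity` columns:

```python
def top_product_subset(df, n=10):
    # Rank products by total units sold and keep only the top n
    totals = df.groupby("product_id")["quantity"].sum()
    top_ids = totals.nlargest(n).index
    return df[df["product_id"].isin(top_ids)].copy()
```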

Seasonality & Aggregation

  • Resampling: Aggregated transaction-level data into Daily Totals per product.
  • Findings: Identified clear sales spikes on weekends and at month-ends.
  • Outliers: Used the IQR (Interquartile Range) method to handle extreme values that could negatively impact model training.
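The aggregation and IQR steps above might look like this. Column names are assumptions, and capping (rather than dropping) outliers is one common IQR treatment; the repo's exact handling may differ:

```python
import pandas as pd

def daily_totals(df):
    # Collapse transaction-level rows into one daily total per product
    df = df.assign(date=pd.to_datetime(df["date"]).dt.normalize())
    return df.groupby(["product_id", "date"], as_index=False)["quantity"].sum()

def clip_outliers_iqr(series, k=1.5):
    # Cap values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
```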

Phase 5: Feature Engineering

To transform static dates into patterns the model could understand, I engineered four types of features:

  1. Lag Features (1, 7, 52 days): Capture daily, weekly, and yearly momentum.
  2. Rolling Windows (7 & 30 days): Capture moving-average trends to smooth out "noise."
  3. Temporal Indicators: Extracted Day of Week, Month, and Weekend flags from the timestamp.
  4. Label Encoding: Converted categorical Product IDs into numerical format for the model.
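The four feature types above can be sketched in one pandas pass. This is an illustrative implementation assuming `product_id`, `date`, and `quantity` columns on the daily-aggregated frame; note the rolling means are shifted by one day so the current target never leaks into its own features:

```python
import pandas as pd

def add_features(daily):
    daily = daily.sort_values(["product_id", "date"]).reset_index(drop=True)
    g = daily.groupby("product_id")["quantity"]
    # 1. Lag features: yesterday and the same weekday last week
    daily["lag_1"] = g.shift(1)
    daily["lag_7"] = g.shift(7)
    # 2. Rolling windows (shifted so today's value is excluded)
    daily["roll_7"] = g.transform(lambda s: s.shift(1).rolling(7).mean())
    daily["roll_30"] = g.transform(lambda s: s.shift(1).rolling(30).mean())
    # 3. Temporal indicators from the timestamp
    daily["dow"] = daily["date"].dt.dayofweek
    daily["month"] = daily["date"].dt.month
    daily["is_weekend"] = (daily["dow"] >= 5).astype(int)
    # 4. Label-encode the product identifier for the model
    daily["product_code"] = daily["product_id"].astype("category").cat.codes
    return daily
```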

Phase 6: Modeling & Results

I treated this as a Multi-Model Regression problem. Instead of one generic model for all items, I trained 10 individual models—one for each top product—to capture unique demand patterns.
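The per-product training loop can be sketched as below. I use scikit-learn's `LinearRegression` purely as a stand-in estimator (the repo compares several model families, listed next); the feature column names are assumptions:

```python
from sklearn.linear_model import LinearRegression

def train_per_product(daily, feature_cols):
    # One specialized model per product rather than a single global model,
    # so each model can learn that product's own demand pattern.
    models = {}
    for pid, grp in daily.groupby("product_id"):
        grp = grp.dropna(subset=feature_cols + ["quantity"])
        models[pid] = LinearRegression().fit(grp[feature_cols], grp["quantity"])
    return models
```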

Model Selection & Comparison

I evaluated five different architectures to find the best fit for the seasonality of grocery sales:

  1. Linear Regression (Baseline)
  2. eXtreme Gradient Boosting (XGBoost)
  3. SARIMA (Selected for Production)
  4. Prophet by Meta (Selected for Production)
  5. LSTM (Deep Learning approach)

Model Performance Comparison

Below is the comparison of the Mean Absolute Percentage Error (MAPE) across all five models.

Model Comparison Chart
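For reference, MAPE is computed as the mean of the absolute percentage errors. A minimal implementation (masking zero-sales days, one common way to avoid division by zero; the repo's exact variant may differ):

```python
import numpy as np

def mape(actual, predicted):
    # Mean Absolute Percentage Error, in percent
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0  # skip zero-sales days to avoid division by zero
    return 100.0 * np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask]))
```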

Why Prophet? While machine learning models like XGBoost are powerful, Prophet handled the specific weekly and monthly seasonality in this grocery dataset better, leading to more reliable forecasts for inventory planning.

Phase 7 & 8: Deployment & Testing

To move the project from a notebook to a usable product, I implemented a modern DevOps stack:

  1. Streamlit UI: Created a dashboard allowing users to select a Product ID and a Date to get an instant forecast.
  2. Dockerization: Packaged the application into a Docker Image to ensure it runs consistently across different environments.
  3. Cloud Deployment: Used a Container Registry and deployed the image to the cloud for real-time access.
  4. Testing: Verified model outputs against real-world 2025 sales data to ensure consistency.
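A minimal Dockerfile for a Streamlit app of this shape might look as follows. This is a sketch under assumptions (file names `requirements.txt` and `app.py`, Streamlit's default port 8501); the repo's actual Dockerfile may differ:

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Install dependencies first so Docker caches this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit's default port
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```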

Installation & Usage

Follow these steps to set up the Inventory Forecasting System on your local machine.

1. Prerequisites

Ensure you have Python 3.9+ and pip installed. You will also need Docker if you plan to run the containerized version.

2. Clone the Repository

git clone https://github.com/shashwatbansal1414/inventory-forecasting.git
cd inventory-forecasting

3. Set Up the Environment

# Create a virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate

# Activate on Mac/Linux
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

4. Run the App

streamlit run app.py

About

A production-ready ML pipeline for demand forecasting. Features automated data cleaning, lag-based feature engineering, and a Streamlit UI. Built with Python, scikit-learn, and various forecasting models.
