ShashwatBansal14/Inventory_forecast_model

End-to-End Inventory Forecasting System

From 12 Million Rows of Raw Sales Data to a Deployed Business Solution

Dashboard & Results Preview

(Screenshots: Dashboard Views 1–3 and an Actual vs. Predicted plot)

Read the Technical Breakdown

I documented the entire journey from handling 12 million rows of raw data to deploying this final system in a detailed article on Medium.


👉 Click here to read the full story



📍 Table of Contents

  1. Project Overview
  2. Technical Architecture (MLDLC)
  3. Phase 3 & 4: Data Integrity & EDA
  4. Phase 5: Feature Engineering
  5. Phase 6: Modeling & Results
  6. Phase 7 & 8: Deployment & Testing
  7. Installation & Usage

Project Overview

Inventory management is a high-stakes balancing act. This project addresses the challenge of predicting daily product demand to prevent stockouts (lost revenue) and minimize overstocking (high holding costs).

I built a complete Machine Learning pipeline that processes 1.4GB of raw grocery data (12M+ transactions) and delivers daily sales forecasts through an interactive Streamlit dashboard.

The Goal: Provide a tool where a user inputs a Date and a Product ID, and the system returns the predicted quantity to be sold on that specific day.


Technical Architecture (MLDLC)

To ensure a production-grade result, I followed the Machine Learning Development Life Cycle (MLDLC):

  1. Problem Framing: Defined the regression task for daily demand.
  2. Data Gathering: Handled 1.4GB of raw CSV data using chunking techniques.
  3. Pre-processing: Cleaned and validated data for physical logic.
  4. EDA: Identified seasonality and the "Top 10" products (Pareto Principle).
  5. Feature Engineering: Created lags and rolling windows to capture trends.
  6. Modeling: Trained 10 specialized models.
  7. Deployment: Containerized with Docker for Cloud deployment.
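The chunked loading in step 2 can be sketched as follows. This is a minimal illustration, assuming a pandas-based pipeline; the file path, chunk size, and any column names are assumptions, not taken from the repository:

```python
import pandas as pd

def load_sales_in_chunks(path, chunksize=500_000):
    # Stream the large CSV in fixed-size chunks instead of one
    # giant read, so the 1.4GB file never has to fit in memory
    # all at once before concatenation.
    parts = [chunk for chunk in pd.read_csv(path, chunksize=chunksize)]
    return pd.concat(parts, ignore_index=True)
```

In practice you would also pass `usecols=` and explicit dtypes to `read_csv` to cut memory further, but the chunked iteration is the core idea.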

Phase 3 & 4: Data Integrity & EDA

Data Cleaning & Logic Checks

  • Zero-Leakage Policy: Identified and removed 140,897 rows of "future data" (Dec 2025) that would have caused data leakage.
  • Integrity Check: Verified zero null values and duplicates across 12 million rows, ensuring high data integrity.
  • Physical Logic: Screened for and removed negative prices/quantities.

Strategic Sampling (Pareto Principle)

With 452 different products, I applied the 80/20 rule.

  • Insight: Found that the Top 10 selling products account for over 10% of total sales.
  • Action: Created an optimized subset of the data focused on these high-velocity items to improve model precision.
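Selecting that high-velocity subset is a simple ranking step. A sketch, assuming `product_id` and `quantity` columns:

```python
def top_product_subset(df, n=10):
    # Rank products by total units sold and keep only the top n
    totals = df.groupby("product_id")["quantity"].sum()
    top_ids = totals.nlargest(n).index
    return df[df["product_id"].isin(top_ids)].copy()
```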

Seasonality & Aggregation

  • Resampling: Aggregated transaction-level data into Daily Totals per product.
  • Findings: Identified clear sales spikes on weekends and at month-ends.
  • Outliers: Used the IQR (Interquartile Range) method to handle extreme values that could negatively impact model training.
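The aggregation and IQR steps above might look like this. Column names are assumptions, and capping (rather than dropping) outliers is one common IQR treatment; the repo's exact handling may differ:

```python
import pandas as pd

def daily_totals(df):
    # Collapse transaction-level rows into one daily total per product
    df = df.assign(date=pd.to_datetime(df["date"]).dt.normalize())
    return df.groupby(["product_id", "date"], as_index=False)["quantity"].sum()

def clip_outliers_iqr(series, k=1.5):
    # Cap values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
```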

Phase 5: Feature Engineering

To transform static dates into patterns the model could understand, I engineered four types of features:

  1. Lag Features (1, 7, 52 days): Capture daily, weekly, and yearly momentum.
  2. Rolling Windows (7 & 30 days): Capture moving-average trends to smooth out "noise."
  3. Temporal Indicators: Extracted Day of Week, Month, and Weekend flags from the timestamp.
  4. Label Encoding: Converted categorical Product IDs into numerical format for the model.
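The four feature types above can be sketched in one pandas pass. This is an illustrative implementation assuming `product_id`, `date`, and `quantity` columns on the daily-aggregated frame; note the rolling means are shifted by one day so the current target never leaks into its own features:

```python
import pandas as pd

def add_features(daily):
    daily = daily.sort_values(["product_id", "date"]).reset_index(drop=True)
    g = daily.groupby("product_id")["quantity"]
    # 1. Lag features: yesterday and the same weekday last week
    daily["lag_1"] = g.shift(1)
    daily["lag_7"] = g.shift(7)
    # 2. Rolling windows (shifted so today's value is excluded)
    daily["roll_7"] = g.transform(lambda s: s.shift(1).rolling(7).mean())
    daily["roll_30"] = g.transform(lambda s: s.shift(1).rolling(30).mean())
    # 3. Temporal indicators from the timestamp
    daily["dow"] = daily["date"].dt.dayofweek
    daily["month"] = daily["date"].dt.month
    daily["is_weekend"] = (daily["dow"] >= 5).astype(int)
    # 4. Label-encode the product identifier for the model
    daily["product_code"] = daily["product_id"].astype("category").cat.codes
    return daily
```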

Phase 6: Modeling & Results

I treated this as a Multi-Model Regression problem. Instead of one generic model for all items, I trained 10 individual models—one for each top product—to capture unique demand patterns.
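The per-product training loop can be sketched as below. I use scikit-learn's `LinearRegression` purely as a stand-in estimator (the repo compares several model families, listed next); the feature column names are assumptions:

```python
from sklearn.linear_model import LinearRegression

def train_per_product(daily, feature_cols):
    # One specialized model per product rather than a single global model,
    # so each model can learn that product's own demand pattern.
    models = {}
    for pid, grp in daily.groupby("product_id"):
        grp = grp.dropna(subset=feature_cols + ["quantity"])
        models[pid] = LinearRegression().fit(grp[feature_cols], grp["quantity"])
    return models
```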

Model Selection & Comparison

I evaluated five different architectures to find the best fit for the seasonality of grocery sales:

  1. Linear Regression (Baseline)
  2. eXtreme Gradient Boosting (XGBoost)
  3. SARIMA (Selected for Production)
  4. Prophet by Meta (Selected for Production)
  5. LSTM (Deep Learning approach)

Model Performance Comparison

Below is the comparison of the Mean Absolute Percentage Error (MAPE) across all five models.

Model Comparison Chart
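For reference, MAPE is computed as the mean of the absolute percentage errors. A minimal implementation (masking zero-sales days, one common way to avoid division by zero; the repo's exact variant may differ):

```python
import numpy as np

def mape(actual, predicted):
    # Mean Absolute Percentage Error, in percent
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0  # skip zero-sales days to avoid division by zero
    return 100.0 * np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask]))
```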

Why Prophet? While machine learning models like XGBoost are powerful, Prophet handled the specific weekly and monthly seasonality in this grocery dataset better, leading to more reliable forecasts for inventory planning.

Phase 7 & 8: Deployment & Testing

To move the project from a notebook to a usable product, I implemented a modern DevOps stack:

  1. Streamlit UI: Created a dashboard allowing users to select a Product ID and a Date to get an instant forecast.
  2. Dockerization: Packaged the application into a Docker Image to ensure it runs consistently across different environments.
  3. Cloud Deployment: Used a Container Registry and deployed the image to the cloud for real-time access.
  4. Testing: Verified model outputs against real-world 2025 sales data to ensure consistency.
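A minimal Dockerfile for a Streamlit app of this shape might look as follows. This is a sketch under assumptions (file names `requirements.txt` and `app.py`, Streamlit's default port 8501); the repo's actual Dockerfile may differ:

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Install dependencies first so Docker caches this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit's default port
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```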

Installation & Usage

Follow these steps to set up the Inventory Forecasting System on your local machine.

1. Prerequisites

Ensure you have Python 3.9+ and pip installed. You will also need Docker if you plan to run the containerized version.

2. Clone the Repository

git clone https://github.com/shashwatbansal1414/inventory-forecasting.git
cd inventory-forecasting

3. Set Up the Environment

# Create a virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate

# Activate on Mac/Linux
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

4. Run the App

streamlit run app.py

About

A production-ready ML pipeline for demand forecasting. Features automated data cleaning, lag-based feature engineering, and a Streamlit UI. Built with Python, scikit-learn, and various forecasting models.
