| Dashboard View 1 | Dashboard View 2 |
|---|---|
| ![]() | ![]() |
| Dashboard View 3 | Actual vs Predicted |
| ![]() | ![]() |
I documented the entire journey from handling 12 million rows of raw data to deploying this final system in a detailed article on Medium.
👉 Click here to read the full story
- Project Overview
- Technical Architecture (MLDLC)
- Phase 3 & 4: Data Integrity & EDA
- Phase 5: Feature Engineering
- Phase 6: Modeling & Results
- Phase 7 & 8: Deployment & Testing
- Installation & Usage
Inventory management is a high-stakes balancing act. This project addresses the challenge of predicting daily product demand to prevent stockouts (lost revenue) and minimize overstocking (high holding costs).
I built a complete Machine Learning pipeline that processes 1.4GB of raw grocery data (12M+ transactions) and delivers daily sales forecasts through an interactive Streamlit dashboard.
The Goal: Provide a tool where a user inputs a Date and a Product ID, and the system returns the predicted quantity to be sold on that specific day.
To ensure a production-grade result, I followed the Machine Learning Development Life Cycle (MLDLC):
- Problem Framing: Defined the regression task for daily demand.
- Data Gathering: Handled 1.4GB of raw CSV data using chunking techniques.
- Pre-processing: Cleaned and validated data for physical logic.
- EDA: Identified seasonality and the "Top 10" products (Pareto Principle).
- Feature Engineering: Created lags and rolling windows to capture trends.
- Modeling: Trained 10 specialized models.
- Deployment: Containerized with Docker for Cloud deployment.
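The data-gathering step above relies on chunking to keep the 1.4GB CSV out of memory. A minimal sketch of that pattern with pandas, assuming hypothetical column names (`date`, `product_id`, `quantity`, `price`) for illustration:

```python
import pandas as pd

def load_in_chunks(path: str, chunksize: int = 500_000) -> pd.DataFrame:
    """Stream a large CSV in fixed-size chunks instead of loading it at once."""
    parts = []
    for chunk in pd.read_csv(path, chunksize=chunksize, parse_dates=["date"]):
        # Keep only the columns needed downstream to reduce memory pressure.
        parts.append(chunk[["date", "product_id", "quantity", "price"]])
    return pd.concat(parts, ignore_index=True)
```

Each chunk is a regular DataFrame, so column pruning and filtering can happen before anything is concatenated, which keeps peak memory well below the raw file size.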
- Zero-Leakage Policy: Identified and removed 140,897 rows of "future data" (Dec 2025) that would have caused data leakage.
- Integrity Check: Verified zero null values and duplicates across 12 million rows, ensuring high data integrity.
- Physical Logic: Screened for and removed negative prices/quantities.
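The three integrity rules above can be sketched as a single cleaning function. This is an illustrative sketch, not the project's actual code; the column names and the December-2025 cutoff date are assumptions based on the description:

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame, cutoff: str = "2025-12-01") -> pd.DataFrame:
    """Apply the three integrity rules described above."""
    df = df[df["date"] < pd.Timestamp(cutoff)]         # zero-leakage: drop future rows
    df = df.dropna().drop_duplicates()                 # integrity: nulls & duplicates
    df = df[(df["price"] > 0) & (df["quantity"] > 0)]  # physical logic: no negatives
    return df.reset_index(drop=True)
```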
With 452 different products, I applied the 80/20 rule.
- Insight: Found that the Top 10 selling products contribute over 10% of total sales.
- Action: Created an optimized subset of the data focusing on these high-velocity items for higher model precision.
- Resampling: Aggregated transaction-level data into Daily Totals per product.
- Findings: Identified clear sales spikes on weekends and at month-ends.
- Outliers: Used the IQR (Interquartile Range) method to handle extreme values that could negatively impact model training.
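The resampling and outlier steps above can be sketched with pandas. Note this is a hedged illustration: the column names are assumed, and I use clipping (capping to the IQR fences) as one reasonable way to "handle" extreme values; dropping them is an equally valid choice:

```python
import pandas as pd

def daily_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate transaction-level rows into daily totals per product."""
    return (df.groupby(["product_id", pd.Grouper(key="date", freq="D")])["quantity"]
              .sum()
              .reset_index())

def clip_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
```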
To transform static dates into patterns the model could understand, I engineered three types of features:
- Lag Features (1, 7, 52 days): Capture daily, weekly, and yearly momentum.
- Rolling Windows (7 & 30 days): Capture moving-average trends to smooth out "noise."
- Temporal Indicators: Extracted Day of Week, Month, and Weekend flags from the timestamp.
- Label Encoding: Converted categorical Product IDs into numerical format for the model.
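The four feature types above can be sketched in one function. This is an illustrative version with assumed column names (`date`, `product_id`, `quantity`), operating on one product's daily series at a time:

```python
import pandas as pd

def add_features(daily: pd.DataFrame) -> pd.DataFrame:
    """Build lag, rolling-window, and calendar features for a daily sales series."""
    df = daily.sort_values("date").copy()
    # Lag features at the horizons listed above
    for lag in (1, 7, 52):
        df[f"lag_{lag}"] = df["quantity"].shift(lag)
    # Rolling means (shifted by one day so no current-day value leaks in)
    for window in (7, 30):
        df[f"roll_{window}"] = df["quantity"].shift(1).rolling(window).mean()
    # Temporal indicators extracted from the timestamp
    df["day_of_week"] = df["date"].dt.dayofweek
    df["month"] = df["date"].dt.month
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
    # Label encoding via pandas categorical codes (a lightweight encoder)
    df["product_code"] = df["product_id"].astype("category").cat.codes
    return df
```

Shifting the rolling window by one day before averaging is a small but important detail: it keeps the feature consistent with the zero-leakage policy, since the model never sees the target day's own value.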
I treated this as a Multi-Model Regression problem. Instead of one generic model for all items, I trained 10 individual models—one for each top product—to capture unique demand patterns.
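The multi-model pattern described above is essentially a training loop keyed by product ID. A minimal sketch, using scikit-learn's LinearRegression baseline as a stand-in (the production system uses SARIMA/Prophet per series, as described below) and assumed column names:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_per_product(df: pd.DataFrame, features: list) -> dict:
    """Fit one independent model per product ID to capture its own demand pattern."""
    models = {}
    for pid, grp in df.groupby("product_id"):
        model = LinearRegression()
        model.fit(grp[features], grp["quantity"])
        models[pid] = model
    return models
```

At prediction time, the dashboard only needs to look up the model for the requested Product ID and call its `predict` method on the engineered features for the requested date.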
I evaluated five different architectures to find the best fit for the seasonality of grocery sales:
- Linear Regression (Baseline)
- Extreme Gradient Boosting (XGBoost)
- SARIMA (Selected for Production)
- Prophet by Meta (Selected for Production)
- LSTM (Deep Learning approach)
Below is the comparison of the Mean Absolute Percentage Error (MAPE) across all five models.
Why Prophet? While machine learning models like XGBoost are powerful, Prophet handled the specific weekly and monthly seasonality found in this grocery dataset better, leading to more reliable forecasts for inventory planning.
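For reference, the MAPE metric used in the comparison above is straightforward to compute. A minimal implementation (the zero-filtering is my own hedge against zero-sales days, not necessarily what the project does):

```python
import numpy as np

def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error in percent; skips zero actuals
    to avoid division by zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100)
```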
To move the project from a notebook to a usable product, I implemented a modern DevOps stack:
- Streamlit UI: Created a dashboard allowing users to select a Product ID and a Date to get an instant forecast.
- Dockerization: Packaged the application into a Docker Image to ensure it runs consistently across different environments.
- Cloud Deployment: Used a Container Registry and deployed the image to the cloud for real-time access.
- Testing: Verified model outputs against real-world 2025 sales data to ensure consistency.
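The Dockerization step above typically boils down to a short Dockerfile. This is a hedged sketch of what such a file might look like for this project, not the repository's actual Dockerfile; it assumes the `requirements.txt` and `app.py` referenced in the installation steps below, and Streamlit's default port 8501:

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and models
COPY . .

EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```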
Follow these steps to set up the Inventory Forecasting System on your local machine.
Ensure you have Python 3.9+ and pip installed. You will also need Docker if you plan to run the containerized version.
```shell
# Clone the repository
git clone https://github.com/shashwatbansal1414/inventory-forecasting.git
cd inventory-forecasting

# Create a virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate
# Activate on Mac/Linux
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Then launch the dashboard:

```shell
streamlit run app.py
```