Quantum-Yeti/MarketBasketAnalysis

Market Basket Analysis (BI Data Analysis Project)


Project Overview

This project analyzes the Instacart dataset to uncover patterns in customer purchasing behavior, product reorders, and basket composition. Using the relational data from multiple tables, the project demonstrates advanced data cleaning, feature engineering, exploratory data analysis, and predictive modeling.

  • Dashboard example (image: PowerBiDashboard)

  • Apriori analysis of the products most frequently bought together with bananas (image: AprioriPowerBiDashboard)
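To make the "frequently bought together" idea concrete, here is a minimal sketch of the pair-level metrics (support, confidence, lift) that Apriori-style analysis reports. It is not the project's implementation and not the full Apriori algorithm with candidate pruning; the function name `pair_rules`, the `min_support` threshold, and the toy baskets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, target, min_support=0.2):
    """Return (other_item, support, confidence, lift) for pairs containing target.

    Hypothetical helper: counts item pairs only, the simplest case of
    frequent-itemset mining.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        if target not in (a, b):
            continue
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        other = b if a == target else a
        confidence = count / item_counts[target]      # P(other | target)
        lift = confidence / (item_counts[other] / n)  # confidence vs. baseline P(other)
        rules.append((other, support, confidence, lift))
    # Highest lift first
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Toy baskets, invented for this example (not taken from the Instacart data)
baskets = [
    ["Banana", "Strawberries", "Milk"],
    ["Banana", "Strawberries"],
    ["Banana", "Avocado"],
    ["Milk", "Bread"],
    ["Banana", "Milk"],
]

for other, support, confidence, lift in pair_rules(baskets, "Banana"):
    print(f"Banana -> {other}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two products appear together more often than their individual popularity would predict; a lift below 1 suggests the opposite.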

Dataset

The dataset is sourced from Kaggle: Instacart Market Basket Analysis

  • The files are large, so you must download the raw data yourself and place the CSV files in the /data/raw folder.

  • The 'data' folder also contains several exported Power BI-ready tables summarizing product demand, user behavior, and department performance for interactive analytics.

| Table | Description |
| --- | --- |
| orders.csv | Customer orders, timestamps, and user info |
| order_products__prior.csv | Products purchased in prior orders |
| order_products__train.csv | Products purchased in train set orders (for reorder prediction) |
| products.csv | Product IDs, names, aisle and department IDs |
| aisles.csv | Aisle names and IDs |
| departments.csv | Department names and IDs |
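As a rough illustration of how these tables relate, the sketch below joins order line items to their order, product, aisle, and department with pandas. The column names (`order_id`, `product_id`, `aisle_id`, `department_id`, and so on) are assumed from the public Kaggle dataset, the helper name `build_order_items` is hypothetical, and small in-memory frames stand in for the real CSV files.

```python
import pandas as pd


def build_order_items(order_products, orders, products, aisles, departments):
    """Join line items to their order, product, aisle, and department.

    Hypothetical helper; validate="many_to_one" makes pandas raise an error
    if a lookup table unexpectedly contains duplicate keys.
    """
    return (
        order_products
        .merge(orders, on="order_id", how="left", validate="many_to_one")
        .merge(products, on="product_id", how="left", validate="many_to_one")
        .merge(aisles, on="aisle_id", how="left", validate="many_to_one")
        .merge(departments, on="department_id", how="left", validate="many_to_one")
    )


# Toy stand-ins for the CSV files. With the real data you would load each
# table instead, e.g. pd.read_csv("data/raw/orders.csv").
orders = pd.DataFrame({
    "order_id": [1, 2],
    "user_id": [10, 10],
    "order_number": [1, 2],
    "days_since_prior_order": [None, 7.0],
})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2],
    "product_id": [100, 101, 100],
    "reordered": [0, 0, 1],
})
products = pd.DataFrame({
    "product_id": [100, 101],
    "product_name": ["Banana", "Whole Milk"],
    "aisle_id": [1, 2],
    "department_id": [1, 2],
})
aisles = pd.DataFrame({"aisle_id": [1, 2], "aisle": ["fresh fruits", "milk"]})
departments = pd.DataFrame({
    "department_id": [1, 2],
    "department": ["produce", "dairy eggs"],
})

items = build_order_items(order_products, orders, products, aisles, departments)
print(items[["order_id", "user_id", "product_name", "department", "reordered"]])
```

Left joins keep every line item even when a lookup row is missing, which makes gaps in the reference tables visible as nulls rather than silently dropping rows.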

Project Goals

  • Clean and merge multiple relational tables to create an integrated dataset.
  • Conduct Exploratory Data Analysis (EDA) to understand customer purchasing habits.
  • Generate insights into product reorders, basket composition, and departmental trends.
  • Build a predictive model for product reordering behavior.
  • Develop interactive dashboards for visualization and decision support (Power BI/Tableau).
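One possible shape for the reorder prediction goal is a binary classifier over (user, product) pairs. The sketch below uses scikit-learn's `LogisticRegression`; the two features, the toy values, and the choice of model are assumptions for illustration and are not taken from the project's notebooks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per (user, product) pair:
#   column 0: number of prior orders that contained the product
#   column 1: number of orders since the product was last bought
# The values and labels below are invented toy data.
X = np.array([
    [8, 1], [6, 0], [5, 2], [7, 1],   # pairs that were reordered
    [1, 9], [2, 7], [1, 5], [3, 8],   # pairs that were not reordered
], dtype=float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

model = LogisticRegression()
model.fit(X, y)

# Score two unseen pairs: a frequent recent purchase and a rare old one
candidates = np.array([[6, 1], [1, 8]], dtype=float)
probabilities = model.predict_proba(candidates)[:, 1]  # P(reordered = 1)

for features, p in zip(candidates, probabilities):
    print(f"features={features.tolist()} -> reorder probability {p:.2f}")
```

With the real data, the label would plausibly come from the `reordered` column of the train set, and the features would be aggregated from the prior orders.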

Tools & Technologies

  • Python (pandas, numpy, matplotlib, seaborn, scikit-learn)
  • SQL (joins, aggregations, feature engineering)
  • Jupyter Notebook for interactive exploration
  • Power BI (or Tableau) for dashboards

Key Features / Deliverables

  • Data cleaning scripts for handling missing values, duplicates, and inconsistent timestamps.
  • Merged and enriched datasets ready for analysis.
  • Correlation matrices and summary statistics for key features.
  • Predictive modeling to identify reorder likelihood for products.
  • Interactive dashboards highlighting product, aisle, and department trends over time.
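The cleaning deliverable could look something like the following sketch for the orders table. It assumes the Kaggle column names `order_dow`, `order_hour_of_day`, and `days_since_prior_order`, and it assumes that a missing `days_since_prior_order` marks a user's first order; the helper name `clean_orders` and the decision to fill those gaps with 0 are choices made for this example only.

```python
import pandas as pd


def clean_orders(orders):
    """Drop duplicate orders, flag first orders, and filter invalid time fields.

    Hypothetical helper, not the project's actual cleaning script.
    """
    cleaned = orders.drop_duplicates(subset="order_id", keep="first").copy()

    # Record which rows had no prior order before filling the gap,
    # so the information is not lost by the fill.
    cleaned["is_first_order"] = cleaned["days_since_prior_order"].isna()
    cleaned["days_since_prior_order"] = cleaned["days_since_prior_order"].fillna(0)

    # Keep only rows whose hour and day-of-week fall in their valid ranges
    valid_time = (
        cleaned["order_hour_of_day"].between(0, 23)
        & cleaned["order_dow"].between(0, 6)
    )
    return cleaned[valid_time].reset_index(drop=True)


# Toy data: order 2 is duplicated and order 3 has an impossible hour
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "user_id": [10, 10, 10, 11],
    "order_dow": [0, 3, 3, 6],
    "order_hour_of_day": [9, 14, 14, 25],
    "days_since_prior_order": [None, 7.0, 7.0, 30.0],
})

cleaned = clean_orders(orders)
print(cleaned)
```

Keeping the `is_first_order` flag lets later analysis separate genuine same-day reorders from filled-in first orders.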

Project Structure

├── data/
│ ├── raw/          # Original Instacart CSV files downloaded here (not tracked/large file size)
│ └── clean/        # Cleaned and merged dataset (not tracked/large file size)
├── images/         # Screenshots
├── notebooks/      # Jupyter notebooks for analysis
├── scripts/        # Python scripts for ETL and modeling
├── reports/        # Charts, plots, and dashboard
└── README.md       # Project documentation

Featured Report Images

Screenshot6 Screenshot1 Screenshot2 Screenshot5 Screenshot4 Screenshot3

About

Engineered an end-to-end E-Commerce BI Pipeline to process and analyze the Instacart relational dataset. Developed a structured ETL workflow—from data ingestion and cleaning to predictive modeling—to identify high-probability reorder patterns and supply chain optimization opportunities.
