A machine learning project that predicts the most sold product category based on consumer behavior and shopping patterns.
- Overview
- Problem Statement
- Objective
- Dataset
- Features
- Installation
- Usage
- Model Performance
- Technologies Used
- Project Structure
- Results
- Future Improvements
- Contributing
- License
- Contact
This project analyzes consumer behavior and shopping habits to predict product category sales. By leveraging machine learning algorithms, we can help businesses understand customer preferences and optimize their sales strategies based on demographics, purchase behavior, and promotional effectiveness.
The consumer behavior and shopping dataset contains various features such as customer demographics, purchase behavior, and product details. It provides comprehensive insights into consumers' preferences, tendencies, and patterns during their shopping experiences.
The dataset supports analysis of how purchasing behavior varies with customer age and gender, and of how promotional ads and subscription plans affect it. An effective prediction model built on this data can help businesses refine their sales strategies.
To develop a machine learning model that predicts the most sold product category based on customer demographics, shopping patterns, and promotional factors.
Source: Kaggle - Consumer Behavior and Shopping Habits Dataset
Dataset Statistics:
- Total Records: 3,900 rows
- Total Features: 18 columns
- Data Types: Mixed (Numerical and Categorical)
| Feature | Type | Description |
|---|---|---|
| Customer ID | Numerical | Unique identifier for each customer |
| Age | Numerical | Customer's age |
| Gender | Categorical | Customer's gender |
| Item Purchased | Categorical | Specific item bought |
| Category | Categorical | Product category (Target Variable) |
| Purchase Amount (USD) | Numerical | Transaction amount in USD |
| Location | Categorical | Customer's location |
| Size | Categorical | Product size |
| Color | Categorical | Product color |
| Season | Categorical | Season of purchase |
| Review Rating | Numerical | Customer rating (2.5-5.0) |
| Subscription Status | Categorical | Active subscription (Yes/No) |
| Shipping Type | Categorical | Delivery method |
| Discount Applied | Categorical | Discount status (Yes/No) |
| Promo Code Used | Categorical | Promo code usage (Yes/No) |
| Previous Purchases | Numerical | Number of past purchases |
| Payment Method | Categorical | Payment type used |
| Frequency of Purchases | Categorical | Purchase frequency pattern |
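
As a quick sanity check on the table above, the column types and target distribution can be inspected with pandas. This is a minimal sketch: `summarize` is a hypothetical helper (not taken from the notebook), and the three-row sample uses made-up values so the snippet runs without the CSV.

```python
import pandas as pd

# The target column name follows the feature table above.
TARGET = "Category"


def summarize(df: pd.DataFrame) -> dict:
    """Return row/column counts, column types, and target class counts."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "numeric": numeric_cols,
        "categorical": categorical_cols,
        "target_counts": df[TARGET].value_counts().to_dict(),
    }


# Tiny hand-made sample (hypothetical values), only a subset of the 18 columns.
sample = pd.DataFrame({
    "Age": [25, 40, 31],
    "Gender": ["Male", "Female", "Male"],
    "Category": ["Clothing", "Footwear", "Clothing"],
    "Purchase Amount (USD)": [53, 64, 73],
})
summary = summarize(sample)
print(summary)
```

With the real file in the project root, `summarize(pd.read_csv("shopping_behavior.csv"))` should report the 3,900 rows and 18 columns listed under Dataset Statistics.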
- Python 3.8 or higher
- pip package manager
- Clone the repository
    git clone https://github.com/Devikrishna545/CustomerBehaviour_ShoppingAnalysis.git
    cd CustomerBehaviour_ShoppingAnalysis
- Create a virtual environment (optional but recommended)
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages
    pip install -r requirements.txt

Required packages:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- jupyter
- Launch Jupyter Notebook
    jupyter notebook
- Open the main notebook
    E-Commerce Dataset ML_New.ipynb
- Run all cells sequentially to:
  - Load and explore the dataset
  - Perform data preprocessing
  - Conduct feature selection
  - Train the model
  - Evaluate model performance
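
The notebook steps above can be approximated in a short script. The following is a minimal sketch and not the notebook's actual code: `train_and_evaluate` is a hypothetical helper, the one-hot encoding and the 80/20 stratified split are assumptions, and the demo data is synthetic (the category is made to depend on the season so there is something to learn).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def train_and_evaluate(df, target="Category", seed=42):
    """Preprocess, train a Random Forest, and return (model, hold-out accuracy)."""
    X = df.drop(columns=[target])
    y = df[target]

    # One-hot encode non-numeric columns; numeric columns pass through unchanged.
    categorical = X.select_dtypes(exclude="number").columns.tolist()
    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )
    model = Pipeline([
        ("preprocess", preprocess),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ])

    # Assumed split: 80% train, 20% test, stratified on the target.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Synthetic demo data (not the Kaggle dataset).
rng = np.random.default_rng(0)
n = 200
season = rng.choice(["Winter", "Spring", "Summer", "Fall"], size=n)
season_to_category = {
    "Winter": "Outerwear",
    "Spring": "Clothing",
    "Summer": "Footwear",
    "Fall": "Accessories",
}
demo = pd.DataFrame({
    "Age": rng.integers(18, 71, size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Season": season,
    "Category": [season_to_category[s] for s in season],
})

model, acc = train_and_evaluate(demo)
print(f"Hold-out accuracy on synthetic data: {acc:.3f}")
```

To try it on the real data, pass `pd.read_csv("shopping_behavior.csv")` instead of `demo`. The accuracy printed here says nothing about the real dataset.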
- Feature Selection:
  - Random Forest Classifier
  - Lasso Regression
- Classification Model:
  - Random Forest Classifier
- Accuracy: 84.5%
- Model: Random Forest Classifier
- The model predicts product categories with 84.5% accuracy
- Random Forest proved effective for handling mixed data types
- Feature selection improved model performance and interpretability
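
The two feature selection methods named above could be combined roughly as follows. This is a sketch under stated assumptions, not the notebook's code: `rank_features` is a hypothetical helper, the `alpha` value is illustrative, and because Lasso is a regression model the class labels are integer-encoded before fitting, which is only a rough heuristic for a categorical target. The data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.preprocessing import LabelEncoder, StandardScaler


def rank_features(X: pd.DataFrame, y, alpha=0.1, seed=42):
    """Return (Random Forest importances, features kept by Lasso).

    X must already be numeric (for example, after encoding).
    """
    # Random Forest: impurity-based importances, highest first.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    importances = importances.sort_values(ascending=False)

    # Lasso: features with a nonzero coefficient survive the L1 penalty.
    # Assumption: labels are integer-encoded so a regression model can be fit.
    y_encoded = LabelEncoder().fit_transform(y)
    X_scaled = StandardScaler().fit_transform(X)
    lasso = Lasso(alpha=alpha).fit(X_scaled, y_encoded)
    coefs = pd.Series(lasso.coef_, index=X.columns)
    kept = coefs[coefs != 0].index.tolist()
    return importances, kept


# Synthetic data: one informative feature and two pure-noise features.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_a": rng.normal(size=500),
    "noise_b": rng.normal(size=500),
})
y_demo = np.where(X_demo["signal"] > 0, "Clothing", "Footwear")

importances, kept = rank_features(X_demo, y_demo)
print(importances)
print("Kept by Lasso:", kept)
```

Features that rank highly in the forest and also survive the Lasso penalty are reasonable candidates to keep.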
- Programming Language: Python 3.8+
- Data Analysis: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-learn
- Development Environment: Jupyter Notebook
CustomerBehaviour_ShoppingAnalysis/
│
├── E-Commerce Dataset ML_New.ipynb    # Main analysis notebook
├── shopping_behavior.csv              # Dataset (not included in repo)
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── .gitignore                         # Git ignore file
│
└── results/                           # (Optional) Model outputs and visualizations
    ├── models/                        # Saved models
    └── figures/                       # Generated plots
The project successfully:
- ✅ Analyzed 3,900 customer records across 18 features
- ✅ Identified key factors influencing category purchases
- ✅ Achieved 84.5% prediction accuracy
- ✅ Provided actionable insights for sales strategy optimization
- Category distribution shows Clothing as the most purchased category (1,737 items)
- Customer demographics significantly influence purchase patterns
- Promotional factors (discounts, promo codes) impact buying behavior
- Subscription status correlates with purchase frequency
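
For context, the figures above allow a rough comparison against a majority-class baseline, that is, always predicting Clothing. The short calculation below is only a sketch: it uses the full-dataset counts reported in this README, whereas the 84.5% accuracy was presumably measured on a held-out split, so the comparison is approximate.

```python
# Figures taken from this README.
total_records = 3_900
clothing_records = 1_737
model_accuracy = 0.845

# Accuracy of a classifier that always predicts the most frequent category.
baseline_accuracy = clothing_records / total_records
gain = model_accuracy - baseline_accuracy

print(f"Majority-class baseline: {baseline_accuracy:.1%}")
print(f"Gain over baseline: {gain * 100:.1f} percentage points")
```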
- Implement additional classification algorithms (XGBoost, LightGBM)
- Perform hyperparameter tuning for better accuracy
- Add cross-validation for robust model evaluation
- Create interactive visualizations with Plotly/Dash
- Deploy model as a web application
- Incorporate time-series analysis for seasonal trends
- Add customer segmentation analysis
- Implement recommendation system
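
The cross-validation and hyperparameter tuning items above could start from something like the following. It is a minimal sketch on synthetic data: the parameter grid, fold count, and scoring choice are illustrative assumptions, not settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the encoded feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 5-fold stratified cross-validation gives a spread of scores, not a single number.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X, y, cv=cv, scoring="accuracy",
)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid, cv=cv, scoring="accuracy",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Reporting the mean and spread across folds makes a single accuracy figure easier to interpret.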
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Devikrishna545
- GitHub: @Devikrishna545
- Project Link: https://github.com/Devikrishna545/CustomerBehaviour_ShoppingAnalysis
- Dataset provided by Kaggle - zeesolver
- Inspiration from e-commerce analytics and consumer behavior research
- Open-source community for excellent ML libraries
Note: Download the dataset from the Kaggle link provided above and place it in the project root directory before running the notebook.
⭐ If you find this project helpful, please consider giving it a star!