A comprehensive data analytics project analyzing customer shopping trends and behaviors using Python and SQL. This project explores purchasing patterns, customer segmentation, and key business insights from customer transaction data.
This project analyzes a dataset of 3,900+ customer shopping transactions to uncover valuable insights about customer behavior, purchase patterns, and revenue drivers. The analysis combines data cleaning, exploratory data analysis (EDA), and SQL-based business intelligence queries to answer critical business questions.
File: customer_shopping_behavior.csv
The dataset contains 3,901 customer transaction records with the following features:
- Customer ID: Unique identifier for each customer
- Age: Customer age
- Gender: Customer gender (Male/Female)
- Item Purchased: Product name
- Category: Product category (Clothing, Footwear, etc.)
- Purchase Amount (USD): Transaction amount in USD
- Location: Customer location (US states)
- Size: Product size (S, M, L, XL)
- Color: Product color
- Season: Season of purchase (Winter, Spring, Summer, Fall)
- Review Rating: Product review rating (1-5 scale)
- Subscription Status: Whether customer has a subscription (Yes/No)
- Shipping Type: Shipping method (Express, Standard, Free Shipping, Next Day Air)
- Discount Applied: Whether discount was applied (Yes/No)
- Promo Code Used: Whether promo code was used (Yes/No)
- Previous Purchases: Number of previous purchases
- Payment Method: Payment method used (Credit Card, PayPal, Venmo, Cash, etc.)
- Frequency of Purchases: Purchase frequency (Weekly, Fortnightly, Monthly, Quarterly, Annually)
-
Python 3.x
- pandas - Data manipulation and analysis
- SQLAlchemy - Database connection and ORM
- PyMySQL - MySQL database driver
-
MySQL - Database management and SQL queries
-
Jupyter Notebook - Interactive data analysis environment
-
Power BI - Interactive dashboard for data visualization
Customer Trends Data Analysis/
β
βββ customer_shopping_behavior.csv # Raw dataset
βββ Customer_Shopping_Behaviour_Analysis.ipynb # Python analysis notebook
βββ Customer_Shopping_Behavior.sql # SQL queries for business insights
βββ Customer_Behavior_Dashboard.pbix # Power BI dashboard file
βββ README.md # Project documentation
- Data Loading: Imported dataset using pandas
- Missing Values: Handled missing
Review Ratingvalues by filling with median rating per product category - Column Standardization:
- Converted column names to lowercase
- Replaced spaces with underscores
- Renamed
Purchase Amount (USD)topurchase_amount
- Feature Engineering:
- Created
age_groupfeature using quartile-based segmentation (Young Adult, Adult, Middle Aged, Senior) - Created
purchase_frequency_daysto convert frequency categories to numeric days
- Created
- Data Quality:
- Identified and removed duplicate column (
promo_code_usedidentical todiscount_applied)
- Identified and removed duplicate column (
- Database Export: Loaded cleaned data into MySQL database for SQL analysis
The SQL analysis explores 10 key business questions:
- Gender-based Revenue: Total revenue comparison between male and female customers
- Discount Impact: Identified high-value customers who used discounts but still spent above average
- Top-Rated Products: Top 5 products with highest average review ratings
- Discount Usage by Product: Products with highest discount usage percentage
- Shipping Cost Analysis: Average purchase amounts for Standard vs. Express shipping
- Subscription Impact: Average spend and total revenue comparison between subscribers and non-subscribers
- Customer Segmentation: Classified customers as New (1 purchase), Returning (2-10), or Loyal (>10 previous purchases)
- Repeat Buyer Subscription: Correlation between repeat buyers and subscription status
- Top Products by Category: Top 3 most purchased products within each category
- Age Group Revenue: Revenue contribution analysis across different age groups
An interactive Power BI dashboard (Customer_Behavior_Dashboard.pbix) visualizes key metrics and insights:
Dashboard Features:
-
Key Metrics Cards:
- Total Customers: 3.9K
- Average Purchase Amount: $59.76
- Average Review Rating: 3.75
-
Subscription Analysis: Pie chart showing 27% subscribers vs 73% non-subscribers
-
Revenue Insights:
- Revenue by Category (bar chart)
- Revenue by Age Group (horizontal bar chart)
-
Sales Performance:
- Sales by Category breakdown
- Sales by Age Group distribution
-
Interactive Filters:
- Subscription Status (Yes/No)
- Gender (Female/Male)
- Category (Accessories, Clothing, Footwear, Outerwear)
- Shipping Type (multiple options)
The dashboard provides real-time insights for business decision-making and trend analysis.
The analysis provides insights into:
- Customer spending patterns by demographics
- Impact of discounts and subscriptions on revenue
- Product performance and customer satisfaction
- Shopping frequency and customer loyalty
- Shipping preferences and their correlation with spending
- Age group preferences and revenue contribution by category
# Python packages
pip install pandas sqlalchemy pymysql
# MySQL Server (ensure MySQL is installed and running)-
Clone or download the project files
-
Prepare the Database:
CREATE DATABASE customer_analytics;
-
Update Database Credentials:
- Open
Customer_Shopping_Behaviour_Analysis.ipynb - Update the following variables in the notebook:
username = 'your_username' password = 'your_password' host = 'localhost' port = '3306' database = 'customer_analytics'
- Open
-
Run the Analysis:
- Execute the Jupyter notebook cells sequentially to:
- Clean and prepare the data
- Load data into MySQL
- Run SQL queries in
Customer_Shopping_Behavior.sqlfor business insights
- Execute the Jupyter notebook cells sequentially to:
jupyter notebook Customer_Shopping_Behaviour_Analysis.ipynbExecute the queries in Customer_Shopping_Behavior.sql using:
- MySQL Workbench
- MySQL command line
- Any SQL client connected to your MySQL database
-
Install Power BI Desktop (if not already installed)
- Download from Microsoft Power BI
-
Open the Dashboard:
- Double-click
Customer_Behavior_Dashboard.pbix - Or open Power BI Desktop and select File β Open β Browse to the file
- Double-click
-
Interact with the Dashboard:
- Use filters on the left panel to segment data
- Hover over visualizations for detailed tooltips
- Click on chart elements to cross-filter other visuals
-- Total revenue by gender
SELECT gender, SUM(purchase_amount) AS revenue
FROM shopping_behavior
GROUP BY gender;
-- Top 5 products by average rating
SELECT item_purchased, ROUND(AVG(review_rating),2) AS avg_rating
FROM shopping_behavior
GROUP BY item_purchased
ORDER BY avg_rating DESC
LIMIT 5;Contributions, issues, and feature requests are welcome! Feel free to check issues page if you want to contribute.
This project is available for educational and analytical purposes.
Vineet Patel
- Email: vineetpatel468@gmail.com
- GitHub: @vineet416
- LinkedIn: @vineet416
For any questions or feedback regarding this project, please feel free to reach out.
This project demonstrates end-to-end data analytics skills including data cleaning, feature engineering, database management, and SQL-based business intelligence.
