This project performs an end-to-end exploratory data analysis (EDA) on a synthetic but realistic e-commerce sales dataset to uncover key business insights into sales performance, customer behavior, product trends, and regional contribution.
The analysis follows an industry-style analytics workflow, from data loading and cleaning through customer analytics and visualization.
- Analyze overall sales performance and key KPIs
- Identify time-based sales trends (monthly & quarterly)
- Evaluate product-wise and category-wise performance
- Understand customer purchasing behavior
- Identify high-value customers using revenue contribution
- Apply Pareto (80/20) analysis for business insights
Language: Python
Environment: Google Colab
Libraries Used:
- pandas
- numpy
- matplotlib
- seaborn
Type: Synthetic but realistic e-commerce transaction data
Records: ~5,000 transactions
- Order date
- Customer ID & segment
- Product category & product name
- Quantity & unit price
- Sales amount
- Region & payment mode
The dataset was programmatically generated to closely resemble real-world e-commerce data, ensuring control over data quality and structure.
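The generation script itself is not reproduced here. As a rough illustration of how a table with the fields listed above could be produced, the following is a minimal sketch. Every name in it is an assumption made for the example: the function `make_synthetic_orders`, the column names, the product catalogue, the value ranges, and the regions, segments, and payment modes.

```python
import numpy as np
import pandas as pd


def make_synthetic_orders(n_rows: int = 5000, seed: int = 42) -> pd.DataFrame:
    """Generate a toy transaction table; all names and ranges are illustrative."""
    rng = np.random.default_rng(seed)

    # Hypothetical catalogue of (category, product) pairs.
    catalogue = [
        ("Electronics", "Headphones"),
        ("Electronics", "Smartwatch"),
        ("Fashion", "Sneakers"),
        ("Home", "Desk Lamp"),
        ("Beauty", "Face Serum"),
    ]
    picks = rng.integers(0, len(catalogue), n_rows)

    # Random order dates within a single calendar year.
    day_offsets = rng.integers(0, 365, n_rows)
    order_date = pd.Timestamp("2023-01-01") + pd.to_timedelta(day_offsets, unit="D")

    quantity = rng.integers(1, 6, n_rows)  # 1 to 5 units per transaction
    unit_price = rng.uniform(5, 500, n_rows).round(2)
    customer_no = rng.integers(1, 801, n_rows)  # up to 800 distinct customers

    return pd.DataFrame({
        "order_date": order_date,
        "customer_id": [f"C{int(n):04d}" for n in customer_no],
        # Simplification: segment is drawn per row, not fixed per customer.
        "segment": rng.choice(["Consumer", "Corporate", "Small Business"], n_rows),
        "category": [catalogue[i][0] for i in picks],
        "product_name": [catalogue[i][1] for i in picks],
        "quantity": quantity,
        "unit_price": unit_price,
        "sales": (quantity * unit_price).round(2),
        "region": rng.choice(["North", "South", "East", "West"], n_rows),
        "payment_mode": rng.choice(["Card", "UPI", "Cash on Delivery"], n_rows),
    })
```

Fixing the seed makes the table reproducible across runs, which is one practical way to keep control over data quality and structure.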
- Converted date columns to datetime format
- Verified missing values and duplicates
- Performed sanity checks on quantity and pricing
- Engineered time-based features (month, quarter, year)
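The cleaning steps above could look roughly like the sketch below. It assumes hypothetical column names (`order_date`, `quantity`, `unit_price`) and a helper named `clean_orders`; the actual notebook may organize these checks differently.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame):
    """Return a cleaned copy plus a small data-quality report (illustrative)."""
    out = df.copy()

    # Convert the date column to datetime; unparseable values become NaT.
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")

    # Count missing values, duplicate rows, and non-positive quantities or prices.
    invalid = (out["quantity"] <= 0) | (out["unit_price"] <= 0)
    report = {
        "missing_values": int(out.isna().sum().sum()),
        "duplicate_rows": int(out.duplicated().sum()),
        "invalid_quantity_or_price": int(invalid.sum()),
    }

    # Engineer time-based features from the order date.
    out["year"] = out["order_date"].dt.year
    out["month"] = out["order_date"].dt.month
    out["quarter"] = out["order_date"].dt.quarter
    return out, report
```

The function only reports problems and does not drop rows, so the decision about how to handle duplicates or invalid values stays with the analyst.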
- Overall KPIs (Total Sales, Orders, Customers, AOV)
- Monthly and quarterly sales trends
- Product-wise and category-wise sales analysis
- Region-wise revenue contribution
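A minimal sketch of the sales analysis listed above follows. It assumes that each row is one order, that `order_date` is already a datetime column, and that the columns are named `sales`, `customer_id`, `category`, and `region`; the function names are hypothetical.

```python
import pandas as pd


def sales_kpis(df: pd.DataFrame) -> dict:
    """Headline KPIs, assuming one row per order."""
    total_sales = float(df["sales"].sum())
    orders = len(df)
    return {
        "total_sales": total_sales,
        "orders": orders,
        "customers": int(df["customer_id"].nunique()),
        "aov": total_sales / orders,  # Average Order Value
    }


def sales_by_period(df: pd.DataFrame, freq: str) -> pd.Series:
    """Total sales per period; freq is 'M' for monthly or 'Q' for quarterly."""
    return df.groupby(df["order_date"].dt.to_period(freq))["sales"].sum()


def revenue_share(df: pd.DataFrame, by: str) -> pd.Series:
    """Percentage of total revenue per group (e.g. by='region' or 'category')."""
    totals = df.groupby(by)["sales"].sum().sort_values(ascending=False)
    return totals / totals.sum() * 100
```

For example, `sales_by_period(df, "Q")` gives the quarterly trend and `revenue_share(df, "region")` gives the region-wise contribution as percentages.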
- Orders per customer distribution
- Average Order Value (AOV) distribution
- Customer segment performance
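The customer behavior metrics above can be derived with a few group-by aggregations. The sketch below assumes one row per order and hypothetical column names (`customer_id`, `segment`, `sales`); it is an illustration, not the notebook's exact code.

```python
import pandas as pd


def customer_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Orders, revenue, and average order value per customer."""
    return df.groupby("customer_id").agg(
        orders=("sales", "size"),
        revenue=("sales", "sum"),
        aov=("sales", "mean"),
    )


def orders_distribution(summary: pd.DataFrame) -> pd.Series:
    """Number of customers at each order count (1 order, 2 orders, ...)."""
    return summary["orders"].value_counts().sort_index()


def segment_performance(df: pd.DataFrame) -> pd.DataFrame:
    """Customers, revenue, and average order value per segment."""
    return (
        df.groupby("segment")
        .agg(
            customers=("customer_id", "nunique"),
            revenue=("sales", "sum"),
            aov=("sales", "mean"),
        )
        .sort_values("revenue", ascending=False)
    )
```

The `orders` and `aov` columns of `customer_summary(df)` are the natural inputs for histogram plots of the two distributions.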
- Top 10 customers by revenue
- Pareto (80/20) revenue analysis
- Revenue concentration insights
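One way to carry out the Pareto analysis above is to rank customers by revenue and track the cumulative share. The sketch below assumes hypothetical column names (`customer_id`, `sales`) and helper names; the 80% threshold is a parameter, not a result.

```python
import numpy as np
import pandas as pd


def pareto_table(df: pd.DataFrame) -> pd.DataFrame:
    """Customers ranked by revenue with cumulative revenue and rank percentages."""
    revenue = df.groupby("customer_id")["sales"].sum().sort_values(ascending=False)
    table = revenue.to_frame("revenue")
    table["cum_revenue_pct"] = revenue.cumsum() / revenue.sum() * 100
    table["customer_rank_pct"] = np.arange(1, len(revenue) + 1) / len(revenue) * 100
    return table


def customers_for_share(table: pd.DataFrame, share_pct: float = 80.0) -> int:
    """Smallest number of top customers whose cumulative revenue reaches share_pct."""
    below = int((table["cum_revenue_pct"] < share_pct).sum())
    return min(below + 1, len(table))
```

`pareto_table(df).head(10)` gives the top 10 customers by revenue, and dividing `customers_for_share(table)` by `len(table)` shows what fraction of customers accounts for 80% of revenue.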
- Sales exhibit clear seasonality with noticeable quarterly fluctuations
- A small group of customers contributes a large share of total revenue
- Certain product categories consistently outperform others
- The majority of customers are low-frequency buyers, which points to an opportunity to improve retention
- Quarterly sales summary (CSV)
- Product performance report (CSV)
- Region-wise sales report (CSV)
- Top customers by revenue (CSV)
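The four reports above could be written out along the following lines. The file names, the `reports` output directory, the column names, and the `export_reports` helper are all assumptions made for this sketch.

```python
from pathlib import Path

import pandas as pd


def export_reports(df: pd.DataFrame, out_dir: str = "reports") -> list:
    """Write the summary reports as CSV files and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    quarters = df["order_date"].dt.to_period("Q")
    reports = {
        "quarterly_sales_summary.csv": (
            df.groupby(quarters)["sales"].sum().rename_axis("quarter").reset_index()
        ),
        "product_performance.csv": (
            df.groupby(["category", "product_name"])
            .agg(units=("quantity", "sum"), revenue=("sales", "sum"))
            .reset_index()
        ),
        "region_sales.csv": df.groupby("region")["sales"].sum().reset_index(),
        "top_customers.csv": (
            df.groupby("customer_id")["sales"].sum().nlargest(10).reset_index()
        ),
    }

    paths = []
    for name, frame in reports.items():
        path = out / name
        frame.to_csv(path, index=False)  # index=False avoids an extra unnamed column
        paths.append(path)
    return paths
```

In Google Colab the files land in the session's working directory and need to be downloaded or copied to persistent storage before the session ends.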
This project demonstrates how data analysis transforms raw transactional data into actionable business insights.
The findings emphasize the importance of customer retention, product optimization, and targeted marketing strategies in e-commerce businesses.
- Customer segmentation using RFM analysis
- Cohort analysis for retention tracking
- Interactive dashboards (Power BI / Tableau)
The entire project was developed and executed using Google Colab, leveraging its cloud-based environment for efficient experimentation, visualization, and reproducibility.