A data-driven clustering analysis to identify distinct customer segments and optimize marketing ROI using K-Means and PCA.
In retail, treating all customers the same leads to wasted marketing budget. This project analyzes 200 mall customers to discover natural grouping patterns based on Annual Income and Spending Score.
Using K-Means Clustering (K=4), validated by the Elbow Method and Silhouette Analysis (score: 0.55), we identified four actionable segments.
Business Impact: Identified a "Premium" segment comprising 21% of customers who generate 48% of total revenue. The segment-specific strategies are estimated to deliver a combined £70K annual uplift, of which the targeted VIP strategy accounts for £25K.
pie title Customer Segments vs. Revenue Impact
"Premium (High Income, High Spend)" : 48
"Budget (Low Income, Low Spend)" : 12
"Occasional (Low Income, High Spend)" : 15
"Mid-Range (Middle Income, Middle Spend)" : 25
| Segment | Risk Profile | Strategy | Revenue Uplift |
|---|---|---|---|
| Premium | Low Risk | VIP Concierge: Private events, early access | +£25K |
| Occasional | Medium Risk | Upsell: "Spend £X get Y" offers to increase frequency | +£18K |
| Mid-Range | Low Risk | Loyalty Program: Points systems to retain stability | +£15K |
| Budget | High Risk | Value Bundles: Clearance sales and BOGO offers | +£12K |
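As a quick sanity check, the per-segment figures in the table above can be summed to confirm they match the £70K total quoted in the summary. This is only a sketch: the dictionary name and the share calculation are illustrative and not taken from the notebook.

```python
# Estimated uplift per segment in thousands of GBP, copied from the table above.
uplift_k_gbp = {
    "Premium": 25,
    "Occasional": 18,
    "Mid-Range": 15,
    "Budget": 12,
}

total_uplift = sum(uplift_k_gbp.values())
print(f"Total estimated uplift: £{total_uplift}K")

# Share of the total uplift attributed to each segment strategy.
for segment, value in uplift_k_gbp.items():
    print(f"{segment}: {100 * value / total_uplift:.1f}% of total uplift")
```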
flowchart LR
A["📦 Raw Data\n200 Customers"] --> B["🧹 Preprocessing\nStandardScaler"]
B --> C["📊 Elbow Method\nOptimal K"]
C --> D["🎯 K-Means\nClustering\nK=4"]
D --> E["🔍 Silhouette\nValidation\nScore=0.55"]
E --> F["📉 PCA\n2D Visualization"]
F --> G["💰 Business\nRecommendations"]
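The flow above could be sketched roughly as follows with scikit-learn. This is a minimal sketch, not the notebook's actual code: it uses synthetic blobs from `make_blobs` as a stand-in for `data/Mall_Customers.csv`, and the `random_state` and `n_init` values are assumptions chosen for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the 200-customer dataset.
X, _ = make_blobs(n_samples=200, centers=4, n_features=2, random_state=42)

# Standardize so each feature contributes equally to Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means with K=4, the value reported in this README.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"Silhouette score on synthetic data: {score:.2f}")
```

The score printed here reflects the synthetic data only and will not match the 0.55 reported for the real dataset.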
- Dataset: Mall Customers (200 records)
- Features: Age, Gender, Annual Income (k$), Spending Score (1-100)
- Transformation: StandardScaler to normalize features for Euclidean distance calculations.
- Algorithm: K-Means Clustering
- K-Selection:
  - Elbow Method: Identified the inflection point at K=4.
  - Silhouette Score: Peaked at 0.55 for K=4, indicating reasonably dense, well-separated clusters.
- Dimensionality Reduction: PCA (Principal Component Analysis) used for 2D visualization of clusters.
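The K-selection and PCA steps listed above might look something like the sketch below. It assumes synthetic four-feature data in place of the real dataset, and the candidate range K=2..8 is an illustrative choice, not one confirmed by the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic 4-feature data standing in for the scaled customer features.
X, _ = make_blobs(n_samples=200, centers=4, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

inertias = {}      # within-cluster sum of squares, plotted for the Elbow Method
silhouettes = {}   # mean silhouette coefficient per candidate K

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Candidate K with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(f"Best K by silhouette: {best_k}")

# Project to two principal components for a 2D scatter plot of the clusters.
pca = PCA(n_components=2)
coords_2d = pca.fit_transform(X_scaled)
print(f"Variance explained by 2 components: {pca.explained_variance_ratio_.sum():.2f}")
```

Plotting `inertias` against K gives the elbow curve. In this sketch PCA is used only for visualization, with clustering done on the scaled features as in the flow above.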
| Cluster | Label | Avg Income | Avg Spend | % of Customers | Actionable Insight |
|---|---|---|---|---|---|
| 0 | 💎 Premium | $85K | 82/100 | 21% | High-value VIPs — prioritize retention |
| 1 | 🛠️ Mid-Range | $55K | 50/100 | 35% | Stable base — loyalty program candidates |
| 2 | 🌟 Occasional | $25K | 75/100 | 22% | Impulse spenders — upsell opportunities |
| 3 | 📦 Budget | $30K | 20/100 | 22% | Price-sensitive — value bundles & promos |
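A profile table like the one above can be derived from the cluster labels with a pandas `groupby`. The sketch below uses randomly generated values and a hypothetical `Cluster` column, so its numbers will not match the table. The feature column names follow the dataset's feature list, but how the notebook names them is an assumption.

```python
import numpy as np
import pandas as pd

# Assumption: random placeholder data; the real analysis would use the
# clustered Mall Customers DataFrame instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
    "Cluster": rng.integers(0, 4, size=200),  # hypothetical label column
})

# Average income, average spend, and customer count per cluster.
profile = df.groupby("Cluster").agg(
    avg_income=("Annual Income (k$)", "mean"),
    avg_spend=("Spending Score (1-100)", "mean"),
    customers=("Annual Income (k$)", "size"),
)

# Share of all customers that falls in each cluster.
profile["pct_of_customers"] = 100 * profile["customers"] / len(df)
print(profile.round(1))
```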
consumer-segmentation-analysis/
│
├── data/
│ └── Mall_Customers.csv ← Raw dataset (200 records)
│
├── notebooks/
│ └── customer_segmentation.ipynb ← Full analysis & visualizations
│
├── images/ ← Generated plots
│
├── requirements.txt ← Python dependencies
├── LICENSE ← MIT License
└── README.md ← Project documentation
- Python 3.8+
git clone https://github.com/khushi2704rj-sephora/Consumer-segmentation-using-python.git
cd Consumer-segmentation-using-python
# Install dependencies
pip install -r requirements.txt
Open the Jupyter Notebook to explore the analysis:
jupyter notebook notebooks/customer_segmentation_analysis.ipynb
Contributions are welcome! Please check the Contribution Guidelines and Code of Conduct.