🛍️ Customer Segmentation & Value-Based Profiling

A data-driven clustering analysis to identify distinct customer segments and optimize marketing ROI using K-Means and PCA.

🎯 Summary · 📊 Insights · 🏗️ Methodology · 📂 Structure · 🚀 Quick Start · 🤝 Contributing

🎯 Executive Summary

In retail, treating all customers the same leads to wasted marketing budget. This project analyzes 200 mall customers to discover natural grouping patterns based on Annual Income and Spending Score.

Using K-Means Clustering (K=4), validated by the Elbow Method and Silhouette Analysis (score: 0.55), we identified four actionable segments.

Business Impact: Identified a "Premium" segment comprising 21% of customers that generates 48% of total revenue. Segment-specific strategies, led by a targeted VIP program (+£25K), are worth an estimated £70K annual uplift in total.

📊 Key Insights

pie title Customer Segments vs. Revenue Impact
    "Premium (High Income, High Spend)" : 48
    "Budget (Low Income, Low Spend)" : 12
    "Occasional (Low Income, High Spend)" : 15
    "Mid-Range (Middle Income, Middle Spend)" : 25

Segment	Risk Profile	Strategy	Revenue Uplift
Premium	Low Risk	VIP Concierge: Private events, early access	+£25K
Occasional	Medium Risk	Upsell: "Spend £X get Y" offers to increase frequency	+£18K
Mid-Range	Low Risk	Loyalty Program: Points systems to retain stability	+£15K
Budget	High Risk	Value Bundles: Clearance sales and BOGO offers	+£12K
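
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers already quoted in this README (uplift per segment, revenue share from the pie chart, customer share from the Cluster Profiles table); the `revenue_index` ratio is a hypothetical helper metric introduced here for illustration, not something computed in the notebook.

```python
# Figures copied from this README: uplift per segment (in £K), revenue share from
# the pie chart, and customer share from the Cluster Profiles table.
uplift_k_gbp = {"Premium": 25, "Occasional": 18, "Mid-Range": 15, "Budget": 12}
revenue_pct = {"Premium": 48, "Occasional": 15, "Mid-Range": 25, "Budget": 12}
customer_pct = {"Premium": 21, "Occasional": 22, "Mid-Range": 35, "Budget": 22}

total_uplift = sum(uplift_k_gbp.values())
print(f"Total estimated uplift: £{total_uplift}K")

# Revenue share divided by customer share (hypothetical metric): values above 1
# mean the segment contributes more revenue than its headcount alone suggests.
revenue_index = {seg: revenue_pct[seg] / customer_pct[seg] for seg in revenue_pct}
for seg, idx in sorted(revenue_index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seg:<11} {idx:.2f}x")
```

The per-segment uplifts add up to the £70K quoted in the Executive Summary, and Premium is the only segment whose revenue share exceeds its customer share.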

🏗️ Methodology

flowchart LR
    A["📦 Raw Data\n200 Customers"] --> B["🧹 Preprocessing\nStandardScaler"]
    B --> C["📊 Elbow Method\nOptimal K"]
    C --> D["🎯 K-Means\nClustering\nK=4"]
    D --> E["🔍 Silhouette\nValidation\nScore=0.55"]
    E --> F["📉 PCA\n2D Visualization"]
    F --> G["💰 Business\nRecommendations"]

1. Data Preprocessing

Dataset: Mall Customers (200 records)
Features: Age, Gender, Annual Income (k$), Spending Score (1-100)
Transformation: StandardScaler standardizes each feature to zero mean and unit variance so that no single feature dominates the Euclidean distance calculations.
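
A minimal sketch of the scaling step, assuming the two features named in the Executive Summary. To stay self-contained it uses a randomly generated stand-in frame rather than the real file, and the column names are assumed to match the feature list above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data/Mall_Customers.csv (assumption: same column names
# as the feature list above). These are random values, not the real customers.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Annual Income (k$)": rng.uniform(15, 140, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})

features = ["Annual Income (k$)", "Spending Score (1-100)"]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])

# After scaling, each column has mean ~0 and unit (population) standard
# deviation, so both features contribute comparably to Euclidean distance.
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

To run this against the real dataset, the stand-in frame could be replaced with `pd.read_csv("data/Mall_Customers.csv")`, provided the column names in the file match.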

2. Clustering Algorithm

Algorithm: K-Means Clustering
K-Selection:
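- Elbow Method: The inertia (within-cluster sum of squares) curve bends at K=4, after which adding clusters yields diminishing returns.
- Silhouette Score: Peaked at 0.55 for K=4 (on a scale from -1 to 1), indicating reasonably dense, well-separated clusters.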
Dimensionality Reduction: PCA (Principal Component Analysis) used for 2D visualization of clusters.
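
The selection and visualization steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data is a synthetic `make_blobs` stand-in (200 rows, 3 numeric features, 4 generated groups), and `n_init=10`, `random_state=42`, and the K range of 2 to 8 are arbitrary choices, so the scores it prints will differ from the 0.55 reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features (assumption: 200 rows,
# 3 numeric features, 4 underlying groups). Not the Mall Customers data.
X_raw, _ = make_blobs(n_samples=200, n_features=3, centers=4,
                      cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X_raw)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_                          # elbow: look for the bend
    silhouettes[k] = silhouette_score(X, model.labels_)   # higher is better

best_k = max(silhouettes, key=silhouettes.get)
print("silhouette by K:", {k: round(s, 3) for k, s in silhouettes.items()})
print("K with highest silhouette:", best_k)

# Fit the final model, then project to 2D with PCA for plotting only;
# the clustering itself runs on the full scaled feature matrix.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)
```

Plotting `inertias` against K gives the elbow curve, and a scatter of `X_2d` colored by `labels` gives the 2D cluster view.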

3. Cluster Profiles

Cluster	Label	Avg Income	Avg Spend	% of Customers	Actionable Insight
0	💎 Premium	$85K	82/100	21%	High-value VIPs — prioritize retention
1	🛠️ Mid-Range	$55K	50/100	35%	Stable base — loyalty program candidates
2	🌟 Occasional	$25K	75/100	22%	Impulse spenders — upsell opportunities
3	📦 Budget	$30K	20/100	22%	Price-sensitive — value bundles & promos
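
A profile table like the one above can be derived from labelled customers with a single group-by. The sketch below assumes a `cluster` column holding the K-Means labels; that column name and the eight rows are made up for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical labelled customers: column names follow the dataset, while the
# rows and the "cluster" column are invented for this example.
df = pd.DataFrame({
    "Annual Income (k$)":     [85, 90, 55, 50, 60, 25, 20, 30],
    "Spending Score (1-100)": [80, 84, 50, 48, 52, 75, 78, 20],
    "cluster":                [0, 0, 1, 1, 1, 2, 2, 3],
})

# Average income and spend per cluster, plus the number of customers in each.
profile = df.groupby("cluster").agg(
    avg_income=("Annual Income (k$)", "mean"),
    avg_spend=("Spending Score (1-100)", "mean"),
    n_customers=("Annual Income (k$)", "size"),
)
profile["pct_of_customers"] = 100 * profile["n_customers"] / len(df)
print(profile.round(1))
```

Segment labels such as "Premium" or "Budget" are then assigned by reading the averages, since K-Means cluster numbers carry no meaning of their own.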

📂 Repository Structure

Consumer-segmentation-using-python/
│
├── data/
│   └── Mall_Customers.csv        ← Raw dataset (200 records)
│
├── notebooks/
│   └── customer_segmentation.ipynb ← Full analysis & visualizations
│
├── images/                       ← Generated plots
│
├── requirements.txt              ← Python dependencies
├── LICENSE                       ← MIT License
└── README.md                     ← Project documentation

🚀 Quick Start

Prerequisites

Python 3.8+

Installation

git clone https://github.com/khushi2704rj-sephora/Consumer-segmentation-using-python.git
cd Consumer-segmentation-using-python

# Install dependencies
pip install -r requirements.txt

Usage

Open the Jupyter Notebook to explore the analysis:

jupyter notebook notebooks/customer_segmentation.ipynb

🤝 Contributing

Contributions are welcome! Please read the Contribution Guidelines (CONTRIBUTING.md) and the Code of Conduct (CODE_OF_CONDUCT.md) before opening a pull request.

👩‍💻 Author

Khushi Kothari

MSc Business Analytics · Customer Analytics & Segmentation
