This project performs customer segmentation using RFM (Recency, Frequency, Monetary) analysis on a synthetic but realistic e-commerce transaction dataset.
The objective is to identify high-value, loyal, and at-risk customers to support data-driven marketing and retention strategies.
The project follows an end-to-end analytics workflow, from raw transactional data to actionable business insights and exportable customer segments.
- Segment customers based on purchasing behavior
- Identify high-value and loyal customers
- Detect at-risk and inactive customers
- Translate analytical insights into actionable marketing strategies
- Provide marketing-ready customer segment exports
Language: Python
Environment: Google Colab (Cloud-based Jupyter Notebook)
Libraries Used:
- pandas
- numpy
- matplotlib
- seaborn
Type: Synthetic but realistic e-commerce transaction data
Records: ~8,000 transaction-level rows
- Multiple transactions per customer
- Missing customer IDs (guest checkouts)
- Negative quantities (returns)
- Noisy pricing and transactional logs
Columns: InvoiceNo, InvoiceDate, CustomerID, Country, Quantity, UnitPrice, TotalPrice
The dataset was intentionally designed to be messy, so it requires cleaning before analysis, much like real-world business datasets.
- Removed transactions with missing CustomerID
- Filtered out returns and invalid transactions (negative quantity or price)
- Removed duplicate records
- Recomputed total transaction value after cleaning
- Defined a snapshot date for accurate recency calculation
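The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's notebook code: the function name `clean_transactions`, the sample rows, and the choice of "one day after the last purchase" as the snapshot date are all assumptions. The filter keeps only strictly positive quantities and prices, which also drops zero values.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a raw transaction table (hypothetical helper)."""
    out = df.copy()
    # Drop guest checkouts: rows with no CustomerID cannot be attributed to a customer.
    out = out.dropna(subset=["CustomerID"])
    # Keep only rows with positive quantity and positive unit price (removes returns).
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    # Remove exact duplicate rows.
    out = out.drop_duplicates()
    # Parse dates and recompute the line total from the cleaned columns.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out.reset_index(drop=True)

# Tiny made-up sample, for illustration only.
raw = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4"],
    "InvoiceDate": ["2024-01-05", "2024-01-05", "2024-01-09", "2024-02-01", "2024-02-03"],
    "CustomerID": [101.0, 101.0, None, 102.0, 103.0],
    "Country": ["UK", "UK", "UK", "FR", "DE"],
    "Quantity": [2, 2, 1, -1, 3],
    "UnitPrice": [10.0, 10.0, 5.0, 20.0, 4.0],
    "TotalPrice": [20.0, 20.0, 5.0, -20.0, 0.0],
})
clean = clean_transactions(raw)

# Assumed convention: snapshot date is one day after the latest purchase.
snapshot_date = clean["InvoiceDate"].max() + pd.Timedelta(days=1)
```

Fixing the snapshot date once, rather than using the current date, keeps recency values stable when the notebook is rerun later.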
For each customer:
- Recency: Days since last purchase
- Frequency: Number of unique purchases
- Monetary: Total spending
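A minimal sketch of how these three metrics could be computed with a pandas group-by. It assumes that "unique purchases" means distinct `InvoiceNo` values and that recency is measured against a fixed snapshot date; the helper name `compute_rfm` and the sample rows are made up for illustration.

```python
import pandas as pd

def compute_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate cleaned transactions into one RFM row per customer (hypothetical helper)."""
    return df.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's most recent purchase.
        Recency=("InvoiceDate", lambda s: (snapshot_date - s.max()).days),
        # Number of distinct invoices (assumed meaning of "unique purchases").
        Frequency=("InvoiceNo", "nunique"),
        # Total spend across all of the customer's transactions.
        Monetary=("TotalPrice", "sum"),
    )

# Made-up transactions: customer 1 has two invoices, customer 2 has one.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-20", "2024-01-10"]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})
rfm = compute_rfm(tx, pd.Timestamp("2024-02-01"))
print(rfm)
```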
Customers were scored using quantile-based RFM scoring (1–5) and combined into a composite RFM score.
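One way to implement quantile-based scoring is with `pd.qcut`, sketched below. Two details here are assumptions rather than facts from the project: values are ranked first (with `method="first"`) so that ties cannot produce duplicate bin edges, and the composite score is taken as the simple sum R + F + M (concatenating the digits is an equally common alternative). Recency labels are reversed so that more recent customers score higher.

```python
import pandas as pd

def score_rfm(rfm: pd.DataFrame, bins: int = 5) -> pd.DataFrame:
    """Add 1..bins quantile scores for R, F, M and a summed composite (hypothetical helper)."""
    scored = rfm.copy()
    ascending = list(range(1, bins + 1))   # 1 = lowest value, bins = highest
    descending = ascending[::-1]           # used for Recency: fewer days = better score

    # Ranking with method="first" breaks ties so qcut always gets distinct bin edges.
    r_rank = scored["Recency"].rank(method="first")
    f_rank = scored["Frequency"].rank(method="first")
    m_rank = scored["Monetary"].rank(method="first")

    scored["R"] = pd.qcut(r_rank, bins, labels=descending).astype(int)
    scored["F"] = pd.qcut(f_rank, bins, labels=ascending).astype(int)
    scored["M"] = pd.qcut(m_rank, bins, labels=ascending).astype(int)

    # Assumed composite: simple sum, ranging from 3 to 3 * bins.
    scored["RFM_Score"] = scored["R"] + scored["F"] + scored["M"]
    return scored

# Made-up RFM table: the first customer is the best on every metric, the last the worst.
rfm = pd.DataFrame({
    "Recency": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
    "Frequency": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "Monetary": [1000, 900, 800, 700, 600, 500, 400, 300, 200, 100],
})
scored = score_rfm(rfm)
print(scored[["R", "F", "M", "RFM_Score"]])
```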
Customers were grouped into meaningful business segments:
- Champions
- Loyal Customers
- Potential Loyalists
- At Risk
- Lost Customers
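The rules that map scores to these segments are not spelled out here, so the thresholds below are purely illustrative assumptions. The sketch shows one common pattern: an ordered set of checks on the R, F, and M scores, with the first match winning and "Lost Customers" as the fallback.

```python
import pandas as pd

def assign_segment(row: pd.Series) -> str:
    """Map R/F/M scores (1-5) to a segment name using hypothetical thresholds."""
    r, f, m = row["R"], row["F"], row["M"]
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"            # recent, frequent, high spend
    if f >= 4 and r >= 3:
        return "Loyal Customers"      # frequent and still reasonably recent
    if r >= 3:
        return "Potential Loyalists"  # recent but not yet frequent
    if f >= 3:
        return "At Risk"              # used to buy often, has not bought recently
    return "Lost Customers"           # neither recent nor frequent

# Made-up score rows, one per segment.
scores = pd.DataFrame({
    "R": [5, 3, 4, 2, 1],
    "F": [5, 4, 2, 4, 1],
    "M": [5, 2, 2, 4, 1],
})
scores["Segment"] = scores.apply(assign_segment, axis=1)
print(scores)
```

Because the checks are evaluated in order, every customer lands in exactly one segment, and the thresholds can be tuned without changing the structure.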
Segmentation was validated using:
- RFM score distributions
- Heatmaps
- 3D RFM visualizations
- RFM Heatmap (Recency vs Frequency vs Monetary)
- 3D Scatter Plot of customer clusters
- Segment-wise customer distribution
- Monetary value comparison across segments
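The tables behind the heatmap and the segment comparisons can be prepared with plain pandas, as in the sketch below. It assumes a scored table with `R`, `F`, `Monetary`, and `Segment` columns; the column names and sample values are illustrative, and using mean monetary value as the heatmap cell is one possible choice among several.

```python
import pandas as pd

# Made-up scored customers, for illustration only.
scored = pd.DataFrame({
    "R": [5, 5, 1, 1, 3],
    "F": [5, 5, 1, 2, 3],
    "Monetary": [900.0, 700.0, 50.0, 80.0, 300.0],
    "Segment": [
        "Champions", "Champions",
        "Lost Customers", "Lost Customers",
        "Potential Loyalists",
    ],
})

# Heatmap input: mean Monetary for each (Recency score, Frequency score) cell.
heatmap_data = scored.pivot_table(
    index="R", columns="F", values="Monetary", aggfunc="mean"
)

# Segment-wise customer counts and average monetary value.
segment_summary = scored.groupby("Segment").agg(
    Customers=("Monetary", "count"),
    AvgMonetary=("Monetary", "mean"),
)

print(heatmap_data)
print(segment_summary)
```

In a notebook, `heatmap_data` can be passed to `seaborn.heatmap(heatmap_data, annot=True)`, and `segment_summary["Customers"].plot(kind="bar")` gives the segment distribution chart.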
Each customer segment was mapped to targeted marketing actions:
- Champions: Loyalty rewards and premium offers
- Loyal Customers: Upsell and engagement campaigns
- At Risk: Win-back campaigns
- Lost Customers: Selective re-engagement or suppression
- Segmented customer dataset exported as CSV
- Optional segment-wise customer lists for marketing teams
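A minimal sketch of the export step, assuming a segmented table with a `Segment` column. The helper name `export_segments`, the file names, and the temporary output directory are all hypothetical; in Colab the target would more likely be a mounted drive or the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_segments(df: pd.DataFrame, out_dir: str) -> list:
    """Write the full segmented table plus one CSV per segment (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Full segmented dataset in a single file.
    full_path = out / "rfm_segments.csv"
    df.to_csv(full_path, index=False)
    paths = [full_path]

    # Optional per-segment lists, e.g. segment_at_risk.csv.
    for name, group in df.groupby("Segment"):
        path = out / f"segment_{name.lower().replace(' ', '_')}.csv"
        group.to_csv(path, index=False)
        paths.append(path)
    return paths

# Made-up segmented customers, for illustration only.
segmented = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "RFM_Score": [15, 6, 14],
    "Segment": ["Champions", "At Risk", "Champions"],
})
out_dir = tempfile.mkdtemp()  # throwaway directory for the example
written = export_segments(segmented, out_dir)
print([p.name for p in written])
```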
This project demonstrates how RFM analysis converts raw transactional data into actionable customer insights.
The results highlight the importance of customer retention, personalized marketing, and targeted engagement strategies to maximize customer lifetime value and optimize marketing ROI.
- Time-based RFM (monthly or quarterly snapshots)
- Integration with clustering models (K-Means)
- Campaign impact and churn prediction analysis
- Dashboard development (Power BI / Tableau)
The entire project was developed and executed in Google Colab, so the notebook can be rerun and shared from the cloud without local setup.