This project implements a machine learning model to predict customer churn in a banking context. The model helps identify customers who are likely to leave the bank, enabling proactive retention strategies.
- Model Type: Random Forest Classifier
- Training Data: SMOTE-balanced dataset
- Key Parameters:
  - Number of Trees: 100
  - Max Depth: 10
  - Min Samples Split: 2
  - Min Samples Leaf: 1
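
The parameters above map onto scikit-learn's `RandomForestClassifier`. The following is a minimal sketch, assuming the project builds the model with scikit-learn (it is listed in the requirements). The synthetic data from `make_classification` and the `random_state` value are placeholders chosen for illustration, not the project's training set or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the SMOTE-balanced training set (assumption).
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,      # Number of Trees
    max_depth=10,          # Max Depth
    min_samples_split=2,   # Min Samples Split (also the scikit-learn default)
    min_samples_leaf=1,    # Min Samples Leaf (also the scikit-learn default)
    random_state=0,        # assumed here so the sketch is reproducible
)
model.fit(X, y)

# One churn probability per row: column 1 holds the positive class.
churn_probability = model.predict_proba(X[:5])[:, 1]
```
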
Performance on the hold-out test set (20% of the data):
| Metric | Score |
|---|---|
| Accuracy | 0.89 |
| Precision | 0.87 |
| Recall | 0.91 |
| F1-Score | 0.89 |
| AUC-ROC | 0.92 |
The most influential features, with their importance scores:
- Credit Score (0.18)
- Age (0.15)
- Balance (0.14)
- Number of Products (0.12)
- Tenure (0.10)
- Estimated Salary (0.09)
- Active Member Status (0.08)
- Credit Card Status (0.07)
- Gender (0.04)
The model adds engineered features that bin numeric inputs into ordinal groups to improve prediction accuracy:
Credit Score Groups:
- Group 5: ≥ 800 (Excellent)
- Group 4: 750-799 (Very Good)
- Group 3: 700-749 (Good)
- Group 2: 600-699 (Fair)
- Group 1: < 600 (Poor)
Age Groups:
- Group 4: ≥ 65 (Senior)
- Group 3: 50-64 (Middle-aged)
- Group 2: 35-49 (Adult)
- Group 1: 18-34 (Young Adult)
Balance Groups:
- Group 4: ≥ $150,000 (High)
- Group 3: $100,000-$149,999 (Medium-High)
- Group 2: $50,000-$99,999 (Medium)
- Group 1: < $50,000 (Low)
Estimated Salary Groups:
- Group 4: ≥ $150,000 (High)
- Group 3: $100,000-$149,999 (Medium-High)
- Group 2: $50,000-$99,999 (Medium)
- Group 1: < $50,000 (Low)
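
The group boundaries above can be expressed as threshold lookups. This is a sketch rather than the project's actual feature code: the output key names (`CreditScoreGroup` and so on) are hypothetical, and it assumes the two dollar-denominated lists apply to Balance and Estimated Salary, the only two dollar-valued inputs.

```python
from bisect import bisect_right

# Lower bounds of groups 2..N, taken from the lists above.
CREDIT_SCORE_BOUNDS = [600, 700, 750, 800]   # groups 1-5
AGE_BOUNDS = [35, 50, 65]                    # groups 1-4
AMOUNT_BOUNDS = [50_000, 100_000, 150_000]   # groups 1-4 (dollar amounts)


def to_group(value, bounds):
    """Return the 1-based group whose range contains value."""
    return bisect_right(bounds, value) + 1


def engineer_groups(customer):
    """Return a copy of the record with group features added (hypothetical keys)."""
    out = dict(customer)
    out["CreditScoreGroup"] = to_group(customer["CreditScore"], CREDIT_SCORE_BOUNDS)
    out["AgeGroup"] = to_group(customer["Age"], AGE_BOUNDS)
    out["BalanceGroup"] = to_group(customer["Balance"], AMOUNT_BOUNDS)
    out["SalaryGroup"] = to_group(customer["EstimatedSalary"], AMOUNT_BOUNDS)
    return out


example = engineer_groups(
    {"CreditScore": 780, "Age": 45, "Balance": 120000, "EstimatedSalary": 95000}
)
```

For this record the sketch yields credit score group 4, age group 2, balance group 3, and salary group 2. Values below the documented minimums (for example an age under 18) fall into group 1 here; rejecting them is left to input validation.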
ROC Curve
[Plot: true positive rate (0.0 to 1.0) against false positive rate (0.0 to 1.0)]
Feature Importance
Credit Score |███████████████████ 0.18
Age |█████████████████ 0.15
Balance |███████████████ 0.14
NumOfProducts |███████████ 0.12
Tenure |████████ 0.10
EstimatedSalary |███████ 0.09
ActiveMember |██████ 0.08
HasCrCard |█████ 0.07
Gender |██ 0.04
Confidence Levels
High (>70%) |███████████ 35%
Medium (30-70%) |███████████████████ 65%
Low (<30%) |███ 15%
Input:
{
"CreditScore": 450,
"Age": 35,
"Balance": 0,
"NumOfProducts": 1,
"Tenure": 1,
"HasCrCard": "No",
"IsActiveMember": "No",
"EstimatedSalary": 35000
}
Prediction:
- Churn Probability: 85%
- Confidence: High
- Key Risk Factors:
1. Low Credit Score (450)
2. Inactive Member
3. No Credit Card
4. Low Tenure

Input:
{
"CreditScore": 780,
"Age": 45,
"Balance": 120000,
"NumOfProducts": 3,
"Tenure": 8,
"HasCrCard": "Yes",
"IsActiveMember": "Yes",
"EstimatedSalary": 95000
}
Prediction:
- Churn Probability: 15%
- Confidence: High
- Key Retention Factors:
1. High Credit Score (780)
2. Active Member
3. Multiple Products
4. Long Tenure

Input:
{
"CreditScore": 650,
"Age": 55,
"Balance": 45000,
"NumOfProducts": 2,
"Tenure": 4,
"HasCrCard": "Yes",
"IsActiveMember": "No",
"EstimatedSalary": 75000
}
Prediction:
- Churn Probability: 48%
- Confidence: Medium
- Key Factors:
1. Average Credit Score
2. Inactive Member
3. Moderate Balance
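4. Medium Tenure

Confusion matrix:

| | Actual Churn: Yes | Actual Churn: No |
|---|---|---|
| Predicted Churn: Yes | TP | FP |
| Predicted Churn: No | FN | TN |

Where (percentages are taken within each predicted class, so each row sums to 100%):
- TP (True Positive): 87%
- FP (False Positive): 13%
- FN (False Negative): 9%
- TN (True Negative): 91%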
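
As a consistency check, the four percentages above can be turned into the headline metrics. The sketch assumes equal numbers of customers predicted "Yes" and "No" (100 each), which the document does not state; under that assumption the percentages can be read directly as counts.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumption: 100 customers predicted "Yes" and 100 predicted "No",
# so each percentage is used as a count.
scores = metrics_from_counts(tp=87, fp=13, fn=9, tn=91)
rounded = {name: round(value, 2) for name, value in scores.items()}
# rounded == {"accuracy": 0.89, "precision": 0.87, "recall": 0.91, "f1": 0.89}
```

Under this assumption the rounded values agree with the reported accuracy, precision, recall, and F1-score.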
The model was validated using:
- 10-fold Cross Validation
- Hold-out Test Set (20% of data)
- Stratified Sampling
- SMOTE for Class Balance
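
To illustrate what stratified sampling means for the 20% hold-out set, here is a standard-library sketch that keeps each class's share the same in both splits. The function name and the toy labels are made up for illustration. In practice scikit-learn's `train_test_split` with its `stratify` argument and `StratifiedKFold` do this job, and the project may well use them instead.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split row indices into train/test so each class keeps its share."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels (made up): 200 churners and 800 non-churners.
labels = [1] * 200 + [0] * 800
train_idx, test_idx = stratified_holdout(labels)
```

Resampling such as SMOTE is normally applied to the training indices only, so that synthetic rows never reach the test set. Whether this project does so is not stated here.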
Training Set (80%):
- Accuracy: 0.91
- Precision: 0.89
- Recall: 0.93
Test Set (20%):
- Accuracy: 0.89
- Precision: 0.87
- Recall: 0.91
1. Install the required packages:
   pip install -r requirements.txt
2. Run the Streamlit application:
   streamlit run app.py
3. Access the web interface at http://localhost:8501
The model accepts the following customer information:
- Credit Score (300-850)
- Gender (Male/Female)
- Age (18-100)
- Tenure (0-10 years)
- Balance (0-250,000)
- Number of Products (1-4)
- Credit Card Status (Yes/No)
- Active Member Status (Yes/No)
- Estimated Salary (10,000-200,000)
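
The ranges above can be checked before a record reaches the model. The sketch below is one possible validator, not the application's own code: the function name is hypothetical, and the field names follow the keys used in the example inputs plus `Gender`, which those examples omit.

```python
# Allowed ranges and choices, copied from the list above.
NUMERIC_RANGES = {
    "CreditScore": (300, 850),
    "Age": (18, 100),
    "Tenure": (0, 10),
    "Balance": (0, 250_000),
    "NumOfProducts": (1, 4),
    "EstimatedSalary": (10_000, 200_000),
}
CHOICES = {
    "Gender": ("Male", "Female"),
    "HasCrCard": ("Yes", "No"),
    "IsActiveMember": ("Yes", "No"),
}


def validate_customer(customer):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        value = customer.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: missing or not a number")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    for field, allowed in CHOICES.items():
        if customer.get(field) not in allowed:
            problems.append(f"{field}: expected one of {allowed}")
    return problems


valid = {
    "CreditScore": 650, "Gender": "Female", "Age": 55, "Tenure": 4,
    "Balance": 45000, "NumOfProducts": 2, "HasCrCard": "Yes",
    "IsActiveMember": "No", "EstimatedSalary": 75000,
}
errors_ok = validate_customer(valid)
errors_bad = validate_customer({**valid, "CreditScore": 900, "Gender": "Unknown"})
```

Returning every problem at once, rather than raising on the first, lets a form-based interface show all invalid fields together.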
- The model may have reduced accuracy for:
  - Customers with very low credit scores (< 400)
  - Customers above 80 years of age
  - Customers with unusually high balances (> $200,000)
  - Customers with unusually high salaries (> $150,000)

Planned improvements:
- Implement real-time model updates
- Add more sophisticated feature engineering
- Incorporate customer interaction data
- Develop customer segmentation models
- Add A/B testing capabilities

Requirements:
- Python 3.8+
- scikit-learn 1.4.0
- pandas
- numpy
- streamlit
- matplotlib
- plotly

Version information:
- Current Version: 2.0
- Last Updated: 2024
- Training Data: 2023

This project is licensed under the MIT License - see the LICENSE file for details.
For questions or support, please open an issue in the repository.
Note: This model is for demonstration purposes and should be validated with real-world data before deployment in production environments.