Must024/Customer-Churn-prediction

Customer Churn Prediction – Subscription-Based Business

Overview

Customer churn is one of the biggest challenges for subscription-based businesses. This project analyzes Telco customer data to identify factors that influence churn and predict customers who are most likely to leave.
Using three machine learning models (Logistic Regression, Random Forest, XGBoost), I built a predictive pipeline and used SHAP values to explain its predictions and derive actionable retention strategies.


Business Problem

Losing customers is costly: acquiring a new customer can be up to 5x more expensive than retaining an existing one.
This project answers two key questions:

  1. Which customers are most likely to churn?
  2. What are the key factors driving churn?

Project Structure

├── image/
│ ├── Slide1.JPG
│ ├── Slide2.JPG
│ ├── Slide3.JPG
│ ├── Slide4.JPG
│ ├── Slide5.JPG
│ ├── Slide6.JPG
│ ├── Slide7.JPG
│ ├── Slide8.JPG
│ ├── Slide9.JPG
│ ├── Slide10.JPG
│ ├── Slide11.JPG
│ ├── Slide12.JPG
├── Customer Churn Analysis.ipynb
├── README.md

Data Source


Process

  1. Data Cleaning – Handled missing values, corrected data types, encoded categorical variables.
  2. Exploratory Data Analysis (EDA) – Visualized churn patterns by tenure, contract type, and monthly charges.
  3. Modeling – Built and compared Logistic Regression, Random Forest, and XGBoost models.
  4. Model Interpretation – Used SHAP values to explain model predictions.
  5. Recommendations – Developed data-driven retention strategies.
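
The cleaning and modeling steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names (`tenure`, `MonthlyCharges`, `TotalCharges`, `Contract`, `Churn`), the blank-string quirk in `TotalCharges`, the fill-with-zero choice, and all hyperparameters are assumptions. XGBoost is left out to keep the sketch dependent on scikit-learn only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are assumed, not taken from the notebook.
rng = np.random.default_rng(0)
n = 400
raw = pd.DataFrame({
    "tenure": rng.integers(0, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n).round(2),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
raw.loc[:2, "tenure"] = 0  # force a few brand-new customers
# TotalCharges stored as text, blank for brand-new customers (assumed quirk).
raw["TotalCharges"] = (raw["tenure"] * raw["MonthlyCharges"]).round(2).astype(str)
raw.loc[:2, "TotalCharges"] = " "
# Toy label: short-tenure, month-to-month customers churn more often.
p_churn = 0.1 + 0.4 * (raw["Contract"] == "Month-to-month") + 0.3 * (raw["tenure"] < 12)
raw["Churn"] = np.where(rng.random(n) < p_churn, "Yes", "No")


def clean(df):
    """Fix dtypes, fill missing values, and encode categorical columns."""
    out = df.copy()
    # Blank strings become NaN, then 0.0 (one possible choice for new customers).
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce").fillna(0.0)
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0})
    return pd.get_dummies(out, columns=["Contract"], drop_first=True, dtype=int)


clean_df = clean(raw)
X = clean_df.drop(columns="Churn")
y = clean_df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model and collect the same four metrics reported in this README.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

print(pd.DataFrame(results).T.round(3))
```

The numbers printed here come from the toy data and are not comparable to the results reported for the real dataset.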

Model Performance

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic Regression | 78.68% | 0.619 | 0.513 | 0.561 |
| Random Forest | 78.53% | 0.632 | 0.460 | 0.533 |
| XGBoost | 73.70% | 0.504 | 0.676 | 0.578 |
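
As a quick sanity check on the table, the F1-score can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The sketch below uses only the figures from the table; the helper name `f1_from` is made up for illustration.

```python
# (precision, recall, reported F1) copied from the Model Performance table.
reported = {
    "Logistic Regression": (0.619, 0.513, 0.561),
    "Random Forest": (0.632, 0.460, 0.533),
    "XGBoost": (0.504, 0.676, 0.578),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


gaps = {}
for name, (p, r, f1) in reported.items():
    recomputed = f1_from(p, r)
    gaps[name] = abs(recomputed - f1)
    print(f"{name}: reported {f1:.3f}, recomputed {recomputed:.4f}")
```

Any gap in the third decimal place is consistent with precision and recall having been rounded before publication.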

Best Model:

  • For overall balanced performance: Logistic Regression
  • For high recall (catching more churners): XGBoost
  • For highest precision: Random Forest
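
Which model counts as "best" depends on the metric that matters for the business. A small sketch that ranks the three models per metric, using only the figures from the table above (the `best_by` helper is hypothetical):

```python
# Scores copied from the Model Performance table (accuracy as a fraction).
scores = {
    "Logistic Regression": {"accuracy": 0.7868, "precision": 0.619, "recall": 0.513, "f1": 0.561},
    "Random Forest": {"accuracy": 0.7853, "precision": 0.632, "recall": 0.460, "f1": 0.533},
    "XGBoost": {"accuracy": 0.7370, "precision": 0.504, "recall": 0.676, "f1": 0.578},
}


def best_by(metric):
    """Return the model name with the highest value for the given metric."""
    return max(scores, key=lambda model: scores[model][metric])


winners = {metric: best_by(metric) for metric in ("accuracy", "precision", "recall", "f1")}
for metric, model in winners.items():
    print(f"{metric}: {model}")
```

On these figures, Logistic Regression leads on accuracy, Random Forest on precision, and XGBoost on both recall and F1-score.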

Key Visuals

Churn Distribution

[Figure: Churn Distribution]

Churn by Contract Type

[Figure: Contract Type vs Churn]

Model Comparison

[Figure: Model Comparison]

SHAP Summary Plot

[Figure: SHAP Summary Plot]


Key Insights from SHAP

  • Contract type is the strongest churn driver — month-to-month customers are far more likely to churn.
  • Fiber optic internet service customers churn more than DSL users.
  • Tenure is negatively correlated with churn — long-term customers are more loyal.
  • Online security & tech support lower churn probability.
  • Electronic check payment method is linked to higher churn.
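
To illustrate what a SHAP value expresses, the sketch below computes them by hand for a logistic regression. For a linear model with features treated as independent, the SHAP value of a feature reduces to `coef * (x - mean(x))` on the log-odds scale. This is not necessarily how the notebook produced its plots (the `shap` library listed under tools would normally handle this), and the synthetic data, feature names, and effect sizes are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: two assumed features, month-to-month flag and tenure in months.
rng = np.random.default_rng(1)
n = 500
month_to_month = rng.integers(0, 2, size=n)
tenure = rng.integers(0, 73, size=n)
X = np.column_stack([month_to_month, tenure]).astype(float)

# Toy ground truth: month-to-month raises churn odds, tenure lowers them.
logit = -0.5 + 1.5 * month_to_month - 0.04 * tenure
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
coef = model.coef_[0]
feature_means = X.mean(axis=0)

# Base value: the model's log-odds output at the average feature vector.
base_value = model.intercept_[0] + coef @ feature_means

# Linear SHAP values (independent-features assumption), shape (n, 2).
shap_values = coef * (X - feature_means)

# Additivity: base value plus per-feature contributions recovers the log-odds.
reconstructed = base_value + shap_values.sum(axis=1)

# Global importance: mean absolute SHAP value per feature.
mean_abs = np.abs(shap_values).mean(axis=0)
for name, value in zip(["month_to_month", "tenure"], mean_abs):
    print(f"{name}: mean |SHAP| = {value:.3f}")
```

A positive SHAP value pushes a customer's prediction toward churn and a negative one pushes it away, which is how statements such as "tenure is negatively correlated with churn" can be read off a summary plot.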

Recommendations

  • Promote long-term contracts with incentives.
  • Target new customers early (first 6 months) with retention campaigns.
  • Offer discounts or bundles to high-bill customers.
  • Encourage switching to automatic payment instead of electronic checks.
  • Bundle services like online security and tech support to increase customer stickiness.
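
One way to turn these recommendations into a target list is to combine model scores with simple rules. The sketch below is purely illustrative: the customer rows, the `churn_probability` column, the category labels, and the `suggest_actions` helper are all hypothetical, and the 6-month cutoff is the only value taken from the list above.

```python
import pandas as pd

# Hypothetical scored customers; churn_probability would come from a fitted model.
customers = pd.DataFrame({
    "customer_id": ["A1", "B2", "C3", "D4"],
    "tenure": [2, 30, 5, 60],
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "PaymentMethod": [
        "Electronic check",
        "Credit card (automatic)",
        "Electronic check",
        "Bank transfer (automatic)",
    ],
    "churn_probability": [0.81, 0.42, 0.35, 0.05],
})


def suggest_actions(row):
    """Map a customer's attributes to the retention actions listed above."""
    actions = []
    if row["tenure"] <= 6:
        actions.append("early-tenure retention campaign")
    if row["Contract"] == "Month-to-month":
        actions.append("long-term contract incentive")
    if row["PaymentMethod"] == "Electronic check":
        actions.append("automatic payment nudge")
    return "; ".join(actions)


customers["actions"] = customers.apply(suggest_actions, axis=1)

# Contact the highest-risk customers first.
ranked = customers.sort_values("churn_probability", ascending=False)
print(ranked[["customer_id", "churn_probability", "actions"]].to_string(index=False))
```

Sorting by predicted churn probability keeps the outreach focused on the customers the model considers most at risk.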

Deliverables


Tools & Libraries

  • Python: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, shap
  • Jupyter Notebook
  • Google Slides (presentation)
  • GitHub (version control & portfolio hosting)

Author

Gafar Mustopha – Data Analyst
LinkedIn | GitHub | Email
