
ParasJain03/customer-churn-analysis


📊 Customer Churn Analysis using SQL, Power BI & Machine Learning

🚀 End-to-End Data Analytics Project | SQL | Power BI | Machine Learning


An end-to-end data analytics project that analyzes telecom customer churn and predicts which customers are likely to leave, using SQL Server, Power BI, and machine learning (Random Forest).

This project demonstrates a complete data pipeline from raw data ingestion to predictive insights, helping businesses improve customer retention strategies.


🚀 Project Overview

Customer churn is a major challenge for telecom companies. Retaining customers is often more cost-effective than acquiring new ones.

This project builds a complete churn analytics system that:

✔ Processes raw telecom data using SQL ETL
✔ Builds interactive dashboards in Power BI
✔ Trains a machine learning model to predict churn
✔ Identifies high-risk customers for retention campaigns


🎯 Project Objectives

  • Analyze historical churn behavior
  • Identify key factors influencing churn
  • Build predictive models for churn detection
  • Provide actionable insights through dashboards

🔄 End-to-End Analytics Pipeline

Raw Dataset
     ↓
SQL Server ETL Pipeline
     ↓
Cleaned Analytical Dataset
     ↓
Power BI Dashboard
     ↓
Machine Learning Model
     ↓
Predicted Churn Customers

🏗️ System Architecture

Raw Dataset (CSV)
      ↓
SQL Server Database
      ↓
Staging Table (stg_Churn)
      ↓
Data Cleaning & Transformation
      ↓
Production Table (prod_Churn)
      ↓
SQL Views (vw_ChurnData, vw_JoinData)
      ↓
Power BI Dashboard
      ↓
Machine Learning Model (Random Forest)
      ↓
Predicted Churn Customers

📊 Dashboard Preview

Customer Churn Analysis Dashboard

This dashboard provides insights into:

  • Total Customers
  • Churn Rate
  • Customer Demographics
  • Service Usage
  • Geographic churn distribution
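The two headline KPIs are a count and a simple ratio over the customer table. Here is a minimal plain-Python sketch. It assumes a status field whose value 'Churned' marks a lost customer; the status values and the sample list are hypothetical and do not come from the project data.

```python
from collections import Counter


def churn_kpis(statuses):
    """Return (total_customers, churn_rate) for a list of status strings.

    Assumption: the hypothetical status value 'Churned' marks a lost customer.
    """
    total = len(statuses)
    if total == 0:
        return 0, 0.0
    churned = Counter(statuses)["Churned"]
    return total, churned / total


# Hypothetical sample, not real project data
sample = ["Stayed", "Churned", "Stayed", "Joined", "Churned"]
total, rate = churn_kpis(sample)
print(f"Total Customers: {total}, Churn Rate: {rate:.1%}")  # Total Customers: 5, Churn Rate: 40.0%
```

Every listed customer counts in the denominator here, including newly joined ones. Whether they should is a modelling choice.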

Churn Reason Analysis

Key churn drivers identified:

  • Network reliability issues
  • High service charges
  • Limited service availability
  • Poor self-service experience
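A reason breakdown like this usually comes from a grouped count over churned customers. The sketch below uses Python's built-in `sqlite3` as a portable stand-in for SQL Server. The `Customer_Status` and `Churn_Category` columns and all rows are hypothetical, chosen to mirror the drivers listed above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prod_Churn (Customer_ID TEXT, Customer_Status TEXT, Churn_Category TEXT)"
)
conn.executemany(
    "INSERT INTO prod_Churn VALUES (?, ?, ?)",
    [  # hypothetical rows
        ("C1", "Churned", "Network reliability"),
        ("C2", "Churned", "High charges"),
        ("C3", "Stayed", None),
        ("C4", "Churned", "Network reliability"),
    ],
)

# Count churned customers per reason, most frequent first
rows = conn.execute(
    """
    SELECT Churn_Category, COUNT(*) AS churned_customers
    FROM prod_Churn
    WHERE Customer_Status = 'Churned'
    GROUP BY Churn_Category
    ORDER BY churned_customers DESC, Churn_Category
    """
).fetchall()
print(rows)  # [('Network reliability', 2), ('High charges', 1)]
```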

Churn Prediction Dashboard

The machine learning model predicts customers most likely to churn, enabling proactive retention strategies.
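Scikit-learn classifiers such as `RandomForestClassifier` expose `predict_proba`. A churn list can therefore be built by ranking customers on their predicted churn probability. This minimal sketch assumes the probabilities have already been computed; the customer IDs, scores and the 0.5 cut-off are illustrative assumptions.

```python
def flag_high_risk(probabilities, threshold=0.5):
    """Return customer IDs whose churn probability meets the threshold,
    ordered from highest to lowest risk.

    `probabilities` maps a hypothetical customer ID to a probability in [0, 1].
    """
    at_risk = [(cid, p) for cid, p in probabilities.items() if p >= threshold]
    at_risk.sort(key=lambda item: item[1], reverse=True)
    return [cid for cid, _ in at_risk]


# Hypothetical model scores
scores = {"C101": 0.82, "C102": 0.35, "C103": 0.67, "C104": 0.50}
print(flag_high_risk(scores))  # ['C101', 'C103', 'C104']
```

A lower threshold catches more potential churners but also produces more false alarms. That trade-off can be acceptable when a retention offer is cheap.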


⚙️ ETL Pipeline (SQL Server)

The ETL pipeline performs the following steps:

1️⃣ Data Ingestion

The raw telecom dataset is imported into the staging table:

stg_Churn
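In SQL Server, this step might use `BULK INSERT` or the SSMS Import Flat File wizard. As a portable sketch, here is the same load using Python's `csv` and `sqlite3` modules. The four-column CSV excerpt is hypothetical, and every value is stored as text with no cleaning yet.

```python
import csv
import io
import sqlite3

# Hypothetical CSV excerpt; the real Customer_Data.csv has many more columns
RAW_CSV = """Customer_ID,Gender,Value_Deal,Customer_Status
C1,Female,Deal 1,Stayed
C2,Male,,Churned
C3,Female,Deal 5,Joined
"""


def load_staging(conn, csv_text):
    """Create stg_Churn from the CSV header and insert every row unchanged."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    # Header names come from a trusted file; never interpolate untrusted input into SQL
    col_defs = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE stg_Churn ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO stg_Churn VALUES ({placeholders})",
        ([row[c] for c in columns] for row in reader),
    )
    return conn.execute("SELECT COUNT(*) FROM stg_Churn").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(load_staging(conn, RAW_CSV))  # 3
```

`csv.DictReader` yields an empty string, not NULL, for a blank field. This matters for the cleaning step.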

2️⃣ Data Cleaning

Missing values are replaced with default values using SQL transformations such as:

ISNULL(Value_Deal, 'None')
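`ISNULL` is specific to SQL Server. Standard `COALESCE` behaves the same way for a simple default like this and also runs in SQLite, so the sketch below uses it to show one subtlety: it replaces only true NULLs. Blank CSV fields may be imported as empty strings, depending on the import settings. In that case, wrapping the column in `NULLIF` catches them too. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_Churn (Customer_ID TEXT, Value_Deal TEXT)")
conn.executemany(
    "INSERT INTO stg_Churn VALUES (?, ?)",
    [("C1", "Deal 1"), ("C2", None), ("C3", "")],  # hypothetical rows
)

# COALESCE replaces only true NULLs, as ISNULL does in SQL Server
plain = conn.execute(
    "SELECT Customer_ID, COALESCE(Value_Deal, 'None') FROM stg_Churn ORDER BY Customer_ID"
).fetchall()

# NULLIF first turns empty strings into NULL, so they get the default too
strict = conn.execute(
    "SELECT Customer_ID, COALESCE(NULLIF(Value_Deal, ''), 'None') "
    "FROM stg_Churn ORDER BY Customer_ID"
).fetchall()

print(plain)   # [('C1', 'Deal 1'), ('C2', 'None'), ('C3', '')]
print(strict)  # [('C1', 'Deal 1'), ('C2', 'None'), ('C3', 'None')]
```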

3️⃣ Production Dataset

Cleaned data is stored in:

prod_Churn
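In SQL Server, a cleaned copy is commonly materialised with `SELECT ... INTO prod_Churn FROM stg_Churn`. SQLite has no table-creating `SELECT INTO`, so this sketch uses the equivalent `CREATE TABLE ... AS SELECT`. The `Internet_Type` column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_Churn (Customer_ID TEXT, Value_Deal TEXT, Internet_Type TEXT)")
conn.executemany(
    "INSERT INTO stg_Churn VALUES (?, ?, ?)",
    [("C1", "Deal 1", "Fiber Optic"), ("C2", None, None)],  # hypothetical rows
)

# SQLite counterpart of SQL Server's `SELECT ... INTO prod_Churn FROM stg_Churn`
conn.execute(
    """
    CREATE TABLE prod_Churn AS
    SELECT
        Customer_ID,
        COALESCE(Value_Deal, 'None')    AS Value_Deal,
        COALESCE(Internet_Type, 'None') AS Internet_Type
    FROM stg_Churn
    """
)
print(conn.execute("SELECT * FROM prod_Churn ORDER BY Customer_ID").fetchall())
# [('C1', 'Deal 1', 'Fiber Optic'), ('C2', 'None', 'None')]
```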

4️⃣ Analytical Views

Two analytical views were created:

vw_ChurnData
vw_JoinData

These views are used for:

  • Power BI analytics
  • Machine learning model training
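The README does not show the view definitions. One plausible split, stated purely as an assumption, is this: `vw_ChurnData` holds customers with a known outcome (for training), and `vw_JoinData` holds newly joined customers (to be scored). Here is a minimal `sqlite3` sketch of that split, with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_Churn (Customer_ID TEXT, Customer_Status TEXT)")
conn.executemany(
    "INSERT INTO prod_Churn VALUES (?, ?)",
    [("C1", "Stayed"), ("C2", "Churned"), ("C3", "Joined")],  # hypothetical rows
)

# ASSUMPTION: customers with a known outcome feed model training...
conn.execute(
    "CREATE VIEW vw_ChurnData AS "
    "SELECT * FROM prod_Churn WHERE Customer_Status IN ('Churned', 'Stayed')"
)
# ...and newly joined customers (outcome unknown) are the ones to score
conn.execute(
    "CREATE VIEW vw_JoinData AS "
    "SELECT * FROM prod_Churn WHERE Customer_Status = 'Joined'"
)

train_ids = [r[0] for r in conn.execute("SELECT Customer_ID FROM vw_ChurnData ORDER BY Customer_ID")]
score_ids = [r[0] for r in conn.execute("SELECT Customer_ID FROM vw_JoinData")]
print(train_ids, score_ids)  # ['C1', 'C2'] ['C3']
```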

🤖 Machine Learning Model

A Random Forest classification model is used to predict churn probability.

Model Pipeline

  1. Data preprocessing
  2. Encoding categorical variables
  3. Train-test split (80/20)
  4. Model training
  5. Model evaluation
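In a notebook, steps 2 and 3 would typically use scikit-learn's `LabelEncoder` and `train_test_split(..., test_size=0.2)`. The plain-Python sketch below shows what those two steps do. The contract values and the seed of 42 are illustrative assumptions.

```python
import random


def label_encode(values):
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def train_test_split_80_20(rows, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and split it 80/20 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


# Hypothetical contract values
contracts = ["Month-to-Month", "One Year", "Two Year", "Month-to-Month"]
codes, mapping = label_encode(contracts)
print(codes)    # [0, 1, 2, 0]
print(mapping)  # {'Month-to-Month': 0, 'One Year': 1, 'Two Year': 2}

train, test = train_test_split_80_20(range(10))
print(len(train), len(test))  # 8 2
```

Integer codes impose an arbitrary order on categories. Tree-based models such as Random Forest generally tolerate this better than linear models do.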

📊 Model Performance

Metric      Score
Accuracy    88%
Precision   High
Recall      High
F1 Score    Balanced
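For reference, the four metrics follow from confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. Scikit-learn's `classification_report` prints the same figures. Here is a plain-Python sketch with hypothetical labels (1 = churned).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary churn label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = churned, 0 = stayed
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.625, 'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

In this made-up example, accuracy (62.5%) is higher than recall (50%). Accuracy alone can therefore overstate how many churners a model actually catches.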

📈 Feature Importance

Important churn drivers identified by the model:

  • Contract type
  • Customer tenure
  • Monthly charges
  • Internet service usage

These insights help businesses design better retention strategies.


💼 Business Value

This project helps telecom companies:

  • Identify high-risk customers
  • Design targeted retention campaigns
  • Improve service quality
  • Reduce customer churn

🛠️ Technology Stack

Technology         Purpose
SQL Server         Data storage & ETL pipeline
Power BI           Data visualization & dashboards
Python             Machine learning model
Pandas             Data preprocessing
NumPy              Data manipulation
Scikit-Learn       Random Forest model
Jupyter Notebook   Model development

📂 Project Structure

customer-churn-analysis
│
├── dashboard
│   └── churn_dashboard.pbix
│
├── dashboard_Images
│   ├── churn_analysis.png
│   ├── churn_prediction.png
│   └── churn_reason.png
│
├── data
│   ├── raw
│   │   └── Customer_Data.csv
│   │
│   ├── processed
│   │   └── Prediction_Data.xlsx
│   │
│   └── predictions
│       └── Predictions.csv.xlsx
│
├── notebooks
│   └── churn_prediction.ipynb
│
├── sql
│   └── churn_etl.sql
│
├── doc
│   └── project_architecture.md
│
└── README.md

🚀 How to Run the Project

1️⃣ Set Up the SQL Database

Run the SQL script:

sql/churn_etl.sql

2️⃣ Open Power BI Dashboard

Open the Power BI file:

dashboard/churn_dashboard.pbix

3️⃣ Train Machine Learning Model

Run the Jupyter notebook:

notebooks/churn_prediction.ipynb

📌 Future Improvements

Possible enhancements include:

  • Deploy ML model as an API
  • Automate ETL pipelines
  • Implement real-time churn prediction
  • Cloud deployment using AWS / Azure

👨‍💻 Author

Paras Jain
B.Tech CSE (Artificial Intelligence)
KIET Group of Institutions


🔗 Connect With Me

GitHub https://github.com/ParasJain03

LinkedIn https://www.linkedin.com/in/paras-jain-9b4a4023b/

