π End-to-End Data Analytics Project | SQL | Power BI | Machine Learning
An end-to-end Data Analytics project that analyzes telecom customer churn and predicts customers likely to leave using SQL Server, Power BI, and Machine Learning (Random Forest).
This project demonstrates a complete data pipeline from raw data ingestion to predictive insights, helping businesses improve customer retention strategies.
Customer churn is a major challenge for telecom companies. Retaining customers is often more cost-effective than acquiring new ones.
This project builds a complete churn analytics system that:
β Processes raw telecom data using SQL ETL β Builds interactive dashboards in Power BI β Trains a machine learning model to predict churn β Identifies high-risk customers for retention campaigns
- Analyze historical churn behavior
- Identify key factors influencing churn
- Build predictive models for churn detection
- Provide actionable insights through dashboards
Raw Dataset
β
SQL Server ETL Pipeline
β
Cleaned Analytical Dataset
β
Power BI Dashboard
β
Machine Learning Model
β
Predicted Churn Customers
Raw Dataset (CSV)
β
SQL Server Database
β
Staging Table (stg_Churn)
β
Data Cleaning & Transformation
β
Production Table (prod_Churn)
β
SQL Views (vw_ChurnData, vw_JoinData)
β
Power BI Dashboard
β
Machine Learning Model (Random Forest)
β
Predicted Churn Customers
This dashboard provides insights into:
- Total Customers
- Churn Rate
- Customer Demographics
- Service Usage
- Geographic churn distribution
Key churn drivers identified:
- Network reliability issues
- High service charges
- Limited service availability
- Poor self-service experience
The machine learning model predicts customers most likely to churn, enabling proactive retention strategies.
The ETL pipeline performs the following tasks.
Raw telecom dataset imported into staging table:
stg_Churn
Missing values were handled using SQL transformations such as:
ISNULL(Value_Deal, 'None')Cleaned data is stored in:
prod_Churn
Two analytical views were created:
vw_ChurnData
vw_JoinData
These views are used for:
- Power BI analytics
- Machine learning model training
A Random Forest Classification Model is used to predict churn probability.
- Data preprocessing
- Encoding categorical variables
- Train-test split (80/20)
- Model training
- Model evaluation
| Metric | Score |
|---|---|
| Accuracy | 88% |
| Precision | High |
| Recall | High |
| F1 Score | Balanced |
Important churn drivers identified by the model:
- Contract type
- Customer tenure
- Monthly charges
- Internet service usage
These insights help businesses design better retention strategies.
This project helps telecom companies:
- Identify high-risk customers
- Design targeted retention campaigns
- Improve service quality
- Reduce customer churn
| Technology | Purpose |
|---|---|
| SQL Server | Data storage & ETL pipeline |
| Power BI | Data visualization & dashboards |
| Python | Machine learning model |
| Pandas | Data preprocessing |
| NumPy | Data manipulation |
| Scikit-Learn | Random Forest model |
| Jupyter Notebook | Model development |
customer-churn-analysis
β
βββ dashboard
β βββ churn_dashboard.pbix
β
βββ dashboard_Images
β βββ churn_analysis.png
β βββ churn_prediction.png
β βββ churn_reason.png
β
βββ data
β βββ raw
β β βββ Customer_Data.csv
β β
β βββ processed
β β βββ Prediction_Data.xlsx
β β
β βββ predictions
β βββ Predictions.csv.xlsx
β
βββ notebooks
β βββ churn_prediction.ipynb
β
βββ sql
β βββ churn_etl.sql
β
βββ doc
β βββ project_architecture.md
β
βββ README.md
Run the SQL script:
sql/churn_etl.sql
Open the Power BI file:
dashboard/churn_dashboard.pbix
Run the Jupyter notebook:
notebooks/churn_prediction.ipynb
Possible enhancements include:
- Deploy ML model as an API
- Automate ETL pipelines
- Implement real-time churn prediction
- Cloud deployment using AWS / Azure
Paras Jain B.Tech CSE (Artificial Intelligence) KIET Group of Institutions


