This project predicts customer churn for a telecommunications company using machine learning. The data comes from the "Telco Customer Churn" dataset on Kaggle. The project involves several key steps:
- Data Exploration and Preprocessing: Handling missing values, encoding categorical variables, and feature scaling.
- Feature Engineering and Selection: Creating new features and selecting the most relevant ones using techniques like Recursive Feature Elimination (RFE) and feature importance from ensemble models.
- Model Development and Evaluation: Comparing several models, including Logistic Regression, Random Forest, Gradient Boosting, LightGBM, and XGBoost. LightGBM performed best, with a precision of 65%, a recall of 55%, and an AUC of 0.85.
- Implementation: Developing a user-friendly web application using Streamlit to input customer data and predict churn likelihood.
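
To make the evaluation metrics in the list above more concrete, here is a small worked sketch that computes precision, recall, and AUC from scratch. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not the project's data or results, and the helper names `precision_recall` and `roc_auc` are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = churned, 0 = stayed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a random churner outscores a random
    non-churner (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) ground truth and predicted churn probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

# Assumed decision threshold of 0.5; a real deployment might tune this.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
# prints: precision=0.75 recall=0.75 auc=0.9375
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is why a model can have a fairly high AUC alongside more modest precision and recall.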
The project demonstrates the practical application of machine learning in addressing business problems, with a focus on enhancing customer retention strategies.
Technologies Used: Python, Pandas, Scikit-learn, LightGBM, SHAP, Streamlit.
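
As a rough illustration of how the preprocessing, feature selection, and modelling steps described above might fit together, here is a minimal scikit-learn sketch. It is not the project's actual code. The data is synthetic, the column names (`tenure`, `monthly_charges`, `contract`) are assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for LightGBM to keep the example's dependencies small.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the churn data (column names are assumptions).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "tenure": rng.integers(1, 72, n).astype(float),
    "monthly_charges": rng.uniform(20, 120, n),
    "contract": rng.choice(["monthly", "one_year", "two_year"], n),
})
# Synthetic churn label loosely tied to the features.
logit = (-0.05 * df["tenure"] + 0.02 * df["monthly_charges"]
         + 1.0 * (df["contract"] == "monthly") - 1.0)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Inject some missing values so the imputation step has work to do.
df.loc[rng.choice(n, 20, replace=False), "monthly_charges"] = np.nan

numeric = ["tenure", "monthly_charges"]
categorical = ["contract"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    # Recursive Feature Elimination keeps 3 of the 5 encoded features.
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
    # Stand-in for LightGBM (an assumption made for this sketch).
    ("clf", GradientBoostingClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # predicted churn probability
pred = (proba >= 0.5).astype(int)          # assumed 0.5 threshold

metrics = {
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "auc": roc_auc_score(y_test, proba),
}
print(metrics)
```

Wrapping preprocessing and feature selection in one `Pipeline` means they are fitted on the training split only, which avoids leaking information from the test set into the reported metrics.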