This project aims to enhance the credit risk management system of a bank by predicting the likelihood of a credit card customer defaulting on their payment in the following month. Using anonymized historical behavioral data from over 30,000 customers, we build a binary classification model that flags potential defaulters early, enabling proactive risk mitigation strategies.
- Build a classification model to predict `next_month_default` (1 = Default, 0 = No Default).
- Perform exploratory and financial analysis to identify behavioral patterns associated with credit default.
- Handle class imbalance using techniques like SMOTE, class weights, or undersampling.
- Engineer meaningful features (e.g., utilization ratio, delinquency streaks).
- Evaluate models using risk-sensitive metrics such as F1, F2, and ROC-AUC.
- Generate production-ready predictions on an unlabeled validation set.
- Training Data: ~25,000 records with features like:
  - `LIMIT_BAL`: Credit limit
  - `sex`, `age`, `education`, `marriage`: Demographic info
  - `PAY_0` to `PAY_6`: Repayment status history
  - `BILL_AMT1` to `BILL_AMT6`: Monthly billed amounts
  - `PAY_AMT1` to `PAY_AMT6`: Monthly payments
  - `next_month_default`: Target variable
- Validation Data: ~5,000 records with the same features but no labels.
- AVG_Bill_amt: Average of all billed amounts over 6 months
- PAY_TO_BILL_ratio: Total payment divided by total billed amount
- Delinquency Streak: Consecutive months of overdue payments
- Utilization Ratio: Ratio of total bill amount to credit limit
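As a rough illustration, the four engineered features above could be computed with pandas along these lines. This is a sketch under assumptions, not the notebook's actual code: the function names `add_features` and `longest_streak`, the output column names `utilization_ratio` and `delinquency_streak`, and the rule that a repayment status greater than 0 means "overdue" are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd

BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def longest_streak(overdue_flags) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for overdue in overdue_flags:
        current = current + 1 if overdue else 0
        best = max(best, current)
    return best


def add_features(df: pd.DataFrame, status_cols: list) -> pd.DataFrame:
    """Return a copy of df with the four engineered features added.

    status_cols lists the repayment-status columns in chronological order.
    """
    out = df.copy()
    total_bill = out[BILL_COLS].sum(axis=1)
    total_pay = out[PAY_AMT_COLS].sum(axis=1)

    out["AVG_Bill_amt"] = out[BILL_COLS].mean(axis=1)
    # Customers with no billed amount get NaN instead of a division by zero.
    out["PAY_TO_BILL_ratio"] = total_pay / total_bill.where(total_bill != 0)
    out["utilization_ratio"] = total_bill / out["LIMIT_BAL"]

    # Assumption: a status value above 0 marks an overdue month.
    overdue = (out[status_cols] > 0).to_numpy()
    out["delinquency_streak"] = [longest_streak(row) for row in overdue]
    return out
```

Pass whichever repayment-status columns your file actually contains as `status_cols`. The `NaN` ratios for zero-bill customers need an explicit imputation choice before training models that do not accept missing values.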
We compared multiple models to identify the best-performing one:
- Logistic Regression
- Decision Tree Classifier
- XGBoost
- LightGBM
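A comparison like this can be sketched with cross-validated ROC-AUC. The example below is illustrative only: it uses synthetic imbalanced data from `make_classification` rather than the project's dataset, covers only the two scikit-learn models from the list (XGBoost and LightGBM classifiers could be added to the same dictionary), and uses `class_weight="balanced"` as one of the imbalance options mentioned earlier. The hyperparameters are arbitrary placeholders, not the project's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data: roughly 20% positive class.
X, y = make_classification(
    n_samples=2000,
    n_features=12,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=0,
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the default rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, auc in results.items():
    print(f"{name}: mean ROC-AUC = {auc:.3f}")
```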
Because credit risk decisions generally prioritize recall (catching defaulters) over precision, the metrics were chosen accordingly:
- Primary Metrics: F1 Score, F2 Score, ROC-AUC
- Threshold Tuning: The decision threshold was adjusted to balance the business cost of false positives (unnecessary alerts) against that of false negatives (missed defaulters).
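One possible way to tune the threshold is to scan a grid of candidate values and keep the one with the highest F2 score, which weights recall more heavily than precision. The sketch below assumes that approach. The helper name `best_f2_threshold` and the 0.05-step grid are hypothetical, and the source does not state which selection rule was actually used.

```python
import numpy as np
from sklearn.metrics import fbeta_score


def best_f2_threshold(y_true, y_prob, thresholds=None):
    """Return (threshold, f2) for the grid threshold with the highest F2."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if thresholds is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95
        thresholds = np.linspace(0.05, 0.95, 19)

    scores = [
        fbeta_score(y_true, (y_prob >= t).astype(int), beta=2, zero_division=0)
        for t in thresholds
    ]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])
```

The threshold should be chosen on held-out predictions (for example, out-of-fold probabilities) rather than on the data the model was fitted to, otherwise the selected value tends to be optimistic.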
Accurate prediction of customer defaults helps in:
- Reducing credit losses
- Designing early-warning systems
- Optimizing credit exposure
- Improving risk-based pricing and customer segmentation
- Jupyter Notebook with:
  - Data processing
  - Exploratory data analysis (EDA)
  - Financial insights
  - Feature engineering
  - Model training and evaluation
  - Class imbalance handling
  - Final predictions
- Python
- `pandas`, `numpy`
- `matplotlib`, `seaborn`
- `scikit-learn`, `imbalanced-learn`
- `xgboost`, `lightgbm`