Credit Card Behaviour Score Prediction Using Classification and Risk-Based Techniques

Overview

This project aims to strengthen a bank's credit risk management by predicting the likelihood that a credit card customer will default on their payment in the following month. Using anonymized historical behavioral data from roughly 30,000 customers, we build a binary classification model that flags potential defaulters early, enabling proactive risk mitigation.

Objective

Build a classification model to predict next_month_default (1 = Default, 0 = No Default).
Perform exploratory and financial analysis to identify behavioral patterns associated with credit default.
Handle class imbalance using techniques like SMOTE, class weights, or undersampling.
Engineer meaningful features (e.g., utilization ratio, delinquency streaks).
Evaluate models using risk-sensitive metrics such as F1, F2, and ROC-AUC.
Generate production-ready predictions on an unlabeled validation set.
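
As an illustration of the class-weight option listed above, the following is a minimal sketch of the common "balanced" heuristic, where each class is weighted by n_samples / (n_classes * class_count). The function name balanced_class_weights is hypothetical, and this README does not state which imbalance technique the notebook finally uses, so treat this as one possible approach rather than the project's method.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency (hypothetical helper)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # n_samples / (n_classes * count_of_class)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Toy labels with an 80/20 split, standing in for next_month_default
toy_labels = [0] * 8 + [1] * 2
print(balanced_class_weights(toy_labels))  # {0: 0.625, 1: 2.5}
```

A dictionary of this form can be passed as class_weight to scikit-learn estimators such as LogisticRegression; class_weight="balanced" applies the same formula internally.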

Dataset Description

Training Data: ~25,000 records with features like:
- LIMIT_BAL: Credit limit
- sex, age, education, marriage: Demographic info
- PAY_0 to PAY_6: Repayment status history
- BILL_AMT1 to BILL_AMT6: Monthly billed amounts
- PAY_AMT1 to PAY_AMT6: Monthly payments
- next_month_default: Target variable
Validation Data: ~5,000 records with the same features but no labels.

Key Features Engineered

AVG_Bill_amt: Average of all billed amounts over 6 months
PAY_TO_BILL_ratio: Total payment divided by total billed amount
Delinquency Streak: Consecutive months of overdue payments
Utilization Ratio: Ratio of total bill amount to credit limit
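
The four features above could be derived along the following lines. This is a sketch under stated assumptions, not the notebook's actual code: it assumes the column prefixes from the dataset description, assumes that a positive PAY_* status means the month was overdue, and interprets "streak" as the longest run of consecutive overdue months. The names add_features, longest_streak, utilization_ratio, and delinquency_streak are hypothetical.

```python
import numpy as np
import pandas as pd

def longest_streak(flags):
    """Length of the longest run of True values in a sequence."""
    best = current = 0
    for overdue in flags:
        current = current + 1 if overdue else 0
        best = max(best, current)
    return best

def add_features(df):
    """Return a copy of df with the engineered columns added."""
    out = df.copy()
    bill_cols = [c for c in df.columns if c.startswith("BILL_AMT")]
    paid_cols = [c for c in df.columns if c.startswith("PAY_AMT")]
    # Repayment status columns: PAY_* but not PAY_AMT*
    status_cols = [c for c in df.columns
                   if c.startswith("PAY_") and not c.startswith("PAY_AMT")]

    total_bill = df[bill_cols].sum(axis=1)
    total_paid = df[paid_cols].sum(axis=1)

    out["AVG_Bill_amt"] = df[bill_cols].mean(axis=1)
    # A zero total bill yields NaN instead of an infinite ratio
    out["PAY_TO_BILL_ratio"] = total_paid / total_bill.replace(0, np.nan)
    out["utilization_ratio"] = total_bill / df["LIMIT_BAL"]
    # Assumption: status > 0 marks an overdue month
    out["delinquency_streak"] = (df[status_cols] > 0).apply(longest_streak, axis=1)
    return out
```

Selecting columns by prefix avoids hard-coding how many monthly columns exist. Rows with a zero total bill are left as NaN so the choice of fill value stays explicit.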

Modeling Techniques

We compared multiple models to identify the best-performing one:

Logistic Regression
Decision Tree Classifier
XGBoost
LightGBM
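
A comparison of this kind can be sketched as a cross-validation loop over candidate estimators. The example below is illustrative only: it uses synthetic data from make_classification in place of the real training set, covers just the two scikit-learn models, and the helper name compare_models and all hyperparameters are assumptions. XGBoost and LightGBM provide scikit-learn-compatible classifiers (XGBClassifier, LGBMClassifier) that could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

def compare_models(X, y, models, n_splits=5, seed=0):
    """Mean cross-validated ROC-AUC for each named estimator."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }

# Synthetic, imbalanced stand-in for the real data (about 20% positives)
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "decision_tree": DecisionTreeClassifier(
        max_depth=4, class_weight="balanced", random_state=0
    ),
}

results = compare_models(X, y, models)
for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ROC-AUC = {auc:.3f}")
```

Stratified folds keep the default rate similar across splits, which matters when the positive class is rare.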

Evaluation Strategy

Because credit risk decisions usually treat a missed defaulter as more costly than an unnecessary alert, recall was prioritized over precision and the metrics were chosen accordingly:

Primary Metrics: F1 Score, F2 Score (which weights recall more heavily than precision), ROC-AUC
Threshold Tuning: The decision threshold was adjusted to balance the business cost of false positives (unnecessary alerts) against false negatives (missed defaulters).
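
To make the metric and the threshold adjustment concrete: the F-beta score is (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall, so β = 2 favors recall. The sketch below computes it from confusion counts and scans a grid of thresholds for the best score. The function names, the threshold grid, and the choice of F2 as the tuning target are assumptions; the README does not say which criterion the notebook optimizes.

```python
import numpy as np

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta from confusion counts; returns 0.0 when undefined."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    # Equivalent to (1 + b^2) * P * R / (b^2 * P + R)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return float((1 + beta**2) * tp / denom) if denom else 0.0

def best_threshold(y_true, proba, beta=2.0, grid=None):
    """Return (threshold, score) maximizing F-beta over a threshold grid."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    proba = np.asarray(proba)
    scores = [fbeta(y_true, proba >= t, beta) for t in grid]
    i = int(np.argmax(scores))  # first maximum wins on ties
    return float(grid[i]), scores[i]

# Toy example: predicted default probabilities for four customers
threshold, score = best_threshold([0, 0, 1, 1], [0.12, 0.42, 0.37, 0.81])
print(f"threshold={threshold:.2f}, F2={score:.3f}")
```

scikit-learn's fbeta_score computes the same quantity. Thresholds should be tuned on held-out data rather than on the data used to fit the model.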

Business Implications

Accurate prediction of customer defaults helps in:

Reducing credit losses
Designing early-warning systems
Optimizing credit exposure
Improving risk-based pricing and customer segmentation

What we achieved:

Jupyter Notebook with:
- Data processing
- Exploratory data analysis (EDA)
- Financial insights
- Feature engineering
- Model training and evaluation
- Class imbalance handling
- Final predictions

Tools and Libraries Used

Python
pandas, numpy
matplotlib, seaborn
scikit-learn, imbalanced-learn
xgboost, lightgbm
