A machine learning project to predict whether a loan applicant is a good or bad credit risk using the kaggle German Credit Dataset with Target.
To build and evaluate a classification model that predicts the creditworthiness of applicants based on demographic and financial features.
- β Decision Tree Classifier (with pre-pruning)
- β GridSearchCV for hyperparameter tuning
- Source: UCI Machine Learning Repository
- Samples: 1000
- Features:
Age,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose, etc. - Target:
Risk(good/bad)
- Handled missing values using mode imputation for categorical fields
- Applied One-Hot Encoding for multi-class categorical features
- Scaled numerical features (
Age,Credit amount,Duration) using RobustScaler
- Trained a Decision Tree Classifier with:
- Pre-pruning (
max_depth,min_samples_split,min_samples_leaf) class_weight='balanced'to handle class imbalance
- Pre-pruning (
- Confusion Matrix
- Precision, Recall, F1-Score
- ROC Curve and AUC Score
- Comparative ROC plots between multiple decision tree configurations