Author: Paras Jain
B.Tech CSE (AI) — KIET Group of Institutions
Graduation Year: 2027
Interactive credit risk prediction application:
https://credit-risk-scorecard.streamlit.app
The application allows users to enter borrower information and instantly receive:
• Probability of Default (PD)
• Credit Score
• Risk Category
• Loan Decision
This simulates how banks evaluate loan applications using machine learning models.
Example borrower profile:
| Feature | Value |
|---|---|
| Age | 45 |
| Credit Utilization | 20% |
| Total Delinquencies | 0 |
| Monthly Income | $8,000 |
Model Output:
- Probability of Default: 4.39%
- Credit Score: 873
- Risk Category: Low Risk
- Decision: Approved
This project builds a bank-style credit risk scorecard model that predicts the probability that a borrower will default on a loan.
The system replicates the core credit risk modeling workflow used by banks and financial institutions.
The model:
- Predicts Probability of Default (PD)
- Converts PD into a Credit Score
- Estimates Expected Credit Loss (ECL) using the IFRS-9 framework
The project demonstrates how machine learning models are used for financial risk analytics and loan decision systems.
Credit risk modeling is one of the most important applications of machine learning in finance.
Banks must estimate borrower default risk to:
- approve or reject loan applications
- determine interest rates
- estimate expected financial losses
- comply with financial regulations such as Basel III and IFRS-9
This project replicates a simplified version of the credit risk modeling pipeline used by banks.
Dataset used:
Give Me Some Credit (Kaggle)
https://www.kaggle.com/datasets/brycecf/give-me-some-credit
Dataset characteristics:
- 104,619 loan applicants
- 11 borrower financial variables
- Default rate: 6.68%
Target variable:
SeriousDlqin2yrs
- 0 = Good borrower
- 1 = Default
The project follows a standard banking credit risk modeling workflow:
- Business Understanding
- Exploratory Data Analysis
- Data Cleaning & Outlier Handling
- Feature Engineering
- WoE & Information Value Analysis
- Probability of Default (PD) Model Training
- Model Validation (AUC, Gini, KS)
- Probability Calibration
- Credit Scorecard Generation
- IFRS-9 Expected Credit Loss Estimation
Key engineered feature:
Total_Delinquency = (30-day late × 1) + (60-day late × 2) + (90-day late × 3)
Information Value:
IV ≈ 1.27
This became the strongest predictor of default risk.
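The weighted delinquency feature and its Information Value can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function names, the bin labels, and the 0.5 smoothing constant are assumptions, and WoE is written with the good/bad convention (some texts use the inverse). Because the notebook's binning is not shown here, the sketch is not expected to reproduce IV ≈ 1.27.

```python
import math
from collections import defaultdict


def total_delinquency(late_30, late_60, late_90):
    """Severity-weighted delinquency count (weights 1, 2, 3)."""
    return late_30 * 1 + late_60 * 2 + late_90 * 3


def information_value(bins, targets, smoothing=0.5):
    """Return (IV, WoE per bin) for a binned feature.

    bins    : bin label for each borrower
    targets : 1 = default, 0 = good borrower
    smoothing is an assumed constant that avoids log(0) in empty cells.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for label, target in zip(bins, targets):
        if target == 1:
            bad[label] += 1
        else:
            good[label] += 1

    labels = set(good) | set(bad)
    total_good = sum(good[b] + smoothing for b in labels)
    total_bad = sum(bad[b] + smoothing for b in labels)

    iv = 0.0
    woe = {}
    for b in labels:
        dist_good = (good[b] + smoothing) / total_good
        dist_bad = (bad[b] + smoothing) / total_bad
        woe[b] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[b]
    return iv, woe


# Toy example with two hypothetical bins of Total_Delinquency.
iv, woe = information_value(["0", "0", "0", "1+", "1+"], [0, 0, 0, 1, 1])
```

A positive WoE under this convention marks a bin dominated by good borrowers; a negative WoE marks a bin dominated by defaults.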
Additional engineered variables:
- Income per dependent
- Debt-to-Income ratio
- Credit burden
These features improved model predictive power significantly.
Algorithm used:
Logistic Regression
Reasons:
- Industry standard model for credit scorecards
- Highly interpretable
- Regulatory-friendly for financial institutions
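The interpretability point can be illustrated with a minimal sketch of how a fitted logistic regression turns borrower features into a PD. The coefficient values, feature names, and intercept below are hypothetical and are not taken from `pd_model_final.pkl`.

```python
import math


def predict_pd(features, coefficients, intercept):
    """Logistic regression PD: sigmoid of a linear score (the log-odds)."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
coefficients = {"total_delinquency": 0.6, "credit_utilization": 1.2, "age": -0.02}
intercept = -2.0

clean = predict_pd(
    {"total_delinquency": 0, "credit_utilization": 0.2, "age": 45},
    coefficients, intercept,
)
late = predict_pd(
    {"total_delinquency": 3, "credit_utilization": 0.2, "age": 45},
    coefficients, intercept,
)

# exp(coefficient) is the odds ratio for a one-unit increase in that feature.
odds_ratio_delinquency = math.exp(coefficients["total_delinquency"])
```

Each coefficient reads directly as a change in log-odds of default, which is what makes the model easy to audit.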
Handling class imbalance:
Dataset distribution:
Good borrowers → 93%
Default borrowers → 7%
Solution used:
SMOTE (Synthetic Minority Oversampling)
This balanced the training data to a 1:1 ratio.
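The idea behind SMOTE can be sketched without any dependencies: each synthetic default is placed on the line segment between a real default and one of its nearest minority-class neighbours. This is a simplified illustration, not the Imbalanced-learn implementation the project lists in its stack; the function name, `k`, and `seed` are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation.

    minority : list of equal-length numeric tuples (needs at least 2 points)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the base point (squared distance)
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Four hypothetical default borrowers described by two scaled features.
defaults = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(defaults, n_new=6)
```

Oversampling of this kind is normally applied to the training split only, so that validation metrics are measured on the original class distribution.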
| Metric | Result |
|---|---|
| AUC-ROC | 0.8546 |
| Gini Coefficient | 0.7092 |
| KS Statistic | 0.546 |
| Recall (defaults detected) | ~75% |
These results compare favorably with the benchmarks commonly cited for retail credit risk models. The Gini coefficient is derived from the AUC (Gini = 2 × AUC − 1).
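The three discrimination metrics in the table can be computed from labels and predicted scores as in the sketch below. It uses plain Python and quadratic-time loops for clarity, so it is meant for small samples; the function names are assumptions, not the notebook's.

```python
def auc_roc(y_true, scores):
    """Probability that a random default scores above a random good borrower.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(auc):
    """Gini coefficient as a rescaling of AUC."""
    return 2 * auc - 1


def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data: 1 = default, 0 = good borrower.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(y, p)
toy_gini = gini(toy_auc)
toy_ks = ks_statistic(y, p)
```

Applying `gini` to the reported AUC of 0.8546 gives 0.7092, which matches the table.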
Pipeline:
Data → Feature Engineering → Logistic Regression → Calibration → Scorecard
Steps:
- Raw borrower financial variables
- Feature engineering
- Logistic Regression PD model
- Prior probability correction
- Credit score conversion
- Expected Credit Loss estimation
Model predictions were explained using:
SHAP (SHapley Additive Explanations)
Top risk drivers:
- Total Delinquency
- Credit Utilization
- Age
- Debt Ratio
Explainability ensures transparency in credit decisions.
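Training on SMOTE-balanced data biases predicted probabilities upward, because the model learns from a 1:1 class ratio rather than the true default rate.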
Initial model prediction:
Average PD ≈ 31.9%
Actual dataset default rate:
6.68%
Solution applied:
Prior Probability Correction
After calibration:
Average PD ≈ 7.27%
This brings the average predicted PD close to the observed default rate of 6.68%.
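The exact correction formula is not given here. A common form of prior probability correction rescales the odds by the ratio of the true class prior to the training class prior, as sketched below. The function name and default arguments are assumptions: 0.5 is the 1:1 ratio after SMOTE and 0.0668 is the dataset default rate.

```python
def correct_prior(p_model, train_rate=0.5, true_rate=0.0668):
    """Map a PD learned on resampled data back to the original base rate.

    p_model    : PD predicted by the model trained on balanced data
    train_rate : share of defaults in the training data (assumed 0.5)
    true_rate  : share of defaults in the original population
    """
    pos = p_model * true_rate / train_rate
    neg = (1.0 - p_model) * (1.0 - true_rate) / (1.0 - train_rate)
    return pos / (pos + neg)


# A borrower the balanced model considers a coin flip lands near the
# population default rate after correction.
corrected = correct_prior(0.5)
```

The correction is monotonic, so it changes the level of the probabilities without changing the ranking of borrowers; AUC, Gini, and KS are unaffected.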
Predicted probabilities were converted into a credit score.
Score range:
300 – 900
Risk bands:
| Score | Risk Level |
|---|---|
| ≥ 720 | Low Risk |
| 600–719 | Medium Risk |
| < 600 | High Risk |
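A minimal sketch of the score conversion, assuming the conventional log-odds scaling with a "points to double the odds" (PDO) parameter. The `base_score`, `base_odds`, and `pdo` values are hypothetical, so the sketch will not reproduce the application's exact scores; only the 300 to 900 range and the risk bands come from the tables above.

```python
import math


def pd_to_score(pd, base_score=600, base_odds=1.0, pdo=50, lo=300, hi=900):
    """Convert a PD into a score using log-odds scaling.

    base_score, base_odds and pdo are assumed calibration parameters.
    """
    pd = min(max(pd, 1e-6), 1 - 1e-6)      # keep the logarithm finite
    odds_good = (1 - pd) / pd              # odds of not defaulting
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    score = offset + factor * math.log(odds_good)
    return int(round(min(max(score, lo), hi)))


def risk_band(score):
    """Risk bands from the table above."""
    if score >= 720:
        return "Low Risk"
    if score >= 600:
        return "Medium Risk"
    return "High Risk"


example_score = pd_to_score(0.0439)
example_band = risk_band(example_score)
```

With this scaling, every doubling of the good-to-bad odds adds `pdo` points to the score.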
Expected Credit Loss formula:
ECL = PD × LGD × EAD
Portfolio results:
- Total Exposure (EAD): $934M
- 12-Month ECL: $23.8M
- Lifetime ECL: $87.3M
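The ECL formula above can be applied loan by loan and summed over the portfolio. The sketch below uses three hypothetical loans; the PD, LGD, and EAD values are illustrative and unrelated to the portfolio figures reported here.

```python
def expected_credit_loss(pd, lgd, ead):
    """ECL = PD x LGD x EAD for a single loan."""
    return pd * lgd * ead


def portfolio_ecl(loans):
    """Sum of loan-level ECL over a list of dicts with pd, lgd and ead keys."""
    return sum(
        expected_credit_loss(loan["pd"], loan["lgd"], loan["ead"])
        for loan in loans
    )


# Hypothetical loans: PD as a fraction, LGD as a fraction, EAD in dollars.
loans = [
    {"pd": 0.02, "lgd": 0.45, "ead": 10_000},
    {"pd": 0.10, "lgd": 0.45, "ead": 5_000},
    {"pd": 0.60, "lgd": 0.45, "ead": 2_000},
]
total = portfolio_ecl(loans)
```

In this toy portfolio the single high-PD loan carries the smallest exposure but the largest expected loss, the same concentration effect described for Stage-3 loans.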
IFRS-9 staging:
| Stage | Risk Level |
|---|---|
| Stage 1 | Low Risk |
| Stage 2 | Medium Risk |
| Stage 3 | High Risk |
Although Stage-3 loans represent only 7.5% of the portfolio, they generate over 70% of expected losses.
The project includes a deployed Streamlit application.
Users input borrower information such as:
- Age
- Credit utilization
- Delinquencies
- Income
- Debt ratio
- Number of credit lines
The system outputs:
- Probability of Default
- Credit Score
- Risk Category
- Loan Decision
This demonstrates how machine learning models can power real-time financial decision systems.
- Python
- Pandas
- NumPy
- Scikit-learn
- Imbalanced-learn
- SHAP
- Matplotlib
- Seaborn
- Streamlit
Credit-Risk-Scorecard
- app.py
- requirements.txt
- pd_model_final.pkl
- Credit_Risk_Scorecard_ParasJain.ipynb
- fig1_eda.png
- fig2_woe.png
- fig3_model_performance.png
- fig4_shap.png
- fig5_calibration.png
- fig6_ecl.png
- README.md
Clone the repository:
git clone https://github.com/ParasJain03/Credit-Risk-Scorecard.git
Install dependencies:
pip install -r requirements.txt
Run the Streamlit app:
streamlit run app.py
The application will open in your browser.
- Feature engineering significantly improves predictive power
- Class imbalance strongly affects probability estimates
- Probability calibration is necessary after SMOTE
- Credit scorecards translate ML predictions into business decisions
- IFRS-9 enables forward-looking credit loss estimation
This project simplifies some real-world aspects:
- Macroeconomic variables were not included
- LGD and EAD values were simulated
- Dataset represents historical US lending behavior
Real bank models undergo additional regulatory validation, stress testing, and back-testing.
- Add macroeconomic stress scenarios
- Build Power BI credit risk monitoring dashboard
- Add macroeconomic PD adjustments
- Expand portfolio risk analytics
Paras Jain
B.Tech CSE (AI)
KIET Group of Institutions
Email: parasjainparas310@gmail.com





