Machine Learning for Finance – Assignment II

This repository contains an empirical credit risk modelling study focused on predicting corporate defaults using multiple classification algorithms. The objective is to evaluate whether advanced machine learning methods outperform a baseline logistic regression model.

Objective

We model the one-year probability of corporate default using four financial predictors:

WKTA (Working Capital / Total Assets)
RETA (Retained Earnings / Total Assets)
EBITTA (EBIT / Total Assets)
MV (Market Value proxy)

Models evaluated:

Logistic Regression (baseline)
Decision Tree
Neural Network
Gradient Boosting Classifier

Methodology

Missing value treatment
Outlier analysis using IQR method
Correlation analysis
60/20/20 stratified train/validation/test split
Hyperparameter tuning on validation set
Evaluation using:
- Accuracy
- Precision
- Recall
- F1 Score
- Fβ (β = 2)
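
The 60/20/20 stratified split listed above can be sketched as follows. This is a minimal, standard-library illustration and not necessarily the code used in this repository: the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example. Each class is shuffled and cut separately, so the default rate is the same in every partition.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)

    # Group row indices by class label so each class is split on its own.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Toy labels with a 10% default rate (1 = default); illustrative data only.
labels = [1] * 100 + [0] * 900
train_idx, val_idx, test_idx = stratified_split(labels)

for name, part in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    rate = sum(labels[i] for i in part) / len(part)
    print(f"{name}: n={len(part)}, default rate={rate:.2f}")
```

With scikit-learn, the same effect is usually obtained by calling `train_test_split` twice with the `stratify` argument.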

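F1 and Fβ (β = 2) were prioritized over accuracy because the classes are imbalanced (~10% defaults): a model that never predicts default would still reach roughly 90% accuracy while identifying no defaulting firms. Setting β = 2 weights recall more heavily than precision, since recall is economically more important here than raw accuracy.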
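
To make the metric choice concrete, here is a small standard-library sketch of Fβ computed from confusion-matrix counts. The two toy classifiers and their predictions are invented for illustration and are not results from this study; the helper names `f_beta` and `accuracy` are likewise assumptions.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta for the positive class (1 = default); 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    b2 = beta ** 2
    # F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 10 defaults among 100 firms (illustrative only).
y_true = [1] * 10 + [0] * 90

# Classifier A never predicts default.
pred_a = [0] * 100
# Classifier B catches 8 of the 10 defaults but raises 12 false alarms.
pred_b = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, accuracy(y_true, pred), f_beta(y_true, pred, 1), f_beta(y_true, pred, 2))
```

On this toy data, classifier A has the higher accuracy (0.90 against 0.86) yet an F2 of 0, while classifier B reaches an F2 of about 0.67. scikit-learn provides the same metric as `fbeta_score`.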

Model Design Highlights

Decision Tree: tuned max_depth
Neural Network:
- ReLU activation
- Dropout = 0.2
- 50 epochs
- Batch size = 10
- Hyperparameter search over hidden layers and nodes
Gradient Boosting:
- Learning rate = 0.8
- Max depth = 2
- Tuned n_estimators
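
The tuned settings above (max_depth, n_estimators, hidden layers and nodes) were selected on the validation set. The loop below is a hedged sketch of that selection pattern, not the repository's actual code: `tune_on_validation` and `fit_knn` are hypothetical names, and the one-feature nearest-neighbour rule is only a toy stand-in so the example runs without external libraries. In practice `fit` would train a decision tree, neural network, or gradient boosting model for each candidate value.

```python
def f2_score(y_true, y_pred):
    """F-beta with beta = 2 for the positive class (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def tune_on_validation(candidates, fit, train, val):
    """Fit one model per candidate and keep the best by validation F2."""
    X_train, y_train = train
    X_val, y_val = val
    best_value, best_score = None, -1.0
    for value in candidates:
        predict = fit(value, X_train, y_train)  # hypothetical: returns a predict function
        score = f2_score(y_val, [predict(x) for x in X_val])
        if score > best_score:  # ties keep the earlier candidate
            best_value, best_score = value, score
    return best_value, best_score


def fit_knn(k, X_train, y_train):
    """Toy stand-in model: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(X_train, y_train), key=lambda pair: abs(pair[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0
    return predict


# Toy one-feature data (illustrative only): lower values default more often.
X_train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y_train = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
X_val = [0.12, 0.22, 0.38, 0.66, 0.88]
y_val = [1, 1, 0, 0, 0]

best_k, best_score = tune_on_validation([1, 3, 5], fit_knn, (X_train, y_train), (X_val, y_val))
print(best_k, best_score)
```

The test set plays no part in this loop; it is held back so that the final comparison between models is not biased by the tuning step.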

Key Findings

Logistic Regression achieved the strongest overall performance.
More complex models did not outperform the linear baseline, likely due to:
- Limited dataset size
- Relatively linear feature relationships
- Class imbalance constraints
Decision trees trained on a reduced feature set required greater depth and still underperformed the logistic regression baseline.
