Machine Learning for Finance – Assignment II

This repository contains an empirical credit risk modelling study focused on predicting corporate defaults using multiple classification algorithms. The objective is to evaluate whether advanced machine learning methods outperform a baseline logistic regression model.

Objective

We model one-year corporate default probability using financial ratios:

  • WKTA (Working Capital / Total Assets)
  • RETA (Retained Earnings / Total Assets)
  • EBITTA (EBIT / Total Assets)
  • MV (Market Value proxy)
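The first three features are plain ratios of balance-sheet or income-statement items to total assets. The sketch below shows that arithmetic on NumPy arrays. The function name `compute_ratios` and its argument names are hypothetical. MV is left out because the README does not define which market-value proxy was used.

```python
import numpy as np


def compute_ratios(working_capital, retained_earnings, ebit, total_assets):
    """Hypothetical helper: scale three accounting items by total assets."""
    ta = np.asarray(total_assets, dtype=float)
    # Reject zero or negative total assets, which would make the ratios meaningless.
    if np.any(ta <= 0):
        raise ValueError("total assets must be positive")
    return {
        "WKTA": np.asarray(working_capital, dtype=float) / ta,
        "RETA": np.asarray(retained_earnings, dtype=float) / ta,
        "EBITTA": np.asarray(ebit, dtype=float) / ta,
    }
```

For example, a firm with working capital 10, retained earnings 20, EBIT 5 and total assets 100 gets WKTA = 0.10, RETA = 0.20 and EBITTA = 0.05.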

Models evaluated:

  1. Logistic Regression (baseline)
  2. Decision Tree
  3. Neural Network
  4. Gradient Boosting Classifier

Methodology

  • Missing value treatment
  • Outlier analysis using IQR method
  • Correlation analysis
  • 60/20/20 stratified train/validation/test split
  • Hyperparameter tuning on validation set
  • Evaluation using:
    • Accuracy
    • Precision
    • Recall
    • F1 Score
    • Fβ (β = 2)
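Two of the preprocessing steps above can be sketched with NumPy and scikit-learn. This is a minimal sketch under stated assumptions, not the repository's code. The README does not give the IQR multiplier or say whether flagged points were dropped, capped or kept, so `k = 1.5` (the conventional Tukey fence) is an assumption, as is the fixed `random_state`. The split first holds out 40% and then halves it, stratifying on the default label both times so that each part keeps roughly the same default rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def iqr_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed Tukey fence."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def split_60_20_20(X, y, seed=0):
    """Stratified 60/20/20 train/validation/test split."""
    # Hold out 40% for validation + test, preserving the default rate.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    # Split the held-out 40% in half: 20% validation, 20% test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```

With 1,000 firms and 100 defaults, this yields 600/200/200 rows containing 60/20/20 defaults.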

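F1 and Fβ were prioritized over accuracy because of class imbalance (~10% defaults): a classifier that never predicts default would already reach about 90% accuracy, and recall (catching firms that actually default) is economically more important than raw accuracy.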
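The metrics can be computed with scikit-learn, as in the sketch below. Fβ = (1 + β²)·P·R / (β²·P + R), so β = 2 weights recall more heavily than precision. The helper name `evaluate` is hypothetical, and `zero_division=0` is an assumed convention for the case where no defaults are predicted.

```python
from sklearn.metrics import (accuracy_score, f1_score, fbeta_score,
                             precision_score, recall_score)


def evaluate(y_true, y_pred, beta=2.0):
    """Hypothetical helper returning the five metrics listed above (1 = default)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        f"f{beta:g}": fbeta_score(y_true, y_pred, beta=beta, zero_division=0),  # key "f2"
    }
```

On a sample with 10% defaults, predicting "no default" for every firm gives accuracy 0.9 but recall and F2 of 0, which is the accuracy trap described above.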

Model Design Highlights

  • Decision Tree: tuned max_depth
  • Neural Network:
    • ReLU activation
    • Dropout = 0.2
    • 50 epochs
    • Batch size = 10
    • Hyperparameter search over hidden layers and nodes
  • Gradient Boosting:
    • Learning rate = 0.8
    • Max depth = 2
    • Tuned n_estimators
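The validation-set tuning for the decision tree and gradient boosting can be sketched as a loop over candidate values. The learning rate (0.8) and depth (2) come from the list above. The candidate grids, the choice of validation F2 as the selection criterion and the helper names (`tune_on_validation`, `tree_factory`, `gb_factory`) are assumptions, since the README does not state them. The neural network is not covered here because the README does not name the framework used for its dropout layers.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import fbeta_score
from sklearn.tree import DecisionTreeClassifier


def tune_on_validation(make_model, grid, X_train, y_train, X_val, y_val, beta=2.0):
    """Fit one model per candidate value; keep the best by validation F-beta (assumed criterion)."""
    best_value, best_score, best_model = None, -np.inf, None
    for value in grid:
        model = make_model(value).fit(X_train, y_train)
        score = fbeta_score(y_val, model.predict(X_val), beta=beta, zero_division=0)
        if score > best_score:
            best_value, best_score, best_model = value, score, model
    return best_value, best_score, best_model


def tree_factory(depth):
    # Only max_depth is tuned for the decision tree.
    return DecisionTreeClassifier(max_depth=depth, random_state=0)


def gb_factory(n_estimators):
    # Learning rate and depth fixed at the listed values; only n_estimators varies.
    return GradientBoostingClassifier(
        learning_rate=0.8, max_depth=2, n_estimators=n_estimators, random_state=0)
```

Selecting on the validation set and reporting on the untouched test set keeps the final scores free of tuning bias.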

Key Findings

  • Logistic Regression achieved the strongest overall performance.
  • More complex models did not outperform the linear baseline, likely due to:
    • Limited dataset size
    • Relatively linear feature relationships
    • Class imbalance constraints
  • Decision trees trained on a reduced feature set needed deeper structures and still underperformed the logistic regression.