The goal of this project is to study a dataset of loan applications with truth labels indicating approval or rejection, and to build a model that predicts whether a new loan application will be approved or rejected. Because truth labels are present, this is a binary classification problem that can be solved with a supervised learning model. However, special care must be taken to mitigate the impact of class imbalance (i.e. the uneven distribution of approval vs. rejection cases) on classification performance. Particular attention is given to improving the model's ability to predict the minority class ("Yes") to support more accurate decision-making and benefit the financial firm.
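One common mitigation for class imbalance is to up-weight the minority class during training. As a minimal sketch (using scikit-learn and an illustrative label array, not the actual loan data), balanced class weights can be computed as follows:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 1 = approved ("Yes", minority), 0 = rejected (majority)
y = np.array([0] * 90 + [1] * 10)

# "balanced" weights are inversely proportional to class frequency,
# so the minority class receives the larger weight
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))
```

These weights can then be passed to most classifiers via a `class_weight` or `sample_weight` argument so that minority-class errors are penalized more heavily.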
Since loan approval prediction involves labeled data, supervised learning is a natural fit. Several supervised models were explored: a generalized linear model (GLM), the Least Absolute Shrinkage and Selection Operator (LASSO), random forest, a gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). XGBoost was found to be the best-performing model.
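The training pipeline for a boosted model on imbalanced data can be sketched as below. This is illustrative only: it uses synthetic data in place of the loan dataset and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (whose scikit-learn-compatible API offers `scale_pos_weight` for the same purpose).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: ~10% positives (approvals)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Up-weight minority-class samples, analogous to XGBoost's scale_pos_weight
ratio = (y_tr == 0).sum() / (y_tr == 1).sum()
w = np.where(y_tr == 1, ratio, 1.0)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_tr, y_tr, sample_weight=w)

# F1 on the minority class is the metric of interest under imbalance
f1 = f1_score(y_te, clf.predict(X_te))
print(round(f1, 3))
```

Evaluating with minority-class F1 (rather than accuracy) is important here, since a model that always predicts "rejected" would still score about 90% accuracy on data this skewed.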
Unsupervised learning is useful when there are no labels and the goal is to find hidden patterns or anomalies.
An unsupervised approach was also attempted. In particular, density-based spatial clustering of applications with noise (DBSCAN) was used to check whether it would recover two clusters. DBSCAN was chosen over k-means or hierarchical clustering because it does not require the number of clusters to be specified in advance. Initially the model appeared to find two clusters and several anomalies; however, evaluation metrics such as precision, recall, and F1 score all had perfect values of 1, which raised alarms. Further debugging revealed that the truth labels had accidentally been included as a feature in the DBSCAN clustering, which produced two perfect clusters plus some noise points/anomalies. These results are shown in the appendix of the report.
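A correct DBSCAN run fits on the feature matrix only, never the labels. The sketch below (scikit-learn, with synthetic blobs standing in for the loan features) shows the intended usage; including the label column in `X` is exactly the bug described above, since the label perfectly separates the two groups.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Two well-separated synthetic feature clusters (labels discarded with _)
X, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.5, random_state=0)

# Fit on features ONLY; scaling matters because eps is a distance threshold
X_scaled = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)

# DBSCAN marks noise points with label -1; exclude them from the cluster count
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

Note that DBSCAN still requires tuning `eps` and `min_samples`; it only removes the need to fix the number of clusters up front.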
See FinalReport_Team4.pdf for a detailed report covering the executive summary, exploratory data analysis (EDA), feature engineering, missing value imputation, feature transformations, models used, results, and comments on future improvements.