(Assignment by Upgrad and IIIT-B)
This case study focuses mainly on exploratory data analysis (EDA) to identify the major parameters that indicate whether a customer will default on a loan.
- Analysis Methodology
- Technologies Used
- Conclusions
- Contributors
- Data Understanding: Working through the data dictionary to learn what each column means and how it is used in the lending domain.
- Data Cleaning: Removing null-valued columns and unnecessary variables, then checking the null-value percentage and removing the affected rows.
- Univariate Analysis: Analyzing each column individually and plotting its distribution.
- Segmented Univariate Analysis: Analyzing the continuous columns with respect to a categorical column.
- Bivariate Analysis: Analyzing how two variables, such as term and loan status, behave with respect to loan amount.
- Recommendations: Reviewing all plots and recommending ways to reduce business loss by identifying the columns that best indicate loan defaulters.
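The cleaning and analysis steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the case study: the toy rows, the `clean` helper, the 50% null threshold, and the `mostly_empty` column are all assumptions made up for the example, and the column names `loan_amnt`, `term`, and `loan_status` are assumed to match the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real loan data (made up for illustration).
loans = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000, 20000, 15000, 7000],
    "term": ["36 months", "60 months", "36 months",
             "60 months", "36 months", "36 months"],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Fully Paid", None],
    "mostly_empty": [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0],
})


def clean(df, max_null_pct=50.0, required=("loan_status",)):
    """Drop columns above a null-percentage threshold, then rows missing required fields.

    The threshold and the list of required columns are assumptions, not values
    taken from the case study.
    """
    null_pct = df.isnull().mean() * 100          # percentage of nulls per column
    keep_cols = null_pct[null_pct <= max_null_pct].index
    out = df[keep_cols]
    subset = [c for c in required if c in out.columns]
    return out.dropna(subset=subset).reset_index(drop=True)


cleaned = clean(loans)

# Univariate: summary statistics of a single continuous column.
amount_summary = cleaned["loan_amnt"].describe()

# Segmented univariate: a continuous column split by one categorical column.
median_by_status = cleaned.groupby("loan_status")["loan_amnt"].median()

# Bivariate: loan amount against term and loan status together.
median_by_term_status = (
    cleaned.groupby(["term", "loan_status"])["loan_amnt"].median().unstack()
)

print(median_by_status)
print(median_by_term_status)
```

In practice the same grouped results would usually be passed to seaborn or matplotlib for plotting; the sketch stops at the tables so it stays runnable without a display.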
- pandas library for handling datasets
- numpy library for numerical array operations
- seaborn library for richer statistical plots
- matplotlib library for graph plots
- A surprising number of charge-offs belonged to the “Verified” category of “verification_status”, and the large number of loans with “Not_verified” status indicates a major need to revamp the verification process being followed.
- Public bankruptcy was a strong indicator of default.
- High DTI should be a key deciding factor for lending.
- Loans taken for purposes such as “education” and “small business” are more likely to default.
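Findings of this kind typically come from comparing the charge-off rate across the categories of a column. A minimal sketch of that comparison is shown below; the rows are made up purely for illustration and do not reproduce the case study's numbers, the `charge_off_rate` helper is hypothetical, and the `purpose` and `loan_status` column names and the “Charged Off” label are assumed to match the dataset.

```python
import pandas as pd

# Hypothetical rows, made up for illustration only (not results from the case study).
loans = pd.DataFrame({
    "purpose": ["education", "small_business", "car", "car",
                "small_business", "education", "car", "car"],
    "loan_status": ["Charged Off", "Charged Off", "Fully Paid", "Fully Paid",
                    "Fully Paid", "Charged Off", "Charged Off", "Fully Paid"],
})


def charge_off_rate(df, by, status_col="loan_status", bad_label="Charged Off"):
    """Share of charged-off loans within each category of `by`, highest first."""
    is_bad = df[status_col].eq(bad_label)        # True where the loan was charged off
    return is_bad.groupby(df[by]).mean().sort_values(ascending=False)


rates = charge_off_rate(loans, "purpose")
print(rates)
```

The same helper could be pointed at other categorical columns, such as `verification_status`, to compare segments on an equal footing rather than by raw counts.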
A detailed analysis and recommendations are included in the attached PDF.