Credit Score Prediction

This projet is a Midterm project of ML Zoomcamp by Alexey Grigorev

Overview

The extensive dataset of customer banking and credit-related details collected over the years. The goal is to develop a Classification ML model capable of categorizing individuals into predefined credit categories, thereby minimizing manual effort and improving efficiency. This project provides a machine learning-based app for predicting a person's credit score based on their financial and credit-related information. It uses a Flask web application to serve the trained model and Docker for containerization. The goal is to help a global finance company automate the classification of individuals into predefined credit score brackets, reducing manual effort and increasing efficiency.

Dataset contains 28 columns and 100K rows: Target feature is "credit_score" : "Good," "Poor," "Standard".

No	Feature	Description
1	`id`	Unique identifier for each record.
2	`customer_id`	Unique identifier for each customer.
3	`month`	Month of the transaction or record.
4	`name`	Customer’s name.
5	`age`	The customer’s age.
6	`ssn`	Customer’s social security number.
7	`occupation`	The customer’s occupation.
8	`annual_income`	The customer’s annual income.
9	`monthly_inhand_salary`	The customer’s monthly take-home salary.
10	`num_bank_accounts`	Total number of bank accounts owned by the customer.
11	`num_credit_card`	Total number of credit cards held by the customer.
12	`interest_rate`	The interest rate applied to loans or credits.
13	`num_of_loan`	Number of loans the customer has taken.
14	`type_of_loan`	Categories of loans obtained by the customer.
15	`delay_from_due_date`	The delay in payment relative to the due date.
16	`num_of_delayed_payment`	Total instances of late payments made by the customer.
17	`changed_credit_limit`	Adjustments made to the customer’s credit limit.
18	`num_credit_inquiries`	Number of inquiries made regarding the customer's credit.
19	`credit_mix`	The variety of credit types the customer uses (e.g., loans, credit cards).
20	`outstanding_debt`	Total amount of debt the customer currently owes.
21	`credit_utilization_ratio`	Proportion of credit used compared to the total credit limit.
22	`credit_history_age`	Duration of the customer’s credit history.
23	`payment_of_min_amount`	Indicates if the customer pays the minimum required amount each month.
24	`total_emi_per_month`	Total Equated Monthly Installment (EMI) paid by the customer.
25	`amount_invested_monthly`	Monthly investment amount made by the customer.
26	`payment_behaviour`	Customer’s payment habits and tendencies.
27	`monthly_balance`	The remaining balance in the customer’s account at the end of each month.
28	`credit_score`	The customer’s credit score (target variable: "Good," "Poor," "Standard").

Download the latest version

The dataset from Kaggle:

path = kagglehub.dataset_download("parisrohan/credit-score-classification")

Problem Statement

Managing credit risk is a critical task for financial institutions. Over the years, this company has gathered vast amounts of customer data, including banking and credit information. The task is to create an intelligent system capable of:

Classifying individuals into credit score classes.
Reducing the manual workload for the company's analysts.
Delivering accurate predictions for better risk management.

Files description

The dataset cleaning, analysis and model selection were conducted using Jupyter Notebook. You can find the corresponding notebook file in this repository:

notebook_cleaning and preprocessing.ipynb
notebook_EDA_model_selection.ipynb

Cleaned dataset and dataset for testing final model:

cleaned_df.csv
df_test.csv

Trained final model with best parameters and pickled files:

train.py
final_model.pkl
dict_vectorizer.pkl
label_encoder.pkl

Files for running the project locally:

predict.py
predict-test.py
requirements.txt

Running using Waitress as WSGI server

Please use predict.py and predict-test.py for testing model prediction.

Create and activate .venv:

  python -m venv venv
venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Run predict.py:

waitress-serve --listen=0.0.0.0:9696 predict:app

Run predict-test.py and use the command line:

python predict-test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Credit Score Prediction

Overview

Download the latest version

Problem Statement

Files description

The dataset cleaning, analysis and model selection were conducted using Jupyter Notebook. You can find the corresponding notebook file in this repository:

Cleaned dataset and dataset for testing final model:

Trained final model with best parameters and pickled files:

Files for running the project locally:

Running using Waitress as WSGI server

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
README.md		README.md
df_test.csv		df_test.csv
dict_vectorizer.pkl		dict_vectorizer.pkl
final_model.pkl		final_model.pkl
label_encoder.pkl		label_encoder.pkl
notebook_EDA_model_selection.ipynb		notebook_EDA_model_selection.ipynb
notebook_cleaning and preprocessing.ipynb		notebook_cleaning and preprocessing.ipynb
predict-test.py		predict-test.py
predict.py		predict.py
requirements.txt		requirements.txt
train.py		train.py

Folders and files

Latest commit

History

Repository files navigation

Credit Score Prediction

Overview

Download the latest version

Problem Statement

Files description

The dataset cleaning, analysis and model selection were conducted using Jupyter Notebook. You can find the corresponding notebook file in this repository:

Cleaned dataset and dataset for testing final model:

Trained final model with best parameters and pickled files:

Files for running the project locally:

Running using Waitress as WSGI server

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages