BinaryClassification_EDA_with_Python

A detailed notebook for tackling a binary classification problem when nothing is known about the input columns.

Task

Binary classification: perform an exploratory analysis of the provided dataset and decide on feature selection and preprocessing before training a model that classifies each record as class ‘0’ or class ‘1’.

Given files

training_set.csv - To be used as training and validation set - 3910 records, 57 features, 1 output
test_set.csv (without Ground Truth) - 691 records, 57 features
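As a starting point, the training file could be loaded and split into features and output roughly as follows. This is a minimal sketch: the function name `load_features_and_target` is hypothetical, and it assumes the output is the last column of `training_set.csv`, which the description above does not confirm.

```python
import pandas as pd


def load_features_and_target(source, n_features=57):
    """Read a CSV and split it into a feature frame and an output series.

    Assumption: the output column is the LAST column of the file.
    `source` can be a file path or any file-like object pandas accepts.
    """
    df = pd.read_csv(source)
    if df.shape[1] != n_features + 1:
        raise ValueError(
            f"expected {n_features + 1} columns, found {df.shape[1]}"
        )
    features = df.iloc[:, :-1]  # every column except the last
    target = df.iloc[:, -1]     # last column, assumed to be the output
    return features, target
```

With the files above, `load_features_and_target("training_set.csv")` would be expected to return a frame with 57 columns and a series of 3910 labels; `test_set.csv` has no output column, so it would be read with `pd.read_csv` directly.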

Quick Description

Readme file - explaining any relevant thought process as well as the general approach for the task

Any approach to a classification problem targets an acceptable F1 score, or a precision/recall trade-off, according to the business problem. In our case the business problem is invisible, as the data contains neither categorical values nor column descriptions. This leaves us to depend solely on our own ability to understand the data and identify the useful features.

Understanding the data: the data contains 57 independent columns and 1 dependent variable (the output column). Going through the columns one by one is not feasible, so to understand the impact and pattern of all the columns in bulk, I calculated the following across all of them:
- Data types
- Description (min, max, std)
- Null/missing values
- Output column distribution (to understand whether the data is imbalanced or balanced)
- Correlation
- Multicollinearity using VIF
- Feature importance using Logistic Regression and Random Forest Classifier
- Dimensionality reduction (PCA)

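The bulk checks listed above can be sketched in a few helper functions. This is an illustration under stated assumptions, not the notebook's code: all function names are hypothetical, the demo data is synthetic, and VIF is read off the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) per column) rather than taken from a dedicated library.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def summarize_columns(df, target):
    """Per-column dtype, missing count, range, spread, correlation with target."""
    features = df.drop(columns=[target])
    return pd.DataFrame({
        "dtype": features.dtypes.astype(str),
        "missing": features.isna().sum(),
        "min": features.min(),
        "max": features.max(),
        "std": features.std(),
        "corr_with_target": features.corrwith(df[target]),
    })


def vif_scores(features):
    """VIF per column: diagonal of the inverse feature correlation matrix.

    Raises numpy.linalg.LinAlgError if columns are perfectly collinear.
    """
    corr = np.corrcoef(features.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=features.columns)


def importance_table(features, y):
    """Absolute logistic coefficients (on scaled data) next to forest importances."""
    scaled = StandardScaler().fit_transform(features)
    logit = LogisticRegression(max_iter=1000).fit(scaled, y)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, y)
    return pd.DataFrame({
        "logit_abs_coef": np.abs(logit.coef_[0]),
        "forest_importance": forest.feature_importances_,
    }, index=features.columns)


def pca_components_for(features, variance=0.95):
    """Smallest number of principal components reaching the variance share."""
    scaled = StandardScaler().fit_transform(features)
    cumulative = np.cumsum(PCA().fit(scaled).explained_variance_ratio_)
    return min(int(np.searchsorted(cumulative, variance)) + 1, len(cumulative))


# Synthetic stand-in for the real data: x3 nearly duplicates x1 on purpose.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
demo = pd.DataFrame({"x1": x1, "x2": x2, "x3": x1 + 0.1 * rng.normal(size=500)})
demo["output"] = (demo["x1"] + demo["x2"] > 0).astype(int)

demo_features = demo.drop(columns=["output"])
summary = summarize_columns(demo, "output")
class_balance = demo["output"].value_counts(normalize=True)
vif = vif_scores(demo_features)
importance = importance_table(demo_features, demo["output"])
n_components = pca_components_for(demo_features)
```

In the demo, the near-duplicate pair `x1`/`x3` shows up with a high VIF while `x2` stays close to 1, which is the kind of signal that helps when the columns carry no descriptions.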
Model performance analysis on validation set in terms of various risks
- Started with Naive Bayes on the raw data to understand the scope for improvement
- Then SVM, Logistic Regression, Random Forest Classifier, and finally an ANN (not required for this data, as accuracies over 90% are reached without introducing heavy models)
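The progression of models above can be compared on a held-out validation split along these lines. This is a sketch only: `compare_models` is a hypothetical helper, the 80/20 stratified split and the use of `GaussianNB` for Naive Bayes are assumptions, the data is synthetic, and the ANN is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, seed=0):
    """Fit each model on a training split and score it on a validation split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Baseline on raw features, no scaling.
        "naive_bayes": GaussianNB(),
        # SVM and logistic regression are scale-sensitive, so scale first.
        "svm": make_pipeline(StandardScaler(), SVC()),
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_val)
        scores[name] = {
            "accuracy": accuracy_score(y_val, predicted),
            "f1": f1_score(y_val, predicted),
        }
    return scores


# Synthetic stand-in for training_set.csv.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
demo_scores = compare_models(X_demo, y_demo)
```

Starting from the unscaled Naive Bayes baseline gives a floor against which the gain from each heavier model can be judged.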
A list of dependencies/libraries and their versions to run the code: Python and its libraries, as stated in the notebook.
Result: the Random Forest Classifier reached ~94% accuracy with a balanced confusion matrix, without any dimensionality reduction.
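To make the link between a confusion matrix and the reported metrics concrete, here is a small worked example. The labels are made up for illustration and are not the notebook's predictions; reading "balanced" as similar counts of false positives and false negatives is an interpretation, not something the result above states.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical validation labels and predictions (10 records).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 8 correct out of 10
precision = tp / (tp + fp)                  # 5 of 6 predicted positives
recall = tp / (tp + fn)                     # 5 of 6 actual positives
f1 = 2 * precision * recall / (precision + recall)

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here one false positive and one false negative give equal precision and recall, so accuracy alone is not hiding a skew toward either class.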

