Problem statement: To analyze health survey data and predict whether a person has heart disease.
This is a supervised learning, classification-based project.
Overview of the project
Input Features = 'BMI', 'Smoking', 'AlcoholDrinking', 'Stroke', 'PhysicalHealth', 'MentalHealth', 'DiffWalking', 'Sex', 'AgeCategory', 'Race', 'Diabetic', 'PhysicalActivity', 'GenHealth', 'SleepTime', 'Asthma', 'KidneyDisease', 'SkinCancer'
Output Feature = HeartDisease
Data Set Type = Imbalanced data set, balanced using oversampling
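The project does not state which oversampling method was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class rows until the classes are the same size), the idea looks like this. The function name random_oversample and the toy data are hypothetical and only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): 8 rows without heart disease, 2 rows with it.
rows = [[20.0 + i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes in the toy example contain 8 rows. The original rows are all kept; only copies of minority-class rows are added.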
Best Model = Random Forest
Train Accuracy Score : 0.9979
Test Accuracy Score : 0.9656
Classification Report :
              precision    recall  f1-score   support
           0       1.00      0.93      0.96     68614
           1       0.94      1.00      0.97     68614
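For readers unfamiliar with the report columns, the sketch below shows how precision, recall and f1-score are derived from true positives, false positives and false negatives. The counts passed to the helper are hypothetical, and the helper name prf is not from the project. The second part uses only the reported recall and support values: because both classes have the same support, accuracy is roughly the mean of the two recalls, which agrees with the reported test accuracy.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one class from raw counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts, for illustration only (not taken from the project).
precision, recall, f1 = prf(tp=90, fp=10, fn=30)
print(round(precision, 2), round(recall, 2), round(f1, 2))

# Consistency check using the reported (rounded) recall and support values.
support = {0: 68614, 1: 68614}
recall_reported = {0: 0.93, 1: 1.00}
correct = sum(recall_reported[c] * support[c] for c in support)
accuracy_estimate = correct / sum(support.values())
print(round(accuracy_estimate, 3))  # close to the reported 0.9656
```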
Challenges
Too many duplicate records
Too many outliers
Data imbalance
Feature selection
Model selection
Deployment
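The first two challenges can be sketched as follows. The project does not say how duplicates and outliers were handled, so this is only one possible approach: exact duplicate rows are dropped, and outliers are filtered with the common 1.5 x IQR rule, which is an assumption here. The rows are made-up (BMI, SleepTime) pairs.

```python
import statistics

# Made-up (BMI, SleepTime) rows, including two exact duplicates
# and one implausible sleep time of 24 hours.
rows = [
    (22.5, 7), (22.5, 7), (30.1, 6), (27.3, 8), (24.8, 24),
    (26.0, 7), (30.1, 6), (28.4, 8), (23.9, 5),
]

# 1) Drop exact duplicates while keeping the original order.
unique_rows = list(dict.fromkeys(rows))

# 2) Filter outliers in SleepTime using the 1.5 * IQR rule (assumed method).
sleep = [r[1] for r in unique_rows]
q1, _, q3 = statistics.quantiles(sleep, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
kept = [r for r in unique_rows if low <= r[1] <= high]

print(len(rows), len(unique_rows), len(kept))
```

In this toy example the two repeated rows are removed first, and the row with a sleep time of 24 hours is then dropped as an outlier.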
