You can run the code with the following file: run_performance_comparison.py

Clone the repository:
git clone https://github.com/auringonnousu/performance_comparison_ML_models.git

Navigate to the cloned directory:
cd performance_comparison_ML_models

Run the Python script:
python run_performance_comparison.py

Or click on this Binder badge:
This project compares classification performance and run-time for the Decision Tree, Random Forest, and Gradient Boosting classifiers.
RandomOverSampler is used to balance the training set.
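As a minimal sketch, the oversampling step might look like the following, using imbalanced-learn's RandomOverSampler; the synthetic 90/10 data and variable names are illustrative, not taken from the script:

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

# Synthetic imbalanced data stands in for the project's training split.
X_train, y_train = make_classification(
    n_samples=1000, weights=[0.9, 0.1], random_state=42
)

# Oversample the minority class so both classes are equally represented.
ros = RandomOverSampler(random_state=42)
X_balanced, y_balanced = ros.fit_resample(X_train, y_train)
print(Counter(y_balanced))  # both classes now have the same count
```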
GridSearchCV is used to find the best parameters for each model, with 5-fold cross-validation.
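For illustration, a grid search with 5-fold cross-validation could be set up as below; the Random Forest, the parameter grid, and the scoring metric are placeholder assumptions, and the actual grids live in run_performance_comparison.py:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=42)

# Illustrative grid; the grids used in the project may differ.
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,          # 5-fold cross-validation
    scoring="f1",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```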
The performance is evaluated on the test set.
The built-in feature importance is used to find the most important features for each model.
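A sketch of reading the built-in importances from a fitted tree-based model, assuming synthetic data and an illustrative feature count:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
model = GradientBoostingClassifier(random_state=42).fit(X, y)

# Rank feature indices by the model's impurity-based importances.
ranking = np.argsort(model.feature_importances_)[::-1]
for idx in ranking:
    print(f"feature {idx}: {model.feature_importances_[idx]:.3f}")
```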
The ROC AUC score is computed for each model.
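For example, the ROC AUC of a fitted classifier can be computed from its predicted probabilities on a held-out test set; the data and model below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# ROC AUC is computed from class probabilities, not hard predictions.
proba = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, proba))
```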
The models are then retrained with only the most important features and the best parameters.
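One possible way to retrain on a reduced feature set, assuming the top features are selected by built-in importance (the choice of five features and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit on all features first to obtain the importances.
full_model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Keep only the top-5 features by built-in importance.
top_k = np.argsort(full_model.feature_importances_)[::-1][:5]

# Retrain on the reduced feature set and evaluate on the test split.
reduced_model = RandomForestClassifier(random_state=42).fit(X_train[:, top_k], y_train)
print("accuracy on top-5 features:", reduced_model.score(X_test[:, top_k], y_test))
```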
Steps:
- Encoding categorical features with OneHotEncoder()
- Applying RandomOverSampler() to balance the classes
- Running a pipeline per model with cross-validation
- Performing GridSearchCV() within the pipeline for each model (see the sketch after this list)
- Training each model
- Performing cross-validation on each model
- Writing the results to a DataFrame
- Visualizing the metrics for the current models
- Training each model with the best parameters and the most important features
- Evaluating each model
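The sketch below shows how these steps could fit together: an imbalanced-learn Pipeline combining OneHotEncoder and RandomOverSampler with a classifier, wrapped in GridSearchCV so that encoding and oversampling are applied only within the training folds. The data, column names, classifier, and parameter grid are invented for illustration and do not reflect the project's dataset:

```python
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny synthetic frame with one categorical and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"] * 25,
    "value": range(100),
    "target": [0] * 80 + [1] * 20,
})
X, y = df[["color", "value"]], df["target"]

# One-hot encode the categorical column, pass the numeric column through.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

# The sampler sits inside the pipeline, so it only touches training folds.
pipe = Pipeline([
    ("encode", preprocess),
    ("oversample", RandomOverSampler(random_state=42)),
    ("model", DecisionTreeClassifier(random_state=42)),
])

search = GridSearchCV(pipe, {"model__max_depth": [3, 5, None]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```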