In this project, we show through a hands-on case study why over-sampling techniques like SMOTE should be used with caution.
Key takeaways:
- SMOTE can generate data that may not represent the real-world data.
- SMOTE can generate data that are closer to the majority class than the minority class.
A detailed description of the project is published in the article "SMOTE-ing is injurious to (data) health".
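
The second takeaway can be illustrated with a small toy sketch. This is not the project's code and not the `imbalanced-learn` implementation; it only mimics the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbors. The helper names (`nearest_neighbors`, `interpolate`, `smote_like`, `closer_to_majority`) and the five data points are hypothetical, chosen so that a majority cluster sits between two minority points.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point`, excluding `point` itself."""
    others = [c for c in candidates if c != point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def interpolate(point, neighbor, gap):
    """SMOTE-style synthetic sample on the segment from `point` to `neighbor`."""
    return tuple(p + gap * (n - p) for p, n in zip(point, neighbor))


def smote_like(minority, k, n_samples, rng):
    """Generate synthetic minority points by nearest-neighbor interpolation."""
    synthetic = []
    for _ in range(n_samples):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        synthetic.append(interpolate(base, neighbor, rng.random()))
    return synthetic


def closer_to_majority(point, minority, majority):
    """True if the nearest real majority sample is closer than the nearest real minority sample."""
    d_min = min(math.dist(point, m) for m in minority)
    d_maj = min(math.dist(point, m) for m in majority)
    return d_maj < d_min


# Contrived toy data (assumption): two minority points with a majority cluster between them.
minority = [(0.0, 0.0), (10.0, 0.0)]
majority = [(5.0, 0.0), (5.0, 1.0), (5.0, -1.0)]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = smote_like(minority, k=1, n_samples=100, rng=rng)
n_bad = sum(closer_to_majority(s, minority, majority) for s in synthetic)

print(f"{n_bad} of {len(synthetic)} synthetic minority points are closer to the majority class")
```

Because every synthetic point lies on the line segment between the two minority points, any point that falls in the middle of that segment ends up nearer to the majority cluster than to either real minority sample. Real datasets are rarely this extreme, but the same mechanism applies whenever minority samples are sparse or split across several regions.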