
Mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence)


aish-1509/SC1015_Project

 
 


SC1015 Project

About

This is a mini-project for the course SC1015 (Introduction to Data Science and Artificial Intelligence). We use the Wine Quality Data Set for our analysis.

Below is the step-by-step process of the analysis:

  1. Data Extraction and Visualisation
  2. Data Cleaning and Resampling
  3. Gradient Boosting Classification

Problem Statement

  • How do we know if a wine is good?
  • Can we predict if a wine is good based on its attributes?

Conclusion

  • A good quality wine should have a quality rating >= 7
  • Individual attributes cannot be used to predict the quality of wine due to low correlation
  • Models created using random oversampled and SMOTETomek resampled train data performed the best when evaluated with test data
  • Both models predict whether a wine is good with an accuracy of 87% on the test data
  • Among the good quality wines, 67% are predicted to be good (Sensitivity)
  • 70% of the wines classified as good are actually good (Precision)
  • Any changes in these attributes (volatile acidity, total sulfur dioxide, residual sugar and chlorides) will have a greater impact on the quality of wine than other attributes
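
The threshold and the metrics quoted above can be made concrete with a short sketch. This is not code from the project notebook: the function names `label_good` and `classification_metrics` and the sample ratings are hypothetical, chosen only to show how a rating of 7 or more maps to the "good" class and how accuracy, sensitivity, precision and F1 score follow from the confusion-matrix counts.

```python
def label_good(quality, threshold=7):
    """Return 1 (good) if the rating meets the threshold, else 0."""
    return 1 if quality >= threshold else 0


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, precision and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # good, predicted good
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not good, predicted not good
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not good, predicted good
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # good, predicted not good

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of good wines found
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of "good" calls that are right
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Made-up ratings and predictions, for illustration only.
true_ratings = [5, 6, 7, 8, 5, 7, 6, 4]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0]

y_true = [label_good(q) for q in true_ratings]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because good wines are the minority class, accuracy alone can look high even when many good wines are missed, which is why sensitivity and precision are reported alongside it.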

What We Have Learnt

  • Using one-hot encoding to transform categorical data into binary indicator columns
  • Handling imbalanced data with resampling techniques (random oversampling, SMOTEENN resampling and SMOTETomek resampling)
  • Classification using machine learning (XGBoost)
  • Evaluating machine learning models (sensitivity, precision and F1 score)
  • Use of visualisation tools (matplotlib and seaborn)
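
As a rough illustration of the first two points, the sketch below implements one-hot encoding and random oversampling in plain Python. It is a minimal sketch under stated assumptions, not the notebook's implementation: the helper names `one_hot` and `random_oversample`, the `wine_type` column and the toy labels are all hypothetical. SMOTE-based methods such as SMOTEENN and SMOTETomek differ in that they synthesise new minority samples by interpolation rather than duplicating existing rows, and they are not shown here.

```python
import random
from collections import Counter


def one_hot(values):
    """Encode each categorical value as a list of 0/1 indicators, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical categorical column and imbalanced labels (1 = good wine).
wine_type = ["red", "white", "white", "red", "white", "white"]
labels = [1, 0, 0, 0, 0, 1]

categories, encoded = one_hot(wine_type)
balanced_rows, balanced_labels = random_oversample(encoded, labels)

print(categories)
print(Counter(balanced_labels))
```

Resampling of this kind is meant for the training split only, so that the test data keeps the original class balance when the model is evaluated.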

Contributors

  • @aish-1509 - Data Extraction, Data Visualisation
  • @JJChong77 - Data Cleaning, Data Resampling
  • @JonasChua - Gradient Boosting Classification, Data Visualisation

