Miguel Chacon
Nick Deitmers
Anna Fonte
Jorge Gonzalez
A cookie distributor called Flea Cookies hired us, Team Pink, to predict how its customers perceive the quality of the products in its portfolio.
Flea Cookies is facing financial problems caused by inaccurate sales forecasts: it has overstocked some categories by 40% and understocked others by 25%. The company purchased a cookie dataset and asked the team to develop a Machine Learning model to predict the cookies' perceived quality. It provided us with a training dataset and the predictor variables for a testing dataset, but withheld the testing dataset's target values, which it will use to evaluate our model's root mean square error (RMSE).
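Because the client scores the model by root mean square error, it helps to compute the same metric locally on data where Quality is known. The sketch below assumes that true and predicted Quality values are plain lists of numbers; the function name `rmse` is our own illustration, not part of any tooling provided by the client.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only, not taken from the Flea Cookies data.
print(rmse([6, 5, 7], [5.5, 5.0, 8.0]))
```

A useful reference point is a constant model that always predicts the mean training Quality: its RMSE equals the population standard deviation of Quality, so any useful model should score below that.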
We were given a 16-column dataset of cookie characteristics with around 5,000 observations. The target variable was the cookies' Quality.
The project consisted of the following stages:
- Understanding the Dataset
- Wrangling Data
- Designing Data Pre-Processing Pipelines
- Assembling and Optimizing ML Models
- Pitching our Model to a Jury
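Pre-processing steps like those listed above are usually chained so that each transformation is fitted on the training data only and then reapplied unchanged to the test predictors. The sketch below illustrates that idea in plain Python with two hypothetical steps, median imputation of missing values and standardization; the class names are our own, and this is not the team's actual pipeline, which lives in the Machine Learning Pipeline folder.

```python
import statistics
from typing import List, Optional

Row = List[Optional[float]]


class MedianImputer:
    """Replaces missing values (None) with the per-column median learned in fit."""

    def fit(self, rows: List[Row]) -> "MedianImputer":
        self.medians = []
        for j in range(len(rows[0])):
            observed = [r[j] for r in rows if r[j] is not None]
            self.medians.append(statistics.median(observed))
        return self

    def transform(self, rows: List[Row]) -> List[List[float]]:
        return [[m if v is None else v for v, m in zip(r, self.medians)] for r in rows]


class StandardScaler:
    """Centers each column on its training mean and divides by its training std."""

    def fit(self, rows: List[List[float]]) -> "StandardScaler":
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        # Population std; fall back to 1.0 so constant columns do not divide by zero.
        self.stds = [statistics.pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows: List[List[float]]) -> List[List[float]]:
        return [[(v - m) / s for v, m, s in zip(r, self.means, self.stds)] for r in rows]


class Pipeline:
    """Fits each step on the previous step's output, then replays the steps in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return rows

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Toy rows with two predictors; None marks a missing value.
train = [[1.0, None], [3.0, 4.0], [None, 8.0]]
test = [[None, 6.0]]
pipe = Pipeline([MedianImputer(), StandardScaler()])
train_x = pipe.fit_transform(train)
test_x = pipe.transform(test)  # reuses training medians, means and stds only
```

Learning every statistic from the training rows and reusing it on the test rows keeps information from the test set out of pre-processing, which matters when the client evaluates the model on data we never see.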
To develop our model, we first planned our workflow as a team. We then divided the work, collaborated on the code through GitHub, and met regularly to make decisions on key elements of the pipeline. Once the pre-processing pipeline was ready, the remaining tasks and model training were divided among the team members.
Folder structure:
- Presentation: A PowerPoint presentation used to pitch our model to the client.
- Machine Learning Pipeline: The team's complete analysis, pre-processing, and modeling of the data.
- Other Model Results: Results from models that were tested but discarded.
- Data: The complete training dataset and the predictor variables for the testing dataset.
- README: This overview.