One-day competition at Ironhack's Data Analytics bootcamp. The goal was to build a predictive model for sales, which would then be verified on unseen data. Cleaned a raw dataset of sales from different stores, used feature engineering to select features, then trained two different models and compared their scores: XGBoost and Random Forest Regressor. Weighed the bias/variance trade-off to decide between them and chose the second, the Random Forest Regressor.
The teacher later verified the model on a new dataset, and it ended up winning the competition.
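The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the competition code: the data is synthetic, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so the example only needs the libraries listed in this README. The feature matrix, the `scores` dictionary and the R² metric are assumptions made for the sketch.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned sales data (hypothetical features and target).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# GradientBoostingRegressor is used here in place of XGBoost (assumption).
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    # A large train/test gap hints at high variance (overfitting);
    # low scores on both sets hint at high bias (underfitting).
    scores[name] = {
        "train_r2": train_r2,
        "test_r2": test_r2,
        "gap": train_r2 - test_r2,
    }
    print(f"{name}: train R2={train_r2:.3f}, test R2={test_r2:.3f}")
```

Comparing the train and test scores side by side is one simple way to weigh bias against variance before picking a model.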
- Data Cleaning and Manipulation: checking and dropping null values / rows / columns, dealing with duplicates, formatting and filtering data;
- Combining and Structuring Data;
- Data Aggregation and Filtering.
- Libraries imported:
- Pandas: import and export shark_attack.csv (the baseline for the project) and manipulate data;
- matplotlib: plotting histograms to verify hypotheses;
- Numpy;
- Seaborn;
- sklearn: metrics, ensemble and model_selection.
- Pandas Documentation
- matplotlib Documentation
- seaborn Documentation
- .csv file with the data.
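The cleaning, filtering and aggregation steps listed under "Data Cleaning and Manipulation" could look roughly like this in pandas. It is a sketch under assumptions: the small `raw` frame and its column names (`store_id`, `date`, `sales`, `notes`) are made up for illustration and are not the competition dataset.

```python
import pandas as pd

# Hypothetical raw data with nulls, a duplicate row and an empty column.
raw = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 2, 3, 3],
        "date": ["2020-01-01", "2020-01-01", "2020-01-01",
                 "2020-01-02", "2020-01-02", None],
        "sales": [100.0, 100.0, 250.0, None, 0.0, 80.0],
        "notes": [None] * 6,
    }
)

clean = (
    raw.dropna(axis=1, how="all")            # drop columns that are entirely null
    .dropna(subset=["date", "sales"])        # drop rows missing key values
    .drop_duplicates()                       # remove exact duplicate rows
    .assign(date=lambda d: pd.to_datetime(d["date"]))  # format dates
)

# Filter: keep only rows with positive sales.
clean = clean[clean["sales"] > 0]

# Aggregate: total sales per store.
totals = clean.groupby("store_id", as_index=False)["sales"].sum()
print(totals)
```

Chaining the steps keeps each cleaning decision visible on its own line, which makes it easier to revisit when the data changes.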