This is the code for the Random Forest approach to the problem. There are two code locations: decades.ipynb, which is a combination of testing and tuning, and final_code.py, which is the final submission for the project. I included both to show the progress steps. Also included is the output of the final run of the code, using the best parameters found after a long tuning phase: 'model_accuracies.txt' contains the training and testing accuracies along with a classification report, and 'confusion_matrix.png' contains the final confusion matrix.
decades.ipynb also includes a comparison phase where I looked at Random Forest Classification vs Random Forest Regression. For Random Forest Regression I worked with the original dataset, using the years as provided; for Random Forest Classification I converted the years to decades.
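As a rough illustration of the year-to-decade conversion used for the classification target, here is a minimal sketch. The helper name `year_to_decade` and the sample years are hypothetical, and the notebook may label decades differently; this only shows the floor-to-decade idea.

```python
from collections import Counter


def year_to_decade(year: int) -> int:
    """Map a release year to the first year of its decade, e.g. 1987 -> 1980."""
    return (year // 10) * 10


# Hypothetical sample of release years (not taken from the dataset).
years = [1987, 1992, 2001, 2009, 1999]

# Regression would use `years` directly as the target;
# classification would use the coarser decade labels instead.
decades = [year_to_decade(y) for y in years]
decade_counts = Counter(decades)

print(decades)        # [1980, 1990, 2000, 2000, 1990]
print(decade_counts)  # class sizes per decade label
```

Grouping years this way turns a target with many distinct values into a small set of classes, which is what makes a classification report and confusion matrix meaningful for this task.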
To run the code you will need to follow a few simple steps.
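- Create the dataset folder:

      mkdir dataset

- Run this command in the dataset folder to download the data. The downloaded file will need to be unzipped:

      wget https://archive.ics.uci.edu/static/public/203/yearpredictionmsd.zip

- Unzip it with your preferred software, or run this command in the dataset folder:

      tar -xf yearpredictionmsd.zip

  This works where tar is bsdtar (for example macOS and recent Windows). GNU tar, the default on most Linux systems, cannot read zip archives, so use `unzip yearpredictionmsd.zip` there instead.

- Install the required packages, either in a virtual environment or in your main environment:

      pip install -r requirements.txt

- Run the code:

      python final_code.py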
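
If no command-line unzip tool is available, the unzip step above can also be done from Python with the standard-library `zipfile` module. This is only a sketch of an alternative, not part of the project code, and the helper name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of a zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Create the target folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, calling `extract_archive("dataset/yearpredictionmsd.zip", "dataset")` from the project root would unpack the downloaded file into the dataset folder.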