This is the repo for our Computer Sciences 838 course. Detailed information about the project and its progress by stage can be found here.
The focus of our project is to gain insights into movies and movie trends. We extract data from two movie databases. The first comes from Kaggle here. The second data source can be found here.
Our main research question is how well we can predict the imdb_score from our movies_metadata.csv dataset. We will use the variables in this dataset in conjunction with the film.csv dataset. We will also attempt to improve our results by adding some basic sentiment analysis to our final model.
Code to perform our data cleaning and merging can be found in the ETL folder, where the readme details our process.
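As a rough illustration of the kind of cleaning and merging step described above, here is a minimal sketch that joins two small CSV tables on a normalized title and year. It is not the code in the ETL folder. The sample rows are made up, and every column name except imdb_score (movie_title, title_year, Title, Year, Subject) is an assumption chosen for the example, as is the choice of join key.

```python
import csv
import io

# Tiny in-memory stand-ins for movies_metadata.csv and film.csv.
# The rows are invented, and all column names other than imdb_score
# are assumptions made for this sketch.
MOVIES_METADATA = """movie_title,title_year,imdb_score
Example Film ,2001,7.1
Another Movie,1999,6.4
"""

FILM = """Title,Year,Subject
example film,2001,Drama
Unmatched Title,1985,Comedy
"""


def normalize_title(title):
    """Lowercase and collapse whitespace so titles compare reliably."""
    return " ".join(title.lower().split())


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_title_year(metadata_rows, film_rows):
    """Inner-join the two row lists on (normalized title, year)."""
    film_by_key = {
        (normalize_title(row["Title"]), row["Year"].strip()): row
        for row in film_rows
    }
    merged = []
    for row in metadata_rows:
        key = (normalize_title(row["movie_title"]), row["title_year"].strip())
        match = film_by_key.get(key)
        if match is not None:
            merged.append({
                "title": key[0],
                "year": int(key[1]),
                "imdb_score": float(row["imdb_score"]),
                "subject": match["Subject"],
            })
    return merged


merged = merge_on_title_year(read_rows(MOVIES_METADATA), read_rows(FILM))
print(merged)
```

Only rows present in both tables survive the join, so the merged table can be much smaller than either input. Normalizing titles before matching (case, stray whitespace) is one simple way to reduce missed matches.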
Detailed information on the project pages can be found on the course website here.
This repo is organized into folders, one for each project stage. Detailed information can again be found on the website.