Predicting Amazon Reviews

Goal

Build a classifier that predicts whether a review is positive or negative. The classifier is trained on Amazon reviews of the OverDrive app.

Data Collection

I scraped Amazon to collect reviews of OverDrive, a library app for audiobooks and ebooks. I wrote a script that iterates over 200 review pages and extracts each review's title, description, author name, and date. The dataset contains 2,388 reviews in total.
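
The page loop and field extraction described above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the script from this repository: the URL pattern, the `data-hook` attribute values, and the names `page_urls`, `ReviewParser`, and `parse_reviews` are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical URL pattern; the real review URL is not given in this README.
BASE_URL = "https://example.com/product-reviews/APP_ID?pageNumber={page}"


def page_urls(n_pages=200):
    """Build one URL per review page (200 pages, as in the description)."""
    return [BASE_URL.format(page=p) for p in range(1, n_pages + 1)]


class ReviewParser(HTMLParser):
    """Collects the text of elements tagged with an assumed data-hook attribute."""

    # Assumed markup: attribute value -> field name in the output record.
    FIELDS = {
        "review-title": "title",
        "review-body": "description",
        "review-author": "author",
        "review-date": "date",
    }

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None    # field whose text we are waiting for
        self._current = {}    # review record being assembled

    def handle_starttag(self, tag, attrs):
        hook = dict(attrs).get("data-hook")
        if hook in self.FIELDS:
            self._field = self.FIELDS[hook]

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = text
            self._field = None
            # A record is complete once all four fields have been seen.
            if len(self._current) == len(self.FIELDS):
                self.reviews.append(self._current)
                self._current = {}


def parse_reviews(html):
    """Return a list of review dicts found in one page of HTML."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.reviews
```

Each URL from `page_urls()` would be downloaded and passed to `parse_reviews`, and the resulting records appended to a single table such as `reviews.csv`.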

Exploratory Data Analysis (EDA)

The average review in the dataframe is roughly 183 characters long. Under the generous assumption that the average word takes 10 characters (which leaves room for spaces and punctuation), the average review is roughly 18 words long. There are 1,692 unique authors in the dataframe, so the reviews come from a diverse set of people. The average author has contributed about 1 review, while the top author has contributed 143. The average number of reviews per month is 163. The month with the most reviews is January and the month with the fewest is April.
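
As a rough illustration of how summary figures like these can be computed, here is a small sketch in plain Python. The rows are made-up toy data, not the scraped reviews, and the `summarize` helper and its field names are assumptions for the example. The original analysis used a dataframe.

```python
from collections import Counter
from datetime import date
from statistics import mean

# Toy rows standing in for the scraped reviews (made-up values, not real data).
reviews = [
    {"author": "ann", "date": date(2020, 1, 5), "description": "Great app for audiobooks"},
    {"author": "bob", "date": date(2020, 1, 9), "description": "Crashes on start"},
    {"author": "ann", "date": date(2020, 4, 2), "description": "Works well now"},
]


def summarize(rows):
    """Compute the kind of summary statistics quoted in the EDA paragraph."""
    lengths = [len(r["description"]) for r in rows]
    per_author = Counter(r["author"] for r in rows)
    per_month = Counter(r["date"].month for r in rows)  # month as 1..12
    return {
        "mean_chars": mean(lengths),
        "unique_authors": len(per_author),
        "mean_reviews_per_author": len(rows) / len(per_author),
        "top_author": per_author.most_common(1)[0],  # (author, review count)
        "busiest_month": per_month.most_common(1)[0][0],
        "quietest_month": min(per_month, key=per_month.get),
    }


summary = summarize(reviews)
```

The word-count estimate in the paragraph is then just the mean character length divided by the assumed 10 characters per word (183 / 10 ≈ 18).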

Data Preprocessing / Feature Engineering

Given clean data, I used spaCy to tokenize, lemmatize and filter the reviews. I then vectorized the reviews with term frequency-inverse document frequency (tf-idf) values, which give insight into the weight of each word in each document.
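
To make the tf-idf weighting concrete, the sketch below computes the textbook form, tf × log(N / df), by hand on a few toy documents. It assumes the tokens have already been lemmatized and filtered, and the `tfidf` function is a hypothetical helper rather than the vectorizer used in the notebooks. Libraries such as scikit-learn apply a smoothed, normalized variant, so their exact values would differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, already-lemmatized documents (made up for illustration).
docs = [["love", "app"], ["app", "crash"], ["love", "audiobook", "app"]]
weights = tfidf(docs)
```

A word that appears in every document, such as "app" here, gets weight 0, while a word unique to one review, such as "crash", gets the highest weight in that review.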

Visualization of Word Frequencies and Wordcloud

Sentiment Analysis

I ran sentiment analysis on the dataset to label the reviews. I then encoded positive sentiment as 1 and negative sentiment as 0. These labels are the target values for my classification models.
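
The encoding step can be sketched as below. The README does not name the sentiment tool, so this example assumes it returns a polarity score per review where values above 0 mean positive. The scores, the threshold, and the `encode_sentiment` helper are all hypothetical.

```python
def encode_sentiment(polarity, threshold=0.0):
    """Map a polarity score to a binary target: 1 = positive, 0 = negative.

    Assumption: scores above `threshold` count as positive; a score exactly
    at the threshold is treated as negative.
    """
    return 1 if polarity > threshold else 0


# Made-up polarity scores, one per review.
polarities = [0.8, -0.4, 0.0, 0.15]
targets = [encode_sentiment(p) for p in polarities]
```

How neutral reviews (score at or near 0) are handled is a design choice. Here they fall into the negative class, but they could also be dropped before training.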

Performance of ML Models

In total, I trained and tested 9 machine learning models commonly used for classification. Based on the testing metrics, I concluded that AdaBoost was the best classifier for this dataset.
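
Choosing a model from testing metrics can be sketched like this. The README does not say which metrics were used, so the example assumes F1 on the positive class. The labels, predictions, and model names are made up and are not results from this project.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up test labels and predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 1]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1],
    "model_b": [1, 1, 1, 1, 1, 1],  # always predicts positive
}

scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
```

Comparing on F1 rather than accuracy alone guards against a model that simply predicts the majority class, which matters if most reviews are positive.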


Cristinamulas/Amazon-Classifier
