Twitter Sentiment Analysis

Contributors:

Daniel Oselu
John Kanoru
Roseline Maina
Angela Cheruto
Benson Muriu
Irene Maina
Nelly Ng'eno
Janet Gachoki

Overview

This project uses Multinomial Naive Bayes to predict the sentiment of tweets towards Apple and Google products, using data from CrowdFlower that is available on data.world.

Project Summary

In a world where technology startups are common, consumer perception of a brand can provide valuable information about purchasing behavior and, in turn, the financial performance of the business behind the brand. To determine which brands to research further for potential investment, Longview Techventures wants a generalizable model to measure sentiment across various brands.

Longview Techventures is only interested in whether consumers feel positive about a brand and has hired us to help develop a predictive model that tracks recent tweets about tech products so it can make smart investment choices.

Data

This project primarily uses data gathered from CrowdFlower, which can be found on data.world or in the data folder of this project's GitHub repository. The dataset contains 9203 data points with columns for the full tweet text, the brand or product the tweet was about (if any), and whether the tweet expressed positive, negative, or no emotion towards that brand or product. The products, found by searching keywords, were all either Apple or Google products. The data was gathered in 2013 and all the tweets came from the #SXSW tag.

The dataset has significant class imbalance: most of the data (61%) is marked as no emotion, and only 6% expresses negative views towards the brand or product.

Data Preparation and Exploration

Duplicate rows were dropped. Rows with missing values were also dropped, except for missing values in the product column, which were replaced with Undefined.

Links, punctuation and stopwords were removed from the data prior to modeling. The '#' symbol was removed from Twitter hashtags, but the content of the tag was kept. The dataset was small enough for lemmatization to be a practical way to reduce its dimensionality.
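The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the function name clean_tweet, the tiny STOPWORDS set, the URL pattern and the lowercasing step are all assumptions made for the example, and lemmatization is left out because it needs an external library.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "at", "to", "for", "of", "and", "in", "on", "it"}

# Matches http(s) links and bare www. links (a simplifying assumption about link formats).
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Maps every ASCII punctuation character, including '#', to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Return a list of lowercase tokens with links, punctuation and stopwords removed."""
    text = URL_RE.sub(" ", text.lower())      # drop links first, before punctuation is touched
    text = text.translate(PUNCT_TO_SPACE)     # '#sxsw' becomes ' sxsw', so the tag text survives
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = clean_tweet("Loving the new #iPad at #SXSW! http://bit.ly/abc")
print(tokens)
```

Links are removed before punctuation so that the URL pattern still sees an intact "http://" prefix.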

To get a general idea of both the word frequencies and the product sentiments, several data visualizations were made.

Modelling

The sentiment column was used as the target variable and the tweet column as the predictor variable. Initial modeling efforts showed that our models suffer from the class imbalance. Traditional classification algorithms (Naive Bayes, Random Forest and Logistic Regression) were tested, as well as a more sophisticated LSTM network. Most of the traditional models overfit the training data; Multinomial Naive Bayes reduces the overfitting slightly, but its accuracy is limited to 74%.
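To show what a Multinomial Naive Bayes classifier computes, here is a small from-scratch sketch in plain Python. It is not the notebook's implementation, which presumably relies on a library; the function names, the Laplace smoothing value alpha=1.0 and the six toy tweets are hypothetical and exist only for this example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect class priors and per-class word counts from tokenized documents."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
    }


def predict(model, tokens):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        denom = model["totals"][label] + alpha * len(vocab)
        score = log_prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((model["word_counts"][label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (already tokenized); not taken from the project dataset.
docs = [
    ["love", "ipad"], ["great", "android", "app"],
    ["iphone", "battery", "terrible"], ["hate", "google", "map"],
    ["sxsw", "line", "apple", "store"], ["google", "party", "sxsw"],
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = train_multinomial_nb(docs, labels)
print(predict(model, ["love", "app"]))
```

Because the class prior enters every score, a heavily over-represented class such as "no emotion" pulls predictions towards itself, which is one way the class imbalance noted above can affect this model.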

Future Improvements

Testing additional vectorizers and models may improve on these results. Pre-trained word embeddings, such as Google's Word2Vec, Stanford's GloVe, and those shipped with SpaCy, may produce better results.

Scrape more data to help balance the four classes, especially negative and positive tweets.

Build a binary classifier focused only on the positive and negative tweets, then add the neutral tweets later on.

Obtain more tweet data about other tech companies, particularly small startups and PR companies.

For More Information

Please review our full analysis in our Jupyter Notebook or our presentation.

Repository Structure

├── data
├── images
├── README.md
├── Tweets_sentiment_analysis.ipynb
└── Twitter_Sentiment_Analysis_Notebook.pdf
