taneemishere/Viral-Tweet-With-K-Nearest-Neighbor

Viral-Tweet-With-K-Nearest-Neighbor

A machine learning model that predicts whether or not a tweet will go viral, based on certain features associated with that tweet.

Requirements

  • pandas
  • NumPy
  • scikit-learn
  • seaborn

The Data

I scraped the data for this project from Twitter. To do the same, you first need a Twitter Developer Account; create an app there to get your secret keys. I scraped Twitter for 5000 tweets and collected the tweet text, the number of followers the tweeting user has, and the number of followees (called friends in the data).
Note that I scraped Twitter for the keyword Machine Learning. You can choose your own keyword, or scrape the tweets of a specific user instead.
Here is the description of the data: Data Description
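
As a rough illustration of how the scraped fields could become numeric features, here is a minimal sketch. The column names (text, followers_count, friends_count) and the tweet_length feature are assumptions made for this example; they are not confirmed by the project's data description.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not the project's real schema.
tweets = pd.DataFrame({
    "text": [
        "Machine Learning is fun",
        "New Machine Learning paper out now, worth a read",
    ],
    "followers_count": [120, 45000],
    "friends_count": [300, 150],
})

# Derive simple numeric features from the raw columns.
features = pd.DataFrame({
    "tweet_length": tweets["text"].str.len(),      # characters in the tweet text
    "followers_count": tweets["followers_count"],  # followers of the tweeting user
    "friends_count": tweets["friends_count"],      # accounts the user follows
})

print(features)
```

Any other numeric property of a tweet could be added as a further column in the same way.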

Viral or Not

A tweet is defined as viral if it has more than one thousand retweets; otherwise it is not viral. Viral is denoted by 1 and not viral by 0. The label is encoded this way because machine learning algorithms work with numbers.
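
The labelling rule can be sketched in a few lines. The column names retweet_count and is_viral are assumptions for illustration; only the threshold of one thousand retweets comes from the definition above.

```python
import numpy as np
import pandas as pd

VIRAL_THRESHOLD = 1000  # retweets; a tweet above this count is labelled viral

# Hypothetical retweet counts; "retweet_count" is an assumed column name.
tweets = pd.DataFrame({"retweet_count": [3, 1000, 1001, 25000]})

# 1 = viral (strictly more than the threshold), 0 = not viral.
tweets["is_viral"] = np.where(tweets["retweet_count"] > VIRAL_THRESHOLD, 1, 0)

print(tweets)
```

Note that a tweet with exactly 1000 retweets is labelled 0 here, because the rule says "greater than".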

At Last

Plot the classifier score over different values of k and look at the result. Here is my result: at best we get an accuracy of over 97%, which is awesome. As a next step, we should also compute the precision of the algorithm. Plot
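
The sweep over k and the precision check could look roughly like the sketch below. It is not the project's actual code: it uses synthetic data from make_classification as a stand-in for the tweet features, and the feature scaling, the range of k, and the plot_scores helper are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the tweet features and viral labels (assumption).
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# KNN is distance based, so put all features on a comparable scale.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Score the classifier for each value of k.
k_values = list(range(1, 51))
scores = []
for k in k_values:
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))  # mean accuracy on the test split

best_k = k_values[int(np.argmax(scores))]

# Precision: of the tweets predicted viral, the fraction that really are viral.
best_clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
precision = precision_score(y_test, best_clf.predict(X_test), zero_division=0)

print(f"best k = {best_k}, accuracy = {max(scores):.3f}, precision = {precision:.3f}")


def plot_scores(k_values, scores):
    """Hypothetical helper: plot accuracy against k (needs matplotlib)."""
    import matplotlib.pyplot as plt

    plt.plot(k_values, scores)
    plt.xlabel("k")
    plt.ylabel("Accuracy")
    plt.title("Classifier score over k")
    plt.show()
```

Calling plot_scores(k_values, scores) draws the curve. Picking the best k on the same split that is used for scoring gives an optimistic number, so a separate validation split would be a more careful choice.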

Scraping the Data

To scrape the data from Twitter I use this script, which saves the Twitter data as a CSV file. It creates a file whose name ends in -tweets.csv, prefixed with the search keyword. I used the keyword Machine Learning, which is why MachineLearning-tweets.csv appears in the read_csv call. The data I used is not really enough; 500 records is probably not sufficient, so if you want to train a model on this, use at least 2000 records. A larger dataset does not help in every situation, but in this case I expect the model would do better with more data than it does now. I am planning to scrape more tweets for a keyword, merge that dataset with this one, and try again.
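
Merging a second scrape into the existing dataset could be done along these lines. This is a sketch under assumptions: the tweet_id column used for de-duplication is hypothetical, and the in-memory frames stand in for files that would normally be loaded with pd.read_csv.

```python
import pandas as pd

# Stand-ins for two scraped CSV files; in practice each would come from
# pd.read_csv(...). The "tweet_id" column is an assumed unique identifier.
existing = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "text": ["first tweet", "second tweet", "third tweet"],
    "retweet_count": [10, 2500, 0],
})
new_batch = pd.DataFrame({
    "tweet_id": [3, 4],
    "text": ["third tweet", "fourth tweet"],
    "retweet_count": [0, 1200],
})

# Stack both scrapes, then drop tweets that were collected twice.
combined = (
    pd.concat([existing, new_batch], ignore_index=True)
    .drop_duplicates(subset="tweet_id")
    .reset_index(drop=True)
)

print(combined)
```

Dropping duplicates matters here because two scrapes for the same keyword can return the same tweet, and repeated rows would make the model look better than it is if they land in both the training and test splits.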
