A machine learning model that predicts whether or not a tweet will go viral, based on features associated with that tweet.
- Pandas
- NumPy
- scikit-learn
- Seaborn
I scraped the data for this project from Twitter. To do the same, you first need a
Twitter Developer Account. Then create an app there to get your secret keys.
I scraped Twitter for 5000 tweets and collected the tweet text, the number of
followers the user has, and the number of followees (referred to here as friends).
Note that I scraped Twitter for the keyword Machine Learning. You can use your own
keyword, or scrape the tweets of a specific user instead.
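
As a rough sketch of the data described above, the snippet below flattens tweet objects into CSV rows. It assumes the tweets arrive as dictionaries shaped like Twitter API v1.1 status objects (`text`, `retweet_count`, `user.followers_count`, `user.friends_count`). The `tweets_to_csv` helper and the sample tweet are made up for illustration, and this is not the scraping script used in this project.

```python
import csv
import io

# Columns kept for each tweet; names follow the Twitter API v1.1 status object.
FIELDS = ["text", "retweet_count", "followers_count", "friends_count"]


def tweets_to_csv(tweets, out):
    """Write one CSV row per tweet dict to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        user = tweet["user"]
        writer.writerow({
            "text": tweet["text"],
            "retweet_count": tweet["retweet_count"],
            "followers_count": user["followers_count"],
            "friends_count": user["friends_count"],  # "friends" are followees
        })


# Made-up sample standing in for a real API response.
sample = [{
    "text": "Machine Learning is fun",
    "retweet_count": 1200,
    "user": {"followers_count": 5400, "friends_count": 310},
}]

buffer = io.StringIO()
tweets_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file instead only requires passing an open file handle (opened with `newline=""`) in place of the `StringIO` buffer.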
Here is the description of the data:
Define what a viral tweet is: here, a tweet with more than a thousand retweets is viral, otherwise it is not. Viral is denoted by 1 and not viral by 0. We do this because machine learning algorithms work with numeric labels.
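
The labeling rule can be sketched in a few lines of pandas. This is a minimal example, assuming the retweet counts live in a column named `retweet_count` (the column name, the `is_viral` label name and the sample values are assumptions, not taken from the project data).

```python
import numpy as np
import pandas as pd

# Made-up retweet counts standing in for the scraped data.
tweets = pd.DataFrame({"retweet_count": [12, 1000, 1001, 25000]})

# Threshold from the text: more than a thousand retweets counts as viral.
VIRAL_THRESHOLD = 1000

# 1 = viral, 0 = not viral.
tweets["is_viral"] = np.where(tweets["retweet_count"] > VIRAL_THRESHOLD, 1, 0)

print(tweets["is_viral"].tolist())  # [0, 0, 1, 1]
```

Note that a tweet with exactly 1000 retweets is labeled 0, because the rule is "greater than" a thousand.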
Plot the classifier score over different values of k and look at the result. In my run, the best accuracy was above 97%, which is a strong result. A next step is to also measure the precision of the classifier, that is, how many of the tweets predicted as viral really are viral.
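
A sweep over k might look like the sketch below. It assumes the classifier is scikit-learn's `KNeighborsClassifier` (suggested by the parameter k, but an assumption here) and uses synthetic data from `make_classification` in place of the tweet features, so the numbers it prints say nothing about the real dataset. It also records precision for each k, as suggested above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the tweet features and the 0/1 viral labels.
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# k-NN is distance based, so put all features on the same scale.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

k_values = list(range(1, 50))
scores = []      # accuracy on the test set for each k
precisions = []  # precision on the test set for each k
for k in k_values:
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))
    precisions.append(
        precision_score(y_test, clf.predict(X_test), zero_division=0)
    )

best_k = k_values[int(np.argmax(scores))]
print(f"best k = {best_k}, accuracy = {max(scores):.3f}")
```

Plotting `scores` against `k_values` (for example with Matplotlib or Seaborn) gives the score curve described above.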
To scrape the data from Twitter, I use this script, which saves the Twitter data as a CSV file. This creates a file whose name ends in -tweets.csv.
I used the keyword Machine Learning, which is why MachineLearning-tweets.csv
is passed to the read_csv method. The data I used is fairly small: 500 records is probably not sufficient, so if you want to train a model on this, use at least 2000 records.
A larger dataset does not help in every situation, but in this case I expect the model would perform better than it does now if it were trained on more data.
I'm now planning to scrape more tweets for a randomly chosen keyword, merge that dataset with this one, and then train again.
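
Merging two scraped datasets could be sketched as follows. This assumes both files share the same columns and that duplicate tweets can be identified by their `text` column. The two small frames below are made-up stand-ins for the result of `pd.read_csv` on each file.

```python
import pandas as pd

# Made-up frames standing in for two scraped CSV files.
existing = pd.DataFrame({"text": ["a", "b"], "retweet_count": [5, 2000]})
new = pd.DataFrame({"text": ["b", "c"], "retweet_count": [2000, 40]})

# Stack the rows, then drop tweets that appear in both scrapes.
merged = (
    pd.concat([existing, new], ignore_index=True)
    .drop_duplicates(subset="text")
    .reset_index(drop=True)
)

print(len(merged))  # 3
```

Dropping duplicates matters here because two keyword searches can return the same tweet, which would otherwise be counted twice during training.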