cheekeet86/project_3

Project 3: Web APIs & Classification

Problem Statement

  • To use the Reddit API to scrape posts automatically from 2 Subreddits.
  • To create and compare different classification models that predict which Subreddit a specific post came from.
  • To perform sentiment analysis on post contents and propose advertising strategies for customers.

Data Collection

  • The posts are scraped using the Reddit API.
  • The posts are scraped from 2 popular Subreddits, Board Games and Mobile Games, with 2.1 million and 2.8k members respectively.
  • The Reddit API returns the posts in JSON format, and the posts are stored as JSON files for future analysis.
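
The collection steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes the public JSON listing endpoint (the subreddit URL with `.json` appended) and the listing's `after` cursor for paging. The helper names `build_listing_url`, `fetch_page`, `scrape_subreddit` and `save_posts`, and the one-second pause between requests, are hypothetical choices made for this example.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://www.reddit.com/r/"           # base URL from the user configuration
HEADERS = {"User-agent": "Bleep blorp bot 0.1"}  # user agent from the user configuration


def build_listing_url(subreddit, after=None):
    """Return the JSON listing URL, optionally resuming after a given post id."""
    url = f"{BASE_URL}{subreddit}.json"
    if after:
        url += "?" + urllib.parse.urlencode({"after": after})
    return url


def fetch_page(url):
    """Download and decode one listing page (performs a real HTTP request)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def scrape_subreddit(subreddit, num_requests, fetch=fetch_page, pause=1.0):
    """Collect posts page by page by following the listing's `after` cursor."""
    posts, after = [], None
    for _ in range(num_requests):
        listing = fetch(build_listing_url(subreddit, after))["data"]
        posts.extend(child["data"] for child in listing["children"])
        after = listing.get("after")
        if after is None:  # the listing has no further pages
            break
        time.sleep(pause)  # assumed delay, to avoid hammering the API
    return posts


def save_posts(posts, path):
    """Store the scraped posts as a JSON file for later analysis."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(posts, handle)
```

Passing the page fetcher in as the `fetch` argument keeps the paging logic testable without network access; a call such as `scrape_subreddit("boardgames", 50)` would use the real HTTP fetcher.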

User Configurations

| Variable Name | Default Value | Description |
|---|---|---|
| scape_data | False | True: scrape data from Reddit and save as JSON files. False: load JSON files from the input folder. |
| scape_index | 0 | Used only if scape_data=True. 0: scrape data from subreddits[0]. 1: scrape data from subreddits[1]. |
| num_requests | 50 | Number of Reddit API requests. Note: 25 posts are scraped per request. |
| posts_limit | 900 | Number of posts used (per Subreddit) to build models |
| subreddits | [boardgames, mobilegames] | List of Subreddits |
| url | https://www.reddit.com/r/ | Base URL for scraping |
| headers | User-agent: Bleep blorp bot 0.1 | User agent settings |
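
As a rough illustration of how the settings listed above fit together, the sketch below collects them in one place and works through the arithmetic: 50 requests at 25 posts each give at most 1250 posts per Subreddit, which covers the 900 used for modelling. The `Config` class and its methods are hypothetical and exist only for this example; the project may define these values as plain notebook variables instead.

```python
from dataclasses import dataclass, field

POSTS_PER_REQUEST = 25  # per the note on num_requests


@dataclass
class Config:
    """Hypothetical container for the user configuration defaults."""
    scape_data: bool = False
    scape_index: int = 0
    num_requests: int = 50
    posts_limit: int = 900
    subreddits: list = field(default_factory=lambda: ["boardgames", "mobilegames"])
    url: str = "https://www.reddit.com/r/"
    headers: dict = field(default_factory=lambda: {"User-agent": "Bleep blorp bot 0.1"})

    def max_scraped_posts(self):
        """Upper bound on posts collected in one scraping run."""
        return self.num_requests * POSTS_PER_REQUEST

    def target_subreddit(self):
        """Subreddit selected by scape_index (relevant only when scraping)."""
        return self.subreddits[self.scape_index]

    def plan(self):
        """Describe what a run would do with these settings."""
        if self.scape_data:
            return f"scrape {self.url}{self.target_subreddit()}"
        return "load json files from input folder"


config = Config()
print(config.max_scraped_posts())                        # 50 * 25
print(config.max_scraped_posts() >= config.posts_limit)  # enough posts for modelling?
print(Config(scape_data=True, scape_index=1).plan())
```

Because `scape_index` selects a single entry of `subreddits`, a full scrape under these assumptions takes two runs, one with each index.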

Executive Summary

Click Here

References

Reddit API
Board Games Subreddit
Mobile Games Subreddit

About

Games Reddits Classifier (General Assembly SG Data Science Immersive Batch 9)
