Subreddit_Recommender

based on your subreddit usage, which other subreddits might you like?

Requirements: deliver a recommender system use a clustering algorithm somewhere in the process (with too much data) would have to 'joblib' save and train the model Functional Programming (pipelines) implement a custom module command line application - argument for number of subreddits (input() function), based on those entries, recommends the subreddits to the user, set a cap on the number of subreddits returned error handling so program doesn't crash Github Repo

database structure Username | subreddit | Timestamp of comment 14m Rows 22,610 Users 34,967 subreddit

Reference Material: https://towardsdatascience.com/build-your-own-clustering-based-recommendation-engine-in-15-minutes-bdddd591d394 https://www.kaggle.com/supratimhaldar/clustering-movies-with-collaborative-filtering https://medium.com/analytics-vidhya/k-nearest-neighbors-all-you-need-to-know-1333eb5f0ed0

Process: Create Github Repo - Ramon Import the data - Exploratory Data Analysis - class balance (distribution),

Build Final Product

SQL Script import the data to CSV (save the SQL Script for the code review) our SQL query was designed to Group by User, count of UTC,

pull.py - Pipeline Start Query the Database, collect 100k rows (only the IDs) create a map of subreddit IDs to subreddit Names return the DataFrame and the map

clean.py - Pipeline get the DataFrame from pull.py pipepline to transform the data - return the transformed DataFrame

cluster.py import the Transformed DataFrame run the selected Clustering Model on it. Evaluate the Model. Pickle that Trained Model using "cosine similarity" as distance metric in KNearestNeighbors instead of Euclidean Distance due to high dimensionality, Euclidean distance is often repetitive https://medium.com/analytics-vidhya/k-nearest-neighbors-all-you-need-to-know-1333eb5f0ed0

main.py - our command-line interface, accepting the inputted arguments error handling on the inputs try/except to import that pickled model from cluster.py import the inputted arguments from the command line, return the desired number of results

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.ipynb_checkpoints		.ipynb_checkpoints
README.md		README.md
Sahivy_Functions.ipynb		Sahivy_Functions.ipynb
Untitled.ipynb		Untitled.ipynb
askuser.py		askuser.py
data_exploration.ipynb		data_exploration.ipynb
data_exploration_test.ipynb		data_exploration_test.ipynb
main.py		main.py
pull.py		pull.py
reddit_user_subreddit.csv		reddit_user_subreddit.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Subreddit_Recommender

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Subreddit_Recommender

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages