
Implemented a content-based recommender using CountVectorizer on 4.8K+ TMDB movies, representing each movie as a 5K-dimensional sparse vector and choosing cosine similarity over Euclidean distance because it better suits sparse text vectors.


akashnegetive/Entertainment-Recommender-System


Project: Entertainment Recommender System Using Machine Learning!


Recommendation systems are becoming increasingly important in today’s busy world. People are always short on time, with countless tasks to fit into 24 hours, so recommendation systems help them make the right choices without spending much cognitive effort.

The purpose of a recommendation system is to find content that would interest an individual. It weighs a number of factors to build a personalized list of useful, interesting content for each user. Recommendation systems are AI-based algorithms that sift through the available options and create a customized list of items that are interesting and relevant to an individual. These results are based on the user’s profile, search and browsing history, what people with similar traits or demographics are watching, and how likely the user is to watch a given movie. This is achieved through predictive modeling and heuristics applied to the available data.

Types of Recommendation Systems:


1) Content-Based:

  • Content-based systems use characteristic information about items and take item attributes into consideration.

  • Examples: Twitter, YouTube.

  • Features such as which music you listen to or which singer you watch are turned into embeddings.

  • Recommendations come from user-specific actions or from items similar to ones the user already liked.

  • Each item is represented as a feature vector.

  • These systems make recommendations using item features and the user’s profile features. They hypothesize that if a user was interested in an item in the past, they will be interested in similar items in the future.

  • One issue is that they make obvious recommendations because of over-specialization (user A is only interested in categories B, C, and D, and the system cannot recommend items outside those categories, even though they could interest the user).
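A minimal sketch of the content-based idea above, assuming hypothetical movies described by binary genre features (the titles and the feature set are invented for illustration): the user’s profile is the average vector of the items they liked, and unseen items are ranked by cosine similarity to that profile.

```python
import numpy as np

# Hypothetical catalogue; features are [action, comedy, romance, sci-fi]
TITLES = ["Movie A", "Movie B", "Movie C", "Movie D"]
FEATURES = np.array([
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
], dtype=float)

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v / denom)

def recommend_for_user(liked_titles, k=2):
    """Rank items the user has not seen by similarity to their profile."""
    liked_idx = [TITLES.index(t) for t in liked_titles]
    # The user profile is the mean feature vector of the liked items
    profile = FEATURES[liked_idx].mean(axis=0)
    scores = [
        (cosine(profile, FEATURES[i]), TITLES[i])
        for i in range(len(TITLES))
        if i not in liked_idx
    ]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [title for _, title in scores[:k]]

print(recommend_for_user(["Movie A"], k=1))  # ['Movie B']
```

Movie C and Movie D score 0 here because they share no genres with the profile, which is the over-specialization issue described in the last bullet.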

2) Collaborative Filtering:

  • Collaborative filtering systems are based on user-item interactions.

  • They group users who give similar ratings (similar users) into clusters.

  • Example: book recommendation, which uses this clustering mechanism.

  • Only one kind of signal is used: ratings or comments.

  • In short, collaborative filtering systems assume that if a user likes item A, and another user likes item A as well as item B, the first user could also be interested in item B.

  • Issues:

    • The user-item matrix grows with the number of users and items, and the user-user similarity matrix is n × n, so computation is expensive.

    • Only popular items get recommended.

    • New items might not get recommended at all.
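A minimal sketch of user-based collaborative filtering, assuming a small hypothetical rating matrix where 0 means "not rated". It builds the n × n user-user cosine similarity matrix and predicts a missing rating as a similarity-weighted average of other users’ ratings. Treating unrated entries as 0 inside the similarity is a simplification; real systems usually mean-center ratings or compare only co-rated items.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 1, 2, 5],
], dtype=float)

def user_similarity(ratings):
    """n x n cosine similarity between users (rows of the rating matrix)."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    unit = ratings / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

def predict_rating(ratings, sim, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    raters = np.nonzero(ratings[:, item])[0]
    raters = raters[raters != user]
    weights = sim[user, raters]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[raters, item] / weights.sum())

sim = user_similarity(R)
print(round(predict_rating(R, sim, user=0, item=2), 2))  # 4.0
```

User 0’s predicted rating for item 2 is pulled toward the 5 given by user 1, the most similar user, rather than the 2 given by user 2.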

3) Hybrid:

  • Hybrid systems combine both types of information to avoid the problems that arise when working with just one kind.

  • A combination of both approaches is what is commonly used nowadays.

  • Techniques used: word2vec, embeddings.
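The section names word2vec and embeddings for hybrid systems; as a simpler illustration, here is a minimal sketch of a weighted hybrid (a different, swapped-in technique) that blends a content-based score with a collaborative score after rescaling both to [0, 1]. The scores and the weight `alpha` are hypothetical.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return np.zeros_like(scores) if span == 0 else (scores - scores.min()) / span

def hybrid_scores(content, collaborative, alpha=0.5):
    """Weighted blend; alpha is the weight given to the content-based side."""
    return alpha * min_max(content) + (1 - alpha) * min_max(collaborative)

content = [0.9, 0.1, 0.5]   # e.g. cosine similarity to a liked movie
collaborative = [4, 5, 1]   # e.g. predicted ratings from collaborative filtering
print(hybrid_scores(content, collaborative))
```

Rescaling first matters because cosine scores and ratings live on different scales; without it, the rating term would dominate the blend.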

Dataset

The dataset used for this project contains information about movies, including their titles and IDs. It is processed and stored in movie_data.pkl. The dataset is used to calculate the cosine similarity between movies.
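A minimal sketch of storing and reloading the processed table with pickle, assuming a pandas DataFrame with hypothetical `movie_id`, `title`, and `tags` columns (the real column names and contents of movie_data.pkl may differ). It writes to a temporary directory so it can run anywhere.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical processed rows: placeholder IDs and tags, not real TMDB data
movies = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "tags": ["space alien battle", "chef cooking romance"],
})

path = os.path.join(tempfile.mkdtemp(), "movie_data.pkl")
with open(path, "wb") as f:
    pickle.dump(movies, f)   # save the processed table

with open(path, "rb") as f:
    loaded = pickle.load(f)  # read it back, as an app would at startup
```

Only unpickle files you created or trust, since loading pickled data can execute arbitrary code.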

Model

The model for recommending movies is based on cosine similarity. Cosine similarity measures how similar two movies are based on their text-feature vectors rather than their titles alone. The model computes the similarity scores and suggests the top 10 most similar movies for the selected movie title.
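A minimal sketch of the vectorize-then-compare pipeline, assuming each movie has already been reduced to a hypothetical lower-cased "tags" string. It uses scikit-learn’s `CountVectorizer` with `max_features=5000` (matching the 5K-dimensional vectors mentioned in the project description) and `cosine_similarity`; the four toy movies are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pre-processed tags per movie
titles = ["Space War", "Galaxy Battle", "Kitchen Love", "Chef Romance"]
tags = [
    "space alien war spaceship battle",
    "galaxy spaceship battle alien empire",
    "cooking romance kitchen chef love",
    "chef love romance restaurant cooking",
]

# Bag-of-words counts, capped at 5000 terms, English stop words removed
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)  # sparse matrix: (n_movies, n_terms)
similarity = cosine_similarity(vectors)   # dense matrix: (n_movies, n_movies)

def recommend(title, k=10):
    """Return up to k titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    ranked = np.argsort(similarity[idx])[::-1]  # highest similarity first
    return [titles[i] for i in ranked if i != idx][:k]

print(recommend("Space War", k=1))  # ['Galaxy Battle']
```

On the full dataset the same calls would produce a sparse count matrix with one row per movie and at most 5,000 columns, plus a dense similarity matrix with one row and one column per movie.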

Difference Between Euclidean Distance and Cosine Similarity


  • Euclidean distance: best for numeric, dense, or continuous data (e.g., user ratings, spatial coordinates).

  • Cosine similarity: best for textual, sparse, or high-dimensional data (e.g., NLP vectors, TF-IDF, CountVectorizer).

  • Euclidean distance ranges from 0 → ∞ (it is a distance).

  • Cosine similarity ranges from –1 → +1.
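For reference, the two measures compared above are defined as follows for vectors A and B with n components:

```latex
d(\mathbf{A}, \mathbf{B}) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} \in [0, \infty)

\cos\theta = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
           = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \in [-1, 1]
```

Cosine similarity depends only on the angle between the vectors, not on their lengths, which is why it suits count vectors of documents with different lengths.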


Compare Cosine Vs Euclidean

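A small numeric comparison, using hypothetical word-count vectors, of how the two measures treat documents of different lengths: doc2 uses exactly the same words as doc1 in the same proportions, just twice as often.

```python
import numpy as np

def euclidean(u, v):
    """Straight-line distance: 0 for identical vectors, grows without bound."""
    return float(np.linalg.norm(u - v))

def cosine(u, v):
    """Cosine of the angle between u and v: depends on direction, not length."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical word-count vectors over a 3-word vocabulary
doc1 = np.array([1, 2, 0])
doc2 = 2 * doc1              # same word proportions, twice as long
doc3 = np.array([0, 1, 1])   # a different mix of words

print(euclidean(doc1, doc2), euclidean(doc1, doc3))  # ≈ 2.236, ≈ 1.732
print(cosine(doc1, doc2), cosine(doc1, doc3))        # ≈ 1.0, ≈ 0.632
```

Euclidean distance rates doc3 as closer to doc1 than doc2 is, even though doc2 is just doc1 repeated; cosine similarity treats doc1 and doc2 as pointing in the same direction. This length sensitivity is the reason cosine similarity is preferred for CountVectorizer output.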

Concept used to build the model.pkl file: cosine_similarity

1. Cosine similarity is a metric that measures how similar two documents are.

2. To demonstrate the cosine similarity function we need vectors; here the vectors are NumPy arrays.

3. Once we have the vectors, we call cosine_similarity() with both of them, and it returns the cosine similarity between the two.

4. For non-negative vectors such as word-count vectors, the value lies in [0, 1]. A value of 0 means the vectors share nothing (they are orthogonal), and 1 means they point in exactly the same direction. In general, cosine similarity ranges from –1 to +1.

5. For more details, see: https://www.learndatasci.com/glossary/cosine-similarity/
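A minimal sketch of calling scikit-learn’s `cosine_similarity` on two NumPy vectors, using hypothetical word counts. One detail the steps above gloss over: the function expects 2-D inputs of shape (n_samples, n_features), so each 1-D vector is reshaped into a single row first.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical word-count vectors over the same 4-word vocabulary
a = np.array([3, 0, 1, 2])
b = np.array([1, 0, 0, 2])

# Reshape each 1-D vector to a (1, n_features) matrix; take the single entry
score = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

# Same value from the definition: dot product over the product of norms
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(score, 3))  # 0.837
```

Because both vectors are non-negative counts, the score falls in [0, 1], as described in step 4.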

Results | STREAMLIT UI

1. Loads the pre-processed data (movies.pkl and similarity.pkl) using pickle.

2. Takes the user’s movie choice from a Streamlit dropdown and standardizes it (lowercase + strip) for accurate matching.

3. Finds the selected movie’s index and retrieves its similarity scores from the cosine-similarity matrix.

4. Sorts the similarity scores and selects the top 5 most similar movies based on text-feature vectors.

5. Fetches posters from the TMDB API using a retry-enabled request system and fills in missing posters with placeholders.

6. Displays movie names and posters in a clean 5-column layout using Streamlit UI components.
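A minimal sketch of steps 2 to 4, assuming `titles` is the list of movie titles and `similarity` is the square cosine-similarity matrix loaded in step 1 (the function name `top_matches` and the toy data are hypothetical). The Streamlit widgets and TMDB poster requests from steps 5 and 6 are left out so the ranking logic runs on its own.

```python
import numpy as np

def top_matches(selected, titles, similarity, k=5):
    """Return the k titles most similar to `selected` (steps 2-4)."""
    # Step 2: standardize the input and the stored titles the same way
    key = selected.strip().lower()
    normalized = [t.strip().lower() for t in titles]
    if key not in normalized:
        return []
    # Step 3: locate the movie and fetch its row of similarity scores
    idx = normalized.index(key)
    scores = list(enumerate(similarity[idx]))
    # Step 4: sort by score (highest first), skip the movie itself, keep k
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores if i != idx][:k]

# Toy 3-movie similarity matrix (hypothetical values)
titles = ["Movie A", "Movie B", "Movie C"]
similarity = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
print(top_matches("  movie a ", titles, similarity, k=2))  # ['Movie B', 'Movie C']
```

Skipping the selected index matters because every movie has similarity 1 with itself and would otherwise always be its own top recommendation.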

Website Link 🔗


Thanks for watching!
