emirmasood/FilmRecommenderSystem

🎬 Data Analysis, Sales Prediction, and Iranian Movies Recommendation System

This repository contains the official implementation of my bachelor's thesis project at Isfahan University of Technology. The project presents a data analysis and machine learning framework for predicting film sales, together with a content-based recommender system tailored to Iranian cinema.


🧠 Overview

Over the last decade, data and artificial intelligence have revolutionized industries from healthcare to entertainment. Inspired by platforms such as Netflix and Filimo, this project explores how machine learning and data-driven systems can enhance Iran’s cinematic landscape.

The work integrates:

  • Data collection from Iranian film databases (Soureh Cinema, Cinematicket, etc.)
  • Exploratory and statistical analysis to uncover viewing and revenue patterns
  • Predictive modeling for box office success
  • Content-based recommendation using textual similarity
  • Interactive visualization through Power BI and Streamlit

📂 Structure


.
├── data/                          # datasets
├── notebooks/                     # Jupyter notebooks for EDA, ML, and recommender
├── src/                           # Python source code
├── reports/                       # thesis & visual report
├── demo/                          # demo video
├── requirements.txt               # Python dependencies
└── README.md                      # project overview


📊 Dataset

The dataset covers Iranian films released from 2011 to 2021 (the decade 1390–1400 SH), collected and cleaned from multiple sources, including Soureh Cinema and Cinematicket.

🎬 Features (16 total)

| Type | Example Variables | Description |
|---|---|---|
| Qualitative | Title, Genre, Director, Stars | Film metadata |
| Quantitative | Sale, Audience Count, IMDb Rating, Duration | Numerical predictors |
| Popularity Indicators | Instagram Followers of Lead Actors | Measures celebrity influence |

🧩 Methodology

1. Data Collection

  • Web scraping and manual compilation from national cinema sources
  • Cleaning, normalization, and handling of missing values

2. Data Visualization

  • Correlation matrices and scatter plots for feature relationships
  • Trend analysis of top-grossing genres, directors, and actors
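
A correlation matrix of the kind mentioned above can be sketched without any dependencies. The column names and values below are hypothetical toy data, not figures from the thesis dataset, and the project itself may well have used pandas or a plotting library instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> numeric column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns (assumption: not real project data).
columns = {
    "duration": [90, 100, 110, 120],
    "audience": [1000, 2000, 3000, 4000],
    "rating": [7.0, 6.0, 5.0, 4.0],
}
matrix = correlation_matrix(columns)
```

Each entry `matrix[a][b]` lies in the range -1 to 1; values near the extremes flag feature pairs worth inspecting in a scatter plot.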

3. Data Preprocessing

  • Outlier detection and treatment
  • Missing value imputation using Random Forest Regression
  • One-Hot Encoding for categorical features
  • Normalization of numerical variables
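
As a rough illustration of the encoding and normalization steps, here is a minimal dependency-free sketch. It assumes min-max scaling for the normalization step (the README does not say which scheme was used), and the genre and duration values are hypothetical. Random Forest imputation is not shown.

```python
def one_hot(values):
    """One-hot encode a list of category labels.

    Returns the sorted category list and one 0/1 row per input value.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max(values):
    """Scale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy inputs (assumption: not real project data).
genres = ["Comedy", "Social", "Comedy", "Drama"]
durations = [90, 120, 105, 90]

categories, encoded = one_hot(genres)
scaled = min_max(durations)
```

Scaling matters most for distance-based models such as KNN, where an unscaled feature with a large range would otherwise dominate the distance.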

4. Machine Learning Models

  • Logistic Regression – baseline classifier
  • Stochastic Gradient Descent (SGD) – linear classifier trained with stochastic gradient descent
  • K-Nearest Neighbors (KNN) – best overall performance (F1 = 0.75)
  • Random Forest – strong interpretability (F1 ≈ 0.74)
  • Gradient Boosting – moderate performance (F1 ≈ 0.65)

With the highest F1 score among the models compared, the KNN classifier was selected as the predictive model for film sales success.
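
To make the KNN idea concrete, the following is a minimal sketch of a majority-vote nearest-neighbor classifier. It assumes a binary label (1 = successful, 0 = not) and two already-scaled features; both the label definition and the toy points are assumptions for illustration, and the value of `k` used in the thesis is not stated here.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: [duration, lead-actor popularity].
train_X = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
    [0.80, 0.90], [0.90, 0.80], [0.85, 0.85],
]
train_y = [0, 0, 0, 1, 1, 1]  # assumed labels: 1 = successful, 0 = not

high = knn_predict(train_X, train_y, [0.8, 0.8])
low = knn_predict(train_X, train_y, [0.2, 0.2])
```

An odd `k` avoids tied votes in the binary case.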

5. Recommendation System

A content-based recommender was built using:

  • TF-IDF to vectorize film metadata (genre, director, year, stars)
  • Cosine Similarity to measure similarity between films

The system returns the top-N (default N = 5) films most similar to a selected title. Fuzzy string matching handles typos in the entered title.
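
The pipeline described above (TF-IDF vectors, cosine similarity, top-N lookup with typo tolerance) can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's code: the film titles and metadata strings are hypothetical, the IDF uses a common smoothed form, and `difflib` stands in for whichever fuzzy-matching library the project actually uses.

```python
import difflib
import math
from collections import Counter

# Hypothetical catalogue: title -> metadata tokens (genre, director, year, stars).
FILMS = {
    "City Laughs": "comedy director_a 2015 actor_1 actor_2",
    "Family Feast": "comedy director_a 2016 actor_1 actor_3",
    "Desert Road": "social director_b 2018 actor_4 actor_5",
    "Winter Letters": "social director_b 2019 actor_4 actor_6",
}


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict token -> weight) for each document."""
    tokenized = {title: text.split() for title, text in docs.items()}
    n = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized.values() for tok in set(toks))
    vectors = {}
    for title, toks in tokenized.items():
        tf = Counter(toks)
        vectors[title] = {
            tok: (count / len(toks)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, docs, top_n=5):
    """Return up to top_n titles most similar to the fuzzy-matched query title."""
    match = difflib.get_close_matches(query, list(docs), n=1, cutoff=0.6)
    if not match:
        return []  # no title is close enough to the query
    title = match[0]
    vectors = tfidf_vectors(docs)
    scores = [(other, cosine(vectors[title], vectors[other]))
              for other in docs if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# The misspelled query still resolves to "City Laughs".
suggestions = recommend("Citty Laughs", FILMS, top_n=2)
```

Because the vectors are built only from metadata tokens, two films score as similar when they share a genre, director, or cast member, which is the content-based behavior described above.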

6. Visualization and Deployment

  • Power BI dashboard for data insights
  • Streamlit application for real-time recommendation demo

🧪 Results

  • Best Predictive Model: K-Nearest Neighbors
    • F1 Score = 0.75, Accuracy = 0.72
  • Most Influential Factors:
    • Genre (especially Comedy and Social)
    • Film Duration
    • Ticket Price
    • Popularity of Lead Actors
  • Recommender Validation: produced thematically coherent recommendations outperforming Filimo’s native system.
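
For readers unfamiliar with the metrics reported above, this short sketch shows how F1 and accuracy are computed from predictions. It assumes binary labels with 1 as the positive class (the README does not state the averaging scheme), and the label lists are toy values unrelated to the thesis results.

```python
def f1_and_accuracy(y_true, y_pred, positive=1):
    """Compute the F1 score for the positive class and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs)
    return f1, accuracy


# Toy labels (assumption: illustrative only).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
f1, accuracy = f1_and_accuracy(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that does well on one at the expense of the other, whereas accuracy only counts the share of correct predictions.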

👥 Author & Supervision

Author:
Amir Masoud Almasi

Supervisor:
Prof. Reyhaneh Reikhtehgaran
Department of Mathematical Sciences
Isfahan University of Technology

About

Movie Recommendation and Sales Prediction via Machine Learning and Predictive Analytics
