A memory-based collaborative filtering recommendation system that predicts user movie ratings using both item-based and user-based approaches. The system implements cosine similarity and Pearson correlation metrics to find similar items or users, enabling personalized movie recommendations based on neighborhood-based predictions.
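To make the approach concrete, here is a minimal sketch of the two similarity metrics and an item-based neighborhood prediction. It is illustrative only and not taken from this repository: the function names, the dense NumPy rating matrix with `NaN` for missing ratings, and the choice to keep only positively correlated neighbors are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity over entries rated in both vectors (NaN = missing)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def pearson_similarity(a, b):
    """Pearson correlation over co-rated entries (mean-centered cosine)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_item_based(ratings, user, item, k=10, sim=cosine_similarity):
    """Predict ratings[user, item] from the k most similar items the user rated.

    ratings is a (users x items) float array with NaN for missing ratings.
    """
    rated = np.flatnonzero(~np.isnan(ratings[user]))
    rated = rated[rated != item]
    if rated.size == 0:
        return float(np.nanmean(ratings))  # assumed fallback: global mean
    sims = np.array([sim(ratings[:, item], ratings[:, j]) for j in rated])
    order = np.argsort(sims)[::-1][:k]     # indices of the k largest similarities
    top_sims, top_items = sims[order], rated[order]
    keep = top_sims > 0                    # assumed: ignore non-positive neighbors
    if not keep.any():
        return float(np.nanmean(ratings[user]))  # assumed fallback: user mean
    weights = top_sims[keep]
    # Similarity-weighted average of the user's ratings on the neighbor items
    return float(weights @ ratings[user, top_items[keep]] / weights.sum())
```

A user-based variant would follow the same pattern with the roles of rows and columns swapped, comparing user rows instead of item columns.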
git clone https://github.com/Khoa-BOB/Netflix_RecommendationSystem.git
cd Netflix_RecommendationSystem
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run with default parameters
python main.py
# Run with custom parameters
python main.py --k <neighborhood_size> --metrics <similarity_metric> --cf_type <cf_type>

Available parameters:
- `--k`: Neighborhood size (default: 10)
  - Example values: 3, 6, 9, 12, 15, 20
- `--metrics`: Similarity metric (default: cosine)
  - Options: `cosine` or `pearson`
- `--cf_type`: Collaborative filtering type (default: item)
  - Options: `item` or `user`

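For reference, the documented flags and defaults map naturally onto `argparse`. The sketch below is an assumption about how such a command line could be parsed, not the actual contents of `main.py`; the `build_parser` helper and the help strings are hypothetical.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Memory-based collaborative filtering for movie ratings"
    )
    parser.add_argument("--k", type=int, default=10,
                        help="neighborhood size")
    parser.add_argument("--metrics", choices=["cosine", "pearson"],
                        default="cosine", help="similarity metric")
    parser.add_argument("--cf_type", choices=["item", "user"],
                        default="item", help="collaborative filtering type")
    return parser

# Parse an explicit argument list so the sketch runs without a real command line
args = build_parser().parse_args(["--k", "20", "--metrics", "pearson"])
print(args.k, args.metrics, args.cf_type)
```

Using `choices` means an unsupported value such as `--metrics jaccard` is rejected by the parser before any computation starts.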
Examples:
# Item-based CF with cosine similarity and k=20
python main.py --k 20 --metrics cosine --cf_type item
# User-based CF with pearson correlation and k=10
python main.py --k 10 --metrics pearson --cf_type user
# Item-based CF with pearson and k=15
python main.py --k 15 --metrics pearson --cf_type item

To test multiple k values across all CF types and similarity metrics:
# For local testing
python run_k_experiments.py
# For HPC cluster submission (SLURM)
sbatch submit_k_experiments.sbatch

This will test 24 configurations (6 k values × 2 similarity metrics × 2 CF types) and generate detailed results.
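The 24 configurations correspond to the full grid of k values, similarity metrics, and CF types listed in this README. A minimal sketch of enumerating that grid is shown below; it is not the code in `run_k_experiments.py`, and the `to_command` helper is hypothetical.

```python
from itertools import product

K_VALUES = [3, 6, 9, 12, 15, 20]
METRICS = ["cosine", "pearson"]
CF_TYPES = ["item", "user"]

# Cartesian product: 2 CF types x 2 metrics x 6 k values
configs = [
    {"cf_type": cf_type, "metrics": metric, "k": k}
    for cf_type, metric, k in product(CF_TYPES, METRICS, K_VALUES)
]

def to_command(cfg):
    """Build the equivalent main.py invocation for one configuration."""
    return (f"python main.py --k {cfg['k']} "
            f"--metrics {cfg['metrics']} --cf_type {cfg['cf_type']}")

print(len(configs))
for cfg in configs[:3]:
    print(to_command(cfg))
```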
| k | MSE | RMSE | MAE |
|---|---|---|---|
| 3 | 1.537 | 1.240 | 0.933 |
| 6 | 1.529 | 1.236 | 0.934 |
| 9 | 1.534 | 1.238 | 0.937 |
| 12 | 1.531 | 1.237 | 0.936 |
| 15 | 1.530 | 1.237 | 0.935 |
| 20 | 1.530 | 1.237 | 0.934 |
Best k for Item-Cosine: k=20 (MSE: 1.530)
| k | MSE | RMSE | MAE |
|---|---|---|---|
| 3 | 1.560 | 1.249 | 0.940 |
| 6 | 1.552 | 1.246 | 0.934 |
| 9 | 1.556 | 1.248 | 0.937 |
| 12 | 1.555 | 1.247 | 0.938 |
| 15 | 1.554 | 1.246 | 0.938 |
| 20 | 1.552 | 1.246 | 0.937 |
Best k for Item-Pearson: k=20 (MSE: 1.552)
| k | MSE | RMSE | MAE |
|---|---|---|---|
| 3 | 1.663 | 1.290 | 1.015 |
| 6 | 1.622 | 1.274 | 1.012 |
| 9 | 1.620 | 1.273 | 1.013 |
| 12 | 1.629 | 1.276 | 1.020 |
| 15 | 1.624 | 1.274 | 1.021 |
| 20 | 1.613 | 1.270 | 1.018 |
Best k for User-Cosine: k=20 (MSE: 1.613)
| k | MSE | RMSE | MAE |
|---|---|---|---|
| 3 | 1.635 | 1.279 | 1.002 |
| 6 | 1.624 | 1.274 | 0.999 |
| 9 | 1.626 | 1.275 | 0.999 |
| 12 | 1.626 | 1.275 | 0.999 |
| 15 | 1.626 | 1.275 | 0.999 |
| 20 | 1.626 | 1.275 | 0.999 |
Best k for User-Pearson: k=6 (MSE: 1.624)
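The three columns in the tables above are standard error metrics. As a reminder of how they relate, here is a small sketch that computes them from lists of true and predicted ratings; the `error_metrics` function and the sample ratings are illustrative assumptions, not the project's evaluation code or data.

```python
import math

def error_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for paired true and predicted ratings."""
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    # RMSE is the square root of MSE, so the two always rank models identically
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}

# Toy example with made-up ratings
print(error_metrics([4, 3, 5, 1], [3, 3, 4, 3]))
```

The reported values are consistent with this relationship: for example, the square root of an MSE of 1.530 is approximately 1.237.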
Configuration: ITEM-based CF with COSINE similarity (k: 20, MSE: 1.530, RMSE: 1.237, MAE: 0.934)

- Item-based CF outperforms User-based CF across all configurations
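- At k=20, cosine similarity yields a slightly lower MSE than Pearson for both CF types (1.530 vs. 1.552 item-based, 1.613 vs. 1.626 user-based); for user-based CF the gap is not consistent at smaller k
- MSE drops from k=3 to k=6 in every configuration and changes little afterwards (differences below 0.02 between k=6 and k=20)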
- Item-Cosine achieved the lowest MSE overall (1.529 at k=6 and 1.530 at k=20 in the table above)
- User-based methods show higher errors (MSE of about 1.61 to 1.66, MAE of about 1.00 to 1.02) than item-based methods (MSE of about 1.53 to 1.56, MAE of about 0.93 to 0.94)
Results generated from: k_experiments_results_20251125_073304.json
Empirical Analysis of Predictive Algorithms for Collaborative Filtering