A machine learning-based movie recommendation system that provides personalized movie suggestions to users. The system integrates collaborative filtering, content-based filtering, and popularity-based models within a microservices architecture. It features real-time data ingestion, model training, CI/CD, and a REST API.
The Netflicks recommendation system follows a modular and scalable microservices architecture with the following key components:
- Data Collection: Raw movie and user interaction data is simulated and stored in the `data/` directory.
- Kafka Integration: Simulated real-time event streams, including movie ratings, watch logs, and recommendation requests.
- Preprocessing: Data cleaning, transformation, genre standardization, and one-hot encoding in the `preprocessing/` module.
- Database: PostgreSQL stores normalized and validated movie, user, rating, and watch history data in the `db/` directory.
- Feature Engineering: Constructs user-item sparse matrices and genre vectors.
- Model Training: Implements collaborative filtering (ALS), content-based filtering (genre similarity), and popularity models in `model_training/`.
- Model Storage: Trained models and vectors saved as versioned `.pkl` files in the `models/` directory using MLflow.
- Model Evaluation: Performance metrics tracked using offline (RMSE, HitRate) and online evaluation.
- Flask Server: REST API implemented in `api/` to serve recommendations.
- Model Serving: Dynamically serves predictions using CF, CBF, or popularity models depending on user history.
- Docker Containerization: API, training, and validation workflows containerized for consistent deployment.
- Load Balancing: Designed to handle concurrent user requests.
- Unit Tests: Component-level testing in `testing/`.
- Pipeline Testing: Simulates API calls and validates recommendation quality.
- CI/CD: Automated retraining, validation, and deployment using Jenkins.
- Kafka → Preprocessing → PostgreSQL
- PostgreSQL → Model Training → Model Artifacts
- API Request → Inference Engine → Recommendations
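As a small illustration of the first leg of this flow, a stream simulator in `kafka_import/` could publish rating events with the `kafka-python` client. This is only a sketch: the broker address, topic name, and event fields below are assumptions, not the project's actual schema.

```python
# Illustrative event simulator; broker, topic, and event schema are assumptions.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

for _ in range(100):
    event = {
        "user_id": random.randint(1, 1000),
        "movie_id": random.randint(1, 500),
        "rating": random.randint(1, 5),
        "timestamp": time.time(),
    }
    producer.send("movie_ratings", event)  # topic name is illustrative
    time.sleep(0.1)

producer.flush()
```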
.
├── api/ # API server implementation
├── data/ # Simulated and processed data storage
├── db/ # PostgreSQL schema and database manager
├── kafka_import/ # Kafka stream simulator
├── model_training/ # Model training scripts for CF, CBF, Popularity
├── models/ # Trained model and vector storage
├── pipeline/ # Offline and online evaluation scripts
├── preprocessing/ # Data transformation, formatting, and loading
└── testing/ # Unit and integration test scripts
- Python 3.10
- Docker
- pip
- Clone the repository:
git clone <repository-url>
cd Netflicks
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.run.txt # For running the API
pip install -r requirements.train.txt # For training the model
pip install -r requirements.data.txt # For data processing
Create a .env file in the root directory with the following variables:
# Add your environment variables here
- Build the Docker image:
docker build -f Dockerfile.run -t netflicks-api .
- Run the container:
docker run -p 8082:8082 -v $(pwd)/models:/app/models netflicks-api
- Start the API server:
python api/server.py
The API will be available at http://localhost:8082
GET /recommend/{user_id}: Get movie recommendations for a specific user
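The real implementation lives in `api/server.py`; the sketch below only illustrates how such an endpoint could dispatch between the collaborative-filtering model and the popularity fallback described in the architecture section (simplified to two branches). The stand-in data structures and helper names are assumptions.

```python
# Hypothetical sketch of the /recommend dispatch; not the actual api/server.py.
from flask import Flask, jsonify

app = Flask(__name__)

# In the real service these would be loaded from versioned .pkl files in models/
# and from PostgreSQL; here they are in-memory stand-ins.
POPULAR_MOVIES = [10, 42, 7, 99, 3]
USER_HISTORY = {1: [10, 20], 2: []}


def cf_recommend(user_id, n=5):
    """Placeholder for the ALS collaborative-filtering model."""
    return POPULAR_MOVIES[:n]


@app.route("/recommend/<int:user_id>")
def recommend(user_id):
    history = USER_HISTORY.get(user_id, [])
    if history:                      # known user -> collaborative filtering
        movies = cf_recommend(user_id)
    else:                            # cold start -> popularity fallback
        movies = POPULAR_MOVIES[:5]
    return jsonify({"user_id": user_id, "recommendations": movies})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8082)
```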
- Data Preprocessing:
python preprocessing/preprocess.py
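The actual logic lives in `preprocessing/preprocess.py`; as a rough illustration of the genre standardization and one-hot encoding mentioned in the architecture section, here is a minimal sketch assuming a pandas DataFrame with a pipe-separated `genres` column (the column layout is an assumption).

```python
# Illustrative sketch only; column names and genre format are assumptions.
import pandas as pd

movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "genres": ["Action|Sci-Fi", "comedy", "Action|Comedy"],
})

# Standardize genre casing, then split the pipe-separated list into a Python list.
movies["genres"] = movies["genres"].str.title().str.split("|")

# One-hot encode genres into a movie x genre indicator matrix (genre vectors).
genre_vectors = (
    movies.explode("genres")
          .assign(value=1)
          .pivot_table(index="movie_id", columns="genres",
                       values="value", fill_value=0)
)
print(genre_vectors)
```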
- Model Training:
python model_training/train.py
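For orientation, here is a minimal sketch of the kind of artifacts `model_training/train.py` could produce: a user-item sparse matrix for the ALS model and a pickled popularity model. The ratings schema, scoring formula, and output file name are assumptions, not the project's actual code.

```python
# Illustrative training sketch; the real training code lives in model_training/.
import pathlib
import pickle

import pandas as pd
from scipy.sparse import csr_matrix

# Assumed ratings schema: one row per (user_id, movie_id, rating).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "movie_id": [10, 20, 10, 10, 30],
    "rating": [5, 3, 4, 5, 2],
})

# User-item sparse matrix used by the collaborative-filtering (ALS) model.
user_codes = ratings["user_id"].astype("category").cat.codes
movie_codes = ratings["movie_id"].astype("category").cat.codes
user_item = csr_matrix((ratings["rating"], (user_codes, movie_codes)))

# Popularity score: mean rating damped by how often a movie was rated.
stats = ratings.groupby("movie_id")["rating"].agg(["mean", "count"])
stats["score"] = stats["mean"] * (stats["count"] / stats["count"].max())
top_movies = stats.sort_values("score", ascending=False).index.tolist()

# Persist as a .pkl artifact (directory and file name are illustrative).
pathlib.Path("models").mkdir(exist_ok=True)
with open("models/popularity_v1.pkl", "wb") as f:
    pickle.dump(top_movies, f)
```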
Run tests using:
python -m pytest testing/
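A test in `testing/` might exercise the endpoint through Flask's test client. The sketch below assumes the Flask app object is importable as `app` from `api.server` and that the response contains a `recommendations` list; both are assumptions about the project layout.

```python
# Hypothetical integration test; adjust the import to the real app object.
import pytest

from api.server import app  # assumed location of the Flask app


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_recommend_returns_items(client):
    response = client.get("/recommend/1")
    assert response.status_code == 200
    body = response.get_json()
    # Assumed response shape: a JSON object with a "recommendations" list.
    assert isinstance(body.get("recommendations"), list)
```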
This project uses Jenkins for CI/CD. The pipeline performs the following steps:
- Pulls the latest data version using DVC
- Trains models and logs them to MLflow
- Validates and deploys models via Docker containers
- Starts monitoring services (Prometheus, Grafana)
Configuration:
- Jenkinsfile defines stages for preprocessing, training, validation, API testing, and monitoring.
- Containers: `Dockerfile.train`, `Dockerfile.validate`, `Dockerfile.run`
- Monitoring: Prometheus scrapes metrics; Grafana displays dashboards
Benefits:
- Builds triggered on code commits or daily schedule
- Secrets handled via Jenkins credentials
- Consistent environments across dev, test, and prod
Monitoring Dashboards:
- Request latency histogram
- Recommendation count per user
- Recommendation item count
- User segment activity breakdown
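These dashboards imply metrics exported from the API for Prometheus to scrape. The sketch below uses the `prometheus_client` library; the metric names, label, and port are assumptions rather than the project's actual instrumentation.

```python
# Illustrative metric definitions; names, labels, and port are assumptions.
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "recommend_request_latency_seconds",
    "Latency of /recommend requests",
)
RECOMMENDATIONS_SERVED = Counter(
    "recommendations_served_total",
    "Number of recommended items returned",
    ["user_segment"],
)

start_http_server(9100)  # assumed metrics port, scraped by Prometheus


@REQUEST_LATENCY.time()
def handle_request(user_id, segment="regular"):
    items = list(range(20))  # stand-in for real recommendations
    RECOMMENDATIONS_SERVED.labels(user_segment=segment).inc(len(items))
    return items
```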
MLflow & DVC:
- MLflow tracks metrics, parameters, and artifacts for each model
- DVC manages dataset versions
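As a rough illustration of how a training run could be logged to MLflow, here is a minimal sketch; the experiment name, parameters, metric values, and artifact path are placeholders, not results from this project.

```python
# Illustrative MLflow logging; all values are placeholders.
import mlflow

mlflow.set_experiment("netflicks-recommender")  # assumed experiment name

with mlflow.start_run(run_name="als_training"):
    mlflow.log_param("factors", 64)
    mlflow.log_param("regularization", 0.01)
    mlflow.log_metric("rmse", 0.94)
    mlflow.log_metric("hit_rate_at_10", 0.31)
    # mlflow.log_artifact("models/cf_als.pkl")  # artifact path is illustrative
```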
Check out the demo: Netflicks Demo
