Inline090/Travel-Destination-Recommendation-system

Travel Destination Recommendation System

Overview

This project implements a scalable, hybrid travel recommendation system built with PySpark for the Databricks platform. It combines content-based filtering (driven by user-selected UI widgets) with collaborative filtering (the Alternating Least Squares, or ALS, algorithm) to deliver personalized top-15 travel destination rankings.

The system handles strict filtering requirements and falls back intelligently when the initial user constraints yield zero results, so it stays relevant and functional instead of failing.

Features

  • Hybrid Recommendation Architecture: Fuses content-based pre-filtering with matrix factorization.
  • Interactive UI Widgets: Built-in Databricks widgets allow users to define preferences:
    • Max Budget (USD/Day)
    • Climate Type
    • Travel Type
    • Preferred Season
  • Intelligent Fallback Logic: If the combined filters return no destinations, the "Climate" and then the "Season" constraints are relaxed in sequence, while the essential budget and travel-type requirements are always preserved.
  • Synthetic Interaction Generation: Bootstraps the ALS model with 15,000 dynamically generated synthetic ratings across 1,000 simulated users.
  • Evaluation Metrics: Measures predictive accuracy with Root Mean Square Error (RMSE).
  • Rich Output Format: Joins predictions back to the core dataset to display 17 dimensions of relevant destination data (e.g., predicted ratings, safety indexes, average temperature, visa requirements, currency).
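The fallback behavior described above can be sketched independently of Spark. A minimal plain-Python illustration, where the relaxation order (climate first, then season) follows the feature list but the function and field names are assumptions, not the repository's actual code:

```python
def filter_with_fallback(destinations, max_budget, travel_type, climate, season):
    """Apply all four filters; if nothing matches, relax climate, then season.

    Budget and travel type are never relaxed. Returns (matches, relaxed),
    where `relaxed` lists the constraints that were dropped.
    """
    def matches(d, use_climate, use_season):
        return (
            d["avg_cost_usd_per_day"] <= max_budget
            and d["travel_type"] == travel_type
            and (not use_climate or d["climate"] == climate)
            and (not use_season or d["best_season"] == season)
        )

    # Try strict filtering, then drop climate, then drop season as well.
    for relaxed in ([], ["climate"], ["climate", "season"]):
        hits = [d for d in destinations
                if matches(d, "climate" not in relaxed, "season" not in relaxed)]
        if hits:
            return hits, relaxed
    return [], ["climate", "season"]
```

In the real notebook the same cascade would be expressed as successive DataFrame `.filter()` calls, checking `.count()` after each attempt.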

Prerequisites

  • Platform: Apache Spark / Databricks Platform.
  • Runtime: Recommended DBR (Databricks Runtime) 10.4 LTS ML or higher to guarantee standard pyspark.ml and dbutils widget support.

Dataset

The recommendation model relies on a destination details dataset named ds1_grok. Make sure the dataset includes the following core columns:

  • country
  • destination
  • avg_cost_usd_per_day (Numeric)
  • travel_type
  • best_season
  • climate
  • And other dimensional columns required in final output (top_attraction, currency, safety_rating, etc.)

Note: The code first attempts to load the dataset as a native Databricks table (spark.table("ds1_grok")). If that is unavailable, it falls back to reading the raw CSV directly from DBFS (/FileStore/tables/ds1_grok.csv).
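The table-then-CSV fallback in the note is a plain try/except around two loaders. A minimal, Spark-agnostic sketch (the table name and DBFS path come from the note above; the helper itself is illustrative, not the repository's code):

```python
def load_with_fallback(load_table, load_csv):
    """Try the managed-table loader first; on any failure, fall back to CSV.

    `load_table` and `load_csv` are zero-argument callables, e.g.
      lambda: spark.table("ds1_grok")
      lambda: spark.read.csv("/FileStore/tables/ds1_grok.csv",
                             header=True, inferSchema=True)
    """
    try:
        return load_table()
    except Exception:
        return load_csv()
```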

Usage Instructions

  1. Import the Code: Create a new PySpark Notebook in your Databricks workspace and paste the script.
  2. Mount/Upload Data: Register ds1_grok.csv as a Spark table or upload the file to DBFS at /FileStore/tables/.
  3. Configure Dashboard Widgets: Run the first block to instantiate the UI. Interactive dropdown widgets will appear at the top of your notebook. Select your active preferences.
  4. Execute to Train & Predict: Run the remainder of the notebook.
    • Pre-filtering isolates valid destinations.
    • ALS collaborative filtering trains on the remaining candidate destinations.
    • The evaluation metric (RMSE) is printed to standard output.
    • The ranked top-15 destinations are rendered via Databricks' display() function.
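Before the training step runs, the notebook bootstraps ALS with the synthetic interactions mentioned under Features (15,000 ratings across 1,000 simulated users). A plain-Python sketch of that generation step; the rating scale and uniform distribution are assumptions:

```python
import random

def generate_synthetic_ratings(destination_ids, n_users=1000,
                               n_ratings=15000, seed=42):
    """Generate (user_id, destination_id, rating) triples to bootstrap ALS.

    Ratings are drawn uniformly from 1.0-5.0; the actual notebook might
    instead bias ratings toward each simulated user's preferences.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_ratings):
        rows.append((
            rng.randrange(n_users),           # user_id
            rng.choice(destination_ids),      # destination_id
            round(rng.uniform(1.0, 5.0), 1),  # rating
        ))
    return rows
```

In Spark, these rows would then be wrapped as a DataFrame, e.g. `spark.createDataFrame(rows, ["user_id", "destination_id", "rating"])`, before being fed to ALS.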

Pipeline Architecture

  1. Widget Ingestion: Captures stateful user preferences.
  2. Normalization: Concatenates and cleans destination strings, enforcing uniqueness.
  3. Content Filtering: Narrows down destinations via strict threshold logic, or via fallback cascades when needed.
  4. Collaborative Filtering (ALS):
    • Applies string indexing to convert categorical IDs to numeric form.
    • Splits interactions into training and test sets via Spark ML.
    • Trains the matrix factorization model.
  5. Prediction & Window Ranking: Computes predicted user affinity for the filtered destinations and ranks the top 15 with a window function, minimizing performance overhead.
  6. Data Presentation: Structures and explicitly types all final outputs for robust end-user display.
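Step 5's per-user top-15 ranking corresponds to a Spark window such as `Window.partitionBy("user_id").orderBy(desc("prediction"))`. For illustration, an equivalent plain-Python version (the triple layout mirrors ALS prediction output, but the function itself is an assumption, not the repository's code):

```python
from collections import defaultdict

def top_n_per_user(predictions, n=15):
    """Rank predictions per user by descending score and keep the top n.

    `predictions` is an iterable of (user_id, destination, score) triples,
    analogous to the rows produced by ALSModel.transform in the pipeline.
    """
    by_user = defaultdict(list)
    for user_id, destination, score in predictions:
        by_user[user_id].append((destination, score))
    return {
        user_id: sorted(rows, key=lambda r: r[1], reverse=True)[:n]
        for user_id, rows in by_user.items()
    }
```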
