End-to-end AI/ML system for NYC taxi fare prediction with multi-modal capabilities.

NYC Scout: Taxi Fare Predictor and AI Assistant

End-to-end ML system for NYC taxi fare prediction with multi-modal capabilities

A professional ML portfolio project featuring:

  • 🎯 XGBoost fare prediction with hyperparameter tuning
  • 🚀 Distributed training on Google Vertex AI
  • 🤖 RAG-powered NYC attractions chatbot
  • 🎙️ Multi-modal API (voice, text, chat)
  • 📊 Comprehensive data analysis with Plotly

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Google Cloud Platform account
  • gcloud CLI installed and authenticated

Installation

# Clone repository and enter it
git clone https://github.com/misran3/nyc-scout.git
cd nyc-scout

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install package
pip install -e .

# Copy environment template and edit with project details
cp .env.example .env
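
As a rough illustration of how the values in `.env` could be read from Python, here is a minimal sketch using only the standard library. It is not taken from the repository's `src/config.py`, whose contents are not shown here; the helper names `parse_env` and `load_env` and the `GCP_PROJECT_ID` key are hypothetical, while `VERTEX_AI_ENDPOINT_ID` is the key mentioned later in this README.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read the file if it exists; return an empty dict otherwise."""
    env_file = Path(path)
    return parse_env(env_file.read_text()) if env_file.exists() else {}


if __name__ == "__main__":
    # GCP_PROJECT_ID is a made-up key used only for this example.
    sample = 'GCP_PROJECT_ID=my-project\nVERTEX_AI_ENDPOINT_ID="1234567890"\n'
    print(parse_env(sample))
```

A dedicated package such as `python-dotenv` handles more edge cases (multi-line values, variable expansion); the sketch above only covers simple `KEY=VALUE` files.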

Set Up GCP Infrastructure

# Authenticate with GCP
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Run setup script to enable required APIs and create buckets
python scripts/setup_gcp_infrastructure.py \
  --project-id YOUR_PROJECT_ID \
  --region us-east1

Train a Model

# Local training (for testing)
python scripts/train.py \
  --max_depth 6 \
  --learning_rate 0.1 \
  --subsample 1.0 \
  --n_estimators 100

# Build a source distribution and upload the training package to GCS
python setup.py sdist
gsutil cp dist/*.tar.gz gs://YOUR_BUCKET/nyc-fare-predictor/dist/

# Launch Vertex AI hyperparameter tuning
gcloud ai hp-tuning-jobs create \
  --region=us-east1 \
  --display-name=nyc-fare-tuning-$(date +%m%d_%H%M) \
  --config=config/vertex_ai_training.yaml
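
The same flags serve both local runs and tuning trials, because Vertex AI hyperparameter tuning passes each trial's values to the training code as command-line arguments. The sketch below shows one way a training script could accept them. It is not the repository's `scripts/train.py`: the function name `parse_hyperparameters` is hypothetical, and the defaults simply mirror the local-training command above.

```python
import argparse


def parse_hyperparameters(argv=None) -> dict:
    """Parse the four tuning flags into a plain dict of hyperparameters."""
    parser = argparse.ArgumentParser(description="Hypothetical training flags")
    # Defaults are an assumption: they copy the values from the local run above.
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--subsample", type=float, default=1.0)
    parser.add_argument("--n_estimators", type=int, default=100)
    args = parser.parse_args(argv)
    # XGBoost expects the row subsampling ratio to lie in (0, 1].
    if not 0.0 < args.subsample <= 1.0:
        parser.error("--subsample must be in (0, 1]")
    return vars(args)


if __name__ == "__main__":
    print(parse_hyperparameters(["--max_depth", "8", "--learning_rate", "0.05"]))
```

The resulting dict could then be passed to an estimator such as `xgboost.XGBRegressor(**params)`, assuming the script trains through XGBoost's scikit-learn interface.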

Deploy Model

# Deploy trained model to Vertex AI endpoint
python scripts/deploy_model.py \
  --model-path gs://YOUR_BUCKET/path/to/model.bst \
  --model-name nyc-fare-xgboost \
  --endpoint-name nyc-fare-endpoint

# Update .env with endpoint ID
# VERTEX_AI_ENDPOINT_ID=<your-endpoint-id>
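
Once an endpoint exists, online prediction requests to Vertex AI carry their inputs under an `instances` key. The sketch below builds such a request body, assuming the model is served so that each instance is a flat list of numeric features. The helper name `build_predict_body` and the feature order shown are hypothetical; the real order must match whatever the deployed model was trained on.

```python
import json


def build_predict_body(feature_rows) -> str:
    """Wrap feature vectors in the {"instances": [...]} request envelope."""
    instances = [[float(value) for value in row] for row in feature_rows]
    return json.dumps({"instances": instances})


if __name__ == "__main__":
    # Hypothetical feature order: [distance_km, passenger_count, pickup_hour].
    print(build_predict_body([[3.2, 1, 17]]))
```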

Set Up RAG Knowledge Base

# Set up the RAG corpus and upload the knowledge base
python scripts/rag_pipeline.py --setup

# Test RAG system
python scripts/rag_pipeline.py --query "Tell me about museums in NYC"
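
To make the retrieve-then-prompt idea behind the chatbot concrete, here is a toy sketch in plain Python. It is not how `scripts/rag_pipeline.py` works: a real RAG corpus would typically use embedding-based retrieval, whereas this stand-in ranks documents by simple keyword overlap. The function names and the sample file names and texts are placeholders.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by how often the query's tokens occur in them."""
    query_tokens = set(tokenize(query))
    scores = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        scores[name] = sum(counts[token] for token in query_tokens)
    # Highest score first; ties broken by name so the order is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:top_k] if scores[name] > 0]


def build_prompt(query: str, documents: dict[str, str], top_k: int = 2) -> str:
    """Join the retrieved documents into a context block ahead of the question."""
    context = "\n\n".join(documents[name] for name in retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Placeholder documents, not the repository's knowledge_base/ files.
    docs = {
        "met.md": "The Metropolitan Museum of Art is an art museum on Fifth Avenue.",
        "park.md": "Central Park is a public park in Manhattan.",
    }
    print(build_prompt("Which museum has art?", docs))
```

Because the tokenizer does no stemming, a query word like "museums" would not match "museum" in a document; embedding-based retrieval avoids that kind of brittleness.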

📊 Architecture

Project Structure

nyc-scout/
├── src/                   # Core library code
│   ├── config.py          # Configuration management
│   ├── features/          # Feature engineering
│   ├── data/              # Data loading
│   ├── gcp/               # GCP utilities
│   └── rag/               # RAG pipeline
├── scripts/               # Executable scripts
│   ├── train.py           # Model training
│   ├── deploy_model.py    # Model deployment
│   ├── rag_pipeline.py    # RAG setup and queries
│   └── setup_gcp_infrastructure.py
├── api/                   # Flask API (to be completed)
├── notebooks/             # Data analysis (to be completed)
├── knowledge_base/        # NYC attractions (17 files)
├── config/                # Configuration files
├── infrastructure/        # Deployment scripts (to be completed)
└── data/                  # Training data (Source: https://www.kaggle.com/competitions/new-york-city-taxi-fare-prediction)
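
As an illustration of the kind of logic that could live under `src/features/`, the sketch below computes the great-circle (haversine) distance between pickup and dropoff points, a common feature for fare models. Whether the repository uses this exact feature is an assumption. The column names follow the Kaggle dataset linked above; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_distance_km(row: dict) -> float:
    """Distance feature for one record keyed by the Kaggle column names."""
    return haversine_km(
        row["pickup_latitude"], row["pickup_longitude"],
        row["dropoff_latitude"], row["dropoff_longitude"],
    )


if __name__ == "__main__":
    # Illustrative coordinates only, not a record from the dataset.
    ride = {
        "pickup_latitude": 40.75, "pickup_longitude": -73.99,
        "dropoff_latitude": 40.77, "dropoff_longitude": -73.96,
    }
    print(round(trip_distance_km(ride), 2), "km")
```

Straight-line distance understates the distance actually driven on a street grid, so it is usually combined with other features such as pickup time and passenger count.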
