Video Query AI: Search your videos like you search text 🧠🎥

Overview

A fully local, privacy-first app that lets you semantically search through videos using natural language. Simply upload a video, and we’ll do the heavy lifting: extract frames, generate descriptions with AI, embed them, and enable fast search.

✨ Features

  • 🔍 Semantic search: Find scenes by describing them in plain English
  • 📤 Drag-and-drop video upload
  • 🖼️ Instant timestamps & thumbnails for search results
  • ⚡ Real-time resumable processing updates
  • 🔒 100% local, privacy-first architecture

App Walkthrough ✨

1️⃣ 🏠 Home & Video Library

  • Land on a clean dashboard with a sidebar listing all uploaded videos.
  • Browse or search through your video collection easily.

2️⃣ 📤 Video Upload & Processing

  • Drag and drop a video file for processing.
  • The backend kicks off a background job to:
    • Extract frames using ffmpeg
    • Generate descriptions via LLaVA
    • Generate vector embeddings of those descriptions for semantic search
  • Real-time progress updates
    • Updates are streamed live via WebSockets.
    • Progress persists across reloads using Redis Pub/Sub for state sync
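
The frame-extraction step above can be sketched with a thin wrapper around ffmpeg. This is a minimal illustration, not the project's actual code: the function names and the 1-frame-per-second sampling rate are assumptions.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video_path: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second as JPEGs."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",  # sampling rate: 1 frame/sec by default
        "-q:v", "2",          # high JPEG quality
        str(Path(out_dir) / "frame_%06d.jpg"),  # zero-padded frame index
    ]

def extract_frames(video_path: str, out_dir: str, fps: float = 1.0) -> None:
    """Run ffmpeg and write the sampled frames to `out_dir`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)
```

Keeping the sampling rate low (around one frame per second) keeps the LLaVA captioning step tractable, since each sampled frame costs one model inference.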

3️⃣ 🔍 Natural Language Search

  • Enter a query like "Where is the elephant?" or "Chef chopping onions"
  • The app performs a vector similarity search against frame captions.
  • You’ll get timestamps + thumbnails of the best matching moments in the video.

🛠 Under the Hood

Video Query AI follows a modular architecture built with:

  • Frontend: React + TypeScript with Vite and React Router
  • Backend: FastAPI serving REST and WebSocket endpoints
  • Job Queue: Redis + RQ for background processing
  • Embedding Store: ChromaDB for vector search
  • Realtime Updates: WebSockets with Redis Pub/Sub for progress tracking and resumable streams
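
As a rough sketch of how the job-queue layer fits together, here is what enqueuing a processing job with RQ could look like. The queue name, the `worker.process_video` import path, and the helper names are hypothetical, not taken from the repository.

```python
def is_supported_video(filename: str) -> bool:
    """Basic upload validation by file extension (assumed set of formats)."""
    return filename.lower().endswith((".mp4", ".mov", ".mkv", ".webm"))

def enqueue_processing(video_path: str, video_id: str) -> str:
    """Push a video-processing job onto a Redis queue; an RQ worker picks it up."""
    # Deferred imports so this sketch stays importable without the deps installed.
    from redis import Redis
    from rq import Queue

    q = Queue("video-processing", connection=Redis())  # assumed queue name
    job = q.enqueue(
        "worker.process_video",  # hypothetical worker function (dotted path)
        video_path, video_id,
        job_timeout="1h",        # frame extraction + captioning can be slow
    )
    return job.id
```

Running the job in a separate RQ worker process keeps the FastAPI request handler fast: the upload endpoint returns as soon as the job is queued.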

📼 Video Processing Flow

Here’s what happens behind the scenes when you upload a video:

  1. Upload
    • File is saved to disk and its metadata is stored in ChromaDB.
  2. Job Queuing
    • A video processing job is pushed to a Redis Queue and handled asynchronously by a worker.
  3. Frame Extraction
    • Frames are extracted from the video using ffmpeg.
  4. Frame Analysis
    • Each frame is sent through LLaVA (via Ollama) to describe it.
    • The description is embedded into a vector using a sentence-transformer model.
  5. Storage
    • Vector embeddings and metadata are stored in ChromaDB.
  6. Progress Updates
    • Real-time progress is sent to the frontend via WebSockets + Redis PubSub.
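
Steps 4 and 5 above can be sketched as follows. This is an illustration under stated assumptions: the prompt text, the `all-MiniLM-L6-v2` embedding model, and the helper names are not from the repository; the Ollama call uses its standard `/api/generate` endpoint, which accepts base64 images for multimodal models like LLaVA.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def frame_timestamp(frame_index: int, fps: float = 1.0) -> float:
    """Map a sampled frame's index back to its position (seconds) in the video."""
    return frame_index / fps

def describe_frame(image_path: str, model: str = "llava") -> str:
    """Ask LLaVA (via Ollama) for a one-sentence caption of a single frame."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": model,
        "prompt": "Describe this video frame in one sentence.",  # assumed prompt
        "images": [img_b64],
        "stream": False,  # get the whole caption in one response
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def embed_caption(caption: str) -> list[float]:
    """Turn a caption into a vector for ChromaDB storage."""
    # Deferred import: sentence-transformers is a heavyweight optional dep here.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    return model.encode(caption).tolist()
```

Because frames are sampled at a known rate, the frame index alone is enough to recover the timestamp stored alongside each embedding.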

🔍 Search Flow

Users can search across:

  • All uploaded videos
  • A single selected video

When a query is made:

  1. The backend embeds the query using the same embedding model.
  2. A vector similarity search is performed in ChromaDB.
  3. Top 10 closest matches (timestamps + thumbnails) are returned.

🔁 Real-Time Progress & Resumability

Even if the user refreshes the page mid-processing:

  • The frontend reconnects via WebSocket.
  • The backend reads the current job state from Redis and resumes updates seamlessly.
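
One common way to implement this snapshot-then-subscribe pattern is sketched below, using redis-py's asyncio client inside a FastAPI WebSocket handler. The key and channel naming schemes are assumptions for illustration, not the repository's actual conventions.

```python
def progress_channel(video_id: str) -> str:
    """Pub/Sub channel the worker publishes progress updates to."""
    return f"progress:{video_id}"

def progress_key(video_id: str) -> str:
    """Redis key holding the latest progress snapshot (survives reconnects)."""
    return f"progress:{video_id}:state"

async def stream_progress(websocket, video_id: str):
    """On (re)connect: replay the stored snapshot, then forward live updates."""
    # Deferred import: redis-py's asyncio client.
    import redis.asyncio as aioredis
    r = aioredis.Redis()

    # 1. Replay current state so a refreshed page catches up immediately.
    snapshot = await r.get(progress_key(video_id))
    if snapshot:
        await websocket.send_text(snapshot.decode())

    # 2. Then subscribe to live updates published by the worker.
    pubsub = r.pubsub()
    await pubsub.subscribe(progress_channel(video_id))
    async for msg in pubsub.listen():
        if msg["type"] == "message":
            await websocket.send_text(msg["data"].decode())
```

The worker writes each progress update twice: once to the snapshot key (for late joiners) and once to the Pub/Sub channel (for connected clients), which is what makes a mid-processing refresh seamless.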
