A voice assistant that integrates VideoSDK, a RAG pipeline, and two custom APIs for document handling: one to upload PDFs to a vector database and another to search relevant content.
The system provides a full voice flow:
STT → RAG (docs) → LLM → TTS
with a custom VideoSDK plugin for seamless integration.
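The cascade above can be sketched end to end. Every stage below is a hypothetical stand-in, not the project's actual code: the real pipeline wires Deepgram, Qdrant, an LLM, and ElevenLabs together through the VideoSDK plugin.

```python
# Toy sketch of the cascading pipeline: STT -> RAG -> LLM -> TTS.
# Each stage is a stub standing in for a real service call.

def stt(audio: bytes) -> str:
    # Stand-in for Deepgram speech-to-text.
    return audio.decode("utf-8")

def rag(query: str) -> list[str]:
    # Stand-in for Qdrant retrieval: return matching context chunks.
    corpus = {"heart": "The heart pumps blood."}
    return [text for key, text in corpus.items() if key in query.lower()]

def llm(query: str, context: list[str]) -> str:
    # Stand-in for the LLM call: echo the query plus retrieved context.
    return f"Q: {query} | context: {' '.join(context)}"

def tts(text: str) -> bytes:
    # Stand-in for ElevenLabs text-to-speech.
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    # Run the full cascade on one utterance.
    query = stt(audio)
    return tts(llm(query, rag(query)))
```

In the real agent each stage is streaming and interruptible; the sketch only shows the data flow between them.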
- Voice Interaction: Real-time speech recognition using Deepgram STT.
- Document Retrieval: Upload PDFs to a Qdrant vector database and retrieve context for LLM responses.
- RAG Pipeline: Retrieves top-k relevant chunks from uploaded documents for more accurate answers.
- Text-to-Speech: ElevenLabs TTS reads out LLM responses.
- Custom VideoSDK Plugin: Integrates STT, RAG, LLM, and TTS in a single cascading pipeline.
- Interruptible Conversation: User speech interrupts ongoing TTS or LLM generation.
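The top-k retrieval step can be illustrated with plain cosine similarity over pre-embedded chunks. The `top_k` helper and the vectors are illustrative only; the project delegates this work to Qdrant.

```python
# Illustrative top-k retrieval over (text, vector) chunk pairs.
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, vector) pairs; return the k most similar texts.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```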
- Sample .env

```
OPENAI_API_KEY=<your_openai_api_key>
DEEPGRAM_API_KEY=<your_deepgram_api_key>
ELEVENLABS_API=<your_elevenlabs_api_key>
ROOM_ID=<videosdk_room_id>
AUTH_TOKEN=<videosdk_auth_token>
VECTOR_DB_URL=<qdrant_url>
VECTOR_DB_API_KEY=<qdrant_api_key>
```

- Run Qdrant locally

```
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
```

- Create and activate a virtual env

```
py -m venv venv
venv\Scripts\activate
```

- Install requirements

```
pip install -r requirements.txt
```

- For document upload and retrieval

```
cd backend/
uvicorn main:app --reload --port 8000
```

- For the Voice Agent (in console)

```
python voice_agent.py console
```

- Example conversation

user_input: Guide to Videosdk integration with rag pipeline?
agent_output: Build an AI Agent with RAG using VideoSDK Agents SDK Goal Your task is to build a voice AI agent using the VideoSDK Agents SDK.T...... (from the uploaded document)
user_input: Explain human heart?
agent_output: The human heart is a muscular organ, roughly the size of a fist, located in the chest that pumps blood throughout the body... (LLM response)
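On the upload side, PDFs must be split into chunks before they are embedded into Qdrant. A minimal sketch of fixed-size chunking with overlap follows; `chunk_size` and `overlap` are illustrative defaults, not the project's actual values.

```python
# Illustrative fixed-size chunking with overlap for the PDF upload path.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a window of chunk_size characters, stepping by chunk_size - overlap,
    # so adjacent chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side.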