
🎙️ Realtime AI Voice Orchestrator

A production-ready voice AI system combining LiveKit WebRTC, RAG retrieval, and multi-model inference — all in your browser.


Realtime AI Voice Agent UI

Live transcript, RAG sources, knowledge base, and system prompt editor


Architecture

flowchart LR
    subgraph Browser["Browser  (React + Vite)"]
        direction TB
        UI["Voice Call UI"]
        TP["Transcript Panel"]
        RP["RAG Sources Panel"]
        KB["Knowledge Base"]
        PE["System Prompt Editor"]
    end

    subgraph FastAPI["FastAPI Server"]
        direction TB
        T["/api/livekit"]
        D["/api/kb"]
        P["/api/prompt"]
        R["/api/rag"]
    end

    subgraph Agent["LiveKit AI Agent"]
        direction TB
        STT["Groq STT (Whisper)"]
        LLM["Groq LLM (Llama 3)"]
        TTS["Deepgram TTS (Aura)"]
        VAD["Silero VAD"]
        CB["on_user_turn_completed\nLive Prompt Reload + RAG Injection"]
    end

    subgraph RAG["RAG Pipeline"]
        direction LR
        PDF["PDF Parser"]
        CHK["Semantic Chunker"]
        EMB["BAAI/bge Embedder"]
        VDB[("Qdrant Vector DB")]
        RRK["Cross-Encoder Reranker"]
    end

    subgraph Cloud["External Services"]
        direction TB
        LK["LiveKit Cloud (WebRTC)"]
        GQ["Groq API"]
        DG["Deepgram API"]
    end

    Browser <-->|"REST"| FastAPI
    Browser <-->|"WebRTC Audio"| LK
    Agent   <-->|"RTC Session"| LK
    STT     -->  GQ
    LLM     -->  GQ
    CB      -->  GQ
    TTS     -->  DG
    CB      -->|"query"| PDF
    PDF --> CHK --> EMB --> VDB
    VDB -->|"top-K"| RRK
    RRK -->|"context"| CB
    Agent   -->|"POST /api/rag/push"| R
    Browser <-->|"poll /api/rag/latest"| R

    style Browser  fill:#1e3a5f,stroke:#3b82f6,color:#fff
    style FastAPI  fill:#1e2d4e,stroke:#6366f1,color:#fff
    style Agent    fill:#3b1f5e,stroke:#a855f7,color:#fff
    style RAG      fill:#1a3a3a,stroke:#14b8a6,color:#fff
    style Cloud    fill:#1a2e1a,stroke:#22c55e,color:#fff

Features

| Feature | Description |
| --- | --- |
| Real-time Voice | WebRTC audio via LiveKit — sub-second latency |
| RAG Q&A | Upload PDFs → chunk → embed → retrieve → answer |
| Live Transcript | Speaker-labelled transcript streamed in real time |
| RAG Source Panel | See exactly which document pages the agent used |
| Live Prompt Editor | Change the agent's personality mid-call — no restart needed |
| Knowledge Base UI | Upload / delete documents with ingestion status |
| Two-stage Retrieval | Bi-encoder ANN (Qdrant) + Cross-Encoder reranker |
| Dedup & Cleanup | Auto-removes orphaned Qdrant vectors on startup |

Tech Stack

Backend

| Layer | Technology |
| --- | --- |
| API Server | FastAPI + Uvicorn |
| Voice Agent | LiveKit Agents SDK |
| STT | Groq Whisper (livekit-plugins-groq) |
| LLM | Groq Llama 3 (livekit-plugins-groq) |
| TTS | Deepgram Aura (livekit-plugins-deepgram) |
| VAD | Silero VAD (livekit-plugins-silero) |
| Embedder | BAAI/bge-large-en-v1.5 (local, sentence-transformers) |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 (local) |
| Vector DB | Qdrant (local Docker) |
| PDF Parser | PyMuPDF |

Frontend

| Layer | Technology |
| --- | --- |
| Framework | React 18 + Vite + TypeScript |
| Voice | @livekit/components-react, livekit-client |
| Styling | Tailwind CSS |
| Icons | Lucide React |

RAG Pipeline

The retrieval pipeline runs as a two-stage process triggered on every user turn, inside on_user_turn_completed:

User query
    │
    ▼
[Stage 1]  Bi-Encoder ANN Search
    Embed query with BAAI/bge-large-en-v1.5
    → cosine ANN search in Qdrant  (top-K candidates, default K=20)
    │
    ▼
[Stage 1.5]  Deduplication
    Group candidates by text[:200] — keeps highest bi-score per unique chunk
    Eliminates duplicates from repeated document uploads
    │
    ▼
[Stage 2]  Cross-Encoder Reranking
    Score each (query, chunk) pair with cross-encoder/ms-marco-MiniLM-L-6-v2
    Re-sort by cross-encoder score  →  top-N returned (default N=3)
    │
    ▼
[Stage 2.5]  Post-Rerank Dedup (safety net)
    Deduplicate by chunk_id — catches any duplicates with differing leading text
    │
    ▼
Injected as a system message into ChatContext
(placed just before the user message via created_at ordering)
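The Stage 1.5 dedup can be sketched in isolation. The candidate shape below (`text` and `bi_score` keys) is an assumption for illustration; the actual structures in retriever.py may differ:

```python
# Deduplicate ANN candidates by their first 200 characters, keeping
# the highest bi-encoder score per unique chunk (Stage 1.5 above).
def dedup_candidates(candidates: list[dict]) -> list[dict]:
    best: dict[str, dict] = {}
    for cand in candidates:
        key = cand["text"][:200]
        if key not in best or cand["bi_score"] > best[key]["bi_score"]:
            best[key] = cand
    # Preserve descending score order for the reranker stage.
    return sorted(best.values(), key=lambda c: c["bi_score"], reverse=True)
```

Grouping by a text prefix rather than by chunk ID is what catches identical chunks produced by re-uploading the same document, since those get distinct IDs but identical text.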

Ingestion pipeline (triggered on PDF upload):

PDF upload → PyMuPDF page extraction
    → sentence tokenisation (NLTK punkt)
    → sliding-window chunking (max 200 tokens, 50-token overlap)
    → batch embedding with BAAI/bge-large-en-v1.5
    → upsert to Qdrant (stable chunk_id hash as point ID)
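The sliding-window step can be sketched as below. Token counting is simplified to whitespace words here; the real chunker.py works on NLTK punkt sentence boundaries, so treat this as an illustration of the window/overlap mechanics only:

```python
# Sliding-window chunking: pack sentence words into chunks of at most
# `max_tokens`, carrying the last `overlap` words into the next chunk.
def chunk_sentences(sentences: list[str], max_tokens: int = 200, overlap: int = 50) -> list[str]:
    chunks: list[str] = []
    window: list[str] = []
    for sent in sentences:
        window.extend(sent.split())
        if len(window) >= max_tokens:
            chunks.append(" ".join(window[:max_tokens]))
            # Keep the overlap tail of the emitted chunk, plus any leftover words.
            window = window[max_tokens - overlap:max_tokens] + window[max_tokens:]
    if window:
        chunks.append(" ".join(window))
    return chunks
```

The 50-token overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side.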

On every backend startup, orphaned Qdrant vectors (doc_ids not in doc_metadata.json) are automatically purged.
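The purge reduces to a set difference between doc_ids found in Qdrant payloads and those recorded in doc_metadata.json. A minimal sketch, assuming the metadata file is keyed by doc_id; the actual Qdrant delete (presumably a delete-by-payload-filter in store.py) is omitted:

```python
import json
from pathlib import Path

def find_orphans(stored_doc_ids: set[str], metadata_path: Path) -> set[str]:
    """doc_ids present in the vector store but absent from doc_metadata.json."""
    meta = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    # Everything Qdrant knows about that the metadata file doesn't is an orphan.
    return stored_doc_ids - set(meta.keys())
```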

Project Structure

RealtimeAIVoice/
├── backend/
│   ├── agent.py               # LiveKit Agent — STT/LLM/TTS + RAG injection
│   ├── main.py                # FastAPI app — CORS, routers, startup cleanup
│   ├── system_prompt.txt      # Agent personality (editable live in UI)
│   ├── api/
│   │   ├── kb.py              # /api/kb — upload, list, delete, cleanup
│   │   ├── prompt.py          # /api/prompt — read/write system prompt
│   │   ├── rag_sources.py     # /api/rag — push + poll RAG events
│   │   └── livekit_token.py   # /api/livekit — JWT token generation
│   └── rag/
│       ├── parser.py          # PDF page extraction
│       ├── chunker.py         # Sliding-window semantic chunking
│       ├── embedder.py        # BAAI/bge-large-en-v1.5 embeddings
│       ├── store.py           # Qdrant upsert / search / orphan cleanup
│       ├── reranker.py        # Cross-Encoder reranking
│       ├── retriever.py       # Full pipeline: ANN → dedup → rerank
│       └── ingestor.py        # Orchestrates the ingestion flow
├── Frontend/
│   └── src/
│       ├── App.tsx
│       ├── components/
│       │   ├── CallControl.tsx
│       │   ├── TranscriptPanel.tsx
│       │   ├── RagSourcesPanel.tsx
│       │   ├── KnowledgeBase.tsx
│       │   └── PromptEditor.tsx
│       └── hooks/
│           ├── useRagSources.ts
│           └── useKnowledgeBase.ts
└── docs/
    └── image.png              # UI screenshot

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Qdrant running on localhost:6333
  • API keys: LiveKit, Groq, Deepgram

1. Clone & configure

git clone <repo-url>
cd RealtimeAIVoice
cp backend/.env.example backend/.env

Edit backend/.env:

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key

2. Start Qdrant

docker run -p 6333:6333 qdrant/qdrant

3. Start the backend

cd backend
run_backend.bat          # Windows
# or:
uv run uvicorn main:app --reload --port 8000

4. Start the LiveKit Agent

cd backend
uv run python agent.py start

5. Start the frontend

cd Frontend
npm install
npm run dev              # http://localhost:5173

How It Works

User speaks → Groq Whisper (STT) → on_user_turn_completed()
                                         │
                           ┌─────────────┴──────────────┐
                           │                            │
                    Live prompt reload           RAG retrieval
                    (re-reads system_prompt.txt)  (embed → Qdrant → rerank)
                           │                            │
                           └────────── Groq LLM ────────┘
                                           │
                               Deepgram Aura (TTS) → User hears answer
                                           │
                               POST /api/rag/push → Frontend polls → RAG panel updates
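The two branches in the middle of the diagram (live prompt reload + RAG injection) amount to message assembly before the LLM call. A sketch of that logic on its own, outside the LiveKit callback — the function name and message shape are hypothetical, not the Agents SDK API:

```python
from pathlib import Path

def build_turn_messages(prompt_path: Path, rag_chunks: list[str], user_text: str) -> list[dict]:
    """Re-read the live system prompt, then inject retrieved context as a
    system message placed just before the user message."""
    # Reading the file on every turn is what makes mid-call prompt edits take effect.
    messages = [{"role": "system", "content": prompt_path.read_text()}]
    if rag_chunks:
        context = "Relevant context:\n" + "\n---\n".join(rag_chunks)
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": user_text})
    return messages
```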

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| LIVEKIT_URL | Yes | LiveKit server WebSocket URL |
| LIVEKIT_API_KEY | Yes | LiveKit API key |
| LIVEKIT_API_SECRET | Yes | LiveKit API secret |
| GROQ_API_KEY | Yes | Groq API key (STT + LLM) |
| DEEPGRAM_API_KEY | Yes | Deepgram API key (TTS) |
| QDRANT_URL | No | Qdrant URL (default: http://localhost:6333) |
| RAG_TOP_K | No | ANN candidates from Qdrant (default: 20) |
| RAG_TOP_N_RERANK | No | Final chunks after reranking (default: 3) |
| CHUNK_MAX_TOKENS | No | Max tokens per chunk (default: 200) |
| CHUNK_OVERLAP_TOKENS | No | Token overlap between chunks (default: 50) |
| CROSS_ENCODER_MODEL | No | Reranker model name (default: cross-encoder/ms-marco-MiniLM-L-6-v2) |

Demo

A live demo of the application is available here: Live Demo


Built with LiveKit · Groq · Deepgram · Qdrant
