🎙️ Realtime AI Voice Orchestrator

A production-ready voice AI system combining LiveKit WebRTC, retrieval-augmented generation (RAG), and multi-model inference, with a browser-based UI.

Screenshot (docs/image.png): live transcript, RAG sources, knowledge base, and system prompt editor

Architecture

flowchart LR
    subgraph Browser["Browser  (React + Vite)"]
        direction TB
        UI["Voice Call UI"]
        TP["Transcript Panel"]
        RP["RAG Sources Panel"]
        KB["Knowledge Base"]
        PE["System Prompt Editor"]
    end

    subgraph FastAPI["FastAPI Server"]
        direction TB
        T["/api/livekit"]
        D["/api/kb"]
        P["/api/prompt"]
        R["/api/rag"]
    end

    subgraph Agent["LiveKit AI Agent"]
        direction TB
        STT["Groq STT (Whisper)"]
        LLM["Groq LLM (Llama 3)"]
        TTS["Deepgram TTS (Aura)"]
        VAD["Silero VAD"]
        CB["on_user_turn_completed\nLive Prompt Reload + RAG Injection"]
    end

    subgraph RAG["RAG Pipeline"]
        direction LR
        PDF["PDF Parser"]
        CHK["Semantic Chunker"]
        EMB["BAAI/bge Embedder"]
        VDB[("Qdrant Vector DB")]
        RRK["Cross-Encoder Reranker"]
    end

    subgraph Cloud["External Services"]
        direction TB
        LK["LiveKit Cloud (WebRTC)"]
        GQ["Groq API"]
        DG["Deepgram API"]
    end

    Browser <-->|"REST"| FastAPI
    Browser <-->|"WebRTC Audio"| LK
    Agent   <-->|"RTC Session"| LK
    STT     -->  GQ
    LLM     -->  GQ
    CB      -->  GQ
    TTS     -->  DG
    CB      -->|"query"| EMB
    D       -->|"PDF upload"| PDF
    PDF --> CHK --> EMB --> VDB
    VDB -->|"top-K"| RRK
    RRK -->|"context"| CB
    Agent   -->|"POST /api/rag/push"| R
    Browser <-->|"poll /api/rag/latest"| R

    style Browser  fill:#1e3a5f,stroke:#3b82f6,color:#fff
    style FastAPI  fill:#1e2d4e,stroke:#6366f1,color:#fff
    style Agent    fill:#3b1f5e,stroke:#a855f7,color:#fff
    style RAG      fill:#1a3a3a,stroke:#14b8a6,color:#fff
    style Cloud    fill:#1a2e1a,stroke:#22c55e,color:#fff

Features

| Feature | Description |
| --- | --- |
| Real-time Voice | WebRTC audio via LiveKit, sub-second latency |
| RAG Q&A | Upload PDFs → chunk → embed → retrieve → answer |
| Live Transcript | Speaker-labelled transcript streamed in real time |
| RAG Source Panel | See exactly which document pages the agent used |
| Live Prompt Editor | Change the agent's personality mid-call, no restart needed |
| Knowledge Base UI | Upload / delete documents with ingestion status |
| Two-stage Retrieval | Bi-encoder ANN (Qdrant) + Cross-Encoder reranker |
| Dedup & Cleanup | Auto-removes orphaned Qdrant vectors on startup |

Tech Stack

Backend

| Layer | Technology |
| --- | --- |
| API Server | FastAPI + Uvicorn |
| Voice Agent | LiveKit Agents SDK |
| STT | Groq Whisper (`livekit-plugins-groq`) |
| LLM | Groq Llama 3 (`livekit-plugins-groq`) |
| TTS | Deepgram Aura (`livekit-plugins-deepgram`) |
| VAD | Silero VAD (`livekit-plugins-silero`) |
| Embedder | `BAAI/bge-large-en-v1.5` (local, sentence-transformers) |
| Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` (local) |
| Vector DB | Qdrant (local Docker) |
| PDF Parser | PyMuPDF |

Frontend

| Layer | Technology |
| --- | --- |
| Framework | React 18 + Vite + TypeScript |
| Voice | `@livekit/components-react`, `livekit-client` |
| Styling | Tailwind CSS |
| Icons | Lucide React |

RAG Pipeline

Retrieval runs as a two-stage process on every user turn, inside the `on_user_turn_completed` callback:

User query
    │
    ▼
[Stage 1]  Bi-Encoder ANN Search
    Embed query with BAAI/bge-large-en-v1.5
    → cosine ANN search in Qdrant  (top-K candidates, default K=20)
    │
    ▼
[Stage 1.5]  Deduplication
    Group candidates by text[:200] — keeps highest bi-score per unique chunk
    Eliminates duplicates from repeated document uploads
    │
    ▼
[Stage 2]  Cross-Encoder Reranking
    Score each (query, chunk) pair with cross-encoder/ms-marco-MiniLM-L-6-v2
    Re-sort by cross-encoder score  →  top-N returned (default N=3)
    │
    ▼
[Stage 2.5]  Post-Rerank Dedup (safety net)
    Deduplicate by chunk_id — catches any duplicates with differing leading text
    │
    ▼
Injected as a system message into ChatContext
(placed just before the user message via created_at ordering)
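
The dedup and rerank stages can be sketched in plain Python. This is a minimal illustration, not the code in `retriever.py`: the `Candidate` type, the function names, and `toy_cross_score` are hypothetical, and the real pipeline would score pairs with the cross-encoder model rather than the word-overlap stand-in used here. It also assumes the chunk_id dedup is applied while the top-N list is being filled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    chunk_id: str
    text: str
    bi_score: float  # similarity returned by the ANN search (stage 1)


def dedup_by_prefix(candidates: list[Candidate], prefix_len: int = 200) -> list[Candidate]:
    """Stage 1.5: keep the highest bi-score candidate per text[:prefix_len]."""
    best: dict[str, Candidate] = {}
    for cand in candidates:
        key = cand.text[:prefix_len]
        if key not in best or cand.bi_score > best[key].bi_score:
            best[key] = cand
    return list(best.values())


def rerank(
    query: str,
    candidates: list[Candidate],
    cross_score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[Candidate]:
    """Stages 2 and 2.5: sort by cross score, skip repeated chunk_ids, keep top_n."""
    scored = sorted(candidates, key=lambda c: cross_score(query, c.text), reverse=True)
    seen: set[str] = set()
    result: list[Candidate] = []
    for cand in scored:
        if cand.chunk_id in seen:
            continue
        seen.add(cand.chunk_id)
        result.append(cand)
        if len(result) == top_n:
            break
    return result


def toy_cross_score(query: str, text: str) -> float:
    """Stand-in scorer (assumption): counts query words present in the chunk."""
    words = set(text.lower().split())
    return float(sum(1 for w in query.lower().split() if w in words))
```

Passing the scorer as a callable keeps the dedup and ordering logic testable without loading a model.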

Ingestion pipeline (triggered on PDF upload):

PDF upload → PyMuPDF page extraction
    → sentence tokenisation (NLTK punkt)
    → sliding-window chunking (max 200 tokens, 50-token overlap)
    → batch embedding with BAAI/bge-large-en-v1.5
    → upsert to Qdrant (stable chunk_id hash as point ID)
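
A rough sketch of the chunking and ID steps follows. It is not the code in `chunker.py`: tokens are approximated by whitespace-separated words, sentences are assumed to be already split, and the function names and the `doc_id:page:text` key are hypothetical. A UUID is used for the ID because Qdrant point IDs must be unsigned integers or UUIDs.

```python
import uuid


def sliding_window_chunks(
    sentences: list[str], max_tokens: int = 200, overlap_tokens: int = 50
) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens tokens.

    Each new chunk starts with trailing sentences of the previous chunk,
    totalling at most overlap_tokens tokens. A single sentence longer than
    max_tokens becomes its own oversized chunk.
    """
    chunks: list[str] = []
    window: list[str] = []
    window_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        if window and window_len + n > max_tokens:
            chunks.append(" ".join(window))
            # Carry trailing sentences of the flushed chunk over as overlap.
            carried: list[str] = []
            carried_len = 0
            for prev in reversed(window):
                prev_len = len(prev.split())
                if carried_len + prev_len > overlap_tokens:
                    break
                carried.insert(0, prev)
                carried_len += prev_len
            # Drop overlap that would push the new chunk past max_tokens.
            while carried and carried_len + n > max_tokens:
                carried_len -= len(carried.pop(0).split())
            window, window_len = carried, carried_len
        window.append(sentence)
        window_len += n
    if window:
        chunks.append(" ".join(window))
    return chunks


def stable_point_id(doc_id: str, page: int, text: str) -> str:
    """Deterministic UUID: re-ingesting the same chunk yields the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_OID, f"{doc_id}:{page}:{text}"))
```

Because the ID is derived from the content, upserting the same chunk twice overwrites the existing point instead of adding a second one.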

On every backend startup, orphaned Qdrant vectors (those whose doc_id is not listed in doc_metadata.json) are purged automatically.
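
The orphan check reduces to a set difference. The sketch below is illustrative only: it assumes `doc_metadata.json` is a JSON object keyed by doc_id and that each vector payload carries a `doc_id` field, neither of which is confirmed here, and both function names are hypothetical. Deleting the orphaned points from Qdrant is left out.

```python
import json
from pathlib import Path


def load_known_doc_ids(metadata_path: Path) -> set[str]:
    """Read doc IDs from the metadata file (assumed: JSON object keyed by doc_id)."""
    if not metadata_path.exists():
        return set()
    return set(json.loads(metadata_path.read_text(encoding="utf-8")))


def find_orphan_doc_ids(payloads: list[dict], known_doc_ids: set[str]) -> set[str]:
    """Return doc_ids present in vector payloads but missing from the metadata."""
    stored = {p["doc_id"] for p in payloads if "doc_id" in p}
    return stored - known_doc_ids
```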

Project Structure

RealtimeAIVoice/
├── backend/
│   ├── agent.py               # LiveKit Agent — STT/LLM/TTS + RAG injection
│   ├── main.py                # FastAPI app — CORS, routers, startup cleanup
│   ├── system_prompt.txt      # Agent personality (editable live in UI)
│   ├── api/
│   │   ├── kb.py              # /api/kb — upload, list, delete, cleanup
│   │   ├── prompt.py          # /api/prompt — read/write system prompt
│   │   ├── rag_sources.py     # /api/rag — push + poll RAG events
│   │   └── livekit_token.py   # /api/livekit — JWT token generation
│   └── rag/
│       ├── parser.py          # PDF page extraction
│       ├── chunker.py         # Sliding-window semantic chunking
│       ├── embedder.py        # BAAI/bge-large-en-v1.5 embeddings
│       ├── store.py           # Qdrant upsert / search / orphan cleanup
│       ├── reranker.py        # Cross-Encoder reranking
│       ├── retriever.py       # Full pipeline: ANN → dedup → rerank
│       └── ingestor.py        # Orchestrates the ingestion flow
├── Frontend/
│   └── src/
│       ├── App.tsx
│       ├── components/
│       │   ├── CallControl.tsx
│       │   ├── TranscriptPanel.tsx
│       │   ├── RagSourcesPanel.tsx
│       │   ├── KnowledgeBase.tsx
│       │   └── PromptEditor.tsx
│       └── hooks/
│           ├── useRagSources.ts
│           └── useKnowledgeBase.ts
└── docs/
    └── image.png              # UI screenshot

Quick Start

Prerequisites

Python 3.11+
Node.js 18+
Qdrant running on localhost:6333
API keys: LiveKit, Groq, Deepgram

1. Clone & configure

git clone <repo-url>
cd RealtimeAIVoice
cp backend/.env.example backend/.env

Edit backend/.env:

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key

2. Start Qdrant

docker run -p 6333:6333 qdrant/qdrant

3. Start the backend

cd backend
run_backend.bat          # Windows
# or:
uv run uvicorn main:app --reload --port 8000

4. Start the LiveKit Agent

cd backend
uv run python agent.py start

5. Start the frontend

cd Frontend
npm install
npm run dev              # http://localhost:5173

How It Works

User speaks → Groq Whisper (STT) → on_user_turn_completed()
                                         │
                           ┌─────────────┴──────────────┐
                           │                            │
                    Live prompt reload           RAG retrieval
                    (re-reads system_prompt.txt)  (embed → Qdrant → rerank)
                           │                            │
                           └────────── Groq LLM ────────┘
                                           │
                               Deepgram Aura (TTS) → User hears answer
                                           │
                               POST /api/rag/push → Frontend polls → RAG panel updates
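
The two branches of the turn handler can be sketched without the LiveKit SDK. Everything here is hypothetical: `Message` stands in for the SDK's chat items, and `reload_prompt` and `inject_context` are illustrative names. The sketch only shows the idea of re-reading the prompt file each turn and giving the RAG message a timestamp just before the user message so that time ordering places it first.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Message:
    role: str
    content: str
    created_at: float


def reload_prompt(path: Path, fallback: str = "You are a helpful voice assistant.") -> str:
    """Re-read the prompt file so edits made mid-call apply on the next turn."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except OSError:
        return fallback
    return text or fallback


def inject_context(history: list[Message], user_msg: Message, context: str) -> list[Message]:
    """Return history, RAG context, and user message ordered by created_at."""
    # Timestamp the context slightly earlier than the user message.
    rag_msg = Message("system", context, user_msg.created_at - 0.001)
    return sorted([*history, rag_msg, user_msg], key=lambda m: m.created_at)
```

Reading the file on every turn trades a small amount of disk I/O for not needing any restart or cache invalidation when the prompt changes.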

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| `LIVEKIT_URL` | Yes | LiveKit server WebSocket URL |
| `LIVEKIT_API_KEY` | Yes | LiveKit API key |
| `LIVEKIT_API_SECRET` | Yes | LiveKit API secret |
| `GROQ_API_KEY` | Yes | Groq API key (STT + LLM) |
| `DEEPGRAM_API_KEY` | Yes | Deepgram API key (TTS) |
| `QDRANT_URL` | No | Qdrant URL (default: `http://localhost:6333`) |
| `RAG_TOP_K` | No | ANN candidates from Qdrant (default: `20`) |
| `RAG_TOP_N_RERANK` | No | Final chunks after reranking (default: `3`) |
| `CHUNK_MAX_TOKENS` | No | Max tokens per chunk (default: `200`) |
| `CHUNK_OVERLAP_TOKENS` | No | Token overlap between chunks (default: `50`) |
| `CROSS_ENCODER_MODEL` | No | Reranker model name (default: `cross-encoder/ms-marco-MiniLM-L-6-v2`) |
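
One way to read these variables is a small settings loader that fails fast on missing required keys and applies the documented defaults. This is a sketch, not the project's configuration code: `RagSettings`, `REQUIRED`, and `load_settings` are hypothetical names, and only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GROQ_API_KEY",
    "DEEPGRAM_API_KEY",
)


@dataclass(frozen=True)
class RagSettings:
    qdrant_url: str
    top_k: int
    top_n_rerank: int
    chunk_max_tokens: int
    chunk_overlap_tokens: int
    cross_encoder_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> RagSettings:
    """Validate required variables and apply defaults for the optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return RagSettings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        top_k=int(env.get("RAG_TOP_K", "20")),
        top_n_rerank=int(env.get("RAG_TOP_N_RERANK", "3")),
        chunk_max_tokens=int(env.get("CHUNK_MAX_TOKENS", "200")),
        chunk_overlap_tokens=int(env.get("CHUNK_OVERLAP_TOKENS", "50")),
        cross_encoder_model=env.get(
            "CROSS_ENCODER_MODEL", "cross-encoder/ms-marco-MiniLM-L-6-v2"
        ),
    )
```

Accepting the environment as a mapping makes the loader easy to exercise with a plain dict instead of the process environment.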

Demo

A live demo of the application is available here: Live Demo

Built with LiveKit · Groq · Deepgram · Qdrant
