A production-ready voice AI system combining LiveKit WebRTC, RAG retrieval, and multi-model inference — all in your browser.
flowchart LR
subgraph Browser["Browser (React + Vite)"]
direction TB
UI["Voice Call UI"]
TP["Transcript Panel"]
RP["RAG Sources Panel"]
KB["Knowledge Base"]
PE["System Prompt Editor"]
end
subgraph FastAPI["FastAPI Server"]
direction TB
T["/api/livekit"]
D["/api/kb"]
P["/api/prompt"]
R["/api/rag"]
end
subgraph Agent["LiveKit AI Agent"]
direction TB
STT["Groq STT (Whisper)"]
LLM["Groq LLM (Llama 3)"]
TTS["Deepgram TTS (Aura)"]
VAD["Silero VAD"]
CB["on_user_turn_completed\nLive Prompt Reload + RAG Injection"]
end
subgraph RAG["RAG Pipeline"]
direction LR
PDF["PDF Parser"]
CHK["Semantic Chunker"]
EMB["BAAI/bge Embedder"]
VDB[("Qdrant Vector DB")]
RRK["Cross-Encoder Reranker"]
end
subgraph Cloud["External Services"]
direction TB
LK["LiveKit Cloud (WebRTC)"]
GQ["Groq API"]
DG["Deepgram API"]
end
Browser <-->|"REST"| FastAPI
Browser <-->|"WebRTC Audio"| LK
Agent <-->|"RTC Session"| LK
STT --> GQ
LLM --> GQ
CB --> GQ
TTS --> DG
CB -->|"query"| PDF
PDF --> CHK --> EMB --> VDB
VDB -->|"top-K"| RRK
RRK -->|"context"| CB
Agent -->|"POST /api/rag/push"| R
Browser <-->|"poll /api/rag/latest"| R
style Browser fill:#1e3a5f,stroke:#3b82f6,color:#fff
style FastAPI fill:#1e2d4e,stroke:#6366f1,color:#fff
style Agent fill:#3b1f5e,stroke:#a855f7,color:#fff
style RAG fill:#1a3a3a,stroke:#14b8a6,color:#fff
style Cloud fill:#1a2e1a,stroke:#22c55e,color:#fff
| Feature | Description |
|---|---|
| Real-time Voice | WebRTC audio via LiveKit — sub-second latency |
| RAG Q&A | Upload PDFs → chunk → embed → retrieve → answer |
| Live Transcript | Speaker-labelled transcript streamed in real time |
| RAG Source Panel | See exactly which document pages the agent used |
| Live Prompt Editor | Change the agent's personality mid-call — no restart needed |
| Knowledge Base UI | Upload / delete documents with ingestion status |
| Two-stage Retrieval | Bi-encoder ANN (Qdrant) + Cross-Encoder reranker |
| Dedup & Cleanup | Auto-removes orphaned Qdrant vectors on startup |
| Layer | Technology |
|---|---|
| API Server | FastAPI + Uvicorn |
| Voice Agent | LiveKit Agents SDK |
| STT | Groq Whisper (livekit-plugins-groq) |
| LLM | Groq Llama 3 (livekit-plugins-groq) |
| TTS | Deepgram Aura (livekit-plugins-deepgram) |
| VAD | Silero VAD (livekit-plugins-silero) |
| Embedder | BAAI/bge-large-en-v1.5 (local, sentence-transformers) |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 (local) |
| Vector DB | Qdrant (local Docker) |
| PDF Parser | PyMuPDF |
| Layer | Technology |
|---|---|
| Framework | React 18 + Vite + TypeScript |
| Voice | @livekit/components-react, livekit-client |
| Styling | Tailwind CSS |
| Icons | Lucide React |
The retrieval pipeline runs as a two-stage process triggered on every user turn, inside on_user_turn_completed:
User query
│
▼
[Stage 1] Bi-Encoder ANN Search
Embed query with BAAI/bge-large-en-v1.5
→ cosine ANN search in Qdrant (top-K candidates, default K=20)
│
▼
[Stage 1.5] Deduplication
Group candidates by text[:200] — keeps highest bi-score per unique chunk
Eliminates duplicates from repeated document uploads
│
▼
[Stage 2] Cross-Encoder Reranking
Score each (query, chunk) pair with cross-encoder/ms-marco-MiniLM-L-6-v2
Re-sort by cross-encoder score → top-N returned (default N=3)
│
▼
[Stage 2.5] Post-Rerank Dedup (safety net)
Deduplicate by chunk_id — catches any duplicates with differing leading text
│
▼
Injected as a system message into ChatContext
(placed just before the user message via created_at ordering)
Ingestion pipeline (triggered on PDF upload):
PDF upload → PyMuPDF page extraction
→ sentence tokenisation (NLTK punkt)
→ sliding-window chunking (max 200 tokens, 50-token overlap)
→ batch embedding with BAAI/bge-large-en-v1.5
→ upsert to Qdrant (stable chunk_id hash as point ID)
On every backend startup, orphaned Qdrant vectors (doc_ids not in doc_metadata.json) are automatically purged.
RealtimeAIVoice/
├── backend/
│ ├── agent.py # LiveKit Agent — STT/LLM/TTS + RAG injection
│ ├── main.py # FastAPI app — CORS, routers, startup cleanup
│ ├── system_prompt.txt # Agent personality (editable live in UI)
│ ├── api/
│ │ ├── kb.py # /api/kb — upload, list, delete, cleanup
│ │ ├── prompt.py # /api/prompt — read/write system prompt
│ │ ├── rag_sources.py # /api/rag — push + poll RAG events
│ │ └── livekit_token.py # /api/livekit — JWT token generation
│ └── rag/
│ ├── parser.py # PDF page extraction
│ ├── chunker.py # Sliding-window semantic chunking
│ ├── embedder.py # BAAI/bge-large-en-v1.5 embeddings
│ ├── store.py # Qdrant upsert / search / orphan cleanup
│ ├── reranker.py # Cross-Encoder reranking
│ ├── retriever.py # Full pipeline: ANN → dedup → rerank
│ └── ingestor.py # Orchestrates the ingestion flow
├── Frontend/
│ └── src/
│ ├── App.tsx
│ ├── components/
│ │ ├── CallControl.tsx
│ │ ├── TranscriptPanel.tsx
│ │ ├── RagSourcesPanel.tsx
│ │ ├── KnowledgeBase.tsx
│ │ └── PromptEditor.tsx
│ └── hooks/
│ ├── useRagSources.ts
│ └── useKnowledgeBase.ts
└── docs/
└── image.png # UI screenshot
- Python 3.11+
- Node.js 18+
- Qdrant running on
localhost:6333 - API keys: LiveKit, Groq, Deepgram
git clone <repo-url>
cd RealtimeAIVoice
cp backend/.env.example backend/.envEdit backend/.env:
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_keydocker run -p 6333:6333 qdrant/qdrantcd backend
run_backend.bat # Windows
# or:
uv run uvicorn main:app --reload --port 8000cd backend
uv run python agent.py startcd Frontend
npm install
npm run dev # http://localhost:5173User speaks → Groq Whisper (STT) → on_user_turn_completed()
│
┌─────────────┴──────────────┐
│ │
Live prompt reload RAG retrieval
(re-reads system_prompt.txt) (embed → Qdrant → rerank)
│ │
└────────── Groq LLM ────────┘
│
Deepgram Aura (TTS) → User hears answer
│
POST /api/rag/push → Frontend polls → RAG panel updates
| Variable | Required | Description |
|---|---|---|
LIVEKIT_URL |
Yes | LiveKit server WebSocket URL |
LIVEKIT_API_KEY |
Yes | LiveKit API key |
LIVEKIT_API_SECRET |
Yes | LiveKit API secret |
GROQ_API_KEY |
Yes | Groq API key (STT + LLM) |
DEEPGRAM_API_KEY |
Yes | Deepgram API key (TTS) |
QDRANT_URL |
No | Qdrant URL (default: http://localhost:6333) |
RAG_TOP_K |
No | ANN candidates from Qdrant (default: 20) |
RAG_TOP_N_RERANK |
No | Final chunks after reranking (default: 3) |
CHUNK_MAX_TOKENS |
No | Max tokens per chunk (default: 200) |
CHUNK_OVERLAP_TOKENS |
No | Token overlap between chunks (default: 50) |
CROSS_ENCODER_MODEL |
No | Reranker model name (default: cross-encoder/ms-marco-MiniLM-L-6-v2) |
A live demo of the application is available here: Live Demo
Built with LiveKit · Groq · Deepgram · Qdrant
