Local lab for realtime voice bots and meeting-integration stacks.
Requires an evaluation pipeline for the use-case stacks.
- Pipecat: docs.pipecat.ai | pipecat.ai
- LiveKit: docs.livekit.io | livekit.io
- In `pipecat-ai==0.0.90`, some LiveKit flows may still process inbound video tracks even when video input is disabled. In camera-on sessions this can cause `agent-runner` memory growth and container OOM (ExitCode 137, `OOMKilled=true`).
- Typical symptom chain: bot drops after a few minutes, then bot-start/API calls fail (`fetch failed` / 502) because `agent-runner` exited.
- Latest validation on March 2, 2026 (`stacks/r3-livekit-meet-lab`): single-user manual call remained stable for 18 minutes, and manual bot restart worked; multi-user longevity still pending.
- Mitigation pattern for new stacks:
  - Keep bot transport video ingest disabled (`video_in_enabled=False`).
  - Add a guard so video subscriptions are ignored unless video processing is explicitly enabled.
  - Set the `agent-runner` restart policy in Compose (`restart: unless-stopped`) to reduce downtime after crashes while debugging.
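The subscription guard above can be sketched as a small helper. This is a minimal illustration, not Pipecat's or LiveKit's actual API; the function and parameter names are hypothetical:

```python
# Hypothetical guard: drop video subscriptions unless video processing
# is explicitly enabled. Names are illustrative, not Pipecat's API.

VIDEO_PROCESSING_ENABLED = False  # mirrors video_in_enabled=False


def should_handle_track(track_kind: str) -> bool:
    """Return True only for tracks the bot should actually process."""
    if track_kind == "video" and not VIDEO_PROCESSING_ENABLED:
        return False  # ignore camera tracks so frames never start buffering
    return True


def on_track_subscribed(track_kind: str, unsubscribe) -> None:
    """Event-handler sketch: unsubscribe immediately from unwanted video."""
    if not should_handle_track(track_kind):
        unsubscribe()  # release the track before memory can grow
```

The point of unsubscribing at subscription time, rather than discarding frames later, is that inbound video never enters the pipeline at all, which is what keeps `agent-runner` memory flat in camera-on sessions.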
UI entry point:
- User in browser (`web-client` or `meet`) talks to the bot.
- UI captures mic audio and plays bot audio.
Core loop events:
- Audio capture in browser.
- Relay to backend (`agent-runner`) via LiveKit/WebRTC or other transport.
- Transcribe or speech-to-speech ingest.
- LLM reasoning (plus optional tool calls).
- Audio synthesis (TTS or direct speech-to-speech output).
- Relay synthesized audio back to browser.
- Browser playback to user.
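The core loop above can be sketched end to end as one conversational turn. The stage functions here are stubs standing in for the real transport, STT, LLM, and TTS services, not the lab's actual code:

```python
# Sketch of one turn of the core loop. Each stage is a stub standing in
# for the real service (Deepgram/OpenAI/Cartesia or a speech-to-speech model).

def transcribe(audio: bytes) -> str:
    return audio.decode()  # stub: real stack does STT or S2S ingest


def reason(text: str) -> str:
    return f"echo: {text}"  # stub: real stack calls an LLM, optionally tools


def synthesize(text: str) -> bytes:
    return text.encode()  # stub: real stack does TTS or direct S2S output


def handle_turn(mic_audio: bytes) -> bytes:
    """Capture -> transcribe -> LLM -> synthesize -> relay back for playback."""
    transcript = transcribe(mic_audio)   # speech-to-text ingest
    reply_text = reason(transcript)      # LLM reasoning step
    return synthesize(reply_text)        # audio relayed back to the browser
```

In the real stacks each stage is asynchronous and streaming, so audio flows through the pipeline incrementally rather than turn-by-turn as in this sketch.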
- Containers and local orchestration: Docker Compose, Make
- Backend services: Python, FastAPI, Pipecat
- Frontend apps: Next.js, React
- Realtime media/control: LiveKit + WebRTC, Zoom join/leave adapter flow
- AI/speech APIs: OpenAI Realtime, Gemini Live, Deepgram, Cartesia
- Exploration workflow: Jupyter notebooks with `uv`
- `stacks/r0-dev-exploration`: baseline LiveKit + agent-runner + web-client
- `stacks/r1-eval-s2s-openai`: speech-to-speech eval stack (OpenAI)
- `stacks/r2-eval-s2s-gemini`: speech-to-speech eval stack (Gemini)
- `stacks/r3-livekit-meet-lab`: Meet + Desk admin stack
- `notebooks`: exploratory notebooks for TTS/S2S experiments
- Pick a stack in `stacks/` and open its README.
- Copy that stack's `.env` example files.
- Add required API keys.
- Run `make start` in the stack directory.
- Open the local URLs listed in that stack README.
- Stop with `make stop`.
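The steps above, as a shell session. The stack directory and the `.env.example` filename are assumptions; check the chosen stack's README for its actual env-file names:

```shell
# Example quickstart using the r3 stack; substitute any stack directory.
cd stacks/r3-livekit-meet-lab
cp .env.example .env   # assumption: the stack ships a .env.example
# edit .env to add the required API keys
make start             # bring the stack up (Docker Compose under the hood)
# ...open the local URLs listed in the stack README...
make stop              # tear the stack down
```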