Summary
Development on VoiceOps has stalled due to end-to-end latency that prevents genuine conversational interaction. The current pipeline takes several seconds from when the user finishes speaking to when a response begins playing back — far outside the threshold required for natural, phone-quality conversation.
Problem
The round-trip latency between human speech and agent response is not conversational. A typical human conversation tolerates roughly 200-300ms of delay. The current pipeline exceeds this by a significant margin, making real-time voice interaction feel robotic and unnatural.
Observed behavior: Multi-second delay between end of user speech and start of agent response.
Expected behavior: Near-instantaneous response onset, consistent with phone-call quality dialogue.
Why This Matters
Solving this unlocks something significant. VoiceOps is, to our knowledge, the first full-duplex voice pipeline built for an open agent platform like OpenClaw. If latency can be brought into a conversational range, this becomes a genuinely novel capability — the ability to speak with autonomous agents the same way you speak with a person on the phone.
Commercial solutions (e.g. ChatGPT Advanced Voice) have solved this, but with proprietary infrastructure, custom hardware, and large engineering teams. The open-source equivalent does not yet exist at this quality level.
Known Bottlenecks (for investigation)
- ASR (Whisper): Transcription latency after voice activity detection ends
- LLM inference: Time-to-first-token from the agent backend
- TTS (kokoro-js): Synthesis time before audio playback begins
- Streaming: Whether each stage is streamed or batched end-to-end
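Before optimizing, it helps to know how the multi-second delay is distributed across the four stages above. A minimal sketch of per-stage instrumentation, assuming the pipeline is a chain of async stage functions — the `timed` wrapper and the commented-out stage names are illustrative, not actual VoiceOps APIs:

```typescript
// Illustrative per-stage timing wrapper (not a VoiceOps API).
// Wraps an async pipeline stage and records its wall-clock duration.

type Stage<I, O> = (input: I) => Promise<O>;

function timed<I, O>(
  name: string,
  stage: Stage<I, O>,
  log: Record<string, number>, // collects per-stage durations in ms
): Stage<I, O> {
  return async (input: I) => {
    const start = performance.now();
    const output = await stage(input);
    log[name] = performance.now() - start;
    return output;
  };
}

// Hypothetical usage: wrap each stage, run one conversational turn,
// then inspect `log` to see where the seconds actually go.
// const log: Record<string, number> = {};
// const transcribe = timed("asr", runWhisper, log);
// const respond   = timed("llm", runAgent, log);
// const speak     = timed("tts", runKokoro, log);
```

Note that once streaming is in place, time-to-first-token (LLM) and time-to-first-audio-chunk (TTS) matter more to perceived latency than total stage duration, so those are worth logging separately.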
Goal
Identify and reduce latency at each stage of the pipeline so that the perceived response delay feels conversational. Streaming optimizations, model quantization, VAD tuning, and parallel pipeline stages are all worth exploring.
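One of the streaming optimizations mentioned above can be sketched concretely: rather than waiting for the complete LLM response before starting synthesis, flush text to TTS at sentence boundaries so playback can begin after the first sentence. This is a sketch under the assumption that the agent backend exposes its output as an async iterable of tokens; the function name and the punctuation-based chunking heuristic are illustrative, not part of VoiceOps:

```typescript
// Illustrative sentence-boundary chunker for pipelining LLM output into TTS.
// Buffers streamed tokens and yields a chunk whenever a sentence-ending
// punctuation mark is followed by whitespace, then flushes any remainder.

async function* sentenceChunks(
  tokens: AsyncIterable<string>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const token of tokens) {
    buffer += token;
    // Split at the first ". ", "! ", or "? " boundary seen so far.
    const match = buffer.match(/^(.*?[.!?])\s+(.*)$/s);
    if (match) {
      yield match[1];
      buffer = match[2];
    }
  }
  if (buffer.trim()) yield buffer.trim();
}
```

Feeding each yielded chunk to kokoro-js as it arrives would overlap synthesis of sentence n with generation of sentence n+1, trading a small risk of awkward prosody at chunk boundaries for a large reduction in time-to-first-audio.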
If anyone has experience with low-latency voice pipelines or has solved similar bottlenecks in open-source projects, contributions and suggestions are welcome.