
Conversation


@Kharacternyk Kharacternyk commented Sep 1, 2025

Co-PR for the docs: respeecher/pipecat-docs#1
Branch with the quickstart modified to use Respeecher/Gemini instead of Cartesia/OpenAI (not to be upstreamed, just for testing): https://github.com/respeecher/pipecat/compare/respeecher...respeecher:pipecat:respeecher-quickstart?expand=1

Summary by CodeRabbit

  • New Features

    • Added Respeecher as a TTS service provider with WebSocket-based streaming audio support and context management.
  • Documentation

    • Added example script demonstrating interruptible voice chat integration with Respeecher.
  • Configuration

    • Added Respeecher optional dependency and API key configuration.



coderabbitai bot commented Sep 1, 2025

📝 Walkthrough

This PR introduces Respeecher TTS service integration to Pipecat. It adds a new RespeecherTTSService class implementing WebSocket-based text-to-speech with audio context management and streaming capabilities. The changes include environment configuration for the Respeecher API key, an optional dependency declaration, and a new example script demonstrating an interruptible chatbot with integrated VAD, STT, LLM, and Respeecher TTS services. Additionally, the base TTS service classes are refactored to introduce _AudioContextServiceMixin, centralizing audio-context orchestration, silence generation, and lifecycle management across TTS implementations.
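
For orientation, a minimal sketch of how the new service is wired into a pipeline, condensed from the example script in this PR (the transport, VAD, and context aggregators of the full example are omitted; the constructor arguments and env variable names match the example):

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.respeecher.tts import RespeecherTTSService

# The new service takes an API key and a voice id, as in the example script.
tts = RespeecherTTSService(
    api_key=os.getenv("RESPEECHER_API_KEY"),
    voice_id="samantha",
)

# Condensed STT -> LLM -> TTS chain; the real example adds a transport,
# Silero VAD, and LLM context aggregators around it.
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
pipeline = Pipeline([stt, llm, tts])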

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 With whispered words through wires bright,
Respeecher's voice joins Pipecat's flight!
Context frames and silence sweet,
Our chatbots chat—the refactor's complete! 🎤✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: Passed. The title 'Add Respeecher TTS' accurately summarizes the main change: introducing a new Respeecher TTS service with supporting configuration, example, and dependency management.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Repository: respeecher/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1a9c1b and d6ee787.

📒 Files selected for processing (2)
  • env.example
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • env.example
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Coverage


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (9)
env.example (1)

143-145: Expose voice selection in env for the example

Add a voice variable (optional but helpful for quickstart parity), then read it in the example.

 # Respeecher
 RESPEECHER_API_KEY=...
+RESPEECHER_VOICE_ID=samantha
examples/foundational/07ad-interruptible-respeecher.py (4)

12-12: Avoid hard dependency on Silero extra in example; lazily import with fallback

Top-level import breaks if users don’t have the silero extra. Lazy import and continue without VAD.

-from pipecat.audio.vad.silero import SileroVADAnalyzer
+def _get_silero_vad():
+    try:
+        from pipecat.audio.vad.silero import SileroVADAnalyzer
+        return SileroVADAnalyzer()
+    except Exception:
+        logger.warning("Silero VAD not installed; continuing without VAD.")
+        return None
@@
 transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
 }

Also applies to: 31-47


55-66: Parameterize Respeecher voice (and optionally model) via env

Make the example configurable without code edits.

 tts = RespeecherTTSService(
-    api_key=os.getenv("RESPEECHER_API_KEY"),
-    voice_id="samantha",
+    api_key=os.getenv("RESPEECHER_API_KEY"),
+    voice_id=os.getenv("RESPEECHER_VOICE_ID", "samantha"),
+    # model=os.getenv("RESPEECHER_TTS_MODEL", "public/tts/en-rt"),

100-106: Use context.add_message for clarity over aliasing the original list

This avoids reliance on OpenAILLMContext referencing the same list object.

-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})

50-52: Nit: drop unnecessary f-string

-    logger.info(f"Starting bot")
+    logger.info("Starting bot")
src/pipecat/services/tts_service.py (2)

367-380: Consider extracting the magic number for bytes per sample.

The silence frame generation is well implemented, but the magic number 2 for 16-bit audio could be made more explicit.

 def silence_frame(self, duration_s: float) -> TTSAudioRawFrame:
     """Create a frame of silence.
 
     Args:
         duration_s: Silence duration in seconds.
     """
-    silence_num_bytes = int(duration_s * self.sample_rate * 2)  # 16-bit
+    BYTES_PER_SAMPLE = 2  # 16-bit audio
+    silence_num_bytes = int(duration_s * self.sample_rate * BYTES_PER_SAMPLE)
 
     return TTSAudioRawFrame(
         audio=b"\x00" * silence_num_bytes,
         sample_rate=self.sample_rate,
         num_channels=1,
     )

802-806: Consider making the silence duration configurable.

The 1-second silence between contexts is hardcoded. Consider making this configurable for more flexibility.

 class _AudioContextServiceMixin(ABC):
     """A service that supports audio contexts.
 
     This class does not inherit from other service base classes to avoid
     diamond inheritance.
     """
 
-    def __init__(self):
+    def __init__(self, silence_between_contexts: float = 1.0):
         """Initialize the service."""
         self._contexts: Dict[str, asyncio.Queue] = {}
         self._audio_context_task = None
+        self._silence_between_contexts = silence_between_contexts

And update line 803:

-                # Append some silence between contexts.
-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+                # Append some silence between contexts.
+                silence_frame = self.silence_frame(self._silence_between_contexts)
src/pipecat/services/respeecher/tts.py (2)

56-64: Consider adding validation for sampling parameters.

The InputParams model accepts any SamplingParams without validation. Consider adding validation or documentation about valid parameters.

Add documentation about valid sampling parameters in the docstring:

 class InputParams(BaseModel):
     """Input parameters for Respeecher TTS configuration.
 
     Parameters:
         sampling_params: Sampling parameters used for speech synthesis.
+                        Valid parameters include temperature, top_p, etc.
+                        See Respeecher documentation for full list.
     """

298-301: Consider potential race condition in connection check.

There's a small window where the WebSocket state could change between the check and usage. Consider combining the connection check with the connection attempt.

-            if not self._websocket or self._websocket.state is State.CLOSED:
-                await self._connect()
+            await self._ensure_connected()

Add a helper method:

async def _ensure_connected(self):
    """Ensure WebSocket is connected, reconnecting if necessary."""
    if not self._websocket or self._websocket.state is not State.OPEN:
        await self._connect()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 83b90da and fa4624b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • examples/foundational/07ad-interruptible-respeecher.py (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
examples/foundational/07ad-interruptible-respeecher.py (14)
src/pipecat/audio/vad/silero.py (1)
  • SileroVADAnalyzer (130-226)
src/pipecat/pipeline/pipeline.py (1)
  • Pipeline (91-212)
src/pipecat/pipeline/runner.py (1)
  • PipelineRunner (26-124)
src/pipecat/pipeline/task.py (2)
  • PipelineParams (62-101)
  • PipelineTask (104-733)
src/pipecat/processors/aggregators/openai_llm_context.py (1)
  • OpenAILLMContext (59-348)
src/pipecat/runner/types.py (1)
  • RunnerArguments (20-30)
src/pipecat/runner/utils.py (1)
  • create_transport (418-514)
src/pipecat/services/deepgram/stt.py (1)
  • DeepgramSTTService (46-322)
src/pipecat/services/openai/llm.py (1)
  • OpenAILLMService (58-108)
src/pipecat/services/respeecher/tts.py (1)
  • RespeecherTTSService (49-321)
src/pipecat/transports/base_transport.py (2)
  • BaseTransport (155-196)
  • TransportParams (27-152)
src/pipecat/transports/network/fastapi_websocket.py (1)
  • FastAPIWebsocketParams (53-64)
src/pipecat/transports/services/daily.py (1)
  • DailyParams (180-201)
src/pipecat/processors/aggregators/llm_response.py (2)
  • context (311-317)
  • get_context_frame (319-325)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/processors/frame_processor.py (3)
  • FrameDirection (46-55)
  • stop_all_metrics (432-435)
  • push_error (635-643)
src/pipecat/services/tts_service.py (12)
  • stop (231-240)
  • stop (500-507)
  • stop (888-900)
  • stop (951-963)
  • create_task (834-835)
  • flush_audio (215-217)
  • audio_context_available (765-774)
  • push_frame (342-365)
  • push_frame (842-843)
  • remove_audio_context (750-763)
  • append_to_audio_context (737-748)
  • create_audio_context (727-735)
src/pipecat/utils/asyncio/watchdog_async_iterator.py (1)
  • WatchdogAsyncIterator (20-103)
src/pipecat/utils/base_object.py (1)
  • _call_event_handler (120-140)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (8)
  • task_manager (346-357)
  • push_frame (645-655)
  • reset_watchdog (496-498)
  • create_task (437-476)
  • cancel_task (478-485)
  • put (101-118)
  • wait_for_task (487-494)
  • FrameDirection (46-55)
src/pipecat/utils/asyncio/task_manager.py (8)
  • BaseTaskManager (43-148)
  • reset_watchdog (332-340)
  • create_task (69-92)
  • create_task (207-271)
  • cancel_task (110-120)
  • cancel_task (302-330)
  • wait_for_task (95-107)
  • wait_for_task (273-300)
src/pipecat/frames/frames.py (2)
  • TTSAudioRawFrame (235-241)
  • Frame (86-117)
🔇 Additional comments (13)
pyproject.toml (1)

88-89: Add websockets to the respeecher extra

Replace in pyproject.toml:

-respeecher = [ "respeecher~=1.0" ]
+respeecher = [ "respeecher~=1.0", "websockets>=13.1,<15.0" ]

Confirmed PyPI package name is respeecher at version 1.0.1.

src/pipecat/services/tts_service.py (5)

10-11: LGTM! Good use of abstract patterns.

The addition of Coroutine to imports and BaseTaskManager import are appropriate for the new mixin architecture.


349-353: LGTM! Clean refactoring of silence generation.

The switch to using the new silence_frame() method provides a cleaner API for silence generation and makes the code more maintainable.


715-853: Well-designed audio context management mixin.

The _AudioContextServiceMixin provides a clean abstraction for managing audio contexts with proper queue-based ordering and timeout handling. The abstract methods create a clear contract for implementers.


855-915: LGTM! Clean integration of mixin with TTS services.

The AudioContextTTSService class properly integrates the mixin with WebSocket TTS, handling lifecycle methods correctly and ensuring graceful shutdown with the queue termination signal (None).


917-978: LGTM! Consistent implementation for word-based TTS.

The AudioContextWordTTSService maintains consistency with the non-word variant while properly extending WebsocketWordTTSService.

src/pipecat/services/respeecher/tts.py (7)

1-8: Copyright year is 2025 - verify if intentional.

The copyright notice shows 2025, but based on the PR date (2025-09-01), this appears to be September 2025. Please confirm this is intentional.


34-47: LGTM! Good error handling for optional dependencies.

The module import guard with helpful error messages is well implemented, guiding users to install the optional dependency when missing.


132-147: Add validation for required context_id.

Good use of assertion to ensure context_id is set. The request building is clean and well-structured.


189-202: LGTM! Robust WebSocket connection handling.

The connection logic properly checks for existing open connections and handles errors gracefully with event callbacks.


220-227: Good interruption handling.

The interruption handler properly cancels the current context and cleans up metrics. The WebSocket cancel message ensures server-side cleanup.


249-285: Excellent message handling with validation.

The receive handler properly validates messages with Pydantic, handles all response types, and includes good error logging. The watchdog iterator ensures the task doesn't timeout during long waits.


286-322: Well-structured TTS generation with proper error recovery.

The run_tts method handles connection state, creates audio contexts appropriately, and includes proper error handling with reconnection logic. The tracing decorator enables performance monitoring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/pipecat/services/tts_service.py (1)

732-741: Fix race: create the per-context queue before enqueuing the context id.
await self._contexts_queue.put(context_id) can yield; the handler may read context_id and access self._contexts[context_id] before it exists, raising KeyError.

Apply this diff:

-        await self._contexts_queue.put(context_id)
-        self._contexts[context_id] = asyncio.Queue()
+        self._contexts[context_id] = asyncio.Queue()
+        await self._contexts_queue.put(context_id)
🧹 Nitpick comments (7)
src/pipecat/services/tts_service.py (7)

357-361: Preserve direction when injecting silence after TTSStoppedFrame.
Forward the original direction to avoid surprising routing if callers ever use non-default directions.

-            await self.push_frame(silence_frame)
+            await self.push_frame(silence_frame, direction)

806-810: Make inter-context silence duration configurable.
A hard-coded 1s gap may be too long/short depending on UX. Prefer a class attribute to allow easy tuning/override.

-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+                silence_frame = self.silence_frame(self.CONTEXT_SILENCE_TIME_S)

Add outside this hunk (class scope near line 720):

# class _AudioContextServiceMixin(ABC):
CONTEXT_SILENCE_TIME_S: float = 1.0

815-832: Avoid premature context termination; make the timeout a tunable attribute.
A fixed 3.0s may truncate audio on slow networks. Expose it for services to override and document the tradeoff.

-        AUDIO_CONTEXT_TIMEOUT = 3.0
+        timeout_s = self.CONTEXT_TIMEOUT_S
 ...
-                frame = await asyncio.wait_for(queue.get(), timeout=AUDIO_CONTEXT_TIMEOUT)
+                frame = await asyncio.wait_for(queue.get(), timeout=timeout_s)

Add outside this hunk (class scope near line 720):

# class _AudioContextServiceMixin(ABC):
CONTEXT_TIMEOUT_S: float = 3.0

Please verify typical TTS stall gaps in your environments; adjust CONTEXT_TIMEOUT_S if you see tail truncation under load.


833-857: Align abstract method signatures with concrete implementations for better type-safety.
Broaden abstracts to match existing FrameProcessor/services API (name/timeout/direction), avoiding mypy/IDE warnings and future surprises.

-    def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-    async def cancel_task(self, task: asyncio.Task) -> None:
+    async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-    async def push_frame(self, frame: Frame) -> None:
+    async def push_frame(
+        self,
+        frame: Frame,
+        direction: "FrameDirection" = FrameDirection.DOWNSTREAM,
+    ) -> None:

781-786: Optional: provide backpressure knobs for queues.
Unbounded queues can grow under bursty producers. Consider class attrs to set maxsize for _contexts_queue and per-context queues.

Example (outside this hunk, class scope):

CONTEXTS_QUEUE_MAXSIZE: int = 0      # 0 = unbounded
CONTEXT_QUEUE_MAXSIZE: int = 0

And update initializations:

-            self._contexts_queue = asyncio.Queue()
+            self._contexts_queue = asyncio.Queue(maxsize=self.CONTEXTS_QUEUE_MAXSIZE)

And in create_audio_context (with the race fix applied):

-        self._contexts[context_id] = asyncio.Queue()
+        self._contexts[context_id] = asyncio.Queue(maxsize=self.CONTEXT_QUEUE_MAXSIZE)

889-905: Stop path is clean; awaits graceful drain after sentinel.
LGTM. Consider mirroring the same approach in the word-timestamps variant for consistency.


961-968: Avoid deprecated wait_for_task; await the task directly (match TTS variant).
Keeps behavior consistent and avoids deprecation warnings.

-            await self.wait_for_task(self._audio_context_task)
+            await self._audio_context_task
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa4624b and 79b9f28.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pyproject.toml
  • env.example
  • README.md
🧰 Additional context used
🧬 Code graph analysis (1)
src/pipecat/services/tts_service.py (5)
src/pipecat/transports/network/websocket_client.py (1)
  • task_manager (102-115)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (325-336)
  • create_task (416-430)
  • cancel_task (432-439)
  • put (95-112)
  • wait_for_task (441-466)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/frames/frames.py (4)
  • TTSAudioRawFrame (261-267)
  • Frame (112-143)
  • StartFrame (717-741)
  • EndFrame (1288-1298)
src/pipecat/transports/base_output.py (8)
  • sample_rate (88-94)
  • sample_rate (386-392)
  • start (105-117)
  • start (403-426)
  • stop (119-126)
  • stop (428-452)
  • cancel (128-135)
  • cancel (454-463)
🔇 Additional comments (4)
src/pipecat/services/tts_service.py (4)

10-11: Good: abstract base + Coroutine typing are appropriate for the new mixin.
No issues here; this enables clean protocol-style hooks.


40-41: Good: task-manager import aligns with the mixin’s abstract task_manager property.
This keeps orchestration consistent with existing TaskManager-based infra.


374-386: Silence frame helper looks good.
Generates correct 16‑bit mono silence with the current sample rate.


823-829: Confirm reset_watchdog availability and semantics.
Ensure concrete services implement reset_watchdog() to keep connections alive during context gaps/timeouts; otherwise the mixin will leave classes abstract or NOP.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)

166-173: Docstring refers to “Stop” and “end frame” in cancel(); should be “Cancel” and “cancel frame”.

Same point was raised earlier; aligning terminology improves API clarity.

Apply this diff:

-    async def cancel(self, frame: CancelFrame):
-        """Stop the Respeecher TTS service.
+    async def cancel(self, frame: CancelFrame):
+        """Cancel the Respeecher TTS service.
@@
-            frame: The end frame.
+            frame: The cancel frame.
🧹 Nitpick comments (7)
src/pipecat/services/respeecher/tts.py (4)

219-226: Proactively clear audio context on interruption to avoid ~3s timeout stall.

Currently you rely on the mixin timeout to finish the context. Remove it immediately after sending cancel to improve responsiveness.

Apply this diff:

-        if self._context_id:
-            cancel_request = json.dumps({"context_id": self._context_id, "cancel": True})
-            await self._get_websocket().send(cancel_request)
-            self._context_id = None
+        if self._context_id:
+            ctx = self._context_id
+            cancel_request = json.dumps({"context_id": ctx, "cancel": True})
+            await self._get_websocket().send(cancel_request)
+            await self.remove_audio_context(ctx)
+            self._context_id = None

188-196: Add connection timeout to websocket connect.

Prevents hangs on network issues.

Apply this diff:

-            self._websocket = await websocket_connect(
-                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-            )
+            self._websocket = await websocket_connect(
+                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+                open_timeout=10,
+            )

296-298: Broaden state check before (re)connecting.

Reconnect unless the socket is OPEN.

Apply this diff:

-            if not self._websocket or self._websocket.state is State.CLOSED:
+            if not self._websocket or self._websocket.state is not State.OPEN:

271-276: Emit TTSStoppedFrame when response.type == "done"
Add a stop frame after removing the audio context to align with other TTS services and enable downstream buffers to flush:

             if response.type == "done":
                 await self.stop_ttfb_metrics()
                 await self.remove_audio_context(response.context_id)
+                await self.push_frame(TTSStoppedFrame())
src/pipecat/services/tts_service.py (3)

805-809: Also emit TTSStoppedFrame at the end of each audio context.

This helps downstream components finalize packets/buffers per context.

Apply this diff:

-                # Append some silence between contexts.
-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
-                await self.push_frame(silence_frame)
+                # Mark context end and add inter-context silence.
+                await self.push_frame(TTSStoppedFrame())
+                SILENCE_BETWEEN_CONTEXTS = 1
+                await self.push_frame(self.silence_frame(SILENCE_BETWEEN_CONTEXTS))

831-849: Abstract method signatures should match concrete implementations.

Align with FrameProcessor.create_task/cancel_task/push_frame to keep type checkers happy and avoid subtle MRO issues.

Apply this diff:

-    def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-    async def cancel_task(self, task: asyncio.Task) -> None:
+    async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-    async def push_frame(self, frame: Frame) -> None:
+    async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM) -> None:

816-817: Make timeouts/silence duration configurable constants.

Expose AUDIO_CONTEXT_TIMEOUT and SILENCE_BETWEEN_CONTEXTS as class attrs for tuning/testing.

Also applies to: 805-809

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 79b9f28 and 233fef4.

📒 Files selected for processing (2)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml (requires-python = ">=3.10"). This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.

Applied to files:

  • src/pipecat/services/respeecher/tts.py
🧬 Code graph analysis (2)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (325-336)
  • push_frame (606-616)
  • create_task (416-430)
  • cancel_task (432-439)
  • FrameDirection (42-51)
src/pipecat/transports/network/websocket_client.py (1)
  • task_manager (102-115)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/frames/frames.py (10)
  • CancelFrame (745-752)
  • EndFrame (1288-1298)
  • ErrorFrame (756-774)
  • Frame (112-143)
  • LLMFullResponseEndFrame (1384-1391)
  • StartFrame (717-741)
  • StartInterruptionFrame (860-869)
  • TTSAudioRawFrame (261-267)
  • TTSStartedFrame (1395-1405)
  • TTSStoppedFrame (1409-1412)
src/pipecat/processors/frame_processor.py (3)
  • FrameDirection (42-51)
  • stop_all_metrics (411-414)
  • push_error (596-604)
src/pipecat/services/tts_service.py (32)
  • AudioContextTTSService (852-911)
  • sample_rate (138-144)
  • start (218-228)
  • start (495-502)
  • start (876-883)
  • start (939-946)
  • stop (230-239)
  • stop (504-511)
  • stop (885-897)
  • stop (948-960)
  • cancel (241-250)
  • cancel (513-520)
  • cancel (899-906)
  • cancel (962-969)
  • create_task (831-832)
  • _report_error (601-603)
  • _report_error (673-675)
  • cancel_task (835-836)
  • _handle_interruption (387-391)
  • _handle_interruption (536-539)
  • _handle_interruption (626-630)
  • _handle_interruption (698-702)
  • _handle_interruption (908-911)
  • _handle_interruption (971-974)
  • flush_audio (214-216)
  • audio_context_available (769-778)
  • push_frame (348-371)
  • push_frame (839-840)
  • remove_audio_context (754-767)
  • append_to_audio_context (741-752)
  • run_tts (181-192)
  • create_audio_context (731-739)
src/pipecat/utils/base_object.py (1)
  • _call_event_handler (121-141)
🔇 Additional comments (6)
src/pipecat/services/respeecher/tts.py (4)

108-109: Union type str | None is correct for this codebase (Python 3.10+).

No action needed.


178-180: receive_task_handler Exists in Base Class
Confirmed: respeecher/tts.py’s class inherits (directly or via MRO) from websocket_service.WebsocketService, which defines

async def _receive_task_handler(self, report_error: Callable[[ErrorFrame], Awaitable[None]]):
    …

so _receive_task_handler is present at runtime. No action needed.


193-195: Verify header/subprotocol support
No public documentation indicates the streaming TTS websocket accepts API keys via an Authorization header or subprotocol—embedding keys in URLs can leak via logs or proxies. Please contact Respeecher support to confirm or request header-based/subprotocol authentication for the websocket.


284-284: Ignore signature change – None yields are intentional.

process_generator’s docstring and implementation expect the generator to yield None as a handshake marker (it skips falsy values), so removing yield None and changing the return type would break the streaming flow. Leave the signature and yield None intact.

Likely an incorrect or invalid review comment.

src/pipecat/services/tts_service.py (2)

355-359: Silence insertion API looks good.

Centralizing silence creation via silence_frame() improves reuse and readability.

Also applies to: 373-386


948-961: Verify availability of wait_for_task in TTSService
Found wait_for_task defined in src/pipecat/processors/frame_processor.py:441; ensure TTSService actually inherits this helper before swapping to a raw await self._audio_context_task.

self._api_key = api_key
self._url = url
self._output_format: StreamingOutputFormatParams = {
    "encoding": "pcm_s16le",

We also accept "pcm_f32le", should we allow users to set it up?

Member Author

I made it not configurable because, at a glance, there is no floating-point sample support in Pipecat's audio facilities. I should ask them in a future PR.

Member

Maybe it is actually better that they don't support it since it doubles the size of the audio, and it's hard to imagine how it could be useful.
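
For reference, a quick back-of-the-envelope check of that size difference (the sample rate below is illustrative, not the service's actual value):

# pcm_s16le uses 2 bytes per sample, pcm_f32le uses 4, so float output doubles the stream.
sample_rate = 24_000                     # illustrative mono sample rate in Hz
bytes_per_second_s16 = sample_rate * 2   # 48,000 bytes/s
bytes_per_second_f32 = sample_rate * 4   # 96,000 bytes/s
assert bytes_per_second_f32 == 2 * bytes_per_second_s16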

await super().stop(frame)
await self._disconnect()

async def cancel(self, frame: CancelFrame):

What's the difference between stop and cancel?

Member Author

No difference for us, nor for the other services I've been skimming through: everyone does the same thing in both handlers. Conceptually, from what I understand, stop must leave the object in a state where another start is possible, whereas there's no such requirement for cancel, so cancel could be quicker for some services.
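
A minimal sketch of that convention (the handler bodies mirror the quoted diff; the surrounding class is hypothetical):

async def stop(self, frame: EndFrame):
    # stop: the service must remain restartable after this returns
    await super().stop(frame)
    await self._disconnect()

async def cancel(self, frame: CancelFrame):
    # cancel: no restart guarantee, so implementations may take shortcuts,
    # but in practice most services just disconnect here too
    await super().cancel(frame)
    await self._disconnect()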

True if the context exists and is available.
"""
return context_id in self._contexts

The changes in this file don't break anything for other integrations, right?

Member Author

I hope so :) Of course that's my goal, but it's difficult to test properly. The main change is conceptually simple: decouple audio context management from word-timestamp functionality. We would be the first Pipecat TTS service that has the former but not the latter. The decoupling is done mostly by moving code verbatim into a separate class, so there should be a low probability of subtle bugs.

Member

I haven't delved deep into Pipecat, but it looks like they call methods like push_frame without passing any channel id. Doesn't that mean this class doesn't support multiple independent channels, so contexts aren't useful?

Member Author

Discussed in DMs. TL;DR: in Pipecat contexts are useful for interruption handling rather than multiplexing per se
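
For the record, a rough sketch of how the audio-context API ties into interruption handling (method names are taken from this PR's diffs; the wiring and signatures are illustrative, not the exact implementation):

import uuid

async def run_tts(self, text):
    # One context per utterance; any unique string is assumed to work as an id.
    self._context_id = str(uuid.uuid4())
    await self.create_audio_context(self._context_id)
    # ...send the request; as chunks arrive, the receive task queues them with
    # append_to_audio_context(self._context_id, frame), preserving order.

async def _handle_interruption(self):  # signature simplified for the sketch
    if self._context_id and self.audio_context_available(self._context_id):
        # Close the in-flight context so no further audio is queued for it.
        await self.remove_audio_context(self._context_id)
    self._context_id = None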

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)

166-173: Docstring is now correct for cancel.

Thanks for addressing the earlier feedback; this now accurately describes the cancel semantics.

🧹 Nitpick comments (5)
src/pipecat/services/respeecher/tts.py (5)

188-196: Add WebSocket timeouts to prevent hangs.

Configure reasonable open_timeout/close_timeout (and optionally ping intervals) to avoid indefinite waits and improve recovery.

-            self._websocket = await websocket_connect(
-                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-            )
+            self._websocket = await websocket_connect(
+                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+                open_timeout=10,
+                close_timeout=10,
+            )

43-45: Preserve exception type and traceback for missing optional deps.

Raising a generic Exception discards useful diagnostics. Re-raise as ModuleNotFoundError (or ImportError) and chain from e.

-    raise Exception(f"Missing module: {e}")
+    raise ModuleNotFoundError(f"Missing module: {e}") from e

108-110: Type hygiene nit: annotate _receive_task and standardize Optional style.

Since 3.10+ is required, consider asyncio.Task | None and using | None consistently, or keep as-is for local consistency.

+from typing import AsyncGenerator, Optional
+import asyncio
@@
-        self._receive_task = None
+        self._receive_task: asyncio.Task | None = None

219-226: Interruption cleanup: confirm server-ack cancel reliably removes the context.

If the server drops the connection or loses the cancel, the context might linger. Consider a fallback removal after a short grace period if no done/error arrives.


248-283: Metrics ordering is idempotent, but consider guarding repeated stop_ttfb_metrics().

stop_ttfb_metrics() is called on every chunk and on done. If it’s not idempotent, add a guard; otherwise ignore.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 233fef4 and b0e971b.

📒 Files selected for processing (1)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml (requires-python = ">=3.10"). This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.

Applied to files:

  • src/pipecat/services/respeecher/tts.py
🧬 Code graph analysis (1)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/frames/frames.py (10)
  • CancelFrame (745-752)
  • EndFrame (1288-1298)
  • ErrorFrame (756-774)
  • Frame (112-143)
  • LLMFullResponseEndFrame (1384-1391)
  • StartFrame (717-741)
  • StartInterruptionFrame (860-869)
  • TTSAudioRawFrame (261-267)
  • TTSStartedFrame (1395-1405)
  • TTSStoppedFrame (1409-1412)
src/pipecat/processors/frame_processor.py (1)
  • FrameDirection (42-51)
src/pipecat/services/tts_service.py (15)
  • sample_rate (138-144)
  • start (218-228)
  • start (495-502)
  • start (876-883)
  • start (939-946)
  • stop (230-239)
  • stop (504-511)
  • stop (885-897)
  • stop (948-960)
  • flush_audio (214-216)
  • audio_context_available (769-778)
  • remove_audio_context (754-767)
  • append_to_audio_context (741-752)
  • run_tts (181-192)
  • create_audio_context (731-739)
src/pipecat/utils/tracing/service_decorators.py (1)
  • traced_tts (101-218)

@mitrushchienkova mitrushchienkova left a comment

LGTM overall

The only note: should we do something about the comma-splitting issue (partially discussed in Slack)?


# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
Member

This comment is a little confusing because it looks like you do call SileroVADAnalyzer

Member Author

It's called in a lambda on demand, not at the top level. This comment is the same across examples with different TTSes; our custom code is just lines 55-65.


request: ContextfulGenerationRequestParams = {
    "transcript": text,
    "continue": continue_transcript,
Member

Should we harmonize the naming of this "continue" variable? You said that in inf we also use continue_ and do_continue, right? Also, is there a reason the text to convert is sometimes called text and sometimes transcript?

Member Author

  • continue: the official API field
  • continue_: our auto-generated SDK, since continue is a Python keyword
  • do_continue: internal code, for the same keyword reason; it can be aligned with continue_

Variable naming (continue_transcript, text, etc.) in this file mostly follows the other TTS integration files, to minimize the chance that minor naming inconsistencies delay the merge.
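
For illustration, the keyword constraint in a nutshell (the request dict mirrors the quoted diff; the SDK call at the end is a hypothetical example of the continue_ rename, not the actual SDK signature):

# "continue" is a Python keyword, so it can only appear as a string key or wire
# field, never as a parameter or variable name:
request = {
    "transcript": text,               # the text to synthesize
    "continue": continue_transcript,  # raw API field name
}
# An SDK that exposes the field as a keyword argument has to rename it, e.g.:
# client.tts.stream(transcript=text, continue_=True)  # hypothetical call, for illustration only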

Member

OK, this makes sense. I guess we can stick with the naming system used by other TTS integration files, especially if it will make it easier for people to switch to ours. And maybe switch do_continue in inf when you think of it to at least eliminate that variant.

)
await self.push_frame(frame)
# Append some silence between contexts.
SILENCE_BETWEEN_CONTEXTS = 1
Member

Following up on my comment above, why would someone switch context? (And does it actually make sense to add silence if they do?)

Member Author

There must be reasons for adding silence if the logic is there. It's not specific to our integration, so I wouldn't concentrate much on this. I'm already doubtful whether I should include this piece of refactoring. (It's a byproduct of an earlier version of the integration, where I did need a way to disable the silence insertion.)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/pipecat/services/tts_service.py (1)

959-959: Consider using async task cancellation instead of wait_for_task.

Line 959 uses the deprecated wait_for_task method. Based on the deprecation notice in the relevant code snippets, this should be replaced with direct task awaiting for consistency.

Apply this diff to use the modern approach:

-            await self.wait_for_task(self._audio_context_task)
+            await self._audio_context_task
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fbdb350 and e1a9c1b.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • env.example
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (1)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (351-362)
  • create_task (442-456)
  • cancel_task (458-465)
  • put (96-113)
  • wait_for_task (467-492)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/frames/frames.py (2)
  • TTSAudioRawFrame (261-267)
  • Frame (112-143)
🔇 Additional comments (7)
src/pipecat/services/tts_service.py (7)

10-11: LGTM! Imports align with the new mixin architecture.

The added ABC imports and BaseTaskManager import are necessary for the new _AudioContextServiceMixin abstract methods and task management requirements.

Also applies to: 40-40


356-385: Excellent refactoring! Centralized silence frame generation.

The silence_frame method provides a clean, reusable way to generate silence frames with proper audio parameters. This is a good abstraction that will be used by both the regular TTS service and the new mixin.


719-849: Well-designed mixin for audio context management.

The _AudioContextServiceMixin provides a clean separation of concerns by centralizing audio context orchestration. The abstract methods create a proper contract that implementing classes must fulfill. The queue-based architecture ensures ordered processing of audio contexts.

A few observations:

  • The timeout handling (3 seconds) provides good resilience against hanging contexts
  • The silence insertion between contexts (1 second) helps with audio clarity
  • Proper lifecycle management with graceful shutdown

852-912: LGTM! Clean implementation of AudioContextTTSService.

The class properly inherits from both WebsocketTTSService and _AudioContextServiceMixin, with correct initialization order and proper lifecycle management. The interruption handling that restarts the audio context task is appropriate for maintaining clean state after interruptions.


914-974: LGTM! Consistent AudioContextWordTTSService implementation.

The implementation mirrors AudioContextTTSService but extends the word timestamp functionality. The pattern is consistent and maintains the same lifecycle management approach.


830-849: Abstract method implementations verified — no action required.

FrameProcessor (src/pipecat/processors/frame_processor.py) provides create_task, cancel_task, push_frame and task_manager; TTSService (src/pipecat/services/tts_service.py) provides silence_frame; AudioContextTTSService and AudioContextWordTTSService inherit these via WebsocketTTSService/WebsocketWordTTSService so the mixin contract is satisfied.


805-808: Make inter-context silence configurable and document it — src/pipecat/services/tts_service.py:805–808

Hardcoded 1s silence is likely noticeable; SSML guidance favors shorter defaults (100–250ms short pauses, 400–1000ms sentence/paragraph). Actions:

  • Replace the hardcoded SILENCE_BETWEEN_CONTEXTS=1 with a configurable duration (ms) or emit an SSML break element; consider a default of ~400–600 ms.
  • Document the insertion behavior in user-facing docs/README and how to override it.
  • Run listening tests across target voices/devices and identify flows (short prompts, concatenated snippets, low-latency UX) where a 1s gap is problematic.

Kharacternyk pushed a commit that referenced this pull request Dec 26, 2025
Adds support for using Ultravox Realtime as a speech-to-speech service.

Also removes the deprecated Ultravox speech-to-text vllm model integration to avoid confusion.