
Conversation


@Kharacternyk Kharacternyk commented Sep 1, 2025

Co-PR for the docs: respeecher/pipecat-docs#1
Branch with the quickstart modified to use Respeecher/Gemini instead of Cartesia/OpenAI (not to be upstreamed, just for testing): https://github.com/respeecher/pipecat/compare/respeecher...respeecher:pipecat:respeecher-quickstart?expand=1

Summary by CodeRabbit

  • New Features

    • Added Respeecher as a TTS service provider with WebSocket-based streaming audio support and context management.
  • Documentation

    • Added example script demonstrating interruptible voice chat integration with Respeecher.
  • Configuration

    • Added Respeecher optional dependency and API key configuration.



coderabbitai bot commented Sep 1, 2025

📝 Walkthrough

This PR introduces Respeecher TTS service integration to Pipecat. It adds a new RespeecherTTSService class implementing WebSocket-based text-to-speech with audio context management and streaming capabilities. The changes include environment configuration for the Respeecher API key, an optional dependency declaration, and a new example script demonstrating an interruptible chatbot with integrated VAD, STT, LLM, and Respeecher TTS services. Additionally, the base TTS service classes are refactored to introduce _AudioContextServiceMixin, centralizing audio-context orchestration, silence generation, and lifecycle management across TTS implementations.
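
For orientation, a minimal sketch of how the new service is wired into a pipeline, condensed from the example script in this PR (the transport, VAD, and context aggregators of the full example are omitted; the constructor arguments and env variable names match the example):

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.respeecher.tts import RespeecherTTSService

# The new service takes an API key and a voice id, as in the example script.
tts = RespeecherTTSService(
    api_key=os.getenv("RESPEECHER_API_KEY"),
    voice_id="samantha",
)

# Condensed STT -> LLM -> TTS chain; the real example adds a transport,
# Silero VAD, and LLM context aggregators around it.
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
pipeline = Pipeline([stt, llm, tts])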

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 With whispered words through wires bright,
Respeecher's voice joins Pipecat's flight!
Context frames and silence sweet,
Our chatbots chat—the refactor's complete! 🎤✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: Passed. The title 'Add Respeecher TTS' accurately summarizes the main change: introducing a new Respeecher TTS service with supporting configuration, example, and dependency management.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Repository: respeecher/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1a9c1b and d6ee787.

📒 Files selected for processing (2)
  • env.example
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • env.example
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Coverage


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (9)
env.example (1)

143-145: Expose voice selection in env for the example

Add a voice variable (optional but helpful for quickstart parity), then read it in the example.

 # Respeecher
 RESPEECHER_API_KEY=...
+RESPEECHER_VOICE_ID=samantha
examples/foundational/07ad-interruptible-respeecher.py (4)

12-12: Avoid hard dependency on Silero extra in example; lazily import with fallback

Top-level import breaks if users don’t have the silero extra. Lazy import and continue without VAD.

-from pipecat.audio.vad.silero import SileroVADAnalyzer
+def _get_silero_vad():
+    try:
+        from pipecat.audio.vad.silero import SileroVADAnalyzer
+        return SileroVADAnalyzer()
+    except Exception:
+        logger.warning("Silero VAD not installed; continuing without VAD.")
+        return None
@@
 transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
 }

Also applies to: 31-47


55-66: Parameterize Respeecher voice (and optionally model) via env

Make the example configurable without code edits.

 tts = RespeecherTTSService(
-    api_key=os.getenv("RESPEECHER_API_KEY"),
-    voice_id="samantha",
+    api_key=os.getenv("RESPEECHER_API_KEY"),
+    voice_id=os.getenv("RESPEECHER_VOICE_ID", "samantha"),
+    # model=os.getenv("RESPEECHER_TTS_MODEL", "public/tts/en-rt"),

100-106: Use context.add_message for clarity over aliasing the original list

This avoids reliance on OpenAILLMContext referencing the same list object.

-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})

50-52: Nit: drop unnecessary f-string

-    logger.info(f"Starting bot")
+    logger.info("Starting bot")
src/pipecat/services/tts_service.py (2)

367-380: Consider extracting the magic number for bytes per sample.

The silence frame generation is well implemented, but the magic number 2 for 16-bit audio could be made more explicit.

 def silence_frame(self, duration_s: float) -> TTSAudioRawFrame:
     """Create a frame of silence.
 
     Args:
         duration_s: Silence duration in seconds.
     """
-    silence_num_bytes = int(duration_s * self.sample_rate * 2)  # 16-bit
+    BYTES_PER_SAMPLE = 2  # 16-bit audio
+    silence_num_bytes = int(duration_s * self.sample_rate * BYTES_PER_SAMPLE)
 
     return TTSAudioRawFrame(
         audio=b"\x00" * silence_num_bytes,
         sample_rate=self.sample_rate,
         num_channels=1,
     )

802-806: Consider making the silence duration configurable.

The 1-second silence between contexts is hardcoded. Consider making this configurable for more flexibility.

 class _AudioContextServiceMixin(ABC):
     """A service that supports audio contexts.
 
     This class does not inherit from other service base classes to avoid
     diamond inheritance.
     """
 
-    def __init__(self):
+    def __init__(self, silence_between_contexts: float = 1.0):
         """Initialize the service."""
         self._contexts: Dict[str, asyncio.Queue] = {}
         self._audio_context_task = None
+        self._silence_between_contexts = silence_between_contexts

And update line 803:

-                # Append some silence between contexts.
-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+                # Append some silence between contexts.
+                silence_frame = self.silence_frame(self._silence_between_contexts)
src/pipecat/services/respeecher/tts.py (2)

56-64: Consider adding validation for sampling parameters.

The InputParams model accepts any SamplingParams without validation. Consider adding validation or documentation about valid parameters.

Add documentation about valid sampling parameters in the docstring:

 class InputParams(BaseModel):
     """Input parameters for Respeecher TTS configuration.
 
     Parameters:
         sampling_params: Sampling parameters used for speech synthesis.
+                        Valid parameters include temperature, top_p, etc.
+                        See Respeecher documentation for full list.
     """

298-301: Consider potential race condition in connection check.

There's a small window where the WebSocket state could change between the check and usage. Consider combining the connection check with the connection attempt.

-            if not self._websocket or self._websocket.state is State.CLOSED:
-                await self._connect()
+            await self._ensure_connected()

Add a helper method:

async def _ensure_connected(self):
    """Ensure WebSocket is connected, reconnecting if necessary."""
    if not self._websocket or self._websocket.state is not State.OPEN:
        await self._connect()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 83b90da and fa4624b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • examples/foundational/07ad-interruptible-respeecher.py (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
examples/foundational/07ad-interruptible-respeecher.py (14)
src/pipecat/audio/vad/silero.py (1)
  • SileroVADAnalyzer (130-226)
src/pipecat/pipeline/pipeline.py (1)
  • Pipeline (91-212)
src/pipecat/pipeline/runner.py (1)
  • PipelineRunner (26-124)
src/pipecat/pipeline/task.py (2)
  • PipelineParams (62-101)
  • PipelineTask (104-733)
src/pipecat/processors/aggregators/openai_llm_context.py (1)
  • OpenAILLMContext (59-348)
src/pipecat/runner/types.py (1)
  • RunnerArguments (20-30)
src/pipecat/runner/utils.py (1)
  • create_transport (418-514)
src/pipecat/services/deepgram/stt.py (1)
  • DeepgramSTTService (46-322)
src/pipecat/services/openai/llm.py (1)
  • OpenAILLMService (58-108)
src/pipecat/services/respeecher/tts.py (1)
  • RespeecherTTSService (49-321)
src/pipecat/transports/base_transport.py (2)
  • BaseTransport (155-196)
  • TransportParams (27-152)
src/pipecat/transports/network/fastapi_websocket.py (1)
  • FastAPIWebsocketParams (53-64)
src/pipecat/transports/services/daily.py (1)
  • DailyParams (180-201)
src/pipecat/processors/aggregators/llm_response.py (2)
  • context (311-317)
  • get_context_frame (319-325)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/processors/frame_processor.py (3)
  • FrameDirection (46-55)
  • stop_all_metrics (432-435)
  • push_error (635-643)
src/pipecat/services/tts_service.py (12)
  • stop (231-240)
  • stop (500-507)
  • stop (888-900)
  • stop (951-963)
  • create_task (834-835)
  • flush_audio (215-217)
  • audio_context_available (765-774)
  • push_frame (342-365)
  • push_frame (842-843)
  • remove_audio_context (750-763)
  • append_to_audio_context (737-748)
  • create_audio_context (727-735)
src/pipecat/utils/asyncio/watchdog_async_iterator.py (1)
  • WatchdogAsyncIterator (20-103)
src/pipecat/utils/base_object.py (1)
  • _call_event_handler (120-140)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (8)
  • task_manager (346-357)
  • push_frame (645-655)
  • reset_watchdog (496-498)
  • create_task (437-476)
  • cancel_task (478-485)
  • put (101-118)
  • wait_for_task (487-494)
  • FrameDirection (46-55)
src/pipecat/utils/asyncio/task_manager.py (8)
  • BaseTaskManager (43-148)
  • reset_watchdog (332-340)
  • create_task (69-92)
  • create_task (207-271)
  • cancel_task (110-120)
  • cancel_task (302-330)
  • wait_for_task (95-107)
  • wait_for_task (273-300)
src/pipecat/frames/frames.py (2)
  • TTSAudioRawFrame (235-241)
  • Frame (86-117)
🔇 Additional comments (13)
pyproject.toml (1)

88-89: Add websockets to the respeecher extra

Replace in pyproject.toml:

-respeecher = [ "respeecher~=1.0" ]
+respeecher = [ "respeecher~=1.0", "websockets>=13.1,<15.0" ]

Confirmed PyPI package name is respeecher at version 1.0.1.

src/pipecat/services/tts_service.py (5)

10-11: LGTM! Good use of abstract patterns.

The addition of Coroutine to imports and BaseTaskManager import are appropriate for the new mixin architecture.


349-353: LGTM! Clean refactoring of silence generation.

The switch to using the new silence_frame() method provides a cleaner API for silence generation and makes the code more maintainable.


715-853: Well-designed audio context management mixin.

The _AudioContextServiceMixin provides a clean abstraction for managing audio contexts with proper queue-based ordering and timeout handling. The abstract methods create a clear contract for implementers.


855-915: LGTM! Clean integration of mixin with TTS services.

The AudioContextTTSService class properly integrates the mixin with WebSocket TTS, handling lifecycle methods correctly and ensuring graceful shutdown with the queue termination signal (None).


917-978: LGTM! Consistent implementation for word-based TTS.

The AudioContextWordTTSService maintains consistency with the non-word variant while properly extending WebsocketWordTTSService.

src/pipecat/services/respeecher/tts.py (7)

1-8: Copyright year is 2025 - verify if intentional.

The copyright notice shows 2025, but based on the PR date (2025-09-01), this appears to be September 2025. Please confirm this is intentional.


34-47: LGTM! Good error handling for optional dependencies.

The module import guard with helpful error messages is well implemented, guiding users to install the optional dependency when missing.


132-147: Add validation for required context_id.

Good use of assertion to ensure context_id is set. The request building is clean and well-structured.


189-202: LGTM! Robust WebSocket connection handling.

The connection logic properly checks for existing open connections and handles errors gracefully with event callbacks.


220-227: Good interruption handling.

The interruption handler properly cancels the current context and cleans up metrics. The WebSocket cancel message ensures server-side cleanup.


249-285: Excellent message handling with validation.

The receive handler properly validates messages with Pydantic, handles all response types, and includes good error logging. The watchdog iterator ensures the task doesn't timeout during long waits.


286-322: Well-structured TTS generation with proper error recovery.

The run_tts method handles connection state, creates audio contexts appropriately, and includes proper error handling with reconnection logic. The tracing decorator enables performance monitoring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/pipecat/services/tts_service.py (1)

732-741: Fix race: create the per-context queue before enqueuing the context id.
await self._contexts_queue.put(context_id) can yield; the handler may read context_id and access self._contexts[context_id] before it exists, raising KeyError.

Apply this diff:

-        await self._contexts_queue.put(context_id)
-        self._contexts[context_id] = asyncio.Queue()
+        self._contexts[context_id] = asyncio.Queue()
+        await self._contexts_queue.put(context_id)
🧹 Nitpick comments (7)
src/pipecat/services/tts_service.py (7)

357-361: Preserve direction when injecting silence after TTSStoppedFrame.
Forward the original direction to avoid surprising routing if callers ever use non-default directions.

-            await self.push_frame(silence_frame)
+            await self.push_frame(silence_frame, direction)

806-810: Make inter-context silence duration configurable.
A hard-coded 1s gap may be too long/short depending on UX. Prefer a class attribute to allow easy tuning/override.

-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+                silence_frame = self.silence_frame(self.CONTEXT_SILENCE_TIME_S)

Add outside this hunk (class scope near line 720):

# class _AudioContextServiceMixin(ABC):
CONTEXT_SILENCE_TIME_S: float = 1.0

815-832: Avoid premature context termination; make the timeout a tunable attribute.
A fixed 3.0s may truncate audio on slow networks. Expose it for services to override and document the tradeoff.

-        AUDIO_CONTEXT_TIMEOUT = 3.0
+        timeout_s = self.CONTEXT_TIMEOUT_S
 ...
-                frame = await asyncio.wait_for(queue.get(), timeout=AUDIO_CONTEXT_TIMEOUT)
+                frame = await asyncio.wait_for(queue.get(), timeout=timeout_s)

Add outside this hunk (class scope near line 720):

# class _AudioContextServiceMixin(ABC):
CONTEXT_TIMEOUT_S: float = 3.0

Please verify typical TTS stall gaps in your environments; adjust CONTEXT_TIMEOUT_S if you see tail truncation under load.


833-857: Align abstract method signatures with concrete implementations for better type-safety.
Broaden abstracts to match existing FrameProcessor/services API (name/timeout/direction), avoiding mypy/IDE warnings and future surprises.

-    def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-    async def cancel_task(self, task: asyncio.Task) -> None:
+    async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-    async def push_frame(self, frame: Frame) -> None:
+    async def push_frame(
+        self,
+        frame: Frame,
+        direction: "FrameDirection" = FrameDirection.DOWNSTREAM,
+    ) -> None:

781-786: Optional: provide backpressure knobs for queues.
Unbounded queues can grow under bursty producers. Consider class attrs to set maxsize for _contexts_queue and per-context queues.

Example (outside this hunk, class scope):

CONTEXTS_QUEUE_MAXSIZE: int = 0      # 0 = unbounded
CONTEXT_QUEUE_MAXSIZE: int = 0

And update initializations:

-            self._contexts_queue = asyncio.Queue()
+            self._contexts_queue = asyncio.Queue(maxsize=self.CONTEXTS_QUEUE_MAXSIZE)

And in create_audio_context (with the race fix applied):

-        self._contexts[context_id] = asyncio.Queue()
+        self._contexts[context_id] = asyncio.Queue(maxsize=self.CONTEXT_QUEUE_MAXSIZE)

889-905: Stop path is clean; awaits graceful drain after sentinel.
LGTM. Consider mirroring the same approach in the word-timestamps variant for consistency.


961-968: Avoid deprecated wait_for_task; await the task directly (match TTS variant).
Keeps behavior consistent and avoids deprecation warnings.

-            await self.wait_for_task(self._audio_context_task)
+            await self._audio_context_task
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa4624b and 79b9f28.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pyproject.toml
  • env.example
  • README.md
🧰 Additional context used
🧬 Code graph analysis (1)
src/pipecat/services/tts_service.py (5)
src/pipecat/transports/network/websocket_client.py (1)
  • task_manager (102-115)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (325-336)
  • create_task (416-430)
  • cancel_task (432-439)
  • put (95-112)
  • wait_for_task (441-466)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/frames/frames.py (4)
  • TTSAudioRawFrame (261-267)
  • Frame (112-143)
  • StartFrame (717-741)
  • EndFrame (1288-1298)
src/pipecat/transports/base_output.py (8)
  • sample_rate (88-94)
  • sample_rate (386-392)
  • start (105-117)
  • start (403-426)
  • stop (119-126)
  • stop (428-452)
  • cancel (128-135)
  • cancel (454-463)
🔇 Additional comments (4)
src/pipecat/services/tts_service.py (4)

10-11: Good: abstract base + Coroutine typing are appropriate for the new mixin.
No issues here; this enables clean protocol-style hooks.


40-41: Good: task-manager import aligns with the mixin’s abstract task_manager property.
This keeps orchestration consistent with existing TaskManager-based infra.


374-386: Silence frame helper looks good.
Generates correct 16‑bit mono silence with the current sample rate.


823-829: Confirm reset_watchdog availability and semantics.
Ensure concrete services implement reset_watchdog() to keep connections alive during context gaps/timeouts; otherwise the mixin will leave classes abstract or NOP.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)

166-173: Docstring refers to “Stop” and “end frame” in cancel(); should be “Cancel” and “cancel frame”.

Same point was raised earlier; aligning terminology improves API clarity.

Apply this diff:

-    async def cancel(self, frame: CancelFrame):
-        """Stop the Respeecher TTS service.
+    async def cancel(self, frame: CancelFrame):
+        """Cancel the Respeecher TTS service.
@@
-            frame: The end frame.
+            frame: The cancel frame.
🧹 Nitpick comments (7)
src/pipecat/services/respeecher/tts.py (4)

219-226: Proactively clear audio context on interruption to avoid ~3s timeout stall.

Currently you rely on the mixin timeout to finish the context. Remove it immediately after sending cancel to improve responsiveness.

Apply this diff:

-        if self._context_id:
-            cancel_request = json.dumps({"context_id": self._context_id, "cancel": True})
-            await self._get_websocket().send(cancel_request)
-            self._context_id = None
+        if self._context_id:
+            ctx = self._context_id
+            cancel_request = json.dumps({"context_id": ctx, "cancel": True})
+            await self._get_websocket().send(cancel_request)
+            await self.remove_audio_context(ctx)
+            self._context_id = None

188-196: Add connection timeout to websocket connect.

Prevents hangs on network issues.

Apply this diff:

-            self._websocket = await websocket_connect(
-                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-            )
+            self._websocket = await websocket_connect(
+                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+                open_timeout=10,
+            )

296-298: Broaden state check before (re)connecting.

Reconnect unless the socket is OPEN.

Apply this diff:

-            if not self._websocket or self._websocket.state is State.CLOSED:
+            if not self._websocket or self._websocket.state is not State.OPEN:

271-276: Emit TTSStoppedFrame when response.type == "done"
Add a stop frame after removing the audio context to align with other TTS services and enable downstream buffers to flush:

             if response.type == "done":
                 await self.stop_ttfb_metrics()
                 await self.remove_audio_context(response.context_id)
+                await self.push_frame(TTSStoppedFrame())
src/pipecat/services/tts_service.py (3)

805-809: Also emit TTSStoppedFrame at the end of each audio context.

This helps downstream components finalize packets/buffers per context.

Apply this diff:

-                # Append some silence between contexts.
-                SILENCE_BETWEEN_CONTEXTS = 1
-                silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
-                await self.push_frame(silence_frame)
+                # Mark context end and add inter-context silence.
+                await self.push_frame(TTSStoppedFrame())
+                SILENCE_BETWEEN_CONTEXTS = 1
+                await self.push_frame(self.silence_frame(SILENCE_BETWEEN_CONTEXTS))

831-849: Abstract method signatures should match concrete implementations.

Align with FrameProcessor.create_task/cancel_task/push_frame to keep type checkers happy and avoid subtle MRO issues.

Apply this diff:

-    def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-    async def cancel_task(self, task: asyncio.Task) -> None:
+    async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-    async def push_frame(self, frame: Frame) -> None:
+    async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM) -> None:

816-817: Make timeouts/silence duration configurable constants.

Expose AUDIO_CONTEXT_TIMEOUT and SILENCE_BETWEEN_CONTEXTS as class attrs for tuning/testing.

Also applies to: 805-809

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 79b9f28 and 233fef4.

📒 Files selected for processing (2)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml (requires-python = ">=3.10"). This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.

Applied to files:

  • src/pipecat/services/respeecher/tts.py
🧬 Code graph analysis (2)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (325-336)
  • push_frame (606-616)
  • create_task (416-430)
  • cancel_task (432-439)
  • FrameDirection (42-51)
src/pipecat/transports/network/websocket_client.py (1)
  • task_manager (102-115)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/frames/frames.py (10)
  • CancelFrame (745-752)
  • EndFrame (1288-1298)
  • ErrorFrame (756-774)
  • Frame (112-143)
  • LLMFullResponseEndFrame (1384-1391)
  • StartFrame (717-741)
  • StartInterruptionFrame (860-869)
  • TTSAudioRawFrame (261-267)
  • TTSStartedFrame (1395-1405)
  • TTSStoppedFrame (1409-1412)
src/pipecat/processors/frame_processor.py (3)
  • FrameDirection (42-51)
  • stop_all_metrics (411-414)
  • push_error (596-604)
src/pipecat/services/tts_service.py (32)
  • AudioContextTTSService (852-911)
  • sample_rate (138-144)
  • start (218-228)
  • start (495-502)
  • start (876-883)
  • start (939-946)
  • stop (230-239)
  • stop (504-511)
  • stop (885-897)
  • stop (948-960)
  • cancel (241-250)
  • cancel (513-520)
  • cancel (899-906)
  • cancel (962-969)
  • create_task (831-832)
  • _report_error (601-603)
  • _report_error (673-675)
  • cancel_task (835-836)
  • _handle_interruption (387-391)
  • _handle_interruption (536-539)
  • _handle_interruption (626-630)
  • _handle_interruption (698-702)
  • _handle_interruption (908-911)
  • _handle_interruption (971-974)
  • flush_audio (214-216)
  • audio_context_available (769-778)
  • push_frame (348-371)
  • push_frame (839-840)
  • remove_audio_context (754-767)
  • append_to_audio_context (741-752)
  • run_tts (181-192)
  • create_audio_context (731-739)
src/pipecat/utils/base_object.py (1)
  • _call_event_handler (121-141)
🔇 Additional comments (6)
src/pipecat/services/respeecher/tts.py (4)

108-109: Union type str | None is correct for this codebase (Python 3.10+).

No action needed.


178-180: receive_task_handler Exists in Base Class
Confirmed: respeecher/tts.py’s class inherits (directly or via MRO) from websocket_service.WebsocketService, which defines

async def _receive_task_handler(self, report_error: Callable[[ErrorFrame], Awaitable[None]]):
    …

so _receive_task_handler is present at runtime. No action needed.


193-195: Verify header/subprotocol support
No public documentation indicates the streaming TTS websocket accepts API keys via an Authorization header or subprotocol—embedding keys in URLs can leak via logs or proxies. Please contact Respeecher support to confirm or request header-based/subprotocol authentication for the websocket.


284-284: Ignore signature change – None yields are intentional.

process_generator’s docstring and implementation expect the generator to yield None as a handshake marker (it skips falsy values), so removing yield None and changing the return type would break the streaming flow. Leave the signature and yield None intact.

Likely an incorrect or invalid review comment.

src/pipecat/services/tts_service.py (2)

355-359: Silence insertion API looks good.

Centralizing silence creation via silence_frame() improves reuse and readability.

Also applies to: 373-386


948-961: Verify availability of wait_for_task in TTSService
Found wait_for_task defined in src/pipecat/processors/frame_processor.py:441; ensure TTSService actually inherits this helper before swapping to a raw await self._audio_context_task.

self._api_key = api_key
self._url = url
self._output_format: StreamingOutputFormatParams = {
    "encoding": "pcm_s16le",

We also accept "pcm_f32le", should we allow users to set it up?

Member Author

I made it not configurable because, at a glance, there is no floating-point sample support in Pipecat's audio facilities. I should ask them in a future PR.

Member

Maybe it is actually better that they don't support it since it doubles the size of the audio, and it's hard to imagine how it could be useful.
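
For reference, a quick back-of-the-envelope check of that size difference (the sample rate below is illustrative, not the service's actual value):

# pcm_s16le uses 2 bytes per sample, pcm_f32le uses 4, so float output doubles the stream.
sample_rate = 24_000                     # illustrative mono sample rate in Hz
bytes_per_second_s16 = sample_rate * 2   # 48,000 bytes/s
bytes_per_second_f32 = sample_rate * 4   # 96,000 bytes/s
assert bytes_per_second_f32 == 2 * bytes_per_second_s16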

await super().stop(frame)
await self._disconnect()

async def cancel(self, frame: CancelFrame):

What's the difference between stop and cancel?

Member Author

No difference for us, nor for the other services I've been skimming through: everyone does the same thing in both handlers. Conceptually, from what I understand, stop must leave the object in a state where another start is possible, whereas there's no such requirement for cancel, so cancel could be quicker for some services.
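
A minimal sketch of that convention (the handler bodies mirror the quoted diff; the surrounding class is hypothetical):

async def stop(self, frame: EndFrame):
    # stop: the service must remain restartable after this returns
    await super().stop(frame)
    await self._disconnect()

async def cancel(self, frame: CancelFrame):
    # cancel: no restart guarantee, so implementations may take shortcuts,
    # but in practice most services just disconnect here too
    await super().cancel(frame)
    await self._disconnect()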

True if the context exists and is available.
"""
return context_id in self._contexts

The changes in this file don't break anything for other integrations, right?

Member Author

I hope so :) Of course that's my goal, but it's difficult to test properly. The main change is conceptually simple: decouple audio context management from word-timestamp functionality. We would be the first Pipecat TTS service that has the former but not the latter. The decoupling is done mostly by moving code verbatim into a separate class, so there should be a low probability of subtle bugs.

Member

I haven't delved deep into Pipecat, but it looks like they call methods like push_frame without passing any channel id. Doesn't that mean this class doesn't support multiple independent channels, so contexts aren't useful?

Member Author

Discussed in DMs. TL;DR: in Pipecat contexts are useful for interruption handling rather than multiplexing per se
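
For the record, a rough sketch of how the audio-context API ties into interruption handling (method names are taken from this PR's diffs; the wiring and signatures are illustrative, not the exact implementation):

import uuid

async def run_tts(self, text):
    # One context per utterance; any unique string is assumed to work as an id.
    self._context_id = str(uuid.uuid4())
    await self.create_audio_context(self._context_id)
    # ...send the request; as chunks arrive, the receive task queues them with
    # append_to_audio_context(self._context_id, frame), preserving order.

async def _handle_interruption(self):  # signature simplified for the sketch
    if self._context_id and self.audio_context_available(self._context_id):
        # Close the in-flight context so no further audio is queued for it.
        await self.remove_audio_context(self._context_id)
    self._context_id = None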

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)

166-173: Docstring is now correct for cancel.

Thanks for addressing the earlier feedback; this now accurately describes the cancel semantics.

🧹 Nitpick comments (5)
src/pipecat/services/respeecher/tts.py (5)

188-196: Add WebSocket timeouts to prevent hangs.

Configure reasonable open_timeout/close_timeout (and optionally ping intervals) to avoid indefinite waits and improve recovery.

-            self._websocket = await websocket_connect(
-                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-            )
+            self._websocket = await websocket_connect(
+                f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+                open_timeout=10,
+                close_timeout=10,
+            )

43-45: Preserve exception type and traceback for missing optional deps.

Raising a generic Exception discards useful diagnostics. Re-raise as ModuleNotFoundError (or ImportError) and chain from e.

-    raise Exception(f"Missing module: {e}")
+    raise ModuleNotFoundError(f"Missing module: {e}") from e

108-110: Type hygiene nit: annotate _receive_task and standardize Optional style.

Since 3.10+ is required, consider asyncio.Task | None and using | None consistently, or keep as-is for local consistency.

+from typing import AsyncGenerator, Optional
+import asyncio
@@
-        self._receive_task = None
+        self._receive_task: asyncio.Task | None = None

219-226: Interruption cleanup: confirm server-ack cancel reliably removes the context.

If the server drops the connection or loses the cancel, the context might linger. Consider a fallback removal after a short grace period if no done/error arrives.


248-283: Metrics ordering is idempotent, but consider guarding repeated stop_ttfb_metrics().

stop_ttfb_metrics() is called on every chunk and on done. If it’s not idempotent, add a guard; otherwise ignore.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 233fef4 and b0e971b.

📒 Files selected for processing (1)
  • src/pipecat/services/respeecher/tts.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml (requires-python = ">=3.10"). This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.

Applied to files:

  • src/pipecat/services/respeecher/tts.py
🧬 Code graph analysis (1)
src/pipecat/services/respeecher/tts.py (4)
src/pipecat/frames/frames.py (10)
  • CancelFrame (745-752)
  • EndFrame (1288-1298)
  • ErrorFrame (756-774)
  • Frame (112-143)
  • LLMFullResponseEndFrame (1384-1391)
  • StartFrame (717-741)
  • StartInterruptionFrame (860-869)
  • TTSAudioRawFrame (261-267)
  • TTSStartedFrame (1395-1405)
  • TTSStoppedFrame (1409-1412)
src/pipecat/processors/frame_processor.py (1)
  • FrameDirection (42-51)
src/pipecat/services/tts_service.py (15)
  • sample_rate (138-144)
  • start (218-228)
  • start (495-502)
  • start (876-883)
  • start (939-946)
  • stop (230-239)
  • stop (504-511)
  • stop (885-897)
  • stop (948-960)
  • flush_audio (214-216)
  • audio_context_available (769-778)
  • remove_audio_context (754-767)
  • append_to_audio_context (741-752)
  • run_tts (181-192)
  • create_audio_context (731-739)
src/pipecat/utils/tracing/service_decorators.py (1)
  • traced_tts (101-218)

@mitrushchienkova mitrushchienkova left a comment

LGTM overall

The only note: should we do something about the comma-splitting issue (partially discussed in Slack)?


# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
Member

This comment is a little confusing because it looks like you do call SileroVADAnalyzer

Member Author

It's called in a lambda on demand, not at the top level. This comment is the same across examples with different TTSes; our custom code is just lines 55-65.


request: ContextfulGenerationRequestParams = {
    "transcript": text,
    "continue": continue_transcript,
Member

Should we harmonize the naming of this "continue" variable? You said that in inf we also use continue_ and do_continue, right? Also, is there a reason the text to convert is sometimes called text and sometimes transcript?

Member Author

  • continue: the official API field
  • continue_: our auto-generated SDK, since continue is a Python keyword
  • do_continue: internal code, for the same keyword reason; it can be aligned with continue_

Variable naming (continue_transcript, text, etc.) in this file mostly follows the other TTS integration files, to minimize the chance that minor naming inconsistencies delay the merge.
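
For illustration, the keyword constraint in a nutshell (the request dict mirrors the quoted diff; the SDK call at the end is a hypothetical example of the continue_ rename, not the actual SDK signature):

# "continue" is a Python keyword, so it can only appear as a string key or wire
# field, never as a parameter or variable name:
request = {
    "transcript": text,               # the text to synthesize
    "continue": continue_transcript,  # raw API field name
}
# An SDK that exposes the field as a keyword argument has to rename it, e.g.:
# client.tts.stream(transcript=text, continue_=True)  # hypothetical call, for illustration only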

Member

OK, this makes sense. I guess we can stick with the naming system used by other TTS integration files, especially if it will make it easier for people to switch to ours. And maybe switch do_continue in inf when you think of it to at least eliminate that variant.

)
await self.push_frame(frame)
# Append some silence between contexts.
SILENCE_BETWEEN_CONTEXTS = 1
Member

Following up on my comment above, why would someone switch context? (And does it actually make sense to add silence if they do?)

Member Author

There must be reasons for adding silence if the logic is there. It's not specific to our integration, so I wouldn't concentrate much on this. I'm already doubtful whether I should include this piece of refactoring. (It's a byproduct of an earlier version of the integration, where I did need a way to disable the silence insertion.)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/pipecat/services/tts_service.py (1)

959-959: Consider using async task cancellation instead of wait_for_task.

Line 959 uses the deprecated wait_for_task method. Based on the deprecation notice in the relevant code snippets, this should be replaced with direct task awaiting for consistency.

Apply this diff to use the modern approach:

-            await self.wait_for_task(self._audio_context_task)
+            await self._audio_context_task
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fbdb350 and e1a9c1b.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • env.example (1 hunks)
  • pyproject.toml (1 hunks)
  • src/pipecat/services/tts_service.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • env.example
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (1)
src/pipecat/services/tts_service.py (3)
src/pipecat/processors/frame_processor.py (5)
  • task_manager (351-362)
  • create_task (442-456)
  • cancel_task (458-465)
  • put (96-113)
  • wait_for_task (467-492)
src/pipecat/utils/asyncio/task_manager.py (5)
  • BaseTaskManager (33-92)
  • create_task (58-70)
  • create_task (141-175)
  • cancel_task (73-83)
  • cancel_task (177-203)
src/pipecat/frames/frames.py (2)
  • TTSAudioRawFrame (261-267)
  • Frame (112-143)
🔇 Additional comments (7)
src/pipecat/services/tts_service.py (7)

10-11: LGTM! Imports align with the new mixin architecture.

The added ABC imports and BaseTaskManager import are necessary for the new _AudioContextServiceMixin abstract methods and task management requirements.

Also applies to: 40-40


356-385: Excellent refactoring! Centralized silence frame generation.

The silence_frame method provides a clean, reusable way to generate silence frames with proper audio parameters. This is a good abstraction that will be used by both the regular TTS service and the new mixin.


719-849: Well-designed mixin for audio context management.

The _AudioContextServiceMixin provides a clean separation of concerns by centralizing audio context orchestration. The abstract methods create a proper contract that implementing classes must fulfill. The queue-based architecture ensures ordered processing of audio contexts.

A few observations:

  • The timeout handling (3 seconds) provides good resilience against hanging contexts
  • The silence insertion between contexts (1 second) helps with audio clarity
  • Proper lifecycle management with graceful shutdown

852-912: LGTM! Clean implementation of AudioContextTTSService.

The class properly inherits from both WebsocketTTSService and _AudioContextServiceMixin, with correct initialization order and proper lifecycle management. The interruption handling that restarts the audio context task is appropriate for maintaining clean state after interruptions.


914-974: LGTM! Consistent AudioContextWordTTSService implementation.

The implementation mirrors AudioContextTTSService but extends the word timestamp functionality. The pattern is consistent and maintains the same lifecycle management approach.


830-849: Abstract method implementations verified — no action required.

FrameProcessor (src/pipecat/processors/frame_processor.py) provides create_task, cancel_task, push_frame and task_manager; TTSService (src/pipecat/services/tts_service.py) provides silence_frame; AudioContextTTSService and AudioContextWordTTSService inherit these via WebsocketTTSService/WebsocketWordTTSService so the mixin contract is satisfied.


805-808: Make inter-context silence configurable and document it — src/pipecat/services/tts_service.py:805–808

Hardcoded 1s silence is likely noticeable; SSML guidance favors shorter defaults (100–250ms short pauses, 400–1000ms sentence/paragraph). Actions:

  • Replace the hardcoded SILENCE_BETWEEN_CONTEXTS=1 with a configurable duration (ms) or emit an SSML break element; consider a default of ~400–600 ms.
  • Document the insertion behavior in user-facing docs/README and how to override it.
  • Run listening tests across target voices/devices and identify flows (short prompts, concatenated snippets, low-latency UX) where a 1s gap is problematic.

Kharacternyk pushed a commit that referenced this pull request Dec 26, 2025
Adds support for using Ultravox Realtime as a speech-to-speech service.

Also removes the deprecated Ultravox speech-to-text vllm model integration to avoid confusion.