Add Respeecher TTS #1
base: main
Conversation
📝 Walkthrough

This PR introduces Respeecher TTS service integration to Pipecat. It adds a new `RespeecherTTSService` class implementing WebSocket-based text-to-speech with audio context management and streaming capabilities. The changes include environment configuration for the Respeecher API key, an optional dependency declaration, and a new example script demonstrating an interruptible chatbot with integrated VAD, STT, LLM, and Respeecher TTS services. Additionally, the base TTS service classes are refactored to introduce `_AudioContextServiceMixin`, centralizing audio-context orchestration, silence generation, and lifecycle management across TTS implementations.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed.
Actionable comments posted: 3
🧹 Nitpick comments (9)
env.example (1)
143-145: Expose voice selection in env for the example

Add a voice variable (optional but helpful for quickstart parity), then read it in the example.

```diff
 # Respeecher
 RESPEECHER_API_KEY=...
+RESPEECHER_VOICE_ID=samantha
```

examples/foundational/07ad-interruptible-respeecher.py (4)
12-12: Avoid hard dependency on Silero extra in example; lazily import with fallback

Top-level import breaks if users don’t have the silero extra. Lazy import and continue without VAD.

```diff
-from pipecat.audio.vad.silero import SileroVADAnalyzer
+def _get_silero_vad():
+    try:
+        from pipecat.audio.vad.silero import SileroVADAnalyzer
+        return SileroVADAnalyzer()
+    except Exception:
+        logger.warning("Silero VAD not installed; continuing without VAD.")
+        return None
@@
 transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
-    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=_get_silero_vad(),
+    ),
 }
```

Also applies to: 31-47
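The lazy-import-with-fallback pattern suggested above can be sketched generically. Note that `optional_import` is an illustrative helper, not existing Pipecat API, and the `pipecat.audio.vad.silero` path only resolves in environments that have the silero extra installed.

```python
import logging

logger = logging.getLogger(__name__)

def optional_import(module_path: str, attr: str):
    """Import `attr` from `module_path`, returning None when the optional
    dependency is missing instead of raising at module import time."""
    try:
        module = __import__(module_path, fromlist=[attr])
        return getattr(module, attr)
    except (ImportError, AttributeError):
        logger.warning("%s not installed; continuing without it.", module_path)
        return None

# A stdlib attribute imports fine:
sqrt = optional_import("math", "sqrt")
# A missing optional extra degrades to None instead of crashing the example:
vad_cls = optional_import("pipecat.audio.vad.silero", "SileroVADAnalyzer")
```

The same shape works for any optional extra: the example keeps running, just without the feature the extra would have enabled.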
55-66: Parameterize Respeecher voice (and optionally model) via env

Make the example configurable without code edits.

```diff
 tts = RespeecherTTSService(
-    api_key=os.getenv("RESPEECHER_API_KEY"),
-    voice_id="samantha",
+    api_key=os.getenv("RESPEECHER_API_KEY"),
+    voice_id=os.getenv("RESPEECHER_VOICE_ID", "samantha"),
+    # model=os.getenv("RESPEECHER_TTS_MODEL", "public/tts/en-rt"),
```
100-106: Use context.add_message for clarity over aliasing the original list

This avoids reliance on OpenAILLMContext referencing the same list object.

```diff
-messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
```
50-52: Nit: drop unnecessary f-string

```diff
-logger.info(f"Starting bot")
+logger.info("Starting bot")
```

src/pipecat/services/tts_service.py (2)
367-380: Consider extracting the magic number for bytes per sample.

The silence frame generation is well implemented, but the magic number `2` for 16-bit audio could be made more explicit.

```diff
 def silence_frame(self, duration_s: float) -> TTSAudioRawFrame:
     """Create a frame of silence.

     Args:
         duration_s: Silence duration in seconds.
     """
-    silence_num_bytes = int(duration_s * self.sample_rate * 2)  # 16-bit
+    BYTES_PER_SAMPLE = 2  # 16-bit audio
+    silence_num_bytes = int(duration_s * self.sample_rate * BYTES_PER_SAMPLE)
     return TTSAudioRawFrame(
         audio=b"\x00" * silence_num_bytes,
         sample_rate=self.sample_rate,
         num_channels=1,
     )
```
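The byte math behind this comment is easy to check in isolation. A standalone sketch (the function name `silence_bytes` is illustrative, not Pipecat API): for 16-bit PCM, each sample is 2 bytes, so a buffer of zeros of length `duration * rate * 2 * channels` is silence.

```python
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM

def silence_bytes(duration_s: float, sample_rate: int, num_channels: int = 1) -> bytes:
    """Zero-filled PCM buffer representing `duration_s` seconds of silence."""
    num_bytes = int(duration_s * sample_rate * BYTES_PER_SAMPLE * num_channels)
    return b"\x00" * num_bytes

# One second of 16 kHz mono 16-bit silence is 32,000 bytes.
buf = silence_bytes(1.0, 16000)
```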
802-806: Consider making the silence duration configurable.

The 1-second silence between contexts is hardcoded. Consider making this configurable for more flexibility.

```diff
 class _AudioContextServiceMixin(ABC):
     """A service that supports audio contexts.

     This class does not inherit from other service base classes to avoid diamond inheritance.
     """

-    def __init__(self):
+    def __init__(self, silence_between_contexts: float = 1.0):
         """Initialize the service."""
         self._contexts: Dict[str, asyncio.Queue] = {}
         self._audio_context_task = None
+        self._silence_between_contexts = silence_between_contexts
```

And update line 803:

```diff
-# Append some silence between contexts.
-SILENCE_BETWEEN_CONTEXTS = 1
-silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+# Append some silence between contexts.
+silence_frame = self.silence_frame(self._silence_between_contexts)
```

src/pipecat/services/respeecher/tts.py (2)
56-64: Consider adding validation for sampling parameters.

The `InputParams` model accepts any `SamplingParams` without validation. Consider adding validation or documentation about valid parameters.

Add documentation about valid sampling parameters in the docstring:

```diff
 class InputParams(BaseModel):
     """Input parameters for Respeecher TTS configuration.

     Parameters:
         sampling_params: Sampling parameters used for speech synthesis.
+            Valid parameters include temperature, top_p, etc.
+            See Respeecher documentation for full list.
     """
```
298-301: Consider potential race condition in connection check.

There's a small window where the WebSocket state could change between the check and usage. Consider combining the connection check with the connection attempt.

```diff
-if not self._websocket or self._websocket.state is State.CLOSED:
-    await self._connect()
+await self._ensure_connected()
```

Add a helper method:

```python
async def _ensure_connected(self):
    """Ensure WebSocket is connected, reconnecting if necessary."""
    if not self._websocket or self._websocket.state is not State.OPEN:
        await self._connect()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- README.md (1 hunks)
- env.example (1 hunks)
- examples/foundational/07ad-interruptible-respeecher.py (1 hunks)
- pyproject.toml (1 hunks)
- src/pipecat/services/respeecher/tts.py (1 hunks)
- src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
examples/foundational/07ad-interruptible-respeecher.py (14)
- src/pipecat/audio/vad/silero.py (1): SileroVADAnalyzer (130-226)
- src/pipecat/pipeline/pipeline.py (1): Pipeline (91-212)
- src/pipecat/pipeline/runner.py (1): PipelineRunner (26-124)
- src/pipecat/pipeline/task.py (2): PipelineParams (62-101), PipelineTask (104-733)
- src/pipecat/processors/aggregators/openai_llm_context.py (1): OpenAILLMContext (59-348)
- src/pipecat/runner/types.py (1): RunnerArguments (20-30)
- src/pipecat/runner/utils.py (1): create_transport (418-514)
- src/pipecat/services/deepgram/stt.py (1): DeepgramSTTService (46-322)
- src/pipecat/services/openai/llm.py (1): OpenAILLMService (58-108)
- src/pipecat/services/respeecher/tts.py (1): RespeecherTTSService (49-321)
- src/pipecat/transports/base_transport.py (2): BaseTransport (155-196), TransportParams (27-152)
- src/pipecat/transports/network/fastapi_websocket.py (1): FastAPIWebsocketParams (53-64)
- src/pipecat/transports/services/daily.py (1): DailyParams (180-201)
- src/pipecat/processors/aggregators/llm_response.py (2): context (311-317), get_context_frame (319-325)

src/pipecat/services/respeecher/tts.py (4)
- src/pipecat/processors/frame_processor.py (3): FrameDirection (46-55), stop_all_metrics (432-435), push_error (635-643)
- src/pipecat/services/tts_service.py (12): stop (231-240), stop (500-507), stop (888-900), stop (951-963), create_task (834-835), flush_audio (215-217), audio_context_available (765-774), push_frame (342-365), push_frame (842-843), remove_audio_context (750-763), append_to_audio_context (737-748), create_audio_context (727-735)
- src/pipecat/utils/asyncio/watchdog_async_iterator.py (1): WatchdogAsyncIterator (20-103)
- src/pipecat/utils/base_object.py (1): _call_event_handler (120-140)

src/pipecat/services/tts_service.py (3)
- src/pipecat/processors/frame_processor.py (8): task_manager (346-357), push_frame (645-655), reset_watchdog (496-498), create_task (437-476), cancel_task (478-485), put (101-118), wait_for_task (487-494), FrameDirection (46-55)
- src/pipecat/utils/asyncio/task_manager.py (8): BaseTaskManager (43-148), reset_watchdog (332-340), create_task (69-92), create_task (207-271), cancel_task (110-120), cancel_task (302-330), wait_for_task (95-107), wait_for_task (273-300)
- src/pipecat/frames/frames.py (2): TTSAudioRawFrame (235-241), Frame (86-117)
🔇 Additional comments (13)
pyproject.toml (1)
88-89: Add `websockets` to the `respeecher` extra

Replace in pyproject.toml:

```diff
-respeecher = [ "respeecher~=1.0" ]
+respeecher = [ "respeecher~=1.0", "websockets>=13.1,<15.0" ]
```

Confirmed PyPI package name is `respeecher` at version 1.0.1.

src/pipecat/services/tts_service.py (5)
10-11: LGTM! Good use of abstract patterns.

The addition of `Coroutine` to the imports and the `BaseTaskManager` import are appropriate for the new mixin architecture.
349-353: LGTM! Clean refactoring of silence generation.

The switch to the new `silence_frame()` method provides a cleaner API for silence generation and makes the code more maintainable.
715-853: Well-designed audio context management mixin.

The `_AudioContextServiceMixin` provides a clean abstraction for managing audio contexts with proper queue-based ordering and timeout handling. The abstract methods create a clear contract for implementers.
855-915: LGTM! Clean integration of mixin with TTS services.

The `AudioContextTTSService` class properly integrates the mixin with WebSocket TTS, handling lifecycle methods correctly and ensuring graceful shutdown with the queue termination signal (None).
917-978: LGTM! Consistent implementation for word-based TTS.

The `AudioContextWordTTSService` maintains consistency with the non-word variant while properly extending `WebsocketWordTTSService`.

src/pipecat/services/respeecher/tts.py (7)
1-8: Copyright year is 2025 - verify if intentional.

The copyright notice shows 2025; given the PR date (2025-09-01), this appears to be September 2025. Please confirm this is intentional.
34-47: LGTM! Good error handling for optional dependencies.

The module import guard with helpful error messages is well implemented, guiding users to install the optional dependency when missing.
132-147: Add validation for required context_id.

Good use of assertion to ensure context_id is set. The request building is clean and well-structured.
189-202: LGTM! Robust WebSocket connection handling.

The connection logic properly checks for existing open connections and handles errors gracefully with event callbacks.
220-227: Good interruption handling.

The interruption handler properly cancels the current context and cleans up metrics. The WebSocket cancel message ensures server-side cleanup.
249-285: Excellent message handling with validation.

The receive handler properly validates messages with Pydantic, handles all response types, and includes good error logging. The watchdog iterator ensures the task doesn't time out during long waits.
286-322: Well-structured TTS generation with proper error recovery.

The `run_tts` method handles connection state, creates audio contexts appropriately, and includes proper error handling with reconnection logic. The tracing decorator enables performance monitoring.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pipecat/services/tts_service.py (1)
732-741: Fix race: create the per-context queue before enqueuing the context id.

`await self._contexts_queue.put(context_id)` can yield; the handler may read `context_id` and access `self._contexts[context_id]` before it exists, raising KeyError.

Apply this diff:

```diff
-await self._contexts_queue.put(context_id)
-self._contexts[context_id] = asyncio.Queue()
+self._contexts[context_id] = asyncio.Queue()
+await self._contexts_queue.put(context_id)
```
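The corrected ordering above can be demonstrated in miniature: register the per-context queue in the dict first, then publish the id, so a consumer woken by the id can never observe a missing key. All names here are illustrative stand-ins, not Pipecat's.

```python
import asyncio

async def main() -> bool:
    contexts: dict[str, asyncio.Queue] = {}
    ids: asyncio.Queue = asyncio.Queue()

    async def consumer() -> bool:
        context_id = await ids.get()
        # With the corrected ordering, the per-context queue is guaranteed
        # to exist by the time the id becomes visible to the consumer.
        return context_id in contexts

    task = asyncio.create_task(consumer())
    # Corrected order: create the queue first, then publish the id.
    contexts["ctx-1"] = asyncio.Queue()
    await ids.put("ctx-1")
    return await task

print(asyncio.run(main()))  # True
```

With the original order reversed, an unlucky scheduling point between the `put` and the dict assignment would let the consumer look up a key that is not there yet.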
🧹 Nitpick comments (7)
src/pipecat/services/tts_service.py (7)
357-361: Preserve direction when injecting silence after TTSStoppedFrame.

Forward the original `direction` to avoid surprising routing if callers ever use non-default directions.

```diff
-await self.push_frame(silence_frame)
+await self.push_frame(silence_frame, direction)
```
806-810: Make inter-context silence duration configurable.

A hard-coded 1s gap may be too long or too short depending on UX. Prefer a class attribute to allow easy tuning and overriding.

```diff
-SILENCE_BETWEEN_CONTEXTS = 1
-silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
+silence_frame = self.silence_frame(self.CONTEXT_SILENCE_TIME_S)
```

Add outside this hunk (class scope near line 720):

```python
# class _AudioContextServiceMixin(ABC):
CONTEXT_SILENCE_TIME_S: float = 1.0
```
815-832: Avoid premature context termination; make the timeout a tunable attribute.

A fixed 3.0s may truncate audio on slow networks. Expose it for services to override and document the tradeoff.

```diff
-AUDIO_CONTEXT_TIMEOUT = 3.0
+timeout_s = self.CONTEXT_TIMEOUT_S
 ...
-frame = await asyncio.wait_for(queue.get(), timeout=AUDIO_CONTEXT_TIMEOUT)
+frame = await asyncio.wait_for(queue.get(), timeout=timeout_s)
```

Add outside this hunk (class scope near line 720):

```python
# class _AudioContextServiceMixin(ABC):
CONTEXT_TIMEOUT_S: float = 3.0
```

Please verify typical TTS stall gaps in your environments; adjust `CONTEXT_TIMEOUT_S` if you see tail truncation under load.
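The class-attribute-timeout suggestion can be sketched with a toy drainer. The class and attribute names mirror the proposal but are hypothetical; subclasses tune the timeout without touching the mixin's drain loop.

```python
import asyncio

class AudioContextDrainer:
    """Sketch: drain a queue until a None sentinel or a stall timeout."""
    CONTEXT_TIMEOUT_S: float = 3.0  # overridable per subclass/deployment

    async def drain(self, queue: asyncio.Queue) -> list:
        frames = []
        while True:
            try:
                frame = await asyncio.wait_for(queue.get(), timeout=self.CONTEXT_TIMEOUT_S)
            except asyncio.TimeoutError:
                break  # producer stalled; end the context
            if frame is None:
                break  # graceful end-of-context sentinel
            frames.append(frame)
        return frames

class FastDrainer(AudioContextDrainer):
    CONTEXT_TIMEOUT_S = 0.05  # tuned without editing the base class

async def demo() -> list:
    q: asyncio.Queue = asyncio.Queue()
    for frame in (b"a", b"b", None):
        q.put_nowait(frame)
    return await FastDrainer().drain(q)

print(asyncio.run(demo()))  # [b'a', b'b']
```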
833-857: Align abstract method signatures with concrete implementations for better type-safety.

Broaden the abstracts to match the existing `FrameProcessor`/services API (name/timeout/direction), avoiding mypy/IDE warnings and future surprises.

```diff
-def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-async def cancel_task(self, task: asyncio.Task) -> None:
+async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-async def push_frame(self, frame: Frame) -> None:
+async def push_frame(
+    self,
+    frame: Frame,
+    direction: "FrameDirection" = FrameDirection.DOWNSTREAM,
+) -> None:
```
781-786: Optional: provide backpressure knobs for queues.

Unbounded queues can grow under bursty producers. Consider class attrs to set `maxsize` for `_contexts_queue` and the per-context queues.

Example (outside this hunk, class scope):

```python
CONTEXTS_QUEUE_MAXSIZE: int = 0  # 0 = unbounded
CONTEXT_QUEUE_MAXSIZE: int = 0
```

And update initializations:

```diff
-self._contexts_queue = asyncio.Queue()
+self._contexts_queue = asyncio.Queue(maxsize=self.CONTEXTS_QUEUE_MAXSIZE)
```

And in create_audio_context (with the race fix applied):

```diff
-self._contexts[context_id] = asyncio.Queue()
+self._contexts[context_id] = asyncio.Queue(maxsize=self.CONTEXT_QUEUE_MAXSIZE)
```
889-905: Stop path is clean; awaits graceful drain after sentinel.

LGTM. Consider mirroring the same approach in the word-timestamps variant for consistency.
961-968: Avoid the deprecated `wait_for_task`; await the task directly (match the TTS variant).

Keeps behavior consistent and avoids deprecation warnings.

```diff
-await self.wait_for_task(self._audio_context_task)
+await self._audio_context_task
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- README.md (1 hunks)
- env.example (1 hunks)
- pyproject.toml (1 hunks)
- src/pipecat/services/tts_service.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pyproject.toml
- env.example
- README.md
🧰 Additional context used
🧬 Code graph analysis (1)
src/pipecat/services/tts_service.py (5)
- src/pipecat/transports/network/websocket_client.py (1): task_manager (102-115)
- src/pipecat/processors/frame_processor.py (5): task_manager (325-336), create_task (416-430), cancel_task (432-439), put (95-112), wait_for_task (441-466)
- src/pipecat/utils/asyncio/task_manager.py (5): BaseTaskManager (33-92), create_task (58-70), create_task (141-175), cancel_task (73-83), cancel_task (177-203)
- src/pipecat/frames/frames.py (4): TTSAudioRawFrame (261-267), Frame (112-143), StartFrame (717-741), EndFrame (1288-1298)
- src/pipecat/transports/base_output.py (8): sample_rate (88-94), sample_rate (386-392), start (105-117), start (403-426), stop (119-126), stop (428-452), cancel (128-135), cancel (454-463)
🔇 Additional comments (4)
src/pipecat/services/tts_service.py (4)
10-11: Good: abstract base + Coroutine typing are appropriate for the new mixin.
No issues here; this enables clean protocol-style hooks.
40-41: Good: the task-manager import aligns with the mixin's abstract `task_manager` property.

This keeps orchestration consistent with the existing TaskManager-based infra.
374-386: Silence frame helper looks good.
Generates correct 16‑bit mono silence with the current sample rate.
823-829: Confirm `reset_watchdog` availability and semantics.

Ensure concrete services implement `reset_watchdog()` to keep connections alive during context gaps/timeouts; otherwise the mixin will leave classes abstract or NOP.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)
166-173: Docstring refers to "Stop" and "end frame" in `cancel()`; it should say "Cancel" and "cancel frame".

The same point was raised earlier; aligning terminology improves API clarity.

Apply this diff:

```diff
-async def cancel(self, frame: CancelFrame):
-    """Stop the Respeecher TTS service.
+async def cancel(self, frame: CancelFrame):
+    """Cancel the Respeecher TTS service.
@@
-        frame: The end frame.
+        frame: The cancel frame.
```
🧹 Nitpick comments (7)
src/pipecat/services/respeecher/tts.py (4)
219-226: Proactively clear the audio context on interruption to avoid a ~3s timeout stall.

Currently you rely on the mixin timeout to finish the context. Remove it immediately after sending cancel to improve responsiveness.

Apply this diff:

```diff
-if self._context_id:
-    cancel_request = json.dumps({"context_id": self._context_id, "cancel": True})
-    await self._get_websocket().send(cancel_request)
-    self._context_id = None
+if self._context_id:
+    ctx = self._context_id
+    cancel_request = json.dumps({"context_id": ctx, "cancel": True})
+    await self._get_websocket().send(cancel_request)
+    await self.remove_audio_context(ctx)
+    self._context_id = None
```
188-196: Add a connection timeout to the websocket connect.

Prevents hangs on network issues.

Apply this diff:

```diff
-self._websocket = await websocket_connect(
-    f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-)
+self._websocket = await websocket_connect(
+    f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+    open_timeout=10,
+)
```
296-298: Broaden the state check before (re)connecting.

Reconnect unless the socket is OPEN.

Apply this diff:

```diff
-if not self._websocket or self._websocket.state is State.CLOSED:
+if not self._websocket or self._websocket.state is not State.OPEN:
```
271-276: Emit TTSStoppedFrame when response.type == "done"

Add a stop frame after removing the audio context to align with other TTS services and enable downstream buffers to flush:

```diff
 if response.type == "done":
     await self.stop_ttfb_metrics()
     await self.remove_audio_context(response.context_id)
+    await self.push_frame(TTSStoppedFrame())
```

src/pipecat/services/tts_service.py (3)
805-809: Also emit TTSStoppedFrame at the end of each audio context.

This helps downstream components finalize packets/buffers per context.

Apply this diff:

```diff
-# Append some silence between contexts.
-SILENCE_BETWEEN_CONTEXTS = 1
-silence_frame = self.silence_frame(SILENCE_BETWEEN_CONTEXTS)
-await self.push_frame(silence_frame)
+# Mark context end and add inter-context silence.
+await self.push_frame(TTSStoppedFrame())
+SILENCE_BETWEEN_CONTEXTS = 1
+await self.push_frame(self.silence_frame(SILENCE_BETWEEN_CONTEXTS))
```
831-849: Abstract method signatures should match concrete implementations.

Align with `FrameProcessor.create_task`/`cancel_task`/`push_frame` to keep type checkers happy and avoid subtle MRO issues.

Apply this diff:

```diff
-def create_task(self, coroutine: Coroutine) -> asyncio.Task:
+def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
@@
-async def cancel_task(self, task: asyncio.Task) -> None:
+async def cancel_task(self, task: asyncio.Task, timeout: Optional[float] = None) -> None:
@@
-async def push_frame(self, frame: Frame) -> None:
+async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM) -> None:
```
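Why the signature alignment matters can be shown with a minimal ABC. Declaring the optional `name` parameter on the abstract method keeps the contract, the override, and call sites consistent; `TaskHost` and `LoopTaskHost` are illustrative stand-ins, not Pipecat classes.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Coroutine, Optional

class TaskHost(ABC):
    # The abstract signature carries the same optional parameter as the
    # concrete implementation, so type checkers see one consistent API.
    @abstractmethod
    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
        ...

class LoopTaskHost(TaskHost):
    def create_task(self, coroutine: Coroutine, name: Optional[str] = None) -> asyncio.Task:
        return asyncio.get_running_loop().create_task(coroutine, name=name)

async def demo():
    async def work() -> int:
        return 42

    host = LoopTaskHost()
    task = host.create_task(work(), name="work")
    return await task, task.get_name()

print(asyncio.run(demo()))  # (42, 'work')
```

Python itself does not enforce matching override signatures at runtime; the alignment exists so that mypy and IDEs do not flag callers that pass the extra arguments through the abstract type.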
816-817: Make timeouts and the silence duration configurable constants.

Expose `AUDIO_CONTEXT_TIMEOUT` and `SILENCE_BETWEEN_CONTEXTS` as class attrs for tuning and testing.

Also applies to: 805-809
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/pipecat/services/respeecher/tts.py (1 hunks)
- src/pipecat/services/tts_service.py (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml with "requires-python = ">=3.10"". This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.
Applied to files:
src/pipecat/services/respeecher/tts.py
🧬 Code graph analysis (2)
src/pipecat/services/tts_service.py (3)
- src/pipecat/processors/frame_processor.py (5): task_manager (325-336), push_frame (606-616), create_task (416-430), cancel_task (432-439), FrameDirection (42-51)
- src/pipecat/transports/network/websocket_client.py (1): task_manager (102-115)
- src/pipecat/utils/asyncio/task_manager.py (5): BaseTaskManager (33-92), create_task (58-70), create_task (141-175), cancel_task (73-83), cancel_task (177-203)

src/pipecat/services/respeecher/tts.py (4)
- src/pipecat/frames/frames.py (10): CancelFrame (745-752), EndFrame (1288-1298), ErrorFrame (756-774), Frame (112-143), LLMFullResponseEndFrame (1384-1391), StartFrame (717-741), StartInterruptionFrame (860-869), TTSAudioRawFrame (261-267), TTSStartedFrame (1395-1405), TTSStoppedFrame (1409-1412)
- src/pipecat/processors/frame_processor.py (3): FrameDirection (42-51), stop_all_metrics (411-414), push_error (596-604)
- src/pipecat/services/tts_service.py (32): AudioContextTTSService (852-911), sample_rate (138-144), start (218-228), start (495-502), start (876-883), start (939-946), stop (230-239), stop (504-511), stop (885-897), stop (948-960), cancel (241-250), cancel (513-520), cancel (899-906), cancel (962-969), create_task (831-832), _report_error (601-603), _report_error (673-675), cancel_task (835-836), _handle_interruption (387-391), _handle_interruption (536-539), _handle_interruption (626-630), _handle_interruption (698-702), _handle_interruption (908-911), _handle_interruption (971-974), flush_audio (214-216), audio_context_available (769-778), push_frame (348-371), push_frame (839-840), remove_audio_context (754-767), append_to_audio_context (741-752), run_tts (181-192), create_audio_context (731-739)
- src/pipecat/utils/base_object.py (1): _call_event_handler (121-141)
🔇 Additional comments (6)
src/pipecat/services/respeecher/tts.py (4)
108-109: The union type `str | None` is correct for this codebase (Python 3.10+).

No action needed.
178-180: `_receive_task_handler` exists in the base class.

Confirmed: the class in `respeecher/tts.py` inherits (directly or via MRO) from `websocket_service.WebsocketService`, which defines `async def _receive_task_handler(self, report_error: Callable[[ErrorFrame], Awaitable[None]])`, so `_receive_task_handler` is present at runtime. No action needed.
193-195: Verify header/subprotocol support.

No public documentation indicates the streaming TTS websocket accepts API keys via an Authorization header or subprotocol, and embedding keys in URLs can leak via logs or proxies. Please contact Respeecher support to confirm or request header-based/subprotocol authentication for the websocket.
284: Ignore the signature change; `None` yields are intentional.

`process_generator`'s docstring and implementation expect the generator to yield `None` as a handshake marker (it skips falsy values), so removing `yield None` and changing the return type would break the streaming flow. Leave the signature and `yield None` intact.

Likely an incorrect or invalid review comment.
src/pipecat/services/tts_service.py (2)
355-359: Silence insertion API looks good.

Centralizing silence creation via `silence_frame()` improves reuse and readability.

Also applies to: 373-386
948-961: Verify availability of `wait_for_task` in TTSService.

Found `wait_for_task` defined in src/pipecat/processors/frame_processor.py:441; ensure `TTSService` actually inherits this helper before swapping to a raw `await self._audio_context_task`.
```python
self._api_key = api_key
self._url = url
self._output_format: StreamingOutputFormatParams = {
    "encoding": "pcm_s16le",
```
We also accept "pcm_f32le", should we allow users to set it up?
I made it not configurable because at a glance there is no floating-point sample support in Pipecat audio facilities. I should ask them in the future PR
Maybe it is actually better that they don't support it since it doubles the size of the audio, and it's hard to imagine how it could be useful.
```python
    await super().stop(frame)
    await self._disconnect()

async def cancel(self, frame: CancelFrame):
```
What's the difference between stop and cancel?
No difference for us and other services that I've been skimming through. Everyone does the same in both handlers. Conceptually, from what I understand, a stop must leave the object in a state where another start is then possible, whereas there's no such requirement for cancel, so cancel could be quicker for some services
```python
        True if the context exists and is available.
    """
    return context_id in self._contexts
```
The changes in this file, do not break anything for other integrations, right?
I hope so:) Of course that's my goal, but it's difficult to test properly. The main change is conceptually simple: decouple audio context management from word timestamp functionality. We would be the first Pipecat TTS service that has the former but not the latter. Decoupling's done mostly by moving code verbatim into a separate class, so there should be low probability of subtle bugs
I haven't delved deep into the Pipecat, but it looks like they are calling methods like push_frame without passing any channel id. Doesn't that mean that this class does not support multiple independent channels, so contexts are not useful?
Discussed in DMs. TL;DR: in Pipecat contexts are useful for interruption handling rather than multiplexing per se
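The interruption-oriented use of contexts can be modeled with a toy router: cancelling a context drops its queue, so audio for that utterance arriving afterwards is silently discarded instead of being played. All names here are illustrative, not Pipecat's API.

```python
import asyncio

class ContextRouter:
    """Toy model of per-context audio queues for interruption handling."""

    def __init__(self):
        self._contexts: dict[str, asyncio.Queue] = {}

    def create(self, context_id: str) -> None:
        self._contexts[context_id] = asyncio.Queue()

    def interrupt(self, context_id: str) -> None:
        # Dropping the queue makes any late audio for this utterance stale.
        self._contexts.pop(context_id, None)

    async def append(self, context_id: str, chunk: bytes) -> bool:
        queue = self._contexts.get(context_id)
        if queue is None:
            return False  # stale audio from an interrupted utterance
        await queue.put(chunk)
        return True

async def demo():
    router = ContextRouter()
    router.create("utterance-1")
    accepted = await router.append("utterance-1", b"\x00\x01")
    router.interrupt("utterance-1")  # user barged in
    stale = await router.append("utterance-1", b"\x02\x03")
    return accepted, stale

print(asyncio.run(demo()))  # (True, False)
```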
Actionable comments posted: 4
♻️ Duplicate comments (1)
src/pipecat/services/respeecher/tts.py (1)
166-173: Docstring is now correct for cancel.

Thanks for addressing the earlier feedback; this now accurately describes the cancel semantics.
🧹 Nitpick comments (5)
src/pipecat/services/respeecher/tts.py (5)
188-196: Add WebSocket timeouts to prevent hangs.

Configure reasonable `open_timeout`/`close_timeout` (and optionally ping intervals) to avoid indefinite waits and improve recovery.

```diff
-self._websocket = await websocket_connect(
-    f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}"
-)
+self._websocket = await websocket_connect(
+    f"{self._url}/{self._model_name}/tts/websocket?api_key={self._api_key}",
+    open_timeout=10,
+    close_timeout=10,
+)
```
43-45: Preserve the exception type and traceback for missing optional deps.

Raising a generic `Exception` discards useful diagnostics. Re-raise as `ModuleNotFoundError` (or `ImportError`) and chain with `from e`.

```diff
-raise Exception(f"Missing module: {e}")
+raise ModuleNotFoundError(f"Missing module: {e}") from e
```
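A runnable sketch of the chaining suggestion; `load_optional` and `describe_failure` are illustrative wrappers, not code from this PR. `raise ... from e` stores the original error in `__cause__`, so tracebacks still show which import actually failed.

```python
def load_optional(module_name: str):
    try:
        return __import__(module_name)
    except ImportError as e:
        # `from e` keeps the original ImportError as __cause__, preserving
        # the real failure in the traceback chain.
        raise ModuleNotFoundError(
            f"Missing module: {e}. Install the matching optional extra."
        ) from e

def describe_failure(module_name: str):
    """Return the type name of the chained cause, or None on success."""
    try:
        load_optional(module_name)
        return None
    except ModuleNotFoundError as err:
        return type(err.__cause__).__name__

print(describe_failure("no_such_dependency_xyz"))  # ModuleNotFoundError
```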
108-110: Type hygiene nit: annotate `_receive_task` and standardize the Optional style.

Since 3.10+ is required, consider `asyncio.Task | None` and using `| None` consistently, or keep as-is for local consistency.

```diff
+from typing import AsyncGenerator, Optional
+import asyncio
@@
-self._receive_task = None
+self._receive_task: asyncio.Task | None = None
```
219-226: Interruption cleanup: confirm that a server-acknowledged cancel reliably removes the context.

If the server drops the connection or loses the cancel, the context might linger. Consider a fallback removal after a short grace period if no `done`/`error` arrives.
248-283: Metrics ordering is idempotent, but consider guarding repeated `stop_ttfb_metrics()`. `stop_ttfb_metrics()` is called on every chunk and on done. If it's not idempotent, add a guard; otherwise ignore.
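The guard the nitpick describes can be a single flag check. The sketch below is a hypothetical stand-in for Pipecat's metrics helpers (the real `stop_ttfb_metrics()` lives on the service base classes); it only demonstrates the idempotency pattern: the first `stop()` after `start()` records the time-to-first-byte, and every later call is a no-op.

```python
import time


class TTFBTracker:
    """Minimal sketch of an idempotent time-to-first-byte tracker."""

    def __init__(self):
        self._start = None
        self._ttfb = None

    def start(self):
        self._start = time.monotonic()
        self._ttfb = None

    def stop(self):
        # Guard: only the first stop() after start() records a value.
        if self._start is None or self._ttfb is not None:
            return
        self._ttfb = time.monotonic() - self._start

    @property
    def ttfb(self):
        return self._ttfb


tracker = TTFBTracker()
tracker.start()
tracker.stop()
first = tracker.ttfb
tracker.stop()  # calling again on every audio chunk is now harmless
assert tracker.ttfb == first
```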
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
`src/pipecat/services/respeecher/tts.py` (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:48:25.154Z
Learnt from: Kharacternyk
PR: respeecher/pipecat#1
File: src/pipecat/services/respeecher/tts.py:109-110
Timestamp: 2025-09-01T15:48:25.154Z
Learning: Pipecat requires Python 3.10+ as specified in pyproject.toml with "requires-python = ">=3.10"". This means newer Python syntax like `str | None` union types are appropriate and preferred over `Optional[str]`.
Applied to files:
src/pipecat/services/respeecher/tts.py
mitrushchienkova left a comment
LGTM overall
The only note: should we do something about the comma splitting issue (partially discussed in Slack)?
```python
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
```
This comment is a little confusing because it looks like you do call `SileroVADAnalyzer`.
It's called in a lambda on-demand, not at the top level. This comment is the same across examples with different TTSes; our custom code is just lines 55-65.
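The lazy-factory pattern being discussed can be shown in a few lines. This is a sketch, not the example's actual code: `VADAnalyzer` and the `transport_params` dict are hypothetical stand-ins for `SileroVADAnalyzer` and the per-transport configuration; the point is only that storing zero-argument callables defers construction until a transport is selected.

```python
class VADAnalyzer:
    """Hypothetical heavy object standing in for SileroVADAnalyzer."""

    instances = 0

    def __init__(self):
        VADAnalyzer.instances += 1


# Store factories, not instances, so nothing heavy is built up front.
transport_params = {
    "webrtc": lambda: VADAnalyzer(),
    "daily": lambda: VADAnalyzer(),
}

assert VADAnalyzer.instances == 0  # nothing instantiated at definition time
analyzer = transport_params["webrtc"]()  # built only on demand
assert VADAnalyzer.instances == 1
```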
```python
request: ContextfulGenerationRequestParams = {
    "transcript": text,
    "continue": continue_transcript,
```
Should we harmonize naming of this "continue" variable? You said that in inf we also use `continue_` and `do_continue`, right? Also, is there a reason we need to call the text to convert sometimes `text` and sometimes `transcript`?
- `continue`: the official piece of API
- `continue_`: our auto-generated SDK, since `continue` is a keyword
- `do_continue`: internal code, for the same reason that `continue` is a keyword; can be aligned with `continue_`

Variable naming (`continue_transcript`, `text`, etc.) in this file mostly follows other TTS integration files so that the probability of a delay in merging due to minor naming inconsistencies is minimized.
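The keyword clash can be made concrete with a short sketch. The type and function names below are hypothetical (the real request type in the PR is `ContextfulGenerationRequestParams`); the sketch only shows why the wire key `"continue"` forces an aliased name on the Python side, using `TypedDict`'s functional syntax, which, unlike the class-based syntax, accepts reserved words as field names.

```python
from typing import TypedDict

# "continue" is a reserved word, so the class-based TypedDict syntax can't
# declare it as an attribute; the functional syntax can.
GenerationRequest = TypedDict(
    "GenerationRequest",
    {"transcript": str, "continue": bool},
)


def make_request(text: str, continue_transcript: bool) -> GenerationRequest:
    # The wire format keeps the official "continue" key; Python code uses an
    # aliased variable (continue_transcript here, continue_ in generated SDKs).
    return {"transcript": text, "continue": continue_transcript}


request = make_request("Hello there.", True)
assert request["continue"] is True
```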
OK, this makes sense. I guess we can stick with the naming system used by other TTS integration files, especially if it will make it easier for people to switch to ours. And maybe switch `do_continue` in inf when you get the chance, to at least eliminate that variant.
```python
)
await self.push_frame(frame)
# Append some silence between contexts.
SILENCE_BETWEEN_CONTEXTS = 1
```
Following up on my comment above, why would someone switch context? (And does it actually make sense to add silence if they do?)
There must be reasons for adding silence if the logic is there. It's not specific to our integration so I wouldn't concentrate much on this. I'm already doubtful if I should include this piece of refactoring. (It's a byproduct of an earlier version of the integration where I did need a way to disable this silence adding.)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/pipecat/services/tts_service.py (1)

959-959: Consider awaiting the task directly instead of using `wait_for_task`. Line 959 uses the deprecated `wait_for_task` method. Based on the deprecation notice in the relevant code snippets, this should be replaced with direct task awaiting for consistency. Apply this diff to use the modern approach:

```diff
-        await self.wait_for_task(self._audio_context_task)
+        await self._audio_context_task
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- `README.md` (1 hunks)
- `env.example` (1 hunks)
- `pyproject.toml` (1 hunks)
- `src/pipecat/services/tts_service.py` (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- README.md
- env.example
- pyproject.toml
🔇 Additional comments (7)
src/pipecat/services/tts_service.py (7)

10-11: LGTM! Imports align with the new mixin architecture. The added ABC imports and `BaseTaskManager` import are necessary for the new `_AudioContextServiceMixin` abstract methods and task management requirements.

Also applies to: 40-40
356-385: Excellent refactoring! Centralized silence frame generation. The `silence_frame` method provides a clean, reusable way to generate silence frames with proper audio parameters. This is a good abstraction that will be used by both the regular TTS service and the new mixin.
719-849: Well-designed mixin for audio context management. The `_AudioContextServiceMixin` provides a clean separation of concerns by centralizing audio context orchestration. The abstract methods create a proper contract that implementing classes must fulfill. The queue-based architecture ensures ordered processing of audio contexts.

A few observations:

- The timeout handling (3 seconds) provides good resilience against hanging contexts
- The silence insertion between contexts (1 second) helps with audio clarity
- Proper lifecycle management with graceful shutdown
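The timeout-guarded, queue-based draining described above can be sketched in isolation. This is an illustration under stated assumptions, not the mixin's actual implementation: `drain_context`, the `None` end-of-context sentinel, and the demo queue are all hypothetical; only the pattern (pull chunks from an `asyncio.Queue`, abandon the context if nothing arrives within the timeout) mirrors the review's description.

```python
import asyncio

CONTEXT_TIMEOUT = 3  # seconds without audio before a context is abandoned


async def drain_context(queue: asyncio.Queue) -> list[bytes]:
    """Pull audio chunks for one context until the producer signals
    completion with None, or until the context appears to hang."""
    chunks = []
    while True:
        try:
            chunk = await asyncio.wait_for(queue.get(), timeout=CONTEXT_TIMEOUT)
        except asyncio.TimeoutError:
            break  # resilience against a hanging context
        if chunk is None:
            break  # context finished normally
        chunks.append(chunk)
    return chunks


async def main():
    queue = asyncio.Queue()
    for chunk in (b"\x00\x01", b"\x02\x03", None):
        queue.put_nowait(chunk)
    return await drain_context(queue)


chunks = asyncio.run(main())
assert chunks == [b"\x00\x01", b"\x02\x03"]
```

Processing contexts one at a time off a queue like this is what gives the ordered playback the review points out.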
852-912: LGTM! Clean implementation of `AudioContextTTSService`. The class properly inherits from both `WebsocketTTSService` and `_AudioContextServiceMixin`, with correct initialization order and proper lifecycle management. The interruption handling that restarts the audio context task is appropriate for maintaining clean state after interruptions.
914-974: LGTM! Consistent `AudioContextWordTTSService` implementation. The implementation mirrors `AudioContextTTSService` but extends the word timestamp functionality. The pattern is consistent and maintains the same lifecycle management approach.
830-849: Abstract method implementations verified; no action required. `FrameProcessor` (src/pipecat/processors/frame_processor.py) provides `create_task`, `cancel_task`, `push_frame`, and `task_manager`; `TTSService` (src/pipecat/services/tts_service.py) provides `silence_frame`; `AudioContextTTSService` and `AudioContextWordTTSService` inherit these via `WebsocketTTSService`/`WebsocketWordTTSService`, so the mixin contract is satisfied.
805-808: Make inter-context silence configurable and document it (src/pipecat/services/tts_service.py:805-808). Hardcoded 1s silence is likely noticeable; SSML guidance favors shorter defaults (100-250ms for short pauses, 400-1000ms between sentences/paragraphs). Actions:

- Replace the hardcoded `SILENCE_BETWEEN_CONTEXTS = 1` with a configurable duration (ms) or emit SSML `<break>`; consider a default of ~400-600ms.
- Document the insertion behavior in user-facing docs/README and how to override it.
- Run listening tests across target voices/devices and identify flows (short prompts, concatenated snippets, low-latency UX) where a 1s gap is problematic.
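A millisecond-based silence generator along the lines of the first action item could look like the sketch below. The function name, the sample-rate constant, and the raw-bytes return type are all assumptions for illustration; the real `silence_frame` in `tts_service.py` produces a frame object with the service's negotiated audio parameters.

```python
SAMPLE_RATE = 24000  # assumed output sample rate
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def silence_bytes(duration_ms: int = 500) -> bytes:
    """Generate inter-context silence with a configurable duration in ms,
    replacing a hardcoded 1-second constant."""
    num_samples = SAMPLE_RATE * duration_ms // 1000
    return b"\x00" * (num_samples * BYTES_PER_SAMPLE)


# A ~500 ms default sits inside the 400-1000 ms sentence-pause guidance.
assert len(silence_bytes(500)) == 24000
assert len(silence_bytes(1000)) == 2 * len(silence_bytes(500))
```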
Co-PR for the docs: respeecher/pipecat-docs#1
Branch with the quickstart modified to use Respeecher/Gemini instead of Cartesia/OpenAI (not to be upstreamed, just for testing): https://github.com/respeecher/pipecat/compare/respeecher...respeecher:pipecat:respeecher-quickstart?expand=1
Summary by CodeRabbit
New Features
Documentation
Configuration