
Conversation

@Kharacternyk (Member) commented Sep 1, 2025

Summary by CodeRabbit

  • Documentation
    • Added a Respeecher Text-to-Speech integration page with overview, installation (pip and API key), usage examples, dynamic configuration, metrics, and operational notes for websocket-based TTS.
    • Updated Server APIs navigation to include Respeecher under Text-to-Speech.
    • Updated Supported Services list to include Respeecher with link and installation guidance.

@coderabbitai bot commented Sep 1, 2025

Walkthrough

Adds documentation for a new Text-to-Speech service, Respeecher: navigation entry in docs.json, supported-services listing update, and a new detailed doc page server/services/tts/respeecher.mdx. No runtime code or public API changes.

Changes

Cohort / File(s): Summary

  • Docs navigation update (docs.json): Inserts server/services/tts/respeecher into the Server APIs Text-to-Speech navigation between playht and rime.
  • Supported services index (server/services/supported-services.mdx): Adds a Respeecher entry (link + install instruction) to the Text-to-Speech section, positioned after Piper and before Rime.
  • Respeecher TTS documentation (server/services/tts/respeecher.mdx): New documentation page with front matter, overview, installation (pip install "pipecat-ai[respeecher]" + RESPEECHER_API_KEY), WebSocket frame definitions (input/output frame types), language support, a Python usage example (RespeecherTTSService + Pipeline and InputParams), dynamic configuration via TTSUpdateSettingsFrame, metrics notes, and operational/connection guidance.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client as Dev App
  participant Server as Local Server
  participant Respeecher as Respeecher WS

  Note over Client,Server: Client requests TTS (text + params)
  Client->>Server: Start TTS task (RespeecherTTSService)
  Server->>Respeecher: Open WebSocket / Authenticate
  Server->>Respeecher: Send initialization / TTSUpdateSettingsFrame
  Server->>Respeecher: Send TTSSpeakFrame (text frames)
  Respeecher-->>Server: TTSStartedFrame / TTSAudioRawFrame chunks
  Server-->>Client: Stream audio chunks
  alt Interrupt
    Client->>Server: Cancel / interrupt
    Server->>Respeecher: Send TTSStop / close WS
  end
  Respeecher-->>Server: TTSStoppedFrame / metrics
  Server-->>Client: Finalize stream & deliver metrics
  Note over Server: Pipeline may enforce global sample rate

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title "Add Respeecher TTS to the docs" is a concise, single-sentence summary that accurately reflects the main change in the changeset (adding Respeecher TTS documentation and navigation/support entries); it is specific, relevant, and clear for reviewers scanning project history.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

Poem

A rabbit taps keys with a sprightly cheer,
“Respeecher has hopped into TTS here!”
Docs neatly stitched, frames ready to speak,
Voices on queues, audio to seek.
I twitch my nose—another service clear. 🐇🎙️

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 83d616e and a4098c7.

📒 Files selected for processing (1)
  • server/services/tts/respeecher.mdx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/services/tts/respeecher.mdx

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (4)
server/services/tts/respeecher.mdx (4)

42-47: Add explicit env var example.

Showing the export helps reduce setup friction.

 You'll also need to set up your Respeecher API key as an environment variable: `RESPEECHER_API_KEY`.
 
 <Tip>
-  Get your API key by signing up at
-  [Respeecher Space](https://space.respeecher.com/).
+  Get your API key by signing up at [Respeecher Space](https://space.respeecher.com/).
+  ```bash
+  export RESPEECHER_API_KEY="your_api_key_here"
+  ```
 </Tip>

53-57: Clarify the primary input frame.

State which frame is preferred to avoid ambiguity between TextFrame and TTSSpeakFrame.

- - `TextFrame` - Text content to synthesize into speech
- - `TTSSpeakFrame` - Text that the TTS service should speak
+ - `TTSSpeakFrame` (recommended) — text the TTS service should speak
+ - `TextFrame` — supported for compatibility; Pipecat will wrap/route as needed

74-98: Tighten example: import Pipeline and use placeholder voice_id.

Avoid implying a specific voice exists; also import Pipeline explicitly.

-from pipecat.services.respeecher.tts import RespeecherTTSService
+from pipecat.services.respeecher.tts import RespeecherTTSService
+from pipecat.pipeline import Pipeline
 import os
 
 # Configure WebSocket service
 tts = RespeecherTTSService(
     api_key=os.getenv("RESPEECHER_API_KEY"),
-    voice_id="samantha",
+    voice_id="your-voice-id",
     params=RespeecherTTSService.InputParams(
         sampling_params={
             # Optional sampling params overrides
             # See https://space.respeecher.com/docs/api/tts/sampling-params-guide
             # "temperature": 0.5
         },
     ),
 )
 
 # Use in pipeline
 pipeline = Pipeline([
     transport.input(),
     stt,
     llm,
     tts,
     transport.output()
 ])

126-127: Qualify reconnection behavior.

If reconnection/backoff is handled by a shared transport layer, link it and note any tunables (max retries, backoff).

-- **Connection Management**: WebSocket lifecycle is handled automatically with reconnection support
+- **Connection Management**: WebSocket lifecycle is handled automatically with reconnection support (see [Pipeline heartbeats](/server/pipeline/heartbeats)); configure retries/backoff via your transport or service settings if needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0d796c2 and 3a0f535.

📒 Files selected for processing (3)
  • docs.json (1 hunks)
  • server/services/supported-services.mdx (1 hunks)
  • server/services/tts/respeecher.mdx (1 hunks)
🧰 Additional context used
🪛 LanguageTool
server/services/tts/respeecher.mdx

[grammar] ~116-~116: There might be a mistake here.
Context: ...- Latency from text input to first audio - Processing Duration - Total synthesis ...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...essing Duration** - Total synthesis time - Usage Metrics - Character count and sy...

(QB_NEW_EN)

🔇 Additional comments (6)
server/services/supported-services.mdx (1)

102-102: Add: verify pip extra exists before publishing.

Install hint uses extras: pipecat-ai[respeecher]. Please confirm the extras_require key "respeecher" is published in the pipecat-ai package to avoid install errors.

docs.json (1)

184-184: LGTM: nav entry added in the right spot.

Path matches server/services/tts/respeecher and ordering between playht and rime is consistent.

server/services/tts/respeecher.mdx (4)

14-17: Verify API Reference URL.

Check that the reference-server link resolves to the Respeecher TTS service page after deploy.


28-31: Verify example link exists.

Ensure examples/foundational/07ad-interruptible-respeecher.py is present on main, or update the path.


67-68: Confirm language claim.

“Currently only English is supported for public models.” Please verify this against the latest Respeecher catalog.


103-110: Retain the “voice” key in Respeecher settings. TTSUpdateSettingsFrame’s handler accepts both “voice” and “voice_id,” and RespeecherTTSService expects the “voice” field in its update payload.

Likely an incorrect or invalid review comment.
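The accept-both-keys behavior the bot describes can be sketched in plain Python. This is a hypothetical illustration only (`normalize_tts_settings` is not a Pipecat function); it shows the idea of a handler accepting both "voice" and "voice_id" while emitting a payload keyed on "voice", which is why retaining the "voice" key in the doc example is harmless:

```python
def normalize_tts_settings(settings: dict) -> dict:
    """Hypothetical sketch: accept both "voice" and "voice_id" keys,
    emit a payload keyed on "voice". An explicit "voice" wins."""
    normalized = dict(settings)
    if "voice_id" in normalized and "voice" not in normalized:
        # Translate the alias into the canonical key.
        normalized["voice"] = normalized.pop("voice_id")
    else:
        # Drop a redundant alias if the canonical key is present.
        normalized.pop("voice_id", None)
    return normalized
```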

@coderabbitai bot left a comment
Actionable comments posted: 2

♻️ Duplicate comments (1)
server/services/tts/respeecher.mdx (1)

114-118: Tighten metrics wording and fix list punctuation (em dashes, consistent casing).

Matches house style and resolves the grammar lint hints.

-This service provides:
-
-- **Time to First Byte (TTFB)** - Latency from text input to first audio
-- **Processing Duration** - Total synthesis time
-- **Usage Metrics** - Character count and synthesis statistics
+This service provides:
+
+- **Time to first byte (TTFB)** — latency from text input to first audio
+- **Processing duration** — total synthesis time
+- **Usage metrics** — character count and synthesis statistics
🧹 Nitpick comments (1)
server/services/tts/respeecher.mdx (1)

53-57: Clarify when to use each input frame.

A short note reduces confusion between TextFrame vs TTSSpeakFrame and whether LLM frames are required.

 ### Input
 
 - `TextFrame` - Text content to synthesize into speech
 - `TTSSpeakFrame` - Text that the TTS service should speak
 - `TTSUpdateSettingsFrame` - Runtime configuration updates (e.g., voice)
 - `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries
+
+<Info>
+  Use <code>TTSSpeakFrame</code> to speak explicit text directly. <code>TextFrame</code> is typically produced upstream (e.g., by an LLM). The
+  <code>LLMFullResponseStart/End</code> frames are optional and only needed if you gate TTS on complete LLM responses.
+</Info>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3a0f535 and 2fa39e8.

📒 Files selected for processing (1)
  • server/services/tts/respeecher.mdx (1 hunks)
🧰 Additional context used
🪛 LanguageTool
server/services/tts/respeecher.mdx

[grammar] ~116-~116: There might be a mistake here.
Context: ...- Latency from text input to first audio - Processing Duration - Total synthesis ...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...essing Duration** - Total synthesis time - Usage Metrics - Character count and sy...

(QB_NEW_EN)

🔇 Additional comments (1)
server/services/tts/respeecher.mdx (1)

1-128: Solid, clear doc with actionable examples.

Structure, links, and install guidance look good. After addressing the minor fixes above, this will be ready to land.

<Card
title="Respeecher Docs"
icon="book"
href="https://space.respeecher.com/docs/quickstart"
Member:

Maybe just https://space.respeecher.com/docs (even though it doesn't really matter since both URLs get you to the same content)

Member:

Also, I think we could improve the titles and descriptions to make it crystal clear that there are just two docs links, one for something to do with pipecat (actually I'm not sure what is there) and then our API docs.

Member Author:

Changed the link. Regarding titles and descriptions, they are standard across all the services, I don't think we should change them


Respeecher Space API provides high-quality streaming text-to-speech synthesis with low latency.

<CardGroup cols={3}>
Member:

These cards are not formatting properly when you view the page in Github and links are not working
[Screenshot 2025-09-06 at 4:05:36 PM]

Member Author:

It renders fine in the docs preview (pnpx mint dev). The first and the third link are broken since the PRs haven't been merged upstream yet

Member:

Hm, but why would the rendering be broken in Github? It's not broken for other services

Member Author:

> It's not broken for other services

I don't know the reason, but my gut feeling is that it's nothing important. If we indeed have some syntax errors, maybe the upstream reviewers will point them out


## Language Support

Currently only English is supported for public models.
Member:

What is language specific about pipecat anyway? I assume that we will have other languages soon -- would we need to make another PR to pipecat every time we add a language? Also, given that we do have non-public models already for other languages, is there any difficulty using them with pipecat?

Member Author:

> What is language specific about pipecat anyway? I assume that we will have other languages soon -- would we need to make another PR to pipecat every time we add a language?

It depends on whether our multilingual models are going to need a language embedding. If yes, we would need to add some integration code that converts a Pipecat language ID into a language ID of our API. If no, everything should work as is.

Maybe we could adjust wording here to something like: Refer to [a link to our docs] for language support in different models.

Member:

Yeah, that makes sense. Not just say that we only support English, but also not add any language embedding support unless/until we need it

## Language Support

Currently only English is supported for public models.

Member:

ElevenLabs has a section on sample rates / output formats before the usage example. Should we have one?

Member Author:

I think their section highlights that the integration can infer the right output-format enum string value for the Eleven API from the integer sample rate set by the pipeline. Since our API accepts the sample-rate integer directly, maybe we don't need a similar section.

Member:

I guess, but someone could still read our docs carelessly and wonder what sample rates we support, or whether you can choose one. So it may be better to make this explicit, so it's easy for people to understand that this is not a disadvantage of ours vs. Eleven.

tts,
transport.output()
])
```
Member:

ElevenLabs puts aggregators in their pipeline. We don't need them?

Member Author:

Cartesia doesn't. It's up to us; it's just an example.

Member:

I don't understand if users should put in aggregators or not.


If we want to properly read sequences of digits, e.g. "1, 2, 3, 4, ...", we currently need an aggregator (otherwise our API treats "," as a delimiter for thousands). But for most cases the conversion is fine even without an aggregator.
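To make the failure mode concrete: an engine that treats "," as a thousands separator would collapse "1, 2, 3" toward "123". A hypothetical pre-TTS text filter (plain Python, not Pipecat's aggregator) could rewrite short comma-separated digit lists so each number is spoken on its own; note it cannot distinguish a genuine thousands group like "1,234" without more context, which is why a real aggregator is the better fix:

```python
import re

# Matches runs like "1, 2, 3, 4" (two or more short numbers joined by commas).
# Illustrative only: "1,234" would also match, so this is not production-safe.
_DIGIT_LIST = re.compile(r"\b(\d{1,3}(?:,\s*\d{1,3})+)\b")

def split_digit_lists(text: str) -> str:
    """Rewrite comma-joined digit lists so each number reads separately."""
    def repl(match: re.Match) -> str:
        numbers = [n.strip() for n in match.group(1).split(",")]
        return ". ".join(numbers)  # periods force separate utterances
    return _DIGIT_LIST.sub(repl, text)
```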


This service provides:

- **Time to First Byte (TTFB)** - Latency from text input to first audio
Member:

It would be neat to measure TTFB for Respeecher and various competitors
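A rough sketch of how such a TTFB comparison could be scripted: time the gap from request start to the first audio chunk of a streaming response, modeled here as an async iterator. The `fake_tts_stream` below stands in for a real service call; all names are illustrative, not Pipecat or Respeecher APIs:

```python
import asyncio
import time

async def measure_ttfb(chunks):
    """Return (time to first chunk in seconds, total bytes received)."""
    start = time.monotonic()
    ttfb = None
    total = 0
    async for chunk in chunks:
        if ttfb is None:
            ttfb = time.monotonic() - start  # first audio arrived
        total += len(chunk)
    return ttfb, total

async def fake_tts_stream():
    await asyncio.sleep(0.05)  # pretend synthesis latency
    yield b"\x00" * 320
    yield b"\x00" * 320

ttfb, total = asyncio.run(measure_ttfb(fake_tts_stream()))
print(f"TTFB: {ttfb * 1000:.0f} ms, {total} bytes")
```

The same wrapper could be pointed at each provider's stream to get comparable numbers.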

@mitrushchienkova left a comment:

  1. It would be cool to mention Respeecher in the Websocket (Recommended) section here. But I'm not sure the Pipecat team will approve it.
  2. Why don't we add support for HTTP-based TTS, like other providers?

Overall, LGTM

@Kharacternyk (Member Author):

  1. I don't think we can recommend ourselves :)
  2. I don't really understand in what scenarios an HTTP version is useful; I'll ask upstream

4 participants