Add Respeecher TTS to the docs #1
Conversation
Walkthrough

Adds documentation for a new Text-to-Speech service, Respeecher: navigation entry in …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Dev App
    participant Server as Local Server
    participant Respeecher as Respeecher WS
    Note over Client,Server: Client requests TTS (text + params)
    Client->>Server: Start TTS task (RespeecherTTSService)
    Server->>Respeecher: Open WebSocket / Authenticate
    Server->>Respeecher: Send initialization / TTSUpdateSettingsFrame
    Server->>Respeecher: Send TTSSpeakFrame (text frames)
    Respeecher-->>Server: TTSStartedFrame / TTSAudioRawFrame chunks
    Server-->>Client: Stream audio chunks
    alt Interrupt
        Client->>Server: Cancel / interrupt
        Server->>Respeecher: Send TTSStop / close WS
    end
    Respeecher-->>Server: TTSStoppedFrame / metrics
    Server-->>Client: Finalize stream & deliver metrics
    Note over Server: Pipeline may enforce global sample rate
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (4)
server/services/tts/respeecher.mdx (4)
42-47: Add explicit env var example. Showing the export helps reduce setup friction.
````diff
 You'll also need to set up your Respeecher API key as an environment variable: `RESPEECHER_API_KEY`.

 <Tip>
-  Get your API key by signing up at
-  [Respeecher Space](https://space.respeecher.com/).
+  Get your API key by signing up at [Respeecher Space](https://space.respeecher.com/).
+
+  ```bash
+  export RESPEECHER_API_KEY="your_api_key_here"
+  ```
 </Tip>
````
53-57: Clarify the primary input frame. State which frame is preferred to avoid ambiguity between TextFrame and TTSSpeakFrame.
```diff
-- `TextFrame` - Text content to synthesize into speech
-- `TTSSpeakFrame` - Text that the TTS service should speak
+- `TTSSpeakFrame` (recommended) — text the TTS service should speak
+- `TextFrame` — supported for compatibility; Pipecat will wrap/route as needed
```
74-98: Tighten example: import Pipeline and use placeholder voice_id. Avoid implying a specific voice exists; also import Pipeline explicitly.
```diff
-from pipecat.services.respeecher.tts import RespeecherTTSService
+from pipecat.services.respeecher.tts import RespeecherTTSService
+from pipecat.pipeline import Pipeline
 import os

 # Configure WebSocket service
 tts = RespeecherTTSService(
     api_key=os.getenv("RESPEECHER_API_KEY"),
-    voice_id="samantha",
+    voice_id="your-voice-id",
     params=RespeecherTTSService.InputParams(
         sampling_params={
             # Optional sampling params overrides
             # See https://space.respeecher.com/docs/api/tts/sampling-params-guide
             # "temperature": 0.5
         },
     ),
 )

 # Use in pipeline
 pipeline = Pipeline([
     transport.input(),
     stt,
     llm,
     tts,
     transport.output()
 ])
```
126-127: Qualify reconnection behavior. If reconnection/backoff is handled by a shared transport layer, link it and note any tunables (max retries, backoff).
```diff
-- **Connection Management**: WebSocket lifecycle is handled automatically with reconnection support
+- **Connection Management**: WebSocket lifecycle is handled automatically with reconnection support (see [Pipeline heartbeats](/server/pipeline/heartbeats)); configure retries/backoff via your transport or service settings if needed.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- docs.json (1 hunks)
- server/services/supported-services.mdx (1 hunks)
- server/services/tts/respeecher.mdx (1 hunks)
🧰 Additional context used
🪛 LanguageTool
server/services/tts/respeecher.mdx
[grammar] ~116-~116: There might be a mistake here.
Context: ...- Latency from text input to first audio - Processing Duration - Total synthesis ...
(QB_NEW_EN)
[grammar] ~117-~117: There might be a mistake here.
Context: ...essing Duration** - Total synthesis time - Usage Metrics - Character count and sy...
(QB_NEW_EN)
🔇 Additional comments (6)
server/services/supported-services.mdx (1)
102-102: Add: verify pip extra exists before publishing. Install hint uses extras: pipecat-ai[respeecher]. Please confirm the extras_require key "respeecher" is published in the pipecat-ai package to avoid install errors.
docs.json (1)
184-184: LGTM: nav entry added in the right spot. Path matches server/services/tts/respeecher and ordering between playht and rime is consistent.
server/services/tts/respeecher.mdx (4)
14-17: Verify API Reference URL. Check that the reference-server link resolves to the Respeecher TTS service page after deploy.
28-31: Verify example link exists. Ensure examples/foundational/07ad-interruptible-respeecher.py is present on main, or update the path.
67-68: Confirm language claim. “Currently only English is supported for public models.” Please verify this against the latest Respeecher catalog.
103-110: Retain the “voice” key in Respeecher settings. TTSUpdateSettingsFrame’s handler accepts both “voice” and “voice_id,” and RespeecherTTSService expects the “voice” field in its update payload.

Likely an incorrect or invalid review comment.
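The dual-key behavior the review describes can be sketched as follows. This is an illustrative stand-in, not Pipecat's actual handler code, and the function name `apply_tts_settings` is hypothetical:

```python
# Hypothetical sketch of a settings handler that accepts both "voice" and
# "voice_id" keys from an update payload and normalizes them to one field.
# This mirrors the behavior described in the review, not Pipecat's real code.

def apply_tts_settings(current: dict, update: dict) -> dict:
    """Merge a settings update, treating "voice" and "voice_id" as aliases."""
    merged = dict(current)
    for key, value in update.items():
        if key in ("voice", "voice_id"):
            merged["voice_id"] = value  # both aliases update the same voice
        else:
            merged[key] = value
    return merged

settings = {"voice_id": "old-voice", "sample_rate": 24000}
settings = apply_tts_settings(settings, {"voice": "new-voice"})
print(settings["voice_id"])  # → new-voice
```

A handler like this is why the docs snippet can keep using the "voice" key even though the service stores a voice_id internally.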
Actionable comments posted: 2
♻️ Duplicate comments (1)
server/services/tts/respeecher.mdx (1)
114-118: Tighten metrics wording and fix list punctuation (em dashes, consistent casing). Matches house style and resolves the grammar lint hints.
```diff
-This service provides:
-
-- **Time to First Byte (TTFB)** - Latency from text input to first audio
-- **Processing Duration** - Total synthesis time
-- **Usage Metrics** - Character count and synthesis statistics
+This service provides:
+
+- **Time to first byte (TTFB)** — latency from text input to first audio
+- **Processing duration** — total synthesis time
+- **Usage metrics** — character count and synthesis statistics
```
🧹 Nitpick comments (1)
server/services/tts/respeecher.mdx (1)
53-57: Clarify when to use each input frame. A short note reduces confusion between TextFrame vs TTSSpeakFrame and whether LLM frames are required.
```diff
 ### Input

 - `TextFrame` - Text content to synthesize into speech
 - `TTSSpeakFrame` - Text that the TTS service should speak
 - `TTSUpdateSettingsFrame` - Runtime configuration updates (e.g., voice)
 - `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries
+
+<Info>
+  Use <code>TTSSpeakFrame</code> to speak explicit text directly. <code>TextFrame</code> is typically produced upstream (e.g., by an LLM). The
+  <code>LLMFullResponseStart/End</code> frames are optional and only needed if you gate TTS on complete LLM responses.
+</Info>
```
📒 Files selected for processing (1)
server/services/tts/respeecher.mdx (1 hunks)
🔇 Additional comments (1)
server/services/tts/respeecher.mdx (1)
1-128: Solid, clear doc with actionable examples. Structure, links, and install guidance look good. After addressing the minor fixes above, this will be ready to land.
server/services/tts/respeecher.mdx
Outdated
```
<Card
  title="Respeecher Docs"
  icon="book"
  href="https://space.respeecher.com/docs/quickstart"
```
Maybe just https://space.respeecher.com/docs (even though it doesn't really matter since both URLs get you to the same content)
Also, I think we could improve the titles and descriptions to make it crystal clear that there are just two docs links, one for something to do with pipecat (actually I'm not sure what is there) and then our API docs.
Changed the link. Regarding titles and descriptions, they are standard across all the services, I don't think we should change them
```
Respeecher Space API provides high-quality streaming text-to-speech synthesis with low latency.

<CardGroup cols={3}>
```
It renders fine in the docs preview (pnpx mint dev). The first and the third link are broken since the PRs haven't been merged upstream yet
Hm, but why would the rendering be broken in Github? It's not broken for other services
> It's not broken for other services
I don't know the reason, but my gut feeling is that it's nothing important. If we indeed have some syntax errors, maybe the upstream reviewers will point them out
server/services/tts/respeecher.mdx
Outdated
```
## Language Support

Currently only English is supported for public models.
```
What is language specific about pipecat anyway? I assume that we will have other languages soon -- would we need to make another PR to pipecat every time we add a language? Also, given that we do have non-public models already for other languages, is there any difficulty using them with pipecat?
> What is language specific about pipecat anyway? I assume that we will have other languages soon -- would we need to make another PR to pipecat every time we add a language?
It depends on whether our multilingual models are going to need a language embedding. If yes, we would need to add some integration code that converts a Pipecat language ID into a language ID of our API. If no, everything should work as is.
Maybe we could adjust wording here to something like: Refer to [a link to our docs] for language support in different models.
Yeah, that makes sense. Not just say that we only support English, but also not add any language embedding support unless/until we need it
```
## Language Support

Currently only English is supported for public models.
```
11 has a section on sample rates / output formats before the usage example. Should we have one?
I think their section highlights that the integration can infer the right output format enum string for the Eleven API from the integer sample rate set by the pipeline. Since our API just accepts the sample rate integer directly, maybe we don't need a similar section.
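The difference described above can be illustrated with a small sketch. The format strings and function names below are assumptions modeled on ElevenLabs-style output-format enums, not taken from either API:

```python
# Illustrative only: an ElevenLabs-style integration must map the pipeline's
# integer sample rate onto an output-format enum string, while a
# Respeecher-style API can accept the integer directly.
# All names here are hypothetical.

ELEVEN_STYLE_FORMATS = {
    16000: "pcm_16000",
    22050: "pcm_22050",
    24000: "pcm_24000",
    44100: "pcm_44100",
}

def eleven_style_output_format(sample_rate: int) -> str:
    """Look up the enum string; only a fixed set of rates is representable."""
    try:
        return ELEVEN_STYLE_FORMATS[sample_rate]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {sample_rate}")

def respeecher_style_params(sample_rate: int) -> dict:
    """The integer passes straight through; no enum lookup needed."""
    return {"sample_rate": sample_rate}

print(eleven_style_output_format(24000))  # → pcm_24000
print(respeecher_style_params(24000))     # → {'sample_rate': 24000}
```

The upshot is that a "supported sample rates" section matters more when an enum mapping constrains the choices; an explicit note in our docs could still preempt reader questions.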
I guess, but still someone could carelessly read our docs and wonder what sample rates we support or if you can choose that. So maybe better to make it explicit and easy for people to understand that this is not a disadvantage of us vs Eleven
```
    tts,
    transport.output()
])
```
11 puts aggregators in their pipeline. We don't need them?
Cartesia doesn't. It's up to us, it's just an example
I don't understand if users should put in aggregators or not.
If we want to properly read a sequence of numbers given as digits, e.g. "1, 2, 3, 4, ...", we currently need an aggregator (otherwise our API treats "," as a delimiter for thousands). But for most cases, even without an aggregator, the conversion is fine.
```
This service provides:

- **Time to First Byte (TTFB)** - Latency from text input to first audio
```
It would be neat to measure TTFB for Respeecher and various competitors
mitrushchienkova left a comment
- It would be cool to mention Respeecher in the Websocket (Recommended) section here. But I'm not sure the Pipecat team will approve it.
- Why don't we add support for HTTP-based TTS, like other providers?
Overall, LGTM