docs.json — 1 addition, 0 deletions

@@ -181,6 +181,7 @@
"server/services/tts/openai",
"server/services/tts/piper",
"server/services/tts/playht",
"server/services/tts/respeecher",
"server/services/tts/rime",
"server/services/tts/sarvam",
"server/services/tts/xtts"

server/services/supported-services.mdx — 1 addition, 0 deletions

@@ -99,6 +99,7 @@ Text-to-Speech services receive text input and output audio streams or chunks.
| [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
| [Piper](/server/services/tts/piper) | No dependencies required |
| [PlayHT](/server/services/tts/playht) | `pip install "pipecat-ai[playht]"` |
| [Respeecher](/server/services/tts/respeecher) | `pip install "pipecat-ai[respeecher]"` |
| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
| [Sarvam](/server/services/tts/sarvam) | No dependencies required |
| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |

server/services/tts/respeecher.mdx — 132 additions, 0 deletions (new file)

@@ -0,0 +1,132 @@
---
title: "Respeecher"
description: "Text-to-speech service using the Respeecher Space WebSocket API"
---

## Overview

The Respeecher Space API provides high-quality streaming text-to-speech synthesis with low latency.

> **Reviewer:** These cards are not formatting properly when you view the page in GitHub and the links are not working.
> _(screenshot: "Screenshot 2025-09-06 at 4:05:36 PM")_
>
> **Author:** It renders fine in the docs preview (`pnpx mint dev`). The first and the third link are broken since the PRs haven't been merged upstream yet.
>
> **Reviewer:** Hm, but why would the rendering be broken in GitHub? It's not broken for other services.
>
> **Author:** Re "It's not broken for other services" — I don't know the reason, but my gut feeling is that it's nothing important. If we indeed have some syntax errors, maybe the upstream reviewers will point them out.

<CardGroup cols={3}>
<Card
title="API Reference"
icon="code"
href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.respeecher.tts.html"
>
Complete API documentation and method details
</Card>
<Card
title="Respeecher Docs"
icon="book"
href="https://space.respeecher.com/docs"
>
Official Respeecher Space documentation
</Card>
<Card
title="Example Code"
icon="play"
href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07ad-interruptible-respeecher.py"
>
Working example with interruption handling
</Card>
</CardGroup>

## Installation

To use Respeecher services, install the required dependencies:

```bash
pip install "pipecat-ai[respeecher]"
```

You'll also need to set up your Respeecher API key as an environment variable: `RESPEECHER_API_KEY`.

<Tip>
Get your API key by signing up at
[Respeecher Space](https://space.respeecher.com/).
</Tip>
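
For example (the key value shown is a placeholder), the variable can be set in a shell session or in your `.env` file:

```shell
export RESPEECHER_API_KEY="your-api-key-here"
```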

## Frames

### Input

- `TextFrame` - Text to synthesize into speech, subject to optional aggregation
- `TTSSpeakFrame` - Text that should be spoken immediately
- `TTSUpdateSettingsFrame` - Runtime configuration updates (e.g., voice, sampling params)
- `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries

### Output

- `TTSStartedFrame` - Signals start of synthesis
- `TTSAudioRawFrame` - Generated audio data chunks
- `TTSStoppedFrame` - Signals completion of synthesis
- `ErrorFrame` - Connection or processing errors

## Language Support

Refer to the [Respeecher Space docs](https://space.respeecher.com/docs)
for language support across models.

## Supported Sample Rates

All common sample rates are supported.

> **Reviewer:** 11 has a section on sample rates / output formats before the usage example. Should we have one?
>
> **Author:** I think their section highlights that the integration can infer the right Eleven API output format enum string value from the sample rate integer value set by the pipeline. Since our API just accepts the sample rate integer value directly, maybe we don't need a similar section.
>
> **Reviewer:** I guess, but still someone could carelessly read our docs and wonder what sample rates we support or if you can choose that. So maybe better to make it explicit and easy for people to understand that this is not a disadvantage of us vs Eleven.

## Usage Example

Initialize the WebSocket service with your API key and desired voice:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.respeecher.tts import RespeecherTTSService

# Configure WebSocket service
tts = RespeecherTTSService(
    api_key=os.getenv("RESPEECHER_API_KEY"),
    voice_id="samantha",
    params=RespeecherTTSService.InputParams(
        sampling_params={
            # Optional sampling params overrides
            # See https://space.respeecher.com/docs/api/tts/sampling-params-guide
            # "temperature": 0.5
        },
    ),
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    tts,
    transport.output(),
])
```
> **Reviewer:** 11 puts aggregators in their pipeline. We don't need them?
>
> **Author:** Cartesia doesn't. It's up to us, it's just an example.
>
> **Reviewer:** I don't understand if users should put in aggregators or not.
>
> **Author:** If we want to properly read a sequence of numbers with digits, e.g. "1, 2, 3, 4, ...", we currently need an aggregator (otherwise our API treats "," as a delimiter for thousands). But for most cases the conversion is fine even without an aggregator.


### Dynamic Configuration

Update settings at runtime by pushing a `TTSUpdateSettingsFrame` to the `RespeecherTTSService`:

```python
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id", "sampling_params": {"temperature": 0.5}})
)
```

## Metrics

This service provides:

- **Time to First Byte (TTFB)** - Latency from text input to first audio

> **Reviewer:** It would be neat to measure TTFB for Respeecher and various competitors.

- **Processing Duration** - Total synthesis time
- **Usage Metrics** - Character count and synthesis statistics

<Info>
[Learn how to enable Metrics](/guides/fundamentals/metrics) in your Pipeline.
</Info>

## Additional Notes

- **Connection Management**: WebSocket lifecycle is handled automatically with reconnection support
- **Sample Rate**: Set globally in `PipelineParams` rather than per-service for consistency