FrontPocket is a front end for Kyutai Labs' Pocket TTS. It can speak text from the clipboard, from a text file, or passed directly on the CLI, and it lets you pause, resume, move back and forward in the spoken text, and change playback speed.
It is a low-latency, daemon-based text-to-speech system, developed and tested under Linux. FrontPocket loads the TTS model once at startup and streams audio sentence by sentence, so there is minimal delay between sending text and hearing it spoken.
Upcoming sentences are generated in advance and previously played sentences are cached, so you can move backwards or forwards a few sentences instantly.
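The lookahead and cache behaviour can be pictured with a small sketch. This is not FrontPocket's actual code: `SentencePlayer`, `synthesize`, and the `lookahead` and `history` parameters are hypothetical names, the stand-in synthesizer returns a string where the real server would produce audio, and a real implementation would presumably generate in a background thread rather than inline.

```python
from typing import Callable, Dict, List


class SentencePlayer:
    """Toy model of a sentence cursor with lookahead generation and a cache."""

    def __init__(self, sentences: List[str], synthesize: Callable[[str], str],
                 lookahead: int = 5, history: int = 5):
        self.sentences = sentences
        self.synthesize = synthesize   # hypothetical stand-in for the TTS call
        self.lookahead = lookahead     # sentences to prepare ahead of the cursor
        self.history = history         # played sentences kept for going back
        self.cache: Dict[int, str] = {}
        self.pos = 0
        self._fill()

    def _fill(self) -> None:
        # Generate anything missing in the window [pos, pos + lookahead].
        last = min(self.pos + self.lookahead, len(self.sentences) - 1)
        for i in range(self.pos, last + 1):
            if i not in self.cache:
                self.cache[i] = self.synthesize(self.sentences[i])
        # Drop entries that fell out of the history window behind the cursor.
        for i in [k for k in self.cache if k < self.pos - self.history]:
            del self.cache[i]

    def current(self) -> str:
        return self.cache[self.pos]

    def next(self) -> str:
        self.pos = min(self.pos + 1, len(self.sentences) - 1)
        self._fill()
        return self.current()

    def back(self) -> str:
        self.pos = max(self.pos - 1, 0)
        self._fill()
        return self.current()
```

Because moving the cursor only triggers synthesis for sentences that are not already cached, stepping back to a recently played sentence costs nothing in this model.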
frontpocket_server.py is meant to run in the background at all times as a systemd service. You can also run it from the CLI in interactive mode. I'd suggest starting that way and switching to a service once you're satisfied it is dialed in. The server is controlled by the included lightweight CLI client.
You can bind hotkeys that run the CLI client with parameters to play, pause, change speed, and so on. For example, Ctrl-Shift-S could trigger speaking the clipboard text.
A Qt6-based toolbar, frontpocket_toolbar.py, provides a UI that drives the CLI client: play clipboard text, move forward and back, and change voice and speed. Right-click to see the speed and voice selections.
Compact Toolbar:
Right click to expand toolbar: 
Note: Developed and tested under Debian Linux. macOS and Windows "should" work but haven't been tested. Please test and open a PR with any needed fixes.
Inspiration for this project comes from Kokorodoki, which provides a similar feature set for Kokoro TTS. https://github.com/eel-brah/kokorodoki
Many thanks to the very smart people at Kyutai Labs for their beautiful model and helpful reference code. Their stuff is where the real magic happens. https://github.com/kyutai-labs/pocket-tts
- Low latency — model is pre-loaded; audio begins within seconds of sending text
- Chunk-ahead generation — the next several sentences are generated in the background while the current one plays
- Multiple voices — switch voices on the fly; built-in voices and custom .safetensors embeddings supported
- Speed control — pitch-preserved speed adjustment via pyrubberband
- Pause / resume — resume from exactly where you paused, even after changing voice or speed
- Skip forward / back — move through sentences instantly; previously played sentences are cached
- Interrupt — inject an urgent TTS message mid-playback, then resume automatically
- Clipboard-first — default input is the system clipboard; also accepts inline text and text files
- systemd ready — runs as a proper system service with automatic restart on failure
- Multilingual — sentence segmentation supports English, German, French, Spanish, Italian, Russian, Polish, and more. Tested with English.
- UI Toolbar - Because we don't always want to be in the CLI.
FrontPocket has three components:
| Component | File | Role |
|---|---|---|
| Server | `frontpocket_server.py` | Loads the model, listens on a TCP socket, plays audio |
| Client | `frontpocket_client.py` | Sends text or commands to the server |
| Toolbar | `frontpocket_toolbar.py` | Provides a UI to talk to the client |
The server and client communicate over a local TCP socket (default port 5562). The client is fire-and-forget — it sends a message and exits immediately.
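As a rough illustration of the fire-and-forget pattern, the sketch below opens a TCP connection, writes one message, and closes without waiting for a reply. The function name `send_message`, the UTF-8 encoding, and the one-message-per-connection framing are assumptions made for the example; the real wire format is whatever frontpocket_client.py implements.

```python
import socket


def send_message(message: str, host: str = "127.0.0.1", port: int = 5562,
                 timeout: float = 2.0) -> None:
    """Send one message and return immediately (assumed framing: raw UTF-8)."""
    # create_connection raises OSError if nothing is listening on host:port.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
    # Leaving the with-block closes the socket; no response is read.
```

Under these assumptions, `send_message("Hello there")` would queue text for speaking and `send_message("!pause")` would send a command.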
See INSTALL.md for full setup instructions including systemd service configuration.
fp [text] [options]
| Command | Description |
|---|---|
| `fp` | Speak clipboard contents (default) |
| `fp "Some text"` | Speak inline text |
| `fp --file article.txt` | Speak contents of a text file |
| Command | Short | Description |
|---|---|---|
| `fp --pause` | | Pause playback |
| `fp --resume` | | Resume from where you paused |
| `fp --next` | | Skip to next sentence |
| `fp --back` | | Go back one sentence |
Why no --stop? Just use pause and don't resume.
| Command | Description |
|---|---|
| `fp --voice masha` | Change voice (takes effect immediately) |
| `fp --speed 1.5` | Change speed (0.5–3.0, takes effect immediately) |
| Command | Description |
|---|---|
| `fp --interruptwith "text"` | Pause, speak the text, resume |
| `fp --interruptwith alert.txt` | Same, but read text from a file |
| `fp --status` | Speak current voice, speed, and playback state |
| Option | Description |
|---|---|
| `--ping` | Check server is reachable (exit 0 = up, exit 1 = down) |
| `--list-voices` | Print all voices configured in frontpocket.ini |
| `--version` | Print FrontPocket version and exit |
| `--port PORT` | Connect to a non-default server port |
| `--host HOST` | Connect to a non-default server host |
| `--quiet` | Suppress all client output |
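The documented `--ping` exit codes make it easy to check the server from a script. The helper below is a sketch: `server_is_up` is a hypothetical name, and it assumes that `fp` is on your PATH and that `--ping` and `--quiet` can be combined.

```python
import subprocess
from typing import Sequence


def server_is_up(command: Sequence[str] = ("fp", "--ping", "--quiet")) -> bool:
    """Return True when the ping command exits with status 0."""
    try:
        result = subprocess.run(list(command), check=False)
    except FileNotFoundError:
        # The client is not installed or not on PATH.
        return False
    return result.returncode == 0
```

A hotkey script could call this first and show a desktop notification instead of failing silently when the server is down.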
FrontPocket is configured via frontpocket.ini. The server looks for it next to frontpocket_server.py, then in the current working directory. When installed as a service, symlink /etc/FrontPocket/frontpocket.ini into /opt/FrontPocket/.

    [settings]
    default_voice = alba
    default_speed = 1.0
    port = 5562
    lookahead_chunks = 5
    language = en
    log_level = INFO
    interrupt_sound = /var/lib/FrontPocket/sounds/notification.wav
    [voices]
    alba = alba
    mary = /var/lib/FrontPocket/voices/mary.safetensors

See the fully commented frontpocket.ini for all available options.
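If you want to read the same settings from your own scripts, the standard library's `configparser` handles this file format. The sketch below parses a trimmed copy of the sample; `load_settings` and `SAMPLE_INI` are hypothetical names, the fallback values simply mirror the sample and are assumptions, and the server's file lookup order is not reproduced here.

```python
import configparser

SAMPLE_INI = """
[settings]
default_voice = alba
default_speed = 1.0
port = 5562
lookahead_chunks = 5

[voices]
alba = alba
mary = /var/lib/FrontPocket/voices/mary.safetensors
"""


def load_settings(text: str) -> dict:
    """Parse frontpocket.ini-style text into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = parser["settings"]  # raises KeyError if the section is missing
    return {
        "default_voice": settings.get("default_voice", fallback="alba"),
        "default_speed": settings.getfloat("default_speed", fallback=1.0),
        "port": settings.getint("port", fallback=5562),
        "lookahead_chunks": settings.getint("lookahead_chunks", fallback=5),
        # Maps each voice name to a built-in name, file path, or hf:// reference.
        "voices": dict(parser["voices"]),
    }


config = load_settings(SAMPLE_INI)
```

To read a real file, `parser.read(path)` can replace `read_string`.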
FrontPocket uses pocket_tts for TTS.
Built-in voices are referenced by name in frontpocket.ini:

    alba = alba

Custom voices use .safetensors embedding files:

    mary = /var/lib/FrontPocket/voices/mary.safetensors

Hugging Face voices can be referenced directly:

    expresso = hf://kyutai/tts-voices/expresso/ex01-ex02_default_001_channel2_198s.wav

Text sent to the server without a ! prefix is spoken. Commands are prefixed with !:
| Command | Alias | Description |
|---|---|---|
| `!pause` | `!p` | Pause playback |
| `!resume` | `!r` | Resume playback |
| `!next` | `!n` | Skip to next chunk |
| `!back` | `!b` | Go back one chunk |
| `!voice <name>` | | Change voice |
| `!speed <value>` | | Change speed |
| `!interruptwith <text>` | `!i` | Interrupt with text, then resume |
| `!status` | | Speak current voice, speed, and state |
| `!ping` | | No-op — used to verify server is reachable |
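One way to picture how such messages could be classified is sketched below. The aliases come from the table above, but `parse_message`, the `"speak"` label, and the splitting rules (first space separates command from argument, names are case-insensitive) are assumptions for illustration, not the server's actual parser.

```python
from typing import Tuple

# Aliases taken from the command table; the parsing rules are assumptions.
ALIASES = {"p": "pause", "r": "resume", "n": "next", "b": "back",
           "i": "interruptwith"}


def parse_message(message: str) -> Tuple[str, str]:
    """Split a message into (command, argument); plain text maps to 'speak'."""
    if not message.startswith("!"):
        return ("speak", message)
    # Everything up to the first space is the command name.
    name, _, argument = message[1:].partition(" ")
    name = ALIASES.get(name.lower(), name.lower())
    return (name, argument.strip())
```

For example, `parse_message("!speed 1.5")` yields `("speed", "1.5")`, while any text without the prefix is passed through unchanged for speaking.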
| Level | What you see |
|---|---|
| `ERROR` | Fatal errors only |
| `INFO` | Chunk generation, playback, commands received (default) |
| `DEBUG` | Everything above plus chunk text, socket messages, and per-chunk .txt files in debug_dir |
Set log_level in frontpocket.ini or override at launch:

    python3 frontpocket_server.py --log-level DEBUG

When running as a service, view logs with:

    journalctl -u frontpocket -f

| Package | Purpose |
|---|---|
| `pocket_tts` | TTS engine |
| `pysbd` | Sentence boundary detection |
| `pyrubberband` | Pitch-preserved speed adjustment |
| `sounddevice` | Audio playback |
| `numpy` | Audio array handling |
| `scipy` | WAV file reading (interrupt sound) |
| `rubberband-cli` | System package required by pyrubberband |
| `pyqt6` | UI for the toolbar |
| `pyperclip` | Windows clipboard support (optional) |
MIT