An enhanced, one-click-installable studio built on Jordan Darefsky's Echo-TTS project, with additional code (chunking, etc.) borrowed from KevinAHM/echo-tts-api. Built for creators who want high-quality TTS with voice cloning and video translation dubbing without the setup headaches.
Model: jordand/echo-tts-base | Blog: https://jordandarefsky.com/blog/2025/echo/
- VRAM: 12 GB minimum (NVIDIA GPU recommended)
- Platform: Windows · Linux · macOS
- Install: One click via Pinokio — handles Python, dependencies, and model downloads automatically
- Install Pinokio if you haven't already
- Click the install badge above, or search "EchoStudio" in the Pinokio app
- Hit Install → Start → done
- Voice cloning from reference audio
- Multi-speaker support (S1/S2 tagging)
- Long-form generation with automatic text chunking and crossfade stitching
- Sampler presets and full control over CFG guidance, sampling style, and KV scaling
- Upload video, extract audio, and transcribe/translate with Whisper
- Editable transcript with segment timing
- Re-voice translated speech with TTS using cloned or saved voices
- Preserve background audio — AI source separation mixes ambient/background with the new TTS voice
- Multi-speaker dubbing with S1/S2 tags
- Upload audio or video files as voice sources
- Edit saved voices directly
- Clip, trim silence, adjust speed, and normalize volume
- Vocal isolation — separate clean vocals from noisy recordings (BS-Roformer, MDX-Net via audio-separator)
- Background isolation for extracting ambience/music
- Save edited voices as named profiles with cached speaker latents
- Theme selection, memory management, custom output directory, temp file cleanup
Echo generates up to 30 seconds of audio per chunk. Longer text is automatically split and stitched with configurable silence gaps and crossfade. Shorter text produces shorter outputs naturally.
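The stitching step can be pictured as a simple overlap-add: insert a short silence gap between chunks, then blend the boundary with a linear crossfade. This is an illustrative sketch only, not EchoStudio's actual implementation; the function name and the `fade_ms`/`gap_ms` defaults are made up for the example.

```python
import numpy as np

def crossfade_stitch(chunks, sr=44100, fade_ms=50, gap_ms=100):
    """Join audio chunks with a silence gap and a linear crossfade.

    `chunks` is a list of 1-D float arrays at sample rate `sr`.
    The gap and fade lengths are hypothetical defaults.
    """
    fade = int(sr * fade_ms / 1000)
    gap = np.zeros(int(sr * gap_ms / 1000), dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)

    out = chunks[0].astype(np.float32)
    for nxt in chunks[1:]:
        nxt = nxt.astype(np.float32)
        out = np.concatenate([out, gap])
        # Overlap the last `fade` samples of `out` with the first
        # `fade` samples of the next chunk, fading one out and the
        # other in, so the seam has no click.
        head, tail = out[:-fade], out[-fade:]
        mixed = tail * (1.0 - ramp) + nxt[:fade] * ramp
        out = np.concatenate([head, mixed, nxt[fade:]])
    return out
```

Because the fade region overlaps, the stitched length is the sum of the chunks plus the gaps, minus one fade per seam.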
Up to 5 minutes of reference audio is supported, but shorter clips (10 seconds or less) work well too. Use the Voices tab to clip, clean, and isolate vocals from noisy recordings.
If the model generates a different speaker than expected, enable "Force Speaker" (default scale 1.5). Aim for the lowest scale that produces the correct speaker.
Use [S1] and [S2] for speaker tags. Expression markers like (laughs), (angry), (whispering) control tone. Commas function as pauses.
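A dialogue script combining these conventions might look like the following (the lines themselves are invented for illustration):

```text
[S1] Welcome back, everyone. (laughs) I can't believe we made it to episode fifty.
[S2] (whispering) Neither can I, honestly.
[S1] So, without further ado, let's get started.
```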
Don't use this model to impersonate real people without consent or generate deceptive audio. You are responsible for complying with local laws regarding biometric data and voice cloning.
Code in this repo is MIT-licensed except where file headers specify otherwise (e.g., autoencoder.py is Apache-2.0).
Audio outputs are CC-BY-NC-SA-4.0 due to the dependency on the Fish Speech S1-DAC autoencoder. Echo-TTS weights are released under CC-BY-NC-SA-4.0.
@misc{darefsky2025echo,
  author = {Darefsky, Jordan},
  title  = {Echo-TTS},
  year   = {2025},
  url    = {https://jordandarefsky.com/blog/2025/echo/}
}