This repo contains an OpenAI-compatible server for Coqui-TTS, adapted from
TTS/server/server.py.
Watch the video showcase and tutorial on YouTube
This server allows you to use XTTS2 local TTS models as a drop-in replacement for OpenAI TTS models.
The primary use case is integration with WingmanAI (wingman-ai.com), offering:
- Local voice cloning
- Additional TTS options
- No reliance on paid services like ElevenLabs
- `--lowvram` mode: moves the TTS model to CPU when idle (saves ~1.5 GB of VRAM with XTTS2)
- Ensures the correct language segmenter is used for splitting long text
- (Planned) Support for pre-made XTTS2 latents in generation
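Once any of the servers below is running on port 5002, clients can request speech much as they would from OpenAI. The sketch below is only an illustration and assumes the server mirrors OpenAI's `/v1/audio/speech` route; `XTTS2` is just the placeholder model name used later in the WingmanAI configuration, and `Ana Florence` is one of the built-in voices listed further down.

```python
# Minimal sketch of an OpenAI-style TTS request against the local server.
# Assumes the /v1/audio/speech route and a binary audio response; adjust
# if your server version expects different fields.
import requests

resp = requests.post(
    "http://localhost:5002/v1/audio/speech",
    json={
        "model": "XTTS2",                  # placeholder model name
        "input": "Hello from your local XTTS2 server.",
        "voice": "Ana Florence",           # built-in voice, or path to a cloning .wav
    },
)
resp.raise_for_status()
with open("hello.wav", "wb") as f:
    f.write(resp.content)
```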
You have three installation options:
- ✅ Premade `.exe` for Windows (Experimental)
- 🛠️ Use the custom server from this repo with Python
- ⚙️ Use the original `idiap/coqui-ai-TTS` server with Python
Option 1: Premade `.exe` (Windows, Experimental)

Pros:
- No Python/coding knowledge needed
- Mostly pre-packaged
- Quickest setup
Cons:
- Antivirus may flag the `.exe`
- Minimal testing
- No auto-updates
- Windows only
- Trust required for the download
Installation Steps:
- Download the ZIP (~5 GB) via the mega.nz download or the mediafire download link
- Unzip anywhere (avoid OneDrive-controlled folders)
- If warned, click “Keep Anyway”
- Double-click `run_server.bat`
- Allow network access when prompted
- Follow menu to select language and GPU/CPU
You’re now running! 🎉 Proceed to WingmanAI Configuration.
Option 2: Custom server from this repo (Python)

Pros:
- Custom WingmanAI features (e.g., `lowvram` mode)
- Open source
- Cross-platform support
Cons:
- Not automatically synced with base repo
- Requires more steps
- Trust needed (or read the code)
Installation Steps:
- Unzip (avoid OneDrive-controlled folders)
- Install `pyenv-win`
- Open a terminal in the unzipped folder
- Run:

```
pyenv install 3.11.7
pyenv local 3.11.7
python -m venv venv
.\venv\Scripts\activate

# if using CPU:
pip install torch torchaudio
# or, if using an NVIDIA GPU:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

pip install -r requirements.txt
```
- Download the XTTS2 model files from huggingface.co/coqui/XTTS-v2 into the `xtts_model` folder
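If you prefer not to download the model files by hand, the sketch below fetches them with `huggingface_hub` (assumed installed via `pip install huggingface_hub`); accepting the Coqui license on the Hugging Face model page may still be required.

```python
# Download the XTTS-v2 model files from Hugging Face into the xtts_model folder.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="coqui/XTTS-v2", local_dir="xtts_model")
```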
To Run the Server:
- Open the project folder
- Double-click `run_server_with_python.bat`
- Follow prompts to choose language and GPU/CPU
You’re now running! 🎉 Proceed to WingmanAI Configuration.
Option 3: Original `idiap/coqui-ai-TTS` server (Python)

Pros:
- Trusted, long-standing repo
- Open source
- Automatic updates
- Works on all OS
Cons:
- No `lowvram` mode (uses ~3-4 GB VRAM while idle on GPU)
- No support for pre-made latents
Installation Steps:
- Create a folder (e.g. `Coqui-TTS-Server`)
- Install `pyenv-win`
- Open a terminal in that folder
- Run:

```
pyenv install 3.11.7
pyenv local 3.11.7
python -m venv venv
.\venv\Scripts\activate
```
- (Optional, for NVIDIA GPU; see the GPU check sketched after these steps):

```
pip install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
- Then:

```
pip install coqui-tts[server,languages]
```
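After installing, a quick check (run inside the activated venv) confirms whether the GPU build of torch is active; this is a generic PyTorch check, not something specific to this server.

```python
# Verify that PyTorch can see the NVIDIA GPU.
# If this prints False, use the CPU install and skip --use_cuda.
import torch

print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```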
To Run the Server:
```
.\venv\Scripts\activate
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2
```

Optional flags:
- Add `--use_cuda` to run on GPU
- Add `--language_idx de` (or another language code)

Example:

```
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 --use_cuda --language_idx de
```
On the first run, the program should automatically download the TTS model (XTTS2). You may have to confirm consent to the license during the download.
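As a quick smoke test, you can request a short clip from the running server. The upstream Coqui demo server has historically exposed a simple GET `/api/tts` endpoint with `text`, `speaker_id`, and `language_id` query parameters; this is an assumption about the current version, so if it does not respond, just use the demo UI at http://localhost:5002 instead.

```python
# Hypothetical smoke test against the Coqui demo server's /api/tts endpoint.
import requests

resp = requests.get(
    "http://localhost:5002/api/tts",
    params={"text": "Server check.", "speaker_id": "Ana Florence", "language_id": "en"},
)
resp.raise_for_status()
with open("check.wav", "wb") as f:
    f.write(resp.content)
```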
You’re now running! 🎉 Proceed to WingmanAI Configuration.
WingmanAI Configuration:
- Start the TTS server using any method above
- Open WingmanAI
- Choose a Wingman and click the 🔧 config wrench
- Under Text to Speech, choose `Local OpenAI Compatible TTS`
- Click the ⚙️ configuration wheel
- Enter:
  - URL: `http://localhost:5002/v1`
  - Model: `XTTS2` (or anything else; it is just a placeholder)
- Adjust Speed to control speech rate
- Choose a Voice from the built-ins, or:
  - Use a single `.wav` file (one voice sample) of the speaker whose voice you want to clone (recommended: 5-10 seconds, mono, 22050 Hz)
  - Or use a folder of `.wav` files (multiple samples) of that speaker (recommended: 3-6 samples, each 5-10 seconds, mono, 22050 Hz)
- For the Voice field, enter either the name of a built-in voice (see the list below) or the path to the `.wav` file or speaker folder you made above (use `/`, not `\`). We recommend placing these files in the `cloning_wavs` folder if you are using this repo or the `.exe`, or a similar folder if you are using the coqui-tts repo. A sketch for converting samples to the recommended format follows the voice table.
| Built-in voices | | | |
|---|---|---|---|
| Claribel Dervla | Dervla Studious | Gracie Wise | Tammie Ema |
| Alison Dietlinde | Ana Florence | Annmarie Nele | Asya Anara |
| Brenda Stern | Gitta Nikolina | Henriette Usha | Sofia Hellen |
| Tammy Grit | Tanja Adelina | Vjollca Johnnie | Andrew Chipper |
| Badr Odhiambo | Dionisio Schuyler | Royston Min | Viktor Eka |
| Abrahan Mackdde Michal | Baldur Sanjin | Craig Gutsy | Damien Black |
| Gilberto Mathias | Ilkin Urbano | Kazuhiko Atallah | Ludvig Milivoj |
| Suad Qasim | Torcull Diarmuid | Viktor Menelaos | Zacharie Aimilios |
| Nova Hogarth | Maja Ruoho | Uta Obando | Lidiya Szekeres |
| Chandra MacFarland | Szofi Granger | Camilla Holmström | Lilya Stainthorpe |
| Zofija Kendrick | Narelle Moon | Barbora MacLean | Alexandra Hisakawa |
| Alma María | Rosemary Okafor | Ige Behringer | Filip Traverse |
| Damjan Chapman | Wulf Carlevaro | Aaron Dreschner | Kumar Dahl |
| Eugenio Mataracı | Ferran Simen | Xavier Hayasaka | Luis Moray |
| Marcos Rudaski |
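To get cloning samples into the recommended format (mono, 22050 Hz), something like the sketch below works. It uses `torchaudio`, which the Python setups above already install; the file names are only examples.

```python
# Convert a voice sample to mono, 22050 Hz for voice cloning.
import torchaudio
import torchaudio.functional as F

waveform, sample_rate = torchaudio.load("raw_sample.wav")      # (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)                  # downmix to mono
waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=22050)
torchaudio.save("cloning_wavs/my_voice.wav", waveform, 22050)
```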
✅ With the server running, open http://localhost:5002 in your browser to try out all the voices in a demo UI.
- Save your Wingman
Your Wingman now speaks with XTTS2! 🗣️✨
