voice clone for Dia-1.6B #107
Status: Open
Labels: enhancement (New feature or request)
Description
Currently, TTS.cpp supports inference for the Dia model via CLI, but it does not expose a way to perform voice cloning with an audio reference, as supported by the original Dia implementation in Python.
In the original Dia Python API, we can load an audio reference and transcript to guide the voice characteristics of generated speech:
```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

clone_from_text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."
clone_from_audio = "simple.mp3"
text_to_generate = "[S1] Hello, how are you? [S2] I'm good, thank you."

output = model.generate(
    clone_from_text + text_to_generate,
    audio_prompt=clone_from_audio,
    use_torch_compile=True,
    verbose=True,
)

model.save_audio("voice_clone.mp3", output)
```
Feature request:
Add an inference option in TTS.cpp for the Dia model that allows:
- Loading an audio reference file (e.g., .mp3 / .wav).
- Providing the transcript of that reference audio.
- Generating new audio that mimics the reference voice, directly in the CLI or through a programmatic API.
Proposed interface example (CLI):
```sh
./tts --model dia.gguf \
  --text "Hello, how are you?" \
  --clone-audio reference.mp3 \
  --clone-text "[S1] This is the transcript of the reference audio."
```