1 change: 1 addition & 0 deletions README.md
@@ -20,6 +20,7 @@ In this endeavor, MacOS and metal support will be treated as the primary platfor
| [Parler TTS Large](https://huggingface.co/parler-tts/parler-tts-large-v1)|✓|✓|✓|[here](https://huggingface.co/mmwillet2/Parler_TTS_GGUF)|
| [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) |✓|✗|✓|[here](https://huggingface.co/mmwillet2/Kokoro_GGUF) |
| [Dia](https://github.com/nari-labs/dia) |✓|✓|✓|[here](https://huggingface.co/mmwillet2/Dia_GGUF) |
| [Orpheus](https://github.com/canopyai/Orpheus-TTS) |✓|✗|✗|[here](https://huggingface.co/mmwillet2/Orpheus_GGUF) |

Additional model support will initially be added based on open source model performance in both the [old TTS model arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena) and [new TTS model arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2), as well as the availability of those models' architectures and checkpoints.

33 changes: 26 additions & 7 deletions examples/cli/README.md
@@ -11,7 +11,7 @@ This simple example cli tool can be used to generate speech from a text prompt a

In order to get a detailed breakdown of the functionality currently available, you can call the cli with the `--help` parameter. This will return a breakdown of all parameters:
```bash
./cli --help
./tts-cli --help

--temperature (-t):
The temperature to use when generating outputs. Defaults to 1.0.
@@ -52,25 +52,44 @@ In order to get a detailed breakdown the functionality currently available you c
General usage should follow from these parameters. E.g., the following command will save generated speech to the file `/tmp/test.wav`:

```bash
./cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav
./tts-cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav
```

#### Dia Generation Arguments
#### Dia and Orpheus Generation Arguments

Currently the default cli arguments are not aligned with Dia's default sampling settings. Specifically the temperature and topk settings should be changed to `1.3` and `35` respectively when generating with Dia like so:
Currently the default cli arguments are not aligned with Dia's or Orpheus's default sampling settings. Specifically, the temperature and topk settings should be changed to `1.3` and `35` respectively when generating with Dia like so:

```base
./cli --model-path /model/path/to/Dia.gguf --prompt "[S1] Hi, I am Dia, this is how I talk." --save-path /tmp/test.wav --topk 35 --temperature 1.3
```bash
./tts-cli --model-path /model/path/to/Dia.gguf --prompt "[S1] Hi, I am Dia, this is how I talk." --save-path /tmp/test.wav --topk 35 --temperature 1.3
```

and the voice, temperature, and repetition penalty settings should be changed to a valid voice (e.g. `leah`), `0.7`, and `1.1` respectively when generating with Orpheus like so:

```bash
./tts-cli --model-path /model/path/to/Orpheus.gguf --prompt "Hi, I am Orpheus, this is how I talk." --save-path /tmp/test.wav --voice leah --temperature 0.7 --repetition-penalty 1.1
```


#### Conditional Generation

Conditional generation is a Parler TTS-specific behavior.

By default the Parler TTS model is saved to the GGUF format with a pre-encoded conditional prompt (i.e. a prompt used to determine how to generate speech), but if the text encoder model, the T5-Encoder model, is available in gguf format (see the [python conversion scripts](../../py-gguf/README.md) for more information on how to prepare the T5-Encoder model) then a new conditional prompt can be used for generation like so:

```bash
./cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav --text-encoder-path /model/path/to/t5_encoder_file.gguf --conditional-prompt "deep voice"
./tts-cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav --text-encoder-path /model/path/to/t5_encoder_file.gguf --conditional-prompt "deep voice"
```

#### Distinct Voice Support

Kokoro and Orpheus both support voices which can be set via the `--voice` (`-v`) argument. Orpheus supports the following voices:

```
"zoe", "zac","jess", "leo", "mia", "julia", "leah"
```

and Kokoro supports the voices listed in the section below.
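For scripting around the CLI, the Orpheus voice list above can be checked before launching a long generation run; a minimal Python sketch (the `validate_voice` helper is illustrative, not part of this repository):

```python
# Hypothetical pre-flight check mirroring the Orpheus voice list above.
ORPHEUS_VOICES = {"zoe", "zac", "jess", "leo", "mia", "julia", "leah"}


def validate_voice(voice: str) -> str:
    """Return the voice unchanged if Orpheus supports it, otherwise raise."""
    if voice not in ORPHEUS_VOICES:
        raise ValueError(
            f"unknown Orpheus voice {voice!r}; choose one of {sorted(ORPHEUS_VOICES)}"
        )
    return voice
```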

#### MultiLanguage Configuration

Kokoro supports multiple languages with distinct voices, and, by default, the standard voices are encoded in the Kokoro gguf file. Below is a list of the available voices:
2 changes: 1 addition & 1 deletion ggml
2 changes: 2 additions & 0 deletions include/common.h
@@ -18,12 +18,14 @@ enum tts_arch {
PARLER_TTS_ARCH = 0,
KOKORO_ARCH = 1,
DIA_ARCH = 2,
ORPHEUS_ARCH = 3,
};

const std::map<std::string, tts_arch> SUPPORTED_ARCHITECTURES = {
{ "parler-tts", PARLER_TTS_ARCH },
{ "kokoro", KOKORO_ARCH },
{ "dia", DIA_ARCH },
{ "orpheus", ORPHEUS_ARCH }
};

struct generation_configuration {
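The `orpheus` entry added to `SUPPORTED_ARCHITECTURES` is what lets the loader map the architecture string stored in a GGUF file's metadata onto the new runner. A Python mirror of that lookup, illustrative only (the real dispatch is the C++ map above):

```python
# Mirrors include/common.h: architecture string -> tts_arch enum value.
PARLER_TTS_ARCH, KOKORO_ARCH, DIA_ARCH, ORPHEUS_ARCH = 0, 1, 2, 3

SUPPORTED_ARCHITECTURES = {
    "parler-tts": PARLER_TTS_ARCH,
    "kokoro": KOKORO_ARCH,
    "dia": DIA_ARCH,
    "orpheus": ORPHEUS_ARCH,
}


def arch_from_string(name: str) -> int:
    """Resolve a GGUF architecture string, rejecting unknown architectures."""
    if name not in SUPPORTED_ARCHITECTURES:
        raise ValueError(f"unsupported architecture: {name!r}")
    return SUPPORTED_ARCHITECTURES[name]
```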
2 changes: 2 additions & 0 deletions include/tts.h
@@ -4,13 +4,15 @@
#include "parler_model.h"
#include "kokoro_model.h"
#include "dia_model.h"
#include "orpheus_model.h"
#include <thread>
#include <fstream>
#include <array>

struct tts_runner * parler_tts_from_file(gguf_context * meta_ctx, ggml_context * weight_ctx, int n_threads, generation_configuration * config, tts_arch arch, bool cpu_only);
struct tts_runner * kokoro_from_file(gguf_context * meta_ctx, ggml_context * weight_ctx, int n_threads, generation_configuration * config, tts_arch arch, bool cpu_only);
struct tts_runner * dia_from_file(gguf_context * meta_ctx, ggml_context * weight_ctx, int n_threads, generation_configuration * config, tts_arch arch, bool cpu_only);
struct tts_runner * orpheus_from_file(gguf_context * meta_ctx, ggml_context * weight_ctx, int n_threads, generation_configuration * config, tts_arch arch, bool cpu_only);
struct tts_runner * runner_from_file(const std::string & fname, int n_threads, generation_configuration * config, bool cpu_only = true);
int generate(tts_runner * runner, std::string sentence, struct tts_response * response, generation_configuration * config);
void update_conditional_prompt(tts_runner * runner, const std::string file_path, const std::string prompt, bool cpu_only = true);
21 changes: 21 additions & 0 deletions py-gguf/convert_orpheus_to_gguf
@@ -0,0 +1,21 @@
#!/usr/bin/env python3

import argparse
from tts_encoders.orpheus_gguf_encoder import OrpheusEncoder, DEFAULT_ORPHEUS_REPO_ID, DEFAULT_SNAC_REPO_ID
from os.path import isdir, dirname


def parse_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--save-path", type=str, required=True, help="The path to save the converted gguf tts model to.")
parser.add_argument("--repo-id", type=str, required=False, default=DEFAULT_ORPHEUS_REPO_ID, help="The Huggingface repository to pull the model from.")
parser.add_argument("--snac-repo-id", type=str, required=False, default=DEFAULT_SNAC_REPO_ID, help="The Huggingface repository to pull the snac audio decoder model from.")
parser.add_argument("--never-make-dirs", default=False, action="store_true", help="When set the script will never add new directories.")
return parser.parse_known_args()


if __name__ == '__main__':
args, _ = parse_arguments()
if not isdir(dirname(args.save_path)) and args.never_make_dirs:
raise ValueError(f"model path, {args.save_path} is not a valid path.")
    OrpheusEncoder(args.save_path, repo_id=args.repo_id, snac_repo_id=args.snac_repo_id).write()
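When `--never-make-dirs` is unset, the script above presumably leaves the encoder free to create a missing parent directory for `--save-path`; only the raising branch is written out. A hedged sketch of the full guard (the `ensure_parent_dir` helper is hypothetical, not part of the script):

```python
import os
from os.path import dirname, isdir


def ensure_parent_dir(save_path: str, never_make_dirs: bool = False) -> str:
    """Validate save_path's parent directory, creating it unless forbidden."""
    parent = dirname(save_path)
    if parent and not isdir(parent):
        if never_make_dirs:
            # Mirrors the script's error when directory creation is disallowed.
            raise ValueError(f"model path, {save_path} is not a valid path.")
        os.makedirs(parent, exist_ok=True)
    return save_path
```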
8 changes: 5 additions & 3 deletions py-gguf/requirements.txt
@@ -4,8 +4,8 @@ gguf==0.10.0
spacy==3.8.5
kokoro==0.9.4
huggingface-hub>=0.26.5
transformers>=4.43.3
parler_tts @ git+https://github.com/huggingface/parler-tts.git@8e465f1b5fcd223478e07175cb40494d19ffbe17
transformers>=4.46.0
parler_tts @ git+https://github.com/huggingface/parler-tts.git@d108732cd57788ec86bc857d99a6cabd66663d68
gguf==0.10.0
safetensors==0.5.3
groovy==0.1.2
@@ -14,5 +14,7 @@ gradio-client==1.10.0
llvmlite==0.44.0
numba==0.61.2
scipy>=1.15.2
snac==1.2.1
soundfile>=0.13.1
nari-tts @ git+https://github.com/nari-labs/dia.git@7cf50c889c6013f74326cbdcb7696a985a4cf9c1
nari-tts @ git+https://github.com/nari-labs/dia.git@2811af1c5f476b1f49f4744fabf56cf352be21e5
torchvision==0.21.0
1 change: 1 addition & 0 deletions py-gguf/tts_encoders/__init__.py
@@ -5,3 +5,4 @@
from .kokoro_gguf_encoder import *
from .dia_gguf_encoder import *
from .dac_gguf_encoder import *
from .orpheus_gguf_encoder import *
2 changes: 1 addition & 1 deletion py-gguf/tts_encoders/dia_gguf_encoder.py
@@ -82,7 +82,7 @@ def prepare_decoder_tensors(self):
elif parts[0] == "norm":
self.set_tensor(f"{base}.norm", param)
elif parts[0] == "logits_dense":
heads = param.shape[1];
heads = param.shape[1]
for i in range(heads):
head = param.data[:, i]
self.set_tensor(f"{base}.heads.{i}", head.transpose(0,1))
2 changes: 1 addition & 1 deletion py-gguf/tts_encoders/kokoro_gguf_encoder.py
@@ -96,7 +96,7 @@ class KokoroEncoder(TTSEncoder):
gguf_encoder.write()
```
"""
def __init__(self, model_path: Path | str = "./kokoro.gguf", repo_id: Path | str =DEFAULT_KOKORO_REPO,
def __init__(self, model_path: Path | str = "./kokoro.gguf", repo_id: Path | str = DEFAULT_KOKORO_REPO,
voices: Optional[List[str]] = None, use_espeak: bool = False,
phonemizer_repo: Path | str = DEFAULT_TTS_PHONEMIZER_REPO):
"""