From 0c933d3b6e4f3a9e5f500dcfcbc15cf14bb7f2f1 Mon Sep 17 00:00:00 2001
From: Hiten Shah <hnshah@gmail.com>
Date: Tue, 24 Mar 2026 01:42:04 -0700
Subject: [PATCH] docs: Add model performance comparison and selection guide

Adds performance benchmarks and selection guidance to help users choose
the right model for their use case.

- Added performance comparison table with RTF, memory usage, and use cases
- Added 'Which Model Should I Use?' section with clear recommendations
- Included performance notes with testing methodology

Benchmarked all three models (nano, micro, mini) on an Apple M2 Ultra,
measuring load time, generation speed (real-time factor, RTF), and
memory usage.

Hardware: Mac Studio M2 Ultra, 24 cores, macOS
---
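Note for reviewers: this patch only touches README.md, so the benchmark script is not included. The measurements described in the commit message (load time, RTF, and memory beyond the baseline) can be approximated with a harness along the following lines. This is a minimal sketch, not the script that produced the table. `benchmark`, `peak_rss_mb`, `Synthesizer` and `Loader` are hypothetical names. Measuring memory as growth in peak RSS is an assumption about what "memory usage beyond base requirements" means.

```python
import resource
import sys
import time
from typing import Callable, Dict, Sequence

# Hypothetical type aliases: a synthesizer maps text to audio samples,
# and a loader builds a synthesizer (for example by loading a model).
Synthesizer = Callable[[str], Sequence[float]]
Loader = Callable[[], Synthesizer]


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def benchmark(load: Loader, text: str, sample_rate: int) -> Dict[str, float]:
    """Time model loading and one generation, and estimate extra memory."""
    baseline_mb = peak_rss_mb()

    start = time.perf_counter()
    synthesize = load()
    load_s = time.perf_counter() - start

    start = time.perf_counter()
    audio = synthesize(text)
    gen_s = time.perf_counter() - start

    audio_s = len(audio) / sample_rate
    if audio_s == 0:
        raise ValueError("synthesizer returned no audio")

    return {
        "load_s": load_s,
        "rtf": gen_s / audio_s,  # generation time / audio duration
        # Growth in *peak* RSS since the baseline. This underestimates the
        # cost if an earlier step had already pushed the peak higher.
        "extra_mem_mb": peak_rss_mb() - baseline_mb,
    }
```

Because peak RSS never decreases, each model should be measured in a fresh process. A warm-up run plus an average over several texts gives steadier RTF numbers. With Kitten TTS installed, a call might look like `benchmark(lambda: KittenTTS("KittenML/kitten-tts-nano-0.8").generate, "Hello from Kitten TTS.", 24000)`. The model ID, calling `generate` with only the text (default voice), and the 24 kHz sample rate are assumptions to check against the README's usage example.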
 README.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
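The RTF footnote in the table this patch adds implies two conversions: generation time = RTF × audio duration, and speed-up over real time = 1 / RTF (for example 1 / 0.19 ≈ 5.3 and 1 / 0.10 = 10). The sketch below applies them to the table's M2 Ultra figures to pick the highest-quality model that fits a latency budget. `MODEL_RTF`, `QUALITY_ORDER`, `generation_seconds` and `pick_model` are hypothetical names. The quality ordering (mini > micro > nano) follows the "Which Model Should I Use?" guidance. The RTF values will not transfer directly to other hardware.

```python
from typing import Optional

# RTF values copied from the README table (Apple M2 Ultra); re-measure on your hardware.
MODEL_RTF = {
    "kitten-tts-mini": 0.19,
    "kitten-tts-micro": 0.10,
    "kitten-tts-nano": 0.03,
}

# Highest quality first, per the "Which Model Should I Use?" guidance.
QUALITY_ORDER = ["kitten-tts-mini", "kitten-tts-micro", "kitten-tts-nano"]


def speedup(rtf: float) -> float:
    """How many times faster than real time a model runs (1 / RTF)."""
    return 1.0 / rtf


def generation_seconds(model: str, audio_seconds: float) -> float:
    """Expected wall-clock time to synthesize `audio_seconds` of speech."""
    return MODEL_RTF[model] * audio_seconds


def pick_model(audio_seconds: float, budget_seconds: float) -> Optional[str]:
    """Return the highest-quality model expected to finish within the budget."""
    for model in QUALITY_ORDER:
        if generation_seconds(model, audio_seconds) <= budget_seconds:
            return model
    return None  # even the fastest model is expected to miss the budget


# 60 s of audio: about 11.4 s on mini, 6 s on micro, 1.8 s on nano.
print(pick_model(60, budget_seconds=10))  # prints "kitten-tts-micro"
```

Treat the result as a starting point only, since RTF also varies with text length and complexity.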

diff --git a/README.md b/README.md
index 29a2747..5a3a4b0 100644
--- a/README.md
+++ b/README.md
@@ -52,6 +52,35 @@ Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX.
 
 > **Note:** Some users have reported issues with the `kitten-tts-nano-0.8-int8` model. If you encounter problems, please [open an issue](https://github.com/KittenML/KittenTTS/issues).
 
+### Performance Comparison
+
+Measured on an Apple M2 Ultra (24-core CPU, macOS):
+
+| Model | Parameters | Disk Size | RTF* | Memory** | Speed | Best For |
+|-------|-----------|-----------|------|----------|-------|----------|
+| **kitten-tts-mini** | 80M | 80 MB | 0.19x | ~180 MB | 5x real-time | High-quality audiobooks, podcasts |
+| **kitten-tts-micro** | 40M | 41 MB | 0.10x | ~160 MB | 10x real-time | General use, summaries, articles |
+| **kitten-tts-nano** | 15M | 56 MB | 0.03x | ~145 MB | 34x real-time | Quick responses, notifications |
+
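+\* RTF = Real-Time Factor: generation time divided by the duration of the generated audio (lower is faster). An RTF of 0.03x means 1 second of audio takes 0.03 seconds to generate; Speed is approximately 1 / RTF.  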
+\*\* Approximate memory used beyond the baseline process footprint; actual usage varies with text length.
+
+### Which Model Should I Use?
+
+- **For fastest generation:** Use `nano`, which generates audio 34x faster than real time with good quality
+- **For balanced performance:** Use `micro`, recommended for most use cases (10x faster than real time)
+- **For best quality:** Use `mini`, which produces the highest-fidelity audio and is still 5x faster than real time
+
+All models run efficiently on CPU; no GPU is required. Generation speed depends on CPU core count and clock speed.
+
+### Performance Notes
+
+- All measurements were taken on an Apple M2 Ultra (24 cores, macOS)
+- RTF varies slightly with text complexity and length
+- Memory usage is approximate and depends on the text being processed
+- Results will differ on other hardware; benchmark contributions from other platforms are welcome
+
+
 ## Demo
 
 https://github.com/user-attachments/assets/d80120f2-c751-407e-a166-068dd1dd9e8d