# Mac Speech Services

Speech-to-Text & Text-to-Speech servers optimized for Apple Silicon

Features • Installation • Usage • API • License

Lightweight FastAPI servers for on-device speech processing using MLX-optimized models. Runs entirely on your Mac with no cloud dependencies.
## Features

### 🎤 Speech-to-Text (STT)

- Parakeet-MLX model (110M parameters)
- OpenAI-compatible `/v1/audio/transcriptions` endpoint
- Real-time transcription on Apple Silicon
- ~115MB RAM usage
### 🔊 Text-to-Speech (TTS)
- Kokoro-MLX model (82M parameters)
- 43+ high-quality voices
- Simple HTTP API
- ~400MB RAM usage
### ⚡ Apple Silicon Optimized
- Uses Apple's MLX framework for GPU acceleration
- Unified memory architecture support
- Runs locally - no internet required after model download
## Installation

```bash
# Clone the repository
git clone https://github.com/[your-username]/mac-speech-services.git
cd mac-speech-services

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install dependencies for both services
pip install -r stt-service/requirements.txt
pip install -r tts-service/requirements.txt
```

## Usage

```bash
# Start both STT and TTS
./start-all.sh
```

Or run each service in its own terminal:

```bash
# Terminal 1 - STT on port 8001
./start-stt.sh

# Terminal 2 - TTS on port 8002
./start-tts.sh
```

```bash
# Check STT health
curl http://localhost:8001/health

# Check TTS health
curl http://localhost:8002/health
```
```bash
# Transcribe audio (STT)
curl -X POST http://localhost:8001/v1/audio/transcriptions \
  -F file=@audio.wav

# Generate speech (TTS)
curl -X POST http://localhost:8002/tts \
  -F text="Hello world" \
  -F voice=af_bella \
  --output speech.wav
```

## API

### STT Service (port 8001)

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/v1/audio/transcriptions` | POST | Transcribe audio file |

Transcribe Request:

```bash
curl -X POST http://localhost:8001/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F response_format=json
```

Response:

```json
{
  "text": "transcribed text here"
}
```

### TTS Service (port 8002)

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/voices` | GET | List available voices |
| `/tts` | POST | Generate speech |
TTS Request:

```bash
curl -X POST http://localhost:8002/tts \
  -F text="Hello world" \
  -F voice=af_bella \
  -F speed=1.0 \
  --output speech.wav
```

Available Voices:

- `af_bella`, `af_heart`, `af_nicole`, `af_sky` (American Female)
- `am_adam`, `am_michael` (American Male)
- `bf_emma`, `bf_isabella` (British Female)
- `bm_george`, `bm_lewis` (British Male)
- And 30+ more...
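The transcription endpoint's JSON response can be consumed directly from the shell with no extra tooling. A minimal sketch, using a hard-coded sample payload in the same shape as the response shown above (in practice you would pipe `curl`'s output into the `python3` one-liner instead):

```shell
# Sample transcription response; replace with `curl -s ... | python3 ...`
# once the STT server is running.
response='{"text": "transcribed text here"}'

# Extract the "text" field using only the Python standard library
text=$(printf '%s' "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["text"])')
echo "$text"
```

Using `json.load` rather than grep/sed keeps the extraction correct even when the transcribed text contains quotes or escapes.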
## Requirements

- macOS with Apple Silicon (M1/M2/M3)
- Python 3.12+
- ~520MB RAM for both services
- Models download on first use (~500MB total)
## Network Access

The services bind to `0.0.0.0` and are accessible from:

- Localhost: `http://localhost:8001` / `http://localhost:8002`
- Network: `http://<your-mac-ip>:8001` / `http://<your-mac-ip>:8002`
- VMs/Docker: `http://host.docker.internal:8001`
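When scripting against the services from another machine, it helps to keep the host configurable rather than hard-coding `localhost`. A small sketch; the `STT_HOST`/`TTS_HOST` variables are an illustrative convention, not something the services themselves define:

```shell
# Default to localhost; override STT_HOST/TTS_HOST to reach a remote Mac,
# e.g. STT_HOST=192.168.1.20 ./my-script.sh
STT_HOST="${STT_HOST:-localhost}"
TTS_HOST="${TTS_HOST:-localhost}"
STT_URL="http://${STT_HOST}:8001"
TTS_URL="http://${TTS_HOST}:8002"

echo "$STT_URL"
echo "$TTS_URL"
```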
## Project Structure

```
mac-speech-services/
├── stt-service/
│   ├── stt_server.py       # Parakeet-MLX STT server
│   └── requirements.txt    # STT dependencies
├── tts-service/
│   ├── kokoro_server.py    # Kokoro-MLX TTS server
│   └── requirements.txt    # TTS dependencies
├── start-stt.sh            # Start STT only
├── start-tts.sh            # Start TTS only
├── start-all.sh            # Start both services
├── LICENSE
└── README.md
```
## Models

Models are downloaded automatically on first use from Hugging Face:

- STT: `mlx-community/parakeet-tdt_ctc-110m` (~115MB)
- TTS: `mlx-community/Kokoro-82M-bf16` (~400MB)

Cached at `~/.cache/huggingface/` and `~/.cache/kokoro_mlx/`.
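To check whether the STT model is already cached before going offline, you can look for its directory in the Hugging Face cache. The `hub/models--<org>--<name>` layout is the standard `huggingface_hub` cache convention; treat the exact path as an assumption, since it can vary across library versions:

```shell
# huggingface_hub caches models under hub/models--<org>--<name>
MODEL_DIR="$HOME/.cache/huggingface/hub/models--mlx-community--parakeet-tdt_ctc-110m"

if [ -d "$MODEL_DIR" ]; then
  echo "STT model already cached"
else
  echo "STT model will be downloaded on first use"
fi
```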
## Troubleshooting

```bash
# Check if ports are in use
lsof -ti:8001 8002

# Kill existing processes
pkill -9 -f uvicorn
```

```bash
# Verify MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check Hugging Face cache
ls ~/.cache/huggingface/
```

Memory usage:

- STT: ~115MB
- TTS: ~400MB
- Total: ~520MB (fits comfortably on any Apple Silicon Mac)
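As an alternative to `lsof` for checking whether the service ports are in use, a small helper can probe a TCP port directly. A sketch using `python3`'s standard `socket` module (no root or extra tools needed):

```shell
# Succeeds (exit 0) when something is listening on localhost:<port>
port_open() {
  python3 - "$1" <<'EOF'
import socket, sys

s = socket.socket()
s.settimeout(0.5)
rc = s.connect_ex(("127.0.0.1", int(sys.argv[1])))  # 0 means connected
s.close()
sys.exit(0 if rc == 0 else 1)
EOF
}

port_open 8001 && echo "STT port 8001: listening" || echo "STT port 8001: not listening"
```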
## Dependencies

- `mlx` - Apple's ML framework
- `parakeet-mlx` - STT models
- `kokoro-mlx` - TTS models
## License

MIT License - see the LICENSE file.

## Contributing

Contributions welcome! Please read CONTRIBUTING.md first.
## Acknowledgments

- Models by mlx-community on Hugging Face
- Parakeet by NVIDIA, adapted for MLX by riedemannai
- Kokoro by hexgrad, adapted for MLX by nicholaslor
## Memory Management

To prevent memory leaks (Kokoro-MLX v0.1.0 can accumulate memory over time), the TTS service can unload and reload its model on demand:

```bash
# Unload model to free RAM (~350-400MB freed)
curl -X POST http://localhost:8002/unload

# Reload when needed
curl -X POST http://localhost:8002/reload
```

For clients that send JSON instead of form-data:

```bash
curl -X POST http://localhost:8002/tts-json \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello","voice":"af_bella","speed":1.0}' \
  --output speech.wav
```

Auto-restart if memory exceeds 4GB:

```bash
./memory_watchdog.sh &
```
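A minimal sketch of what such a watchdog could look like. The actual `memory_watchdog.sh` ships with the repository and may differ; the process name, the 4GB threshold, and the restart command below are assumptions for illustration:

```shell
# Hypothetical watchdog sketch - NOT the repository's memory_watchdog.sh
LIMIT_KB=$((4 * 1024 * 1024))   # 4GB expressed in kilobytes

# Combined resident set size (KB) of all uvicorn workers; prints 0 if none
rss_kb() {
  ps -axo rss=,comm= | awk '/uvicorn/ {sum += $1} END {print sum + 0}'
}

# Succeeds (exit 0) when usage exceeds the limit and a restart is warranted
check_limit() {
  [ "$1" -gt "$LIMIT_KB" ]
}

# Main loop, commented out so this sketch is safe to source:
# while sleep 60; do
#   if check_limit "$(rss_kb)"; then
#     pkill -f uvicorn && ./start-all.sh
#   fi
# done
```

Polling `ps` rather than instrumenting the servers keeps the watchdog independent of the Python processes it restarts.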