Mac Speech Services

Speech-to-Text & Text-to-Speech servers optimized for Apple Silicon


Lightweight FastAPI servers for on-device speech processing using MLX-optimized models. Both services run entirely on your Mac, with no cloud dependencies.

Features

🎤 Speech-to-Text (STT)

  • Parakeet-MLX model (110M parameters)
  • OpenAI-compatible /v1/audio/transcriptions endpoint
  • Real-time transcription on Apple Silicon
  • ~115MB RAM usage

🔊 Text-to-Speech (TTS)

  • Kokoro-MLX model (82M parameters)
  • 43+ high-quality voices
  • Simple HTTP API
  • ~400MB RAM usage

Apple Silicon Optimized

  • Uses Apple's MLX framework for GPU acceleration
  • Unified memory architecture support
  • Runs locally - no internet required after model download

Installation

# Clone the repository
git clone https://github.com/[your-username]/mac-speech-services.git
cd mac-speech-services

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install dependencies for both services
pip install -r stt-service/requirements.txt
pip install -r tts-service/requirements.txt

Usage

Quick Start - Run Both Services

# Start both STT and TTS
./start-all.sh

Or Run Individually

# Terminal 1 - STT on port 8001
./start-stt.sh

# Terminal 2 - TTS on port 8002
./start-tts.sh

Test the Services

# Check STT health
curl http://localhost:8001/health

# Check TTS health
curl http://localhost:8002/health

# Transcribe audio (STT)
curl -X POST http://localhost:8001/v1/audio/transcriptions \
  -F file=@audio.wav

# Generate speech (TTS)
curl -X POST http://localhost:8002/tts \
  -F text="Hello world" \
  -F voice=af_bella \
  --output speech.wav
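The same checks can be scripted. A minimal standard-library sketch, assuming the default ports above:

```python
import urllib.request

def health_url(host: str, port: int) -> str:
    """URL of a service's /health endpoint."""
    return f"http://{host}:{port}/health"

if __name__ == "__main__":
    # Probe both services; urlopen raises URLError if one is down.
    for port in (8001, 8002):
        with urllib.request.urlopen(health_url("localhost", port), timeout=5) as r:
            print(port, "->", r.status)
```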

API Reference

STT Service (Port 8001)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/v1/audio/transcriptions` | POST | Transcribe audio file |

Transcribe Request:

curl -X POST http://localhost:8001/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F response_format=json

Response:

{
  "text": "transcribed text here"
}
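For programmatic use, the upload can be done without extra dependencies. A sketch that hand-builds the multipart body (endpoint and response shape as above; the `multipart_body` helper is ours, not part of the project):

```python
import json
import urllib.request
import uuid

def multipart_body(filename: str, data: bytes, field: str = "file"):
    """Build a minimal multipart/form-data body for a single file upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

if __name__ == "__main__":
    with open("audio.wav", "rb") as f:
        body, ctype = multipart_body("audio.wav", f.read())
    req = urllib.request.Request(
        "http://localhost:8001/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": ctype},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["text"])
```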

TTS Service (Port 8002)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/voices` | GET | List available voices |
| `/tts` | POST | Generate speech |

TTS Request:

curl -X POST http://localhost:8002/tts \
  -F text="Hello world" \
  -F voice=af_bella \
  -F speed=1.0 \
  --output speech.wav

Available Voices:

  • af_bella, af_heart, af_nicole, af_sky (American Female)
  • am_adam, am_michael (American Male)
  • bf_emma, bf_isabella (British Female)
  • bm_george, bm_lewis (British Male)
  • And 30+ more...
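The full list comes from the `/voices` endpoint. A sketch, assuming the endpoint returns a JSON list of voice IDs (the response shape is an assumption; the prefix helper just groups IDs by the naming scheme above):

```python
import json
import urllib.request

def fetch_voices(base: str = "http://localhost:8002") -> list:
    """GET /voices; assumes a JSON list of voice IDs in the response."""
    with urllib.request.urlopen(f"{base}/voices") as r:
        return json.loads(r.read())

def voices_with_prefix(voices: list, prefix: str) -> list:
    """Filter IDs by accent/gender prefix, e.g. 'af_' for American Female."""
    return [v for v in voices if v.startswith(prefix)]

if __name__ == "__main__":
    print(voices_with_prefix(fetch_voices(), "af_"))
```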

System Requirements

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.12+
  • ~520MB RAM for both services
  • Models download on first use (~500MB total)

Network Access

The services bind to 0.0.0.0 and are accessible from:

  • Localhost: http://localhost:8001 / http://localhost:8002
  • Network: http://<your-mac-ip>:8001 / http://<your-mac-ip>:8002
  • VMs/Docker: http://host.docker.internal:8001

Project Structure

mac-speech-services/
├── stt-service/
│   ├── stt_server.py      # Parakeet-MLX STT server
│   └── requirements.txt   # STT dependencies
├── tts-service/
│   ├── kokoro_server.py   # Kokoro-MLX TTS server
│   └── requirements.txt   # TTS dependencies
├── start-stt.sh          # Start STT only
├── start-tts.sh          # Start TTS only
├── start-all.sh          # Start both services
├── LICENSE
└── README.md

Models

Models are downloaded automatically on first use from Hugging Face:

  • STT: mlx-community/parakeet-tdt_ctc-110m (~115MB)
  • TTS: mlx-community/Kokoro-82M-bf16 (~400MB)

Cached at ~/.cache/huggingface/ and ~/.cache/kokoro_mlx/.

Troubleshooting

Services won't start

# Check if the ports are in use
lsof -t -i :8001 -i :8002

# Kill existing processes
pkill -9 -f uvicorn

Models not loading

# Verify MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check Hugging Face cache
ls ~/.cache/huggingface/

Out of memory

  • STT: ~115MB
  • TTS: ~400MB
  • Total: ~520MB (fits comfortably on any Apple Silicon Mac)

License

MIT License - see LICENSE file.

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

Acknowledgments

  • Models by mlx-community on Hugging Face
  • Parakeet by NVIDIA, adapted for MLX by riedemannai
  • Kokoro by hexgrad, adapted for MLX by nicholaslor

Memory Management

To prevent memory leaks (Kokoro-MLX v0.1.0 can accumulate memory over time):

# Unload model to free RAM (~350-400MB freed)
curl -X POST http://localhost:8002/unload

# Reload when needed
curl -X POST http://localhost:8002/reload
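One way to use these endpoints is an idle timer that unloads the model after a quiet period. A sketch (the 10-minute threshold is our choice, not a project default):

```python
import time
import urllib.request

TTS_BASE = "http://localhost:8002"
IDLE_SECS = 600  # assumption: unload after 10 idle minutes

def should_unload(last_used: float, now: float, idle_secs: float = IDLE_SECS) -> bool:
    """True once the service has been idle for at least idle_secs."""
    return now - last_used >= idle_secs

def post(path: str) -> None:
    req = urllib.request.Request(f"{TTS_BASE}{path}", method="POST")
    urllib.request.urlopen(req)

if __name__ == "__main__":
    last_used = time.time()  # in a real client, update this on every /tts call
    while True:
        time.sleep(60)
        if should_unload(last_used, time.time()):
            post("/unload")  # frees ~350-400MB; call post("/reload") before next use
            break
```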

JSON API

For clients that send JSON instead of form-data:

curl -X POST http://localhost:8002/tts-json \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello","voice":"af_bella","speed":1.0}' \
  --output speech.wav
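The equivalent from Python, using only the standard library (payload fields mirror the curl example above):

```python
import json
import urllib.request

def tts_payload(text: str, voice: str = "af_bella", speed: float = 1.0) -> dict:
    """Request body for /tts-json, matching the curl example above."""
    return {"text": text, "voice": voice, "speed": speed}

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://localhost:8002/tts-json",
        data=json.dumps(tts_payload("Hello world")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        with open("speech.wav", "wb") as f:
            f.write(resp.read())
```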

Memory Watchdog

Auto-restart if memory exceeds 4GB:

./memory_watchdog.sh &
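The repo ships `memory_watchdog.sh`; the idea can be sketched in Python as follows. The 4GB limit mirrors the text, but matching processes by the name `uvicorn` and restarting via `./start-tts.sh` are assumptions, not necessarily what the script does:

```python
import os
import signal
import subprocess

LIMIT_MB = 4096  # restart threshold from the README

def rss_mb(pid: int) -> int:
    """Resident set size of a process in MB, via `ps` (macOS and Linux)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)], capture_output=True, text=True
    )
    kb = out.stdout.strip()
    return int(kb) // 1024 if kb else 0

def over_limit(pid: int, limit_mb: int = LIMIT_MB) -> bool:
    return rss_mb(pid) > limit_mb

if __name__ == "__main__":
    # Find uvicorn workers and restart any that exceed the limit.
    pids = subprocess.run(
        ["pgrep", "-f", "uvicorn"], capture_output=True, text=True
    ).stdout.split()
    for pid in map(int, pids):
        if over_limit(pid):
            os.kill(pid, signal.SIGTERM)
            subprocess.Popen(["./start-tts.sh"])  # hypothetical restart command
```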
