VoxSherpa TTS

Studio-quality offline neural text-to-speech for Android.
Hindi · English · British · Japanese · Chinese · and more. No cloud. No limits. No compromise.


🏆 Featured In

VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx, the core inference library powering this app.


Why VoxSherpa?

Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible, but they require internet, send your text to remote servers, and charge per character.

VoxSherpa breaks that tradeoff.

It runs two professional-grade neural engines entirely on your device:

| Engine | Quality | Speed | Best For |
|---|---|---|---|
| 🧠 Kokoro-82M | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| ⚡ Piper / VITS | Natural · clear | Fast on any device | Daily use, quick synthesis |

Features

🎙️ Dual Neural Engine

  • Kokoro-82M: 82 million parameter neural model. Multilingual support including Hindi, English, British English, French, Spanish, Chinese, Japanese and 50+ more languages. Same architecture used by top-tier commercial TTS services.
  • Piper / VITS: Fast, lightweight, natural. Generates speech in seconds on any Android device.

🔒 100% Offline & Private

  • All processing happens on your device
  • No internet required after model download
  • No account, no telemetry, no data collection
  • Your text never leaves your phone

📦 Model Management

  • Download models directly from the app
  • Import your own .onnx models from local storage
  • Multiple models installed simultaneously
  • Smart storage tracking

🎧 Audio Controls

  • Real-time waveform visualization
  • Adjustable speed and pitch
  • Play, pause, and replay generated audio
  • Export as WAV with correct sample rate per model
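
Exporting with the correct sample rate matters because the rate is written into the WAV header: a file tagged with the wrong rate plays back too fast or too slow. The following is a minimal plain-Java sketch, not the app's actual export code. It assumes mono 16-bit PCM output and that the engine returns float samples in [-1, 1]; the class and method names (`WavHeaderSketch`, `buildHeader`, `toPcm16`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class WavHeaderSketch {

    /** Builds the 44-byte RIFF/WAVE header for mono 16-bit PCM audio. */
    static byte[] buildHeader(int sampleRate, int numSamples) {
        int channels = 1;                                // assumption: mono output
        int bitsPerSample = 16;                          // assumption: 16-bit PCM
        int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        int byteRate = sampleRate * blockAlign;          // bytes per second
        int dataSize = numSamples * blockAlign;          // size of the audio payload

        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataSize);                       // file size minus 8 bytes
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                                  // fmt chunk size for PCM
        buf.putShort((short) 1);                         // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);                          // must match the model's rate
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataSize);
        return buf.array();
    }

    /** Converts float samples to little-endian 16-bit PCM, clamping to [-1, 1]. */
    static byte[] toPcm16(float[] samples) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float s : samples) {
            float clamped = Math.max(-1f, Math.min(1f, s));
            buf.putShort((short) Math.round(clamped * 32767f));
        }
        return buf.array();
    }
}
```

A caller would pass whatever rate the loaded model reports (for example, 24000 Hz is typical for Kokoro and 22050 Hz for many Piper voices), then write the header followed by the PCM bytes.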

📚 Speech Library

  • Save all generated audio locally
  • Favorites system for quick access
  • View generation history with timestamps
  • Voice model attribution per recording

⚙️ Smart Settings

  • Smart Punctuation: natural pauses after sentence breaks
  • Emotion Tags: [whisper], [angry], [happy] support
  • Per-model voice selection (Kokoro supports 100+ speakers)
  • Theme-aware UI
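
To make the emotion tag syntax concrete, here is a small sketch of how tagged input could be split into segments before synthesis. This README does not describe how the app interprets the tags internally, so this is an illustration only: the `EmotionTagSketch` class, the `Segment` type, and the "neutral" default are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, for illustration only.
class EmotionTagSketch {

    /** A run of text together with the emotion tag that precedes it. */
    static final class Segment {
        final String emotion;
        final String text;

        Segment(String emotion, String text) {
            this.emotion = emotion;
            this.text = text;
        }
    }

    // Matches only the three tags listed in this README.
    private static final Pattern TAG =
            Pattern.compile("\\[(whisper|angry|happy)\\]");

    /** Splits input at each tag; text before the first tag is "neutral". */
    static List<Segment> parse(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        String emotion = "neutral";   // assumption: default when no tag is active
        int last = 0;
        while (m.find()) {
            addIfNotBlank(out, emotion, input.substring(last, m.start()));
            emotion = m.group(1);
            last = m.end();
        }
        addIfNotBlank(out, emotion, input.substring(last));
        return out;
    }

    private static void addIfNotBlank(List<Segment> out, String emotion, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.add(new Segment(emotion, trimmed));
        }
    }
}
```

With this sketch, `parse("Hello. [whisper] Come closer. [angry] Stop!")` yields three segments tagged neutral, whisper, and angry. Unknown bracketed words such as `[sad]` are left in the text untouched.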

Technical Architecture

User Text
    │
    ├─── Kokoro Engine (KokoroEngine.java)
    │         └── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    │                   └── kokoro-multi-lang-v1_0 (82M params, FP32)
    │
    └─── Piper / VITS Engine (VoiceEngine.java)
              └── Sherpa-ONNX JNI → ONNX Runtime → CPU
                        └── VITS model (language-specific)

Built with:

  • Sherpa-ONNX: on-device neural inference
  • Kokoro-82M: multilingual neural TTS model
  • Piper: fast local TTS
  • Android AudioTrack API: low-latency PCM playback

Performance

Generation speed depends entirely on your device's processor:

| Device Tier | Kokoro | Piper |
|---|---|---|
| 🟢 Flagship (Snapdragon 8 Gen 3) | ~20–40 sec/min audio | ~5 sec/min audio |
| 🟡 Mid-range (8-core) | ~60–90 sec/min audio | ~10 sec/min audio |
| 🔴 Budget (6-core) | ~2–3 min/min audio | ~20 sec/min audio |

Kokoro prioritizes quality over speed by design. It uses the same 82M parameter architecture that powers premium commercial TTS, and running it entirely offline on a mobile CPU genuinely pushes the hardware limits.
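
The figures above are generation seconds per minute of audio, so dividing by 60 gives a real-time factor (values below 1 mean faster than playback). The sketch below shows the arithmetic for estimating how long a longer text would take; the class and method names are hypothetical, and the inputs are the rough estimates from the table, not measurements.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
class GenerationTimeSketch {

    /** Real-time factor: generation time divided by audio duration. */
    static double realTimeFactor(double genSecondsPerAudioMinute) {
        return genSecondsPerAudioMinute / 60.0;
    }

    /** Estimated generation time in seconds for a clip of the given length. */
    static double estimateSeconds(double audioSeconds, double genSecondsPerAudioMinute) {
        return audioSeconds * realTimeFactor(genSecondsPerAudioMinute);
    }

    public static void main(String[] args) {
        double chapterSeconds = 10 * 60;  // a 10-minute audiobook chapter

        // Flagship Kokoro range from the table: ~20 to 40 sec per minute of audio.
        System.out.println(String.format(Locale.ROOT,
                "Kokoro, flagship: %.0f to %.0f s",
                estimateSeconds(chapterSeconds, 20),
                estimateSeconds(chapterSeconds, 40)));

        // Budget Kokoro range from the table: ~2 to 3 min per minute of audio.
        System.out.println(String.format(Locale.ROOT,
                "Kokoro, budget:   %.0f to %.0f s",
                estimateSeconds(chapterSeconds, 120),
                estimateSeconds(chapterSeconds, 180)));
    }
}
```

By this estimate a 10-minute chapter takes roughly 3 to 7 minutes with Kokoro on a flagship device and 20 to 30 minutes on a budget one, which is why Piper is the suggested engine for quick synthesis.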


Installation

🧪 Help Me Reach Google Play: Join the Beta!

I've submitted VoxSherpa TTS V2.1 to Google Play, but according to Play Store rules, I need at least 12 testers for 14 days before I can publish to production.

If you find this project useful and want early access to V2.1, I'd really appreciate your help. All you need to do is install the app and keep it installed for 14 days. You don't have to do anything else.

What's new in V2.1:

  • 🔊 System-wide TTS engine: use VoxSherpa in any app (Chrome, WhatsApp, etc.)
  • 📄 PDF to Audio
  • 📑 TXT to Audio

How to join:

  1. Fill out the form below with your Gmail
  2. I'll add you manually to the closed test
  3. You'll receive a Play Store opt-in link

Join Beta

Source code for V2.0 and V2.1 will be pushed to GitHub after beta testing is complete.

F-Droid

Coming soon. The F-Droid version uses a GitHub-hosted model list instead of Firebase, so it is fully FOSS compliant and GPL v3.0 licensed.

Manual APK

Download the latest APK from Releases.


Model Import (Technical Users)

VoxSherpa supports importing custom .onnx models without any server:

  1. Place your .onnx model + tokens.txt on device storage
  2. Open Models tab → tap + → Import Local Model
  3. Select your files

Works with any TTS model that Sherpa-ONNX supports.
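
As a rough illustration of the file requirement in step 1, the sketch below checks a list of selected file names for an .onnx model and a tokens.txt file before attempting an import. It is not the app's import code: the `ImportCheckSketch` class and `missingParts` method are hypothetical, and some model families may need additional files that are not covered here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-import check, for illustration only.
class ImportCheckSketch {

    /** Returns a description of each required file that was not selected. */
    static List<String> missingParts(Collection<String> fileNames) {
        boolean hasModel = false;
        boolean hasTokens = false;
        for (String name : fileNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".onnx")) {
                hasModel = true;
            }
            if (lower.equals("tokens.txt")) {
                hasTokens = true;
            }
        }
        List<String> missing = new ArrayList<>();
        if (!hasModel) {
            missing.add(".onnx model");
        }
        if (!hasTokens) {
            missing.add("tokens.txt");
        }
        return missing;
    }
}
```

An empty result means both files are present; otherwise the returned list can be shown to the user so they know which file to add.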


Contributing

VoxSherpa is open source. Contributions welcome:

  • 🐛 Bug reports via Issues
  • 💡 Feature requests via Discussions
  • 🔧 Pull requests for fixes and improvements

License

Copyright (C) 2025 CodeBySonu95

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

https://www.gnu.org/licenses/gpl-3.0.html

Acknowledgements


Built with obsession. Runs without internet.

VoxSherpa: Because your voice deserves to stay yours.
