VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx, the core inference library powering this app.
Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible, but they require an internet connection, send your text to remote servers, and charge per character.
VoxSherpa breaks that tradeoff.
It runs two professional-grade neural engines entirely on your device:
| Engine | Quality | Speed | Best For |
|---|---|---|---|
| 🧠 Kokoro-82M | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| ⚡ Piper / VITS | Natural · clear | Fast on any device | Daily use, quick synthesis |
- Kokoro-82M: an 82-million-parameter neural model. Multilingual support including Hindi, English, British English, French, Spanish, Chinese, Japanese and more. Same architecture used by top-tier commercial TTS services.
- Piper / VITS: fast, lightweight, natural. Generates speech in seconds on any Android device.
- All processing happens on your device
- No internet required after model download
- No account, no telemetry, no data collection
- Your text never leaves your phone
- Download models directly from the app
- Import your own `.onnx` models from local storage
- Multiple models installed simultaneously
- Smart storage tracking
- Real-time waveform visualization
- Adjustable speed and pitch
- Play, pause, and replay generated audio
- Export as WAV with correct sample rate per model
- Save all generated audio locally
- Favorites system for quick access
- View generation history with timestamps
- Voice model attribution per recording
- Smart Punctuation: natural pauses after sentence breaks
- Emotion Tags: `[whisper]`, `[angry]`, `[happy]` support
- Per-model voice selection (Kokoro supports 100+ speakers)
- Theme-aware UI
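The list above mentions exporting WAV "with correct sample rate per model". That means writing each model's native output rate into the file header rather than a fixed value: Kokoro-82M produces 24 kHz audio, while Piper voices commonly use 22.05 kHz or 16 kHz. The following is a minimal, hypothetical sketch in plain Java, assuming the engine returns mono `float` samples in [-1, 1]; `WavWriter` and `encode` are names invented for this example, not VoxSherpa's actual export code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: encodes mono float samples as a 16-bit PCM WAV file in memory. */
final class WavWriter {

    private WavWriter() {}

    /**
     * @param samples    mono samples, expected in [-1.0, 1.0]
     * @param sampleRate the generating model's native rate, e.g. 24000 or 22050
     * @return a complete RIFF/WAVE byte array (44-byte header + PCM data)
     */
    static byte[] encode(float[] samples, int sampleRate) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8;
        final int byteRate = sampleRate * blockAlign;
        final int dataSize = samples.length * blockAlign;

        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        // RIFF chunk descriptor
        buf.put(new byte[] {'R', 'I', 'F', 'F'});
        buf.putInt(36 + dataSize);             // total file size minus the first 8 bytes
        buf.put(new byte[] {'W', 'A', 'V', 'E'});
        // "fmt " sub-chunk describing uncompressed PCM
        buf.put(new byte[] {'f', 'm', 't', ' '});
        buf.putInt(16);                        // fmt sub-chunk size for PCM
        buf.putShort((short) 1);               // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);                // the per-model rate is written here
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        // "data" sub-chunk with the samples themselves
        buf.put(new byte[] {'d', 'a', 't', 'a'});
        buf.putInt(dataSize);
        for (float s : samples) {
            // Clamp first so out-of-range values clip instead of wrapping around
            float clamped = Math.max(-1.0f, Math.min(1.0f, s));
            buf.putShort((short) Math.round(clamped * 32767.0f));
        }
        return buf.array();
    }
}
```

Usage: `Files.write(Path.of("out.wav"), WavWriter.encode(samples, 24000))` for a Kokoro clip, or pass the Piper model's rate instead. Writing a wrong rate would not corrupt the file, but players would play it too fast or too slow, which is why the rate must travel with the samples.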
- User Text
  - Kokoro Engine (`KokoroEngine.java`)
    - Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    - kokoro-multi-lang-v1_0 (82M params, FP32)
  - Piper / VITS Engine (`VoiceEngine.java`)
    - Sherpa-ONNX JNI → ONNX Runtime → CPU
    - VITS model (language-specific)
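The routing shown above amounts to a strategy pattern: one text input, two interchangeable engines behind a common interface. The sketch below is hypothetical. `SpeechEngine`, `StubEngine` and `EngineRouter` are invented for illustration; the real `KokoroEngine.java` and `VoiceEngine.java` call into Sherpa-ONNX over JNI, which is not reproduced here, so the stubs return silence instead of speech and use made-up pacing.

```java
import java.util.Objects;

/** Hypothetical common interface for the two on-device engines. */
interface SpeechEngine {
    String name();
    int sampleRate();                              // native output rate of the loaded model
    float[] synthesize(String text, float speed);
}

/** Stub standing in for a JNI-backed engine; returns silence instead of speech. */
final class StubEngine implements SpeechEngine {
    private final String name;
    private final int sampleRate;
    private final double secondsPerChar;           // made-up pacing, for illustration only

    StubEngine(String name, int sampleRate, double secondsPerChar) {
        this.name = name;
        this.sampleRate = sampleRate;
        this.secondsPerChar = secondsPerChar;
    }

    @Override public String name() { return name; }
    @Override public int sampleRate() { return sampleRate; }

    @Override
    public float[] synthesize(String text, float speed) {
        // Higher speed -> proportionally shorter audio
        int n = (int) Math.round(text.length() * secondsPerChar * sampleRate / speed);
        return new float[n];                       // zero-filled array = silence
    }
}

/** Routes user text to the quality engine or the fast engine. */
final class EngineRouter {
    private final SpeechEngine quality;            // e.g. Kokoro-82M
    private final SpeechEngine fast;               // e.g. Piper / VITS

    EngineRouter(SpeechEngine quality, SpeechEngine fast) {
        this.quality = Objects.requireNonNull(quality);
        this.fast = Objects.requireNonNull(fast);
    }

    SpeechEngine pick(boolean preferQuality) {
        return preferQuality ? quality : fast;
    }
}
```

Usage: `new EngineRouter(new StubEngine("Kokoro-82M", 24000, 0.06), new StubEngine("Piper", 22050, 0.06)).pick(true)` returns the Kokoro stub. Keeping both engines behind one interface means playback and export only need the sample array and `sampleRate()`, whichever engine produced them.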
Built with:
- Sherpa-ONNX: on-device neural inference
- Kokoro-82M: multilingual neural TTS model
- Piper: fast local TTS
- Android AudioTrack API: low-latency PCM playback
Generation speed depends mainly on your device's processor. Approximate time needed to generate one minute of audio:
| Device Tier | Kokoro | Piper |
|---|---|---|
| 🟢 Flagship (Snapdragon 8 Gen 3) | ~20–40 sec | ~5 sec |
| 🟡 Mid-range (8-core) | ~60–90 sec | ~10 sec |
| 🔴 Budget (6-core) | ~2–3 min | ~20 sec |
Kokoro prioritizes quality over speed by design. It uses the same 82M-parameter architecture that powers premium commercial TTS, so running it entirely offline on a mobile CPU genuinely pushes the hardware to its limits.
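The speed figures can be expressed as a real-time factor (RTF): time spent generating divided by the duration of the audio produced, so values below 1.0 are faster than real time. The table itself does not use the term; the helper below is a small sketch that assumes RTF as the metric and uses it to estimate the wait for a longer recording.

```java
/** Real-time factor helpers; RTF < 1.0 means synthesis runs faster than playback. */
final class RealTimeFactor {
    private RealTimeFactor() {}

    /** RTF = seconds spent generating / seconds of audio produced. */
    static double rtf(double generationSeconds, double audioSeconds) {
        if (audioSeconds <= 0) {
            throw new IllegalArgumentException("audioSeconds must be > 0");
        }
        return generationSeconds / audioSeconds;
    }

    /** Expected generation time for a target audio length at a given RTF. */
    static double estimateGenerationSeconds(double rtf, double targetAudioSeconds) {
        return rtf * targetAudioSeconds;
    }

    public static void main(String[] args) {
        // Flagship Kokoro row: ~20-40 s of compute per 60 s of audio
        double best = rtf(20, 60);    // ~0.33
        double worst = rtf(40, 60);   // ~0.67
        // Estimate for a 30-minute (1800 s) audiobook chapter at those rates
        System.out.printf("RTF %.2f-%.2f -> %.0f-%.0f min for 30 min of audio%n",
                best, worst,
                estimateGenerationSeconds(best, 1800) / 60,
                estimateGenerationSeconds(worst, 1800) / 60);
    }
}
```

By the same measure, the budget-tier Kokoro row (2–3 min per minute of audio) corresponds to an RTF of roughly 2 to 3, i.e. slower than real time, while every Piper row stays well below 1.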
I've submitted VoxSherpa TTS V2.1 to Google Play, but according to Play Store rules, I need at least 12 testers for 14 days before I can publish to production.
If you find this project useful and want early access to V2.1, I'd really appreciate your help. All you need to do is install the app and keep it installed for 14 days; nothing else is required.
What's new in V2.1:
- System-wide TTS engine: use VoxSherpa in any app (Chrome, WhatsApp, etc.)
- PDF to Audio
- TXT to Audio
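Long inputs such as a PDF or TXT file are usually too large to synthesize in one call, so a converter has to split the text into sentence-sized chunks first. How V2.1 does this is not described here; the sketch below is an assumption-based illustration that packs whole sentences into chunks capped at a hypothetical `maxChars`, using only the JDK's `java.text.BreakIterator`. `TextChunker` is an invented name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical splitter: packs whole sentences into chunks of at most maxChars. */
final class TextChunker {
    private TextChunker() {}

    static List<String> chunk(String text, int maxChars, Locale locale) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be > 0");
        List<String> chunks = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        StringBuilder current = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) continue;
            // Flush the current chunk if this sentence would push it past the cap.
            // A single sentence longer than maxChars is kept whole as its own chunk.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(' ');
            current.append(sentence);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }
}
```

Each chunk can then be synthesized in turn and the resulting samples appended, which keeps memory bounded and lets playback start before the whole document is done. Splitting only at sentence boundaries also keeps the natural pauses that sentence-final punctuation produces.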
How to join:
- Fill out the form below with your Gmail address
- I'll add you manually to the closed test
- You'll receive a Play Store opt-in link
Source code for V2.0 and V2.1 will be pushed to GitHub after beta testing is complete.
Coming soon: an F-Droid version that uses a GitHub-hosted model list instead of Firebase, making it fully FOSS-compliant and GPL v3.0 licensed.
Download the latest APK from Releases.
VoxSherpa supports importing custom `.onnx` models without any server:
- Place your `.onnx` model and `tokens.txt` on device storage
- Open the Models tab → tap + → Import Local Model
- Select your files
Works with any Sherpa-ONNX-compatible TTS model.
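Before handing an imported model to Sherpa-ONNX, it can help to check that the two files the steps above ask for are present and that `tokens.txt` parses. This is a hypothetical sketch, not VoxSherpa's import code: `LocalModelCheck` is an invented name, it assumes the common Sherpa-ONNX `tokens.txt` layout of one `<symbol> <id>` pair per line, and it does not check any other files a particular model may need.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-import check for a local Sherpa-ONNX TTS model. */
final class LocalModelCheck {
    private LocalModelCheck() {}

    /** Parses tokens.txt, assuming each line is "<symbol> <integer id>". */
    static Map<String, Integer> readTokens(Path tokensFile) throws IOException {
        Map<String, Integer> tokens = new HashMap<>();
        List<String> lines = Files.readAllLines(tokensFile, StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) continue;
            // Split on the LAST space so a line for the space symbol itself still parses
            int sep = line.lastIndexOf(' ');
            if (sep < 1) {
                throw new IOException("tokens.txt line " + (i + 1) + " is not '<symbol> <id>'");
            }
            try {
                tokens.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1).trim()));
            } catch (NumberFormatException e) {
                throw new IOException("tokens.txt line " + (i + 1) + " has a non-integer id", e);
            }
        }
        if (tokens.isEmpty()) throw new IOException("tokens.txt is empty");
        return tokens;
    }

    /** Returns the token count if the .onnx model and tokens.txt look usable. */
    static int validate(Path model, Path tokensFile) throws IOException {
        if (!Files.isRegularFile(model) || !model.getFileName().toString().endsWith(".onnx")) {
            throw new IOException("expected an .onnx model file: " + model);
        }
        if (Files.size(model) == 0) throw new IOException("model file is empty: " + model);
        return readTokens(tokensFile).size();
    }
}
```

Failing early with a line number is friendlier than letting the native library reject the files later, where the error surfaces far from the file that caused it.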
VoxSherpa is open source. Contributions welcome:
- Bug reports via Issues
- Feature requests via Discussions
- Pull requests for fixes and improvements
Copyright (C) 2025 CodeBySonu95
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
https://www.gnu.org/licenses/gpl-3.0.html
- k2-fsa/sherpa-onnx: the inference engine that makes this possible
- hexgrad/Kokoro-82M: the neural model behind studio-quality synthesis
- rhasspy/piper: fast local TTS engine
Built with obsession. Runs without internet.
VoxSherpa: because your voice deserves to stay yours.




