Modular audio inference runtime with plugin architecture.
Voco separates the core runtime from model implementations, so you install only what you need and use a consistent API across different models.
Core concept: One interface for multiple TTS/audio models.
router = AudioRouter()
router.load("kokoro", alias="tts")
router.infer("tts", text="Hello world")
Switch models without changing your code:
router.load("other-model", alias="tts") # Same interface
router.infer("tts", text="Hello world")
# Core (lightweight, no dependencies)
pip install voco
# Install plugins you need
pip install voco-kokoro # for Kokoro TTS
from voco.core import AudioRouter
import voco_kokoro
router = AudioRouter()
router.load("kokoro", alias="tts", device="cpu")
for result in router.infer("tts", text="Hello world", voice="af_heart"):
    audio = result.audio
Voco separates the core runtime from model plugins:
- Core (voco): Router, caching, plugin loader; no heavy dependencies
- Plugins (voco-kokoro, voco-gtts, etc.): Each model is a separate package with its own dependencies
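As a rough illustration of this split, the sketch below shows how a lightweight core could map aliases to models supplied by plugins. It is not Voco's implementation: `SketchRouter`, `EchoModel`, and `register` are hypothetical names invented for this example, and the stand-in model yields strings instead of audio.

```python
from typing import Callable, Dict, Iterator


class EchoModel:
    """Hypothetical stand-in model; a real plugin would wrap a TTS engine."""

    def __init__(self, name: str) -> None:
        self.name = name

    def infer(self, text: str, **params) -> Iterator[str]:
        # Yield a fake result so the example runs without audio libraries.
        yield f"{self.name}:{text}"


class SketchRouter:
    """Hypothetical router: maps aliases to loaded model instances."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # A plugin would announce its model here (assumed mechanism).
        self._factories[name] = factory

    def load(self, name: str, alias: str) -> None:
        if name not in self._factories:
            raise KeyError(f"no plugin registered for {name!r}")
        # Loading under an existing alias replaces the previous model.
        self._loaded[alias] = self._factories[name]()

    def infer(self, alias: str, **kwargs):
        return self._loaded[alias].infer(**kwargs)


router = SketchRouter()
router.register("kokoro", lambda: EchoModel("kokoro"))
router.register("gtts", lambda: EchoModel("gtts"))

router.load("kokoro", alias="tts")
first = list(router.infer("tts", text="Hello world"))

router.load("gtts", alias="tts")  # same alias, different backend
second = list(router.infer("tts", text="Hello world"))
```

Because callers only ever name the alias, swapping the backend changes what `load` stores and nothing else.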
Load models dynamically at runtime:
router = AudioRouter()
router.load("kokoro", alias="tts") # Loads voco-kokoro plugin
audio = router.infer("tts", text="Hello world")
Switch models without changing code:
router.load("gtts", alias="tts") # Replace with Google TTS
audio = router.infer("tts", text="Hello world") # Same interface
- Zero dependencies in core
- Consistent API across models
- Plugin architecture
- Optional caching layer (experimental)
- Type safe
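One way a consistent, type-checked interface across models could be expressed is with a `typing.Protocol`. The sketch below is an assumption about the shape of such an interface, not Voco's actual `BaseAudioModel`: `AudioModel`, `AudioResult`, `SilentModel`, and `synthesize` are hypothetical names, and `bytes` is an assumed payload type.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Protocol, runtime_checkable


@dataclass
class AudioResult:
    audio: bytes  # assumed payload type, chosen for illustration


@runtime_checkable
class AudioModel(Protocol):
    """Hypothetical interface every model plugin would satisfy."""

    def infer(self, text: str, **params: Any) -> Iterator[AudioResult]: ...


class SilentModel:
    """Toy model: emits one zero byte per input character."""

    def infer(self, text: str, **params: Any) -> Iterator[AudioResult]:
        yield AudioResult(audio=b"\x00" * len(text))


def synthesize(model: AudioModel, text: str) -> bytes:
    # Works with any object that structurally matches AudioModel.
    return b"".join(chunk.audio for chunk in model.infer(text))


model = SilentModel()
is_model = isinstance(model, AudioModel)  # checks that an infer method exists
audio = synthesize(model, "Hello")
```

A static type checker can verify the full signature, while the runtime `isinstance` check only confirms that the method is present.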
Optional file-based cache for repeated inference calls.
router = AudioRouter(cache=True)
# First call generates and caches
audio = router.infer("tts", text="Hello world")
# Subsequent calls return cached result
audio = router.infer("tts", text="Hello world")
router = AudioRouter(
    cache=True,
    cache_config={
        "max_size_mb": 500,      # Max cache size
        "ttl_seconds": 86400,    # Time to live; 86400 s is 24 hours (range: 1 hour - 30 days)
        "warn_at_percent": 80,   # Warning threshold
    }
)
router.cache.stats() # View cache usage
router.cache.clear() # Clear all cache
router.cache.clear(model="tts") # Clear specific model
# Skip cache for specific call
audio = router.infer("tts", text="Hello", cache=False)
Voco uses a file-based cache stored in ~/.voco/cache/. By default, keys are generated from the model name, text, and parameters.
The cache is useful for repeated phrases. It is not recommended for unique text, privacy-sensitive content, or real-time environments.
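To make the key scheme and time-to-live more concrete, here is a minimal sketch of a file-based cache keyed on model name, text, and parameters. It is an assumption about how such a cache could work, not Voco's code: `make_cache_key` and `SketchFileCache` are hypothetical names, SHA-256 over sorted JSON is one possible hashing choice, and a temporary directory stands in for ~/.voco/cache/.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def make_cache_key(model: str, text: str, **params) -> str:
    # sort_keys makes the key independent of keyword-argument order.
    # Parameters must be JSON-serializable in this sketch.
    payload = json.dumps(
        {"model": model, "text": text, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SketchFileCache:
    """Hypothetical cache: one file per key, expired by modification time."""

    def __init__(self, root: Path, ttl_seconds: float) -> None:
        self.root = root
        self.ttl_seconds = ttl_seconds
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        path = self.root / key
        if not path.exists():
            return None
        now = time.time() if now is None else now
        if now - path.stat().st_mtime > self.ttl_seconds:
            path.unlink()  # entry is older than the TTL, so drop it
            return None
        return path.read_bytes()


cache = SketchFileCache(Path(tempfile.mkdtemp()), ttl_seconds=86400)
key = make_cache_key("tts", "Hello world", voice="af_heart")
cache.put(key, b"fake-audio")
hit = cache.get(key)
# Simulate a lookup two days later without actually waiting.
expired = cache.get(key, now=time.time() + 2 * 86400)
```

Because the text itself feeds the key and the audio is written to disk, a scheme like this also shows why caching suits repeated phrases better than unique or sensitive input.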
Each plugin is a separate PyPI package with its own dependencies. Install only what you need.
- voco-kokoro - Kokoro TTS
Plugins register themselves via Python entry points. See CONTRIBUTING.md for the plugin development guide.
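For readers unfamiliar with entry points, the sketch below shows how a core package could discover installed plugins with the standard library's `importlib.metadata`. The group name `voco.plugins` is a guess made for illustration (the real group is whatever the plugin guide specifies), and `discover_plugins` is a hypothetical helper.

```python
from importlib.metadata import entry_points

PLUGIN_GROUP = "voco.plugins"  # hypothetical group name, not confirmed by the project


def discover_plugins(group: str = PLUGIN_GROUP) -> dict:
    """Return a mapping of entry point name to entry point for one group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        found = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group.
        found = eps.get(group, [])
    return {ep.name: ep for ep in found}


plugins = discover_plugins()
missing = discover_plugins("voco.sketch.no_such_group")
```

Calling `.load()` on a discovered entry point imports the plugin module, so heavy dependencies are only pulled in for models that are actually requested.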
# Your plugin structure
voco-myplugin/
├── voco_myplugin/
│   └── __init__.py   # Implements BaseAudioModel
└── pyproject.toml    # Defines entry point
See CONTRIBUTING.md for the detailed setup and plugin development guide.
git clone https://github.com/yourusername/voco.git
cd voco
pip install -e .
pip install -e plugins/voco-kokoro
python examples/generate_audio.py
License: MIT
