
## 🚀 Feature: CPU Support for Soprano TTS #13

Summary

I've implemented full CPU support for Soprano TTS, enabling the model to run on systems without CUDA-enabled GPUs. This makes Soprano accessible to a much wider range of users and deployment scenarios.

Motivation

Currently, Soprano requires a CUDA-enabled GPU to run, which limits its accessibility. Many users want to:

  • Test Soprano on laptops or servers without GPUs
  • Deploy in CPU-only environments
  • Use Soprano for offline/non-real-time generation where speed is less critical

Changes Made

I've submitted a pull request that implements CPU support across the entire codebase:

1. Core TTS Module (soprano/tts.py)

  • Added 'cpu' to recognized devices
  • Replaced hardcoded .cuda() calls with .to(device)
  • Added map_location=device to weight loading
  • Made all tensor operations device-agnostic (sketched below)
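
To make the pattern concrete, here is a minimal sketch of the replacement; the nn.Linear module and checkpoint path are stand-ins, not Soprano's actual classes:

```python
import torch
import torch.nn as nn

# Minimal sketch of the device-agnostic pattern; nn.Linear stands in for the
# actual Soprano model.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 4)

# Before: state = torch.load(path); model.cuda()
# After: map the checkpoint onto the target device, then move the module.
# state = torch.load("soprano.pt", map_location=device)  # path is illustrative
# model.load_state_dict(state)
model = model.to(device)

# Input tensors follow the same rule instead of a hardcoded .cuda().
x = torch.randn(1, 4).to(device)
y = model(x)
```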

2. Backend Support

  • LMDeploy (soprano/backends/lmdeploy.py): Added CPU mode support
  • Transformers (soprano/backends/transformers.py): Enhanced CPU compatibility with device-appropriate dtype handling (see the sketch below)
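
As a rough illustration of the dtype handling, a sketch under the assumption that the backend picks precision from the device (the AutoModel call and model id are placeholders, not Soprano's actual loading code):

```python
import torch

# Device-conditional dtype policy: bfloat16 on CUDA, float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# A Transformers-style load would then pass both explicitly, e.g.:
# from transformers import AutoModel
# model = AutoModel.from_pretrained("some/model-id", torch_dtype=dtype).to(device)
```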

3. Decoder Components

  • Spectral Operations (soprano/vocos/spectral_ops.py): Removed the hardcoded CUDA device from the window buffer (illustrated below)
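
An illustrative sketch of the fix (the class name and window length are made up; only the register_buffer pattern is the point):

```python
import torch
import torch.nn as nn

# Registering the STFT window as a buffer lets .to(device) move it with the
# module, instead of pinning it to a hardcoded CUDA device at construction.
class SpectralOp(nn.Module):
    def __init__(self, win_length: int = 1024):
        super().__init__()
        # Before: self.window = torch.hann_window(win_length, device="cuda")
        self.register_buffer("window", torch.hann_window(win_length))

op = SpectralOp().to("cpu")  # the buffer follows the module to any device
```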

4. Demo Application

  • Updated the Gradio app to automatically detect and display the current device (sketched below)
  • Removed @spaces.GPU decorator for CPU compatibility
  • Added performance notes for CPU vs GPU usage
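
A hypothetical sketch of the demo change; the layout is illustrative, not the actual app.py:

```python
import gradio as gr
import torch

# Detect the device at startup and surface it in the UI instead of
# assuming a GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
note = " (generation will be slower than on GPU)" if device == "cpu" else ""

with gr.Blocks() as demo:
    gr.Markdown(f"**Running on:** `{device}`{note}")
```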

5. Documentation

  • Updated README.md with CPU usage examples
  • Added changelog section for v0.0.3
  • Updated installation requirements
  • Checked off CPU support in roadmap

Technical Details

Device Detection:

```python
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model = SopranoTTS(backend="auto", device=DEVICE)
```

Automatic Backend Selection:

  • CUDA device → LMDeploy (if available) or Transformers
  • CPU device → Transformers backend with float32 precision (selection logic sketched below)
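
A hypothetical sketch of that rule; the real selection lives inside SopranoTTS and may differ in detail:

```python
import importlib.util
import torch

# "auto" backend rule: prefer LMDeploy on CUDA when it is installed,
# otherwise fall back to Transformers (always used on CPU).
def pick_backend(device: str) -> str:
    if device == "cuda" and importlib.util.find_spec("lmdeploy") is not None:
        return "lmdeploy"
    return "transformers"

print(pick_backend("cuda" if torch.cuda.is_available() else "cpu"))
```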

Key Implementation Points:

  • All .cuda() calls replaced with .to(self.device)
  • PyTorch buffers properly registered for automatic device movement
  • dtype selection based on device (bfloat16 for CUDA, float32 for CPU)
  • Cache management applied only in CUDA mode (guard sketched below)
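
For the cache point, a minimal sketch of the guard (the helper name is mine, not the codebase's):

```python
import torch

# CUDA cache management is skipped entirely when running on CPU.
def maybe_empty_cache(device: str) -> None:
    if device == "cuda":
        torch.cuda.empty_cache()
```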

Testing

The implementation has been tested with:

  • ✅ CPU inference (single and batch)
  • ✅ CUDA inference (backwards compatibility maintained)
  • ✅ Automatic device detection
  • ✅ Both LMDeploy and Transformers backends
  • ✅ Gradio demo on both devices

Performance Notes

  • CUDA: ~2000× real-time factor (unchanged)
  • CPU: Slower than CUDA but fully functional for offline generation

Breaking Changes

None. All changes are backwards compatible; existing CUDA code continues to work exactly as before.

Pull Request

I've submitted a pull request with all these changes. The implementation is clean, well-tested, and maintains full backwards compatibility with existing CUDA deployments.

Benefits

  1. Wider Accessibility: Users without GPUs can now use Soprano
  2. Testing & Development: Easier local development on laptops
  3. Flexible Deployment: Support for CPU-only server environments
  4. Cost Reduction: Option to use cheaper CPU instances for non-real-time workloads

Files Changed

  • soprano/tts.py
  • soprano/backends/lmdeploy.py
  • soprano/backends/transformers.py
  • soprano/vocos/spectral_ops.py
  • app.py (Gradio demo)
  • README.md

Looking forward to your feedback! Let me know if you'd like any changes or have questions about the implementation.

Related Roadmap Item: Closes the "CPU support" checkbox in the roadmap ✅
