RainNight11/Voice

🎙️ VoicePrint

Professional AI Voice Cloning & Synthesis Platform

Spring Boot Flutter DashScope License

English | 中文文档

Home Screen

Clone any voice with just a few seconds of audio


✨ Features

🎯 Core Capabilities

| Feature | Description |
|---|---|
| Zero-Shot Voice Cloning | Clone any voice with just 3-10 seconds of reference audio |
| Cloud TTS Synthesis | High-quality text-to-speech powered by DashScope CosyVoice-v2 |
| Local Inference | On-premise voice synthesis with the CosyVoice2-0.5B model |
| Voice Library | Create, manage, and organize your custom voice profiles |
| Real-time Preview | Instant audio preview before downloading |
| Multi-Platform | Flutter mobile app + Android native + Web interface |
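
The 3-10 second guideline for reference audio can be checked before a clip is uploaded. The following is a minimal sketch, assuming the clip is an uncompressed PCM WAV file; the function names `wav_duration_seconds` and `is_usable_reference` are illustrative and not part of VoicePrint.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / float(rate)


def is_usable_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """True when the clip length falls inside the 3-10 second guideline."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Other formats (for example MP3) would need a different decoder; the standard-library `wave` module only reads WAV.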

🔥 Why VoicePrint?

  • 🚀 Fast Cloning - Create custom voices in seconds, not hours
  • 🎨 High Fidelity - Natural-sounding synthesis with emotion preservation
  • ☁️ Hybrid Architecture - Seamlessly switch between cloud and local inference
  • 📱 Cross-Platform - One codebase, multiple platforms
  • 🔒 Privacy First - Local inference option keeps your data on-premise
  • 🛠️ Developer Friendly - RESTful APIs with Swagger documentation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          Client Layer                           │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│  Flutter    │   Android   │    Web      │     API Clients       │
│    App      │   Compose   │  Frontend   │     (REST/SDK)        │
└──────┬──────┴──────┬──────┴──────┬──────┴──────────┬────────────┘
       │             │             │                 │
       └─────────────┴─────────────┴─────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Spring Boot Backend                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Voice     │  │  DashScope  │  │     Voice Library       │  │
│  │ Synthesis   │  │    Voice    │  │      Management         │  │
│  │ Controller  │  │ Controller  │  │      Controller         │  │
│  └──────┬──────┘  └──────┬──────┘  └───────────┬─────────────┘  │
│         │                │                     │                │
│  ┌──────┴────────────────┴─────────────────────┴──────────────┐ │
│  │                       Service Layer                        │ │
│  ├────────────┬─────────────┬──────────────┬──────────────────┤ │
│  │ CosyVoice  │ DashScope   │ DashScope    │    OSS Upload    │ │
│  │  Service   │ TTS Service │ Voice Service│    Service       │ │
│  └─────┬──────┴──────┬──────┴───────┬──────┴────────┬─────────┘ │
└────────┼─────────────┼──────────────┼───────────────┼───────────┘
         │             │              │               │
         ▼             ▼              ▼               ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  CosyVoice  │ │  DashScope  │ │  DashScope  │ │  Alibaba    │
│   Python    │ │  TTS API    │ │  Voice API  │ │    OSS      │
│   Local     │ │  (Cloud)    │ │  (Clone)    │ │  Storage    │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

🔄 Data Flow

  1. Voice Cloning Flow

    Audio Upload → OSS Storage → DashScope Voice API → Voice ID Created
    
  2. TTS Synthesis Flow

    Text + Voice ID → DashScope/Local Model → Audio Stream → Client
    

📸 Screenshots


Home

Voice Clone

TTS Synthesis

Voice Library

Profile

Settings

🚀 Quick Start

Prerequisites

  • JDK 21 (for Spring Boot backend)
  • Python 3.10+ with Conda (for local CosyVoice inference)
  • Flutter 3.10+ (for mobile app)
  • DashScope API Key (get one at DashScope Console)

1. Clone the Repository

git clone https://github.com/your-username/VoicePrint.git
cd VoicePrint

2. Start the Backend

# Set your DashScope API key
export DASHSCOPE_API_KEY=your_api_key_here

# Run the Spring Boot backend
./gradlew backend:bootRun

The backend starts at http://localhost:8081

Swagger UI is available at: http://localhost:8081/swagger-ui.html

3. Run the Flutter App

cd flutter_frontend
flutter pub get
flutter run

4. (Optional) Setup Local Inference

To run CosyVoice inference locally, without the cloud API:

# Create conda environment
conda create -n voice2 python=3.10
conda activate voice2

# Install CosyVoice dependencies
cd CosyVoice
pip install -e .

# Download model weights (CosyVoice2-0.5B)
# Place in: CosyVoice/CosyVoice2-0.5B/pretrained_models/

📚 API Reference

Voice Synthesis

| Endpoint | Method | Description |
|---|---|---|
| /api/voice/tts | POST | Local zero-shot voice cloning |
| /api/voice/dashscope-tts | POST | Cloud TTS with preset voices |
| /api/voice/dashscope-voices | GET | List available cloud voices |
| /api/voice/batch-tts | POST | Batch text-to-speech |

Voice Library (DashScope)

| Endpoint | Method | Description |
|---|---|---|
| /api/dashscope/voices | POST | Create custom voice from audio |
| /api/dashscope/voices | GET | List all custom voices |
| /api/dashscope/voices/{voiceId} | GET | Query voice details |
| /api/dashscope/voices/{voiceId} | PUT | Update voice reference audio |
| /api/dashscope/voices/{voiceId} | DELETE | Delete custom voice |
| /api/dashscope/voices/ttsByVoice | POST | TTS with custom voice ID |
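
The list, query, and delete endpoints above differ only in HTTP method and in whether a voice ID is appended to the path. A minimal sketch of building those requests with the Python standard library follows; `voice_request` and `BASE_URL` are hypothetical names, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

# Default backend address from the Quick Start section (assumed unchanged).
BASE_URL = "http://localhost:8081"


def voice_request(method: str, voice_id: Optional[str] = None) -> urllib.request.Request:
    """Build a request for /api/dashscope/voices or /api/dashscope/voices/{voiceId}."""
    path = "/api/dashscope/voices"
    if voice_id is not None:
        # Percent-encode the ID so it cannot alter the URL path.
        path += "/" + quote(voice_id, safe="")
    return urllib.request.Request(BASE_URL + path, method=method)


list_req = voice_request("GET")                       # list all custom voices
detail_req = voice_request("GET", "your_voice_id")    # query one voice
delete_req = voice_request("DELETE", "your_voice_id") # delete one voice
```

Each request can be sent with `urllib.request.urlopen(...)` once the backend is running.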

Example: Create Custom Voice

curl -X POST "http://localhost:8081/api/dashscope/voices" \
  -F "targetModel=cosyvoice-v2" \
  -F "prefix=myvoice" \
  -F "url=https://your-audio-url.com/sample.wav"
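
The same call can be made from Python. The sketch below mirrors the curl command by sending the three fields as `multipart/form-data` (which is what curl's `-F` does), using only the standard library; the helper name `build_create_voice_request` is an assumption for illustration, and the shape of the response is not specified in this README.

```python
import urllib.request
import uuid


def build_create_voice_request(base_url: str, target_model: str,
                               prefix: str, audio_url: str) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl -F example."""
    boundary = uuid.uuid4().hex
    fields = {"targetModel": target_model, "prefix": prefix, "url": audio_url}
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/dashscope/voices",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


req = build_create_voice_request(
    "http://localhost:8081", "cosyvoice-v2", "myvoice",
    "https://your-audio-url.com/sample.wav",
)
# Send with urllib.request.urlopen(req) once the backend is running.
```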

Example: Synthesize Speech

curl -X POST "http://localhost:8081/api/dashscope/voices/ttsByVoice" \
  -d "voiceId=your_voice_id" \
  -d "text=Hello, this is my cloned voice!" \
  --output output.mp3
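
A Python equivalent of the synthesis call is sketched below. It assumes, based on the `--output output.mp3` flag above, that the response body is the raw audio; `build_tts_request` and `synthesize_to_file` are hypothetical helper names. Unlike plain `curl -d`, `urlencode` percent-encodes the text, which is the safer choice for punctuation and non-ASCII input.

```python
import urllib.request
from urllib.parse import urlencode


def build_tts_request(base_url: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build a form-encoded POST for /api/dashscope/voices/ttsByVoice."""
    body = urlencode({"voiceId": voice_id, "text": text}).encode("ascii")
    return urllib.request.Request(
        base_url + "/api/dashscope/voices/ttsByVoice",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def synthesize_to_file(base_url: str, voice_id: str, text: str, out_path: str) -> None:
    """Send the request and write the response body (assumed to be audio) to disk."""
    req = build_tts_request(base_url, voice_id, text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

With the backend running, `synthesize_to_file("http://localhost:8081", "your_voice_id", "Hello, this is my cloned voice!", "output.mp3")` would correspond to the curl command.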

⚙️ Configuration

Backend Configuration (application.properties)

# Server
server.port=8081
server.address=0.0.0.0

# DashScope API (recommended: use environment variable)
# dashscope.api-key=${DASHSCOPE_API_KEY}

# Local CosyVoice Settings
cosy.model-dir=CosyVoice/CosyVoice2-0.5B/pretrained_models/CosyVoice2-0.5B
cosy.code-dir=CosyVoice
cosy.conda-env=voice2
cosy.conda-path=/path/to/conda

# OSS Storage (for audio file hosting)
oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com
oss.bucket=your-bucket-name
oss.access-key-id=${OSS_ACCESS_KEY_ID}
oss.access-key-secret=${OSS_ACCESS_KEY_SECRET}
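
Values such as `${OSS_ACCESS_KEY_ID}` are placeholders that Spring resolves from its environment (including OS environment variables) at startup, and Spring also accepts a `${NAME:default}` form. The sketch below imitates that substitution in Python purely to illustrate the behaviour; it is not Spring's implementation, and `resolve_placeholders` is a made-up name.

```python
import os
import re

# Matches ${NAME} and ${NAME:default}.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_placeholders(value: str, env=None) -> str:
    """Replace ${NAME} / ${NAME:default} using the given mapping (os.environ by default)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set")

    return _PLACEHOLDER.sub(replace, value)
```

A placeholder with no matching variable and no default raises an error here, which is comparable to the startup failure an unresolved placeholder would cause in the backend.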

Environment Variables

| Variable | Required | Description |
|---|---|---|
| DASHSCOPE_API_KEY | Yes | Your DashScope API key |
| OSS_ACCESS_KEY_ID | Optional | Alibaba OSS access key |
| OSS_ACCESS_KEY_SECRET | Optional | Alibaba OSS secret key |
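
Before starting the backend, it can help to confirm which of these variables are set. The following is a small pre-flight sketch; the variable names come from the table above, while `missing_variables` is an illustrative helper and not part of the project.

```python
import os

REQUIRED = ("DASHSCOPE_API_KEY",)
OPTIONAL = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET")


def missing_variables(env=None):
    """Return (missing_required, missing_optional) as two lists of variable names."""
    env = os.environ if env is None else env
    # Treat empty strings the same as unset variables.
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = missing_variables()
    for name in required:
        print(f"ERROR: {name} is not set")
    for name in optional:
        print(f"note: {name} is not set (optional)")
```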

🛠️ Development

Project Structure

VoicePrint/
├── app/                    # Android Compose native app
├── backend/                # Spring Boot backend
│   ├── src/main/java/
│   │   └── com/example/backend/
│   │       ├── api/        # REST Controllers
│   │       ├── configures/ # Configuration classes
│   │       └── service/    # Business logic
│   └── scripts/
│       └── cosy_cli.py     # Local inference script
├── flutter_frontend/       # Flutter cross-platform app
│   └── lib/
│       ├── screens/        # UI screens
│       ├── services/       # API & audio services
│       └── widgets/        # Reusable components
├── CosyVoice/              # CosyVoice model & code
└── figures/                # Documentation images

Build Commands

# Build everything
./gradlew buildAll

# Backend only
./gradlew backend:build
./gradlew backend:bootRun

# Android app
./gradlew :app:assembleDebug

# Flutter app
cd flutter_frontend && flutter build apk

Testing

# Backend tests
./gradlew backend:test

# Flutter tests
cd flutter_frontend && flutter test

🔧 Tech Stack

Backend

  • Framework: Spring Boot 3.3.4
  • Language: Java 17 (JDK 21 toolchain)
  • APIs: DashScope SDK, Alibaba OSS SDK
  • Documentation: SpringDoc OpenAPI (Swagger)

Frontend

  • Flutter: Provider state management, Dio HTTP client
  • Android: Kotlin + Jetpack Compose
  • Audio: just_audio, flutter_sound

AI/ML

  • Cloud: Alibaba DashScope CosyVoice-v2
  • Local: CosyVoice2-0.5B (PyTorch)

🀝 Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow existing code style and conventions
  • Add tests for new features
  • Update documentation as needed
  • Keep commits atomic and well-described

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments


📞 Support


⭐ Star this repo if you find it useful! ⭐

Made with ❀️ by VoicePrint Team
