🎙️ VoicePrint

Professional AI Voice Cloning & Synthesis Platform

English | 中文文档 (Chinese documentation)

Clone any voice with just a few seconds of audio

📖 Table of Contents

Features
Architecture
Screenshots
Quick Start
API Reference
Configuration
Development
Tech Stack
Contributing
License

✨ Features

🎯 Core Capabilities

| Feature | Description |
| --- | --- |
| Zero-Shot Voice Cloning | Clone any voice with just 3-10 seconds of reference audio |
| Cloud TTS Synthesis | High-quality text-to-speech powered by DashScope CosyVoice-v2 |
| Local Inference | On-premise voice synthesis with CosyVoice2-0.5B model |
| Voice Library | Create, manage, and organize your custom voice profiles |
| Real-time Preview | Instant audio preview before downloading |
| Multi-Platform | Flutter mobile app + Android native + Web interface |

🔥 Why VoicePrint?

🚀 Fast Cloning - Create custom voices in seconds, not hours
🎨 High Fidelity - Natural-sounding synthesis with emotion preservation
☁️ Hybrid Architecture - Seamlessly switch between cloud and local inference
📱 Cross-Platform - Flutter app from a single codebase, plus native Android and web clients
🔒 Privacy First - Local inference option keeps your data on-premise
🛠️ Developer Friendly - RESTful APIs with Swagger documentation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Client Layer                              │
├─────────────┬─────────────┬─────────────┬─────────────────────────┤
│  Flutter    │   Android   │    Web      │      API Clients        │
│    App      │   Compose   │  Frontend   │      (REST/SDK)         │
└──────┬──────┴──────┬──────┴──────┬──────┴──────────┬──────────────┘
       │             │             │                 │
       └─────────────┴─────────────┴─────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Spring Boot Backend                           │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Voice     │  │  DashScope  │  │     Voice Library       │  │
│  │ Synthesis   │  │    Voice    │  │      Management         │  │
│  │ Controller  │  │ Controller  │  │      Controller         │  │
│  └──────┬──────┘  └──────┬──────┘  └───────────┬─────────────┘  │
│         │                │                     │                │
│  ┌──────┴────────────────┴─────────────────────┴──────────────┐ │
│  │                    Service Layer                            │ │
│  ├────────────┬─────────────┬──────────────┬─────────────────┤ │
│  │ CosyVoice  │ DashScope   │ DashScope    │    OSS Upload   │ │
│  │  Service   │ TTS Service │ Voice Service│    Service      │ │
│  └─────┬──────┴──────┬──────┴───────┬──────┴────────┬────────┘ │
└────────┼─────────────┼──────────────┼───────────────┼──────────┘
         │             │              │               │
         ▼             ▼              ▼               ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  CosyVoice  │ │  DashScope  │ │  DashScope  │ │  Alibaba    │
│   Python    │ │  TTS API    │ │  Voice API  │ │    OSS      │
│   Local     │ │  (Cloud)    │ │  (Clone)    │ │  Storage    │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

🔄 Data Flow

Voice Cloning Flow

Audio Upload → OSS Storage → DashScope Voice API → Voice ID Created

TTS Synthesis Flow

Text + Voice ID → DashScope/Local Model → Audio Stream → Client
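
The two flows above can be read as small pipelines. The sketch below is only an illustration of that ordering: `clone_voice`, `synthesize`, and the injected step functions are hypothetical names, not part of the VoicePrint codebase, and the stubs stand in for OSS, DashScope, and the local model.

```python
from typing import Callable


def clone_voice(audio_path: str,
                upload_to_oss: Callable[[str], str],
                create_voice: Callable[[str], str]) -> str:
    """Voice cloning flow: upload the audio, then register it as a voice."""
    audio_url = upload_to_oss(audio_path)  # Audio Upload -> OSS Storage
    return create_voice(audio_url)         # DashScope Voice API -> Voice ID


def synthesize(text: str, voice_id: str,
               engine: Callable[[str, str], bytes]) -> bytes:
    """TTS flow: text plus voice ID go to an engine, audio bytes come back."""
    return engine(text, voice_id)          # DashScope or local model


if __name__ == "__main__":
    # Stubs (assumptions) so the sketch runs without network or model access.
    def fake_upload(path: str) -> str:
        return "https://example.invalid/" + path

    def fake_create(url: str) -> str:
        return "voice-" + url.rsplit("/", 1)[-1]

    def fake_engine(text: str, voice_id: str) -> bytes:
        return f"{voice_id}:{text}".encode("utf-8")

    voice_id = clone_voice("sample.wav", fake_upload, fake_create)
    print(voice_id, synthesize("hello", voice_id, fake_engine))
```

Passing the steps in as functions keeps the cloud and local engines interchangeable, which mirrors the hybrid design described in the architecture diagram.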

📸 Screenshots

[Screenshots: Home, Voice Clone, TTS Synthesis, Voice Library, Profile, Settings]

🚀 Quick Start

Prerequisites

JDK 21 (for Spring Boot backend)
Python 3.10+ with Conda (for local CosyVoice inference)
Flutter 3.10+ (for mobile app)
DashScope API Key (get one at DashScope Console)

1. Clone the Repository

git clone https://github.com/your-username/VoicePrint.git
cd VoicePrint

2. Start the Backend

# Set your DashScope API key
export DASHSCOPE_API_KEY=your_api_key_here

# Run the Spring Boot backend
./gradlew backend:bootRun

The backend will start at http://localhost:8081

📝 Swagger UI available at: http://localhost:8081/swagger-ui.html
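
Spring Boot can take a few seconds to start. A minimal readiness check, assuming the Swagger UI URL above responds once the server is up, might look like the following. `wait_for_backend`, its parameters, and the injectable `fetch` function are illustrative names, not something the project ships.

```python
import time
import urllib.error
import urllib.request


def _http_status(url):
    """Return the HTTP status code for a GET on url (error codes included)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_for_backend(url, timeout_s=60.0, interval_s=2.0, fetch=None):
    """Poll url until it answers with a non-5xx status or the timeout expires."""
    fetch = fetch or _http_status
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) < 500:
                return True
        except OSError:
            pass  # connection refused or similar: server not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    ready = wait_for_backend("http://localhost:8081/swagger-ui.html")
    print("backend ready" if ready else "backend did not come up in time")
```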

3. Run the Flutter App

cd flutter_frontend
flutter pub get
flutter run

4. (Optional) Set Up Local Inference

To run CosyVoice inference locally, without the cloud API:

# Create conda environment
conda create -n voice2 python=3.10
conda activate voice2

# Install CosyVoice dependencies
cd CosyVoice
pip install -e .

# Download model weights (CosyVoice2-0.5B)
# Place in: CosyVoice/CosyVoice2-0.5B/pretrained_models/
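
Before pointing the backend at the local model, it can help to confirm that the weights landed where the configuration expects them. This sketch assumes the `cosy.model-dir` value shown in the Configuration section and only checks that the directory exists and is not empty. It does not validate individual model files, whose names are not listed in this README. `model_dir_status` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: same path as cosy.model-dir in application.properties.
DEFAULT_MODEL_DIR = "CosyVoice/CosyVoice2-0.5B/pretrained_models/CosyVoice2-0.5B"


def model_dir_status(model_dir=DEFAULT_MODEL_DIR):
    """Return 'missing', 'empty', or 'ok' for the given model directory."""
    path = Path(model_dir)
    if not path.is_dir():
        return "missing"
    if not any(path.iterdir()):
        return "empty"
    return "ok"


if __name__ == "__main__":
    print(f"{DEFAULT_MODEL_DIR}: {model_dir_status()}")
```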

📚 API Reference

Voice Synthesis

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/voice/tts` | POST | Local zero-shot voice cloning |
| `/api/voice/dashscope-tts` | POST | Cloud TTS with preset voices |
| `/api/voice/dashscope-voices` | GET | List available cloud voices |
| `/api/voice/batch-tts` | POST | Batch text-to-speech |

Voice Library (DashScope)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/dashscope/voices` | POST | Create custom voice from audio |
| `/api/dashscope/voices` | GET | List all custom voices |
| `/api/dashscope/voices/{voiceId}` | GET | Query voice details |
| `/api/dashscope/voices/{voiceId}` | PUT | Update voice reference audio |
| `/api/dashscope/voices/{voiceId}` | DELETE | Delete custom voice |
| `/api/dashscope/voices/ttsByVoice` | POST | TTS with custom voice ID |
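
For client code, the Voice Library routes above can be kept in one lookup table. The following is a sketch, not an official SDK: the operation names (`create`, `list`, and so on) and the `resolve` helper are made up for illustration, and only the methods and paths come from the table.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8081"  # default backend address from Quick Start

# Operation name (hypothetical) -> (HTTP method, path template from the table)
ROUTES = {
    "create": ("POST", "/api/dashscope/voices"),
    "list": ("GET", "/api/dashscope/voices"),
    "get": ("GET", "/api/dashscope/voices/{voiceId}"),
    "update": ("PUT", "/api/dashscope/voices/{voiceId}"),
    "delete": ("DELETE", "/api/dashscope/voices/{voiceId}"),
    "tts": ("POST", "/api/dashscope/voices/ttsByVoice"),
}


def resolve(operation, voice_id=None, base_url=BASE_URL):
    """Return (method, url) for an operation, filling in {voiceId} if needed."""
    method, template = ROUTES[operation]
    if "{voiceId}" in template:
        if not voice_id:
            raise ValueError(f"operation {operation!r} needs a voice_id")
        # Percent-encode the ID so it is safe as a single path segment.
        template = template.replace("{voiceId}", quote(voice_id, safe=""))
    return method, base_url + template


if __name__ == "__main__":
    print(resolve("list"))
    print(resolve("delete", "your_voice_id"))
```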

Example: Create Custom Voice

curl -X POST "http://localhost:8081/api/dashscope/voices" \
  -F "targetModel=cosyvoice-v2" \
  -F "prefix=myvoice" \
  -F "url=https://your-audio-url.com/sample.wav"
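
The same request can be built from Python using only the standard library. This is a sketch under the assumption that the endpoint accepts the three multipart form fields shown in the curl command. `build_create_voice_request` is a hypothetical helper, and the response format is not documented in this README, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def build_create_voice_request(audio_url, prefix,
                               target_model="cosyvoice-v2",
                               base_url="http://localhost:8081"):
    """Build a multipart/form-data POST equivalent to the curl -F example."""
    fields = {"targetModel": target_model, "prefix": prefix, "url": audio_url}
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/dashscope/voices",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    req = build_create_voice_request(
        "https://your-audio-url.com/sample.wav", "myvoice")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=60)
```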

Example: Synthesize Speech

curl -X POST "http://localhost:8081/api/dashscope/voices/ttsByVoice" \
  --data-urlencode "voiceId=your_voice_id" \
  --data-urlencode 'text=Hello, this is my cloned voice!' \
  --output output.mp3
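
A Python equivalent, again as a sketch: it assumes the endpoint takes `voiceId` and `text` as URL-encoded form fields and returns the audio bytes in the response body, as the curl example suggests. `build_tts_request` and `save_speech` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urlencode


def build_tts_request(voice_id, text, base_url="http://localhost:8081"):
    """Build a form-encoded POST for the ttsByVoice endpoint."""
    # urlencode percent-encodes spaces, commas, and punctuation in the text.
    body = urlencode({"voiceId": voice_id, "text": text}).encode("ascii")
    return urllib.request.Request(
        base_url + "/api/dashscope/voices/ttsByVoice",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def save_speech(voice_id, text, out_path="output.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(voice_id, text)
    with urllib.request.urlopen(request, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as out:
        out.write(audio)
    return out_path


if __name__ == "__main__":
    req = build_tts_request("your_voice_id", "Hello, this is my cloned voice!")
    print(req.data.decode("ascii"))
    # With the backend running: save_speech("your_voice_id", "Hello!")
```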

⚙️ Configuration

Backend Configuration (`application.properties`)

# Server
server.port=8081
server.address=0.0.0.0

# DashScope API (recommended: use environment variable)
# dashscope.api-key=${DASHSCOPE_API_KEY}

# Local CosyVoice Settings
cosy.model-dir=CosyVoice/CosyVoice2-0.5B/pretrained_models/CosyVoice2-0.5B
cosy.code-dir=CosyVoice
cosy.conda-env=voice2
cosy.conda-path=/path/to/conda

# OSS Storage (for audio file hosting)
oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com
oss.bucket=your-bucket-name
oss.access-key-id=${OSS_ACCESS_KEY_ID}
oss.access-key-secret=${OSS_ACCESS_KEY_SECRET}
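
The `${NAME}` values in the properties above are placeholders that Spring fills in at startup, for example from environment variables. The sketch below imitates that substitution in Python purely to show the idea. It is an approximation, not Spring's implementation, and `resolve_placeholders` is a hypothetical name. It also handles the `${NAME:default}` form that Spring supports.

```python
import os
import re

# Matches ${NAME} and ${NAME:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_placeholders(value, env):
    """Replace ${NAME} / ${NAME:default} in value using the env mapping."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"no value for placeholder {name!r}")
    return _PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    example = "oss.access-key-id=${OSS_ACCESS_KEY_ID:not-set}"
    print(resolve_placeholders(example, os.environ))
```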

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| `DASHSCOPE_API_KEY` | Yes | Your DashScope API key |
| `OSS_ACCESS_KEY_ID` | Optional | Alibaba OSS access key ID |
| `OSS_ACCESS_KEY_SECRET` | Optional | Alibaba OSS access key secret |
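
A small preflight check can catch a missing key before the backend is started. This sketch takes the variable names and their required/optional status from the table above. `check_environment` is a hypothetical helper, not part of the project.

```python
import os

REQUIRED = ("DASHSCOPE_API_KEY",)
OPTIONAL = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET")


def check_environment(env=None):
    """Return (missing_required, unset_optional) variable name lists."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    unset_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing, unset_optional


if __name__ == "__main__":
    missing, unset_optional = check_environment()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    if unset_optional:
        print("Optional variables not set:", ", ".join(unset_optional))
    if not missing and not unset_optional:
        print("All variables are set.")
```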

🛠️ Development

Project Structure

VoicePrint/
├── app/                    # Android Compose native app
├── backend/                # Spring Boot backend
│   ├── src/main/java/
│   │   └── com/example/backend/
│   │       ├── api/        # REST Controllers
│   │       ├── configures/ # Configuration classes
│   │       └── service/    # Business logic
│   └── scripts/
│       └── cosy_cli.py     # Local inference script
├── flutter_frontend/       # Flutter cross-platform app
│   └── lib/
│       ├── screens/        # UI screens
│       ├── services/       # API & audio services
│       └── widgets/        # Reusable components
├── CosyVoice/              # CosyVoice model & code
└── figures/                # Documentation images

Build Commands

# Build everything
./gradlew buildAll

# Backend only
./gradlew backend:build
./gradlew backend:bootRun

# Android app
./gradlew :app:assembleDebug

# Flutter app
cd flutter_frontend && flutter build apk

Testing

# Backend tests
./gradlew backend:test

# Flutter tests
cd flutter_frontend && flutter test

🔧 Tech Stack

Backend

Framework: Spring Boot 3.3.4
Language: Java 17 (JDK 21 toolchain)
APIs: DashScope SDK, Alibaba OSS SDK
Documentation: SpringDoc OpenAPI (Swagger)

Frontend

Flutter: Provider state management, Dio HTTP client
Android: Kotlin + Jetpack Compose
Audio: just_audio, flutter_sound

AI/ML

Cloud: Alibaba DashScope CosyVoice-v2
Local: CosyVoice2-0.5B (PyTorch)

🤝 Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Development Guidelines

Follow existing code style and conventions
Add tests for new features
Update documentation as needed
Keep commits atomic and well-described

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

CosyVoice - Open-source voice synthesis model
DashScope - Cloud AI services by Alibaba
Spring Boot - Java backend framework
Flutter - Cross-platform UI framework

📞 Support

⭐ Star this repo if you find it useful! ⭐

Made with ❤️ by VoicePrint Team
