Professional AI Voice Cloning & Synthesis Platform
Clone any voice with just a few seconds of audio
- Features
- Architecture
- Screenshots
- Quick Start
- API Reference
- Configuration
- Development
- Tech Stack
- Contributing
- License
| Feature | Description |
|---|---|
| Zero-Shot Voice Cloning | Clone any voice with just 3-10 seconds of reference audio |
| Cloud TTS Synthesis | High-quality text-to-speech powered by DashScope CosyVoice-v2 |
| Local Inference | On-premise voice synthesis with CosyVoice2-0.5B model |
| Voice Library | Create, manage, and organize your custom voice profiles |
| Real-time Preview | Instant audio preview before downloading |
| Multi-Platform | Flutter mobile app + Android native + Web interface |
- Fast Cloning: Create custom voices in seconds, not hours
- High Fidelity: Natural-sounding synthesis with emotion preservation
- Hybrid Architecture: Seamlessly switch between cloud and local inference
- Cross-Platform: One codebase, multiple platforms
- Privacy First: Local inference option keeps your data on-premise
- Developer Friendly: RESTful APIs with Swagger documentation
```text
┌─────────────────────────────────────────────────────────────────┐
│                          Client Layer                           │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│   Flutter   │   Android   │     Web     │      API Clients      │
│     App     │   Compose   │  Frontend   │      (REST/SDK)       │
└──────┬──────┴──────┬──────┴──────┬──────┴───────────┬───────────┘
       │             │             │                  │
       └─────────────┴──────┬──────┴──────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Spring Boot Backend                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │    Voice    │  │  DashScope  │  │      Voice Library      │  │
│  │  Synthesis  │  │    Voice    │  │       Management        │  │
│  │ Controller  │  │ Controller  │  │       Controller        │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
│  ┌──────┴────────────────┴──────────────────────┴────────────┐  │
│  │                       Service Layer                       │  │
│  ├─────────────┬─────────────┬───────────────┬───────────────┤  │
│  │  CosyVoice  │  DashScope  │   DashScope   │  OSS Upload   │  │
│  │   Service   │ TTS Service │ Voice Service │    Service    │  │
│  └──────┬──────┴──────┬──────┴───────┬───────┴───────┬───────┘  │
└─────────┼─────────────┼──────────────┼───────────────┼──────────┘
          │             │              │               │
          ▼             ▼              ▼               ▼
    ┌───────────┐ ┌───────────┐  ┌───────────┐   ┌───────────┐
    │ CosyVoice │ │ DashScope │  │ DashScope │   │  Alibaba  │
    │  Python   │ │  TTS API  │  │ Voice API │   │    OSS    │
    │   Local   │ │  (Cloud)  │  │  (Clone)  │   │  Storage  │
    └───────────┘ └───────────┘  └───────────┘   └───────────┘
```
- Voice Cloning Flow: Audio Upload → OSS Storage → DashScope Voice API → Voice ID Created
- TTS Synthesis Flow: Text + Voice ID → DashScope/Local Model → Audio Stream → Client
- JDK 21 (for Spring Boot backend)
- Python 3.10+ with Conda (for local CosyVoice inference)
- Flutter 3.10+ (for mobile app)
- DashScope API Key (get one at DashScope Console)
```bash
git clone https://github.com/your-username/VoicePrint.git
cd VoicePrint
```

```bash
# Set your DashScope API key
export DASHSCOPE_API_KEY=your_api_key_here

# Run the Spring Boot backend
./gradlew backend:bootRun
```

The backend will start at http://localhost:8081

Swagger UI available at:
http://localhost:8081/swagger-ui.html
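
Scripts that call the API right after launching the backend may need to wait until it responds. The following is a minimal sketch of such a wait loop, assuming `curl` is installed; the function name `wait_for_url` and its default attempt count and delay are illustrative choices, not part of the project.

```bash
# Poll a URL until it answers without an HTTP error, or give up.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
# Note: plain POSIX sh has no local variables, so these names are global.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors (status >= 400) as failure, -s: silent,
    # -o /dev/null: discard the response body
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (run after starting the backend in another terminal):
#   wait_for_url http://localhost:8081/swagger-ui.html && echo "backend is up"
```

The function returns 0 as soon as one request succeeds and 1 once the attempts are exhausted, so it can gate later commands with `&&`.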
```bash
cd flutter_frontend
flutter pub get
flutter run
```

For local CosyVoice inference without the cloud API:

```bash
# Create conda environment
conda create -n voice2 python=3.10
conda activate voice2

# Install CosyVoice dependencies
cd CosyVoice
pip install -e .

# Download model weights (CosyVoice2-0.5B)
# Place in: CosyVoice/CosyVoice2-0.5B/pretrained_models/
```

| Endpoint | Method | Description |
|---|---|---|
| `/api/voice/tts` | POST | Local zero-shot voice cloning |
| `/api/voice/dashscope-tts` | POST | Cloud TTS with preset voices |
| `/api/voice/dashscope-voices` | GET | List available cloud voices |
| `/api/voice/batch-tts` | POST | Batch text-to-speech |

| Endpoint | Method | Description |
|---|---|---|
| `/api/dashscope/voices` | POST | Create custom voice from audio |
| `/api/dashscope/voices` | GET | List all custom voices |
| `/api/dashscope/voices/{voiceId}` | GET | Query voice details |
| `/api/dashscope/voices/{voiceId}` | PUT | Update voice reference audio |
| `/api/dashscope/voices/{voiceId}` | DELETE | Delete custom voice |
| `/api/dashscope/voices/ttsByVoice` | POST | TTS with custom voice ID |
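
The list, query, and delete endpoints in the table above are identified by path alone, so a small URL helper is enough to script them. This is a minimal sketch, assuming the backend runs at the default http://localhost:8081 and that these endpoints need no extra query parameters; `BASE_URL` and `voice_url` are illustrative names, not part of the project.

```bash
# Assumption: the backend is reachable at the default address used in this README.
BASE_URL="${BASE_URL:-http://localhost:8081}"

# Print the URL of the custom-voice collection, or of one voice when an ID is given.
voice_url() {
  printf '%s/api/dashscope/voices%s\n' "$BASE_URL" "${1:+/$1}"
}

# Example calls against a running backend:
#   curl -s "$(voice_url)"                           # list all custom voices
#   curl -s "$(voice_url your_voice_id)"             # query voice details
#   curl -s -X DELETE "$(voice_url your_voice_id)"   # delete a custom voice
```

Setting `BASE_URL` in the environment points the same helper at another host without editing the script.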

```bash
curl -X POST "http://localhost:8081/api/dashscope/voices" \
  -F "targetModel=cosyvoice-v2" \
  -F "prefix=myvoice" \
  -F "url=https://your-audio-url.com/sample.wav"
```

```bash
curl -X POST "http://localhost:8081/api/dashscope/voices/ttsByVoice" \
  -d "voiceId=your_voice_id" \
  -d "text=Hello, this is my cloned voice!" \
  --output output.mp3
```

```properties
# Server
server.port=8081
server.address=0.0.0.0

# DashScope API (recommended: use environment variable)
# dashscope.api-key=${DASHSCOPE_API_KEY}

# Local CosyVoice Settings
cosy.model-dir=CosyVoice/CosyVoice2-0.5B/pretrained_models/CosyVoice2-0.5B
cosy.code-dir=CosyVoice
cosy.conda-env=voice2
cosy.conda-path=/path/to/conda

# OSS Storage (for audio file hosting)
oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com
oss.bucket=your-bucket-name
oss.access-key-id=${OSS_ACCESS_KEY_ID}
oss.access-key-secret=${OSS_ACCESS_KEY_SECRET}
```

| Variable | Required | Description |
|---|---|---|
| `DASHSCOPE_API_KEY` | Yes | Your DashScope API key |
| `OSS_ACCESS_KEY_ID` | Optional | Alibaba OSS access key |
| `OSS_ACCESS_KEY_SECRET` | Optional | Alibaba OSS secret key |
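
A quick preflight check can catch a missing variable from the table above before the backend is launched. This is a minimal sketch, not part of the project; the helper name `missing_env` is hypothetical.

```bash
# Print the name of every listed variable that is unset or empty, one per line.
missing_env() {
  for name in "$@"; do
    # Indirect lookup via eval keeps this portable to plain POSIX sh.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Example: warn when the required key is absent.
#   missing=$(missing_env DASHSCOPE_API_KEY)
#   [ -z "$missing" ] || echo "Set these variables first: $missing" >&2
```

Passing the OSS variable names as extra arguments extends the check to setups that upload reference audio.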

```text
VoicePrint/
├── app/                      # Android Compose native app
├── backend/                  # Spring Boot backend
│   ├── src/main/java/
│   │   └── com/example/backend/
│   │       ├── api/          # REST Controllers
│   │       ├── configures/   # Configuration classes
│   │       └── service/      # Business logic
│   └── scripts/
│       └── cosy_cli.py       # Local inference script
├── flutter_frontend/         # Flutter cross-platform app
│   └── lib/
│       ├── screens/          # UI screens
│       ├── services/         # API & audio services
│       └── widgets/          # Reusable components
├── CosyVoice/                # CosyVoice model & code
└── figures/                  # Documentation images
```

```bash
# Build everything
./gradlew buildAll

# Backend only
./gradlew backend:build
./gradlew backend:bootRun

# Android app
./gradlew :app:assembleDebug

# Flutter app
cd flutter_frontend && flutter build apk
```

```bash
# Backend tests
./gradlew backend:test

# Flutter tests
cd flutter_frontend && flutter test
```

- Framework: Spring Boot 3.3.4
- Language: Java 17 (JDK 21 toolchain)
- APIs: DashScope SDK, Alibaba OSS SDK
- Documentation: SpringDoc OpenAPI (Swagger)
- Flutter: Provider state management, Dio HTTP client
- Android: Kotlin + Jetpack Compose
- Audio: just_audio, flutter_sound
- Cloud: Alibaba DashScope CosyVoice-v2
- Local: CosyVoice2-0.5B (PyTorch)
We welcome contributions! Here's how you can help:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

- Follow existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Keep commits atomic and well-described
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- CosyVoice - Open-source voice synthesis model
- DashScope - Cloud AI services by Alibaba
- Spring Boot - Java backend framework
- Flutter - Cross-platform UI framework
- Documentation
- Report Issues
- Discussions

Star this repo if you find it useful!

Made with ❤️ by VoicePrint Team




