This project is a voice cloning application built on the XTTS-v2 model from Coqui.ai. From a short user-provided audio sample, it generates personalized, natural-sounding synthetic speech. The project demonstrates how accurate modern voice replicas can be, and highlights both the potential applications and the ethical considerations of voice cloning technology.
- Upload Audio Samples: Users can upload an audio file (in .wav format) containing a sample of the voice to be cloned.
- Text-to-Speech: Enter a string of text to be synthesized into speech using the cloned voice.
- Language Selection: Choose the language for the synthesized speech from a drop-down menu.
- Voice Selection: Select one of the uploaded audio files for cloning.
- High-Quality Output: Generates audio files in .wav format that accurately reproduce the entered text using the cloned voice.
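The synthesis behind these features can be sketched with Coqui's TTS Python API. The snippet below is a minimal illustration rather than the project's exact code; the file paths are placeholders:

```python
# Minimal XTTS-v2 synthesis sketch using Coqui's TTS API.
# File paths are placeholders; the project's own wiring may differ.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download (on first use) and load the published XTTS-v2 model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Clone the voice in speaker_wav and speak the given text in the chosen language.
tts.tts_to_file(
    text="Hello, this is a cloned voice.",
    speaker_wav="samples/voice.wav",  # placeholder: the uploaded sample
    language="en",
    file_path="output.wav",
)
```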
The application uses a two-tier architecture: a backend server and a frontend interface. The backend, written in Python, handles audio processing and speech synthesis, while the frontend, built with Streamlit, provides a user-friendly interface for interactions.
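As an illustration of this split, a minimal Streamlit frontend might look like the sketch below. The backend endpoint, port, and field names are assumptions for illustration, not the project's actual wiring:

```python
# Sketch of a Streamlit frontend posting to a synthesis backend.
# The endpoint URL and form field names are hypothetical.
import requests
import streamlit as st

st.title("Voice Cloning")

sample = st.file_uploader("Upload a voice sample (.wav)", type=["wav"])
text = st.text_input("Text to synthesize")
language = st.selectbox("Language", ["en", "it", "es", "fr", "de"])

if st.button("Synthesize") and sample is not None and text:
    # Hypothetical backend endpoint that runs the actual synthesis.
    response = requests.post(
        "http://localhost:8000/synthesize",
        files={"speaker_wav": sample},
        data={"text": text, "language": language},
    )
    st.audio(response.content, format="audio/wav")
```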
Make sure you have Python 3.9, ffmpeg, and a CUDA-enabled build of PyTorch installed.
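A quick way to verify the environment is a small Python check (this assumes PyTorch is already installed):

```python
# Sanity check for the prerequisites; run inside the project's environment.
import shutil
import sys

import torch

print("Python:", sys.version.split()[0])                       # expect a 3.9.x release
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)   # needed for audio handling
print("CUDA available:", torch.cuda.is_available())            # True for GPU inference
```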
To set up the project locally, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/edoardotavassi/ACA_project.git
  cd ACA_project
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start the system:

  ```bash
  ./start.sh
  ```

- Stop the system:

  ```bash
  ./stop.sh
  ```
The development and deployment of voice cloning technology come with significant ethical considerations, including:
- Informed Consent: Ensure that individuals fully understand how their voice will be used, stored, and any associated risks before cloning their voice.
- Privacy: Implement robust encryption protocols to protect voice data both at rest and in transit, and store data in secure environments with restricted access (see the sketch after this list).
- Transparency: Clearly inform users when they are interacting with synthetic voices.
- Preventing Misuse: Develop guidelines and regulations to detect synthetic voices and penalize misuse, promoting international cooperation to address global threats.
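To make the privacy point concrete, the sketch below encrypts an uploaded sample at rest using the third-party `cryptography` package. The package choice and file paths are illustrative assumptions, not project dependencies, and key management is deliberately elided:

```python
# Sketch: symmetric encryption of a voice sample at rest with Fernet.
# Real deployments need proper key management (e.g. a KMS), elided here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store securely; losing it loses the data
fernet = Fernet(key)

with open("samples/voice.wav", "rb") as f:      # placeholder path
    ciphertext = fernet.encrypt(f.read())

with open("samples/voice.wav.enc", "wb") as f:
    f.write(ciphertext)

# Later, decrypt in memory just before synthesis:
plaintext = fernet.decrypt(ciphertext)
```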
Key areas for future development include:
- Distributed Parallel Computation: Implementing technologies like Celery to enable distributed task queues for parallel processing, improving scalability and reducing processing times (see the sketch after this list).
- Speech-to-Text Integration: Adding a speech-to-text channel within the application framework to allow users to convert spoken input directly into text.
- Optimization: Enhancing the model’s ability to handle a broader range of accents and languages, and optimizing processing times.
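The Celery idea above could look roughly like the following; the Redis broker URL and the task signature are assumptions for illustration, not the project's setup:

```python
# Sketch: offloading synthesis to a Celery worker so the web process
# never blocks on GPU inference. Broker/backend URLs are assumptions.
from celery import Celery

app = Celery(
    "voice_tasks",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@app.task
def synthesize(text: str, speaker_wav: str, language: str, out_path: str) -> str:
    """Run XTTS-v2 synthesis on a worker and return the output path."""
    from TTS.api import TTS  # imported in the worker process

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text, speaker_wav=speaker_wav, language=language, file_path=out_path
    )
    return out_path

# Callers enqueue work instead of computing inline, e.g.:
# result = synthesize.delay("Hello", "samples/voice.wav", "en", "out.wav")
```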
- Edoardo Toma Tavassi
- Alessandro Troiano
This project was developed as part of the Advanced Computer Architectures course under the supervision of Prof. Christian Pilato, Academic Year 2023-24.
This project is licensed under the MIT License. See the LICENSE file for details.