This repo is a project template for building AI generative models for music production.
It was originally intended as a short project aimed at submission to the Neural Audio Plugin Competition 2023.
There are two separate git repositories for this plugin. The first contains the machine learning code; the second contains the plugin code that runs the trained model.
- https://github.com/cpoohee/MLPluginTemplate (this repo, the ML code base)
- https://github.com/cpoohee/NeuralPluginTemplate (Plugin code)
- The ML code is written to run on an Nvidia GPU (CUDA) under Ubuntu 20.04 or on Apple Silicon hardware (MPS).
- For installing CUDA 11.8 drivers, see https://cloud.google.com/compute/docs/gpus/install-drivers-gpu
- For Apple Silicon, set `PYTORCH_ENABLE_MPS_FALLBACK=1` as an environment variable when running Python scripts, as not all PyTorch ops are MPS compatible (see the sketch after this list).
- Prepare at least 150 GB of free disk space.
- The current datasets and cached files take up about 50 GB (NUS-48E, VocalSet, VCTK).
- Install Miniconda (see the official installation guide).
- For Apple Silicon, you will need Miniconda for MPS acceleration; Anaconda is not recommended.
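As an alternative to exporting the variable in your shell, a minimal Python sketch that sets the MPS fallback flag inside your own script (it must run before `torch` is imported):

```python
# Alternative to exporting the variable in your shell: set the MPS fallback
# flag at the top of your own script, before torch is imported.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon with MPS support
```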
For Ubuntu:

```bash
sudo apt-get update
sudo apt-get install libsndfile1
```

For macOS:
- install brew. See https://brew.sh/
```bash
brew install libsndfile
```

The main dependencies are:

- PyTorch with PyTorch Lightning, for machine learning
- Hydra, for configurations
- MLFlow, for experiment tracking
- ONNX, for porting the model to the C++ runtime
In your terminal, create a folder into which you would like to clone the repo.
Run the following command in your terminal to clone this repo.
```bash
git clone https://github.com/cpoohee/MLPluginTemplate
```

Assuming Miniconda is properly installed, run the following to create the environment:
```bash
conda env create -f environment.yml
```

Activate the environment:
```bash
conda activate wave
```

Edit the script `src/download_data.py` and point it to download data into the `/data/raw` folder.
The script downloads the raw datasets and saves them into the `/data/raw` folder.
It should also transcode the audio to WAV format and copy the useful audio files into the `/data/interim` folder.
The sample download script is written to download the following datasets:
- NUS-48E
- VocalSet
- VCTK
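As a rough illustration of the transcoding step mentioned above (not the actual script), converting one clip to 16-bit WAV could look like the following; `librosa`/`soundfile` and the file paths are assumptions:

```python
# Hypothetical example: transcode a single clip to 16-bit WAV in data/interim.
# librosa/soundfile and the file paths are assumptions; the real logic lives in
# src/download_data.py.
from pathlib import Path
import librosa
import soundfile as sf

src = Path("data/raw/example_dataset/clip_0001.flac")
dst = Path("data/interim/example_dataset/clip_0001.wav")
dst.parent.mkdir(parents=True, exist_ok=True)

audio, sr = librosa.load(src, sr=None, mono=True)  # keep the original sample rate
sf.write(dst, audio, sr, subtype="PCM_16")
```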
To run the download script, go to the project root folder
```bash
cd MLPluginTemplate
```

Then, run the download script:
```bash
python src/download_data.py
```

- In case of failure, especially from Google Drive links, delete the contents of `/data/raw` and try again later. The script will skip a download if it detects that the folder already exists.
Edit the pre-trained model download script for your needs.
Run the following, and it will download the pre-trained models into the `/models/pre-trained` folder:
```bash
python src/download_pre-trained_models.py
```

Important!! The current pre-processing code assumes an audio input and a labelled target format.
If you have a different input/target format, modify `src/process_data.py`.
This is especially true for audio input and audio output targets.
The command below will pre-process the NUS-48E, VCTK and VocalSet datasets:
```bash
python src/process_data.py process_data/dataset=nus_vocalset_vctk
```

Pre-processing will split the dataset into train/validation/test/prediction splits, stored in the `./data/processed` folder.
It also slices the audio into 5-second clips.
- `process_data/dataset=nus` for pre-processing just the NUS-48E dataset
- `process_data/dataset=vctk` for pre-processing just the VCTK dataset
- See `conf/process_data/process_root.yaml` for more detailed configurations.
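For intuition about the slicing step above, cutting a waveform into fixed 5-second chunks could look roughly like this (the real splitting and silence handling live in `src/process_data.py`; the paths are hypothetical):

```python
# Minimal sketch: cut a mono waveform into non-overlapping 5-second clips.
# File paths are hypothetical; the real splitting and silence handling are in
# src/process_data.py.
import torchaudio

waveform, sr = torchaudio.load("data/interim/example.wav")  # shape: (channels, samples)
clip_len = 5 * sr

for i, start in enumerate(range(0, waveform.shape[1] - clip_len + 1, clip_len)):
    clip = waveform[:, start:start + clip_len]
    torchaudio.save(f"data/processed/example_{i:04d}.wav", clip, sr)
```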
Run the following:

```bash
python src/cache_dataset.py model=autoencoder_speaker dataset=nus_vocalset_vctk
```

It will cache the downloaded pre-trained speaker encoder's embeddings.
To use CUDA (Nvidia):

```bash
python src/cache_dataset.py model=autoencoder_speaker dataset=nus_vocalset_vctk process_data.accelerator=cuda
```

or MPS (Apple Silicon):
```bash
python src/cache_dataset.py model=autoencoder_speaker dataset=nus_vocalset_vctk process_data.accelerator=mps
```

Modify `src/datamodule/audio_dataloader.py` for your model's input/target needs.
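As a rough illustration of the item format this pipeline works with (paired audio plus the cached speaker embeddings), a hypothetical dataset class might look like this; paths, shapes, and the cache layout are assumptions, not the template's actual code:

```python
# Illustrative sketch of the (x, y, dvecs, name) item format used downstream.
# Paths, shapes, and the cache layout here are assumptions, not the real code
# in src/datamodule/audio_dataloader.py.
from pathlib import Path
import torch
import torchaudio
from torch.utils.data import Dataset

class ExampleVoiceDataset(Dataset):
    def __init__(self, wav_paths, cache_dir):
        self.wav_paths = [Path(p) for p in wav_paths]
        self.cache_dir = Path(cache_dir)

    def __len__(self):
        return len(self.wav_paths)

    def __getitem__(self, idx):
        wav_path = self.wav_paths[idx]
        x, _ = torchaudio.load(wav_path)                           # input audio
        y = x.clone()                                              # e.g. reconstruction target
        dvec = torch.load(self.cache_dir / f"{wav_path.stem}.pt")  # cached speaker embedding
        dvecs = (dvec, dvec)                                       # (own_dvec, target_dvec)
        return x, y, dvecs, wav_path.stem
```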
Augmentation code can be found here.
Augmentation parameters should be added or modified in the `conf/augmentations` yaml files.
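For illustration only, a simple waveform augmentation might look like the sketch below; the project's actual augmentations and their parameters are whatever is configured in the `conf/augmentations` yaml files:

```python
# Illustrative only: a simple waveform augmentation (random gain plus noise).
# The project's actual augmentations and their parameters are configured in
# the conf/augmentations yaml files.
import torch

def augment(waveform: torch.Tensor, noise_std: float = 0.005) -> torch.Tensor:
    gain = torch.empty(1).uniform_(0.8, 1.2)        # random gain factor
    noise = torch.randn_like(waveform) * noise_std  # additive Gaussian noise
    return waveform * gain + noise
```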
Add or modify PyTorch Lightning model code under `src/model`.
Add or modify model parameters in the `conf/model` yaml files.
Make sure the model input/target format from your dataloader matches your model requirements.
For example, this dataloader yields audio `x` and `y`, followed by speaker embedding vectors and a label:
```python
def _shared_eval_step(self, batch):
    # batch comes from the dataloader as (audio in, audio target, d-vectors, clip name)
    x, y, dvecs, name = batch
    own_dvec, target_dvec = dvecs
    y_pred = self.forward(x, target_dvec)
    return y, y_pred, target_dvec, name
```
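To put that batch format in context, a stripped-down PyTorch Lightning module with a matching `training_step` could look like the sketch below. The class name, placeholder network, and loss are illustrative only; the actual models live under `src/model`.

```python
# Minimal sketch of a LightningModule consuming the (x, y, dvecs, name) batch
# format shown above. The placeholder network and loss are illustrative; the
# actual models live under src/model.
import torch
import pytorch_lightning as pl

class ExampleAutoencoder(pl.LightningModule):
    def __init__(self, learning_rate: float = 1e-4):
        super().__init__()
        self.save_hyperparameters()
        self.learning_rate = learning_rate
        self.net = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)  # placeholder network

    def forward(self, x, target_dvec):
        # Toy conditioning: scale the output by the mean of the target embedding.
        return self.net(x) * target_dvec.mean()

    def training_step(self, batch, batch_idx):
        x, y, dvecs, name = batch
        own_dvec, target_dvec = dvecs
        y_pred = self.forward(x, target_dvec)
        loss = torch.nn.functional.l1_loss(y_pred, y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)
```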
To train, run:

```bash
python src/train_model.py augmentations=augmentation_enable model=autoencoder_speaker dataset=nus_vocalset_vctk
```
See the
conf/training/train.yamlfor more training options to override.- for example, append the parameter
training.batch_size=8to change batch size training.learning_rate=0.0001to change the learning ratetraining.experiment_name="experiment1"to change the model's ckpt filename.training.max_epochs=30to change the number of epochs to train.training.accelerator=mpsfor Apple Silicon hardware
- for example, append the parameter
-
See
conf/model/autoencoder_speaker.yamlfor model specifications to override.
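For orientation, Hydra overrides such as `training.batch_size=8` simply end up as fields on the config object inside the entry-point script. A simplified sketch (the config path and name here are assumptions; the real entry point is `src/train_model.py`):

```python
# Simplified sketch of a Hydra entry point; the real one is src/train_model.py,
# and the config path/name here are assumptions.
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="../conf", config_name="train")
def main(cfg: DictConfig) -> None:
    # Command-line overrides such as training.batch_size=8 land here:
    print(cfg.training.batch_size, cfg.training.learning_rate, cfg.training.max_epochs)

if __name__ == "__main__":
    main()
```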
Under the ./outputs/ folder, look for the current experiment's mlruns folder.
e.g. `outputs/2023-03-20/20-11-30/mlruns`
In your terminal, replace $PROJECT_ROOT and the outputs path with your full project path, then run the following:
```bash
mlflow server --backend-store-uri file:'$PROJECT_ROOT/outputs/2023-03-20/20-11-30/mlruns'
```

By default, you will be able to view the experiment tracking at http://127.0.0.1:5000/ in your browser.
The above configuration runs MLFlow on localhost.
- (Optional) It is possible to set up an MLFlow tracking server and configure the tracking URI under `training.tracking_uri`, as sketched below. See https://mlflow.org/docs/latest/tracking.html for more info.
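As a sketch of that optional setup (not the template's exact code), attaching an MLFlow logger with a custom tracking URI in PyTorch Lightning looks roughly like this:

```python
# Sketch: attach an MLFlow logger to a Lightning Trainer. The tracking URI can
# be a local file: URI (the default localhost setup above) or a remote server.
import pytorch_lightning as pl
from pytorch_lightning.loggers import MLFlowLogger

logger = MLFlowLogger(
    experiment_name="experiment1",
    tracking_uri="file:./mlruns",  # or e.g. "http://<your-tracking-server>:5000"
)
trainer = pl.Trainer(max_epochs=30, logger=logger)
```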
Models are saved as `.ckpt` files under
`$PROJECT_ROOT/outputs/YYYY-MM-DD/HH-MM-SS/models`
By default, a checkpoint is saved at the end of every epoch.
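For reference, per-epoch checkpointing corresponds to something like Lightning's `ModelCheckpoint` callback; the exact settings used by the template may differ:

```python
# Sketch: save a .ckpt at the end of every epoch. The template's actual
# callback settings may differ.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    dirpath="models",      # relative to the Hydra run's output folder
    filename="{epoch:02d}",
    save_top_k=-1,         # keep every epoch's checkpoint
    every_n_epochs=1,
)
trainer = pl.Trainer(max_epochs=30, callbacks=[checkpoint_cb])
```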
Replace $PATH/TO/MODEL/model.ckpt with the path to the saved model file, and run:
```bash
python src/test_model.py model=autoencoder_speaker dataset=nus_vocalset_vctk testing.checkpoint_file="$PATH/TO/MODEL/model.ckpt"
```

Set the `model` parameter to the model yaml file (in this case, `conf/model/autoencoder_speaker.yaml` is entered as `model=autoencoder_speaker`).
See `conf/testing/test.yaml` for more configurations.
The script converts the PyTorch model into ONNX format, which is needed for the plugin code.
Replace $PATH/TO/MODEL/model.ckpt with the path to the saved model file,
replace "./models/onnx/my_model.onnx" with the path where the ONNX file should be saved, and run:
```bash
python src/export_model_to_onnx.py export_to_onnx.checkpoint_file="$PATH/TO/MODEL/model.ckpt" export_to_onnx.export_filename="./models/onnx/my_model.onnx"
```

Copy the ONNX file to the C++ plugin code.
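Under the hood, the export boils down to a `torch.onnx.export` call. A rough sketch reusing the hypothetical `ExampleAutoencoder` from the training sketch above (shapes and tensor names are assumptions; the real logic is in `src/export_model_to_onnx.py`):

```python
# Rough sketch: load a checkpoint and export to ONNX. It reuses the hypothetical
# ExampleAutoencoder from the training sketch above; shapes and tensor names are
# assumptions, and the real logic is in src/export_model_to_onnx.py.
import torch

model = ExampleAutoencoder.load_from_checkpoint("path/to/model.ckpt")
model.eval()

dummy_audio = torch.randn(1, 1, 5 * 44100)  # (batch, channels, samples), assumed
dummy_dvec = torch.randn(1, 256)            # assumed speaker-embedding size

torch.onnx.export(
    model,
    (dummy_audio, dummy_dvec),
    "./models/onnx/my_model.onnx",
    input_names=["audio", "target_dvec"],
    output_names=["audio_out"],
)
```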
In summary, the scripts are:

- `download_data.py` -> downloads the datasets into `data/raw`, then picks the audio and places it into `data/interim`
- `download_pre-trained_models.py` -> downloads pre-trained models into `models/pre-trained` for later use
- `process_data.py` -> takes the audio from `data/interim`, processes it into xx sec blocks, cuts silences and places the result into `data/processed`
- `cache_dataset.py` -> caches the dataset's speech embeddings from wav files
- `train_model.py` -> trains on data from `data/processed`
- `test_model.py` -> tests (outputs metrics) and does prediction (outputs for listening) on data from `data/processed`
- `export_model_to_onnx.py` -> exports the model to ONNX