About • Installation • How To Extract Video Embeddings • How To Train • How To Evaluate • Credits • License
VAT-SS is the Andrey-Vera-Teasgen-SpeechSeparation family of models. This repository lets users train and evaluate the speech separation (SS) models described in the report.
Note that the base model in all configs is the state-of-the-art DPTN-AV-repack-by-teasgen, but you can also use the other SS models reported in the paper (see the other configs).
Follow these steps to install the project:
- (Optional) Create and activate a new environment using conda.
  # create env
  conda create -n project_env python=3.10
  # activate env
  conda activate project_env
- Install all required packages:
  pip install -r requirements.txt
- Install pre-commit:
  pre-commit install
This section is mandatory for running the Train and Evaluation scripts for Audio-Video models. Video embeddings must be extracted in advance to speed up the forward pass.
bash download_lipreader.sh
python make_embeddings.py \
--cfg_path src/lipreader/configs/lrw_resnet18_mstcn.json \
--lipreader_path lrw_resnet18_mstcn_video.pth \
--mouths_dir mouths \
--embeds_dir embeddings
The embeddings will be saved to --embeds_dir. Set the correct path to this directory in all Hydra configs at the Datasets level.
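Before pointing the Hydra configs at the embeddings directory, it can help to confirm that every input in the mouths directory produced an embedding. The sketch below is not part of the repository. It assumes, hypothetically, that each embedding file keeps the stem of its source file, which may not match what make_embeddings.py actually writes; the function name `missing_embeddings` is also made up for illustration.

```python
from pathlib import Path


def missing_embeddings(mouths_dir, embeds_dir):
    """Return sorted stems of files in mouths_dir with no same-stem file in embeds_dir.

    Assumption (hypothetical): an embedding keeps the stem of its source file,
    e.g. mouths/clip01.<ext> -> embeddings/clip01.<ext>.
    """
    mouth_stems = {p.stem for p in Path(mouths_dir).iterdir() if p.is_file()}
    embed_stems = {p.stem for p in Path(embeds_dir).iterdir() if p.is_file()}
    return sorted(mouth_stems - embed_stems)


# Example call, using the directory names from the command above:
# missing = missing_embeddings("mouths", "embeddings")
# print(f"{len(missing)} files without embeddings")
```

An empty result suggests the extraction covered every input; any stems returned are candidates for a rerun.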
Exactly reproducing training requires a single A100 80 GB GPU. On smaller hardware, implement and use gradient accumulation.
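The idea behind gradient accumulation can be sketched without any framework: split the batch into equal micro-batches, scale each micro-batch gradient by 1 / number of steps, and sum. The toy scalar model and the micro-batch size of 4 below are illustrative assumptions, not code from this repository. In a PyTorch training loop, the same effect is usually obtained by dividing the loss by the number of accumulation steps before `backward()` and calling `optimizer.step()` and `optimizer.zero_grad()` once per group of micro-batches.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_steps, which mirrors
    dividing the loss by the number of accumulation steps.
    """
    if len(xs) % micro_batch != 0:
        raise ValueError("this sketch assumes equal-sized micro-batches")
    num_steps = len(xs) // micro_batch
    total = 0.0
    for i in range(0, len(xs), micro_batch):
        chunk_grad = grad_mse(w, xs[i:i + micro_batch], ys[i:i + micro_batch])
        total += chunk_grad / num_steps
    return total


# Toy data: one "full" batch of 16 samples, processed as 4 micro-batches of 4.
xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch=4)
print(f"full-batch gradient: {full}, accumulated gradient: {accum}")
```

The equivalence holds for losses averaged over the batch; layers whose statistics depend on the batch size may still behave differently with smaller micro-batches.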
To train a model, register in WandB and run the following commands:
Two-step training:
python3 train.py -cn dptn_wav_av.yaml dataloader.batch_size=16 writer.run_name=av_dptn_wav_av_v1_video_tanh_gate
Training logs are also available in WandB:
- DPRNN & DPTN https://wandb.ai/teasgen/ss/overview
- ConvTasNet https://wandb.ai/verabuylova-nes/ss/overview
- VoiceFilter & RTFS https://wandb.ai/aapetukhov-new-economic-school/ss?nw=nwuseraapetukhov
All predictions will be saved to the data/saved/inferenced/<dataset part> directory with corresponding names.
Download the SOTA pretrained model using:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1egOSgh3qaADxWpxd379nmhLrfZ-5xYEf' -O ./model.tar
tar xvf ./model.tar
To run inference and calculate metrics, provide a custom dataset and run:
python3 inference.py -cn inference_dptn_av.yaml dataloader.batch_size=32 inferencer.from_pretrained=model_best.pth
Set dataloader.batch_size to no more than len(dataset).
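For small custom datasets, the batch-size constraint can be applied automatically when building the command. The following is a minimal sketch, not part of the repository: the helper names `safe_batch_size` and `inference_command` are hypothetical, and the dataset length is assumed to be known in advance.

```python
def safe_batch_size(requested, dataset_len):
    """Clamp the requested batch size so it never exceeds the dataset size."""
    if dataset_len < 1:
        raise ValueError("dataset must contain at least one sample")
    return min(requested, dataset_len)


def inference_command(requested, dataset_len, checkpoint="model_best.pth"):
    """Build the inference command from this README with a clamped batch size."""
    batch_size = safe_batch_size(requested, dataset_len)
    return [
        "python3", "inference.py", "-cn", "inference_dptn_av.yaml",
        f"dataloader.batch_size={batch_size}",
        f"inferencer.from_pretrained={checkpoint}",
    ]


# Example: a dataset of 10 samples with a requested batch size of 32.
print(" ".join(inference_command(requested=32, dataset_len=10)))
```

The returned list can be passed directly to `subprocess.run` if the command should be launched from Python.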
If you don't have ground truth (GT), change device_tensors in the inference_dptn_av.yaml config to device_tensors: ["mix_spectrogram", "mix", "s1_embedding", "s2_embedding"]. In that case metrics won't be calculated and only predictions will be saved.
To evaluate the computational performance of the model, run:
python3 profiler.py
This repository is based on a PyTorch Project Template.