vbuylova/speech_separation

 
 

VAT-SS: Investigation of Speech Separation Models Including Video Source

[🔥 VAT-SS Report]

About | Installation | How To Extract Video Embeddings | How To Train | How To Evaluate | Credits | License

About

VAT-SS is the Andrey-Vera-Teasgen-SpeechSeparation family of models. This repository lets you train and evaluate the speech separation (SS) models described in the report.

Note that the base model in all configs is the state-of-the-art DPTN-AV-repack-by-teasgen, but you can also use the other SS models reported in the paper (see the other configs).

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n project_env python=3.10
    
    # activate env
    conda activate project_env
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

How To Extract Video Embeddings

This step is required before running the training and evaluation scripts for audio-video models. Extracting the video embeddings in advance speeds up the forward pass.

bash download_lipreader.sh

python make_embeddings.py \
    --cfg_path src/lipreader/configs/lrw_resnet18_mstcn.json \
    --lipreader_path lrw_resnet18_mstcn_video.pth \
    --mouths_dir mouths \
    --embeds_dir embeddings

The embeddings will be saved to the directory given by --embeds_dir. Set the correct path to this directory in all Hydra configs at the dataset level.
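Before pointing the configs at the embeddings directory, it can help to confirm that extraction covered every input. The following is a minimal sketch and not part of the repository. It assumes that make_embeddings.py writes one output file per input file and keeps the file stem, which this README does not state. The helper name `missing_embeddings` is hypothetical.

```python
from pathlib import Path


def missing_embeddings(mouths_dir, embeds_dir):
    """Return stems of files in mouths_dir with no same-stem file in embeds_dir.

    Assumption (not confirmed by the repository): one embedding file is
    written per mouth file, and both share the same stem.
    """
    mouth_stems = {p.stem for p in Path(mouths_dir).iterdir() if p.is_file()}
    embed_stems = {p.stem for p in Path(embeds_dir).iterdir() if p.is_file()}
    # Stems present among the inputs but absent among the outputs.
    return sorted(mouth_stems - embed_stems)
```

With the directory names used in the command above, `missing_embeddings("mouths", "embeddings")` returns an empty list when every input has a matching embedding, under the stated assumption.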

How To Train

A single A100 80 GB GPU is needed to reproduce training exactly. Otherwise, implement and use gradient accumulation.

To train a model, register in WandB and run the following commands:

Two-step training:
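Gradient accumulation is not implemented in this repository, so the sketch below only illustrates the idea on a toy problem in plain Python. It shows that averaging the gradients of equally sized micro-batches reproduces the gradient of the full batch, which is why a batch of 16 can be emulated with, for example, 4 micro-batches of 4. The model (`y = w * x`), the data, and the function names are all assumptions made for illustration.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_steps so that the
    accumulated sum equals the gradient of the full batch.
    """
    assert len(xs) % micro_batch_size == 0, "use equally sized micro-batches"
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return total


xs = [float(i) for i in range(16)]  # one "large" batch of 16 examples
ys = [3.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=4)
print(abs(full - accum) < 1e-9)
```

In a PyTorch training loop the same pattern usually means dividing each micro-batch loss by the number of accumulation steps, calling `backward()` on every micro-batch, and calling `optimizer.step()` and `optimizer.zero_grad()` once per group. Layers whose statistics depend on the batch size, such as batch normalization, would not be exactly equivalent.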

python3 train.py -cn dptn_wav_av.yaml dataloader.batch_size=16 writer.run_name=av_dptn_wav_av_v1_video_tanh_gate

Training logs are available in WandB.

How To Evaluate

All generated predictions will be saved to the data/saved/inferenced/<dataset part> directory with corresponding names. Download the SOTA pretrained model using:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1egOSgh3qaADxWpxd379nmhLrfZ-5xYEf' -O ./model.tar
tar xvf ./model.tar

To run inference and calculate metrics, provide a custom dataset and run:

python3 inference.py -cn inference_dptn_av.yaml dataloader.batch_size=32 inferencer.from_pretrained=model_best.pth

Set dataloader.batch_size to no more than len(dataset).

If you don't have ground truth (GT), change device_tensors in the inference_dptn_av.yaml config to device_tensors: ["mix_spectrogram", "mix", "s1_embedding", "s2_embedding"]. Metrics will then not be calculated and only predictions will be saved.
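For small custom datasets, the batch size constraint can be applied before building the command line. This is a minimal sketch with hypothetical helper names. It only formats the `dataloader.batch_size` override shown in the command above and does not touch the repository's code.

```python
def clamp_batch_size(requested, dataset_len):
    """Return the requested batch size, capped at the dataset length."""
    if dataset_len < 1:
        raise ValueError("dataset is empty")
    if requested < 1:
        raise ValueError("batch size must be positive")
    return min(requested, dataset_len)


def batch_size_override(requested, dataset_len):
    """Format the capped value as a command-line override string."""
    return f"dataloader.batch_size={clamp_batch_size(requested, dataset_len)}"
```

For a dataset of 10 items and a requested batch size of 32, `batch_size_override(32, 10)` yields `dataloader.batch_size=10`, which can replace the value in the inference command.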

To evaluate the computational performance of the model, run:

python3 profiler.py

Credits

This repository is based on a PyTorch Project Template.

License

License
