VAT-SS: Investigation of Speech Separation Models Including Video Source

About • Installation • How To Extract Video Embeddings • How To Train • How To Evaluate • Credits • License

About

VAT-SS is the Andrey-Vera-Teasgen-SpeechSeparation family of models. This repository lets you train and evaluate the SS models described in the report.

Note that the base model in all configs is the state-of-the-art DPTN-AV-repack-by-teasgen, but you can also use the other SS models reported in the paper (take a look at the other configs).

Installation

Follow these steps to install the project:

(Optional) Create and activate a new environment using conda:

# create env
conda create -n project_env python=3.10

# activate env
conda activate project_env

Install all required packages:
```
pip install -r requirements.txt
```
Install pre-commit:
```
pre-commit install
```

How To Extract Video Embeddings

This step is mandatory before running the training and evaluation scripts for audio-video models. Video embeddings are extracted in advance to speed up the forward pass.

bash download_lipreader.sh

python make_embeddings.py \
    --cfg_path src/lipreader/configs/lrw_resnet18_mstcn.json \
    --lipreader_path lrw_resnet18_mstcn_video.pth \
    --mouths_dir mouths \
    --embeds_dir embeddings

The embeddings will be saved to the directory passed as --embeds_dir. Set the correct path to this directory in all Hydra configs at the Datasets level.
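Before pointing the configs at the embeddings directory, it can help to check that extraction covered every input. The sketch below is only an illustration: `missing_embeddings` is a hypothetical helper, not part of this repository, and it assumes that each file in the mouths directory produces an embedding file with the same name stem, which may not match what make_embeddings.py actually writes.

```python
from pathlib import Path


def missing_embeddings(mouths_dir, embeds_dir):
    """Return sorted stems of files in mouths_dir without a same-stem file in embeds_dir.

    Assumption: inputs and embeddings are matched by file name stem.
    """
    mouth_stems = {p.stem for p in Path(mouths_dir).iterdir() if p.is_file()}
    embed_stems = {p.stem for p in Path(embeds_dir).iterdir() if p.is_file()}
    return sorted(mouth_stems - embed_stems)
```

With the directory names from the command above, `missing_embeddings("mouths", "embeddings")` returns an empty list when every input has a matching embedding under that assumption.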

How To Train

Exact reproduction of training requires a single A100 80GB GPU. On smaller hardware, implement and use gradient accumulation.
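Gradient accumulation can be sketched as follows. This is a minimal illustration, not the repository's trainer: `accumulation_steps` and `train_one_epoch` are hypothetical helpers, and the loop only assumes PyTorch-style objects (a loss that supports division and `.backward()`, an optimizer with `.step()` and `.zero_grad()`).

```python
import math


def accumulation_steps(target_batch_size, micro_batch_size):
    """Number of micro-batches to accumulate to reach the target effective batch size."""
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return math.ceil(target_batch_size / micro_batch_size)


def train_one_epoch(model, loader, loss_fn, optimizer, accum_steps):
    """Run one epoch with gradient accumulation; return the number of optimizer updates."""
    optimizer.zero_grad()
    updates = 0
    n_batches = 0
    for n_batches, (inputs, targets) in enumerate(loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Scale each micro-batch loss so the summed gradients approximate
        # the gradient of the mean loss over the effective batch.
        (loss / accum_steps).backward()
        if n_batches % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            updates += 1
    # Apply gradients left over from an incomplete final group.
    # Note: this group is still scaled by accum_steps, so its update is smaller.
    if n_batches % accum_steps != 0:
        optimizer.step()
        optimizer.zero_grad()
        updates += 1
    return updates
```

For example, if only micro-batches of 4 fit in memory, `accumulation_steps(16, 4)` gives 4 accumulation steps for an effective batch of 16. Layers whose behavior depends on batch statistics, such as batch normalization, still see the micro-batch, so results may differ slightly from true large-batch training.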

To train a model, register in WandB and run the following commands:

Two-step training:

python3 train.py -cn dptn_wav_av.yaml dataloader.batch_size=16 writer.run_name=av_dptn_wav_av_v1_video_tanh_gate

Training logs are available on WandB:

DPRNN & DPTN https://wandb.ai/teasgen/ss/overview
ConvTasNet https://wandb.ai/verabuylova-nes/ss/overview
VoiceFilter & RTFS https://wandb.ai/aapetukhov-new-economic-school/ss?nw=nwuseraapetukhov

How To Evaluate

All predictions will be saved to the data/saved/inferenced/<dataset part> directory under corresponding names. Download the SOTA pretrained model using:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1egOSgh3qaADxWpxd379nmhLrfZ-5xYEf' -O ./model.tar
tar xvf ./model.tar

To run inference and calculate metrics, provide a custom dataset and run:

python3 inference.py -cn inference_dptn_av.yaml dataloader.batch_size=32 inferencer.from_pretrained=model_best.pth

Set dataloader.batch_size to no more than len(dataset).
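One way to see why the batch size limit matters: if the data loader drops incomplete batches (an assumption about the inference config, not something stated here), a batch size larger than the dataset yields no batches at all. The helpers below, `num_batches` and `clamp_batch_size`, are hypothetical and only illustrate the arithmetic.

```python
def num_batches(dataset_len, batch_size, drop_last):
    """Number of batches a loader would yield for the given settings."""
    if drop_last:
        return dataset_len // batch_size
    # Ceiling division: the last, smaller batch is kept.
    return -(-dataset_len // batch_size)


def clamp_batch_size(requested, dataset_len):
    """Largest batch size that does not exceed the dataset length."""
    if dataset_len <= 0:
        raise ValueError("dataset must not be empty")
    return max(1, min(requested, dataset_len))
```

For a dataset of 10 items, `clamp_batch_size(32, 10)` gives 10, which can then be passed as `dataloader.batch_size=10` on the command line.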

If you don't have ground truth (GT), change device_tensors in the inference_dptn_av.yaml config to device_tensors: ["mix_spectrogram", "mix", "s1_embedding", "s2_embedding"]. Metrics will then not be calculated, and only predictions will be saved.

To evaluate the computational performance of the model, run:

python3 profiler.py

Credits

This repository is based on a PyTorch Project Template.
