Code for Quantization and Disentanglement for Cross-Modal Alignment in Neural Speech Reconstruction from Brain Activity
Gwilliams et al. Dataset https://osf.io/ag3kj
Armeni et al. Dataset https://data.ru.nl/collections/di/dccn/DSC_3011085.05_995
GigaSpeech Dataset (XS) https://github.com/SpeechColab/GigaSpeech
Download ns3_facodec_encoder.bin and ns3_facodec_decoder.bin from FACodec.
Download pretrained.pth from AudioMAE.
Download audioldm2-speech-gigaspeech.pth from AudioLDM2.
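Before training, it can help to confirm that all four pretrained files are in place. The sketch below is not part of this repository: the helper name `missing_checkpoints` and the `checkpoints` directory are assumptions for illustration, so adjust the path to wherever the training scripts expect the files.

```python
from pathlib import Path

# File names taken from the download steps above.
REQUIRED_FILES = [
    "ns3_facodec_encoder.bin",
    "ns3_facodec_decoder.bin",
    "pretrained.pth",
    "audioldm2-speech-gigaspeech.pth",
]


def missing_checkpoints(directory, names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files in `directory`."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # "checkpoints" is a hypothetical location, not one defined by this repo.
    missing = missing_checkpoints("checkpoints")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All pretrained files found.")
```

The check only looks at file names; it does not verify that a download completed or that a file is intact.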
Follow the steps below to set up the virtual environment.
Create and activate the environment:
conda create -n QDBrain python=3.10
conda activate QDBrain
Install dependencies in the listed order:
pip install -r requirements.txt
Run the training scripts:
CUDA_VISIBLE_DEVICES=1 python train_proj.py
CUDA_VISIBLE_DEVICES=1 python train_disentangle.py
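The training commands set CUDA_VISIBLE_DEVICES=1, which exposes only physical GPU 1 to the process (inside the process it appears as device 0). On a machine with a different GPU layout, the index needs to change. A minimal sketch of a launcher that sets the variable from Python follows; the helper name `training_env` is hypothetical and not part of this repository.

```python
import os


def training_env(gpu_index, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Example usage (commented out so importing this file starts no training):
# import subprocess
# subprocess.run(["python", "train_proj.py"], env=training_env(0), check=True)
# subprocess.run(["python", "train_disentangle.py"], env=training_env(0), check=True)
```

Passing a copy of the environment keeps the change local to the launched process, so the parent shell is unaffected.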