# Training recipes

- [x] BVCC (24/05/14)
- [x] SOMOS
- [x] NISQA
- [x] TMHINT-QI
- [x] SingMOS (VMC'24 track2)
- [x] PSTN
- [x] Tencent

# Benchmarks

- [x] VMC'22 OOD track - having difficulty downloading from the BC server now...
- [x] VMC'23
- [ ] Zoomed-in BVCC (VMC'24 track1)

# Classic models

## Non-intrusive (single-ended)

- [x] LDNet https://github.com/unilight/LDNet (24/05/14)
- [x] SSL-MOS https://github.com/nii-yamagishilab/mos-finetune-ssl (24/05/22)
- [x] UTMOS https://github.com/sarulab-speech/UTMOS22 (24/06/12)
- [x] RAMP https://arxiv.org/abs/2308.16488

###### Do I want to implement these methods?

- PAM https://github.com/soham97/PAM
- SpeechLMScore https://github.com/soumimaiti/speechlmscore_tool

## Intrusive (double-ended)

- [ ] SVSNet https://ieeexplore.ieee.org/document/9716822

# Experimental features

## Output

- [x] continuous with L1/L2 loss
- [x] discrete (categorical) with cross-entropy loss

## Input feature

- [x] semantic SSL
- Linguistic representation from ASR
  - [ ] Whisper PPG
  - [ ] sxliu PPG
- [ ] audio codec
- general audio representation
  - Supported in S3PRL
    - [x] SSAST
  - [ ] CLAP https://github.com/microsoft/CLAP
  - [ ] Audio-MAE https://github.com/facebookresearch/AudioMAE
  - [ ] BEATs https://github.com/microsoft/unilm/tree/master/beats

# Improvements

- [x] In the training loop, automatically save models with good results on the dev set
- [x] Model ensemble
- [x] Model averaging
- [ ] Inference with outside pre-trained models
- [x] Upload/download/inference with models trained in this toolkit (to where? HuggingFace?)
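The "Model averaging" item above usually means averaging the weights of the best dev-set checkpoints before inference, as opposed to ensembling their predictions. A minimal sketch of that idea, with a hypothetical `average_checkpoints` helper and plain Python lists standing in for tensors (a real implementation would average `torch` state dicts entry by entry):

```python
def average_checkpoints(state_dicts):
    """Average parameter values across several checkpoints.

    Each state_dict maps parameter names to lists of floats; this is a
    hypothetical stand-in for averaging torch tensors in real state dicts.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # element-wise mean over the same parameter from every checkpoint
        averaged[name] = [
            sum(sd[name][i] for sd in state_dicts) / n
            for i in range(len(state_dicts[0][name]))
        ]
    return averaged

# e.g. averaging the top-2 dev-set checkpoints
avg = average_checkpoints([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
```

The averaged weights are then loaded into a single model, so inference cost stays the same as for one checkpoint.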