Can't reproduce the results of the paper

Hi, team.
I am very grateful that you provided the code and data splits for your CPC audio paper (https://arXiv.org/abs/2002.02848).

First, I pretrained Mod. CPC on Libri-100 and froze the features for the Common Voice 1-hour ASR task. I got an average PER of 45.2% over 5 languages (es, fr, it, ru, tt), versus the 43.9% reported in your paper (Table 3). My result is 1.3 points worse than yours, which seems reasonably close.

But when I tested the pretrained features on the 5-hour Common Voice ASR tasks (es, fr, it, ru, tt), I only got an average PER of **42.5%** with frozen features, which is 3.7 points worse than the reported **38.8%** (Table 5 in the paper). With fine-tuned features the gap was even bigger: my average PER was **37.2%**, versus **31.0%** in the paper.
Training from scratch on the 5-hour Common Voice ASR tasks also performed badly: I got an average PER of **43.2%**, far behind the **38.3%** reported in your paper.
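To make the comparison easier to check, here is a small sketch that tabulates the gaps between my runs and the paper. The PER values are the ones quoted above; the dictionary keys and variable names are just labels I made up for this illustration, not anything from the repository.

```python
# Average PER (%) over es, fr, it, ru, tt, as (my run, reported in the paper).
# The numbers are copied from this issue; the keys are hypothetical labels.
results = {
    "1h frozen (Table 3)": (45.2, 43.9),
    "5h frozen (Table 5)": (42.5, 38.8),
    "5h fine-tuned": (37.2, 31.0),
    "5h from scratch": (43.2, 38.3),
}

# A positive gap means my PER is higher (worse) than the paper's.
gaps = {name: round(mine - paper, 1) for name, (mine, paper) in results.items()}

for name, gap in gaps.items():
    print(f"{name}: +{gap:.1f} points")
```

The 1-hour setting is off by 1.3 points, while all three 5-hour settings are off by 3.7 points or more, with the fine-tuned case being the worst at 6.2 points.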

I would be very thankful if you could provide more detailed hyper-parameters to help me reproduce your results.
In particular, I noticed an optional argument **--LSTM** in ./eval/common_voices_eval.py that adds an LSTM layer before the linear softmax layer. I think it would significantly increase the model capacity and might lead to better performance. Did you use it?
Thank you very much!

For now, I have used the default hyper-parameters for the Common Voice ASR transfer experiments:
--batchSize 8
--lr 2e-4
--nEpoch 30
--kernelSize 8
......

