This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Why is the BLEU obtained from the provided trained model much higher than the value reported in the paper? #11

@PanXiebit

Description


I downloaded the provided trained model and evaluated it on the test dataset, but I get a much higher BLEU than the values reported in the paper.

I use the provided scripts without changing anything:

python preprocess.py \
  --source-lang de \
  --target-lang en \
  --trainpref data/wmt14.en-de/train \
  --validpref data/wmt14.en-de/valid \
  --testpref data/wmt14.en-de/test \
  --destdir output/data-bin/wmt14.de-en \
  --srcdict output/maskPredict_de_en/dict.de.txt \
  --tgtdict output/maskPredict_de_en/dict.en.txt

python generate_cmlm.py output/data-bin/wmt14.${src}-${tgt}  \
    --path ${model_dir}/checkpoint_best.pt \
    --task translation_self \
    --remove-bpe True \
    --max-sentences 20 \
    --decoding-iterations ${iteration} \
    --decoding-strategy mask_predict 

I get 34.42 on WMT14 DE->EN, 35.20 on WMT16 EN->RO, and 35.62 on WMT RO->EN. These values are much higher than those in the original paper. This is strange; what happened?
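Gaps like this often come down to the BLEU evaluation setup (tokenization, whether BPE is removed before scoring, multi-bleu vs. sacreBLEU), though I cannot say for sure that this is the cause here. To make the metric concrete, here is a minimal sentence-level BLEU sketch; it is deliberately simplified (single reference, no smoothing) and is not the scorer that generate_cmlm.py uses:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions times a brevity penalty (no smoothing, one
    reference). Returns a score in [0, 100]."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        precisions.append(overlap / total)
    # brevity penalty: penalize hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# toy pair: one wrong word lowers all four n-gram precisions
hyp = "the cat sat on the mat .".split()
ref = "the cat sat on the mats .".split()
print(round(sentence_bleu(hyp, ref), 2))
print(sentence_bleu(ref, ref))  # identical sentences score 100.0
```

The point is that the score depends entirely on what the token lists look like, so scoring before vs. after BPE removal, or with different tokenizers, gives different numbers for the same translations.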
