This repository was archived by the owner on Oct 26, 2022. It is now read-only.

Training with fconv model converges but not with blstm #125

@patrik-lambert

Hi, I am trying to train an English-Finnish translation engine on an IT-domain data set (about 800,000 unique sentence pairs, 13 million English words), using 32,000 joint BPE operations (the resulting vocabulary is 13,500 for English and 22,700 for Finnish). The validation set (2,000 sentence pairs) was randomly extracted from the training data and removed from it.

With the fconv model, training completes without problems. The parameters are:
-model fconv -nenclayer 10 -nlayer 8 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -batchsize 32 -maxbatch 3200 \
-momentum 0.99 -timeavg -bptt 0 -nembed 512 -noutembed 512 -nhid 512

The training ends up with these values:
| checkpoint 018 | epoch 018 | 1004778 updates | s/checkpnt 5190 | words/s 4001 | lr 0.000025 | avg_dict_size 8692.39
| checkpoint 018 | epoch 018 | 1004778 updates | trainloss 1.09 | train ppl 2.13
| checkpoint 018 | epoch 018 | 1004778 updates | validloss 1.43 | valid ppl 2.69 | testloss 3.06 | test ppl 8.32
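As a sanity check on these numbers, the logged loss/perplexity pairs look consistent with perplexity being 2 raised to the loss, i.e. a loss measured in bits. That base-2 convention is an assumption inferred from the three pairs above, not something stated in the log itself. A minimal Python sketch (the function name is made up for illustration):

```python
# Assumption: the logged loss is in bits, so ppl = 2 ** loss.
# This is inferred from the logged pairs, not taken from the toolkit's docs.

def perplexity_from_loss(loss_bits: float) -> float:
    """Convert a per-token loss in bits to a perplexity."""
    return 2.0 ** loss_bits

# (loss, ppl) pairs copied from the checkpoint 018 log lines.
logged = {
    "train": (1.09, 2.13),
    "valid": (1.43, 2.69),
    "test": (3.06, 8.32),
}

for name, (loss, ppl) in logged.items():
    recomputed = perplexity_from_loss(loss)
    # Small differences are expected because the logged loss is rounded
    # to two decimals before being printed.
    print(f"{name}: logged ppl {ppl}, recomputed {recomputed:.2f}")
```

The recomputed values agree with the logged ones up to the rounding of the printed loss.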

With the blstm model, I have not been able to get training to converge. With the parameters suggested in the README, training ends after 2 epochs with a validation set perplexity of 99614929. I have tried different optimization algorithms and learning rates and different numbers of layers; in all cases the validation ppl is huge and the BLEU scores are very low. The lowest first-epoch validation ppl (6500, after which it increases) was obtained with the following parameters:
-model blstm -dropout 0.3 -optim sgd -lr 0.25 -clip 25 -bptt 25 -nembed 512 -noutembed 512 -nhid 512
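To put the blstm perplexities in perspective, they can be compared with a uniform guess over the target vocabulary, whose perplexity equals the vocabulary size. The sketch below assumes Finnish is the target side (22,700 types, from the figures above) and that loss is measured in bits, as the fconv log lines suggest; both are assumptions, and the helper name is made up for illustration.

```python
import math

# Assumptions (not confirmed by the toolkit): Finnish is the target language,
# its output vocabulary has 22,700 types, and loss is measured in bits.
FINNISH_VOCAB = 22_700

def loss_bits_from_perplexity(ppl: float) -> float:
    """Convert a perplexity to a per-token loss in bits."""
    return math.log2(ppl)

# A model that assigns equal probability to every target type has
# perplexity equal to the vocabulary size.
uniform_bits = math.log2(FINNISH_VOCAB)

runs = [
    ("README parameters, final validation ppl", 99_614_929),
    ("lowest first-epoch validation ppl", 6_500),
]

for label, ppl in runs:
    bits = loss_bits_from_perplexity(ppl)
    side = "worse" if bits > uniform_bits else "better"
    print(f"{label}: {bits:.2f} bits/token, {side} than uniform ({uniform_bits:.2f} bits)")
```

Under these assumptions, the run with the README parameters scores worse than uniform guessing, while the lowest first-epoch value is only somewhat better than uniform.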

Any idea what could be happening, or any suggestions? Thanks.
