About the ELECTRA paper #128

@lgdgodv

Description

On page 13 of the paper, in the fine-tuning details section, it says:
"we searched for the best number of train epochs out of [10, 3] for each task. For SQuAD,
we decreased the number of train epochs to 2 to be consistent with BERT and RoBERTa"

My question: did ELECTRA use an early-stopping technique similar to RoBERTa's (train once for 10 epochs and stop early when the validation loss stops decreasing), or did you set up separate training runs for 3, 4, 5, 6, 7, 8, 9, and 10 epochs?
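
To make sure I'm asking clearly, here is a minimal Python sketch of the two strategies I mean. The function names, signatures, and the `(3, 10)` epoch options are my own illustration, not from the ELECTRA codebase:

```python
# Hypothetical sketch contrasting the two fine-tuning strategies in question;
# none of these names come from the ELECTRA code.

def finetune_with_early_stopping(model, train_epoch, val_loss_fn,
                                 max_epochs=10, patience=1):
    """Strategy A (RoBERTa-style): train once, stop when val loss stops improving."""
    best, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        train_epoch(model)             # one pass over the fine-tuning data
        loss = val_loss_fn(model)      # validation loss after this epoch
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # validation loss stopped decreasing
                break
    return model


def finetune_fixed_epoch_runs(make_model, train_epoch, val_loss_fn,
                              epoch_options=(3, 10)):
    """Strategy B: a separate fine-tuning run per epoch count; keep the best one."""
    best_model, best = None, float("inf")
    for num_epochs in epoch_options:
        model = make_model()           # fresh copy of the pretrained model
        for _ in range(num_epochs):
            train_epoch(model)
        loss = val_loss_fn(model)
        if loss < best:
            best_model, best = model, loss
    return best_model
```

Which of these two (if either) matches what was done for the GLUE tasks?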

Thanks.
