About the ELECTRA paper #128

@lgdgodv

Description

On page 13 of the paper, in the fine-tuning details section, it says:
"we searched for the best number of train epochs out of [10, 3] for each task. For SQuAD,
we decreased the number of train epochs to 2 to be consistent with BERT and RoBERTa"

My question: did ELECTRA use an early-stopping technique similar to RoBERTa's (train once for 10 epochs and stop early when the validation loss stops decreasing), or did you set up separate training runs for 3, 4, 5, 6, 7, 8, 9, and 10 epochs?
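
To make sure I'm asking clearly, here is a minimal Python sketch of the two strategies I mean. The function names, signatures, and the `(3, 10)` epoch options are my own illustration, not from the ELECTRA codebase:

```python
# Hypothetical sketch contrasting the two fine-tuning strategies in question;
# none of these names come from the ELECTRA code.

def finetune_with_early_stopping(model, train_epoch, val_loss_fn,
                                 max_epochs=10, patience=1):
    """Strategy A (RoBERTa-style): train once, stop when val loss stops improving."""
    best, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        train_epoch(model)             # one pass over the fine-tuning data
        loss = val_loss_fn(model)      # validation loss after this epoch
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # validation loss stopped decreasing
                break
    return model


def finetune_fixed_epoch_runs(make_model, train_epoch, val_loss_fn,
                              epoch_options=(3, 10)):
    """Strategy B: a separate fine-tuning run per epoch count; keep the best one."""
    best_model, best = None, float("inf")
    for num_epochs in epoch_options:
        model = make_model()           # fresh copy of the pretrained model
        for _ in range(num_epochs):
            train_epoch(model)
        loss = val_loss_fn(model)
        if loss < best:
            best_model, best = model, loss
    return best_model
```

Which of these two (if either) matches what was done for the GLUE tasks?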

Thanks.
