How to reproduce the same training process when using "train_from"

Dear maintainers,

When model training is forced to stop by an accident, I use the "train_from" option to continue training from the checkpoint. But the result differs from training from start to finish without stopping:

1. The stored patience for "early stop" is not saved into the checkpoint.
2. The order of the data batches provided by train_iter is different when training from a checkpoint. (With train_from, iteration starts over from the beginning of the dataset, so the batches differ greatly from those that would have followed the step at which the checkpoint was saved.)

Note that I fixed all random seeds.

So it would be very convenient if a reproduction mechanism could be added to the code base.
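To make the request concrete, here is a minimal sketch of what such a mechanism might look like. It is not the library's API: `ResumableBatches`, `EarlyStopping`, and their `state_dict` keys are hypothetical names chosen for illustration. The sketch assumes the batch order can be derived from the seed and the epoch number alone, so the checkpoint only needs to record the epoch, the offset within the epoch, and the early-stopping counters.

```python
import random


class ResumableBatches:
    """Hypothetical batch iterator whose position can be saved and restored."""

    def __init__(self, data, batch_size, seed):
        self.data = list(data)
        self.batch_size = batch_size
        self.seed = seed
        self.epoch = 0
        self.offset = 0  # index of the next example within the current epoch

    def _order(self):
        # The shuffle depends only on (seed, epoch), so it can be rebuilt
        # after a restart. Recomputed per batch for simplicity, not speed.
        idx = list(range(len(self.data)))
        random.Random(self.seed * 1_000_003 + self.epoch).shuffle(idx)
        return idx

    def next_batch(self):
        if self.offset >= len(self.data):
            self.epoch += 1
            self.offset = 0
        idx = self._order()[self.offset:self.offset + self.batch_size]
        self.offset += len(idx)
        return [self.data[i] for i in idx]

    def state_dict(self):
        return {"epoch": self.epoch, "offset": self.offset}

    def load_state_dict(self, state):
        self.epoch = state["epoch"]
        self.offset = state["offset"]


class EarlyStopping:
    """Hypothetical early-stopping tracker with checkpointable counters."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_steps = 0  # validations since the last improvement

    def update(self, val_loss):
        """Record a validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience

    def state_dict(self):
        return {"best": self.best, "bad_steps": self.bad_steps}

    def load_state_dict(self, state):
        self.best = state["best"]
        self.bad_steps = state["bad_steps"]


if __name__ == "__main__":
    # Uninterrupted run of 7 steps.
    full = ResumableBatches(range(10), batch_size=3, seed=0)
    expected = [full.next_batch() for _ in range(7)]

    # Interrupted run: 3 steps, checkpoint, then 4 steps in a fresh object.
    first = ResumableBatches(range(10), batch_size=3, seed=0)
    seen = [first.next_batch() for _ in range(3)]
    checkpoint = {"data_iter": first.state_dict()}

    resumed = ResumableBatches(range(10), batch_size=3, seed=0)
    resumed.load_state_dict(checkpoint["data_iter"])
    seen += [resumed.next_batch() for _ in range(4)]

    assert seen == expected
```

In a real training loop, these two dictionaries would presumably be written into the checkpoint alongside the model and optimizer state and restored when "train_from" is used. Any other source of randomness, such as dropout, would need its generator state saved in the same way.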
Any help will be greatly appreciated.

How to reproduce the same training process when using "train_from" #2006
