Hi,
My model training was forced to stop by an accident, so I used the "train_from" option to continue training from the checkpoint. However, the result differs from training from start to finish without interruption:
- The patience counter for "early stop" is not saved in the checkpoint (see the first sketch below).
- The order of the batches produced by train_iter is different when training from a checkpoint: with "train_from", iteration starts over from the beginning of the dataset, so the batches are very different from those at the step where the checkpoint was saved (see the second sketch below).
Note that I fixed all random seeds.
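For the first point, something like the following sketch is what I have in mind: storing the early-stopping counters and all RNG states in the checkpoint so a resumed run can pick up exactly where it left off. The attribute names (`best_score`, `current_tolerance`) and the `early_stopper` object are only placeholders for illustration, not the actual API:

```python
import random

import numpy as np
import torch


def save_checkpoint(path, model, optimizer, step, early_stopper):
    """Sketch: persist everything needed to resume identically,
    including early-stopping counters and all RNG states."""
    torch.save({
        "model": model.state_dict(),
        "optim": optimizer.state_dict(),
        "step": step,
        # hypothetical early-stopping state: best score and remaining patience
        "early_stop": {
            "best_score": early_stopper.best_score,
            "current_tolerance": early_stopper.current_tolerance,
        },
        # RNG states so dropout/shuffling continue from the same point
        "rng": {
            "python": random.getstate(),
            "numpy": np.random.get_state(),
            "torch": torch.get_rng_state(),
            "cuda": torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None,
        },
    }, path)


def load_checkpoint(path, model, optimizer, early_stopper):
    """Sketch: restore model, optimizer, early-stopping state and RNG states."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])
    early_stopper.best_score = ckpt["early_stop"]["best_score"]
    early_stopper.current_tolerance = ckpt["early_stop"]["current_tolerance"]
    random.setstate(ckpt["rng"]["python"])
    np.random.set_state(ckpt["rng"]["numpy"])
    torch.set_rng_state(ckpt["rng"]["torch"])
    if ckpt["rng"]["cuda"] is not None and torch.cuda.is_available():
        torch.cuda.set_rng_state_all(ckpt["rng"]["cuda"])
    return ckpt["step"]
```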
So it would be very convenient if a reproduction mechanism for resumed training could be added to the code base.
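For the second point, one possible approach (again only a sketch; `build_train_iter` is a hypothetical builder, and it assumes the iterator is rebuilt with the same seed so the original shuffling is replayed) would be to skip the batches that were already consumed before the checkpoint:

```python
from itertools import islice


def resume_iterator(train_iter, resumed_step, batches_per_step=1):
    """Sketch: discard the batches already consumed before the checkpoint
    so the resumed run sees the remaining data in the same order."""
    already_seen = resumed_step * batches_per_step
    # islice consumes and discards the first `already_seen` batches
    return islice(train_iter, already_seen, None)


# Usage sketch (names are placeholders):
# train_iter = build_train_iter(opt, seed=opt.seed)
# start_step = load_checkpoint(opt.train_from, model, optimizer, early_stopper)
# train_iter = resume_iterator(train_iter, resumed_step=start_step)
# for step, batch in enumerate(train_iter, start=start_step + 1):
#     ...
```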
Any help will be greatly appreciated.