Description
With the `keep_checkpoint` option we can specify how many checkpoints should be kept. However, checkpoints are saved and removed in FIFO order and never ranked by quality. That means that if your best-performing checkpoint is written early in training, it will eventually get removed anyway.
OpenNMT-py/onmt/models/model_saver.py
Lines 79 to 83 in 0734288
```python
if self.keep_checkpoint > 0:
    if len(self.checkpoint_queue) == self.checkpoint_queue.maxlen:
        todel = self.checkpoint_queue.popleft()
        self._rm_checkpoint(todel)
    self.checkpoint_queue.append(chkpt_name)
```
As an alternative approach, I would suggest that if validation is done before each save step, the validation loss is also passed to the save method. `self.checkpoint_queue` could then contain `(loss, chkpt_name)` tuples, and after each append the queue would be sorted on loss. That way, only the worst-performing checkpoints are removed.
Things to consider: `ModelSaver` would then need to know whether the metric is higher=better or lower=better, and a fallback needs to be in place for when no loss is passed.
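To make the idea concrete, here is a minimal standalone sketch of the proposed keeper, not the actual `ModelSaver` API. The class name `BestCheckpointKeeper`, the `rm_fn` callback (standing in for `self._rm_checkpoint`), and the `higher_is_better` flag are all hypothetical; the fallback treats a missing loss as the worst possible score so it becomes the first removal candidate.

```python
class BestCheckpointKeeper:
    """Illustrative sketch: keep only the k best checkpoints by validation loss.

    Hypothetical names throughout; this mirrors the suggestion of storing
    (loss, chkpt_name) tuples and sorting after each append.
    """

    def __init__(self, keep_checkpoint, higher_is_better=False, rm_fn=None):
        self.keep_checkpoint = keep_checkpoint
        self.higher_is_better = higher_is_better
        self.rm_fn = rm_fn or (lambda name: None)  # stand-in for _rm_checkpoint
        self.kept = []  # list of (loss, chkpt_name), best score first

    def save(self, chkpt_name, loss=None):
        if loss is None:
            # Fallback when no validation loss is available: treat the
            # checkpoint as worst-possible so it is removed first.
            loss = float("-inf") if self.higher_is_better else float("inf")
        self.kept.append((loss, chkpt_name))
        # Best first: ascending loss, unless a higher metric is better.
        self.kept.sort(key=lambda t: t[0], reverse=self.higher_is_better)
        if len(self.kept) > self.keep_checkpoint:
            _, todel = self.kept.pop()  # drop the worst-scoring checkpoint
            self.rm_fn(todel)
```

For example, with `keep_checkpoint=2` and losses 3.0, 1.0, 2.0 arriving in that order, the checkpoint with loss 3.0 is the one removed, even though it was saved first.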