Add SAVED_CHECKPOINT event to Checkpoint handler #3440
@JeevanChevula thanks for the PR. However, let's rework the API of the new feature you are working on:
# checkpoint.py
class CheckpointEvents(EventEnum):
    SAVED_CHECKPOINT = "saved_checkpoint"

class Checkpoint(...):
    SAVED_CHECKPOINT = CheckpointEvents.SAVED_CHECKPOINT
    ...
from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, global_step_from_engine

trainer = ...
evaluator = ...
# Setup Accuracy metric computation on evaluator.
# evaluator.state.metrics contain 'accuracy',
# which will be used to define ``score_function`` automatically.
# Run evaluation on epoch completed event
# ...

to_save = {'model': model}
handler = Checkpoint(
    to_save, '/tmp/models',
    n_saved=2, filename_prefix='best',
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer)
)
evaluator.add_event_handler(Events.COMPLETED, handler)

# ---- New API with Checkpoint.SAVED_CHECKPOINT event: -----
@evaluator.on(Checkpoint.SAVED_CHECKPOINT)
def notify_when_saved(eval_engine, chkpt_handler):
    # We should pass the engine and the checkpoint instance to the attached handlers.
    assert eval_engine is evaluator
    assert chkpt_handler is handler
    print("Saved checkpoint:", chkpt_handler.last_checkpoint)
# ---- End of New API with Checkpoint.SAVED_CHECKPOINT event: -----

trainer.run(data_loader, max_epochs=10)
> ["best_model_9_accuracy=0.77.pt", "best_model_10_accuracy=0.78.pt"]

Let me know what you think.
Thanks for the suggestion. I’ll work on updating the PR to follow the API approach you mentioned with
Implementation note: I implemented the EventEnum-based SAVED_CHECKPOINT event as requested. However, Ignite's event system only supports single-parameter handlers: the originally requested two-parameter signature (handler(engine, checkpoint_handler)) failed during event firing and registration. The current implementation uses a single parameter, with the checkpoint accessible via engine._current_checkpoint_handler. All 61 core tests pass, confirming that the feature works without breaking existing functionality. The 3 distributed test errors are pre-existing infrastructure issues unrelated to this change.
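For readers following the thread, the single-parameter pattern described in this note can be sketched in plain Python. This is a toy model only, not Ignite's code: `ToyEngine`, `ToyCheckpoint`, and the public `current_checkpoint_handler` attribute are hypothetical stand-ins for the engine, the `Checkpoint` handler, and the private `engine._current_checkpoint_handler` attribute mentioned above, and a plain `Enum` stands in for `EventEnum`.

```python
from collections import defaultdict
from enum import Enum


class CheckpointEvents(Enum):
    # Stand-in for an EventEnum subclass; the value mirrors the thread.
    SAVED_CHECKPOINT = "saved_checkpoint"


class ToyEngine:
    """Hypothetical stand-in for an engine; not Ignite's Engine."""

    def __init__(self):
        self._handlers = defaultdict(list)
        # Assumed attribute: only set while the event is being fired.
        self.current_checkpoint_handler = None

    def on(self, event):
        # Decorator that registers a handler for the given event.
        def decorator(fn):
            self._handlers[event].append(fn)
            return fn
        return decorator

    def fire_event(self, event):
        # Handlers receive exactly one argument: the engine.
        for fn in self._handlers[event]:
            fn(self)


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint handler."""

    SAVED_CHECKPOINT = CheckpointEvents.SAVED_CHECKPOINT

    def __init__(self):
        self.last_checkpoint = None

    def __call__(self, engine, filename):
        self.last_checkpoint = filename  # pretend the file was written
        engine.current_checkpoint_handler = self
        try:
            engine.fire_event(self.SAVED_CHECKPOINT)
        finally:
            engine.current_checkpoint_handler = None


engine = ToyEngine()
checkpoint = ToyCheckpoint()
saved = []


@engine.on(ToyCheckpoint.SAVED_CHECKPOINT)
def notify_when_saved(eng):
    # Single-parameter handler: reach the checkpoint through the engine.
    saved.append(eng.current_checkpoint_handler.last_checkpoint)


checkpoint(engine, "best_model_1.pt")
```

The trade-off in this sketch is that the handler signature stays uniform (engine only), at the cost of the checkpoint reference being valid only while the event is firing.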
Thanks for working on this PR, @JeevanChevula.
I left a few more comments to improve the PR.
Pushing the current implementation with working SAVED_CHECKPOINT event functionality. I will add proper Google-style docstrings with version directives by Monday, per the contributing guidelines.
@JeevanChevula please rebase your PR branch; it now contains some extra commits.
d500fc8 to d81faa9
d81faa9 to fe4942d
fe4942d to 25e6adf
Updated docstring for CheckpointEvents class to clarify event trigger.
@JeevanChevula here is how the docs are rendered for this PR: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/handlers
Thanks for all the updates you made recently. The last thing I think we still have to do is reorganize the docs on CheckpointEvents a bit. The example you wrote in handlers.rst: https://github.com/pytorch/ignite/pull/3440/files#diff-fed20f17bf0c40747938f76730fe5cdc467c919bd185eeb9fe0870b861379681R38 should be moved into
Hi! Thank you for the detailed feedback on the documentation reorganization. I want to make sure I implement the changes exactly as you envision them. Three specific questions for clarification:
3. SAVED_CHECKPOINT attribute documentation
I’d like to follow your vision precisely, so confirming these points will help me make the changes correctly.
Here are details on your questions:
Point 1: After the last existing example (the “Customise the save_handler” section) but before the .. versionchanged:: notes.
Point 2: Remove the entire section completely (since the example will be moved to the docstring).
Point 3: Add it in an Attributes: section inside the Checkpoint docstring.
Make sure to check the rendered docs (on Netlify: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/).
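As a rough illustration of the layout these three points describe, here is a docstring skeleton. It is a sketch under assumptions, not the actual Checkpoint docstring: the placement of the Attributes section relative to the examples, the wording of each entry, and the `X.Y.Z` version placeholder are all made up for illustration.

```python
class Checkpoint:
    """Checkpoint handler (docstring skeleton only; body omitted).

    Attributes:
        SAVED_CHECKPOINT: event fired after a checkpoint has been saved.

    Examples:
        Customise the ``save_handler``:

        .. code-block:: python

            ...  # last existing example

        Attach a handler to the ``SAVED_CHECKPOINT`` event:

        .. code-block:: python

            ...  # example moved here from handlers.rst

    .. versionchanged:: X.Y.Z
        Placeholder for the existing version notes.
    """


# Keep a handle on the docstring so the section order can be inspected.
doc = Checkpoint.__doc__
```

The point of the ordering is that the new example sits with the other examples, and the version notes remain the last thing in the docstring.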
…VED_CHECKPOINT attribute
Hi! I’ve applied the requested changes:
On my Windows machine I wasn’t able to fully render the docs locally due to a Sphinx subprocess issue in my environment. I may be missing a dependency; I'll revisit this when I’m back. In the meantime, the Netlify preview should reflect these changes; if anything doesn’t look right, I’m happy to adjust. I’ll be OOO (Sep 26 – Oct 5, IST). Please feel free to leave comments or push small cleanups; I’ll pick up any remaining fixes as soon as I return. Thanks!
LGTM, thanks @JeevanChevula !
Fixes #934
This PR adds a "saved_checkpoint" event that fires after successful checkpoint saves.
Usage: