Skip to content

Conversation

JeevanChevula
Copy link
Contributor

@JeevanChevula JeevanChevula commented Sep 2, 2025

Fixes #934

This PR adds a "saved_checkpoint" event that fires after successful checkpoint saves.

Usage:

# Users need to register the event first
engine.register_events("saved_checkpoint")

@trainer.on("saved_checkpoint")
def after_saving(engine):
    print("Checkpoint saved!")

@github-actions github-actions bot added the module: handlers Core Handlers module label Sep 2, 2025
@JeevanChevula JeevanChevula marked this pull request as ready for review September 2, 2025 06:16
@JeevanChevula JeevanChevula marked this pull request as draft September 3, 2025 06:51
@github-actions github-actions bot added the docs label Sep 3, 2025
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 3, 2025

@JeevanChevula thanks for the PR. However, let's rework the API of the new feature you are working on:

  1. Let's avoid using string type as event name and do not ask users to register the event. This should be done internally by ignite.

  2. Let's define the event in the Checkpoint as following:

# checkpoint.py

class CheckpointEvents(EventEnum):
    SAVED_CHECKPOINT = "saved_checkpoint"


class Checkpoint(...):
    SAVED_CHECKPOINT = CheckpointEvents.SAVED_CHECKPOINT
    ...
  1. finally the usage can be something like this:
from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, global_step_from_engine

trainer = ...
evaluator = ...
# Setup Accuracy metric computation on evaluator.
# evaluator.state.metrics contain 'accuracy',
# which will be used to define ``score_function`` automatically.
# Run evaluation on epoch completed event
# ...

to_save = {'model': model}
handler = Checkpoint(
    to_save, '/tmp/models',
    n_saved=2, filename_prefix='best',
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer)
)

evaluator.add_event_handler(Events.COMPLETED, handler)

# ---- New API with Checkpoint.SAVED_CHECKPOINT event: -----

@evaluator.on(Checkpoint.SAVED_CHECKPOINT)
def notify_when_saved(eval_engine, chkpt_handler):   # we should pass to the attached handlers the engine and the checkpoint instance.
    assert eval_engine is engine
    assert chkpt_handler is handler
    print("Saved checkpoint:", chkpt_handler.last_checkpoint)
    
# ---- End of New API with Checkpoint.SAVED_CHECKPOINT event: -----

trainer.run(data_loader, max_epochs=10)
> ["best_model_9_accuracy=0.77.pt", "best_model_10_accuracy=0.78.pt", ]

Let me know what do you think?

@JeevanChevula
Copy link
Contributor Author

Thanks for the suggestion . I’ll try to work on updating the PR to follow the API approach you mentioned with

@JeevanChevula JeevanChevula marked this pull request as ready for review September 5, 2025 09:58
@JeevanChevula
Copy link
Contributor Author

Implementation Note:

Implemented EventEnum-based SAVED_CHECKPOINT event as requested. However, Ignite's event system only supports single-parameter handlers - the originally requested two-parameter signature (handler(engine, checkpoint_handler)) failed during event firing and registration.

Current implementation uses single parameter with checkpoint access via engine._current_checkpoint_handler. All 61 core tests pass, confirming functionality works without breaking existing features. The 3 distributed test errors are pre-existing infrastructure issues unrelated to this change.

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this PR @JeevanChevula
I left few more comments to improve the PR

@JeevanChevula
Copy link
Contributor Author

Pushing current implementation with working SAVED_CHECKPOINT event functionality. Will add proper Google-style docstrings with version directives by Monday per contributing guidelines

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 12, 2025

@JeevanChevula please rebase your PR branch, you have now some extra commits

@JeevanChevula JeevanChevula force-pushed the add-saved-checkpoint-event branch from d500fc8 to d81faa9 Compare September 18, 2025 11:03
@JeevanChevula JeevanChevula marked this pull request as draft September 18, 2025 12:19
@JeevanChevula JeevanChevula force-pushed the add-saved-checkpoint-event branch from d81faa9 to fe4942d Compare September 19, 2025 05:42
@JeevanChevula JeevanChevula force-pushed the add-saved-checkpoint-event branch from fe4942d to 25e6adf Compare September 19, 2025 12:24
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 24, 2025

@JeevanChevula here is how docs are rendered for this PR: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/handlers

Thanks for all the updates you made recently. The last thing I think we still have to do is to reorganize a bit the docs on CheckpointEvents.

The example you wrote in handlers.rst: https://github.com/pytorch/ignite/pull/3440/files#diff-fed20f17bf0c40747938f76730fe5cdc467c919bd185eeb9fe0870b861379681R38 should be moved into Checkpoint docstring. And this page: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/generated/ignite.handlers.checkpoint.checkpoint#ignite.handlers.checkpoint.Checkpoint should mention the attribute Checkpoint.SAVED_CHECKPOINT: e.g. alias of SAVED_CHECKPOINT from :class:~ignite.handlers.checkpoint.CheckpointEvents.

@JeevanChevula
Copy link
Contributor Author

Hi! Thank you for the detailed feedback on the documentation reorganization

I want to make sure I implement the changes exactly as you envision them. Three specific questions for clarification:

  1. Checkpoint class docstring placement (class Checkpoint(Serializable):)
    The Checkpoint class already has a long docstring with multiple Examples: sections.
    Should I add the checkpoint event usage example:

    • After the last existing example (the “Customise the save_handler” section) but before the .. versionchanged:: notes?
    • Or in a different location within the docstring?
  2. handlers.rst reorganization
    For the Checkpoint Events section in handlers.rst, should I:

    • Remove the entire section completely (since the example will be moved to the docstring)?
    • Or keep the heading and basic description, but remove only the detailed usage example?

3.SAVED_CHECKPOINT attribute documentation
For documenting "alias of SAVED_CHECKPOINT from CheckpointEvents", would you prefer that I:

 * Add it in an Attributes: section inside the Checkpoint docstring?
 * Or simply put a short comment/docstring next to the line SAVED_CHECKPOINT = CheckpointEvents.SAVED_CHECKPOINT?
 * Or another location entirely?

I’d like to follow your vision precisely, so confirming these points will help me make the changes correctly.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 24, 2025

Here are details on your questions:

Point 1: -> After the last existing example (the “Customise the save_handler” section) but before the .. versionchanged:: notes

Point 2: -> Remove the entire section completely (since the example will be moved to the docstring)?

Point 3: -> Add it in an Attributes: section inside the Checkpoint docstring

Attributes:
    SAVED_CHECKPOINT: alias of SAVED_CHECKPOINT from :class:~ignite.handlers.checkpoint.CheckpointEvents.

Make sure to check rendered docs (on netlify: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/)

@JeevanChevula
Copy link
Contributor Author

JeevanChevula commented Sep 25, 2025

Hi!

I’ve applied the requested changes:

  • Moved the checkpoint event usage example into the Checkpoint class docstring (after “Customise the save_handler” and before the .. versionchanged:: notes).

  • Added an Attributes: section documenting SAVED_CHECKPOINT as an alias of :class:~ignite.handlers.checkpoint.CheckpointEvents``.

  • Removed the “Checkpoint Events” section from handlers.rst and kept only the API references.

On my Windows machine I wasn’t able to fully render the docs locally due to a Sphinx subprocess issue in my environment. I may be missing a dependency—I'll revisit this when I’m back. In the meantime, the Netlify preview should reflect these changes; if anything doesn’t look right, I’m happy to adjust.

I’ll be OOO ( Sep 26 – Oct 5, IST]. Please feel free to leave comments or push small cleanups; I’ll pick up any remaining fixes as soon as I return. Thanks!

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @JeevanChevula !

@vfdev-5 vfdev-5 marked this pull request as ready for review September 25, 2025 09:02
@vfdev-5 vfdev-5 enabled auto-merge September 25, 2025 09:02
@vfdev-5 vfdev-5 disabled auto-merge September 25, 2025 10:07
@vfdev-5 vfdev-5 merged commit d54f710 into pytorch:master Sep 25, 2025
19 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci CI docs module: engine Engine module module: handlers Core Handlers module module: metrics Metrics module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new events
2 participants