Open
Labels
enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)
Description
The Trainer class in fenn/nn/trainers currently provides a clean training loop abstraction, but it does not offer a built-in way to save model checkpoints during training.
Checkpointing is a common requirement for:
- long training runs
- recovering from interruptions
- inspecting intermediate models
- selecting the best-performing epoch
This issue proposes adding a simple, optional checkpoint saving mechanism to the Trainer, without changing default behavior.
Goal
Allow users to optionally enable checkpoint saving by specifying:
- which epochs should be saved
- a base name for the checkpoint files
If not enabled, the Trainer should behave exactly as it does now.
Proposed behavior
Extend the Trainer configuration with optional parameters such as:
- checkpoint_epochs (int | None): epochs at which a checkpoint is saved (e.g. 5, 10, 20)
- checkpoint_name (str | None): base filename used for checkpoints
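One possible shape for these options is sketched below in plain Python. The class name CheckpointConfig and its methods are hypothetical and not part of fenn. The issue annotates checkpoint_epochs as int | None while its example lists several epochs, so this sketch assumes that either a single epoch or a collection of epochs may be given, and that checkpointing is off unless both options are set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Collection


@dataclass
class CheckpointConfig:
    """Hypothetical container for the two proposed options."""

    # Assumption: a single epoch or a collection of epochs is accepted.
    checkpoint_epochs: int | Collection[int] | None = None
    checkpoint_name: str | None = None

    def epochs(self) -> frozenset[int]:
        """Return the epochs to save; an empty set means checkpointing is off."""
        if self.checkpoint_epochs is None or self.checkpoint_name is None:
            return frozenset()
        if isinstance(self.checkpoint_epochs, int):
            return frozenset({self.checkpoint_epochs})
        return frozenset(self.checkpoint_epochs)

    def should_save(self, epoch: int) -> bool:
        """True if a checkpoint is requested for this epoch."""
        return epoch in self.epochs()

    def filename(self, epoch: int) -> str:
        """Build the file name from the base name and the epoch number."""
        return f"{self.checkpoint_name}_epoch_{epoch}.pt"
```

With the defaults, should_save returns False for every epoch, which keeps the current behavior when the options are not set.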
When enabled:
- at the end of a specified epoch, the trainer saves:
  - model state (state_dict)
  - optimizer state (if available)
  - current epoch index
Example filenames:
- checkpoint_name_epoch_10.pt
- checkpoint_name_epoch_20.pt
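The save step could look roughly like the sketch below, which uses torch.save with a plain dictionary. The function name save_checkpoint, the directory argument, and the dictionary keys are assumptions made for illustration. The issue itself fixes only the saved contents and the filename pattern.

```python
from pathlib import Path

import torch
from torch import nn


def save_checkpoint(model, optimizer, epoch, checkpoint_name, directory="."):
    """Write model state, optimizer state (if any) and the epoch to one file."""
    # Hypothetical key names; any consistent naming would work.
    payload = {
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
    }
    # The optimizer is optional, matching "optimizer state (if available)".
    if optimizer is not None:
        payload["optimizer_state_dict"] = optimizer.state_dict()

    path = Path(directory) / f"{checkpoint_name}_epoch_{epoch}.pt"
    path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(payload, path)
    return path
```

Inside a training loop this would be called at the end of an epoch, and only for the epochs the user asked for.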
Tasks
- Inspect the current Trainer implementation in fenn/nn/trainers.
- Add checkpoint-related arguments (constructor or config.yaml-based).
- Implement checkpoint saving at the end of training epochs.
- Ensure:
  - no checkpoints are written unless explicitly enabled
  - existing training workflows remain unaffected
- Add minimal documentation or docstring explaining usage.
Acceptance criteria
- Checkpoint saving is fully optional.
- Users can control:
  - which epochs are saved
  - the checkpoint file base name
- Trainer behavior is unchanged when checkpointing is disabled.
- Saved checkpoints can be reloaded with standard PyTorch APIs.
- Code remains simple and readable.
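To illustrate the reload criterion, here is a minimal sketch that restores a checkpoint using only torch.load and load_state_dict. It assumes the checkpoint is a dictionary with the keys "epoch", "model_state_dict" and, optionally, "optimizer_state_dict". Those key names and the function name load_checkpoint are hypothetical and not defined by this issue.

```python
import torch
from torch import nn


def load_checkpoint(path, model, optimizer=None):
    """Restore model (and optionally optimizer) state; return the saved epoch."""
    # map_location="cpu" lets a checkpoint saved on GPU load on any machine.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    # The optimizer entry may be absent if none was available when saving.
    if optimizer is not None and "optimizer_state_dict" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]
```

The returned epoch index could also serve as the starting point for the optional resume-from-checkpoint logic.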
Optional (but nice to have)
- resume-from-checkpoint logic
How to contribute
Comment on this issue to claim it, then open a PR with:
- the implementation
- a short usage example (code snippet or docstring)