Description
Context
During a recent discussion in Discord, a question came up about whether
checkpoint hashing alone is sufficient for verifying the training process.
Before building more complex verification layers, it would be useful to
validate a basic assumption:
If two identical training runs are executed with the same dataset,
configuration, and seed, do they produce identical model checkpoints?
Proposed Experiment
Create a small reproducible experiment that:
- Uses a tiny dataset (e.g., small Wikipedia subset or synthetic data)
- Runs a deterministic training loop twice
- Saves checkpoints from both runs
- Computes SHA-256 hashes of both checkpoints
- Verifies whether the hashes match
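The steps above can be sketched end to end in a small harness. This is a minimal, hypothetical stand-in: it replaces a real training loop with a pure-Python gradient-descent fit on synthetic data (the function names `train_once` and `checkpoint_hash` are illustrative, not part of the repo), but the run-twice-and-compare-hashes structure is the same one the experiment would use.

```python
import hashlib
import json
import random

def train_once(seed: int) -> bytes:
    """Tiny deterministic 'training' stand-in: fit y = 2x + 1 by gradient
    descent on seeded synthetic data, then serialize the final weights
    as the checkpoint bytes."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    data = [(x, 2.0 * x + 1.0) for x in xs]
    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(200):       # fixed step count, fixed iteration order
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    # Checkpoint = a canonical serialization of the parameters.
    return json.dumps({"w": w, "b": b}, sort_keys=True).encode()

def checkpoint_hash(seed: int) -> str:
    return hashlib.sha256(train_once(seed)).hexdigest()

run_a = checkpoint_hash(seed=42)
run_b = checkpoint_hash(seed=42)
print("run A:", run_a)
print("run B:", run_b)
print("identical:", run_a == run_b)
```

A real version would swap `train_once` for the actual framework training loop and hash the saved checkpoint file instead of an in-memory serialization; the comparison logic stays identical.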
Expected Outcome
If the hashes match, checkpoint hashing may be sufficient for verifying
training determinism under controlled conditions.
If the hashes differ, there are hidden sources of entropy in the training
pipeline (e.g., unseeded RNGs, nondeterministic GPU kernels, or
data-loader ordering).
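One common source of such entropy is worth keeping in mind when interpreting a hash mismatch: floating-point addition is not associative, so any nondeterminism in reduction order (as in some parallel GPU kernels) can perturb low-order bits and change the checkpoint hash even when the training logic is otherwise identical. A one-line demonstration:

```python
# The same three numbers summed in two groupings give different floats,
# so a nondeterministic summation order alone can change a checkpoint hash.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)   # False
print(a, b)
```

This is why a bitwise-identical result is a strong claim: it implies not just seeding, but a fully fixed order of operations.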
Implementation
The experiment would live under:
experiments/checkpoint_reproducibility/
and would include:
- a minimal deterministic training script
- checkpoint hashing
- a simple reproducibility report
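The hashing and report pieces are small enough to sketch here. The helper names (`sha256_file`, `write_report`) and the JSON report shape are assumptions for illustration, not an existing API; the sketch assumes checkpoints are ordinary files on disk and streams them in chunks so large checkpoints never need to fit in memory.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_report(run_a: Path, run_b: Path, out: Path) -> dict:
    """Hash two checkpoint files and write a small JSON reproducibility report."""
    report = {
        "run_a": {"path": str(run_a), "sha256": sha256_file(run_a)},
        "run_b": {"path": str(run_b), "sha256": sha256_file(run_b)},
    }
    report["reproducible"] = report["run_a"]["sha256"] == report["run_b"]["sha256"]
    out.write_text(json.dumps(report, indent=2))
    return report

# Usage with two throwaway files standing in for real checkpoints:
Path("ckpt_a.bin").write_bytes(b"\x00" * 16)
Path("ckpt_b.bin").write_bytes(b"\x00" * 16)
report = write_report(Path("ckpt_a.bin"), Path("ckpt_b.bin"), Path("report.json"))
print(json.dumps(report, indent=2))
```

Keeping the report as plain JSON makes it easy to check into the experiment directory alongside the two runs.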
This experiment can serve as the first validation for the
training determinism layer of the project.