
fix(reproducibility): add opt-in strict determinism across trainers#61

Open
SoheylM wants to merge 2 commits into main from codex/soh-14-strict-determinism-hardening

Conversation


SoheylM commented Mar 5, 2026

Description

Adds an opt-in strict reproducibility path for all training entrypoints while preserving current default behavior.

  • Introduces engiopt/reproducibility.py with shared helpers:
    • seed_training(seed)
    • enable_strict_determinism(warn_only=True)
    • make_dataloader_generator(seed)
  • Adds strict_determinism: bool = False to all targeted training Args dataclasses.
  • Replaces ad-hoc seeding with seed_training(args.seed) in each training script.
  • Enables strict deterministic controls only when --strict-determinism is passed.
  • Hardens DataLoader reproducibility by supplying seeded generators for shuffle=True loaders.
  • Updates README with optional reproducibility usage (--strict-determinism).
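The helpers above can be sketched as follows. The function names come from the PR description; the bodies are a plausible implementation of the stated behavior, not necessarily the exact code in this branch.

```python
import os
import random

import numpy as np
import torch


def seed_training(seed: int) -> None:
    """Seed every RNG the training scripts touch (Python, NumPy, PyTorch)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices


def enable_strict_determinism(warn_only: bool = True) -> None:
    """Opt-in strict mode: force deterministic kernels and disable TF32."""
    # cuBLAS requires this env var for deterministic matmuls on CUDA >= 10.2.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.use_deterministic_algorithms(True, warn_only=warn_only)
    torch.backends.cudnn.benchmark = False
    if torch.cuda.is_available():
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False


def make_dataloader_generator(seed: int) -> torch.Generator:
    """Seeded generator to pass to shuffle=True DataLoaders."""
    g = torch.Generator()
    g.manual_seed(seed)
    return g
```

Passing the generator to `DataLoader(..., shuffle=True, generator=make_dataloader_generator(args.seed))` makes the shuffle order reproducible across runs.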

Fixes SOH-14 (Linear)

Type of change

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New algorithm (non-breaking change which adds a new model/algorithm)
  • Improvement to existing algorithm (non-breaking change which improves functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

N/A

Checklist:

Code Quality

  • I have run the pre-commit checks with pre-commit run --all-files
  • I have run ruff check . and ruff format
  • I have run mypy .
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

CleanRL Philosophy (for new/modified algorithms)

  • The implementation follows the CleanRL single-file philosophy: all training logic is contained in one file
  • The code is reproducible: random seeds are set, PyTorch determinism is enabled
  • Hyperparameters are configurable via command-line arguments using tyro
  • WandB logging is integrated with --track flag support
  • The model can be saved and restored via WandB artifacts (--save-model flag)

Algorithm Completeness (for new algorithms)

  • Both training script (algorithm.py) and evaluation script (evaluate_algorithm.py) are provided
  • The algorithm works with EngiBench's Problem interface
  • The algorithm is added to the README table with correct metadata

Documentation

  • I have made corresponding changes to the documentation
  • New algorithms include docstrings explaining the approach and any paper references

Validation

  • Determinism smoke check (cgan_cnn_2d) run twice on same machine with strict mode and fixed seed; resulting checkpoint tensor hashes matched for both generator and discriminator.
  • Non-strict short run completed successfully (default behavior path unchanged).
  • Strict mode is configured with warn_only=True, so nondeterministic ops emit a warning and continue rather than raising.
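The smoke check above can be reproduced with a small state-dict hashing helper like the one below. The hashing scheme here is illustrative; the PR does not specify the exact method used to compare checkpoint tensors.

```python
import hashlib

import torch


def state_dict_hash(state_dict: dict) -> str:
    """Deterministic SHA-256 digest over all tensors, iterated in key order."""
    h = hashlib.sha256()
    for key in sorted(state_dict):
        h.update(key.encode())
        h.update(state_dict[key].detach().cpu().numpy().tobytes())
    return h.hexdigest()


# Two identically seeded initializations should hash the same.
torch.manual_seed(0)
net_a = torch.nn.Linear(4, 2)
torch.manual_seed(0)
net_b = torch.nn.Linear(4, 2)
assert state_dict_hash(net_a.state_dict()) == state_dict_hash(net_b.state_dict())
```

The same comparison applied to the generator and discriminator checkpoints of two strict-mode runs is what the validation step above asserts.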


linear bot commented Mar 5, 2026

SoheylM requested a review from g-braeunlich on March 5, 2026 at 09:45

@g-braeunlich left a comment


Just 2 small suggestions

Comment on lines +30 to +31
if torch.cuda.is_available():
    torch.backends.cuda.matmul.allow_tf32 = False


Not exactly the same logic, but maybe safer?

Suggested change
if torch.cuda.is_available():
    torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cuda.matmul.allow_tf32 = not torch.cuda.is_available()

Contributor Author


I kept the current logic intentionally for strict-mode semantics. In strict mode we should avoid enabling TF32; not torch.cuda.is_available() would set this flag to True on CPU/MPS runs. The current behavior only disables CUDA matmul TF32 when CUDA exists, leaving CPU/MPS unaffected. If we simplify, I'd still keep the assigned value False and only guard on backend availability.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: I agree the cleaner framing is backend-capability based. For strict mode, we should keep TF32 disabled (False) whenever the CUDA matmul backend is present, rather than deriving the value from torch.cuda.is_available() (which can be False on CPU/MPS runs and would imply True). Happy to switch to that form for clarity if you prefer.
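The two variants under discussion can be placed side by side; this is purely illustrative of the thread above, and the final form is still open.

```python
import torch

# Original PR logic: only touch the flag when CUDA exists; CPU/MPS runs
# are left entirely unaffected.
if torch.cuda.is_available():
    torch.backends.cuda.matmul.allow_tf32 = False

# Reviewer's one-liner (commented out): on a CPU/MPS machine this would
# assign True to the flag, which is why it was not adopted for strict mode.
# torch.backends.cuda.matmul.allow_tf32 = not torch.cuda.is_available()
```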
