@josephdviviano
Collaborator

  • I've read the .github/CONTRIBUTING.md file
  • My code follows the typing guidelines
  • I've added appropriate tests
  • I've run pre-commit hooks locally

Description

Changed performance mode to deterministic mode, which is off by default. This lets us make more effective use of compute by default and run deterministically only when the user explicitly requests it.
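For readers who want a concrete picture of the flag, here is a minimal sketch of an opt-in deterministic seeding helper. The name `set_seed_sketch` and its body are assumptions for illustration only, not the code merged in this PR; the torch and numpy parts are skipped when those packages are not installed.

```python
import random


def set_seed_sketch(seed: int, deterministic_mode: bool = False) -> None:
    """Hypothetical stand-in for a set_seed helper (not the repo's code)."""
    # Seed Python's built-in RNG.
    random.seed(seed)

    # Seed numpy if it is available.
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass

    # Seed torch if it is available; otherwise there is nothing more to do.
    try:
        import torch
    except ImportError:
        return

    torch.manual_seed(seed)

    # Assumption: determinism is opt-in, so the default leaves torch's
    # faster, possibly nondeterministic kernels enabled.
    torch.use_deterministic_algorithms(deterministic_mode)
    torch.backends.cudnn.deterministic = deterministic_mode
    if deterministic_mode:
        # cuDNN autotuning can select different kernels between runs.
        torch.backends.cudnn.benchmark = False
```

Calling `set_seed_sketch(0)` keeps the fast path, while `set_seed_sketch(0, deterministic_mode=True)` trades throughput for reproducibility.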

@josephdviviano josephdviviano self-assigned this Nov 30, 2025
Comment on lines +194 to +197
+ else:
+     # For GPU training, we can use multiple threads for CPU operations
+     os.environ["OMP_NUM_THREADS"] = str(num_cpus)
+     os.environ["MKL_NUM_THREADS"] = str(num_cpus)
Collaborator

Isn't this the default behavior? If so, I think we can remove this else branch.

Collaborator Author

I'm not entirely sure what the default is here, so I'd like to keep this in for now. It was really hard to nail down the cause of the irreproducibility errors we were seeing before in distributed training.
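One way to settle the "is this the default?" question empirically is to log the thread-count variables before and after setting them. The sketch below is an assumption-laden illustration: `thread_env_report` and `set_thread_env` are hypothetical helper names, not functions from this repository. Note that these variables generally need to be set before the numerical libraries initialize their thread pools.

```python
import os

# Environment variables that the snippet under review sets explicitly.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def thread_env_report(names=THREAD_VARS):
    """Return {name: value or None}; None means the variable is unset."""
    return {name: os.environ.get(name) for name in names}


def set_thread_env(num_threads, names=THREAD_VARS):
    """Set every thread-count variable to the same explicit value."""
    for name in names:
        os.environ[name] = str(num_threads)


if __name__ == "__main__":
    print("before:", thread_env_report())
    # os.cpu_count() can return None, so fall back to a single thread.
    set_thread_env(os.cpu_count() or 1)
    print("after:", thread_env_report())
```

If the "before" report already shows the expected values on every launcher we use, the else branch is redundant; if any entry is unset or differs, keeping the explicit assignment is the safer choice.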

  # Each worker gets a distinct seed in the same pattern used for ranks.
- set_seed(base_seed + worker_id, performance_mode=False)
+ # TODO: Can this be false?
+ set_seed(base_seed + worker_id, deterministic_mode=True)
Collaborator

I think it should be False by default, since that is the default torch behavior.

Collaborator Author

Fixed: it is now passed to the function, and the default is False.
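A rough sketch of what "passed to the function" could look like for per-worker seeding, under stated assumptions: `make_worker_init_fn` is a hypothetical factory name, and the inner `set_seed` is a simplified stand-in that only seeds Python's `random` module so the example runs without torch. The real helper in this repository may differ.

```python
import random


def set_seed(seed, deterministic_mode=False):
    """Simplified stand-in: seeds only Python's RNG and ignores the flag."""
    random.seed(seed)


def make_worker_init_fn(base_seed, deterministic_mode=False):
    """Build a callback that takes a worker id, as data-loader workers expect."""

    def worker_init_fn(worker_id):
        # Each worker gets a distinct seed in the same pattern used for ranks.
        set_seed(base_seed + worker_id, deterministic_mode=deterministic_mode)

    return worker_init_fn


if __name__ == "__main__":
    init = make_worker_init_fn(base_seed=100)
    init(2)  # worker 2 is seeded with 102
    print(random.random())
```

The closure keeps the flag out of the worker signature, so the caller decides once whether workers run deterministically and the default stays False.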

@josephdviviano josephdviviano merged commit 9c15288 into master Dec 10, 2025
3 checks passed
@josephdviviano josephdviviano deleted the seed_fix branch December 10, 2025 22:35

3 participants