Grad scaling parity between both pp and non-pp #251
36903f2 to 7a25b4c
if grad_to_accumulate is not None:
    if unsharded_grad is None:
        unsharded_grad = grad_to_accumulate
for i in range(len(unsharded_grads)):
what's the difference?
no longer needed after rebase
n_microbatches=n_microbatches,
loss_fn=loss_fn,
backward_requires_autograd=backward_requires_autograd,
scale_grads=False,
are you proposing to land this change, or just using it while doing numerics tests?
i'll maybe change this to apply only in numerics test mode? the options are either to turn this off for pp or to emulate the scaling for non-pp
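To make the two options in this comment concrete, here is a toy sketch in plain Python. It is not the PR's code: `mse_grad` and `accumulated_grad` are hypothetical helpers, and the example assumes a mean-reduced loss and equal-sized microbatches. The `scale_grads` argument only mirrors the name of the flag in the diff above.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, n_microbatches, scale_grads):
    """Sum per-microbatch gradients; optionally divide by n_microbatches.

    Hypothetical stand-in for gradient accumulation over microbatches.
    Assumes len(xs) is divisible by n_microbatches.
    """
    size = len(xs) // n_microbatches
    total = 0.0
    for i in range(n_microbatches):
        lo, hi = i * size, (i + 1) * size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])
    return total / n_microbatches if scale_grads else total


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
n_microbatches = 2

full_batch = mse_grad(w, xs, ys)  # single pass over the whole batch
scaled = accumulated_grad(w, xs, ys, n_microbatches, scale_grads=True)
unscaled = accumulated_grad(w, xs, ys, n_microbatches, scale_grads=False)

# With scaling, the accumulated gradient matches the full-batch gradient.
# Without it, the result is n_microbatches times larger, so the other
# path has to apply or undo the same factor to compare numerics.
print(full_batch, scaled, unscaled)
```

Under these assumptions, parity holds only when both paths apply the same factor: either both divide by the number of microbatches or neither does.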
6e72707 to b8546e1
folded into previous pr
Stacked PRs:
Grad scaling parity between both pp and non-pp