Grad scaling parity between both pp and non-pp #251

xmfan · 2025-11-14T18:07:19Z

Stacked PRs:

Grad scaling parity between both pp and non-pp

stack-info: PR: #246, branch: xmfan/stack/20

stack-info: PR: #251, branch: xmfan/stack/21

wconstab · 2025-11-15T01:54:04Z

autoparallel/graph_pp_runner.py

-        if grad_to_accumulate is not None:
-            if unsharded_grad is None:
-                unsharded_grad = grad_to_accumulate
+    for i in range(len(unsharded_grads)):


what's the difference?

no longer needed after rebase

wconstab · 2025-11-15T01:54:52Z

examples/example_ds3_pp.py

        n_microbatches=n_microbatches,
        loss_fn=loss_fn,
        backward_requires_autograd=backward_requires_autograd,
+        scale_grads=False,


are you proposing to land this change, or just using it while doing numerics tests?

I'll probably change this so it only applies in numerics test mode. The options are either to turn scaling off here or to emulate the scaling in the non-pp run.
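To make the two options concrete, here is a minimal pure-Python sketch of why the accumulated microbatch gradients and the full-batch gradient differ by a factor of `n_microbatches`. It assumes that `scale_grads` controls whether the accumulated gradients are divided by the number of microbatches, that each microbatch loss is a mean over its own samples, and that microbatches are equal-sized. The toy model and the helpers `mse_grad` and `split` are hypothetical and not taken from the PR.

```python
# Toy model: prediction = w * x, loss = mean squared error over a (micro)batch.
# All names here are illustrative assumptions, not code from autoparallel.

def mse_grad(w, xs, ts):
    """Gradient of mean((w*x - t)^2) with respect to w."""
    return sum(2.0 * (w * x - t) * x for x, t in zip(xs, ts)) / len(xs)

def split(seq, n_chunks):
    """Split seq into n_chunks equal-sized microbatches."""
    size = len(seq) // n_chunks
    return [seq[i * size:(i + 1) * size] for i in range(n_chunks)]

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ts = [2.0, 4.0, 6.0, 8.0]
n_microbatches = 2

# Non-pp reference: one backward pass over the full batch.
full_batch_grad = mse_grad(w, xs, ts)

# pp-style accumulation: sum the per-microbatch gradients without scaling.
accumulated_grad = sum(
    mse_grad(w, mx, mt)
    for mx, mt in zip(split(xs, n_microbatches), split(ts, n_microbatches))
)

# Option 1: scale the accumulated gradient so it matches the non-pp run.
scaled_grad = accumulated_grad / n_microbatches

# Option 2: leave pp unscaled and emulate the same factor on the non-pp side.
emulated_non_pp_grad = full_batch_grad * n_microbatches
```

Under these assumptions, `scaled_grad` equals `full_batch_grad` and `accumulated_grad` equals `emulated_non_pp_grad`, so either option yields parity as long as both runs use the same convention.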

xmfan · 2025-11-20T00:43:58Z

folded into previous pr

xmfan added 2 commits November 13, 2025 18:25

Compare microbatch forward outputs and gradients

7c45448

stack-info: PR: #246, branch: xmfan/stack/20

Grad scaling parity between both pp and non-pp

7a25b4c

stack-info: PR: #251, branch: xmfan/stack/21

xmfan force-pushed the xmfan/stack/21 branch from 36903f2 to 7a25b4c Compare November 14, 2025 18:07

meta-cla bot added the CLA Signed label Nov 14, 2025

xmfan mentioned this pull request Nov 14, 2025

Compare microbatch forward outputs and gradients #246

Open

wconstab reviewed Nov 15, 2025


xmfan force-pushed the xmfan/stack/20 branch 2 times, most recently from 6e72707 to b8546e1 Compare November 20, 2025 00:43

xmfan closed this Nov 20, 2025

3 participants
