@YongzhongYang

Summary:
We identified this potential bug while debugging the issue reported in https://fb.workplace.com/groups/755371733754414/permalink/833999072850393/

Fixed a bug in `_fuse_input_dist_splits` where the names of requests with no
valid process group (`pg=None`) were added to `names_per_pg[None]`. Downstream,
this led to an attempt to create a `FusedKJTListSplitsAwaitable` with a `None`
process group.

The issue occurred when:

  1. A request is of type KJTListSplitsAwaitable
  2. None of its awaitables are of type KJTSplitsAllToAllMeta
  3. This leaves pg = None (line 207)
  4. The name was still appended to names_per_pg[None] (line 213)

The fix adds a check to only append names when pg is not None, ensuring
that only requests with valid process groups are included in the fused
operations.
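
The guard described above can be sketched in isolation. This is a simplified illustration, not the actual TorchRec code: `AllToAllMeta`, `ListSplitsRequest`, and `group_names_by_pg` are hypothetical stand-ins for `KJTSplitsAllToAllMeta`, `KJTListSplitsAwaitable`, and the grouping step inside `_fuse_input_dist_splits`, and process groups are represented by plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AllToAllMeta:
    """Hypothetical stand-in for KJTSplitsAllToAllMeta."""
    pg: str  # a real process group object in practice; a string label here


@dataclass
class ListSplitsRequest:
    """Hypothetical stand-in for KJTListSplitsAwaitable."""
    awaitables: List[object] = field(default_factory=list)


def group_names_by_pg(
    requests: Dict[str, ListSplitsRequest],
) -> Dict[str, List[str]]:
    """Group request names by process group, skipping requests without one."""
    names_per_pg: Dict[str, List[str]] = defaultdict(list)
    for name, request in requests.items():
        pg: Optional[str] = None
        # Take the process group from the first all-to-all metadata entry.
        for awaitable in request.awaitables:
            if isinstance(awaitable, AllToAllMeta):
                pg = awaitable.pg
                break
        # The guard: without it, the name would land in names_per_pg[None].
        if pg is not None:
            names_per_pg[pg].append(name)
    return dict(names_per_pg)


requests = {
    "sparse_a": ListSplitsRequest([AllToAllMeta(pg="pg0")]),
    "sparse_b": ListSplitsRequest([object()]),  # no all-to-all metadata
    "sparse_c": ListSplitsRequest([AllToAllMeta(pg="pg0")]),
}
grouped = group_names_by_pg(requests)
```

In this sketch `sparse_b` has no all-to-all metadata, so it is left out of every group and no `None` key is ever created.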

Why this matters:

  • Prevents passing pg=None to FusedKJTListSplitsAwaitable (line 232)
  • Ensures only valid distributed operations are fused together
  • Avoids potential runtime errors or undefined behavior

Differential Revision: D87110878


meta-codesync bot commented Nov 14, 2025

@YongzhongYang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87110878.
