Fix Windows import errors by guarding torch.distributed with hasattr checks#172

Open
0xDELUXA wants to merge 1 commit into ROCm:main_perf from 0xDELUXA:add-hasattr-checks-for-win-comp
Conversation

0xDELUXA commented Feb 20, 2026

Motivation

On Windows ROCm builds, torch.distributed does not expose certain internal attributes (e.g. _all_gather_base). This causes an AttributeError when flash_attn.utils.distributed is imported.

Example (when loading a custom node in ComfyUI):

  File "E:\ComfyUI\venv\Lib\site-packages\flash_attn\utils\distributed.py", line 12, in <module>
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

Cannot import E:\ComfyUI\custom_nodes\RES4LYF module for custom nodes: module 'torch.distributed' has no attribute '_all_gather_base'

This PR guards access to these attributes with hasattr checks, so the import no longer fails when an attribute is absent.

This change is not expected to affect Linux behavior: on builds where the attributes exist, the guarded assignments run exactly as before.

Technical Details

Adds hasattr checks before accessing private torch.distributed attributes in flash_attn.utils.distributed.

This improves ROCm compatibility for FlashAttention-2 on Windows.

Specifications:

OS: Windows 11
GPU: AMD Radeon RX 9060 XT (gfx1200)
Python: 3.12.10
PyTorch: 2.12.0a0+rocm7.12.0a20260218 (TheRock)
Triton: 3.6.0a0.post25 (triton-windows)

Test Plan

  • Functional testing on AMD Radeon RX 9060 XT (gfx1200)

  • Verified that FlashAttention-2 loads successfully

  • Ran existing Triton Flash Attention tests

Test Result

All existing Triton Flash Attention tests pass on gfx1200.

Submission Checklist

0xDELUXA (Author) commented Feb 20, 2026

Just giving this PR a friendly nudge. It has been hopping from the Dao-AILab main branch -> ROCm tridao branch -> ROCm main_perf branch.

I've reached out to TheRock as well, but it seems that, at least for now, they can't build Torch with full distributed support. In the meantime, we do need this change.

This change is important for Windows compatibility, and we would appreciate a review from the ROCm/flash-attention team when convenient.
