Add AMD ROCm support: AITER attention backend + robust imports + docs #3

Open

ZJLi2013 wants to merge 1 commit into H-EmbodVis:main from ZJLi2013:feat/rocm-flash-attention-support


ZJLi2013 commented Apr 1, 2026

Summary

Enable HyDRA to run efficiently on AMD GPUs (ROCm) with optimized attention backends.

Problem

The current attention import guards use ModuleNotFoundError, which misses partial import failures (e.g. AITER's eager top-level imports). When flash-attn is not installed, the code silently falls back to PyTorch SDPA with no diagnostic logging, making it hard to tell which backend is active.

Changes

diffsynth/models/wan_video_dit.py

  • Widen exception handling from ModuleNotFoundError to (ImportError, ModuleNotFoundError) for the flash_attn, flash_attn_interface, and sageattention imports
  • Add AMD AITER as an attention backend (via importlib to avoid eager import side-effects), slotted between FA3 and FA2 in the dispatch chain
  • Log selected attention backend at import time for easier debugging
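The guard widening and importlib probe described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in wan_video_dit.py: the real module uses its own flag names and applies the same pattern to flash_attn_interface and sageattention as well.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)

# Catch ImportError, not just ModuleNotFoundError: a package that is
# installed but fails partway through its own top-level imports (as AITER
# can on non-ROCm systems) raises a plain ImportError, which a
# ModuleNotFoundError-only guard would let propagate.
try:
    from flash_attn import flash_attn_func  # noqa: F401  (FA2)
    FLASH_ATTN_AVAILABLE = True
except (ImportError, ModuleNotFoundError):
    FLASH_ATTN_AVAILABLE = False

# Probe for AITER with importlib.util.find_spec so availability can be
# checked without triggering its eager top-level imports.
AITER_AVAILABLE = importlib.util.find_spec("aiter") is not None

# Log the detected backends at import time for easier debugging.
logger.info(
    "Attention backends: flash_attn=%s aiter=%s",
    FLASH_ATTN_AVAILABLE, AITER_AVAILABLE,
)
```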

README.md

  • Add collapsible AMD ROCm installation guide (PyTorch ROCm, FlashAttention Triton build, AITER CK backend)

Dispatch priority

FA3 → AITER → FA2 → SageAttention → PyTorch SDPA (fallback)

No behavior change for NVIDIA users — AITER is only available on ROCm and gracefully skipped otherwise.
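The dispatch priority amounts to a first-available walk down a fixed chain, with SDPA as the guaranteed terminal fallback. A minimal sketch (the backend keys and flag dict are illustrative, not the PR's actual identifiers):

```python
# Priority chain from the PR: FA3 -> AITER -> FA2 -> SageAttention -> SDPA.
PRIORITY = ("fa3", "aiter", "fa2", "sage", "sdpa")

def pick_backend(available: dict) -> str:
    """Return the highest-priority backend whose availability flag is set.

    The PyTorch SDPA fallback ("sdpa") always exists, so the walk cannot
    fail even when no optional backend is installed.
    """
    for name in PRIORITY:
        if name == "sdpa" or available.get(name, False):
            return name
    return "sdpa"  # unreachable; kept for clarity

# On ROCm with AITER and FA2 installed, AITER wins:
# pick_backend({"aiter": True, "fa2": True}) -> "aiter"
# On NVIDIA without AITER, behavior is unchanged:
# pick_backend({"fa2": True}) -> "fa2"
```

Because AITER sits between FA3 and FA2, a ROCm machine with both AITER and FA2 prefers AITER, while an NVIDIA machine (where the AITER flag is never set) falls through exactly as before.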

Benchmarks (AMD MI300X, ROCm 6.4, FA2 Triton backend)

| Metric | Before (SDPA) | After (FA2 Triton) | Delta |
| --- | --- | --- | --- |
| Steady-state step time | ~12.4 s/step | ~10.0 s/step | -19% |
| Total inference (4 samples) | ~50 min | ~43 min | -13% |
AITER with CK backend (ROCm 7.x) is expected to provide an additional ~25% speedup over Triton, based on benchmarks from other Wan2.1-based pipelines.

Test plan

  • Verified on AMD MI300X (gfx942) + ROCm 6.4 + PyTorch 2.9.1
  • FA2 Triton backend correctly detected and used (confirmed via log output)
  • Generated videos are visually consistent with SDPA baseline
  • NVIDIA GPU regression test (no AITER installed → should fall through to existing backends unchanged)
3_concat.mp4
