Pull requests: Dao-AILab/flash-attention
Feat([FA4][CUTE DSL]) Add head_dim=256 support (forward + backward) (#2412, opened Mar 30, 2026 by wangsiyu)
chore(tests): move benchmarks to benchmarks/cute/ and reduce test prints (#2408, opened Mar 29, 2026 by NJX-njx)
fix(flash_fwd_sm90): zero partial V smem to prevent 0*NaN=NaN in PV GEMM (#2407, opened Mar 29, 2026 by NJX-njx)
[CuTe, SM120] Forward kernel with optimized TMA path and full features support (#2406, opened Mar 28, 2026 by sisgrad)
feat: setup_context for FlashAttnFunc (torch.func.grad) (#2405, opened Mar 28, 2026 by NJX-njx)
fix(cute): SM120 forward/bwd and atomic add compatibility (#2404, opened Mar 28, 2026 by NJX-njx)
build(windows): MSVC conforming preprocessor for CUDA 13+ and ninja warning (#2403, opened Mar 28, 2026 by NJX-njx)
feat(cute): implement softcap backward pass, correct math formula, and resolve JIT cache bug (#2402, opened Mar 28, 2026 by CaesarG)
Add FA4 CI: GitHub Actions workflow with Apptainer on B200 runner (#2393, opened Mar 25, 2026 by Johnsonms)
[Cute,Sm80,Fwd] Guard kernel body with tile validity check to fix varlen illegal memory access (#2391, opened Mar 25, 2026 by zhuochenKIDD)
Add SM80/SM120 block-sparse forward attention support (#2389, opened Mar 25, 2026 by blake-snc)
[Cute] Fix multi-GPU support: add torch.cuda.device guards for kernel launches (#2356, opened Mar 16, 2026 by NJX-njx)
Fix RuntimeError when out_ is specified in _flash_attn_forward (fix #2073) (#2351, opened Mar 14, 2026 by NJX-njx)
rocm/ck: switch FlashAttention to the CK vendored via aiter and fix current CK API integration (#2350, opened Mar 14, 2026 by eliasmagn)
Add SM120 TMA forward kernel with warp specialization (#2349, opened Mar 14, 2026 by blake-snc)
Add SM120 split-KV (FlashDecoding) with FP32 partial outputs (#2336, opened Mar 12, 2026 by blake-snc)
[varlen] add autograd function to zero out nan padding (#2324, opened Mar 10, 2026 by liangel-02)