Aule Attention is a cross-platform implementation of FlashAttention-2 that uses Triton on NVIDIA and ROCm/Linux, and Vulkan on other platforms. It provides a PyTorch SDPA compatibility layer, which should facilitate Diffusers integration without major changes.
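To make the SDPA compatibility claim concrete, the sketch below shows the shape and semantics contract of `torch.nn.functional.scaled_dot_product_attention` that such a layer would need to match, written as a NumPy reference so it is self-contained. The function name `sdpa_reference` is hypothetical and is not part of Aule Attention's actual API.

```python
import numpy as np

def sdpa_reference(q, k, v, scale=None):
    """Reference scaled dot-product attention following the shape
    contract of torch.nn.functional.scaled_dot_product_attention:
    q, k, v have shape (..., seq_len, head_dim)."""
    head_dim = q.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(head_dim)
    # attention scores over the key dimension
    scores = (q @ np.swapaxes(k, -1, -2)) * scale
    # numerically stable softmax
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

A compatible backend like Aule Attention computes the same result with a fused FlashAttention-2 kernel, so callers (e.g. Diffusers) see identical inputs and outputs while avoiding materializing the full attention matrix.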
It may be possible to use native FA2 kernels on NVIDIA devices by integrating the new kernels library, keeping Aule Attention as the fallback for other GPU types and for offline use.