
[Feature Request] Aule Attention for Nvidia/AMD/Intel/Apple #79

@iwr-redmond

Description


Aule Attention is a cross-platform implementation of FlashAttention-2 that uses Triton on NVIDIA and ROCm/Linux, and Vulkan on other platforms. It provides a PyTorch SDPA compatibility layer, which should facilitate Diffusers integration without major issues.

It may be possible to use native FA2 kernels for NVIDIA devices by integrating the new kernels library, with Aule Attention as the backup option for other GPU types and when running offline.
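The fallback logic proposed above could be sketched as a simple backend selector. This is only an illustration of the request, not any real API: the function and backend names (`pick_attention_backend`, `"flash-attn2-kernels"`, `"aule-attention"`) are hypothetical placeholders.

```python
# Hypothetical sketch of the proposed backend selection; all names here are
# placeholders, not the actual API of Aule Attention or the kernels library.
def pick_attention_backend(platform: str, online: bool) -> str:
    """Prefer native FA2 kernels (fetched via the kernels library) on
    NVIDIA when online; otherwise fall back to Aule Attention, which
    covers other GPU types (via Triton/Vulkan) and offline use."""
    if platform == "cuda" and online:
        return "flash-attn2-kernels"  # hypothetical: native FA2 via kernels
    return "aule-attention"           # cross-platform fallback

print(pick_attention_backend("cuda", online=True))
print(pick_attention_backend("vulkan", online=True))
print(pick_attention_backend("cuda", online=False))
```

The key point is that non-NVIDIA devices and offline sessions always resolve to the cross-platform Aule Attention path.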
