We (AMD) recently introduced Triton kernels for triangular operations (in particular, Triangle Attention) for OpenFold3 (https://www.amd.com/en/blogs/2026/openfold3-meets-amd-instinct-gpus-unlocking-scalable.html, aqlaboratory/openfold-3#166). These serve as an alternative to cuequivariance, which is exclusive to (newish) Nvidia GPUs. I've prototyped integrating triangle attention into Boltz and see ~2x speedups in end-to-end runtime on MI300X. Would you be interested in us adding these as an alternative to cuequivariance? We're exploring publishing them as a standalone pip package, but it would also be easy to start by vendoring in the single file for triangle attention for a quicker integration.
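To make the "alternative backend" idea concrete, here is a minimal sketch of how the dispatch could look on the Boltz side. This is purely illustrative: the function and backend names are hypothetical placeholders, not the actual Boltz, cuequivariance, or AMD kernel APIs, and a real integration would dispatch the attention call itself rather than just pick a string.

```python
# Hypothetical backend selection for triangle attention.
# "cuequivariance_torch" and "triton" are real package names, but the
# dispatch logic and backend labels here are assumptions for illustration.
import importlib.util


def pick_triangle_attention_backend() -> str:
    """Prefer cuequivariance when installed (Nvidia path), fall back to
    the Triton kernel (works on AMD via ROCm), else a pure-PyTorch
    reference implementation."""
    if importlib.util.find_spec("cuequivariance_torch") is not None:
        return "cuequivariance"
    if importlib.util.find_spec("triton") is not None:
        return "triton"
    return "reference"


if __name__ == "__main__":
    print(pick_triangle_attention_backend())
```

Selecting the backend once at import time (rather than per call) keeps the hot path branch-free, and an explicit environment-variable override could be layered on top for benchmarking the two paths against each other.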