We (AMD) recently introduced Triton kernels for triangular operations (in particular, Triangle Attention) for OpenFold3 (https://www.amd.com/en/blogs/2026/openfold3-meets-amd-instinct-gpus-unlocking-scalable.html, aqlaboratory/openfold-3#166). These serve as an alternative to cuequivariance, which is exclusive to (newish) Nvidia GPUs. I've prototyped integrating triangle attention into Boltz and see ~2x speedups in end-to-end runtime on MI300X. Would you be interested in us adding these as an alternative to cuequivariance? We're exploring publishing them as a standalone pip package, but it would also be easy to start by vendoring in the single file for triangle attention for a quicker integration.
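To make the "alternative backend" idea concrete, here is a minimal sketch of how the dispatch could look on the Boltz side. This is purely illustrative: the function and backend names are hypothetical placeholders, not the actual Boltz, cuequivariance, or AMD kernel APIs, and a real integration would dispatch the attention call itself rather than just pick a string.

```python
# Hypothetical backend selection for triangle attention.
# "cuequivariance_torch" and "triton" are real package names, but the
# dispatch logic and backend labels here are assumptions for illustration.
import importlib.util


def pick_triangle_attention_backend() -> str:
    """Prefer cuequivariance when installed (Nvidia path), fall back to
    the Triton kernel (works on AMD via ROCm), else a pure-PyTorch
    reference implementation."""
    if importlib.util.find_spec("cuequivariance_torch") is not None:
        return "cuequivariance"
    if importlib.util.find_spec("triton") is not None:
        return "triton"
    return "reference"


if __name__ == "__main__":
    print(pick_triangle_attention_backend())
```

Selecting the backend once at import time (rather than per call) keeps the hot path branch-free, and an explicit environment-variable override could be layered on top for benchmarking the two paths against each other.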