
Conversation

DuyguA commented Nov 27, 2025

I made some changes to the T5 modeling file to support the new attention interface. I also rearranged things a bit so that position_bias is incorporated correctly into the attention mask.
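
For context, here is a minimal sketch of the idea (not the actual diff; the signature and names are illustrative): an eager attention callable in the style of the new interface that folds position_bias into the additive attention mask.

```python
# Illustrative sketch only, not the actual diff. It mimics an eager attention
# callable in the new-interface style and shows one way position_bias can be
# folded into the additive attention mask.
import torch
import torch.nn.functional as F


def eager_attention_forward(module, query, key, value, attention_mask=None, position_bias=None, dropout=0.0, **kwargs):
    # query/key/value: (batch, n_heads, seq_len, head_dim). Note that T5 does
    # not scale the scores by 1/sqrt(head_dim).
    scores = torch.matmul(query, key.transpose(-1, -2))

    # Merge the relative position bias into the additive mask so that a single
    # tensor is added to the scores.
    if position_bias is not None:
        attention_mask = position_bias if attention_mask is None else attention_mask + position_bias
    if attention_mask is not None:
        scores = scores + attention_mask  # additive: 0 keeps, large negative masks out

    attn_weights = F.softmax(scores.float(), dim=-1).type_as(scores)
    attn_weights = F.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output, attn_weights
```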

Fixes #26350

One note: I ran make fix-copies, but it broke several related models such as longt5 and mt5. The fix script somehow didn't copy over the imports and couldn't pick up the attention code correctly, so I skipped that part. If that's acceptable, we can merge this PR and I'll work on the related models in a follow-up PR, or I'm happy to take some hints on making the script work properly.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

@ArthurZucker @Cyrilvallez @vasqu

vasqu left a comment


Sorry to be so strict about this, but T5 is not a good candidate for flash attention / sdpa. The reason is that the relative attention bias has to be modeled there, and as of now that's not possible with base flash attention (it might be possible with sdpa, but that needs proper mask preparation). tl;dr: it will only support eager attention in the end.

We can still refactor this to follow the attention-interface-like implementation, but only for eager in the end (i.e. _supports_sdpa/flash_attn remain False). Wdyt?
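
To make the sdpa point concrete, here is a minimal, standalone sketch (assuming PyTorch >= 2.1, where F.scaled_dot_product_attention accepts a scale argument) of the kind of mask preparation that would be needed: the relative bias gets merged into the additive attn_mask. Flash attention kernels accept no such arbitrary additive bias, hence the eager-only conclusion.

```python
# Illustrative sketch only (assumes PyTorch >= 2.1 for the `scale` argument).
import torch
import torch.nn.functional as F

batch, n_heads, q_len, k_len, head_dim = 2, 8, 5, 5, 64
query = torch.randn(batch, n_heads, q_len, head_dim)
key = torch.randn(batch, n_heads, k_len, head_dim)
value = torch.randn(batch, n_heads, k_len, head_dim)

position_bias = torch.randn(batch, n_heads, q_len, k_len)         # relative attention bias
causal_mask = torch.full((q_len, k_len), float("-inf")).triu(1)   # additive causal mask

# The bias has to be merged into the additive attn_mask before calling sdpa;
# flash attention exposes no equivalent hook for an arbitrary additive bias.
attn_mask = position_bias + causal_mask
out = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, scale=1.0)  # T5 uses no 1/sqrt(d) scaling
```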

DuyguA commented Nov 27, 2025

> Sorry to be so strict about this, but T5 is not a good candidate for flash attention / sdpa. The reason is that the relative attention bias has to be modeled there, and as of now that's not possible with base flash attention (it might be possible with sdpa, but that needs proper mask preparation). tl;dr: it will only support eager attention in the end.
>
> We can still refactor this to follow the attention-interface-like implementation, but only for eager in the end (i.e. _supports_sdpa/flash_attn remain False). Wdyt?

Sounds reasonable to me!

DuyguA commented Dec 2, 2025

Hey again @vasqu, I made the changes restricting attention to eager only. Model tests are passing; only the repo consistency checks fail, as I mentioned above. The PR is ready for merge 😊
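
Concretely, a rough sketch of what the eager-only restriction looks like at the class level (flag names mirror the ones mentioned above and may differ slightly between transformers versions; the class body is abbreviated):

```python
# Illustrative sketch: what "eager only" boils down to on the model class.
from transformers import PreTrainedModel

class T5PreTrainedModel(PreTrainedModel):
    _supports_flash_attn_2 = False  # the relative attention bias cannot be expressed for flash attention
    _supports_sdpa = False          # would first need dedicated additive-mask preparation
```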

github-actions bot commented Dec 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: t5



Development

Successfully merging this pull request may close these issues.

Community contribution: Adding Flash Attention 2 support for more architectures
