
Conversation

@Furkan-rgb

What does this PR do?

Adds SDPA (Scaled Dot Product Attention) support for the PatchTST model.

Changes:

  • Added a PatchTSTSdpaAttention class that uses torch.nn.functional.scaled_dot_product_attention (see the sketch after this list)
  • Integrated attention class selection in PatchTSTEncoderLayer based on config._attn_implementation
  • Added _supports_sdpa = True to PatchTSTPreTrainedModel
  • Fixed test_modeling_patchtst.py _prepare_for_class method for proper dynamic batch size handling
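
For context, torch.nn.functional.scaled_dot_product_attention replaces the manual matmul + softmax + matmul of eager attention with one fused call. A minimal sketch of the core of such a forward pass (illustrative only, not the exact class added in this PR):

import torch.nn.functional as F


def sdpa_core(query, key, value, attention_mask=None, dropout_p=0.0):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    # One fused kernel; note that it does not return attention weights,
    # which is why the eager fallback mentioned in the notes below exists.
    return F.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=attention_mask,
        dropout_p=dropout_p,
        is_causal=False,  # the PatchTST encoder attends bidirectionally over patches
    )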

Testing:

All 67 PatchTST tests pass (103 skipped as expected):

pytest tests/models/patchtst/test_modeling_patchtst.py

Notes:

  • The SDPA implementation falls back to eager attention when output_attentions=True, since SDPA doesn't return attention weights (see the sketch after these notes)
  • Uses the standard attention implementation pattern from other models in transformers
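
The fallback described in the first note usually takes the following shape; this is a hedged, self-contained sketch of the pattern, not the PR's actual class:

import torch
import torch.nn.functional as F
from torch import nn


class AttentionCoreWithFallback(nn.Module):
    # Illustrative: use SDPA normally, fall back to eager math when weights are requested.

    def forward(self, query, key, value, attention_mask=None, output_attentions=False):
        if output_attentions:
            # SDPA cannot return attention weights, so compute them explicitly (eager path).
            scores = torch.matmul(query, key.transpose(-1, -2)) / query.shape[-1] ** 0.5
            if attention_mask is not None:
                scores = scores + attention_mask
            weights = F.softmax(scores, dim=-1)
            return torch.matmul(weights, value), weights

        # Fast path: fused kernel, no weights available.
        attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        return attn_output, None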

Contributor

@kashif kashif left a comment

looks good, thanks

@Furkan-rgb
Author

Seems like the documentation build is failing due to an improperly closed tag; unrelated to my changes.

Contributor

@vasqu vasqu left a comment

Sorry, we no longer support implementing separate Attention classes for each attention flavor. There are some older models which haven't yet been refactored for this.

Please take a look at models like Albert or Bert (Bert is a bit messy because it has additional enc-dec logic), which already implement this. The essence is to have (see the sketch after this list):

  • One Attention class
  • Specific attributes like is_causal, scaling, num_attention_heads to directly reuse across flavors
  • Generic code around the core attn mechanism, e.g. the projections and views/reshaping
  • The interface (ALL_ATTENTION_FUNCTIONS) that calls the underlying flavor for us
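
A condensed sketch of that layout, loosely modeled on the refactored attention in recent transformers versions (the class name is made up and the ALL_ATTENTION_FUNCTIONS import path is an assumption; the real PatchTST code differs in its details):

import torch
from torch import nn

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS  # assumed import path


def eager_attention_forward(module, query, key, value, attention_mask, scaling, dropout=0.0, **kwargs):
    # Plain attention, used when config._attn_implementation == "eager".
    attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = nn.functional.softmax(attn_weights, dim=-1)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value).transpose(1, 2).contiguous()
    return attn_output, attn_weights


class ExampleAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_attention_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.scaling = self.head_dim**-0.5
        self.is_causal = False  # encoder-style, bidirectional attention

        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        # Generic projections + reshaping, shared by every attention flavor.
        batch_size, seq_len, _ = hidden_states.shape
        shape = (batch_size, seq_len, self.num_attention_heads, self.head_dim)
        query = self.q_proj(hidden_states).view(shape).transpose(1, 2)
        key = self.k_proj(hidden_states).view(shape).transpose(1, 2)
        value = self.v_proj(hidden_states).view(shape).transpose(1, 2)

        # The interface selects eager / sdpa / flash / flex based on the config.
        attention_interface = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        attn_output, attn_weights = attention_interface(
            self, query, key, value, attention_mask,
            scaling=self.scaling, is_causal=self.is_causal, **kwargs,
        )
        attn_output = attn_output.reshape(batch_size, seq_len, -1).contiguous()
        return self.o_proj(attn_output), attn_weights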

@Furkan-rgb Furkan-rgb force-pushed the add-sdpa-support-patchtst branch from 9233cd1 to 34deed0 on November 28, 2025 at 19:45
- Add _supports_sdpa = True and _supports_flash_attn = True to PatchTSTPreTrainedModel
- The existing PatchTSTAttention class already uses ALL_ATTENTION_FUNCTIONS
  to select the attention implementation based on config._attn_implementation
- Fix test_modeling_patchtst.py _prepare_for_class for dynamic batch sizes
@Furkan-rgb Furkan-rgb force-pushed the add-sdpa-support-patchtst branch from 34deed0 to 23fe9ff on November 28, 2025 at 19:48
@Furkan-rgb Furkan-rgb requested a review from vasqu on November 28, 2025 at 19:49
@Furkan-rgb
Author

Furkan-rgb commented Nov 28, 2025

Sorry, we no longer support implementing separate Attention classes for each attention flavor. There are some older models which haven't yet been refactored for this.

Please take a look at models like Albert or Bert (Bert is a bit messy because it has additional enc-dec logic), which already implement this. The essence is to have:

  • One Attention class
  • Specific attributes like is_causal, scaling, num_attention_heads to directly reuse across flavors
  • Generic code around the core attn mechanism, e.g. the projections and views/reshaping
  • The interface (ALL_ATTENTION_FUNCTIONS) that calls the underlying flavor for us

Thanks for the feedback. Took a look and updated accordingly. Much cleaner now. The existing PatchTSTAttention already uses ALL_ATTENTION_FUNCTIONS.
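
With the dispatch already in PatchTSTAttention, opting the model into the extra flavors mostly comes down to the class flags from the commit message above; a minimal sketch, assuming the usual transformers layout (everything except the two flags is illustrative):

from transformers import PatchTSTConfig
from transformers.modeling_utils import PreTrainedModel


class PatchTSTPreTrainedModel(PreTrainedModel):
    config_class = PatchTSTConfig
    base_model_prefix = "model"  # illustrative; see the actual modeling file
    # Flags checked by the attention dispatch so these implementations may be selected:
    _supports_sdpa = True
    _supports_flash_attn = True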

@kashif
Contributor

kashif commented Nov 28, 2025

@Furkan-rgb I fixed the SLOW tests and added explicit testing with sdpa
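
For reference, forcing a particular implementation in a test or script is just a from_pretrained kwarg; a short usage sketch (the checkpoint path is a placeholder, not the one used by the slow tests):

from transformers import PatchTSTModel

# Placeholder checkpoint path, for illustration only.
model = PatchTSTModel.from_pretrained("path/to/patchtst-checkpoint", attn_implementation="sdpa")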

@vasqu
Contributor

vasqu commented Dec 1, 2025

run-slow: patchtst

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/patchtst"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@vasqu
Contributor

vasqu commented Dec 1, 2025

run-slow: patchtst

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: patchtst

@vasqu
Contributor

vasqu commented Dec 1, 2025

run-slow: patchtst

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/patchtst"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

Contributor

@vasqu vasqu left a comment

Just a small comment about the deepspeed addition

I took the liberty of updating a bit more to add flex attn and enable some other tests. SDPA is enabled by default, so there's no need to set the attn implementation here. We should probably update the title though ~ something along the lines of "Update supported attns for PatchTST"

Comment on lines +578 to +586
position_enc = module._init_pe(self.config, num_patches)
if is_deepspeed_zero3_enabled():
    import deepspeed

    # Under ZeRO-3 the parameter is partitioned across ranks, so gather it
    # before writing, and only copy where it is actually materialized.
    with deepspeed.zero.GatheredParameters(module.position_enc, modifier_rank=None):
        if module.position_enc.numel() > 0:
            init.copy_(module.position_enc, position_enc)
else:
    init.copy_(module.position_enc, position_enc)
Contributor

Not exactly against this but why was this added to this PR? Can we move this to a separate PR? cc @kashif

Contributor

the slow tests were failing without this change...

Contributor

Gotcha, thanks for adding it then

I will probably merge this PR tomorrow then; we're cutting the last PRs for v5 right now
