I noticed the paper has a section titled "DETA does not need self-attention in the decoder." The results there show that performance improves when the self-attention in the decoder is replaced by an FFN. I wonder whether the final version reported in the comparison-with-other-SOTAs table uses this setting, because in the code the self-attention appears to be hard-coded in the decoder layer:
DETA/models/deformable_transformer.py, line 328 (commit dade176):

```python
self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
```
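For context, here is a minimal sketch of what I understand the paper's ablation to describe: making the self-attention sublayer optional and swapping in a small FFN when it is disabled. The `no_self_attn` flag, the FFN sizes, and the surrounding layer structure are my own assumptions for illustration, not code from this repo:

```python
import torch.nn as nn


class DecoderLayerSketch(nn.Module):
    """Hypothetical decoder layer where the self-attention sublayer can be
    replaced by a feed-forward block, as in the paper's ablation."""

    def __init__(self, d_model=256, n_heads=8, dropout=0.1, no_self_attn=False):
        super().__init__()
        self.no_self_attn = no_self_attn
        if no_self_attn:
            # FFN stand-in for the self-attention sublayer (assumed hidden size).
            self.self_ffn = nn.Sequential(
                nn.Linear(d_model, d_model * 4),
                nn.ReLU(inplace=True),
                nn.Dropout(dropout),
                nn.Linear(d_model * 4, d_model),
            )
        else:
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tgt, query_pos=None):
        # tgt: (num_queries, batch, d_model)
        if self.no_self_attn:
            tgt2 = self.self_ffn(tgt)
        else:
            q = k = tgt if query_pos is None else tgt + query_pos
            tgt2 = self.self_attn(q, k, value=tgt)[0]
        # Residual connection + layer norm, as in the standard decoder sublayer.
        return self.norm(tgt + self.dropout(tgt2))
```

If the released checkpoints were trained with the FFN variant, I would expect a switch like this somewhere in the decoder construction; I could not find one, hence the question.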