Question about transformer decoder

Hi,

I am reading through the code and came across the following line:
https://github.com/facebookresearch/MaskFormer/blob/da3e60d85fdeedcb31476b5edd7d328826ce56cc/mask_former/modeling/transformer/transformer.py#L70
The decoder input `tgt` is all zeros, and this all-zeros tensor is then passed into the decoder layer:
https://github.com/facebookresearch/MaskFormer/blob/da3e60d85fdeedcb31476b5edd7d328826ce56cc/mask_former/modeling/transformer/transformer.py#L272

Here `tgt` is all zeros and `query_pos` is a learnable embedding, so `q` and `k` are non-zero tensors (equal in value to `query_pos`), while `tgt`, which is used as `v`, is still all zeros. By the definition of QKV attention, if `v` is all zeros, the attention output is all zeros as well. So it seems the self-attention module in the first decoder layer does not contribute to the model. Am I correct on this?
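
To make the reasoning concrete, here is a minimal, dependency-free sketch of plain scaled dot-product attention. It is not the MaskFormer code: the helper `attention`, the sizes, and the bias `b_v` are made up for illustration. One caveat I am not sure about: as far as I know, `nn.MultiheadAttention` applies a learned input projection with a bias before the attention, so `v` may equal that bias rather than exactly zero. In that case the output would not be zero, but it would still be the same vector for every query, no matter what `q` and `k` are.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Plain scaled dot-product attention on lists of row vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)  # attention weights, they sum to 1
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v))
                    for c in range(len(v[0]))])
    return out


random.seed(0)
num_queries, dim = 4, 3  # illustrative sizes only

# Stand-in for the learnable query embedding.
query_pos = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
tgt = [[0.0] * dim for _ in range(num_queries)]

# First decoder layer: q = k = tgt + query_pos, v = tgt.
qk = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(tgt, query_pos)]

# Case 1: no projection bias, so v is exactly zero.
out_no_bias = attention(qk, qk, tgt)

# Case 2: hypothetical value-projection bias, v = tgt @ W + b_v = b_v for every row.
b_v = [0.5, -1.0, 2.0]
v_biased = [list(b_v) for _ in range(num_queries)]
out_bias = attention(qk, qk, v_biased)

print(out_no_bias)  # every entry is zero
print(out_bias)     # every row is (numerically) b_v
```

In both cases the result does not depend on `query_pos` at all, which is the point of my question about the first layer.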

Question about transformer decoder #63
