Hi, thank you for sharing this amazing work!
I was visualizing the attention masks as images and noticed that the height and width dimensions seem to be swapped.
I traced through the code and couldn't find any correction logic in the SelfAttention module, so I wanted to confirm whether this is a bug or if I'm misunderstanding something.
`utils/input.py` (lines 35-36):

```python
# modify attention mask for object idx based on the bounding box
def get_attmask_w_box(att_masks, idx, box, image_size):
    x1, y1, x2, y2 = int(np.round(box[0]*image_size)), int(np.round(box[1]*image_size)), int(np.round(box[2]*image_size)), int(np.round(box[3]*image_size))
    att_masks[idx][x1:x2, y1:y2] = 1
    return att_masks
```
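For what it's worth, here is the minimal check I ran (a sketch that reimplements the function quoted above; the box coordinates are made up for illustration). A box spanning the full image width but only the top quarter of its height ends up filling a tall, narrow region of the mask instead of a wide, short one:

```python
import numpy as np

def get_attmask_w_box(att_masks, idx, box, image_size):
    x1, y1, x2, y2 = [int(np.round(c * image_size)) for c in box]
    att_masks[idx][x1:x2, y1:y2] = 1
    return att_masks

masks = np.zeros((1, 64, 64))
# Box with x-extent 0.0-1.0 (full width) and y-extent 0.0-0.25 (top quarter):
# the expected mask region is 16 rows by 64 columns.
masks = get_attmask_w_box(masks, 0, [0.0, 0.0, 1.0, 0.25], 64)
print(np.flatnonzero(masks[0].any(axis=1)))  # filled rows:    0..63 (tall)
print(np.flatnonzero(masks[0].any(axis=0)))  # filled columns: 0..15 (narrow)
```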
`dataset/decode_item.py` (lines 702-703):

```python
if self.return_att_masks:
    box = boxes[i]
    image_size = 64
    x1, y1, x2, y2 = int(np.round(box[0]*image_size)), int(np.round(box[1]*image_size)), int(np.round(box[2]*image_size)), int(np.round(box[3]*image_size))
    att_masks[i][x1:x2, y1:y2] = 1
```
Since NumPy arrays are indexed as `[row, column]`, i.e. `[y, x]`, shouldn't the indexing be `[y1:y2, x1:x2]`? Something like the sketch below.
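```python
# sketch of the proposed swap, assuming masks are laid out as [height, width]:
att_masks[idx][y1:y2, x1:x2] = 1  # rows <- y range, columns <- x range
```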
Could you please confirm if this is intentional or if I'm missing something?
Thank you!