Thanks for sharing the source code.
I have a few questions regarding the input embeddings.
From your code, all text and images are pre-encoded with BERT and ResNet. I therefore assume both models are not trainable, and that you use these embeddings only as inputs to your cross-modality model. (Please correct me if I'm wrong.)
- Could you kindly advise how you plotted the attention maps in Section 4.6, given that all inputs are embeddings rather than actual text or images?
- Have you experimented with unfreezing BERT and ResNet and training end-to-end?
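For reference, here is a minimal sketch of what I mean by "not trainable" — this is my assumption about your setup, not your actual code, with `encoder` and `head` as hypothetical stand-ins for the pretrained encoder and the cross-modality model:

```python
import torch
import torch.nn as nn

# Assumed setup: the pretrained encoder (BERT/ResNet stand-in) is frozen
# by disabling gradients, so only the cross-modality head is updated.
encoder = nn.Linear(16, 8)   # stand-in for a pretrained encoder
for p in encoder.parameters():
    p.requires_grad_(False)  # frozen: no gradient updates

head = nn.Linear(8, 2)       # stand-in for the trainable cross-modality model

x = torch.randn(4, 16)
loss = head(encoder(x)).sum()
loss.backward()

print(encoder.weight.grad)            # None — the frozen encoder gets no gradient
print(head.weight.grad is not None)   # True — the head still trains
```

In this sketch the embeddings could equivalently be pre-computed offline and fed in directly, which is what I understood your code to do.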
Thanks in advance.