Thank you for your excellent work, but I have some questions.
- From Tab. 1 of the paper, it seems that the method applies a single linear layer with a GLU activation, but I can find no mention of the GLU activation function in the code. Have I missed something or misunderstood? (A sketch of what I understand the layer to be is below this list.)
- Have you tried other language models such as T5 as the text encoder? Were GTE-en-large-v1.5 and NV-Embed-v2 chosen because they output a CLS token that can be aligned with the image's CLS token?
- During the Alignment Tuning stage, are only the pre-encoded image and text CLS tokens loaded onto the GPU, rather than the raw inputs or the frozen encoders? (See the second sketch below for what I have in mind.)
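
To make the first question concrete, here is a minimal sketch of what I understood "a single linear layer with GLU activation" from Tab. 1 to mean. This is only my interpretation, not code from the repository; the class name `GLUProjection` and the dimensions are placeholders.

```python
import torch
import torch.nn as nn


class GLUProjection(nn.Module):
    """Hypothetical sketch: one linear layer whose output is split by a GLU
    into a value half and a gate half, i.e. value * sigmoid(gate)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Project to 2 * out_dim so the GLU can halve the last dimension.
        self.proj = nn.Linear(in_dim, 2 * out_dim)
        self.glu = nn.GLU(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.glu(self.proj(x))
```

Is this roughly the projection described in the paper, or does the released code intentionally use a plain linear layer without the GLU?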
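
And for the third question, this is the kind of pipeline I am imagining for Alignment Tuning: the images and texts are encoded once offline, and training then only moves the small CLS embedding tensors to the GPU. Again, this is purely an assumption on my part; the file names, dataset class, and batch size are made up for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PrecomputedCLSDataset(Dataset):
    """Hypothetical sketch: serve pre-encoded (image CLS, text CLS) pairs,
    so only these embedding tensors ever need to reach the GPU."""

    def __init__(self, image_cls_path: str, text_cls_path: str):
        # Assumed file layout: one (N, dim) tensor of CLS embeddings per modality.
        self.image_cls = torch.load(image_cls_path, map_location="cpu")
        self.text_cls = torch.load(text_cls_path, map_location="cpu")
        assert len(self.image_cls) == len(self.text_cls)

    def __len__(self):
        return len(self.image_cls)

    def __getitem__(self, idx):
        return self.image_cls[idx], self.text_cls[idx]


# Example usage with placeholder paths.
loader = DataLoader(
    PrecomputedCLSDataset("image_cls.pt", "text_cls.pt"),
    batch_size=256,
    shuffle=True,
)
```

Is this close to how the released training code handles the pre-encoded features?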