Thanks for your great work!
I have a question: how should the model be used for classification at inference time? Should we use the image embeddings taken after the cross-attention module, or those taken directly from the image encoder? And do we need to compare the embeddings in hyperbolic space, or can we simply compare them in the original Euclidean space?
Best regards