Inference process

Thanks for your great work! 

I have a question about inference: how should the model be used for classification? Should we take the image embeddings after the cross-attention module, or directly from the image encoder? And should the embeddings be compared in hyperbolic space, or just in the original Euclidean space?
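To make the question about the comparison space concrete, here is a minimal sketch of the two options as I understand them. It is not taken from the repository: it assumes a Poincaré ball with curvature -1 (the model may well use a different hyperbolic formulation), and `euclidean_distance`, `poincare_distance`, and `classify` are hypothetical helper names. Classification here simply means picking the class embedding nearest to the image embedding.

```python
import math

def euclidean_distance(u, v):
    # Plain L2 distance in the original embedding space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def poincare_distance(u, v, eps=1e-9):
    # Geodesic distance on the Poincare ball (curvature -1, an assumption):
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    # Clamp the denominator so points at or beyond the boundary do not divide by zero.
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)

def classify(image_emb, class_embs, distance):
    # Return the label whose embedding is closest under the given distance.
    return min(class_embs, key=lambda label: distance(image_emb, class_embs[label]))

# Toy embeddings, made up purely for illustration.
class_embs = {"cat": [0.2, 0.0], "dog": [-0.5, 0.3]}
image_emb = [0.1, 0.0]

pred_euclidean = classify(image_emb, class_embs, euclidean_distance)
pred_hyperbolic = classify(image_emb, class_embs, poincare_distance)
```

On this toy input both distances pick the same class, but near the boundary of the ball the two rankings can differ, which is why I am asking which one is intended.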

Best regards