Issue Report: Token indices sequence length is longer than the specified maximum sequence length for this model 

Firstly, thank you for the incredible work on the Multilingual-CLIP model. We have been using it and it is great!

However, we've encountered an issue when input text queries exceed 512 tokens. Here is the error message:

"Token indices sequence length is longer than the specified maximum sequence length for this model (514 > 512). Running this sequence through the model will result in indexing errors."

Have you considered passing `truncation=True` to the tokenizer call in the `MultilingualCLIP` forward method, at line 16 [here](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/93d894ed316b62cf882c649710252ed2d86f39c9/multilingual_clip/pt_multilingual_clip.py#L16)? That change should resolve the error when a text query exceeds the token limit. Thanks!
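To make the suggestion concrete, here is a minimal sketch of what the tokenizer call could look like with truncation enabled. The helper name `tokenize_truncated` is hypothetical and not part of the repository. The default `max_length` of 512 is an assumption taken from the "514 > 512" message above. The `tokenizer` argument is assumed to behave like a Hugging Face tokenizer, which accepts the `padding`, `truncation`, `max_length`, and `return_tensors` keyword arguments.

```python
from typing import Any, Callable, List, Union


def tokenize_truncated(
    tokenizer: Callable[..., Any],
    txt: Union[str, List[str]],
    max_length: int = 512,
) -> Any:
    """Hypothetical helper: tokenize `txt` with truncation enabled.

    `tokenizer` is assumed to accept the same keyword arguments as a
    Hugging Face tokenizer's __call__ method.
    """
    return tokenizer(
        txt,
        padding=True,           # pad shorter sequences in the batch
        truncation=True,        # cut sequences that exceed max_length
        max_length=max_length,  # assumed limit, from the "514 > 512" message
        return_tensors="pt",    # return PyTorch tensors
    )
```

Inside the forward method, the existing tokenizer call could pass the same two extra arguments directly. Note that truncation silently drops the tail of long queries, so text beyond the limit would not contribute to the embedding.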

Issue Report: Token indices sequence length is longer than the specified maximum sequence length for this model #37
