Hello there.
I tried to use m2_bert_80M_2k to generate embeddings for text strings with lengths around 500 tokens following your example on Huggingface](https://huggingface.co/togethercomputer/m2-bert-80M-2k-retrieval). However, the outputs = model(**input_ids) line tooks over 15s on average, slower than expected. Could you please help me find the issue here?
I also tested your example for 12 tokens. The model forwarding process is still slow (over 5s for 12tokens & padding="longest", over 16s for 12tokens & padding="max_length"(=2048).
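For reference, here is roughly how I measure the forward pass. The `time_forward` helper is my own sketch (not from your example); in my tests `fn` wraps the `model(**input_ids)` call from the model card, with a warm-up call first so one-time costs (e.g. CUDA initialization) are not counted:

```python
import time

def time_forward(fn, warmup=1, runs=3):
    """Average wall-clock seconds per call of fn, after `warmup` untimed calls."""
    for _ in range(warmup):  # warm-up run: excludes one-time setup costs
        fn()
    start = time.perf_counter()
    out = None
    for _ in range(runs):  # average over several runs to smooth out noise
        out = fn()
    return out, (time.perf_counter() - start) / runs
```

Used as `outputs, seconds = time_forward(lambda: model(**input_ids))`; the timings above are the averaged `seconds` values.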


Thanks in advance!