Open
Description
First, thank you very much for your contribution. 💯 💯 💯
In MathVerse, you have shown that most MLLMs solve problems by relying on "Text Redundancy".
I saw that InternVL scales up the vision encoder to reduce the gap between visual and textual information, and that it has also achieved Top 1 on MathVista.
Could you provide benchmark results for InternVL on the MathVerse dataset? I think they would add useful evidence for your hypothesis.
Reference papers: