Hi authors,
Congrats on the awesome project! Really cool work.
I was curious if you’ve done any experiments on multi-image reasoning and grounding. Since Qwen 2.5-VL already supports multi-image input, it seems like an interesting direction to explore.
Thanks!