Performance of qwen3.5 #26

@Lim-Sung-Jun

Hello, thanks for sharing the metric.

I evaluated Qwen3.5 on ScreenSpot-Pro, and the score I reproduced is higher than the reported one. The paper states 68.5, but I get 71 both with and without thinking mode, so I am not sure which number should be considered correct.

The official performance is listed here:
https://huggingface.co/Qwen/Qwen3.5-397B-A17B-FP8#instruct-or-non-thinking-mode

For reproduction, I used this code:
https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/blob/main/models/qwen3_5.py

Could you clarify where this discrepancy comes from?
