
EASI

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

English | 简体中文

arXiv | Data

Overview

EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, along with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.

Key features include:

  • Supports the evaluation of state-of-the-art Spatial Intelligence models.
  • Systematically collects and integrates evolving Spatial Intelligence benchmarks.
  • Proposes a standardized testing protocol to ensure fair evaluation and enable cross-benchmark comparisons.

🗓️ News

🌟 [2025-11-21] EASI v0.1.1 is released.

🌟 [2025-11-07] EASI v0.1.0 is released.

🛠️ QuickStart

Installation

git clone --recursive https://github.com/EvolvingLMMs-Lab/EASI.git
cd EASI
pip install -e ./VLMEvalKit
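
To verify that the editable install resolves to your local checkout, an optional sanity check (a suggestion, not an official step) is:

python -c "import vlmeval; print(vlmeval.__file__)"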

Configuration

VLM Configuration: All VLMs are configured in vlmeval/config.py. During evaluation, select a VLM by the model name registered in supported_VLM in vlmeval/config.py. Before starting an evaluation, make sure you can successfully run inference with the chosen VLM: vlmutil check {MODEL_NAME}.
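
To see which model names are available, one quick way to dump the registry (a minimal sketch; supported_VLM is the model table inside VLMEvalKit's vlmeval/config.py) is:

python -c "from vlmeval.config import supported_VLM; print('\n'.join(sorted(supported_VLM)))"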

Benchmark Configuration: The full list of supported benchmarks can be found in the official VLMEvalKit documentation: VLMEvalKit Supported Benchmarks (Feishu). The EASI Leaderboard currently supports the following benchmarks:

Benchmark      Evaluation settings
VSI-Bench      VSI-Bench_origin_32frame
               VSI-Bench-Debiased_origin_32frame
SITE-Bench     SiteBenchImage
               SiteBenchVideo_32frame
MMSI-Bench     MMSIBench_wo_circular
MindCube       MindCubeBench_tiny_raw_qa
               MindCubeBench_raw_qa
ViewSpatial    ViewSpatialBench
EmbSpatial     EmbSpatialBench

Evaluation

General command

python run.py --data {BENCHMARK_NAME} --model {MODEL_NAME} --verbose --reuse

See run.py for the full list of arguments.
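
run.py accepts multiple space-separated values for --data and --model, so several leaderboard settings can be evaluated in one call. The combination below is illustrative (verify the flags against your checkout of run.py):

python run.py --data VSI-Bench_origin_32frame MMSIBench_wo_circular \
              --model {MODEL_NAME} \
              --verbose --reuse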

Example

Evaluate SenseNova-SI-1.1-InternVL3-8B on MindCubeBench_tiny_raw_qa:

python run.py --data MindCubeBench_tiny_raw_qa \
              --model SenseNova-SI-1.1-InternVL3-8B \
              --verbose --reuse
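
Prediction and score files are written into a per-model subfolder under the work directory (see --work-dir in run.py); with --reuse, a rerun picks up existing prediction files instead of querying the model again. Assuming the default work directory is ./outputs (worth checking in your version of run.py), the results can be inspected with:

ls outputs/SenseNova-SI-1.1-InternVL3-8B/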

🖊️ Citation

Spatial intelligence is a rapidly evolving field. Our evaluation scope has expanded beyond GPT-5 to include a broader range of models, leading us to update the paper's title to Holistic Evaluation of Multimodal LLMs on Spatial Intelligence. For consistency, however, the BibTeX below retains the original title for reference.

@article{easi2025,
  title={Has {GPT-5} Achieved Spatial Intelligence? An Empirical Study},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}