A Spatial Physical Reasoning Benchmark
You can also explore or download the dataset directly from Hugging Face:
Follow these steps to recreate the dataset from scratch.
Create a Conda Environment
Make sure you have Miniconda or Anaconda installed, then run:
conda create -n "sphyr" python -y
conda activate sphyr
Install Poetry & Project Dependencies
Poetry is used for dependency management. Install it and the project dependencies with:
pip install poetry
poetry install
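A common setup mistake is running `poetry install` outside the activated environment. The pre-flight check below is a minimal sketch and is not part of the SPhyR repository. It assumes conda's `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the environment name, and the helper names `active_conda_env`, `is_sphyr_env` and `poetry_on_path` are hypothetical.

```python
import os
import shutil
from typing import Mapping, Optional


def active_conda_env(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the active conda environment name, or None if none is active.

    `conda activate` exports CONDA_DEFAULT_ENV with the environment's name.
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def is_sphyr_env(environ: Optional[Mapping[str, str]] = None, expected: str = "sphyr") -> bool:
    """True if the expected environment (default "sphyr") is the active one."""
    return active_conda_env(environ) == expected


def poetry_on_path() -> bool:
    """True if a `poetry` executable can be found on PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    # Report the environment state before running `poetry install`.
    print("active conda env:", active_conda_env())
    print("sphyr env active:", is_sphyr_env())
    print("poetry on PATH:", poetry_on_path())
```

If `sphyr env active` prints `False`, run `conda activate sphyr` before installing the dependencies.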
Download Rhinoceros 8.0
Rhino includes the Grasshopper visual programming environment.
📥 Download here
Install Millipede Plugin
Move the Millipede plugin from src/sphyr/dataset_creation/topology_optimization_data/2D/rhino_grasshopper/libraries/millipede into Grasshopper's special Components Folder. You can open this folder in Grasshopper via:
File > Special Folders > Components Folder
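The plugin installation can also be scripted. The sketch below is not part of the repository and is hypothetical throughout (`install_millipede`, `default_components_folder`, `PLUGIN_SRC`). It copies rather than moves the folder so the repository stays intact, and it assumes the Windows default Components Folder is `Grasshopper/Libraries` under `%APPDATA%`; confirm the real location through the Grasshopper menu entry above.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional

# Where this README says the bundled plugin lives (relative to the repo root).
PLUGIN_SRC = Path(
    "src/sphyr/dataset_creation/topology_optimization_data/2D/"
    "rhino_grasshopper/libraries/millipede"
)


def default_components_folder() -> Optional[Path]:
    """Best guess at Grasshopper's Components Folder.

    ASSUMPTION: on Windows this is the Grasshopper/Libraries folder under
    %APPDATA%. Confirm via File > Special Folders > Components Folder.
    On other platforms this returns None and the folder must be passed in.
    """
    appdata = os.environ.get("APPDATA")
    if sys.platform.startswith("win") and appdata:
        return Path(appdata) / "Grasshopper" / "Libraries"
    return None


def install_millipede(src: Path, components_folder: Path) -> Path:
    """Copy the plugin folder into the Components Folder; return the copied folder."""
    if not src.is_dir():
        raise FileNotFoundError(f"plugin folder not found: {src}")
    target = components_folder / src.name
    # Copy instead of move so the repository stays intact; dirs_exist_ok
    # (Python 3.8+) lets a second run refresh an existing copy.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target
```

From the repository root, call `install_millipede(PLUGIN_SRC, dest)` with `dest` set to the folder that Grasshopper's menu opens. Grasshopper generally only picks up newly added plugins after Rhino is restarted.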
Open the Rhino & Grasshopper Files
- Rhino File: src/sphyr/dataset_creation/topology_optimization_data/2D/rhino_grasshopper/SPhyR_2D.3dm
- Grasshopper Script: src/sphyr/dataset_creation/topology_optimization_data/2D/rhino_grasshopper/SPhyR_2D.gh
✅ Once both files are open, run the Grasshopper script by switching on the Boolean toggle in the top-left corner of the canvas.
💡 Tip: If you'd rather skip this step, precomputed results are available:
- Raw Data: src/sphyr/dataset_creation/topology_optimization_data/2D/raw_data
- Plots/Frames: src/sphyr/dataset_creation/topology_optimization_data/2D/frames
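To decide whether the Grasshopper step can be skipped, you can check that the precomputed raw data is present. This is a minimal sketch, not part of the repository; the helper `list_raw_csvs` is hypothetical, and it assumes the raw output consists of `.csv` files (possibly in subfolders) and that it is run from the repository root.

```python
from pathlib import Path
from typing import List

# Precomputed 2D raw data folder named in this README (relative to the repo root).
RAW_DATA_2D = Path("src/sphyr/dataset_creation/topology_optimization_data/2D/raw_data")


def list_raw_csvs(raw_dir: Path = RAW_DATA_2D) -> List[Path]:
    """Return all .csv files under raw_dir (searched recursively), sorted by path.

    An empty list means no precomputed output was found, so the
    Grasshopper script has to be run first.
    """
    if not raw_dir.is_dir():
        return []
    return sorted(raw_dir.rglob("*.csv"))


if __name__ == "__main__":
    csvs = list_raw_csvs()
    print(f"{len(csvs)} raw CSV file(s) found in {RAW_DATA_2D}")
```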
Run the following Python script to convert the raw simulation output into a format suitable for evaluation on Hugging Face:
python src/sphyr/dataset_creation/raw_data_2D_to_huggingface_datasets.py
This script processes the .csv simulation outputs into structured .json entries.
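The sketch below illustrates the kind of CSV-to-JSON conversion this step performs. It does not reproduce `raw_data_2D_to_huggingface_datasets.py`. It assumes each CSV is a rectangular grid of numeric cell values, and the functions (`csv_to_entry`, `write_json`) and output fields (`id`, `rows`, `cols`, `grid`) are hypothetical; the repository's actual schema may differ.

```python
import csv
import json
from pathlib import Path
from typing import Dict, Iterable, List


def csv_to_entry(csv_path: Path) -> Dict[str, object]:
    """Parse one simulation CSV, ASSUMED to be a rectangular grid of numbers."""
    with csv_path.open(newline="") as f:
        grid: List[List[float]] = [
            [float(cell) for cell in row]
            for row in csv.reader(f)
            if row  # ignore blank lines
        ]
    widths = {len(row) for row in grid}
    if len(widths) > 1:
        raise ValueError(f"{csv_path.name}: rows have different lengths {sorted(widths)}")
    # Hypothetical field names; the repository's script may use a different schema.
    return {
        "id": csv_path.stem,
        "rows": len(grid),
        "cols": len(grid[0]) if grid else 0,
        "grid": grid,
    }


def write_json(csv_paths: Iterable[Path], out_path: Path) -> int:
    """Convert every CSV into an entry, save them as one JSON list, return the count."""
    entries = [csv_to_entry(p) for p in csv_paths]
    with out_path.open("w") as out:
        json.dump(entries, out, indent=2)
    return len(entries)
```

For example, `write_json(sorted(raw_dir.rglob("*.csv")), Path("sphyr_2d.json"))` would write one entry per CSV file found under a hypothetical `raw_dir`. Failing loudly on ragged rows keeps malformed simulation output out of the dataset.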
Benchmark results on 100 samples are available for the following models:
- Claude 3.7 Sonnet
- Claude Opus 4
- DeepSeek-R1
- Gemini 1.5 Pro
- Gemini 2.5 Pro
- GPT-3.5 Turbo
- GPT-4.1
- GPT-4o
- Perplexity Sonar
- Perplexity Sonar Reasoning
📁 You can find these results inside the results directory.
We have included a preliminary subset of 3D data and corresponding plots, and we plan to release the full set in the future. The 3D data can be found in src/sphyr/dataset_creation/topology_optimization_data/3D.
BibTeX:
@misc{siedler2025sphyr,
title = {SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution},
author = {Philipp D. Siedler},
year = {2025},
eprint = {2505.16048},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
doi = {10.48550/arXiv.2505.16048},
url = {https://arxiv.org/abs/2505.16048}
}
APA:
Siedler, P. D. (2025). SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution. arXiv. https://doi.org/10.48550/arXiv.2505.16048