Side‑by‑side HTML visualizer for MMMU/MMMU_Pro that pairs the Vision split with the Standard (10 options) split by `id`. It lists the actual multiple‑choice options, anchors in‑question `<image k>` placeholders to figures, and supports MathJax, incremental rendering, and multi‑process image export.
- Pairs Vision vs. Standard (10) by `id` with consistent metadata.
- Correctly renders the 10 options on the right panel.
  - Parses `options` even when it is a stringified list.
  - If `options` only contains letters A–J, it reconstructs texts from the question when present; otherwise shows the original letter order.
  - Anchors `<image k>` options to the first matching figure.
- MathJax for inline formulas and lazy, incremental page rendering.
- Multi‑process export of original images (PNG/JPEG) with sensible defaults.
- Client‑side filters: subject, image type, difficulty, plus full‑text search.
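The `options` handling listed above could look roughly like the sketch below. It is an illustration only, not the script's actual code: the helper names `parse_options` and `is_letters_only` are hypothetical, and it assumes that a stringified list is a Python literal readable by `ast.literal_eval`.

```python
import ast
import string

# The ten option letters used by the Standard (10 options) split.
LETTERS = list(string.ascii_uppercase[:10])  # "A" .. "J"


def parse_options(raw):
    """Return options as a list of strings (hypothetical helper)."""
    if isinstance(raw, (list, tuple)):
        return [str(o) for o in raw]
    if isinstance(raw, str):
        try:
            parsed = ast.literal_eval(raw)  # handles "['x', 'y']"
        except (ValueError, SyntaxError):
            return [raw]  # not a literal: treat as a single option
        if isinstance(parsed, (list, tuple)):
            return [str(o) for o in parsed]
        return [raw]
    return []


def is_letters_only(options):
    """True when every option is just a bare letter A-J (hypothetical helper)."""
    return bool(options) and all(o.strip() in LETTERS for o in options)
```

When `is_letters_only` returns true, the real option texts have to come from elsewhere (per the list above, from the question text when present); that reconstruction step is not shown here.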
- Python 3.9+
pip install -r requirements.txt
Run from the repo root:
python visualize_mmmu_pro_pairs_idjoin_mathjax_mp_optfix.py \
--rows 200 --page-size 60 --workers 8
Common flags:
- `--subject SUBJECT` filter a specific subject.
- `--start N` start index; `--rows N` max items to render; `--page-size N` items per virtual page.
- `--workers N` processes for saving images; `--jpeg-quality` / `--jpeg-optimize` / `--jpeg-progressive` control output.
- `--out DIR` output folder, default `mmmu_pro_pairs_idjoin_mathjax_mp_options_listed` (contains `index.html` + `images/`).
Open the generated index.html in a browser. Use Ctrl+F5 to avoid stale caches when iterating.
- The script auto‑detects image columns (`image`, `image_1..image_10`), saves originals to `images/`, and links them.
- It tolerates different column suffixes after merging (`*_v` / `*_s`), always preferring the Standard side for question/options/meta.
- If an `<image k>` placeholder has no matching `image_k` column, it is flagged in the yellow warning bar.
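As a rough sketch of the last two notes, the snippet below picks a field from a merged row while preferring the Standard side, and lists `<image k>` placeholders that have no matching `image_k` value. The function names, the dict-shaped row, and the fallback order (`_s`, then unsuffixed, then `_v`) are assumptions made for illustration; the script itself may do this differently.

```python
import re

# Matches placeholders such as "<image 1>" and captures the index k.
PLACEHOLDER_RE = re.compile(r"<image\s+(\d+)>")


def prefer_standard(row, name):
    """Pick a merged field, preferring the Standard (`_s`) side (assumed order)."""
    for key in (f"{name}_s", name, f"{name}_v"):
        value = row.get(key)
        if value is not None:
            return value
    return None


def missing_placeholders(text, row):
    """Return indices k used in `text` whose `image_k` entry is absent or None."""
    indices = sorted({int(k) for k in PLACEHOLDER_RE.findall(text or "")})
    return [k for k in indices if row.get(f"image_{k}") is None]


# Example with a made-up merged row.
row = {
    "question_s": "Compare <image 1> with <image 3>.",
    "question_v": None,
    "image_1": b"fake-bytes",
    "image_2": None,
}
question = prefer_standard(row, "question")
unmatched = missing_placeholders(question, row)  # placeholder 3 has no image
```

A non-empty result from `missing_placeholders` corresponds to the case that the page flags in the yellow warning bar.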
This tool downloads data from the Hugging Face dataset:
`MMMU/MMMU_Pro` (Vision split and Standard (10 options) split).
Please follow the dataset’s license/terms when using the data. Cite the MMMU/MMMU_Pro paper/dataset where appropriate.
MIT — see LICENSE.