[Benchmark] Add Spatial-DISE benchmark by shinmohuang · Pull Request #1542 · open-compass/VLMEvalKit

shinmohuang · 2026-05-09T19:56:49Z

Summary

This PR adds Spatial-DISE, an ICLR 2026 spatial reasoning benchmark, to VLMEvalKit.

Spatial-DISE evaluates VLMs on 2D and 3D spatial reasoning tasks, including rotation, folding, projection, shape finding, and compositional reasoning. The default dataset entry follows the official benchmark split:

DISE-bench/DISE-benchmark.csv

Supported entries:

Spatial-DISE
Spatial-DISE_BENCH

Additional reusable dataset splits are also exposed:

Spatial-DISE_TRAIN
Spatial-DISE_VAL
Spatial-DISE_TEST

Implementation

This PR follows the VLMEvalKit benchmark contribution pattern by adding a dataset class that implements:

build_prompt(self, line)
evaluate(self, eval_file, **judge_kwargs)
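For readers new to this contribution pattern, here is a rough standalone sketch of what a `build_prompt` for a multiple-choice row could look like. It is not the code in `vlmeval/dataset/spatial_dise.py`. The column name `question` and the single-letter option columns are assumptions about the CSV schema. The image path is passed in directly instead of being resolved by the dataset class. Options are collected dynamically, in the spirit of the "Support dynamic Spatial-DISE answer options" commit. The returned list uses the interleaved `dict(type=..., value=...)` image/text message format that VLMEvalKit datasets hand to models.

```python
import string
from typing import Dict, List, Mapping, Optional

# Hypothetical column name; the real Spatial-DISE CSV schema may differ.
QUESTION_KEY = "question"


def build_prompt(line: Mapping[str, Optional[str]], image_path: str) -> List[Dict[str, str]]:
    """Build an interleaved image/text message for one benchmark row.

    Any non-empty column named with a single uppercase letter (A, B, C, ...)
    is treated as an answer option, so rows may carry different option counts.
    """
    options = {}
    for letter in string.ascii_uppercase:
        value = line.get(letter)
        if value is not None and str(value).strip():
            options[letter] = str(value).strip()

    prompt = f"Question: {line[QUESTION_KEY]}\n"
    if options:
        prompt += "Options:\n"
        prompt += "".join(f"{key}. {text}\n" for key, text in options.items())
        # Hypothetical instruction wording, not taken from the PR.
        prompt += "Answer with the letter of the correct option."
    return [
        dict(type="image", value=image_path),
        dict(type="text", value=prompt),
    ]
```

Because options come from whichever letter columns are populated, the sketch avoids hard-coding four choices. This matters if items differ in their number of candidates.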

The implementation:

- adds vlmeval/dataset/spatial_dise.py
- registers SpatialDISE in vlmeval/dataset/__init__.py
- loads the official dataset from Hugging Face (TACPS-liv/Spatial-DISE)
- supports local reuse via SPATIAL_DISE_ROOT=/path/to/Spatial-DISE
- reads images from tar shards on demand instead of requiring users to manually extract all images
- maps CSV paths like images/... to tar members by removing the images/ prefix
- reports accuracy breakdowns by:
  - category
  - difficulty
  - dise_category
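The local-reuse and Hugging Face loading items could fit together as follows. This is a minimal sketch that assumes TACPS-liv/Spatial-DISE is a Hugging Face dataset repository and that `huggingface_hub.snapshot_download` is an acceptable way to fetch it. The function name `resolve_dataset_root` is hypothetical, and the PR's actual loader may fetch files differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REPO_ID = "TACPS-liv/Spatial-DISE"  # Hugging Face repository named in this PR
ENV_VAR = "SPATIAL_DISE_ROOT"       # environment variable named in this PR


def resolve_dataset_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return a local Spatial-DISE root, preferring SPATIAL_DISE_ROOT over a download."""
    env = os.environ if env is None else env
    local = env.get(ENV_VAR)
    if local:
        root = Path(local).expanduser()
        if not root.is_dir():
            raise FileNotFoundError(f"{ENV_VAR} points to a missing directory: {root}")
        return root
    # Imported lazily so the local-copy path works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, repo_type="dataset"))
```

Checking the environment variable first means a pre-downloaded copy is never re-fetched, and a typo in the path fails loudly instead of silently triggering a download.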

Why This Format

Spatial-DISE is distributed on Hugging Face with CSV metadata and tar-sharded images. Instead of embedding images into a large TSV/base64 file, this PR keeps the official dataset layout and provides a VLMEvalKit-compatible loader that handles tar-shard image access transparently.
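One way to provide transparent tar-shard access is to index member names once and then read single members with Python's standard `tarfile` module. The following sketch rests on assumptions not taken from the PR: the helper names `index_shards` and `read_image_bytes`, how shard files are discovered, and that member names match CSV paths exactly once the `images/` prefix is stripped (for example, no leading `./`).

```python
import tarfile
from pathlib import Path
from typing import Dict, Iterable

IMAGE_PREFIX = "images/"  # prefix used by CSV image paths, per this PR


def index_shards(shards: Iterable[Path]) -> Dict[str, Path]:
    """Scan each shard once and map every regular-file member name to its shard."""
    index: Dict[str, Path] = {}
    for shard in shards:
        with tarfile.open(shard) as tar:
            for member in tar.getmembers():
                if member.isfile():
                    index[member.name] = Path(shard)
    return index


def read_image_bytes(csv_path: str, index: Dict[str, Path]) -> bytes:
    """Read one image straight out of its shard without extracting the archive."""
    member_name = csv_path.removeprefix(IMAGE_PREFIX)  # requires Python 3.9+
    shard = index.get(member_name)
    if shard is None:
        raise KeyError(f"{csv_path!r} ({member_name!r}) is not in any indexed shard")
    with tarfile.open(shard) as tar:
        fileobj = tar.extractfile(member_name)
        if fileobj is None:  # extractfile returns None for non-regular members
            raise KeyError(f"{member_name!r} is not a regular file")
        return fileobj.read()
```

Building the index once avoids rescanning every shard for each image. Reopening a shard per read keeps the sketch simple, at the cost of some extra I/O.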

Users can run the benchmark directly through the standard VLMEvalKit interface:

python run.py --data Spatial-DISE --model <MODEL_NAME>

Validation

Validated locally with:

python3.10 -m py_compile vlmeval/dataset/spatial_dise.py vlmeval/dataset/__init__.py
git diff --check

A smoke test with

SPATIAL_DISE_ROOT=/LOCAL2/hxm826/Spatial-DISE-hf

confirmed that:

- Spatial-DISE loads 559 official benchmark examples
- both 2D and 3D categories are present
- benchmark images are extracted from tar shards on demand
- exact-match evaluation on 12 gold predictions returns 100% overall accuracy
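The exact-match check and the per-field accuracy breakdowns can be reproduced with standard-library Python. This is handy for sanity checks like the 12-gold-prediction test above. The sketch is not the PR's `evaluate`. It assumes each row carries `prediction` and `answer` fields plus the three breakdown fields. It also normalizes answers by trimming whitespace and upper-casing, which may differ from the PR's matching rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Breakdown fields listed in this PR; "prediction"/"answer" column names are assumptions.
BREAKDOWN_FIELDS = ("category", "difficulty", "dise_category")


def normalize(answer: object) -> str:
    """Compare answers case-insensitively, ignoring surrounding whitespace."""
    return str(answer).strip().upper()


def exact_match_report(rows: Iterable[Mapping[str, object]]) -> Dict[str, Dict[str, float]]:
    """Return accuracy (in %) overall and per value of each breakdown field."""
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        correct = normalize(row["prediction"]) == normalize(row["answer"])
        keys = [("overall", "all")] + [(field, str(row[field])) for field in BREAKDOWN_FIELDS]
        for key in keys:
            totals[key] += 1
            hits[key] += int(correct)

    report: Dict[str, Dict[str, float]] = {}
    for (field, value), total in totals.items():
        report.setdefault(field, {})[value] = 100.0 * hits[(field, value)] / total
    return report
```

If every row's prediction equals its answer, each group scores 100.0, which mirrors the smoke test.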

Add Spatial-DISE benchmark (a80377e)

shinmohuang changed the title from "Add Spatial-DISE (ICLR2026) benchmark" to "[Benchmark] Add Spatial-DISE benchmark" on May 9, 2026

shinmohuang added 3 commits (May 9, 2026 21:49):

- Use merged and separate Spatial-DISE images (6819baf)
- Split Spatial-DISE image input modes (9e4a033)
- Support dynamic Spatial-DISE answer options (6834474)

shinmohuang wants to merge 4 commits into open-compass:main from shinmohuang:codex/add-spatial-dise-benchmark
