[Benchmark] Add Spatial-DISE benchmark #1542

Open
shinmohuang wants to merge 4 commits into open-compass:main from shinmohuang:codex/add-spatial-dise-benchmark

@shinmohuang

Summary

This PR adds Spatial-DISE, an ICLR 2026 spatial reasoning benchmark, to VLMEvalKit.

Spatial-DISE evaluates VLMs on 2D and 3D spatial reasoning tasks, including rotation, folding, projection, shape finding, and compositional reasoning. The default dataset entry follows the official benchmark split:

DISE-bench/DISE-benchmark.csv

Supported entries:

  • Spatial-DISE
  • Spatial-DISE_BENCH

Additional reusable dataset splits are also exposed:

  • Spatial-DISE_TRAIN
  • Spatial-DISE_VAL
  • Spatial-DISE_TEST

Implementation

This PR follows the VLMEvalKit benchmark contribution pattern by adding a dataset class with:

  • build_prompt(self, line)
  • evaluate(self, eval_file, **judge_kwargs)
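For readers unfamiliar with the pattern, the two methods can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the class name SpatialDISESketch, the row fields (question, image_path, A to D), the CSV-based eval file, and the list-of-dicts message shape are all assumptions made for the example.

```python
import csv
import string


class SpatialDISESketch:
    """Hypothetical stand-in for the dataset class; not the PR's actual code."""

    def build_prompt(self, line):
        # Assumed row fields: 'question', 'image_path', and option columns 'A'..'D'.
        options = [
            f"{key}. {line[key]}"
            for key in string.ascii_uppercase[:4]
            if line.get(key)
        ]
        text = "\n".join(
            [line["question"], *options, "Answer with the option letter only."]
        )
        # Interleaved image/text message, one dict per part (assumed shape).
        return [
            {"type": "image", "value": line["image_path"]},
            {"type": "text", "value": text},
        ]

    def evaluate(self, eval_file, **judge_kwargs):
        # Assumed: eval_file is a CSV with 'prediction' and 'answer' columns.
        # judge_kwargs is accepted for interface compatibility and ignored here.
        with open(eval_file, newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
        correct = sum(
            row["prediction"].strip() == row["answer"].strip() for row in rows
        )
        return {"overall": correct / len(rows) if rows else 0.0}
```

The sketch only mirrors the two method signatures listed above; the real class would inherit from the toolkit's dataset base class and use its file-loading helpers.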

The implementation:

  • adds vlmeval/dataset/spatial_dise.py
  • registers SpatialDISE in vlmeval/dataset/__init__.py
  • loads the official dataset from Hugging Face:
    TACPS-liv/Spatial-DISE
  • supports local reuse via:
    SPATIAL_DISE_ROOT=/path/to/Spatial-DISE
  • reads images from tar shards on demand instead of requiring users to manually extract all images
  • maps CSV paths like images/... to tar members by removing the images/ prefix
  • reports accuracy breakdowns by:
    • category
    • difficulty
    • dise_category
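The local-root lookup, the images/ prefix mapping, and the on-demand tar read described above could look roughly like the sketch below. It is a minimal illustration using only the standard library; the helper names (local_root, csv_path_to_member, read_image_bytes) and the idea of scanning a list of shard paths are assumptions, not the functions in vlmeval/dataset/spatial_dise.py.

```python
import os
import tarfile
from pathlib import Path

IMAGE_PREFIX = "images/"


def local_root(env_var="SPATIAL_DISE_ROOT"):
    # Returns the local dataset directory if the variable is set, else None.
    # None means the caller would fall back to the Hugging Face copy (not shown).
    value = os.environ.get(env_var)
    return Path(value) if value else None


def csv_path_to_member(csv_path):
    # CSV rows store 'images/<member>'; tar members omit the 'images/' prefix.
    if csv_path.startswith(IMAGE_PREFIX):
        return csv_path[len(IMAGE_PREFIX):]
    return csv_path


def read_image_bytes(shard_paths, csv_path):
    # Scan the shards and return the bytes of the first matching member,
    # so no shard ever has to be fully extracted to disk.
    member = csv_path_to_member(csv_path)
    for shard in shard_paths:
        with tarfile.open(shard) as tar:
            try:
                info = tar.getmember(member)
            except KeyError:
                continue  # member is not in this shard
            fobj = tar.extractfile(info)
            if fobj is not None:
                return fobj.read()
    raise FileNotFoundError(member)
```

Opening each shard per lookup keeps the sketch short; a real loader would more likely cache a member-to-shard index so repeated reads do not rescan every archive.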

Why This Format

Spatial-DISE is distributed on Hugging Face with CSV metadata and tar-sharded images. Instead of embedding images into a large TSV/base64 file, this PR keeps the official dataset layout and provides a VLMEvalKit-compatible loader that handles tar-shard image access transparently.

Users can run the benchmark directly through the standard VLMEvalKit interface:

python run.py --data Spatial-DISE --model <MODEL_NAME>

Validation

Validated locally with:

python3.10 -m py_compile vlmeval/dataset/spatial_dise.py vlmeval/dataset/__init__.py
git diff --check

A smoke test with:

SPATIAL_DISE_ROOT=/LOCAL2/hxm826/Spatial-DISE-hf

confirmed that:

  • Spatial-DISE loads 559 official benchmark examples
  • both 2D and 3D categories are present
  • benchmark images are extracted from tar shards on demand
  • exact-match evaluation on 12 gold predictions returns 100% overall accuracy
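As an illustration of the last check, exact-match scoring with a breakdown by category, difficulty, and dise_category can be sketched as below. The record layout (prediction and answer fields) and the case-insensitive, whitespace-stripped comparison are assumptions for this example; the evaluation code in the PR may normalize answers differently.

```python
from collections import defaultdict

GROUP_KEYS = ("category", "difficulty", "dise_category")


def exact_match(prediction, answer):
    # Assumed normalization: ignore surrounding whitespace and letter case.
    return prediction.strip().upper() == answer.strip().upper()


def accuracy_breakdown(records, keys=GROUP_KEYS):
    # Maps (group_key, group_value) -> [correct, total]; ('overall', 'all') covers every record.
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        hit = exact_match(rec["prediction"], rec["answer"])
        labels = [("overall", "all")] + [(key, rec[key]) for key in keys]
        for label in labels:
            counts[label][0] += int(hit)
            counts[label][1] += 1
    return {label: correct / total for label, (correct, total) in counts.items()}
```

Feeding the ground-truth answers back in as predictions should yield an accuracy of 1.0 for every group, which is the kind of sanity check the smoke test describes.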

@shinmohuang changed the title from "Add Spatial-DISE (ICLR2026) benchmark" to "[Benchmark] Add Spatial-DISE benchmark" on May 9, 2026