[WIP] Fix default judge model selection conflict in run.py and tools.py by TianhaoLiang2000 · Pull Request #1532 · open-compass/VLMEvalKit

TianhaoLiang2000 · 2026-05-06T03:44:58Z

Summary

This PR centralizes the default judge model selection logic into vlmeval/judge.py and makes both run.py and vlmeval/tools.py::EVAL() use the same source of truth.

Changes

Added get_default_judge_model() in vlmeval/judge.py.
Updated run.py::get_judge_kwargs() to reuse the shared default judge resolver.
Updated vlmeval/tools.py::EVAL() to use the same resolver instead of its older hardcoded defaults.
Removed the stale chatgpt-0125 default for MCQ / Y-N evaluation through tools.py::EVAL().
Kept compatibility for both MMBench_Video and MMBench-Video dataset naming.
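
For illustration, the centralization described above could look roughly like the sketch below. It is not the code in this PR. Only the name get_default_judge_model() comes from the description. Its signature, the lookup tables, the placeholder judge names (judge-model-a, etc.), and the resolve_judge() helper are assumptions made for the example.

```python
from typing import Optional

# Hypothetical defaults for illustration only; the real values live in vlmeval/judge.py.
_DEFAULTS_BY_NAME = {
    "mmbench_video": "judge-model-a",
}
_DEFAULTS_BY_TYPE = {
    "MCQ": "judge-model-b",
    "Y/N": "judge-model-b",
}
_FALLBACK = "judge-model-c"


def _normalize(dataset_name: str) -> str:
    """Treat 'MMBench_Video' and 'MMBench-Video' as the same dataset key."""
    return dataset_name.lower().replace("-", "_")


def get_default_judge_model(dataset_name: str, dataset_type: Optional[str] = None) -> str:
    """Single source of truth for the default judge (assumed signature)."""
    key = _normalize(dataset_name)
    # Name-specific defaults take priority over type-based defaults.
    for prefix, model in _DEFAULTS_BY_NAME.items():
        if key.startswith(prefix):
            return model
    if dataset_type in _DEFAULTS_BY_TYPE:
        return _DEFAULTS_BY_TYPE[dataset_type]
    return _FALLBACK


def resolve_judge(dataset_name: str, dataset_type: Optional[str] = None,
                  user_choice: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit user choice always wins over the default."""
    return user_choice or get_default_judge_model(dataset_name, dataset_type)
```

Under this sketch, both entry points (run.py::get_judge_kwargs() and tools.py::EVAL()) would call the same resolver, so they cannot drift apart the way two hardcoded tables can.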

Testing

python -m py_compile run.py vlmeval/tools.py vlmeval/judge.py
pre-commit run passed during commit.
pre-commit run --all-files was also run; it fails only on pre-existing flake8 issues outside this PR's changed files.

mzr1996

Please move the judge model selection into the benchmark class (for example, as a class attribute). This function should then use that class attribute together with the dataset type to determine the final judge model name.
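
One way to read this request is sketched below. It is only an assumed shape, not the implementation in the follow-up commit: the class names, the DEFAULT_JUDGE attribute, the TYPE values, the fallback table, and the placeholder judge names are all hypothetical.

```python
from typing import Optional, Type


class BaseBenchmark:
    """Hypothetical benchmark base class."""
    TYPE: Optional[str] = None           # dataset type, e.g. "MCQ"
    DEFAULT_JUDGE: Optional[str] = None  # per-benchmark default judge, if any


class VideoBenchmark(BaseBenchmark):
    TYPE = "Video-VQA"
    DEFAULT_JUDGE = "judge-model-a"      # placeholder name


class McqBenchmark(BaseBenchmark):
    TYPE = "MCQ"                         # no class-level judge: falls back by type


# Hypothetical type-based fallbacks, used when a class declares no DEFAULT_JUDGE.
_TYPE_FALLBACK = {"MCQ": "judge-model-b", "Y/N": "judge-model-b"}
_GLOBAL_FALLBACK = "judge-model-c"


def resolve_judge_model(benchmark_cls: Type[BaseBenchmark],
                        override: Optional[str] = None) -> str:
    """Pick the judge: explicit override, then class attribute, then dataset type."""
    if override:
        return override
    if benchmark_cls.DEFAULT_JUDGE is not None:
        return benchmark_cls.DEFAULT_JUDGE
    return _TYPE_FALLBACK.get(benchmark_cls.TYPE, _GLOBAL_FALLBACK)
```

With this layout the default lives next to the benchmark it belongs to, and the resolver only encodes the precedence order.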

Centralize default judge model selection

ad360b6

TianhaoLiang2000 changed the title ~~[Fix] Centralize default judge model selection~~ [Fix] Fix default judge model selection conflict in run.py and tools.py May 6, 2026

mzr1996 requested changes May 14, 2026

TianhaoLiang2000 changed the title ~~[Fix] Fix default judge model selection conflict in run.py and tools.py~~ [WIP] Fix default judge model selection conflict in run.py and tools.py May 15, 2026

Move judge defaults to benchmark classes

9751877

TianhaoLiang2000 wants to merge 2 commits into open-compass:main from TianhaoLiang2000:fix/judge-defaults