[WIP] Fix default judge model selection conflict in run.py and tools.py #1532

Open
TianhaoLiang2000 wants to merge 2 commits into open-compass:main from TianhaoLiang2000:fix/judge-defaults

Conversation

@TianhaoLiang2000
Contributor

Summary

This PR centralizes the default judge model selection logic into vlmeval/judge.py and makes both run.py and vlmeval/tools.py::EVAL() use the same source of truth.

Changes

  • Added get_default_judge_model() in vlmeval/judge.py.
  • Updated run.py::get_judge_kwargs() to reuse the shared default judge resolver.
  • Updated vlmeval/tools.py::EVAL() to use the same resolver instead of its older hardcoded defaults.
  • Removed the stale chatgpt-0125 default for MCQ / Y-N evaluation through tools.py::EVAL().
  • Kept compatibility for both MMBench_Video and MMBench-Video dataset naming.
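
As a rough illustration of the centralization described above, the sketch below shows one way a shared resolver could look. It is not the code in this PR: the signature of get_default_judge_model(), the lookup tables, the simplified get_judge_kwargs() parameters, and every judge model name here are hypothetical placeholders chosen for the example.

```python
from typing import Optional

# All names below are hypothetical placeholders, not the defaults in vlmeval/judge.py.
FALLBACK_JUDGE = "placeholder-judge-default"

# Map alternative spellings onto one canonical dataset name.
DATASET_ALIASES = {"MMBench-Video": "MMBench_Video"}

# Per-dataset overrides take priority over per-type defaults.
JUDGE_BY_DATASET = {"MMBench_Video": "placeholder-judge-video"}
JUDGE_BY_TYPE = {
    "MCQ": "placeholder-judge-choice",
    "Y/N": "placeholder-judge-choice",
}


def get_default_judge_model(dataset_name: str, dataset_type: Optional[str] = None) -> str:
    """Single source of truth: dataset override, then type default, then fallback."""
    canonical = DATASET_ALIASES.get(dataset_name, dataset_name)
    if canonical in JUDGE_BY_DATASET:
        return JUDGE_BY_DATASET[canonical]
    if dataset_type in JUDGE_BY_TYPE:
        return JUDGE_BY_TYPE[dataset_type]
    return FALLBACK_JUDGE


def get_judge_kwargs(dataset_name: str, dataset_type: Optional[str] = None,
                     user_judge: Optional[str] = None) -> dict:
    """Caller-side sketch: an explicit user choice wins over the shared default."""
    model = user_judge or get_default_judge_model(dataset_name, dataset_type)
    return {"model": model}
```

With a layout like this, both entry points would call the same function, so a default changed in one place cannot drift from the other.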

Testing

  • python -m py_compile run.py vlmeval/tools.py vlmeval/judge.py
  • pre-commit run passed during commit.
  • pre-commit run --all-files was also run; it fails only on pre-existing flake8 issues outside this PR's changed files.

@TianhaoLiang2000 changed the title from "[Fix] Centralize default judge model selection" to "[Fix] Fix default judge model selection conflict in run.py and tools.py" on May 6, 2026
Collaborator

@mzr1996 mzr1996 left a comment

Please move the judge model selection into the benchmark class (for example, as a class attribute). This function can then use the class attribute and the dataset type to determine the final judge model name.
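
One possible reading of this suggestion is sketched below, purely as an assumption about what the reviewer has in mind. The class names, the DEFAULT_JUDGE and TYPE attributes, the resolve_judge_model() helper, and the judge model names are all hypothetical and are not taken from the VLMEvalKit code base.

```python
from typing import Optional, Type


class BenchmarkBase:
    """Hypothetical base class; real benchmark classes may look different."""
    TYPE: Optional[str] = None           # dataset type, e.g. "MCQ"
    DEFAULT_JUDGE: Optional[str] = None  # per-benchmark judge override


class ExampleVideoBenchmark(BenchmarkBase):
    TYPE = "Video-VQA"
    DEFAULT_JUDGE = "placeholder-judge-video"  # benchmark declares its own judge


class ExampleChoiceBenchmark(BenchmarkBase):
    TYPE = "MCQ"  # no override, so the type-level default applies


# Hypothetical type-level defaults and global fallback.
JUDGE_FOR_TYPE = {"MCQ": "placeholder-judge-choice"}
GLOBAL_JUDGE = "placeholder-judge-default"


def resolve_judge_model(benchmark_cls: Type[BenchmarkBase]) -> str:
    """Prefer the class attribute, then the dataset type, then a global fallback."""
    if benchmark_cls.DEFAULT_JUDGE is not None:
        return benchmark_cls.DEFAULT_JUDGE
    return JUDGE_FOR_TYPE.get(benchmark_cls.TYPE, GLOBAL_JUDGE)
```

Under this layout, adding a benchmark with a non-standard judge would only require setting an attribute on its class, and the resolver itself would need no per-dataset branches.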

@TianhaoLiang2000 changed the title from "[Fix] Fix default judge model selection conflict in run.py and tools.py" to "[WIP] Fix default judge model selection conflict in run.py and tools.py" on May 15, 2026