Python version used: 3.12.
Benchmarks to evaluate on:
- A-OKVQA: A crowdsourced VQA dataset composed of a diverse set of about 25,000 questions that require a broad base of commonsense and world knowledge to answer.
- CVQA: A culturally diverse multilingual benchmark featuring 10,000 questions across 30 countries and 31 languages, each provided in both English and the local language.
- ScienceQA: A large‐scale multimodal multiple‐choice science benchmark with questions drawn from elementary to high‐school curricula.
VLMs to evaluate:
- google/gemma-3-12b-it
- Qwen/Qwen2.5-VL-7B-Instruct
- meta-llama/Llama-3.2-11B-Vision-Instruct
Methods to evaluate:
- Plain VQA inference with the VLM alone (--method='vlm').
- VQA inference with a VLM-based ReAct agent that retrieves Wikipedia context (--method='vlm+wiki').
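The ReAct-style loop can be sketched as below. This is a minimal illustration, not the code in predict.py: the helpers vlm_generate and wiki_search are hypothetical stand-ins for the real VLM call and the Wikipedia retrieval step, and here they return canned strings so the control flow is runnable.

```python
def vlm_generate(prompt, image=None):
    # Hypothetical stub: a real implementation would call the VLM here.
    if "Observation:" in prompt:
        return "Final Answer: example answer"
    return "Action: Search[example topic]"

def wiki_search(query):
    # Hypothetical stub: a real implementation would query Wikipedia.
    return f"Stub article text about {query}."

def react_vqa(question, image=None, max_steps=3):
    """Iterate Action -> Observation until the model emits a final answer."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = vlm_generate(prompt, image)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action: Search["):
            query = step[len("Action: Search["):-1]
            # Append the tool observation and let the VLM reason again.
            prompt += f"{step}\nObservation: {wiki_search(query)}\n"
    return "unknown"
```

The loop caps the number of tool calls with max_steps so a model that never commits to an answer still terminates.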
To run inference, use these commands:
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/aokvqa/gemma3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/aokvqa/qwen2dot5vl_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/aokvqa/llama3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/aokvqa/gemma3_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/aokvqa/qwen2dot5vl_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='A-OKVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/aokvqa/llama3_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/cvqa/gemma3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/cvqa/qwen2dot5vl_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/cvqa/llama3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/cvqa/gemma3_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/cvqa/qwen2dot5vl_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='CVQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/cvqa/llama3_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/scienceqa/gemma3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/scienceqa/qwen2dot5vl_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/scienceqa/llama3_vlm.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='gemma3' > logs/scienceqa/gemma3_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='qwen2dot5vl' > logs/scienceqa/qwen2dot5vl_vlm_wiki.log 2>&1 &
$ CUDA_VISIBLE_DEVICES=XX nohup python predict.py --dataset='ScienceQA' --method='vlm+wiki' --data_root_dir='/hadatasets/caio.rosa/vqa' --model='llama3' > logs/scienceqa/llama3_vlm_wiki.log 2>&1 &
To make the submission files for evaluation, run these commands:
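The eighteen inference runs form a dataset x method x model grid, so they can also be generated with a loop. The sketch below is a dry run that only prints the commands; to launch the jobs, swap printf for actual execution and set CUDA_VISIBLE_DEVICES per run as above.

```shell
DATA_ROOT='/hadatasets/caio.rosa/vqa'
cmds=()
for dataset in A-OKVQA CVQA ScienceQA; do
  # logs/ uses a lowercased, hyphen-free dataset name (aokvqa, cvqa, scienceqa)
  logdir=$(echo "$dataset" | tr '[:upper:]' '[:lower:]' | tr -d '-')
  for method in vlm vlm+wiki; do
    suffix=${method//+/_}   # vlm+wiki -> vlm_wiki in the log filename
    for model in gemma3 qwen2dot5vl llama3; do
      cmds+=("python predict.py --dataset='$dataset' --method='$method' --data_root_dir='$DATA_ROOT' --model='$model' > logs/$logdir/${model}_${suffix}.log 2>&1")
    done
  done
done
printf '%s\n' "${cmds[@]}"
```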
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/gemma3_vlm_predictions.json'
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/qwen2dot5vl_vlm_predictions.json'
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/llama3_vlm_predictions.json'
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/gemma3_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/qwen2dot5vl_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='A-OKVQA' --results_file='predictions/A-OKVQA/llama3_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/gemma3_vlm_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/qwen2dot5vl_vlm_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/llama3_vlm_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/gemma3_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/qwen2dot5vl_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='CVQA' --results_file='predictions/CVQA/llama3_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/gemma3_vlm_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/qwen2dot5vl_vlm_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/llama3_vlm_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/gemma3_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/qwen2dot5vl_vlm+wiki_predictions.json'
$ python make_submission_file.py --dataset='ScienceQA' --results_file='predictions/ScienceQA/llama3_vlm+wiki_predictions.json'
To evaluate on ScienceQA, run these commands:
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/gemma3_vlm_submission.json' > evaluations/gemma3_vlm.txt
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/qwen2dot5vl_vlm_submission.json' > evaluations/qwen2dot5vl_vlm.txt
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/llama3_vlm_submission.json' > evaluations/llama3_vlm.txt
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/gemma3_vlm+wiki_submission.json' > evaluations/gemma3_vlm+wiki.txt
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/qwen2dot5vl_vlm+wiki_submission.json' > evaluations/qwen2dot5vl_vlm+wiki.txt
$ python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/llama3_vlm+wiki_submission.json' > evaluations/llama3_vlm+wiki.txt
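The six evaluation commands above follow the same model x method pattern, so they can likewise be generated in a loop. This sketch only prints the commands (a dry run); pipe each line to a shell, or replace printf with direct execution, to actually evaluate.

```shell
evals=()
for model in gemma3 qwen2dot5vl llama3; do
  for method in vlm vlm+wiki; do
    # Submission and output filenames keep the '+' from the method name.
    evals+=("python submodules/ScienceQA/tools/evaluate_acc.py --data_file='submodules/ScienceQA/data/scienceqa/problems.json' --result_file='submissions/ScienceQA/${model}_${method}_submission.json' > evaluations/${model}_${method}.txt")
  done
done
printf '%s\n' "${evals[@]}"
```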