This repository provides the benchmark dataset and evaluation code for the paper *Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction*.
We present VisCE$^2$, a vision-language model (VLM)-based caption evaluation method. Our method focuses on visual context, i.e., the detailed content of an image, including objects, attributes, and relationships. By extracting these and organizing them into a structured format, we replace human-written references with visual contexts, helping VLMs better understand the image and enhancing evaluation performance. Through meta-evaluation on multiple datasets, we validated that VisCE$^2$ outperforms conventional pre-trained metrics in capturing caption quality and shows superior consistency with human judgment.
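For intuition, here is a minimal sketch of the two-stage flow described above. All helper names (`query_vlm`, `extract_visual_context`, `evaluate_caption`) and prompt wordings are illustrative assumptions, not the actual implementation; see `src/eval.py` and `prompts/` for the real code.

```python
# Minimal sketch of the VisCE^2 two-stage flow. All names and prompts here
# are illustrative; see src/eval.py for the actual implementation.

def query_vlm(image_path: str, prompt: str) -> str:
    """Stand-in for a VLM call (e.g., LLaVA); plug in a real backend."""
    raise NotImplementedError

def extract_visual_context(image_path: str) -> str:
    # Stage 1: have the VLM enumerate objects, attributes, and relationships
    # in a structured format.
    prompt = ("Describe the image as a structured list of objects, "
              "their attributes, and the relationships between them.")
    return query_vlm(image_path, prompt)

def evaluate_caption(image_path: str, candidate: str) -> str:
    # Stage 2: score the candidate caption conditioned on the extracted
    # visual context, which replaces human-written references.
    context = extract_visual_context(image_path)
    prompt = (f"Visual context:\n{context}\n\n"
              f"Candidate caption: {candidate}\n"
              "Rate how well the caption describes the image.")
    return query_vlm(image_path, prompt)
```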
We provide the dataset for meta-evaluation on Hugging Face Datasets, so you do not have to prepare it yourself.
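As a sketch, the dataset can be pulled with the `datasets` library. The dataset ID below is a placeholder assumption; substitute the actual ID from the Hugging Face page.

```python
from datasets import load_dataset

# Placeholder dataset ID; replace with the actual ID from the Hugging Face page.
ds = load_dataset("Silviase/VisCE2")
print(ds)
```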
To run the evaluation:

- Clone this repository:

```bash
git clone git@github.com:Silviase/VisCE2.git
cd VisCE2
```

- Run the evaluation script. If needed, set `CUDA_VISIBLE_DEVICES` to specify the GPU.
- The script writes the evaluation results to `results/eval/{dataset_id}/{model_id}/{eval_results_file_name}.json`.
- If necessary, modify `scripts/sample.sh` for your evaluation. A sample invocation:

```bash
CUDA_VISIBLE_DEVICES=1 python src/eval.py \
--dataset_id=flickr8k-expert \
--model_id=liuhaotian/llava-v1.5-7b \
--prompt_path=prompts/base.txt \
--split=0 \
--result_key=score_model \
--eval_results_file_name=sample \
--use_cand \
--debug
```
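Once a run finishes, the results can be inspected as below. The path is built from the pattern above using the sample arguments, and the record layout (a list of dicts carrying the `--result_key` field) is an assumption for illustration.

```python
import json
from pathlib import Path

# Path built from the results pattern with the sample arguments; the record
# layout (a list of dicts with a "score_model" field) is assumed here.
path = Path("results/eval/flickr8k-expert/liuhaotian/llava-v1.5-7b/sample.json")
records = json.loads(path.read_text())
scores = [r["score_model"] for r in records]
print(f"{len(scores)} captions, mean score {sum(scores) / len(scores):.2f}")
```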
If you have any questions, please contact Koki Maeda (koki.maeda \[at-mark-without-space\] nlp.c.titech.ac.jp). You are also welcome to open an issue on this repository.
If you find our work useful, please cite:

```bibtex
@misc{maeda2024vision,
    title={Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction},
    author={Koki Maeda and Shuhei Kurita and Taiki Miyanishi and Naoaki Okazaki},
    year={2024},
    eprint={2402.17969},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```