Add CC_OCR Japanese subset #168

Silviase · 2025-05-16T02:39:29Z

Add to eval_all.sh
Add to src/eval_mm/metrics/__init__.py
Add to src/eval_mm/metrics/scorer_registry.py
Implement src/eval_mm/metrics/**_scorer.py (optional)
Add to src/eval_mm/tasks/__init__.py
Add to src/eval_mm/tasks/task_registry.py
Implement src/eval_mm/tasks/**.py
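
The checklist above amounts to implementing a task class and wiring it into a registry keyed by task ID. A minimal sketch of that pattern follows; `Task`, `CCOCR`, `TASK_REGISTRY`, `register_task`, and `get_task` are hypothetical stand-ins for illustration and are not taken from the eval_mm source.

```python
from typing import Dict, Type


class Task:
    """Stand-in base class; the real eval_mm Task likely has more methods."""

    def doc_to_text(self, doc: dict) -> str:
        raise NotImplementedError


class CCOCR(Task):
    """Hypothetical task: returns the OCR instruction stored with each document."""

    def doc_to_text(self, doc: dict) -> str:
        return doc["input_text"]


# Hypothetical registry mapping a task ID string to its Task class.
TASK_REGISTRY: Dict[str, Type[Task]] = {}


def register_task(task_id: str, task_cls: Type[Task]) -> None:
    """Add a task class under task_id, refusing silent overwrites."""
    if task_id in TASK_REGISTRY:
        raise ValueError(f"task already registered: {task_id}")
    TASK_REGISTRY[task_id] = task_cls


def get_task(task_id: str) -> Task:
    """Instantiate the task registered under task_id."""
    return TASK_REGISTRY[task_id]()


register_task("cc-ocr", CCOCR)
```

A scorer registry would presumably follow the same shape, with the evaluation script looking up both the task and its scorer by ID.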

Silviase · 2025-05-16T03:56:21Z

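I evaluated Qwen2.5-VL-Instruct-7B and the score came out at about 0.748. With precision around 0.9 and recall around 0.5, is it fair to conclude that the model is missing some of the fine print? There are only 150 images, so coverage is not very broad, but it may be enough to see trends.
Please verify that this one works as well.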
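
To make the precision/recall reading concrete, here is a minimal sketch of a character-level F1 averaged over samples. It assumes the score is based on character overlap after whitespace removal; the actual CC-OCR scorer may tokenize and normalize differently, and `char_prf`, `f1_from`, and `macro_f1` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def char_prf(prediction: str, reference: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over character multisets, ignoring whitespace."""
    pred = Counter("".join(prediction.split()))
    ref = Counter("".join(reference.split()))
    overlap = sum((pred & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return precision, recall, f1_from(precision, recall)


def macro_f1(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-sample F1 over (prediction, reference) pairs."""
    scores = [char_prf(p, r)[2] for p, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

High precision with low recall means most of what the model output is correct but much of the reference text is missing, which fits the "unread fine print" interpretation. As a point of reference, `f1_from(0.9, 0.5)` is about 0.643.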

Silviase · 2025-05-17T13:22:53Z

I also added the Japanese subset of CVQA.

Copilot

Pull Request Overview

Adds support for a Japanese subset of the CC-OCR and CVQA tasks into the eval_mm framework, wiring them through the task and metric registries and enabling end-to-end evaluation.

Registers CCOCR and CVQA in task and scorer registries
Implements CC-OCR and CVQA Task classes with quick local tests
Updates eval_all.sh to include the new tasks

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file

| File                                   | Description                                  |
|:---------------------------------------|:---------------------------------------------|
| src/eval_mm/tasks/task_registry.py     | Register new CCOCR and CVQA tasks            |
| src/eval_mm/tasks/cc_ocr.py            | Implement CCOCR Task                         |
| src/eval_mm/tasks/cvqa.py              | Implement CVQA Task                          |
| src/eval_mm/tasks/__init__.py          | Export new task classes                      |
| src/eval_mm/metrics/scorer_registry.py | Register new CCOCRScorer and CVQAScorer      |
| src/eval_mm/metrics/cvqa_scorer.py     | Implement CVQA scorer                        |
| src/eval_mm/metrics/__init__.py        | Export new scorer classes                    |
| eval_all.sh                            | Include CC-OCR and CVQA in evaluation script |

Comments suppressed due to low confidence (3)

src/eval_mm/metrics/scorer_registry.py:29

The registry key "mmmu" looks like a typo; it should be "mmmlu" to match the task ID and ensure the MMMLU scorer is correctly registered.

"mmmu": MMMUScorer,

src/eval_mm/tasks/cvqa.py:77

[nitpick] The helper function is named test_task, which may collide with similar helpers; rename it to test_cvqa_task for clarity and to avoid conflicts.

def test_task():

src/eval_mm/tasks/cc_ocr.py:53

[nitpick] The helper function is named test_task; consider renaming it to test_ccocr_task to clearly associate it with the CCOCR task.

def test_task():

Copilot · 2025-05-17T15:20:52Z

src/eval_mm/tasks/cvqa.py

+    return f"{question}\n{parsed_options}\n\n{MULTI_CHOICE_PROMPT}"
+
+
+class CVQA(Task):


[nitpick] Adding a class docstring to CVQA would help future maintainers understand the dataset source, prompt structure, and filter criteria.

Suggested change

-class CVQA(Task):
+class CVQA(Task):
+    """
+    CVQA is a task for evaluating visual question answering in the context of a specific dataset.
+    This task uses the `afaji/cvqa` dataset, filtered to include only entries where the `Subset` field
+    matches `('Japanese', 'Japan')`. The dataset includes questions, multiple-choice options, and images.
+    The input prompt is constructed by combining the question and its options, formatted as a multiple-choice
+    question. The user is instructed to select the most appropriate answer by providing the corresponding letter.
+    Key methods:
+    - `_prepare_dataset`: Prepares and filters the dataset, and constructs input prompts.
+    - `doc_to_text`: Extracts the input text for a given document.
+    - `doc_to_visual`: Extracts the associated image(s) for a given document.
+    - `doc_to_id`: Retrieves the unique identifier for a given document.
+    - `doc_to_answer`: Retrieves the correct answer for a given document.
+    """
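
The prompt construction and subset filter described in the suggested docstring can be sketched as follows. This is an illustration under assumptions: the wording of `MULTI_CHOICE_PROMPT`, the letter-prefixed option format, the storage of `Subset` as a string, and the helper names `construct_prompt` and `is_japanese_subset` are all hypothetical and may differ from cvqa.py.

```python
from typing import List

# Assumed instruction text; the real MULTI_CHOICE_PROMPT in cvqa.py may differ.
MULTI_CHOICE_PROMPT = "Answer with the letter of the correct option."


def construct_prompt(question: str, options: List[str]) -> str:
    """Combine a question and its options into a multiple-choice prompt."""
    # Label options A, B, C, ... in order (assumed format).
    parsed_options = "\n".join(
        f"{chr(ord('A') + i)}. {option}" for i, option in enumerate(options)
    )
    return f"{question}\n{parsed_options}\n\n{MULTI_CHOICE_PROMPT}"


def is_japanese_subset(row: dict) -> bool:
    """Keep only rows of the Japanese subset (assumes Subset is stored as a string)."""
    return row["Subset"] == "('Japanese', 'Japan')"
```

A predicate like `is_japanese_subset` could be passed to a dataset filter step before the prompts are built.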

src/eval_mm/tasks/cc_ocr.py

speed1313

Please address just the substring-match item.

speed1313 · 2025-05-17T15:30:02Z

src/eval_mm/metrics/cvqa_scorer.py

Could you use src/eval_mm/metrics/substring_match_scorer.py instead of this?
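
For readers unfamiliar with the idea, substring-match scoring can be sketched as below. This is not the contents of substring_match_scorer.py; `substring_match` and `accuracy` are hypothetical names, and the case-sensitive comparison is an assumption.

```python
from typing import Sequence


def substring_match(prediction: str, answer: str) -> int:
    """Return 1 if the gold answer appears anywhere in the prediction, else 0."""
    return int(answer.strip() in prediction)


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that contain their gold answer."""
    if not predictions:
        return 0.0
    hits = sum(substring_match(p, a) for p, a in zip(predictions, answers))
    return hits / len(predictions)
```

With single-letter answers this check is lenient: a prediction such as "Apple" would match the answer "A", so it works best when the prompt constrains the model to output only the option letter.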

speed1313 · 2025-05-17T15:44:25Z

gemma-3-4b-it on cc-ocr

{"question_id": "0", "text": "終点線 (特色・快速)\nSōbu Line\n船橋・津田沼・千葉方面\nfor Funabashi, Tsudanuma & Chiba\n\n快速 20:36\n快速 20:40\n千葉 15両\n津田沼 15両", "answer": "4 総武線（特急·快速） Sōbu Line 船橋·津田沼·千葉方面 for Funabashi, Tsudanuma & Chiba 快速 20:36 千 葉 15両 快速 20:40 津田沼 15両", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.8969696964588428}
{"question_id": "1", "text": "本０限定!!\n鶴ちゃんこ鍋 が\n一人前 ¥500", "answer": "本日限定！！ 鶏ちゃんご鍋が 一人前¥500 （税込） ワソコイソ", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.6122448974510619}
{"question_id": "2", "text": "この出口は\n海抜 3.4m\nAbove sea level\n해발\n東京都交通局", "answer": "この出入口は 海抜 Above sea level 海拔 해발 3.4m 東京都交通局", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.9552238800695032}
{"question_id": "3", "text": "BS\nブリヂストン\nスイッチサインを確かめましょう", "answer": "BS ブリヂストン スリップサインを確かめましょう", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.913043477721172}
{"question_id": "4", "text": "コインランドリー\n八千代", "answer": "コインランドリー 八千代 アパ CO*N LM* 年中無体 お湯で洗たく マンション D-1 yDo 1 *い *** ***", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.3999999996654545}
{"question_id": "5", "text": "ありません。", "answer": "2F ポプラ 珈琲 *** 休業 *** *** Barドウム 店名 3月6日（日）まで 2月21日（月）~ 休*いたします。 *コロナウイ*ス感染症の拡大*止のため *の*力要*内容A）に基づ* ポプラ", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.04705882339709341}
{"question_id": "6", "text": "鏡はししわ\nGinbeko\n(The First Old Sake Brewery)\n角/バイキン道場\n(Kakigoriya Akiba)\n土浦の家\nどうもありがとう", "answer": "築地 銀だこ GINDACO 鰻 はしな 角ハイボール 酒場 tghball akaba *ighball Sakaba 圡用うなき うなぎ はしな 九州熱中屋 gragmmo 2F 矢澤 坐うま とうがらい 蓬莱 中国 料理 *** ***", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.4146341458424301}
{"question_id": "7", "text": "金泽ホテルモン\nキリン\n岐京屋\n290", "answer": "龍角散 龍 ダイレクト ばんしゃ*家 *料理 餋老乃龍 黒ホル 情ホル丁 情*ル マサヒ スー*ライ 生中 290 *品 KIRIN KIRIN KIRIN *情熱ホルモソ *ホル 情熱ホルモソ JONETSU HORUMON 酒* 出*丁 *家思い KIRIN キリンビール ラーメン 餃子 岐阜屋 質 *** *広小* *ウラ イニツ商事 *** -3672 ***", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.15483870949028095}
{"question_id": "8", "text": "クリーニング\n10:30から\n会員電話番号\nクリーナー 130円", "answer": "10-17 クリーニンク おしゃれ洗濯 じゃぷじゃぷ 会員価格 カッター 白色（立体） 130円 *** じゃぷじゃぷ *店 *** 10:30* お預かり 年*6* お*し！", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.39999999957584487}
{"question_id": "9", "text": "天谷町 竹馬", "answer": "*たさと 竹馬 天ぶら 京料理 ー* *け* ***", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.27272727235123967}

A few of these look difficult because of image quality.
{"question_id": "9", "text": "天谷町竹馬", "answer": "たさと竹馬天ぶら京料理ー け ***", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.27272727235123967}

Co-authored-by: Copilot <[email protected]>

Silviase · 2025-05-18T00:22:39Z

Thank you, that helps a lot.

speed1313 · 2025-05-18T03:27:18Z

llm-jp-vila on cc-ocr
It outputs content other than the OCR text; because its instruction-following ability is weak, the score drops. The fact that the question is in English may also be a factor.

{"question_id": "0", "text": "The image shows a train station with a digital display board. The board displays the time of the next train, which is 20:36. The board also shows the destination of the train, which is for Funabashi, Tsudanuma, and Chiba. The board is located above a platform, and there are two cameras mounted on the ceiling.", "answer": "4 総武線（特急·快速） Sōbu Line 船橋·津田沼·千葉方面 for Funabashi, Tsudanuma & Chiba 快速 20:36 千 葉 15両 快速 20:40 津田沼 15両", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.23008849520407934}

sarashina 14b on cc-ocr

{"question_id": "0", "text": " 4. Sōbu Line for Funabashi, Tsudanuma & Chiba. 20:36 Chiba 15. 20:40 Tsudanuma 15.", "answer": "4 総武線（特急·快速） Sōbu Line 船橋·津田沼·千葉方面 for Funabashi, Tsudanuma & Chiba 快速 20:36 千 葉 15両 快速 20:40 津田沼 15両", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.6710526310743593}
{"question_id": "1", "text": " 本日限定!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!", "answer": "本日限定！！ 鶏ちゃんご鍋が 一人前¥500 （税込） ワソコイソ", "input_text": "Please output only the text content from the image without any additional descriptions or formatting.", "cc-ocr": 0.027874564278065776}

Overall results

| Model                                    | CC-OCR/macro_f1   | CVQA/Acc   |
|:-----------------------------------------|:------------------|:-----------|
| stabilityai/japanese-instructblip-alpha  | 0.0               | 0.24       |
| SakanaAI/Llama-3-EvoVLM-JP-v2            | 0.1               | 0.44       |
| cyberagent/llava-calm2-siglip            | 0.11              | 0.07       |
| llm-jp/llm-jp-3-vila-14b                 | 0.12              | 0.3        |
| sbintuitions/sarashina2-vision-8b        | 0.08              | 0.49       |
| sbintuitions/sarashina2-vision-14b       | 0.23              | 0.56       |
| llava-hf/llava-1.5-7b-hf                 | 0.14              | 0.41       |
| llava-hf/llava-v1.6-mistral-7b-hf        | 0.2               | 0.27       |
| neulab/Pangea-7B-hf                      | 0.19              | 0.48       |
| mistralai/Pixtral-12B-2409               | 0.27              | 0.49       |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 0.28              | 0.52       |
| Efficient-Large-Model/VILA1.5-13b        | 0.21              | 0.44       |
| OpenGVLab/InternVL2-8B                   | 0.37              | 0.48       |
| OpenGVLab/InternVL2-26B                  | 0.39              | 0.5        |
| Qwen/Qwen2.5-VL-7B-Instruct              | <u>0.8</u>        | 0.56       |
| Qwen/Qwen2.5-VL-72B-Instruct             | **0.8**           | <u>0.7</u> |
| google/gemma-3-4b-it                     | 0.55              | 0.49       |
| google/gemma-3-12b-it                    | 0.65              | 0.59       |
| google/gemma-3-27b-it                    | 0.67              | 0.63       |
| microsoft/Phi-4-multimodal-instruct      | 0.42              | 0.37       |
| gpt-4o-2024-11-20                        | 0.64              | **0.8**    |
| turing-motors/Heron-NVILA-Lite-15B       | 0.45              | 0.61       |

Silviase added 2 commits May 16, 2025 11:02

add CCOCR Task class

e02e7fa

add CCOCR metrics

fa87311

Silviase linked an issue May 16, 2025 that may be closed by this pull request

Add CC-OCR Japanese subset #167

Closed

add task&metrics to registry, scripts, init

a1a662c

Silviase requested a review from speed1313 May 16, 2025 02:45

add cvqa japanese subset

9f673d9

add cvqa japanese subset

6ca82be

speed1313 requested a review from Copilot May 17, 2025 15:18

Copilot AI reviewed May 17, 2025

speed1313 requested changes May 17, 2025

Update src/eval_mm/tasks/cc_ocr.py

32f87e8

Co-authored-by: Copilot <[email protected]>

speed1313 added 2 commits May 18, 2025 12:21

refactoring and add results

496fe58

Merge remote-tracking branch 'refs/remotes/origin/167-task-ccocr' int…

ce8b888

…o 167-task-ccocr

speed1313 merged commit 77cc07a into master May 18, 2025
1 check passed

speed1313 deleted the 167-task-ccocr branch May 18, 2025 03:29
