Add CC_OCR Japanese subset #168
Conversation
When I evaluated Qwen2.5-VL-Instruct-7B, the score was around 0.748. Given that precision is roughly 0.9 and recall roughly 0.5, is it reasonable to conclude that there are small characters the model fails to read? With only 150 images the coverage is not that broad, but it may still show the general trend.
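As background on how such a score decomposes, here is a rough sketch of word-level precision and recall for an OCR prediction. This is an assumption for illustration only, not the scorer actually used in this PR:

```python
from collections import Counter


def word_prf(prediction: str, reference: str) -> dict[str, float]:
    """Word-level precision/recall/F1 between a predicted and a reference transcription.

    Illustrative only; the CC-OCR scorer in this PR may tokenize and match differently.
    """
    pred_words = Counter(prediction.split())
    ref_words = Counter(reference.split())
    # Words present in both prediction and reference, respecting multiplicity.
    overlap = sum((pred_words & ref_words).values())
    precision = overlap / max(sum(pred_words.values()), 1)
    recall = overlap / max(sum(ref_words.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With precision of about 0.9 and recall of about 0.5, the F1 of this sketch would be roughly 0.64, so the reported 0.748 presumably comes from a different matching or aggregation scheme; the point is only that high precision with low recall suggests text the model never outputs rather than text it misreads.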
I also added the Japanese subset of CVQA.
Pull Request Overview
Adds support for a Japanese subset of the CC-OCR and CVQA tasks into the eval_mm framework, wiring them through the task and metric registries and enabling end-to-end evaluation.
- Registers CCOCR and CVQA in task and scorer registries
- Implements CC-OCR and CVQA Task classes with quick local tests
- Updates eval_all.sh to include the new tasks
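As a rough illustration of the registry wiring, the sketch below shows how the new entries might look. The registry variable names, import paths, and task-ID keys are assumptions; only the class names (CCOCR, CVQA, CCOCRScorer, CVQAScorer) come from the file summary below.

```python
# Hypothetical sketch; the actual registry structure in eval_mm may differ.

# src/eval_mm/tasks/task_registry.py
from eval_mm.tasks import CCOCR, CVQA  # exported via src/eval_mm/tasks/__init__.py

TASK_REGISTRY = {
    # ...existing tasks...
    "cc_ocr": CCOCR,
    "cvqa": CVQA,
}

# src/eval_mm/metrics/scorer_registry.py
from eval_mm.metrics import CCOCRScorer, CVQAScorer  # exported via src/eval_mm/metrics/__init__.py

SCORER_REGISTRY = {
    # ...existing scorers...
    "cc_ocr": CCOCRScorer,
    "cvqa": CVQAScorer,
}
```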
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Summary per file
| File | Description |
|---|---|
| src/eval_mm/tasks/task_registry.py | Register new CCOCR and CVQA tasks |
| src/eval_mm/tasks/cc_ocr.py | Implement CCOCR Task |
| src/eval_mm/tasks/cvqa.py | Implement CVQA Task |
| src/eval_mm/tasks/__init__.py | Export new task classes |
| src/eval_mm/metrics/scorer_registry.py | Register new CCOCRScorer and CVQAScorer |
| src/eval_mm/metrics/cvqa_scorer.py | Implement CVQA scorer |
| src/eval_mm/metrics/__init__.py | Export new scorer classes |
| eval_all.sh | Include CC-OCR and CVQA in evaluation script |
Comments suppressed due to low confidence (3)
src/eval_mm/metrics/scorer_registry.py:29
- The registry key "mmmu" looks like a typo; it should be "mmmlu" to match the task ID and ensure the MMMLU scorer is correctly registered.
"mmmu": MMMUScorer,
src/eval_mm/tasks/cvqa.py:77
- [nitpick] The helper function is named `test_task`, which may collide with similar helpers; rename it to `test_cvqa_task` for clarity and to avoid conflicts.
def test_task():
src/eval_mm/tasks/cc_ocr.py:53
- [nitpick] The helper function is named `test_task`; consider renaming it to `test_ccocr_task` to clearly associate it with the CCOCR task.
def test_task():
Diff context in src/eval_mm/tasks/cvqa.py:

    return f"{question}\n{parsed_options}\n\n{MULTI_CHOICE_PROMPT}"

    class CVQA(Task):
Copilot AI (May 17, 2025)
[nitpick] Adding a class docstring to CVQA would help future maintainers understand the dataset source, prompt structure, and filter criteria.
Suggested change:

    class CVQA(Task):
        """
        CVQA is a task for evaluating visual question answering in the context of a specific dataset.
        This task uses the `afaji/cvqa` dataset, filtered to include only entries where the `Subset` field
        matches `('Japanese', 'Japan')`. The dataset includes questions, multiple-choice options, and images.
        The input prompt is constructed by combining the question and its options, formatted as a multiple-choice
        question. The user is instructed to select the most appropriate answer by providing the corresponding letter.

        Key methods:
        - `_prepare_dataset`: Prepares and filters the dataset, and constructs input prompts.
        - `doc_to_text`: Extracts the input text for a given document.
        - `doc_to_visual`: Extracts the associated image(s) for a given document.
        - `doc_to_id`: Retrieves the unique identifier for a given document.
        - `doc_to_answer`: Retrieves the correct answer for a given document.
        """
speed1313 left a comment
Please just address the substring-match point.
src/eval_mm/metrics/cvqa_scorer.py (Outdated)
Could you replace this with src/eval_mm/metrics/substring_match_scorer.py instead?
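For reference, a minimal sketch of what a substring-match scorer could look like. This is an assumption; the existing substring_match_scorer.py in the repository may expose a different interface (for example a Scorer base class or batch aggregation helpers):

```python
def substring_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the reference answer appears in the model output, else 0.0."""
    return float(reference.strip().lower() in prediction.strip().lower())


def mean_substring_match(predictions: list[str], references: list[str]) -> float:
    """Mean substring-match accuracy over a set of examples."""
    scores = [substring_match(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```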
Co-authored-by: Copilot <[email protected]>
Thank you, that helps a lot.
[Attached screenshots: llm-jp-vila on cc-ocr, sarashina 14b on cc-ocr, and the overall results]
- Add to `eval_all.sh`
- Add to `src/eval_mm/metrics/__init__.py`
- Add to `src/eval_mm/metrics/scorer_registry.py`
- Implement `src/eval_mm/metrics/**_scorer.py` (Optional)
- Add to `src/eval_mm/tasks/__init__.py`
- Add to `src/eval_mm/tasks/task_registry.py`
- Implement `src/eval_mm/tasks/**.py`