Commit e39e7d4

Fix
1 parent ac462c5 commit e39e7d4

3 files changed: +37 -14 lines changed

.github/workflows/test.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+name: Example
+
+on:
+  push:
+
+jobs:
+  uv-example:
+    name: python
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Install the project
+        run: uv sync --dev
+
+      - name: Run tests
+        # For example, using `pytest`
+        run: uv run pytest src/eval_mm/metrics/*.py

src/eval_mm/metrics/jmmmu_scorer.py

Lines changed: 7 additions & 7 deletions
@@ -494,12 +494,12 @@ def test_jmmmu_aggregate():
     scores = JMMMUScorer.score(ds, preds, docs=ds)
     metric = JMMMUScorer.aggregate(scores, docs=ds)
     true_metric = {
-        "Overall-Art and Psychology": {"num": 0, "acc": 0},
-        "Overall-Business": {"num": 10, "acc": 0.3},
-        "Accounting": {"num": 10, "acc": 0.3},
-        "Overall-Science": {"num": 0, "acc": 0},
-        "Overall-Health and Medicine": {"num": 0, "acc": 0},
-        "Overall-Tech and Engineering": {"num": 0, "acc": 0},
-        "Overall": {"num": 10, "acc": 0.3},
+        "Overall-Art and Psychology": 0,
+        "Overall-Business": 0.3,
+        "Accounting": 0.3,
+        "Overall-Science": 0,
+        "Overall-Health and Medicine": 0,
+        "Overall-Tech and Engineering": 0,
+        "Overall": 0.3,
     }
     assert metric == true_metric

src/eval_mm/metrics/mmmu_scorer.py

Lines changed: 8 additions & 7 deletions
@@ -494,12 +494,13 @@ def test_mmmu_aggregate():
     scores = MMMUScorer.score(ds, preds, docs=ds)
     metric = MMMUScorer.aggregate(scores, docs=ds)
     true_metric = {
-        "Overall-Art and Psychology": {"num": 0, "acc": 0},
-        "Overall-Business": {"num": 10, "acc": 0.3},
-        "Accounting": {"num": 10, "acc": 0.3},
-        "Overall-Science": {"num": 0, "acc": 0},
-        "Overall-Health and Medicine": {"num": 0, "acc": 0},
-        "Overall-Tech and Engineering": {"num": 0, "acc": 0},
-        "Overall": {"num": 10, "acc": 0.3},
+        "Overall-Art and Design": 0,
+        "Overall-Business": 0.3,
+        "Accounting": 0.3,
+        "Overall-Science": 0,
+        "Overall-Health and Medicine": 0,
+        "Overall-Humanities and Social Science": 0,
+        "Overall-Tech and Engineering": 0,
+        "Overall": 0.3,
     }
     assert metric == true_metric
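
Both test updates expect `aggregate` to return a flat mapping from category name to accuracy, rather than a nested `{"num": ..., "acc": ...}` dict. The sketch below illustrates that flat shape under stated assumptions: `aggregate_flat`, `metrics_close`, and the `DOMAINS` mapping are hypothetical names invented for illustration, not the scorers' actual implementation. It also shows a tolerance-based comparison, which may be safer than `==` if accuracies are ever computed in a way that introduces floating-point rounding.

```python
import math
from collections import defaultdict

# Hypothetical domain-to-subject mapping, for illustration only.
DOMAINS = {
    "Business": ["Accounting"],
    "Science": ["Biology"],
}


def mean(values):
    """Average of a list, or 0 when the list is empty."""
    return sum(values) / len(values) if values else 0


def aggregate_flat(scores, subjects):
    """Return {name: accuracy} with plain numbers as values."""
    by_subject = defaultdict(list)
    for score, subject in zip(scores, subjects):
        by_subject[subject].append(score)

    metric = {}
    for domain, members in DOMAINS.items():
        domain_scores = [s for m in members for s in by_subject.get(m, [])]
        metric[f"Overall-{domain}"] = mean(domain_scores)
        # Only subjects that actually appear get their own entry.
        for m in members:
            if m in by_subject:
                metric[m] = mean(by_subject[m])
    metric["Overall"] = mean(list(scores))
    return metric


def metrics_close(actual, expected, tol=1e-9):
    """Compare two flat metric dicts key by key with an absolute tolerance."""
    return actual.keys() == expected.keys() and all(
        math.isclose(actual[k], expected[k], abs_tol=tol) for k in expected
    )


# Ten Accounting questions, three answered correctly.
scores = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
subjects = ["Accounting"] * 10
metric = aggregate_flat(scores, subjects)
```

With this shape, a test can compare `metric` against a flat expected dict using `metrics_close(metric, expected)` in place of strict equality.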
