
Conversation

@MrZ20 commented Oct 23, 2025

What this PR does / why we need it?

Add accuracy test for model ERNIE-4.5-21B-A3B-PT

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: MrZ20 <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds a new accuracy test for the ERNIE-4.5-21B-A3B-PT model. The changes include a new YAML configuration file for the test and an update to the list of test configurations. While the changes are straightforward, the new configuration file uses composite metric names (e.g., "exact_match,flexible-extract") that are likely to expose a bug in the test runner script (test_lm_eval_correctness.py), causing the new test to fail with a KeyError when retrieving standard error values. I've added critical comments on the specific metric definitions in the YAML file explaining the issue.

Comment on lines +5 to +6
- name: "exact_match,flexible-extract"
value: 0.72

critical

The composite metric name "exact_match,flexible-extract" will likely cause the test to fail with a KeyError. The test script (tests/e2e/models/test_lm_eval_correctness.py, lines 152-155) attempts to find the standard error by creating a key "exact_match_stderr,flexible-extract", which is probably not present in the evaluation results. The script doesn't seem to correctly parse compound metric names for stderr retrieval. This might require a fix in the test script itself, or splitting this into separate metric definitions if the framework supports it.
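The key construction described above can be sketched as follows; `stderr_key` is a hypothetical helper (not the actual code in `test_lm_eval_correctness.py`) illustrating how a compound lm_eval metric name could be split so that the `_stderr` suffix is attached to the base metric rather than appended to the whole name:

```python
def stderr_key(metric_name: str) -> str:
    # Split a compound lm_eval metric name such as
    # "exact_match,flexible-extract" into its base metric and filter,
    # then rebuild the stderr key as "exact_match_stderr,flexible-extract".
    base, _, flt = metric_name.partition(",")
    return f"{base}_stderr,{flt}" if flt else f"{base}_stderr"
```

For example, `stderr_key("exact_match,flexible-extract")` yields `"exact_match_stderr,flexible-extract"`, matching the `<metric>_stderr,<filter>` naming pattern that lm_eval uses in its results dict.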

Comment on lines +9 to +10
- name: "acc,none"
value: 0.85

critical

Similar to the gsm8k task, the metric name "acc,none" will likely cause a KeyError during the test run. The test script (tests/e2e/models/test_lm_eval_correctness.py, lines 152-155) will try to access a key "acc_stderr,none" for the standard error, which is unlikely to exist in the results from lm_eval. This points to a potential bug in how the test script handles stderr for metrics with commas in their names.
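One defensive option, sketched below under the assumption that `metrics` is the per-task results dict returned by lm_eval, is to look the stderr key up with a fallback default instead of indexing directly, so a missing key degrades gracefully rather than raising `KeyError`; `get_stderr` is a hypothetical helper, not part of the existing test script:

```python
def get_stderr(metrics: dict, base: str, flt: str, default: float = 0.0) -> float:
    # Look up "<base>_stderr,<filter>" in the results dict; return
    # `default` instead of raising KeyError when the key is absent.
    return metrics.get(f"{base}_stderr,{flt}", default)

metrics = {"acc,none": 0.85}  # stderr key deliberately absent
print(get_stderr(metrics, "acc", "none"))  # prints 0.0
```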

@MrZ20 (Author) commented Oct 23, 2025

@MengqingCao Review required

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.
