
hidefromkgb
Contributor

This addresses MFDNN-14300.

hidefromkgb requested a review from a team as a code owner October 14, 2025 06:02
github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Oct 14, 2025
@hidefromkgb
Contributor Author

make test perf-gpu
set primitive=matmul ip

}
if(problem.needsAGroupSums()){
for (int b = 0; b < problem.batchDims; b++) {
scale(problem.Tag, state.inputs.strideGroupSumsA[b]);
Contributor


nit: formatting looks odd here and below.

hidefromkgb force-pushed the aguskov/gemm_batched_gs branch from 6585f6c to 532d991 Compare October 15, 2025 03:28
hidefromkgb requested a review from a team as a code owner October 15, 2025 03:28
github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Oct 15, 2025
@hidefromkgb
Contributor Author

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_matmul
enable benchdnn_ip
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg

@hidefromkgb
Contributor Author

PTL failures in CI are not due to this PR.
The following test fails randomly on PTL even with the current main:

$ LD_LIBRARY_PATH="../../src:$LD_LIBRARY_PATH" ./benchdnn --engine=gpu --matmul --dt=bf16 --stag=abc --wtag=acb --dtag=abc --repeats-per-prb=10 1024x1x96:1024x96x1

0:PASSED (104 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
1:PASSED (32 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
[ 922][DST][922:0:0] exp_f32:           5 exp:           5 got:          36 diff:      31 rdiff:     6.2
[ 924][DST][924:0:0] exp_f32:           4 exp:           4 got:         -25 diff:      29 rdiff:    7.25
[ 938][DST][938:0:0] exp_f32:          25 exp:          25 got:         -15 diff:      40 rdiff:     1.6
[ 950][DST][950:0:0] exp_f32:          16 exp:          16 got:          33 diff:      17 rdiff:  1.0625
[ 961][DST][961:0:0] exp_f32:          28 exp:          28 got:           6 diff:      22 rdiff:0.785714
[ 970][DST][970:0:0] exp_f32:          18 exp:          18 got:           8 diff:      10 rdiff:0.555556
[ 973][DST][973:0:0] exp_f32:          40 exp:          40 got:          53 diff:      13 rdiff:   0.325
[1002][DST][1002:0:0] exp_f32:         -11 exp:         -11 got:           1 diff:      12 rdiff: 1.09091
[1011][DST][1011:0:0] exp_f32:           9 exp:           9 got:          -4 diff:      13 rdiff: 1.44444
[1014][DST][1014:0:0] exp_f32:          -9 exp:          -9 got:          43 diff:      52 rdiff: 5.77778
[COMPARE_STATS][DST]: trh=0 err_max_diff:      52 err_max_rdiff:    7.25 all_max_diff:      52 all_max_rdiff:    7.25
[PRIM_REF][INFO]: L2_size:524288 bytes; per_core_L3_size:2097152 bytes; nthr:6; impl_name:brg_matmul:avx2_vnni_2
2:FAILED (errors:11 total:1024) (33 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
3:PASSED (28 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
4:PASSED (30 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
[ 472][DST][472:0:0] exp_f32:          -4 exp:          -4 got:          -7 diff:       3 rdiff:    0.75
[ 476][DST][476:0:0] exp_f32:          23 exp:          23 got:         -24 diff:      47 rdiff: 2.04348
[ 529][DST][529:0:0] exp_f32:         -10 exp:         -10 got:          -7 diff:       3 rdiff:     0.3
[ 563][DST][563:0:0] exp_f32:           1 exp:           1 got:         -35 diff:      36 rdiff:      36
[ 593][DST][593:0:0] exp_f32:         -69 exp:         -69 got:           9 diff:      78 rdiff: 1.13043
[ 606][DST][606:0:0] exp_f32:           2 exp:           2 got:          22 diff:      20 rdiff:      10
[ 641][DST][641:0:0] exp_f32:         -25 exp:         -25 got:          23 diff:      48 rdiff:    1.92
[ 664][DST][664:0:0] exp_f32:         -10 exp:         -10 got:          25 diff:      35 rdiff:     3.5
[ 669][DST][669:0:0] exp_f32:          32 exp:          32 got:           5 diff:      27 rdiff: 0.84375
[ 674][DST][674:0:0] exp_f32:          18 exp:          18 got:          -1 diff:      19 rdiff: 1.05556
[COMPARE_STATS][DST]: trh=0 err_max_diff:      78 err_max_rdiff:      36 all_max_diff:      78 all_max_rdiff:      36
[PRIM_REF][INFO]: L2_size:524288 bytes; per_core_L3_size:2097152 bytes; nthr:6; impl_name:brg_matmul:avx2_vnni_2
5:FAILED (errors:24 total:1024) (30 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
6:PASSED (30 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
7:PASSED (31 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
8:PASSED (32 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
9:PASSED (42 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
| jit:gemm:any : 10 (100%)                                 |
============================================================
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
2:FAILED (errors:11 total:1024) (33 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
5:FAILED (errors:24 total:1024) (30 ms) __REPRO: --matmul --engine=gpu --dt=bf16:bf16:bf16 --stag=abc --wtag=acb --dtag=abc 1024x1x96:1024x96x1
============================
tests:10 passed:8 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:2 listed:0
total: 0.40s; create_pd: 0.01s (2%); create_prim: 0.00s (0%); fill: 0.25s (63%); execute: 0.03s (7%); compute_ref: 0.00s (1%); compare: 0.01s (3%);

hidefromkgb merged commit 7b351ae into main Oct 16, 2025
29 of 30 checks passed
hidefromkgb deleted the aguskov/gemm_batched_gs branch October 16, 2025 21:30

Labels

component:tests Codeowner: @oneapi-src/onednn-arch platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel

4 participants