oneDNN v3.10 release notes (#4106)

# Performance Optimizations
## Intel Architecture Processors
* Improved performance on future Intel Xeon processors with Intel AVX10.2 and Intel AMX instruction set support. This functionality is not dispatched by default and requires opt-in with the environment variable `ONEDNN_MAX_CPU_ISA=AVX10_2_512_AMX_2`.
* Improved performance on future Intel Core processors with Intel AVX10.2 instruction set support. This functionality is not dispatched by default and requires opt-in with the environment variable `ONEDNN_MAX_CPU_ISA=AVX10_2_512`.
* Improved performance of matmul primitive on processors with Intel AMX support.
* Improved performance of `f32` matmul primitive for GEMV cases on processors with Intel AVX2 instruction set support.
* Improved matmul performance with `int4` and `int8` compressed weights and per-channel zero-points.
* Improved `f32` matmul performance with `int4` and `int8` compressed weights on processors with Intel AVX2 and Intel AVX512 instruction set support.
* Improved `bf16` matmul performance with `int4` and `int8` compressed weights on processors with Intel AVX512, Intel DL Boost, and bfloat16 instruction set support.
* Improved performance of `int8` convolution primitive when using zero points.
* Improved performance of `int8` matmul and inner product primitives with `fp16` destination.
* Improved performance of `f32` and `bf16` convolution primitives with `int8` destination.
* Improved performance of RNN primitive on processors with Intel AVX2 instruction set support when using OpenMP runtime.
* Improved performance of subgraphs containing a sequence of multiple binary ops with Graph API.

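The AVX10.2 opt-in described above is a per-process environment setting. As a minimal sketch (the `echo` is only there to show the variable took effect; the ISA value is the one named in the notes):

```shell
# Opt in to AVX10.2 + Intel AMX dispatch for this shell session.
# oneDNN reads ONEDNN_MAX_CPU_ISA at startup; this value is not
# dispatched by default, per the release notes above.
export ONEDNN_MAX_CPU_ISA=AVX10_2_512_AMX_2

# Any oneDNN-based application launched from this shell inherits the cap.
echo "ISA cap: ${ONEDNN_MAX_CPU_ISA}"
```

On hardware without the corresponding instruction sets, oneDNN dispatches to the best implementation the CPU actually supports, so the variable is safe to set unconditionally.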
## Intel Graphics Products
* Improved GEMM performance for small batch sizes on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
* Improved matmul performance for Qwen2-7B shapes on Intel Arc graphics (formerly DG2) and Intel Arc Graphics for Intel Core Ultra processors (formerly Arrow Lake-H).
* Improved `int8` matmul performance with `int4` weights and per-tensor zero-points.
* Improved `bf16` matmul performance with `fp8` weights.
* Graph API optimizations:
  * Improved [Scaled Dot Product Attention (SDPA)] subgraph performance for inference when relaxed accumulation mode is enabled on Intel Core Ultra processors (formerly Meteor Lake).
  * Improved performance of SDPA and GQA subgraphs when using host-side scalars.
  * Improved performance of GQA subgraphs for 2nd token scenarios.
  * Improved performance of subgraphs containing a sequence of multiple binary ops.
  * Improved performance of [Grouped Query Attention (GQA)] subgraphs for training forward and backward propagation.

[Grouped Query Attention (GQA)]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_graph_gqa.html#gqa-for-training-forward-propagation
[Scaled Dot Product Attention (SDPA)]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_graph_sdpa.html

## AArch64-based Processors
* Improved performance of reorder primitive.
* Improved performance of `bf16` convolutions.
* Improved performance of convolutions on 128-bit SVE platforms.
* Improved performance of eltwise primitive on Arm® Neoverse™ N1.

# Functionality
## Functional API
* Introduced [host-side scalar memory objects]. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes. Host-side scalars are currently supported in matmul and convolution primitives on Intel GPUs.
* Introduced support for pre-computed reductions in matmul primitive. This functionality is intended to improve performance for `int8` activations and `int8` weights with zero-points.

[host-side scalar memory objects]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_host_side_scalars.html

## Graph API
* Introduced the [`host_scalar` property] for logical tensors. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes. Host-side scalars are currently supported to define the attention scale, sequence length, and negative infinity value in SDPA/GQA subgraphs.
* Introduced [accumulation mode attribute] support in the `MatMul` op. This attribute allows relaxing `fp32` accumulation requirements to achieve performance benefits on some platforms.

[`host_scalar` property]: https://uxlfoundation.github.io/oneDNN/v3.10/enum_dnnl_graph_logical_tensor_property_type.html
[accumulation mode attribute]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_op_matmul.html

## Intel Graphics Products
* Introduced support for `fp4` weights in matmul primitive.
* Introduced support for grouped quantization with group size 16 in matmul with `int8` compressed weights.
* Introduced support for group size 16 `int8` weights in regular weights decompression.

## Intel Architecture Processors
* Introduced `fp4` weights support for `fp32` matmul and convolution for future Intel Xeon processors with Intel AVX10.2 instruction set support.

# Usability
* Extended diagnostics available in verbose mode for primitive descriptor creation issues.
* Extended dispatch diagnostics in verbose mode output for primitive implementations on Intel GPUs.

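These diagnostics are surfaced through the `ONEDNN_VERBOSE` environment variable. A minimal sketch (the application name `my_app` is a placeholder, not something from the notes):

```shell
# Enable dispatch diagnostics for the current shell session. In this
# mode oneDNN reports why candidate implementations were skipped,
# which is where the extended GPU dispatch diagnostics appear.
export ONEDNN_VERBOSE=dispatch

# Launch your oneDNN-based application from this shell, e.g.:
#   ./my_app        (placeholder name)
echo "verbose mode: ${ONEDNN_VERBOSE}"
```

Other `ONEDNN_VERBOSE` levels (such as `profile` or `all`) remain available; `dispatch` is the one most relevant to the diagnostics extended in this release.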
## AArch64-based Processors
* Fixed crashes in backward-pass convolutions.
* Fixed numerical errors in 4D matmul primitives.
* Fixed numerical errors in low-precision convolutions.
* Fixed numerical errors in reorders with compensation.
* Fixed illegal-instruction crashes on Arm® Neoverse™ N1.
* Fixed crashes in binary primitive in Debug builds.
* Fixed segmentation fault in `eltwise_log` post-ops for large kernels.

# Validation

# Deprecated Functionality
* The [BLAS-like API], including the `dnnl::sgemm`, `dnnl::gemm_u8s8s32`, and `dnnl::gemm_s8s8s32` functions, is deprecated and will be removed in a future release. If you are using this API, consider switching to the [matmul primitive].

[BLAS-like API]: https://uxlfoundation.github.io/oneDNN/v3.10/group_dnnl_api_blas.html
[matmul primitive]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_matmul.html

# Breaking Changes
## AArch64-based Processors
* Bumped the minimum required [Arm® Compute Library](https://github.com/ARM-software/ComputeLibrary) version to 52.4.0.

# Thanks to our Contributors
This release contains contributions from the [project core team] as well as Andrei Hutu @Anndrey24, Anna Sztukowska @asztukow, Arseniy Obolenskiy @aobolensk, Avanish Tiwari @Tiwari-Avanish, Daniel Kuts @apach301, Daniel Whittaker @danwhittaker-arm, Deeksha Kasture @kasturedeeksha, George Nash @georgen117, Henry Gardiner @henry-gar, Keanu Czirjak @keanucz, Krishna Sai @krishnasai-mcw, Marek Michalowski @michalowski-arm, Sheldon Robinson @sheldonrobinson, @Shreyas-fuj, Viktoriia Gvozdeva @vgvozdeva, Xiang1 Guo, Yejing Lai @Yejing-Lai, Yonghao Gu, Yusuf Butt @UseTheForce007, Zhibo Li @zhili03, @almayne, @co63oc, @focusunsink, @gassan-arm, @jstachowintel, @pmanczak, @puneetmatharu, @raistefintel, @vishwascm, @vyevtyus, @zhangfeiv0, @zhangjian29, and @xiazhuozhao.

[project core team]: https://github.com/uxlfoundation/oneDNN/blob/rls-v3.10/MAINTAINERS.md