oneDNN v3.10 release notes #4106
base: rls-v3.10
Conversation
@@ -0,0 +1,69 @@
# Performance Optimizations
## Intel Architecture Processors
@tprimak Could you please review and update the section if required?
[Grouped Query Attention (GQA)]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_graph_gqa.html#gqa-for-training-forward-propagation
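For readers unfamiliar with the pattern behind the linked guide: in grouped query attention, several query heads share a single key/value head, shrinking the KV cache. Below is a minimal pure-Python sketch of the math only; it is illustrative, not the oneDNN Graph API, and the function name and shapes are assumptions:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gqa(q, k, v, num_kv_heads):
    # q: [num_q_heads][seq][d], k/v: [num_kv_heads][seq][d]
    num_q_heads = len(q)
    group = num_q_heads // num_kv_heads  # query heads per KV head
    d = len(q[0][0])
    out = []
    for h in range(num_q_heads):
        kv = h // group  # all query heads in a group share this KV head
        head_out = []
        for qi in q[h]:
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                      for kj in k[kv]]
            w = softmax(scores)
            head_out.append([sum(wi * vj[c] for wi, vj in zip(w, v[kv]))
                             for c in range(d)])
        out.append(head_out)
    return out
```

With `num_kv_heads == num_q_heads` this reduces to standard multi-head attention; with `num_kv_heads == 1` it is multi-query attention.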
## AArch64-based Processors
@theComputeKid @Sqvid could you please help summarize the AArch64 improvements?
Thanks, I will draft something shortly. If I remember correctly, the etiquette is to just push directly to this branch, right?
P.S. @Ryo-not-rio is currently more active than @theComputeKid and should probably be tagged instead (or additionally) going forward. Thanks.
Yes, you can just push changes directly or mention them in the comments here.
* Improved performance of `int8` matmul and inner product primitives with `fp16` destination.
* Improved performance of subgraphs containing a sequence of multiple binary ops with Graph API.
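The first bullet pairs `int8` inputs with an `fp16` destination. A minimal Python sketch of the numerics involved follows: integer accumulation of `int8` products, dequantization with per-tensor scales, and rounding of the result to half precision at the destination. This is purely illustrative, not the oneDNN API; the function name and scale handling are assumptions:

```python
import struct

def to_fp16(x):
    # round a Python float to IEEE half precision and back
    return struct.unpack('e', struct.pack('e', x))[0]

def int8_matmul_fp16_dst(a, b, scale_a, scale_b):
    # a: [M][K] int8 values, b: [K][N] int8 values.
    # Products are accumulated exactly in integer arithmetic
    # (as an int32 accumulator would), then the scaled result
    # is stored in half precision.
    M, K, N = len(a), len(b), len(b[0])
    out = []
    for i in range(M):
        row = []
        for j in range(N):
            acc = sum(a[i][kk] * b[kk][j] for kk in range(K))
            row.append(to_fp16(acc * scale_a * scale_b))
        out.append(row)
    return out
```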
## Intel Graphics Products
@karturov Could you please review and update the section if required?
Pull Request Overview
This PR adds release notes for oneDNN v3.10, documenting performance optimizations, new functionality, and breaking changes for the upcoming release.
- Comprehensive release notes covering performance improvements across Intel processors and graphics products
- Documentation of new functional and Graph API features including host-side scalar support
- Acknowledgment of contributors and deprecation notices for BLAS-like API
RELEASE_NOTES.md (outdated)
* Improved performance of subgraphs containing sequence of multiple binary ops with Graph API.
## Intel Graphics Products
*Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
Copilot (AI) commented on Oct 10, 2025:
Missing bullet point formatting. Should start with '* ' instead of '*'.
*Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
* Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
RELEASE_NOTES.md (outdated)
* Improved `int8` matmul performance with `int4` weights and per-tensor zero-points.
* Improved `bf16` matmul performance with `fp8` weights.
* Graph API optimizations:
  * Improved Scaled Dot Product Attention (SDPA) subgraph performance when relaxed accumulation mode is enabled on Intel Core Ultra processors (formerly Meteor Lake).
RELEASE_NOTES.md (outdated)
## Intel Graphics Products
* Introduced support for `fp4` weights in matmul primitive.
* Introduced support for grouped quantization with group size 16 in matmul with int8 compressed weights on Intel GPUs.
* Introduced support group size16 `int8` for decompressed weight with regular weights decompression.
Looks like duplication of the previous line.
* Introduced support group size16 `int8` for decompressed weight with regular weights decompression.
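To illustrate the grouped-quantization scheme discussed above (one scale per group of 16 consecutive weights, rather than one per tensor), here is a small pure-Python sketch. The function names and symmetric `int8` scheme are assumptions for illustration, not the oneDNN API:

```python
def quantize_grouped(weights, group_size=16):
    # Symmetric int8 quantization with one scale per group of
    # group_size consecutive weights.
    scales, q = [], []
    for g in range(0, len(weights), group_size):
        grp = weights[g:g + group_size]
        amax = max(abs(w) for w in grp) or 1.0
        scale = amax / 127.0
        scales.append(scale)
        q.extend(max(-128, min(127, round(w / scale))) for w in grp)
    return q, scales

def dequantize_grouped(q, scales, group_size=16):
    # Reconstruct each weight from its int8 value and its group's scale.
    return [q[i] * scales[i // group_size] for i in range(len(q))]
```

Smaller groups track local weight ranges more closely than a per-tensor scale, at the cost of storing one scale per group.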
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Vadim Pirogov <[email protected]>
* Introduced [`host_scalar` property] for logical tensors. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes.
* Introduced [accumulation mode attribute] support in `Matmul` op. This attribute allows relaxing `fp32` accumulation requirements to achieve performance benefits on some platforms.
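To show why relaxing `fp32` accumulation can change results, here is a small pure-Python illustration of strict (full-precision) versus half-precision accumulation of the same dot product. This is conceptual only, not the oneDNN API; the function names are invented for the example:

```python
import struct

def to_fp16(x):
    # round a Python float to IEEE half precision and back
    return struct.unpack('e', struct.pack('e', x))[0]

def dot_strict(a, b):
    # fp32-style accumulation: the running sum keeps full precision
    return sum(x * y for x, y in zip(a, b))

def dot_relaxed(a, b):
    # relaxed accumulation: the running sum itself is rounded each step
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc

a = [to_fp16(1.0)] * 4096
b = [to_fp16(0.25)] * 4096
# strict gives 4096 * 0.25 = 1024.0 exactly; relaxed stalls at 512.0,
# because once the fp16 running sum reaches 512 the 0.25 increments
# fall below the representable spacing and round away.
print(dot_strict(a, b), dot_relaxed(a, b))
```

Lower-precision accumulators are cheaper on hardware without fast `fp32` accumulation paths, which is the trade-off the attribute exposes.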
Introduced fusion support for GQA training forward and backward propagation.
Thanks for the addition! This one belongs in the performance optimizations section though.
Hi @vpirogov, I removed the one in the performance optimizations section and added this one here. I was referring to the v3.9 release notes, where SDPA training forward and backward propagation is mentioned in the functionality section.
Co-authored-by: Tao Lv <[email protected]>
Co-authored-by: YixinBao <[email protected]>
This PR includes a release notes draft based on the information from the PRs, for contributors to review. Your additions and corrections are highly appreciated.