
Conversation

vgvozdeva (Contributor)

This PR includes a release notes draft, based on the information from the PRs, for the contributors to review. Your additions and corrections are highly appreciated.

@vgvozdeva requested review from a team as code owners (October 10, 2025, 11:26)
@github-actions bot added the documentation (A request to change/fix/improve the documentation. Codeowner: @oneapi-src/onednn-doc) and backport labels (Oct 10, 2025)
@@ -0,0 +1,69 @@
# Performance Optimizations
## Intel Architecture Processors

vgvozdeva (Contributor, Author)

@tprimak Could you please review and update this section if required?


[Grouped Query Attention (GQA)]: https://uxlfoundation.github.io/oneDNN/v3.10/dev_guide_graph_gqa.html#gqa-for-training-forward-propagation

## AArch64-based Processors

vgvozdeva (Contributor, Author)

@theComputeKid @Sqvid Could you please help summarize the AArch64 improvements?

Sqvid (Contributor) commented on Oct 13, 2025

Thanks, I will draft something shortly. If I remember correctly, the etiquette is just to push directly to this branch, right?

P.S. @Ryo-not-rio is currently more active than @theComputeKid and should probably be tagged instead (or additionally) going forward. Thanks.

vgvozdeva (Contributor, Author)

Yes, you can just push changes directly or mention them in the comments here.

* Improved performance of `int8` matmul and inner product primitives with `fp16` destination.
* Improved performance of subgraphs containing a sequence of multiple binary ops with Graph API.
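
As an aside for reviewers, here is a minimal sketch of the configuration the `int8` matmul bullet above refers to, built with the primitive API: `int8` source and weights with an `fp16` destination. The shapes, the CPU engine, and all names are illustrative assumptions, not taken from this PR.

```cpp
// Illustrative sketch only: an int8 matmul with an fp16 destination, the
// configuration the release note bullet above is about. Shapes and the CPU
// engine are arbitrary choices for demonstration.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    const memory::dim M = 128, K = 256, N = 64;
    memory::desc src_md({M, K}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    // Creating the primitive descriptor is enough to check whether this
    // int8 to fp16 combination is supported by the current build and hardware.
    try {
        matmul::primitive_desc pd(eng, src_md, wei_md, dst_md);
        (void)pd;
    } catch (const error &) {
        // The combination is not supported on this platform.
    }
    return 0;
}
```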

## Intel Graphics Products

vgvozdeva (Contributor, Author)

@karturov Could you please review and update this section if required?

@vgvozdeva requested review from a team (October 10, 2025, 11:30)
@vpirogov requested a review from Copilot (October 10, 2025, 17:15)

Copilot AI left a comment

Pull Request Overview

This PR adds release notes for oneDNN v3.10, documenting performance optimizations, new functionality, and breaking changes for the upcoming release.

  • Comprehensive release notes covering performance improvements across Intel processors and graphics products
  • Documentation of new functional and Graph API features including host-side scalar support
  • Acknowledgment of contributors and deprecation notices for BLAS-like API


RELEASE_NOTES.md Outdated
* Improved performance of subgraphs containing a sequence of multiple binary ops with Graph API.

## Intel Graphics Products
*Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).

Copilot AI commented on Oct 10, 2025

Missing bullet point formatting. Should start with '* ' instead of '*'.

Suggested change
- *Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
+ * Improve GEMM performance for small batch size on Intel Core Ultra processors (Series 2) (formerly Lunar Lake).


RELEASE_NOTES.md Outdated
* Improved `int8` matmul performance with `int4` weights and per-tensor zero-points.
* Improved `bf16` matmul performance with `fp8` weights.
* Graph API optimizations:
  * Improved Scaled Dot Product Attention (SDPA) subgraph performance when relaxed accumulation mode is enabled on Intel Core Ultra processors (formerly Meteor Lake).
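
As background for the relaxed-accumulation bullet above, here is a minimal sketch of how relaxed accumulation is requested at the primitive level through `dnnl::primitive_attr::set_accumulation_mode`; how the Graph API surfaces it for SDPA subgraphs is a separate mechanism not shown here. Shapes, data types, and the CPU engine are illustrative assumptions.

```cpp
// Illustrative sketch only: asking the library to relax the default
// accumulation precision where that is profitable. This shows the primitive
// attribute; the SDPA case in the bullet above goes through the Graph API.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    const memory::dim M = 32, K = 64, N = 32;
    memory::desc src_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    // accumulation_mode::relaxed lets the implementation pick a lower-precision
    // accumulator than the strict default when it helps performance.
    primitive_attr attr;
    attr.set_accumulation_mode(accumulation_mode::relaxed);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    (void)pd;
    return 0;
}
```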

RELEASE_NOTES.md Outdated
## Intel Graphics Products
* Introduced support for `fp4` weights in matmul primitive.
* Introduced support for grouped quantization with group size 16 in matmul with `int8` compressed weights on Intel GPUs.
* Introduced support group size16 `int8` for decompressed weight with regular weights decompression.

Contributor

Looks like a duplicate of the previous line.

Suggested change
- * Introduced support group size16 `int8` for decompressed weight with regular weights decompression.


* Introduced [`host_scalar` property] for logical tensors. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes.
* Introduced [accumulation mode attribute] support in `Matmul` op. This attribute allows relaxing `fp32` accumulation requirements to achieve performance benefits on some platforms.
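
As background for the `host_scalar` bullet above, here is a minimal sketch of declaring a scalar logical tensor with that property in the Graph API. The enum spelling `logical_tensor::property_type::host_scalar` and the `{1}` shape are assumptions based on the release-note wording; the constructor usage itself follows the existing Graph API.

```cpp
// Illustrative sketch only: a logical tensor marked as a host-side scalar so
// its value (for example, an attention scale) can be provided from host memory
// rather than through a device memory object. The property name and the scalar
// shape are assumptions; consult the v3.10 documentation for the exact API.
#include <oneapi/dnnl/dnnl_graph.hpp>

int main() {
    using namespace dnnl::graph;
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    logical_tensor::dims scalar_shape {1};

    logical_tensor scale(/*tid=*/0, dt::f32, scalar_shape, lt::strided,
            logical_tensor::property_type::host_scalar);
    (void)scale;
    return 0;
}
```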

Contributor

Suggested change
+ Introduced fusion support for GQA training forward and backward propagation.

Contributor

Thanks for the addition! This one belongs in the performance optimizations section though.

Contributor

Hi @vpirogov, I removed the one in the performance optimizations section and added this one here. I was referring to the v3.9 release notes, where SDPA training forward and backward is mentioned in the functionality section.

[image attached]
