Announcements
- No large announcements of note this release! We've made a lot of small refinements to streamline your ONNX Runtime experience.
GenAI & Advanced Model Features
Enhanced Decoding & Pipeline Support
- Added "chat mode" support for CPU, GPU, and WebGPU.
- Provided support for decoder model pipelines.
- Added Java API support for MultiLoRA.
API & Compatibility Updates
- Chat mode introduced breaking changes in the API (see migration guide).
Bug Fixes for Model Output
- Fixed Phi series garbage output issues with long prompts.
- Resolved gibberish output issues with top_k on CPU.
Execution & Core Optimizations
Core Refinements
- Reduced default logger usage for improved efficiency (#23030).
- Fixed a visibility issue in threadpool (#23098).
Execution Provider (EP) Updates
General
- Removed TVM EP from the source tree (#22827).
- Marked NNAPI EP for deprecation (following Google's deprecation of NNAPI).
- Fixed a DLL delay-loading issue that affected the usability of WebGPU EP and DirectML EP on Windows (#23111, #23227).
TensorRT EP Improvements
- Added support for TensorRT 10.8.
- Users of the onnx-tensorrt open-source parser: please check here for requirements.
- Assigned DDS ops (NMS, RoiAlign, NonZero) to TensorRT by default.
- Introduced option trt_op_types_to_exclude to exclude specific ops from TensorRT assignment.
CUDA EP Improvements
- Added a Python API, preload_dlls, to help ONNX Runtime coexist with PyTorch.
- Miscellaneous enhancements for Flux model inference.
QNN EP Improvements
- Introduced QNN shared memory support.
- Improved performance for AI Hub models.
- Added support for QAIRT/QNN SDK 2.31.
- Added Python 3.13 package.
- Miscellaneous bug fixes and enhancements.
- QNN EP is now built as a shared library/DLL by default. To retain the previous build behavior, use the build option --use_qnn static_lib.
DirectML EP Support & Upgrades
- Updated DirectML version from 1.15.2 to 1.15.4 (#22635).
OpenVINO EP Improvements
- Introduced OpenVINO EP Weights Sharing feature.
- Added support for various contrib ops in OVEP: SkipLayerNormalization, MatMulNBits, FusedGemm, FusedConv, EmbedLayerNormalization, BiasGelu, Attention, DynamicQuantizeMatMul, FusedMatMul, QuickGelu, SkipSimplifiedLayerNormalization.
- Miscellaneous bug fixes and improvements.
VitisAI EP Improvements
- Miscellaneous bug fixes and improvements.
Mobile Platform Enhancements
CoreML Updates
- Added support for caching generated CoreML models.
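A minimal sketch of enabling the CoreML model cache from Python, assuming the CoreML EP provider option key is "ModelCacheDirectory" (check the CoreML EP documentation for your release). The cache directory path is an arbitrary example. The providers list is built without importing onnxruntime.

```python
import os
import tempfile

# Example cache location (an assumption; any writable directory should do).
cache_dir = os.path.join(tempfile.gettempdir(), "ort_coreml_cache")
os.makedirs(cache_dir, exist_ok=True)

# Assumed option key: "ModelCacheDirectory". Compiled CoreML models would be
# stored here and reused on later session creations instead of recompiled.
coreml_options = {"ModelCacheDirectory": cache_dir}

providers = [
    ("CoreMLExecutionProvider", coreml_options),
    "CPUExecutionProvider",  # fallback for nodes CoreML does not take
]

# On macOS/iOS with onnxruntime installed, the list would be passed as:
#   onnxruntime.InferenceSession("model.onnx", providers=providers)
print(providers[0])
```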
Extensions & Tokenizer Improvements
Expanded Tokenizer Support
- Now supports more tokenizer models, including ChatGLM, Baichuan2, Phi-4, etc.
- Added full Phi-4 pre/post-processing support for text, vision, and audio.
- Introduced RegEx pattern loading from tokenizer.json.
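To illustrate where such RegEx patterns live, the sketch below walks the pre_tokenizer section of a tokenizer.json and collects every {"Regex": ...} pattern. The layout (a Sequence of pretokenizers containing a Split with a Regex pattern) follows the Hugging Face tokenizers JSON format; collect_regex_patterns is a hypothetical helper and is not the ort-extensions loader itself. Patterns are only collected, not compiled, because real tokenizer patterns often use \p{...} classes that Python's re module does not support.

```python
import json


def collect_regex_patterns(node):
    """Recursively gather every {"Regex": ...} pattern under a pre_tokenizer node."""
    patterns = []
    if isinstance(node, dict):
        pattern = node.get("pattern")
        if isinstance(pattern, dict) and "Regex" in pattern:
            patterns.append(pattern["Regex"])
        # Descend into nested pre-tokenizers (e.g. a "Sequence" node).
        for value in node.values():
            if isinstance(value, (dict, list)):
                patterns.extend(collect_regex_patterns(value))
    elif isinstance(node, list):
        for item in node:
            patterns.extend(collect_regex_patterns(item))
    return patterns


# Simplified tokenizer.json fragment (illustrative pattern, not from a real model).
sample = json.loads("""
{
  "pre_tokenizer": {
    "type": "Sequence",
    "pretokenizers": [
      {"type": "Split", "pattern": {"Regex": " ?[A-Za-z]+| ?[0-9]+"},
       "behavior": "Isolated", "invert": false},
      {"type": "ByteLevel", "add_prefix_space": false, "use_regex": false}
    ]
  }
}
""")

print(collect_regex_patterns(sample["pre_tokenizer"]))
```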
Image Codec Enhancements
ImageCodec now links to native APIs if available; otherwise, it falls back to built-in libraries.
Unified Tokenizer API
- Introduced a new tokenizer op schema to unify the tokenizer codebase.
- Added support for loading tokenizer data from a memory blob in the C API.
Infrastructure & Build Improvements
Runtime Requirements
All prebuilt Windows packages now require VC++ Runtime version >= 14.40 (instead of 14.38). If your VC++ runtime version is lower than that, ONNX Runtime may crash during initialization. See https://github.com/microsoft/STL/wiki/Changelog#vs-2022-1710 for more details.
Updated minimum iOS and Android SDK requirements to align with React Native 0.76:
All macOS packages now require macOS version >= 13.3.
CMake File Changes
CMake Version: Increased the minimum required CMake version from 3.26 to 3.28.
Python Version: Increased the minimum required Python version from 3.8 to 3.10 for building ONNX Runtime from source.
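A minimal sketch of a pre-build check for the new minimums (CMake 3.28, Python 3.10). The helper names are hypothetical and not part of the ONNX Runtime build scripts; it assumes `cmake --version` prints a line such as "cmake version 3.28.3".

```python
import re
import shutil
import subprocess
import sys

MIN_CMAKE = (3, 28)
MIN_PYTHON = (3, 10)


def parse_version(text):
    """Extract the first 'major.minor' pair from text like 'cmake version 3.28.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def cmake_version():
    """Return (major, minor) of the cmake on PATH, or None if cmake is not found."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout) if result.returncode == 0 else None


if __name__ == "__main__":
    py = sys.version_info[:2]
    found = cmake_version()
    print(f"Python {py[0]}.{py[1]}: {'ok' if py >= MIN_PYTHON else 'too old'}")
    cmake_ok = found is not None and found >= MIN_CMAKE
    print(f"CMake {found}: {'ok' if cmake_ok else 'missing or too old'}")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "3.9" sorts after "3.28".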
Improved VCPKG support
Added the following CMake options for WebGPU EP:
- onnxruntime_USE_EXTERNAL_DAWN
- onnxruntime_CUSTOM_DAWN_SRC_PATH
- onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY
- onnxruntime_ENABLE_PIX_FOR_WEBGPU_EP
- onnxruntime_ENABLE_DAWN_BACKEND_VULKAN
- onnxruntime_ENABLE_DAWN_BACKEND_D3D12
Added CMake option onnxruntime_BUILD_QNN_EP_STATIC_LIB for building with QNN EP as a static library.
Removed CMake option onnxruntime_USE_PREINSTALLED_EIGEN.
Fixed a build issue with Visual Studio 2022 17.3 (#23911)
Modernized Build Tools
- Now using VCPKG for most package builds.
- Upgraded Gradle from 7.x to 8.x.
- Updated JDK from 11 to 17.
- Enabled onnxruntime_USE_CUDA_NHWC_OPS by default for CUDA builds.
- Added support for WASM64 (build from source; no package published).
Dependency Cleanup
- Removed Google’s nsync from dependencies.
Others
Updated Node.js installation script to support network proxy usage (#23231)
Web
- No updates of note.
Contributors
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
Changming Sun, Yulong Wang, Tianlei Wu, Jian Chen, Wanming Lin, Adrian Lizarraga, Hector Li, Jiajia Qin, Yifan Li, Edward Chen, Prathik Rao, Jing Fang, shiyi, Vincent Wang, Yi Zhang, Dmitri Smirnov, Satya Kumar Jandhyala, Caroline Zhu, Chi Lo, Justin Chu, Scott McKay, Enrico Galli, Kyle, Ted Themistokleous, dtang317, wejoncy, Bin Miao, Jambay Kinley, Sushanth Rajasankar, Yueqing Zhang, amancini-N, ivberg, kunal-vaishnavi, liqun Fu, Corentin Maravat, Peishen Yan, Preetha Veeramalai, Ranjit Ranjan, Xavier Dupré, amarin16, jzm-intel, kailums, xhcao, A-Satti, Aleksei Nikiforov, Ankit Maheshkar, Javier Martinez, Jianhui Dai, Jie Chen, Jon Campbell, Karim Vadsariya, Michael Tyler, PARK DongHa, Patrice Vignola, Pranav Sharma, Sam Webster, Sophie Schoenmeyer, Ti-Tai Wang, Xu Xing, Yi-Hong Lyu, genmingz@AMD, junchao-zhao, sheetalarkadam, sushraja-msft, Akshay Sonawane, Alexis Tsogias, Ashrit Shetty, Bilyana Indzheva, Chen Feiyue, Christian Larson, David Fan, David Hotham, Dmitry Deshevoy, Frank Dong, Gavin Kinsey, George Wu, Grégoire, Guenther Schmuelling, Indy Zhu, Jean-Michaël Celerier, Jeff Daily, Joshua Lochner, Kee, Malik Shahzad Muzaffar, Matthieu Darbois, Michael Cho, Michael Sharp, Misha Chornyi, Po-Wei (Vincent), Sevag H, Takeshi Watanabe, Wu, Junze, Xiang Zhang, Xiaoyu, Xinpeng Dou, Xinya Zhang, Yang Gu, Yateng Hong, mindest, mingyue, raoanag, saurabh, shaoboyan091, sstamenk, tianf-fff, wonchung-microsoft, xieofxie, zz002