@@ -8,6 +8,107 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

+ New in 2022.7.0
+ ===============
+
+ New Features
+ ------------
+ - Improved performance of the ``adjacent_find``, ``all_of``, ``any_of``, ``copy_if``, ``exclusive_scan``, ``equal``,
+   ``find``, ``find_if``, ``find_end``, ``find_first_of``, ``find_if_not``, ``inclusive_scan``, ``includes``,
+   ``is_heap``, ``is_heap_until``, ``is_partitioned``, ``is_sorted``, ``is_sorted_until``, ``lexicographical_compare``,
+   ``max_element``, ``min_element``, ``minmax_element``, ``mismatch``, ``none_of``, ``partition``, ``partition_copy``,
+   ``reduce``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``, ``search``, ``search_n``,
+   ``stable_partition``, ``transform_exclusive_scan``, ``transform_inclusive_scan``, ``unique``, and ``unique_copy``
+   algorithms with device policies.
+ - Improved performance of ``sort``, ``stable_sort`` and ``sort_by_key`` algorithms with device policies when using Merge
+   sort [#fnote1]_.
+ - Added ``stable_sort_by_key`` algorithm in ``namespace oneapi::dpl``.
+ - Added parallel range algorithms in ``namespace oneapi::dpl::ranges``: ``all_of``, ``any_of``,
+   ``none_of``, ``for_each``, ``find``, ``find_if``, ``find_if_not``, ``adjacent_find``, ``search``, ``search_n``,
+   ``transform``, ``sort``, ``stable_sort``, ``is_sorted``, ``merge``, ``count``, ``count_if``, ``equal``, ``copy``,
+   ``copy_if``, ``min_element``, ``max_element``. These algorithms operate on C++20 random access ranges
+   and views and, like other oneDPL algorithms, take an execution policy.
+ - Added support for operators ``==``, ``!=``, ``<<`` and ``>>`` for RNG engines and distributions.
+ - Added experimental support for the Philox RNG engine in ``namespace oneapi::dpl::experimental``.
+ - Added the ``<oneapi/dpl/version>`` header containing oneDPL version macros and new feature testing macros.
+
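The new ``<oneapi/dpl/version>`` header can be probed at compile time. The following is a minimal sketch rather than an official usage pattern: the helper functions are hypothetical, and the macro names ``ONEDPL_VERSION_MAJOR``, ``ONEDPL_VERSION_MINOR``, ``ONEDPL_VERSION_PATCH`` and ``ONEDPL_HAS_RANGE_ALGORITHMS`` are assumptions that should be checked against the oneDPL macro documentation. The guarded include lets the code compile even where oneDPL is not installed.

```cpp
#include <string>

// Include the header only when it can be found, so this sketch also
// compiles on systems without oneDPL (__has_include requires C++17).
#if defined(__has_include)
#  if __has_include(<oneapi/dpl/version>)
#    include <oneapi/dpl/version>
#  endif
#endif

// Hypothetical helper, not part of oneDPL: formats the version macros
// (names assumed, see the text above) or reports that they are missing.
inline std::string onedpl_version_string() {
#if defined(ONEDPL_VERSION_MAJOR) && defined(ONEDPL_VERSION_MINOR) && defined(ONEDPL_VERSION_PATCH)
    return std::to_string(ONEDPL_VERSION_MAJOR) + "." +
           std::to_string(ONEDPL_VERSION_MINOR) + "." +
           std::to_string(ONEDPL_VERSION_PATCH);
#else
    return "unavailable";
#endif
}

// Hypothetical helper: true when the feature testing macro assumed to
// announce the parallel range algorithms is defined.
constexpr bool has_parallel_range_algorithms() {
#if defined(ONEDPL_HAS_RANGE_ALGORITHMS)
    return true;
#else
    return false;
#endif
}
```

Code that depends on the range algorithms could branch on ``has_parallel_range_algorithms()`` and fall back to the iterator-based algorithms otherwise.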
+ Fixed Issues
+ ------------
+ - Fixed unused variable and unused type warnings.
+ - Fixed memory leaks when using ``sort`` and ``stable_sort`` algorithms with the oneTBB backend.
+ - Fixed a build error for ``oneapi::dpl::begin`` and ``oneapi::dpl::end`` functions used with
+   the Microsoft* Visual C++ standard library and with C++20.
+ - Reordered template parameters of the ``histogram`` algorithm to match its function parameter order.
+   For affected ``histogram`` calls, we recommend removing the explicitly specified template parameters
+   and instead adding explicit type conversions of the function arguments as necessary.
+ - ``gpu::esimd::radix_sort`` and ``gpu::esimd::radix_sort_by_key`` kernel templates now throw ``std::bad_alloc``
+   if they fail to allocate global memory.
+ - Fixed a potential hang occurring with ``gpu::esimd::radix_sort`` and
+   ``gpu::esimd::radix_sort_by_key`` kernel templates.
+ - Fixed documentation for the ``sort_by_key`` algorithm, which was mistakenly described as stable although it
+   may be unstable for some execution policies. If stability is required, use ``stable_sort_by_key`` instead.
+ - Fixed an error when calling ``sort`` with device execution policies on CUDA devices.
+ - Allowed passing C++20 random access iterators to oneDPL algorithms.
+ - Fixed issues caused by initialization of SYCL queues in the predefined device execution policies.
+   These policies have been updated to be immutable (``const``) objects.
+
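The difference between ``sort_by_key`` and ``stable_sort_by_key`` comes down to what happens to elements with equal keys. The sketch below illustrates the stability guarantee with the C++ standard library only; ``stable_sort_by_key_sketch`` is a hypothetical stand-in, not the oneDPL implementation or its signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stable key-value sort, written with the
// standard library only; it is not the oneDPL implementation.
void stable_sort_by_key_sketch(std::vector<int>& keys, std::vector<std::string>& values) {
    assert(keys.size() == values.size());

    // Sort a permutation of indices by key. std::stable_sort keeps
    // indices with equal keys in their original relative order.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both sequences.
    std::vector<int> sorted_keys;
    std::vector<std::string> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

For keys ``{2, 1, 2, 1}`` and values ``{"a", "b", "c", "d"}``, a stable sort must produce the values ``{"b", "d", "a", "c"}``. An unstable sort is also allowed to swap ``"b"`` with ``"d"`` or ``"a"`` with ``"c"``, which is why code relying on the original order of equal keys should use the stable variant.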
+ Known Issues and Limitations
+ ----------------------------
+ New in This Release
+ ^^^^^^^^^^^^^^^^^^^
+ - ``histogram`` may provide incorrect results with device policies in a program built with the -O0 option.
+ - Inclusion of ``<oneapi/dpl/dynamic_selection>`` prior to ``<oneapi/dpl/random>`` may result in compilation errors.
+   Include ``<oneapi/dpl/random>`` first as a workaround.
+ - Incorrect results may occur when using ``oneapi::dpl::experimental::philox_engine`` with no predefined template
+   parameters and with ``word_size`` values other than 64 and 32.
+ - Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
+   with the -O0 option and executed on a GPU device: ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
+   ``transform_inclusive_scan``, ``copy_if``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``,
+   ``partition``, ``partition_copy``, ``stable_partition``, ``unique``, ``unique_copy``, and ``sort``.
+ - The value type of the input sequence should be convertible to the type of the initial element for the following
+   algorithms with device execution policies: ``transform_inclusive_scan``, ``transform_exclusive_scan``,
+   ``inclusive_scan``, and ``exclusive_scan``.
+ - The following algorithms with device execution policies may exceed the C++ standard requirements on the number
+   of applications of user-provided predicates or equality operators: ``copy_if``, ``remove``, ``remove_copy``,
+   ``remove_copy_if``, ``remove_if``, ``partition_copy``, ``unique``, and ``unique_copy``. In all cases,
+   the predicate or equality operator is applied ``O(n)`` times.
+ - The ``adjacent_find``, ``all_of``, ``any_of``, ``equal``, ``find``, ``find_if``, ``find_end``, ``find_first_of``,
+   ``find_if_not``, ``includes``, ``is_heap``, ``is_heap_until``, ``is_sorted``, ``is_sorted_until``, ``mismatch``,
+   ``none_of``, ``search``, and ``search_n`` algorithms may cause a segmentation fault when used with a device execution
+   policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.
+
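The note on predicate applications matters mainly for predicates with observable side effects. As a point of reference, the sketch below counts predicate calls for the serial ``std::copy_if``, which the C++ standard requires to apply the predicate exactly once per element. ``count_predicate_calls`` is a hypothetical host-only helper; it uses no oneDPL API and does not show how often a device policy actually applies the predicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: copies the even values from input to output and
// reports how many times the predicate ran. The serial std::copy_if
// applies the predicate exactly once per input element.
std::size_t count_predicate_calls(const std::vector<int>& input, std::vector<int>& output) {
    std::size_t calls = 0;
    std::copy_if(input.begin(), input.end(), std::back_inserter(output),
                 [&calls](int x) {
                     ++calls;  // side effect used only to count applications
                     return x % 2 == 0;
                 });
    return calls;
}
```

Because the listed algorithms may apply the predicate more often with device execution policies, predicates passed to them are safest when they are pure functions of their arguments, with no counters or other state that depends on the exact number of calls.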
+ Existing Issues
+ ^^^^^^^^^^^^^^^
+ See oneDPL Guide for other `restrictions and known limitations`_.
+
+ - ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
+   when used with an FPGA policy.
+ - Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
+ - For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
+   used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
+   it is required that the provided input and destination iterators are equality comparable.
+   Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
+   If these conditions are not met, the result of these algorithm calls is undefined.
+ - ``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``, ``partial_sort_copy`` algorithms
+   may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
+   and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
+   To avoid the issue, pass ``-fsycl-device-code-split=per_kernel`` option to the compiler.
+ - Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
+   ``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, ``reduce_by_segment``
+   with ``unseq`` or ``par_unseq`` policy when compiled by Intel® oneAPI DPC++/C++ Compiler
+   with ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, ``-qopenmp-simd`` options on Linux.
+   To avoid the issue, pass ``-fopenmp`` or ``-fopenmp-simd`` option instead.
+ - Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
+   with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
+   and executed on a GPU device. For a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION``
+   macro to ``1`` before including oneDPL header files.
+ - ``std::tuple`` and ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
+ - ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
+   in the Microsoft* Visual C++ standard library.
+ - The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with ``-fno-sycl-unnamed-lambda`` option.
+ - STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
+   the Microsoft* Visual C++ standard library.
+
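The in-place requirement for ``exclusive_scan`` and ``transform_exclusive_scan`` can be pictured with the serial C++17 ``std::exclusive_scan``. This is a sketch under that assumption, not oneDPL code: ``exclusive_scan_in_place`` is a hypothetical helper, and it only shows what "equality comparable and equal" input and destination iterators look like at the call site.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: in-place exclusive prefix sum with the serial
// C++17 std::exclusive_scan from <numeric>.
void exclusive_scan_in_place(std::vector<int>& data) {
    auto first = data.begin();
    auto result = data.begin();

    // Input and destination iterators have the same type, so they are
    // equality comparable, and they point to the same element.
    assert(first == result);

    std::exclusive_scan(first, data.end(), result, 0);
}
```

Passing the same iterator type and position for input and destination, as here, satisfies both conditions. Wrapping only one of the two in a different iterator adaptor can make the comparison ill-formed or false, which is the case the note describes as undefined for ``unseq`` and ``par_unseq``.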
New in 2022.6.0
===============
News