Commit 02e472a

Release notes for 2022.7.0 (#1862)
---------

Co-authored-by: Dmitriy Sobolev <[email protected]>
Co-authored-by: Adam Fidel <[email protected]>
Co-authored-by: Matthew Michel <[email protected]>
Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Ruslan Arutyunyan <[email protected]>
1 parent d52a56d commit 02e472a

1 file changed, +101 -0 lines changed

documentation/release_notes.rst

@@ -8,6 +8,107 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

New in 2022.7.0
===============

New Features
------------
- Improved performance of the ``adjacent_find``, ``all_of``, ``any_of``, ``copy_if``, ``exclusive_scan``, ``equal``,
  ``find``, ``find_if``, ``find_end``, ``find_first_of``, ``find_if_not``, ``inclusive_scan``, ``includes``,
  ``is_heap``, ``is_heap_until``, ``is_partitioned``, ``is_sorted``, ``is_sorted_until``, ``lexicographical_compare``,
  ``max_element``, ``min_element``, ``minmax_element``, ``mismatch``, ``none_of``, ``partition``, ``partition_copy``,
  ``reduce``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``, ``search``, ``search_n``,
  ``stable_partition``, ``transform_exclusive_scan``, ``transform_inclusive_scan``, ``unique``, and ``unique_copy``
  algorithms with device policies.
- Improved performance of ``sort``, ``stable_sort`` and ``sort_by_key`` algorithms with device policies when using Merge
  sort [#fnote1]_.
- Added ``stable_sort_by_key`` algorithm in ``namespace oneapi::dpl``.
- Added parallel range algorithms in ``namespace oneapi::dpl::ranges``: ``all_of``, ``any_of``,
  ``none_of``, ``for_each``, ``find``, ``find_if``, ``find_if_not``, ``adjacent_find``, ``search``, ``search_n``,
  ``transform``, ``sort``, ``stable_sort``, ``is_sorted``, ``merge``, ``count``, ``count_if``, ``equal``, ``copy``,
  ``copy_if``, ``min_element``, ``max_element``. These algorithms operate with C++20 random access ranges
  and views while also taking an execution policy similarly to other oneDPL algorithms.
- Added support for operators ==, !=, << and >> for RNG engines and distributions.
- Added experimental support for the Philox RNG engine in ``namespace oneapi::dpl::experimental``.
- Added the ``<oneapi/dpl/version>`` header containing oneDPL version macros and new feature testing macros.
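As an illustration of the new version header, a guarded include lets the same source build against releases that predate it. This is a minimal sketch, not an excerpt from oneDPL: it assumes the header defines the ``ONEDPL_VERSION_MAJOR``, ``ONEDPL_VERSION_MINOR``, and ``ONEDPL_VERSION_PATCH`` macros, and the helper names and the numeric encoding are hypothetical.

```cpp
// Include the version header only when the toolchain can find it (C++17 __has_include),
// so this file also compiles where oneDPL is older or not installed.
#if __has_include(<oneapi/dpl/version>)
#  include <oneapi/dpl/version>
#endif

// Hypothetical helper: true when the assumed oneDPL version macros are visible.
constexpr bool onedpl_version_known()
{
#if defined(ONEDPL_VERSION_MAJOR) && defined(ONEDPL_VERSION_MINOR) && defined(ONEDPL_VERSION_PATCH)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: packs the version as major * 10000 + minor * 100 + patch
// (an encoding chosen for this sketch only), or returns 0 when it is unknown.
constexpr long onedpl_version_or_zero()
{
#if defined(ONEDPL_VERSION_MAJOR) && defined(ONEDPL_VERSION_MINOR) && defined(ONEDPL_VERSION_PATCH)
    return ONEDPL_VERSION_MAJOR * 10000L + ONEDPL_VERSION_MINOR * 100L + ONEDPL_VERSION_PATCH;
#else
    return 0L;
#endif
}
```

Under this sketch's encoding, code that depends on features from this release could be gated on ``onedpl_version_or_zero() >= 20220700L``.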
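The comparison and stream operators for RNG engines can be pictured with the standard ``<random>`` engines, which define the same four operators. The sketch below uses ``std::minstd_rand`` purely as a stand-in. That the oneDPL engines follow the same semantics is an assumption, and ``roundtrip_preserves_sequence`` is a hypothetical name.

```cpp
#include <cassert>
#include <random>
#include <sstream>

// Saves an engine's state as text, restores it into a second engine, and reports
// whether both engines then compare equal and produce the same next value.
inline bool roundtrip_preserves_sequence(unsigned seed)
{
    std::minstd_rand original(seed);
    original.discard(10);          // advance the engine so there is non-trivial state to save

    std::stringstream state;
    state << original;             // operator<< writes the engine state

    std::minstd_rand restored;     // starts from the default seed
    state >> restored;             // operator>> reads the saved state back
    assert(!state.fail());         // the extraction itself must have succeeded

    const bool same_state = (restored == original);   // operator==
    const bool same_next = (restored() == original());
    return same_state && same_next;
}
```

Saving state this way is typically used for checkpointing a simulation so that a later run can resume the same random sequence.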

Fixed Issues
------------
- Fixed unused variable and unused type warnings.
- Fixed memory leaks when using ``sort`` and ``stable_sort`` algorithms with the oneTBB backend.
- Fixed a build error for ``oneapi::dpl::begin`` and ``oneapi::dpl::end`` functions used with
  the Microsoft* Visual C++ standard library and with C++20.
- Reordered template parameters of the ``histogram`` algorithm to match its function parameter order.
  For affected ``histogram`` calls we recommend removing the explicit specification of template parameters
  and instead adding explicit type conversions of the function arguments as necessary.
- ``gpu::esimd::radix_sort`` and ``gpu::esimd::radix_sort_by_key`` kernel templates now throw ``std::bad_alloc``
  if they fail to allocate global memory.
- Fixed a potential hang occurring with ``gpu::esimd::radix_sort`` and
  ``gpu::esimd::radix_sort_by_key`` kernel templates.
- Fixed documentation for the ``sort_by_key`` algorithm, which used to be mistakenly described as stable, despite being
  possibly unstable for some execution policies. If stability is required, use ``stable_sort_by_key`` instead.
- Fixed an error when calling ``sort`` with device execution policies on CUDA devices.
- Enabled passing C++20 random access iterators to oneDPL algorithms.
- Fixed issues caused by initialization of SYCL queues in the predefined device execution policies.
  These policies have been updated to be immutable (``const``) objects.
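The ``histogram`` recommendation can be sketched with a stand-in function template. ``even_histogram`` below is hypothetical and is not the oneDPL signature. It only shows the calling style: every template parameter is deduced, and an argument is converted explicitly where two parameters must deduce to the same type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in: counts values falling into num_bins equal-width bins over [lo, hi).
// Both bounds share one template parameter T, so they must deduce to the same type.
template <typename InputIt, typename Size, typename T, typename OutputIt>
OutputIt even_histogram(InputIt first, InputIt last, Size num_bins, T lo, T hi, OutputIt out)
{
    using diff_t = typename std::iterator_traits<OutputIt>::difference_type;
    assert(lo < hi);
    std::fill_n(out, num_bins, 0);
    for (; first != last; ++first) {
        const T x = static_cast<T>(*first);
        if (x < lo || !(x < hi))
            continue;  // values outside [lo, hi) are not counted
        const auto bin = static_cast<diff_t>((x - lo) * static_cast<T>(num_bins) / (hi - lo));
        ++out[bin];
    }
    return out + static_cast<diff_t>(num_bins);
}

inline std::vector<int> count_in_four_bins(const std::vector<int>& data, int upper)
{
    std::vector<int> counts(4);
    // No explicit template arguments, so the call does not depend on their order.
    // The int bound is converted so that both bounds deduce T = float.
    even_histogram(data.begin(), data.end(), std::size_t{4}, 0.0f, static_cast<float>(upper),
                   counts.begin());
    return counts;
}
```

Passing ``upper`` without the cast would fail template deduction in this sketch, which is the situation the explicit conversion is meant to resolve.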
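The difference between ``sort_by_key`` and ``stable_sort_by_key`` is easiest to see on keys that repeat. The following is a serial reference model written with the standard library only. It is not the oneDPL implementation, and ``stable_sort_by_key_model`` is a hypothetical name used to show the ordering a stable key-value sort is expected to produce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Reference model: reorders keys and values together so that keys are ascending
// and values with equal keys keep their original relative order.
template <typename Key, typename Value>
void stable_sort_by_key_model(std::vector<Key>& keys, std::vector<Value>& values)
{
    assert(keys.size() == values.size());

    // Sort a permutation of indices by key; std::stable_sort keeps ties in index order.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both sequences.
    std::vector<Key> sorted_keys;
    std::vector<Value> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

For keys ``{2, 1, 2, 1}`` and values ``{'a', 'b', 'c', 'd'}``, the model yields values ``{'b', 'd', 'a', 'c'}``. An unstable sort would be free to swap ``'b'`` with ``'d'`` or ``'a'`` with ``'c'``.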

Known Issues and Limitations
----------------------------
New in This Release
^^^^^^^^^^^^^^^^^^^
- ``histogram`` may provide incorrect results with device policies in a program built with the ``-O0`` option.
- Inclusion of ``<oneapi/dpl/dynamic_selection>`` prior to ``<oneapi/dpl/random>`` may result in compilation errors.
  Include ``<oneapi/dpl/random>`` first as a workaround.
- Incorrect results may occur when using ``oneapi::dpl::experimental::philox_engine`` with no predefined template
  parameters and with ``word_size`` values other than 64 and 32.
- Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
  with the ``-O0`` option and executed on a GPU device: ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
  ``transform_inclusive_scan``, ``copy_if``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``,
  ``partition``, ``partition_copy``, ``stable_partition``, ``unique``, ``unique_copy``, and ``sort``.
- The value type of the input sequence should be convertible to the type of the initial element for the following
  algorithms with device execution policies: ``transform_inclusive_scan``, ``transform_exclusive_scan``,
  ``inclusive_scan``, and ``exclusive_scan``.
- The following algorithms with device execution policies may exceed the C++ standard requirements on the number
  of applications of user-provided predicates or equality operators: ``copy_if``, ``remove``, ``remove_copy``,
  ``remove_copy_if``, ``remove_if``, ``partition_copy``, ``unique``, and ``unique_copy``. In all cases,
  the predicate or equality operator is applied ``O(n)`` times.
- The ``adjacent_find``, ``all_of``, ``any_of``, ``equal``, ``find``, ``find_if``, ``find_end``, ``find_first_of``,
  ``find_if_not``, ``includes``, ``is_heap``, ``is_heap_until``, ``is_sorted``, ``is_sorted_until``, ``mismatch``,
  ``none_of``, ``search``, and ``search_n`` algorithms may cause a segmentation fault when used with a device execution
  policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and the ``-O0 -g`` compiler options.
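The scan limitation on the initial element type can be checked at compile time before a call reaches a device policy. The wrapper below is a hypothetical helper, not part of oneDPL, and it uses the serial ``std::exclusive_scan`` as a stand-in for the device-policy call so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical helper: rejects at compile time any call whose input value type
// cannot be converted to the type of the initial element.
template <typename InputIt, typename OutputIt, typename T>
OutputIt checked_exclusive_scan(InputIt first, InputIt last, OutputIt result, T init)
{
    using value_type = typename std::iterator_traits<InputIt>::value_type;
    static_assert(std::is_convertible_v<value_type, T>,
                  "input value type must be convertible to the initial element type");
    // Serial standard algorithm, used here in place of the device-policy overload.
    return std::exclusive_scan(first, last, result, init);
}

// int inputs with a double initial element: int converts to double, so this compiles.
inline std::vector<double> scan_ints_into_doubles(const std::vector<int>& input)
{
    std::vector<double> output(input.size());
    const auto end = checked_exclusive_scan(input.begin(), input.end(), output.begin(), 0.5);
    assert(end == output.end());  // the whole output range was written
    return output;
}
```

A call that scans ``std::string`` elements with an ``int`` initial element would be rejected by the ``static_assert`` instead of reaching the algorithm.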

Existing Issues
^^^^^^^^^^^^^^^
See the oneDPL Guide for other `restrictions and known limitations`_.

- The ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
  when used with an FPGA policy.
- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
  used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
  it is required that the provided input and destination iterators are equality comparable.
  Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
  If these conditions are not met, the result of these algorithm calls is undefined.
- The ``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``, and ``partial_sort_copy`` algorithms
  may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
  and built on Linux with Intel® oneAPI DPC++/C++ Compiler and the ``-O0 -g`` compiler options.
  To avoid the issue, pass the ``-fsycl-device-code-split=per_kernel`` option to the compiler.
- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
  ``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, and ``reduce_by_segment``
  with the ``unseq`` or ``par_unseq`` policy when compiled by Intel® oneAPI DPC++/C++ Compiler
  with the ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, or ``-qopenmp-simd`` options on Linux.
  To avoid the issue, pass the ``-fopenmp`` or ``-fopenmp-simd`` option instead.
- Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
  with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
  and executed on a GPU device. For a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION``
  macro to ``1`` before including oneDPL header files.
- ``std::tuple`` and ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
- ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
  in the Microsoft* Visual C++ standard library.
- The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with the ``-fno-sycl-unnamed-lambda`` option.
- STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
  the Microsoft* Visual C++ standard library.
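The 64-bit reduction workaround depends on ordering: the macro has to be visible before the first oneDPL header is processed. A minimal sketch of that ordering follows. The ``__has_include`` guard and the choice of headers are assumptions made only so the fragment also compiles where oneDPL is not installed.

```cpp
// Define the workaround macro first, so that every oneDPL header included
// afterwards sees it. Defining it after the includes would have no effect on them.
#define ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION 1

// Guarded includes: an assumption of this sketch, not a requirement of the workaround.
#if __has_include(<oneapi/dpl/execution>)
#  include <oneapi/dpl/execution>
#  include <oneapi/dpl/numeric>
#endif

// Compile-time reminder that the macro must be set to 1, as the note requires.
static_assert(ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION == 1,
              "define the workaround macro to 1 before including oneDPL headers");
```

Passing ``-DONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION=1`` on the compiler command line is another way to guarantee the definition precedes every include.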

New in 2022.6.0
===============
News
