Reconsider the use of forwarding references throughout all SYCL backend kernel submitters #2041

mmichel11 · 2025-02-03T18:43:05Z (edited by SergeyKopienko)

Describe the Bug:
Almost all of oneDPL's SYCL backend kernel submitters take the execution policy, and any ranges used within the kernel, by forwarding reference. Here is an example from our reduce submitter:

```cpp
// _Tp (the value type) is a template parameter of the enclosing submitter class.
template <typename _ExecutionPolicy, typename _Size, typename _ReduceOp, typename _TransformOp, typename... _Ranges>
auto
operator()(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy&& __exec, const _Size __n,
           const _Size __work_group_size, const _Size __iters_per_work_item, _ReduceOp __reduce_op,
           _TransformOp __transform_op,
           const __result_and_scratch_storage<_ExecutionPolicy, _Tp>& __scratch_container,
           _Ranges&&... __rngs) ...
```

With forwarding references, passing the same execution policy type with different cv/ref qualifiers (for example as an lvalue, a const lvalue, or an rvalue) makes the compiler generate separate function template instantiations. The same is true for the ranges.

Because the kernel is defined inside the submitter, a new kernel is compiled for each submitter instantiation. However, `_ExecutionPolicy&& __exec` is only used for kernel submission, so logically no new kernel is needed. Ranges are passed in as lightweight views, so we may be able to simply accept them by value.

If the user compiles with unnamed lambdas enabled and the submitter uses our internal "kernel name provider", then I think there is no risk of a compilation error here. However, depending on how the user passes policies and ranges, we may compile more kernels than are logically necessary, which leads to longer JIT / AOT compile times.

If the user is trying to name kernels themselves or the underlying submitter is using our "kernel compiler" internal API for kernel bundles, then we may see compilation errors regarding duplicate kernel names.

To Reproduce:
The following fails to compile with unnamed lambdas disabled (icpx 2025.0.0 and oneDPL 2022.7.0), even though the same underlying policy is used for both calls:

```cpp
// icpx -fsycl -fno-sycl-unnamed-lambda reduce.cpp
// We will see a compilation error.
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include <iostream>
#include <utility>

int main()
{
    sycl::queue q;
    int n = 10;
    int *ptr = sycl::malloc_shared<int>(n, q);
    q.fill(ptr, 1, n).wait();

    oneapi::dpl::execution::device_policy<class kernel> policy{q};

    auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);            // policy passed as an lvalue
    auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n); // same policy passed as an rvalue

    std::cout << res1 << " " << res2 << std::endl;
    sycl::free(ptr, q);
}
```

results in the following error:

```
In file included from reduce.cpp:1:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/execution:67:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/algorithm_impl.h:26:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/execution_impl.h:22:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/parallel_backend.h:32:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h:38:
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:149:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero21__reduce_small_kernelIJZ4mainE6kernelEEE' as another definition
  149 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:149:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:214:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero26__reduce_mid_device_kernelIJZ4mainE6kernelEEE' as another definition
  214 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:214:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:262:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero30__reduce_mid_work_group_kernelIJZ4mainE6kernelEEE' as another definition
  262 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:262:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:377:21: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero15__reduce_kernelIJZ4mainE6kernelEEE' as another definition
  377 |                     [=](sycl::nd_item<1> __item_id) {
      |                     ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:377:21: note: previous definition is here
4 errors generated.
```

Similarly, when the above case is compiled with unnamed lambdas enabled, we compile more kernels than necessary. Here are the kernel names from a shader dump in "lazy compilation mode" (-fsycl-device-code-split=per_kernel), showing that two reduction kernels are compiled when only one is needed:

Kernel 1:

.kernel "_ZTSZZNK6oneapi3dpl20__par_backend_hetero43__parallel_transform_reduce_small_submitterIiSt17integral_constantIbLb1EELh4ENS1_10__internal22__optional_kernel_nameIJEEEEclINS0_9execution5__dpl13device_policyINSB_17DefaultKernelNameEEEtSt4plusIiENS0_13unseq_backend6walk_nISE_NS0_10__internal7__no_opEEENSH_12__init_valueIiEEJNS0_8__ranges10guard_viewIPiEEEEEDaNSJ_20__device_backend_tagEOT_T0_SV_SV_T1_T2_T3_DpOT4_ENKUlRN4sycl3_V17handlerEE_clES15_EUlNS13_7nd_itemILi1EEEE_"

Demangled form:

.kernel "typeinfo name for oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce_small_submitter<int, std::integral_constant<bool, true>, (unsigned char)4, oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, unsigned short, std::plus<int>, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, oneapi::dpl::__internal::__no_op>, oneapi::dpl::unseq_backend::__init_value<int>, oneapi::dpl::__ranges::guard_view<int*> >(oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&&, unsigned short, unsigned short, unsigned short, std::plus<int>, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, oneapi::dpl::__internal::__no_op>, oneapi::dpl::unseq_backend::__init_value<int>, oneapi::dpl::__ranges::guard_view<int*>&&) const::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::nd_item<1>)#1}"

Kernel 2:

.kernel "_ZTSZZNK6oneapi3dpl20__par_backend_hetero43__parallel_transform_reduce_small_submitterIiSt17integral_constantIbLb1EELh4ENS1_10__internal22__optional_kernel_nameIJEEEEclIRNS0_9execution5__dpl13device_policyINSB_17DefaultKernelNameEEEtSt4plusIiENS0_13unseq_backend6walk_nISF_NS0_10__internal7__no_opEEENSI_12__init_valueIiEEJNS0_8__ranges10guard_viewIPiEEEEEDaNSK_20__device_backend_tagEOT_T0_SW_SW_T1_T2_T3_DpOT4_ENKUlRN4sycl3_V17handlerEE_clES16_EUlNS14_7nd_itemILi1EEEE_"

Demangled form:

.kernel "typeinfo name for oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce_small_submitter<int, std::integral_constant<bool, true>, (unsigned char)4, oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, unsigned short, std::plus<int>, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__internal::__no_op>, oneapi::dpl::unseq_backend::__init_value<int>, oneapi::dpl::__ranges::guard_view<int*> >(oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, unsigned short, unsigned short, unsigned short, std::plus<int>, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__internal::__no_op>, oneapi::dpl::unseq_backend::__init_value<int>, oneapi::dpl::__ranges::guard_view<int*>&&) const::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::nd_item<1>)#1}"

The only difference between these two kernel names is the policy type: oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName> in Kernel 1 versus oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>& in Kernel 2.


By adding `_ExecutionPolicy` into the kernel name, we can work around the duplicate kernel name issue in reduce-then-scan based algorithms. However, a library-wide solution is still needed for #2041.

Signed-off-by: Matthew Michel

Insert example from #2041

SergeyKopienko · 2025-03-06T20:02:15Z

@rarutyun could you please comment on this issue? Is it really an error?

rarutyun · 2025-03-10T06:24:25Z (edited)

First of all, this issue does not directly depend on -fno-sycl-unnamed-lambda. Quite the opposite: it's better to consider it with unnamed lambdas enabled. Then, to me, the relevant behavioral comparison is explicit kernel name vs. implicit kernel name.

Example with implicit name:

```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include <iostream>
#include <utility>

int main()
{
    sycl::queue q;
    int n = 10;
    int *ptr = sycl::malloc_shared<int>(n, q);
    q.fill(ptr, 1, n).wait();

    // implicit kernel name
    oneapi::dpl::execution::device_policy policy{q};

    auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);
    auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n);

    std::cout << res1 << " " << res2 << std::endl;
    sycl::free(ptr, q);
}
```

Should the example above work? Absolutely (unless somebody convinces me otherwise). Users do not care about the name. They just call the API, and the call should be valid as many times as necessary, regardless of which template instantiations it produces.

Example with explicit name:

```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include <iostream>
#include <utility>

int main()
{
    sycl::queue q;
    int n = 10;
    int *ptr = sycl::malloc_shared<int>(n, q);
    q.fill(ptr, 1, n).wait();

    // Explicit kernel name
    oneapi::dpl::execution::device_policy<class kernel> policy{q};

    auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);
    auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n);

    std::cout << res1 << " " << res2 << std::endl;
    sycl::free(ptr, q);
}
```

Should this work? The short answer: I don't know. My slight preference is that, in an ideal world, it should fail to compile. The reason is that the name is passed explicitly, and technically the same name is used for two separate calls. In non-template code, using the same name for two kernels would certainly fail. Whether this is achievable in the implementation, I cannot say now; we need a deeper investigation.

Furthermore, this question is not unique to ExecutionPolicy. We now have parallel range algorithms that also take ranges by forwarding reference, which implies potentially different value categories across API calls. Binary transform (among others) is an interesting case because it has three Range&& parameters.

So, I suggest we have a technical discussion to understand what we want as a team. This problem is not unique to oneDPL: technically, any template library that wraps SYCL and propagates kernel names has to deal with it somehow.

mmichel11 added the bug label Feb 3, 2025

mmichel11 mentioned this issue Feb 3, 2025

Fix duplicate kernel naming in reduce-then-scan kernels #2040

Merged

mmichel11 linked a pull request Mar 4, 2025 that will close this issue

Avoid specializations of the same submitters with the some policy type but with different type qualifiers (l-value, r-value) #2093

Open

SergeyKopienko self-assigned this Mar 5, 2025

SergeyKopienko added a commit that referenced this issue Mar 5, 2025

test/general/lambda_naming.pass.cpp - expand test coverage

abd5ce5

Insert example from #2041

SergeyKopienko mentioned this issue Mar 5, 2025

Replace _ExecutionPolicy&& parameter types by const _ExecutionPolicy& in all internal oneDPL namespaces #2101

Open

SergeyKopienko added commits that referenced this issue Mar 5, 2025 (test/general/lambda_naming.pass.cpp - expand test coverage; Insert example from #2041): 4e468fe, 55e9cbc, dc7b6fb

SergeyKopienko mentioned this issue Mar 5, 2025

Extend test coverage for different policy qualifiers #2102

Open

SergeyKopienko linked a pull request Mar 6, 2025 that will close this issue

Avoid specializations of the same submitters with the some policy type but with different type qualifiers (l-value, r-value) #2093

Open

SergeyKopienko added commits that referenced this issue Mar 6, 2025 (test/general/lambda_naming.pass.cpp - expand test coverage; Insert example from #2041): d75f5fd, 7b35f9e, 1b13d6f

rarutyun mentioned this issue Mar 10, 2025

Some __kernel_name_generator usages doesn't depends from _ExecutionPolicy type #2083

Open
