-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconsider the use of forwarding references throughout all SYCL backend kernel submitters #2041
Comments
By adding `_ExecutionPolicy` into the kernel name, we can work around the duplicate kernel name issue in reduce-then-scan based algorithms. However, a library wide solution is still needed for #2041 --------- Signed-off-by: Matthew Michel <[email protected]>
@rarutyun could you please comment this issue? Is it really an error? |
First of all this issue does not directly depend on Example with implicit name: int main()
{
sycl::queue q;
int n = 10;
int *ptr = sycl::malloc_shared<int>(n, q);
q.fill(ptr, 1, n).wait();
// implicit kernel name
oneapi::dpl::execution::device_policy policy{q};
auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);
auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n);
std::cout << res1 << " " << res2 << std::endl;
} Should the example above work? Absolutely (unless somebody convinces me otherwise). The users do not care about the name. They just call the API and it should be valid as many times as necessary and do not depend on any template instantiations. Example with explicit name: int main()
{
sycl::queue q;
int n = 10;
int *ptr = sycl::malloc_shared<int>(n, q);
q.fill(ptr, 1, n).wait();
// Explicit kernel name
oneapi::dpl::execution::device_policy<class kernel> policy{q};
auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);
auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n);
std::cout << res1 << " " << res2 << std::endl;
} Should this work? The short answer: I don't know. My slight preference that in the ideal world it should fail to compiler. The reason is the name is passed explicitly and technically the same name is used for two separate calls. If not template code it surely fails for the same name. Whether it's achievable in the implementation or not, I cannot say now. We need deeper investigation on that. Furthermore, this question is not unique to So, I suggest to have a technical discussion to understand what we want as a team. This problem is not unique for oneDPL. Technically, any template library that wraps SYCL and propagates kernel names has to deal with that somehow |
Describe the Bug:
Almost all of oneDPL's SYCL backend kernel submitters rely on forwarding references of the execution policy and any ranges used within the kernel. Here is an example from our reduce submitter:
With these forwarding references, different cv / ref qualifiers on the same execution policy will lead to separate function template instantiations by the compiler. Similarly, the same can be said for the ranges.
With these separate submitter function template instantiations, a new kernel is compiled per instantiation. However,
_ExecutionPolicy&& __exec
is only used for kernel submission, so logically no new kernel is needed. Similarly, for ranges, these will be passed in as lightweight views, and we may be able to just accept these by-value.If the user is compiling with unnamed lambda naming and the submitter is using our internal "kernel name provider", then I think there is no risk of a compilation error here. However, we may compile more kernels than what is logically necessary depending on how the user passes policies / ranges. This will lead to long JIT / AOT compile times.
If the user is trying to name kernels themselves or the underlying submitter is using our "kernel compiler" internal API for kernel bundles, then we may see compilation errors regarding duplicate kernel names.
To Reproduce:
The following results in compilation error using no unnamed lambdas (icpx 2025.0.0 and oneDPL 2022.7.0) despite the same underlying policy being used:
results in the following error:
Similarly for the above case with unnamed lambdas, we compile more kernels than is necessary. Here are the kernel names produced from a shader dump using "lazy compilation mode" (-fsycl-device-code-split=per_kernel) which shows that two reduction kernels are compiled when only one is needed:
Kernel 1:
or their demangled presentation:
Kernel 2:
or their demangled presentation:
The difference for these two Kernel names is in
oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>
type against theoneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&
type.The text was updated successfully, but these errors were encountered: