[SYCL][Doc] Add proposed range_type extension #15962
base: sycl
This extension proposes a new kernel property that allows developers to declare the range requirements of individual kernels, providing more fine-grained control than existing compiler options and improved error behavior. Signed-off-by: John Pennycook <[email protected]>
sycl/doc/extensions/proposed/sycl_ext_oneapi_range_type.asciidoc
If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int`
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly
specified `range_type` property are compiled as-if `range_type<int>` was
specified as a property of that kernel or function.
Can you decorate a non-kernel function with `range_type`? If so, you should add that to the specification above. Until I read this, I assumed this option could only be used to decorate a kernel.
I think we might need to support it in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and then try to call it from a kernel that supports 64-bit ranges.
Borrowing again from the default sub-group size stuff, we should probably add wording like this:
This property can also be associated with a device function using the `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro.

There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using `SYCL_EXTERNAL`. If the kernel calling these device functions is defined using a `range_type` property, the functions declared using `SYCL_EXTERNAL` must be similarly decorated to ensure that a compatible `range_type` is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the `range_type` property is missing in the translation unit that makes the call, or if the `range_type` of the called function is not compatible with the `range_type` of the calling function, the program is ill-formed and the compiler must raise a diagnostic. Two `range_type` properties are considered compatible if all values that can be represented by the `range_type` of the caller function can be represented by the `range_type` of the called function.
The last sentence is new, and the intent is to allow `range_type<int>` kernels to call `range_type<size_t>` functions, but not vice versa.
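To make the proposed compatibility rule concrete, here is a minimal sketch in plain standard C++ (no SYCL headers). The helper `range_type_compatible` is hypothetical and not part of the extension. It also assumes that only non-negative range values matter, so it compares upper bounds only; a literal reading of "all values" would otherwise reject `int` to `size_t` because of negative values.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical helper, not part of the proposed extension.
// Returns true when every non-negative value representable by the caller's
// range type is also representable by the callee's range type.
template <typename Caller, typename Callee>
constexpr bool range_type_compatible() {
  static_assert(std::is_integral<Caller>::value &&
                    std::is_integral<Callee>::value,
                "this sketch only handles integral range types");
  // The maximum of any integral type is non-negative, so widening it to
  // unsigned long long preserves its value.
  using wide = unsigned long long;
  return static_cast<wide>(std::numeric_limits<Caller>::max()) <=
         static_cast<wide>(std::numeric_limits<Callee>::max());
}

// A caller limited to int may call a size_t function, but not vice versa.
static_assert(range_type_compatible<int, std::size_t>(),
              "int caller -> size_t callee should be allowed");
static_assert(!range_type_compatible<std::size_t, int>(),
              "size_t caller -> int callee should be rejected");
```

Under these assumptions the check is a `<=` on the upper bounds rather than an equality test, which matches the "more like a `>=` than a `==`" framing in this thread.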
I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same `range_type`?
I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?
Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with `range_type<size_t>`; if doing so prevents user kernels from calling it, that's a big usability issue.
What happens today if we have a kernel in a translation unit compiled with `-fsycl-id-queries-fit-in-int` and it calls a function in a translation unit compiled with `-fno-sycl-id-queries-fit-in-int`? If that works, that would be another reason to try and mimic that behavior here.
`-fsycl-id-queries-fit-in-int` does not produce any optional kernel features. All it does is define the `__SYCL_ID_QUERIES_FIT_IN_INT__` macro, which is then used in the headers, where you can trace all of its uses.
The few key uses I'm aware of:
- We have the `detail/id_queries_fit_in_int` header, which is used from `handler.hpp` and defines helper functions that emit exceptions if a user-defined range is too large.
- `defines.hpp` uses the macro to do `#define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)`. The latter is in turn used by files like `nd_item.hpp`, `id.hpp` and others to put that assumption into every ID query (`get_global_id`, `operator[]`, conversion operators, etc.).
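For readers unfamiliar with the two mechanisms listed above, the following is a rough, standalone sketch of how they fit together. It is not the DPC++ implementation: `SKETCH_ASSUME_INT`, `check_range_fits_in_int`, and `sketch_get_global_id` are hypothetical names, and `std::runtime_error` stands in for whatever exception the real helpers emit. `__builtin_assume` is a Clang builtin, so the sketch falls back to a no-op on other compilers.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the header macro quoted above.
// Clang provides __builtin_assume; other compilers get a no-op here.
#if defined(__clang__)
#define SKETCH_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)
#else
#define SKETCH_ASSUME_INT(x) ((void)0)
#endif

// Host-side check (hypothetical): reject a user-defined range extent that
// cannot be represented as an int before any kernel is launched.
inline void check_range_fits_in_int(std::size_t extent) {
  if (extent > static_cast<std::size_t>(INT_MAX))
    throw std::runtime_error("range extent exceeds INT_MAX");
}

// Device-side query (hypothetical): tells the optimizer that the ID fits in
// an int, then returns it unchanged. Passing a larger value would violate
// the assumption and is undefined behavior under Clang.
inline std::size_t sketch_get_global_id(std::size_t raw_id) {
  SKETCH_ASSUME_INT(raw_id);
  return raw_id;
}
```

The host-side check is what makes the device-side assumption safe: if the range was validated before launch, no ID query can observe a value above `INT_MAX`.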
Therefore, I think it should be possible today to perform cross-translation-unit calls where the translation units are compiled with different values of the aforementioned flag. I'm not entirely sure how optimizations that rely on that assumption would behave, because many factors contribute to that (such as which other optimizations have been performed, how and when, on those translation units and on the final linked device code).
@AlexeySachkov, do you think the behavior I've sketched above is implementable?
If the range_type property is missing in the translation unit that makes the call
Does it mean that we should emit an error if the forward declaration of `foo` in `a.cpp` differs (`range_type` property-wise) from its actual definition in `b.cpp`? If so, I'm not entirely sure if we can catch this.
Simple mismatches like `foo` calls incompatible `bar` should be detectable, because it sounds very similar to what we already do for named sub-group sizes.