[SYCL][Doc] Add proposed range_type extension #15962

Open
wants to merge 3 commits into
base: sycl

Conversation

Pennycook
Contributor

This extension proposes a new kernel property that allows developers to declare the range requirements of individual kernels, providing more fine-grained control than existing compiler options and improved error behavior.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook added the spec extension All issues/PRs related to extensions specifications label Nov 1, 2024
@Pennycook Pennycook requested a review from a team as a code owner November 1, 2024 14:57
If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int`
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly
specified `range_type` property are compiled as-if `range_type<int>` was
specified as a property of that kernel or function.
Contributor

Can you decorate a non-kernel function with range_type? If so, you should add that to the specification above. Until you wrote this, I assumed this property could only be used to decorate a kernel.

Contributor Author

I think we might need to support it in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and try to call it from a kernel that supports 64-bit ranges.

Borrowing again from the default sub-group size stuff, we should probably add wording like this:

This property can also be associated with a device function using the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using SYCL_EXTERNAL. If the kernel calling these device functions is defined using a range_type property, the functions declared using SYCL_EXTERNAL must be similarly decorated to ensure that a compatible range_type is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the range_type property is missing in the translation unit that makes the call, or if the range_type of the called function is not compatible with the range_type of the calling function, the program is ill-formed and the compiler must raise a diagnostic. Two range_type properties are considered compatible if all values that can be represented by the range_type of the caller function can be represented by the range_type of the called function.

The last sentence is new, and the intent is to allow range_type<int> kernels to call range_type<size_t> functions, but not vice versa.
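
As a rough illustration of the compatibility rule sketched above, the check could be written as a compile-time comparison of the largest value each type can hold. This is only a sketch: `range_type_compatible` is a hypothetical helper, not part of the proposed extension, and it assumes ranges are never negative, so only the maximum of each type matters.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the proposed extension).
// Assumption: ranges are never negative, so a callee is compatible when its
// type can represent every non-negative value of the caller's type.
template <typename Caller, typename Callee>
constexpr bool range_type_compatible =
    static_cast<std::uintmax_t>(std::numeric_limits<Caller>::max()) <=
    static_cast<std::uintmax_t>(std::numeric_limits<Callee>::max());

// A range_type<int> caller may call a range_type<size_t> function...
static_assert(range_type_compatible<int, std::size_t>,
              "int caller -> size_t callee should be allowed");
// ...but not the other way around.
static_assert(!range_type_compatible<std::size_t, int>,
              "size_t caller -> int callee should be rejected");
// Identical types are always compatible.
static_assert(range_type_compatible<int, int>,
              "identical range types should be allowed");
```

Under this reading, requiring identical types would correspond to replacing the `<=` comparison with `==`.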

Contributor

I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same range_type?

Contributor Author

I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?

Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with range_type<size_t>; if doing so prevents user kernels from calling it, that's a big usability issue.

What happens today if we have a kernel in a translation unit compiled with -fsycl-id-queries-fit-in-int and it calls a function in a translation unit compiled with -fno-sycl-id-queries-fit-in-int? If that works, that would be another reason to try and mimic that behavior here.

Contributor

-fsycl-id-queries-fit-in-int does not produce any optional kernel features. All it does is define the __SYCL_ID_QUERIES_FIT_IN_INT__ macro, which is then used in the headers, where you can trace all of its uses.

The few key uses I'm aware of:

  • we have the detail/id_queries_fit_in_int header, which is used from handler.hpp and defines helper functions that throw exceptions if a user-defined range is too large
  • defines.hpp uses the macro to do #define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX). The latter is in turn used by files like nd_item.hpp, id.hpp and others to put that assumption into every ID query (get_global_id, operator[], conversion operators, etc.)
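
A minimal sketch of the two mechanisms described in the list above, for readers who have not traced the headers. All names here (`EXAMPLE_ID_QUERIES_FIT_IN_INT`, `EXAMPLE_ASSUME_INT`, `check_range_fits_in_int`, `example_get_global_id`) are hypothetical stand-ins rather than the actual DPC++ identifiers, and the sketch throws `std::runtime_error` where the real runtime would presumably use a SYCL exception.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the macro that the compiler flag would define.
#define EXAMPLE_ID_QUERIES_FIT_IN_INT 1

// Device-side idea: attach an "index fits in int" assumption to ID queries.
// __builtin_assume is a Clang builtin, so other compilers get a no-op here.
#if defined(EXAMPLE_ID_QUERIES_FIT_IN_INT) && defined(__clang__)
#define EXAMPLE_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)
#else
#define EXAMPLE_ASSUME_INT(x) ((void)0)
#endif

// Host-side idea: reject a range that would violate the assumption before
// the kernel is launched. Hypothetical helper, not the DPC++ implementation.
inline void check_range_fits_in_int(std::size_t extent) {
  if (extent > static_cast<std::size_t>(INT_MAX))
    throw std::runtime_error("range does not fit in int");
}

// Stand-in for an ID query such as get_global_id: the value is returned
// unchanged, but the optimizer may rely on the attached assumption.
inline std::size_t example_get_global_id(std::size_t raw_id) {
  EXAMPLE_ASSUME_INT(raw_id);
  return raw_id;
}
```

Because the assumption only affects optimization, mixing translation units built with and without the flag compiles and links; whether the result behaves as intended is a separate question.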

Therefore, I think it should be possible today to perform cross-translation-unit calls where the translation units are compiled with different values of the aforementioned flag. I'm not entirely sure how optimizations that rely on that assumption would behave, because many factors contribute to that (such as which other optimizations have been performed on those translation units and on the final linked device code, and how and when they were performed).

@AlexeySachkov, do you think the behavior I've sketched above is implementable?

If the range_type property is missing in the translation unit that makes the call

Does it mean that we should emit an error if the forward declaration of foo in a.cpp differs (range_type property-wise) from its actual definition in b.cpp? If so, I'm not entirely sure we can catch this.

Simple mismatches, such as foo calling an incompatible bar, should be detectable, because that sounds very similar to what we already do for named sub-group sizes.
