
Enable architecture selection for DPCTL_TARGET_CUDA #2096


Open: wants to merge 10 commits into master

vlad-perevezentsev
Collaborator

This PR proposes changing the DPCTL_TARGET_CUDA CMake option from a boolean to a string, allowing users to specify a CUDA architecture (e.g. sm_80). If no architecture is specified, it defaults to sm_50.

$ python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_CUDA=<cuda_arch>"
# or
$ python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_CUDA=ON"

The specified architecture is used to construct a SYCL alias target (e.g. nvidia_gpu_sm_80), which is passed via the -fsycl-targets option, following oneAPI for NVIDIA GPUs.

Additionally, this PR removes the logic that handled DPCTL_TARGET_CUDA as an environment variable.
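The alias construction described above can be modeled with a short Python sketch. This is not the PR's CMake code: the function names, the simplified set of CMake boolean constants, the `sm_<digits>[letter]` validation pattern, and keeping `spir64-unknown-unknown` alongside the CUDA alias (as the older `nvptx64-nvidia-cuda,spir64-unknown-unknown` value in the review thread does) are all assumptions for illustration.

```python
import re

# Simplified CMake boolean constants (CMake compares them case-insensitively).
# CMake also treats non-zero numbers as true; this sketch only handles "1".
_CMAKE_TRUE = {"1", "ON", "YES", "TRUE", "Y"}
_CMAKE_FALSE = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}

# Default used when the option is only switched on, per the PR description.
DEFAULT_CUDA_ARCH = "sm_50"

# Hypothetical validation pattern: sm_50, sm_80, sm_90a, ...
_ARCH_RE = re.compile(r"sm_[0-9]+[a-z]?")


def cuda_sycl_alias(value):
    """Map a DPCTL_TARGET_CUDA value to a SYCL alias target, or None if disabled."""
    v = value.strip()
    upper = v.upper()
    if upper in _CMAKE_FALSE or upper.endswith("-NOTFOUND"):
        return None  # option is off: no CUDA target
    # A plain boolean "on" selects the default architecture.
    arch = DEFAULT_CUDA_ARCH if upper in _CMAKE_TRUE else v
    if not _ARCH_RE.fullmatch(arch):
        raise ValueError(f"unsupported CUDA architecture: {value!r}")
    return f"nvidia_gpu_{arch}"


def fsycl_targets(value):
    """Build an -fsycl-targets flag; keeping the SPIR-V target is an assumption."""
    targets = ["spir64-unknown-unknown"]
    alias = cuda_sycl_alias(value)
    if alias is not None:
        targets.insert(0, alias)  # CUDA first, as in the older nvptx64 value
    return "-fsycl-targets=" + ",".join(targets)
```

Under these assumptions, `fsycl_targets("ON")` yields `-fsycl-targets=nvidia_gpu_sm_50,spir64-unknown-unknown`, while an empty or `OFF` value leaves only the SPIR-V target.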

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?


github-actions bot commented Jun 5, 2025


Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_8 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

Collaborator

coveralls commented Jun 5, 2025

Coverage Status

coverage: 84.972% (-0.01%) from 84.984% when pulling feee948 on update_cuda_build into 556a5c6 on master.


github-actions bot commented Jun 5, 2025

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_9 ran successfully.
Passed: 1114
Failed: 7
Skipped: 119


github-actions bot commented Jun 5, 2025

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_10 ran successfully.
Passed: 1114
Failed: 7
Skipped: 119

else()
if (DEFINED ENV{DPCTL_TARGET_CUDA})
set(_dpctl_sycl_targets "nvptx64-nvidia-cuda,spir64-unknown-unknown")
if (NOT "x${DPCTL_TARGET_CUDA}" STREQUAL "x")
Collaborator

Is it correct to validate DPCTL_TARGET_CUDA only when DPCTL_SYCL_TARGETS is empty?

Collaborator

It was how we were doing it before, but it looks like the current logical flow adds HIP targets even when DPCTL_SYCL_TARGETS is set, while it does not add CUDA targets.

So that should probably be changed: either make DPCTL_SYCL_TARGETS mutually exclusive with both, or check DPCTL_TARGET_CUDA as well.
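To make the two options concrete, here is a hypothetical Python model of target resolution. `exclusive=True` models making DPCTL_SYCL_TARGETS exclusive of both vendor options; `exclusive=False` models honoring the CUDA and HIP settings even when DPCTL_SYCL_TARGETS is set. The function name, its parameters, and the default `spir64-unknown-unknown` target are assumptions; the AMD backend flags mirror the `if(_dpctl_amd_targets)` branch quoted in this thread.

```python
def resolve_sycl_targets(sycl_targets, cuda_alias=None, amd_arch=None, exclusive=True):
    """Hypothetical model: return (targets, backend_flags) for one policy.

    sycl_targets -- value of DPCTL_SYCL_TARGETS ("" when unset)
    cuda_alias   -- e.g. "nvidia_gpu_sm_80", derived from DPCTL_TARGET_CUDA
    amd_arch     -- e.g. "gfx90a", derived from DPCTL_TARGET_HIP
    """
    backend_flags = []
    if sycl_targets:
        targets = [t for t in sycl_targets.split(",") if t]
        if exclusive:
            # Policy 1: an explicit DPCTL_SYCL_TARGETS wins; vendor options are ignored.
            return targets, backend_flags
    else:
        targets = ["spir64-unknown-unknown"]
    # Policy 2 (and the unset case): CUDA and HIP are honored the same way.
    if cuda_alias and cuda_alias not in targets:
        targets.insert(0, cuda_alias)
    if amd_arch:
        if "amdgcn-amd-amdhsa" not in targets:
            targets.append("amdgcn-amd-amdhsa")
        backend_flags = [
            "-Xsycl-target-backend=amdgcn-amd-amdhsa",
            f"--offload-arch={amd_arch}",
        ]
    return targets, backend_flags
```

Either policy is symmetric, so CUDA and HIP no longer behave differently when DPCTL_SYCL_TARGETS is set, which is the inconsistency raised above.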

Collaborator Author

My understanding is that when users pass DPCTL_SYCL_TARGETS, they are responsible for the correctness of the flags.

The check if (NOT "x${DPCTL_TARGET_HIP}" STREQUAL "x") when DPCTL_SYCL_TARGETS is set was added to pass the correct compile and link options:

if(_dpctl_amd_targets)
    list(APPEND _dpctl_sycl_target_compile_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpctl_amd_targets})
    list(APPEND _dpctl_sycl_target_link_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpctl_amd_targets})
endif()

I am already working on a PR that will refresh the logic for the AMD build using aliases, which removes the if(_dpctl_amd_targets) branch.
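A minimal sketch of why aliases can remove that branch, assuming DPC++ accepts an `amd_gpu_<arch>` alias (e.g. `amd_gpu_gfx90a`) analogous to `nvidia_gpu_sm_80`. With the explicit `amdgcn-amd-amdhsa` target, the architecture has to be forwarded to the backend in both the compile and the link options; with the alias, it is encoded in the target name itself. The function names and the dictionary layout are hypothetical.

```python
def amd_options_explicit(arch):
    """Explicit target: the arch must be forwarded in compile AND link options."""
    backend = ["-Xsycl-target-backend=amdgcn-amd-amdhsa", f"--offload-arch={arch}"]
    return {"targets": "amdgcn-amd-amdhsa", "compile": list(backend), "link": list(backend)}


def amd_options_alias(arch):
    """Alias target (assumed amd_gpu_<arch> naming): no extra backend flags needed."""
    return {"targets": f"amd_gpu_{arch}", "compile": [], "link": []}
```

The alias form keeps the target list as the single source of truth, so there is no separate list of compile and link options to keep in sync.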

For reference, compute architecture strings like ``sm_80`` are based on
CUDA Compute Capability. A complete mapping between NVIDIA GPU models and their
respective ``sm_XX`` values can be found in the official
`CUDA GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`_.
Collaborator

The mapping is not clear from the reference doc.

Collaborator

Yes, it seems they aren't necessarily related either (see here and below it).

Collaborator Author

A CUDA developer notes that sm_XX refers to machine code for a specific GPU hardware architecture. Since each Compute Capability version corresponds to a particular architecture (e.g. CC 8.0 corresponds to the Ampere A100), it is reasonable to say that sm_80 corresponds to CC 8.0.

I changed the text a bit.
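Under the correspondence described in this comment (sm_80 for CC 8.0), converting between the two notations amounts to concatenating or splitting the major and minor digits. This Python sketch assumes the minor version is always a single digit and ignores any trailing letter suffix such as the `a` in `sm_90a`; the function names are illustrative.

```python
import re

# sm_<major><minor>[suffix]; the minor version is assumed to be one digit.
_SM_RE = re.compile(r"sm_(\d+)(\d)([a-z]?)")


def cc_to_sm(cc):
    """Compute Capability string such as '8.0' -> 'sm_80'."""
    major, minor = cc.split(".")
    if not (major.isdigit() and minor.isdigit() and len(minor) == 1):
        raise ValueError(f"unexpected compute capability: {cc!r}")
    return f"sm_{int(major)}{minor}"


def sm_to_cc(sm):
    """'sm_80' -> '8.0'; a trailing letter suffix (e.g. sm_90a) is dropped."""
    m = _SM_RE.fullmatch(sm)
    if m is None:
        raise ValueError(f"unexpected architecture string: {sm!r}")
    return f"{int(m.group(1))}.{m.group(2)}"
```

For example, `cc_to_sm("8.6")` returns `"sm_86"`, and `sm_to_cc("sm_80")` returns `"8.0"`.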


github-actions bot commented Jun 6, 2025

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_17 ran successfully.
Passed: 1113
Failed: 8
Skipped: 119

Development

Successfully merging this pull request may close these issues.

Add CUDA architecture to CMake option when building for NVidia devices
4 participants