make NVRTC work with Wave headers #11500
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D65760296
Pull Request resolved: facebookincubator#11500 Differential Revision: D65760296 Pulled By: oerling
NVRTC requires headers to be extracted into text strings and passed to the compiler. This is done by NVIDIA's jitify utility, which is added under external. Extracting the headers takes a long time, so it is done on first use and the headers are then kept as strings. NVRTC also requires some header substitutions, which jitify provides. jitify also provides defaults for --gpu-architecture for NVRTC.
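The "extract on first use, then keep as strings" behavior can be sketched in plain C++ as below. This is only an illustration, not the code in this PR: `HeaderSet`, `extractHeaders`, `cachedHeaders`, `asPointers` and the header names are hypothetical, and the slow jitify step is replaced by a placeholder. The only external fact assumed is that `nvrtcCreateProgram` takes parallel `const char*` arrays for header sources and include names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for headers that have already been turned into text.
struct HeaderSet {
  std::vector<std::string> names;    // include names as seen by #include
  std::vector<std::string> sources;  // full text of each header
};

// Counts how often the expensive extraction ran (for illustration only).
inline int& extractionCount() {
  static int count = 0;
  return count;
}

// Stand-in for the slow extraction step. The names and contents below are
// placeholders, not real Wave headers.
inline HeaderSet extractHeaders() {
  ++extractionCount();
  HeaderSet set;
  set.names = {"example/A.cuh", "example/B.cuh"};
  set.sources = {"#pragma once\n// header A\n", "#pragma once\n// header B\n"};
  assert(set.names.size() == set.sources.size());
  return set;
}

// Extracts on first call and returns the same strings on every later call.
// Initialization of a function-local static is thread-safe since C++11.
inline const HeaderSet& cachedHeaders() {
  static const HeaderSet headers = extractHeaders();
  return headers;
}

// Builds a `const char*` array of the kind nvrtcCreateProgram expects for its
// `headers` and `includeNames` arguments. The pointers stay valid because the
// cached strings are never modified.
inline std::vector<const char*> asPointers(const std::vector<std::string>& v) {
  std::vector<const char*> out;
  out.reserve(v.size());
  for (const auto& s : v) {
    out.push_back(s.c_str());
  }
  return out;
}
```

Every compilation after the first would then call `cachedHeaders()` and pay only for building the pointer arrays, which is the cost profile the description asks for.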
Even with jitify, NVRTC fails when <cuda/atomic> and cub are included at the same time. We depend on <cuda/atomic> and <cuda/semaphore>, so we extract the warp scan from cub as jit/Scan.cuh. Other cub items may be extracted in the same way if needed.
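For readers unfamiliar with what a warp scan computes, here is a host-side C++ simulation of a shuffle-based inclusive prefix sum over 32 lanes. It is a sketch of the general technique only and is not the contents of jit/Scan.cuh. On the device, each round would correspond to one `__shfl_up_sync` call in which lane `i` reads the value held by lane `i - delta`. The names `kWarpSize` and `inclusiveWarpScan` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;

// Inclusive prefix sum over one warp's worth of values, simulated on the host.
// Each round doubles the distance a lane looks back, so five rounds cover all
// 32 lanes.
inline std::vector<int> inclusiveWarpScan(std::vector<int> values) {
  assert(values.size() == kWarpSize);
  for (std::size_t delta = 1; delta < kWarpSize; delta *= 2) {
    // Snapshot first: on the device all lanes read before any lane writes.
    const std::vector<int> previous = values;
    for (std::size_t lane = delta; lane < kWarpSize; ++lane) {
      values[lane] = previous[lane] + previous[lane - delta];
    }
  }
  return values;
}
```

After the last round, lane `i` holds the sum of the inputs of lanes `0` through `i`.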
All Wave headers that NVRTC may need are extracted into jit/Headers.h. A make_headers.sh script is added for regenerating Headers.h from the actual headers. We add the stringify utility from NVIDIA to escape header text when converting headers to string literals.
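The escaping needed to turn header text into a string literal can be illustrated as follows. This is a minimal sketch and not NVIDIA's stringify tool, whose exact output format is not described here. `escapeForLiteral` and `toLiteralDefinition` are hypothetical helpers, and the generated `const char*` definition is only an assumed layout for a file like Headers.h.

```cpp
#include <cassert>
#include <string>

// Escapes raw header text so it can be placed between double quotes in C++
// source. Handles backslashes, double quotes and common whitespace escapes.
inline std::string escapeForLiteral(const std::string& text) {
  std::string out;
  out.reserve(text.size() + 16);
  for (char c : text) {
    switch (c) {
      case '\\': out += "\\\\"; break;
      case '"':  out += "\\\""; break;
      case '\n': out += "\\n";  break;
      case '\t': out += "\\t";  break;
      case '\r': out += "\\r";  break;
      default:   out += c;      break;
    }
  }
  return out;
}

// Wraps the escaped text as a named constant definition (assumed layout).
inline std::string toLiteralDefinition(
    const std::string& name,
    const std::string& text) {
  assert(!name.empty());
  return "const char* " + name + " = \"" + escapeForLiteral(text) + "\";\n";
}
```

A regeneration script in the spirit of make_headers.sh could run a step like this over each header and concatenate the results into one generated file.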
We add tests that verify use of <cuda/semaphore> for updating aggregates where the updating kernel is compiled at run time. We also add a test for the warp scan replacement in jit/.