Replies: 2 comments
-
Unfortunately, CUDA only generates vector ops when the access is known to be aligned, so in the IR you're looking for the aligned tag. Here's one way to give Halide and LLVM enough information that the stores are known to be aligned:
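As an independent illustration of the alignment condition (this is not the snippet from the reply, and it does not use Halide), here is a minimal C++ sketch of when a 4-wide access can become a single vector instruction. The helper names are hypothetical. It only models the rule that a vector of N elements must start at an address that is a multiple of N * sizeof(element):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes of alignment needed by a vector access of
// `lanes` elements, assuming the whole vector must be naturally aligned
// (e.g. 4 x 4-byte float -> 16 bytes).
inline std::size_t required_alignment(std::size_t lanes, std::size_t elem_bytes) {
    assert(lanes > 0 && elem_bytes > 0);
    return lanes * elem_bytes;
}

// Hypothetical helper: true when a vector access that starts at element
// index `first_elem` of a buffer based at `base_addr` begins on an
// address that is a multiple of the required alignment.
inline bool vector_access_is_aligned(std::uintptr_t base_addr,
                                     std::int64_t first_elem,
                                     std::size_t lanes,
                                     std::size_t elem_bytes) {
    const std::int64_t align =
        static_cast<std::int64_t>(required_alignment(lanes, elem_bytes));
    const std::int64_t base_mod =
        static_cast<std::int64_t>(base_addr % static_cast<std::uintptr_t>(align));
    const std::int64_t byte_offset =
        first_elem * static_cast<std::int64_t>(elem_bytes);
    // Normalise the remainder into [0, align) so negative indices work too.
    const std::int64_t rem = ((base_mod + byte_offset) % align + align) % align;
    return rem == 0;
}
```

For float and a vector width of 4, both the buffer base and the first vectorized index matter: a 16-byte-aligned base combined with a starting index that is a multiple of 4 satisfies the check. That is the kind of fact the compiler has to be able to prove before it tags a store as aligned.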
-
Thanks for the detailed reply.
.cpp code:
PTX code:
-
I use the vectorize schedule, but when I look at the generated PTX there are no vector instructions (ld.global.v4, st.global.v4, ...).
Did I miss something?
.cpp code
The PTX generated:
HTML output (but we can see the ramp here):