Description
The windowing stride syntax doesn't work correctly when taco is configured with `-DCUDA=ON`.
These 3 test cases fail:
```
890 - windowing/stride.windowing/(dense,compressed) (Failed)
891 - windowing/stride.windowing/(compressed,dense) (Failed)
892 - windowing/stride.windowing/(compressed,compressed) (Failed)
```
I dug into this a bit, and the stride syntax causes taco to emit C code like this:
```c
if (jB2_window % 5 != 0) {
  jB++;
  continue;
}
```
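To make the intent of that snippet concrete, here is a simplified, self-contained C sketch of a strided window over one compressed level. It assumes the window covers coordinates `[lo, hi)`, that `jB2_window` is the coordinate relative to the window start, and that the stride keeps only coordinates whose window-relative value is a multiple of it. Apart from `jB` and `jB2_window`, every name here (`count_strided`, `crd`, `lo`, `hi`, `stride`) is hypothetical and not taco's actual output.

```c
#include <assert.h>

/* Hypothetical sketch: count the stored coordinates of one compressed level
 * that fall inside the window [lo, hi) and land on the stride.
 * crd[pos_begin..pos_end) holds the stored coordinates. */
static int count_strided(const int *crd, int pos_begin, int pos_end,
                         int lo, int hi, int stride) {
    assert(stride > 0);
    int kept = 0;
    int jB = pos_begin;
    while (jB < pos_end) {
        int j = crd[jB];
        if (j < lo || j >= hi) {
            /* Outside the window: skip this position. */
            jB++;
            continue;
        }
        int jB2_window = j - lo; /* coordinate relative to the window start */
        if (jB2_window % stride != 0) {
            /* Not on the stride. The increment at the bottom of the loop is
             * bypassed by continue, so advance jB by hand first. */
            jB++;
            continue;
        }
        kept++; /* this element would contribute to the result */
        jB++;
    }
    return kept;
}
```

For stored coordinates `{0, 3, 5, 7, 10, 12, 15, 20}`, a window of `[0, 20)` and a stride of 5, it keeps 0, 5, 10 and 15, so it returns 4. The `jB++` before `continue` mirrors the emitted code: in a `while` loop the skip path has to advance the position itself.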
But the CUDA codegen doesn't emit `continue` for a `Continue` op. Instead, it returns, like this:
```cuda
if (jB2_window % 5 != 0) {
  jB = jB + 1;
  return;
}
```
Returning is wrong in this case: it ends the thread, effectively dropping the remainder of that thread's work on the floor and producing incorrect output. Changing that `return` to a `continue` fixes the problem, and adding a flag that forces the codegen to emit `continue` in just this one case allows the tests to pass.
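A simplified C model of the difference, assuming one thread walks a contiguous range of positions sequentially and that the window starts at coordinate 0 (so `jB2_window` equals the stored coordinate). `thread_sum` and its parameters are hypothetical; the `use_return` flag only switches between the two skip strategies shown in the snippets above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the sequential loop one thread runs over its slice
 * of positions. With use_return == false the stride filter skips the element
 * with continue (the C backend's behavior); with use_return == true it bails
 * out with return (the CUDA backend's behavior). Returns the sum of the
 * values whose coordinate lies on the stride. */
static int thread_sum(const int *crd, const int *vals, int pos_begin,
                      int pos_end, int stride, bool use_return) {
    assert(stride > 0);
    int sum = 0;
    int jB = pos_begin;
    while (jB < pos_end) {
        int jB2_window = crd[jB]; /* window assumed to start at 0 */
        if (jB2_window % stride != 0) {
            jB = jB + 1;
            if (use_return)
                return sum; /* abandons every remaining position */
            continue;       /* skips only this position */
        }
        sum += vals[jB];
        jB = jB + 1;
    }
    return sum;
}
```

With coordinates `{0, 3, 5, 10}`, values `{1, 2, 3, 4}` and a stride of 5, the `continue` variant sums 1 + 3 + 4 = 8, while the `return` variant stops at coordinate 3 and yields only 1. The two agree only when no element is filtered out, which is why the bug shows up specifically with strides.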
I cooked up a hacky workaround that does exactly that. I don't think it's quite right: it just forces the CUDA codegen to emit a `continue`, rather than giving the codegen enough information to decide for itself. But hopefully it illustrates the problem.
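One possible shape for "enough information to decide for itself", sketched under an assumption: that the CUDA codegen uses `return` for `Continue` because, when a loop's iterations are distributed across threads, ending the thread is the equivalent of skipping an iteration. If so, the codegen would need to know whether the innermost enclosing loop is mapped to threads or runs sequentially inside one thread. None of these types or functions (`LoopKind`, `LoopStack`, `emit_continue`) exist in taco; they only illustrate the decision.

```c
#include <assert.h>

/* Hypothetical: how the iterations of an enclosing loop are executed. */
typedef enum {
    LOOP_THREAD_MAPPED, /* iterations distributed across CUDA threads */
    LOOP_SEQUENTIAL     /* iterations executed in order by one thread */
} LoopKind;

/* Hypothetical stack of enclosing loops maintained while emitting code. */
typedef struct {
    LoopKind kinds[32];
    int depth;
} LoopStack;

static void push_loop(LoopStack *s, LoopKind k) {
    assert(s->depth < 32);
    s->kinds[s->depth++] = k;
}

static void pop_loop(LoopStack *s) {
    assert(s->depth > 0);
    s->depth--;
}

/* Choose the statement to emit for a Continue op. Inside a sequential loop,
 * continue skips just the current iteration. When the innermost enclosing
 * loop is the thread mapping itself (or there is no loop), each iteration is
 * a whole thread, so returning from the kernel is the equivalent. */
static const char *emit_continue(const LoopStack *s) {
    if (s->depth > 0 && s->kinds[s->depth - 1] == LOOP_SEQUENTIAL)
        return "continue;";
    return "return;";
}
```

In the failing tests, the stride check would sit inside a `LOOP_SEQUENTIAL` entry pushed on top of the thread-mapped loop, so this sketch would emit `continue;` there while keeping `return;` for a filter placed directly in a thread-mapped loop.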
Cc: @rohany