Description
The warpspeed scan implementation checks whether the current workload (for the current tuning) fits into 48 KiB of SMEM. If it does not, it falls back to the old scan implementation.
This is undesirable during tuning, because a randomly chosen tuning may exceed the 48 KiB limit, and the benchmark then measures the old scan implementation. We should instead fail to compile in this case so that those tuning values are skipped.
A user providing a custom tuning to scan may want to opt into ignoring the 48 KiB SMEM limit, accepting a binary that is not compatible with future CUDA architectures for the sake of higher performance. We are already aware of tunings that would yield better performance than the current CUB tuning, but we cannot implement them because a single stage would already exceed 48 KiB of SMEM.
We should thus allow users either to disable the SMEM check or to enforce it without fallback.
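To make the three behaviors concrete, here is a minimal C++17 sketch of how such a switch could look. All names (`smem_check_policy`, `scan_impl`, `select_scan_impl`, `portable_smem_limit`) are hypothetical and are not part of CUB's actual API; the sketch only models the selection logic, not the scan itself.

```cpp
#include <cstddef>

// Hypothetical names throughout; this is NOT CUB's actual API.
enum class smem_check_policy
{
  fallback, // current behavior: use the old scan when the tuning exceeds the limit
  enforce,  // fail to compile when the tuning exceeds the limit (useful while tuning)
  ignore    // accept tunings above the limit and always use the warpspeed scan
};

enum class scan_impl
{
  warpspeed,
  legacy
};

// The 48 KiB SMEM limit discussed in this issue.
inline constexpr std::size_t portable_smem_limit = 48 * 1024;

constexpr bool fits_in_portable_smem(std::size_t smem_bytes)
{
  return smem_bytes <= portable_smem_limit;
}

// Chooses the implementation at compile time for a tuning that needs SmemBytes of SMEM.
template <smem_check_policy Policy, std::size_t SmemBytes>
constexpr scan_impl select_scan_impl()
{
  if constexpr (Policy == smem_check_policy::enforce)
  {
    // Only evaluated when this branch is instantiated, so other policies are unaffected.
    static_assert(fits_in_portable_smem(SmemBytes), "scan tuning exceeds the 48 KiB SMEM limit");
    return scan_impl::warpspeed;
  }
  else if constexpr (Policy == smem_check_policy::ignore)
  {
    return scan_impl::warpspeed;
  }
  else
  {
    return fits_in_portable_smem(SmemBytes) ? scan_impl::warpspeed : scan_impl::legacy;
  }
}

// A 64 KiB tuning falls back today, but is accepted when the check is ignored.
static_assert(select_scan_impl<smem_check_policy::fallback, 64 * 1024>() == scan_impl::legacy);
static_assert(select_scan_impl<smem_check_policy::ignore, 64 * 1024>() == scan_impl::warpspeed);
```

With `enforce`, instantiating `select_scan_impl<smem_check_policy::enforce, 64 * 1024>()` would be a compile error, which is what a tuning run needs in order to skip that configuration. With `ignore`, a real kernel launch would additionally have to request the larger dynamic shared memory size, for example via `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`.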