Allow to control SMEM check in warpspeed scan #8028

@bernhardmgruber

Description

The warpspeed scan implementation checks whether the current workload (for the current tuning) fits into 48 KiB of SMEM and, if it does not, falls back to the old scan implementation.

This is undesirable during tuning, because a random tuning may exceed the 48 KiB limit, and the benchmark then silently measures the old scan implementation. In this case we should rather fail to compile, so that those tuning values are skipped.

A user providing a custom tuning to scan may want to opt into ignoring the 48 KiB SMEM limit, since they are willing to produce a binary that is not compatible with future CUDA architectures for the sake of higher performance. We are already aware of tunings that would yield better performance than the current CUB tuning, but we cannot implement them since a single stage would already exceed 48 KiB of SMEM.

We should thus allow users either to disable the SMEM check or to enforce it without fallback.
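
To make the three requested behaviors concrete, here is a minimal sketch of how such a switch could look at compile time. All names (`smem_check`, `scan_impl`, `select_scan_impl`, `portable_smem_limit`) are hypothetical and not part of CUB. The sketch also assumes that the required SMEM size is known as a compile-time constant, which may not match the actual implementation.

```cpp
#include <cstddef>

// Hypothetical names throughout; none of these are CUB identifiers.
enum class smem_check
{
  fallback, // today's behavior: use the old scan if the tuning does not fit
  enforce,  // fail to compile if the tuning does not fit (useful for tuning runs)
  ignore    // skip the check entirely (user accepts reduced portability)
};

enum class scan_impl
{
  warpspeed,
  legacy
};

// The 48 KiB SMEM limit discussed in this issue.
inline constexpr std::size_t portable_smem_limit = 48 * 1024;

// Assumption: the SMEM requirement of a tuning is available at compile time.
template <smem_check Policy, std::size_t RequiredSmemBytes>
constexpr scan_impl select_scan_impl()
{
  constexpr bool fits = RequiredSmemBytes <= portable_smem_limit;
  if constexpr (Policy == smem_check::enforce)
  {
    // Only instantiated for the enforce policy, so other policies still compile.
    static_assert(fits, "tuning exceeds 48 KiB of shared memory");
    return scan_impl::warpspeed;
  }
  else if constexpr (Policy == smem_check::ignore)
  {
    return scan_impl::warpspeed;
  }
  else
  {
    return fits ? scan_impl::warpspeed : scan_impl::legacy;
  }
}
```

With this shape, a tuning search would instantiate the `enforce` policy so that oversized tunings are rejected at compile time, while a user with a custom tuning could select `ignore`. How the policy would actually be exposed (tuning member, macro, or environment parameter) is left open here.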
