[BUG]: Dated segmented sort tuning #6173

@gevtushenko

Is this a duplicate?

Type of Bug

Performance

Component

CUB

Describe the bug

Users reported a potential 20x speedup of cub::DeviceSegmentedSort::SortKeys on SM90 for their workload. The cause is that the max policy for segmented sort is the SM86 tuning, so SM90 is served by that tuning as well. If we apply the SM80 tuning to SM90, performance is back to normal.

How to Reproduce

Run the segmented sort benchmark on SM90.

Expected behavior

We should provide SM90, SM100, and SM120 tunings. As a safe workaround, we can start by copying the SM80 policy for SM90 and SM100, and the SM86 policy for SM120.
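
To illustrate why SM90 ends up on the SM86 tuning and how the proposed workaround changes that, here is a minimal sketch in plain C++. It is a simplified model, not CUB's actual dispatch code: `Link`, `Base`, `Before`, and `After` are hypothetical names, the integer labels only identify which tuning is selected, and `Base<0>` stands in for whatever older tunings exist.

```cpp
// Simplified, hypothetical model of architecture-based tuning selection.
// None of these types are CUB's; integer labels stand in for real tunings.

// One link in the chain: used for devices with SM >= MinSm,
// otherwise the decision is deferred to the previous (older) link.
template <int MinSm, int TuningLabel, typename Prev>
struct Link
{
  static constexpr int select(int device_sm)
  {
    return device_sm >= MinSm ? TuningLabel : Prev::select(device_sm);
  }
};

// Terminal link: placeholder for all older tunings (assumption).
template <int TuningLabel>
struct Base
{
  static constexpr int select(int) { return TuningLabel; }
};

// Situation described in this issue: the newest (max) link is SM86,
// so every newer architecture is served by the SM86 tuning.
using Before = Link<86, 86, Link<80, 80, Base<0>>>;

// Proposed workaround: SM90 and SM100 reuse the SM80 tuning,
// SM120 reuses the SM86 tuning.
using After = Link<120, 86, Link<100, 80, Link<90, 80, Before>>>;

static_assert(Before::select(90) == 86, "SM90 falls through to the SM86 tuning");
static_assert(After::select(90) == 80, "SM90 now uses the SM80 tuning");
```

In this model, `Before::select(90)` yields the SM86 label because no link exists above SM86, while `After::select(90)` and `After::select(100)` yield the SM80 label and `After::select(120)` yields the SM86 label. Dedicated SM90, SM100, and SM120 tunings would later replace the copied labels.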

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

Metadata

Assignees

Labels

bug: Something isn't working right.

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
