@gonidelis
Member

Fixes #6173 by applying the sm80 tuning parameters to sm90 compilations (server) and the sm86 tuning parameters to sm120 compilations (workstation).
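
A rough sketch of the mapping described above, for illustration only. The function name `tuning_source_for` is hypothetical and is not part of CUB or CCCL; the fall-through for other architectures is an assumption. The actual change operates on the library's tuning policies, which are not shown here.

```cpp
// Hypothetical helper: returns the SM version whose tuning parameters
// would be reused when compiling for `target_sm`.
// Only the two mappings named in this PR are encoded; every other
// architecture is assumed (for this sketch) to keep its own tuning.
constexpr int tuning_source_for(int target_sm)
{
  return target_sm == 90    ? 80  // server: sm90 reuses the sm80 tuning
         : target_sm == 120 ? 86  // workstation: sm120 reuses the sm86 tuning
                            : target_sm;
}

// The choice is made at compile time, so it can depend on the target
// architecture but not on anything known only at run time.
static_assert(tuning_source_for(90) == 80, "sm90 should reuse sm80");
static_assert(tuning_source_for(120) == 86, "sm120 should reuse sm86");
```

Because the selection is a compile-time decision keyed on the architecture, it cannot react to properties of the input that are only known when the sort is launched.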

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Oct 9, 2025
@gonidelis
Member Author

Posting the results in Google Sheets instead of laying them out here, for readability.

@gonidelis
Member Author

B200 (SM100)

SM100 perf results spreadsheet

@gonidelis
Member Author

NVIDIA GeForce RTX 5090 (SM120)

SM120 perf results spreadsheet

@gonidelis
Member Author

NVIDIA H100 PCIe (SM90)

SM90 perf results spreadsheet

@gonidelis
Member Author

I looked thoroughly into the sm90 results. For power-law-distributed segment sizes, the only problematic workload is OffsetT{ct}=I32, Elements{io}=2^22, Segments{io}=2^20 (highlighted rows), which averages four elements per segment. It regresses a lot, while all the other workloads show a significant speedup. The problem is that segment sizes are a runtime parameter, so we have no control over them and cannot use them to decide which tunings to choose. The large and small workloads also regress significantly, and in a different pattern. So for the time being we should probably defer merging this.

@gonidelis
Member Author

Closing, as this fixes some cases but regresses others. See the detailed explanation in the tracking issue.

@gonidelis gonidelis closed this Nov 5, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Nov 5, 2025
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG]: Dated segmented sort tuning

1 participant