gpu - pad out elem loop for shared/gen#1950

Open
jeremylt wants to merge 1 commit into main from jeremy/hip-all-elems
Conversation

@jeremylt
Member

@jeremylt jeremylt commented Apr 9, 2026

Purpose:

Ensure all threads hit all `__syncthreads()` for #1942

Closes: #N/A

LLM/GenAI Disclosure:

None

By submitting this PR, the author certifies to its contents as described by the Developer's Certificate of Origin.
Please follow the Contributing Guidelines for all PRs.

@jeremylt jeremylt self-assigned this Apr 9, 2026
@jeremylt jeremylt force-pushed the jeremy/hip-all-elems branch 2 times, most recently from 319d781 to 803d69e Compare April 9, 2026 19:23
@nbeams
Contributor

nbeams commented Apr 9, 2026

I don't know about shared, but for hip-gen, I looked into this a while back when we first started testing chipStar (so long ago it still had a different name...). I investigated some different kernel options; IIRC none of them were identical to what you've done here, but I think one was very similar (adding checks around the element restriction to avoid trying to read/write out of memory, and removing the current loop bounds). When running the hip-gen backend on AMD hardware with hiprtc, it caused some pretty large performance losses for the Poisson operator, especially as I increased basis function order. The compiler output showed increased register usage, which led to a drop in occupancy. I have some old notes I could dig up from a past quarterly report with the actual numbers.

Anyway, I would recommend some updated performance testing before merging this for all backends. In case it's still an issue, would there be a way to know that the kernel will be built with chipStar and only add the element check in that case?

@jeremylt jeremylt force-pushed the jeremy/hip-all-elems branch from 803d69e to dbffd6c Compare April 9, 2026 19:54
@jeremylt
Member Author

jeremylt commented Apr 9, 2026

I don't think these changes should have any effect on register pressure? Here I am keeping the same strategy we currently have, but making sure every thread is working during the last block of elements by padding with valid dummy data.

@nbeams
Contributor

nbeams commented Apr 9, 2026

Oh, I guess I didn't look closely enough at the code here. I also tried a version that had any "leftover" threads doing a dummy read/write, though I think they were all reading from the same (valid) element rather than from padded data (which could definitely affect things). Anyway, it also had performance drops compared to what we currently have in hip-gen.

Just a warning since I didn't expect the perf drops I saw before I did the tests, either. It's not exactly the same code and of course hiprtc has changed since then, so no idea if it will be a problem, but I'd still recommend checking just to be sure before merging.

@jeremylt
Member Author

jeremylt commented Apr 9, 2026

For sure. If we see a performance difference, then I think the way to go for chipStar would be to make chipStar backends /gpu/hip/chipstar/shared and /gpu/hip/chipstar/gen that delegate back to the current shared/gen code, and that code would check the resource string for the root /gpu/hip/chipstar to determine whether it needs the padding elements.
