
Conversation

@AlbertThie

Add a separate pipeline to use GPUs in a ROCm environment. This pipeline avoids gpu-numba, which has not been updated for ROCm in quite some time. The loading and unloading is a bit hacky, but I haven't noticed much performance degradation.

Passed all pytest tests.

Tested on a node with 8 AMD Instinct MI250X GPUs.

Please note I did not have access to a multi-GPU NVIDIA environment, so some additional testing may be warranted there.
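
For reference, a minimal sketch of the kind of explicit load/compute/unload pattern described above, assuming PyTorch tensors; the function name and the placeholder kernel are hypothetical, not code from this PR:

```python
import torch

def run_on_gpu(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical load/compute/unload step. On a ROCm build of
    # PyTorch, torch.cuda.* is backed by HIP, so the same
    # .cuda()/.cpu() calls drive AMD GPUs.
    x_dev = x.cuda()      # host -> device copy ("loading")
    y_dev = x_dev * 2.0   # stand-in for the real kernel
    return y_dev.cpu()    # device -> host copy ("unloading")
```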

@zubatyuk
Contributor

Thanks for the contribution.

A couple of issues:

- I notice .cpu() and .cuda() transfers for the tensors. Are these host-device transfers really required?
- You should use the current device index instead of the first one (see the sketch below).

I still need to confirm that the PR does not break CUDA and CPU compatibility.
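
For illustration, a minimal sketch of targeting the currently selected device rather than hard-coding device 0, assuming PyTorch; the helper name is hypothetical:

```python
import torch

def to_current_device(x: torch.Tensor) -> torch.Tensor:
    # Target the currently selected GPU rather than device 0.
    dev = torch.device("cuda", torch.cuda.current_device())
    # .to() is a no-op when x already lives on dev, which avoids
    # an unnecessary host-device round trip via .cpu()/.cuda().
    return x.to(dev)
```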
