Pull request overview
This PR addresses a timeout issue in DeepEP intranode-combine on ROCm 7.2 by forcing the use of the -fgpu-rdc (Relocatable Device Code) compiler flag across all compilation stages.
Changes:
- Added -fgpu-rdc flag to CXX compilation flags
- Added -fgpu-rdc and --hip-link flags to linker arguments
- Added -fgpu-rdc flag to HIP compilation flags
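These three changes only help if they are applied together. Under the usual HIP-Clang separate-compilation model, objects built with -fgpu-rdc carry relocatable device code that must then be linked with -fgpu-rdc --hip-link. The sketch below checks that each stage carries the flags it needs. The helper name `missing_rdc_flags` and the example flag lists are hypothetical and are not taken from setup.py.

```python
from typing import Dict, List

RDC_COMPILE = "-fgpu-rdc"  # emit relocatable device code
HIP_LINK = "--hip-link"    # have the driver link HIP device code


def missing_rdc_flags(cxx_flags: List[str], hip_flags: List[str],
                      link_args: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: report which stage lacks which RDC flag.

    Returns a mapping from stage name to its missing flags; an empty
    dict means all three stages are consistent.
    """
    required = {
        "cxx": (cxx_flags, [RDC_COMPILE]),
        "hip": (hip_flags, [RDC_COMPILE]),
        "link": (link_args, [RDC_COMPILE, HIP_LINK]),
    }
    missing: Dict[str, List[str]] = {}
    for stage, (flags, needed) in required.items():
        absent = [f for f in needed if f not in flags]
        if absent:
            missing[stage] = absent
    return missing


if __name__ == "__main__":
    cxx = ["-O3", "-fvisibility=hidden", "-fgpu-rdc"]
    hip = ["-fgpu-rdc"]
    link = ["-Wl,-rpath,/opt/rocm/lib", "-fgpu-rdc"]  # --hip-link forgotten
    print(missing_rdc_flags(cxx, hip, link))  # {'link': ['--hip-link']}
```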
setup.py
```diff
-    "-Wl,-rpath,/opt/rocm/lib",
-    f"-L/usr/lib/{arch}-linux-gnu",
-]
+extra_link_args = ["-Wl,-rpath,/opt/rocm/lib", f"-L/usr/lib/{arch}-linux-gnu", "-fgpu-rdc", "--hip-link"]
```
The extra_link_args list is formatted on a single line, making it harder to read and maintain compared to the original multi-line format. Consider restoring the multi-line format for consistency and readability.
```diff
-extra_link_args = ["-Wl,-rpath,/opt/rocm/lib", f"-L/usr/lib/{arch}-linux-gnu", "-fgpu-rdc", "--hip-link"]
+extra_link_args = [
+    "-Wl,-rpath,/opt/rocm/lib",
+    f"-L/usr/lib/{arch}-linux-gnu",
+    "-fgpu-rdc",
+    "--hip-link",
+]
```
setup.py
```diff
 cxx_flags = [
     "-O3",
     "-fvisibility=hidden",
+    "-fgpu-rdc",
```
The -fgpu-rdc flag is duplicated across cxx_flags and hip_flags. Consider extracting this to a shared constant or variable to ensure consistency and make future updates easier to manage.
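One way to follow this suggestion is sketched below, assuming the flag lists are plain Python lists as in the hunks shown here. The constant name `GPU_RDC_FLAGS` is hypothetical, the way `arch` is derived is an assumption, and HIP flags not visible in the hunks are omitted.

```python
import platform

# Hypothetical shared constant: one definition, so the RDC flag cannot
# drift between the host-compile, HIP-compile and link stages.
GPU_RDC_FLAGS = ["-fgpu-rdc"]

# Assumption: setup.py derives `arch` along these lines (e.g. "x86_64").
arch = platform.machine()

cxx_flags = ["-O3", "-fvisibility=hidden", *GPU_RDC_FLAGS]

hip_flags = [
    # ... other HIP flags from setup.py omitted ...
    "-mllvm", "-amdgpu-early-inline-all=true",
    "-mllvm", "-amdgpu-function-calls=false",
    *GPU_RDC_FLAGS,
]

extra_link_args = [
    "-Wl,-rpath,/opt/rocm/lib",
    f"-L/usr/lib/{arch}-linux-gnu",
    *GPU_RDC_FLAGS,
    "--hip-link",  # link stage only, so it stays out of the shared constant
]
```

Keeping --hip-link outside the shared constant reflects that it is a link-time option, while -fgpu-rdc is needed at every stage.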
setup.py
```diff
     "-amdgpu-early-inline-all=true",
     "-mllvm",
     "-amdgpu-function-calls=false",
+    "-fgpu-rdc",
```
The -fgpu-rdc flag is duplicated across cxx_flags and hip_flags. Consider extracting this to a shared constant or variable to ensure consistency and make future updates easier to manage.
I remember this compile option caused issues before.
JIRA AIMA-212 shows that DeepEP intranode-combine times out on the ROCm 7.2 Docker image but works well on ROCm 7.1.
I found a workaround for the timeout: force -fgpu-rdc at compile time and disable -amdgpu-function-calls=false.
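The workaround can be expressed as a transformation of the HIP flag list. This is a sketch under two assumptions. First, the flags are held in a plain Python list, as in setup.py. Second, "disable" means removing the `-mllvm -amdgpu-function-calls=false` pair, which is an interpretation. The helper name `apply_rdc_workaround` is hypothetical.

```python
from typing import List


def apply_rdc_workaround(hip_flags: List[str]) -> List[str]:
    """Hypothetical helper: add -fgpu-rdc and drop the
    `-mllvm -amdgpu-function-calls=false` pair, keeping all other
    flags in their original order."""
    result: List[str] = []
    i = 0
    while i < len(hip_flags):
        flag = hip_flags[i]
        # LLVM options are passed as two tokens: "-mllvm" then the option.
        if (flag == "-mllvm" and i + 1 < len(hip_flags)
                and hip_flags[i + 1] == "-amdgpu-function-calls=false"):
            i += 2  # skip both tokens of the pair
            continue
        result.append(flag)
        i += 1
    if "-fgpu-rdc" not in result:
        result.append("-fgpu-rdc")
    return result


if __name__ == "__main__":
    flags = ["-mllvm", "-amdgpu-early-inline-all=true",
             "-mllvm", "-amdgpu-function-calls=false"]
    print(apply_rdc_workaround(flags))
    # ['-mllvm', '-amdgpu-early-inline-all=true', '-fgpu-rdc']
```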