How to run nccl test in vm without nvswitch passthroughed? #260

Open
joydchh opened this issue Oct 31, 2024 · 1 comment

joydchh commented Oct 31, 2024

Hi,
We are trying to run 4 VMs on a host with 8 H100 GPUs, with 2 GPUs per VM.
We found that the NVSwitches can only be passed through to a single VM, so the remaining VMs get none. The VMs without an NVSwitch cannot run the NCCL tests. The error is shown below.
image
It then occurred to me that disabling NVLink might let NCCL fall back to a PCIe path, so I tried setting NCCL_P2P_DISABLE=1, but it still does not work.
image
Is there any way to make this work?
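
For reference, the attempt described above can be sketched as a small launcher that sets `NCCL_P2P_DISABLE=1` and also turns on `NCCL_DEBUG=INFO`, so the log shows which transport NCCL selects. This is only a sketch of the reproduction, not a fix for the error. The binary path `./build/all_reduce_perf`, the message sizes, and the helper name `build_nccl_test_command` are assumptions for illustration and do not come from the original report.

```python
import os
import shlex


def build_nccl_test_command(binary="./build/all_reduce_perf", num_gpus=2):
    """Return (env, argv) for a single-process nccl-tests run with P2P disabled.

    The binary path and the size flags are illustrative assumptions.
    """
    env = dict(os.environ)
    env["NCCL_P2P_DISABLE"] = "1"  # the setting tried in this issue
    env["NCCL_DEBUG"] = "INFO"     # make NCCL log its transport selection
    # -b/-e: min/max message size, -f: size multiplier, -g: GPUs per process
    argv = [binary, "-b", "8", "-e", "128M", "-f", "2", "-g", str(num_gpus)]
    return env, argv


if __name__ == "__main__":
    env, argv = build_nccl_test_command()
    # Print the equivalent shell command instead of launching it here.
    print("NCCL_P2P_DISABLE=1 NCCL_DEBUG=INFO " + shlex.join(argv))
    # To actually launch inside the VM:
    #   import subprocess; subprocess.run(argv, env=env, check=True)
```

Attaching the `NCCL_DEBUG=INFO` output as text rather than as a screenshot would make it easier for maintainers to see where initialization fails.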

joydchh commented Nov 5, 2024

Any insights on this?
