How to get the latency and the package of NCCL #262

Open
gabbychen opened this issue Nov 4, 2024 · 3 comments

@gabbychen

Hi,
When I was using GPUs for AI inference, I found that communication accounted for a large share of the latency (compared with the open-source NCCL code).
Even with the same hardware and software configuration, the latency differs across servers and GPUs.
Is there any tool I can use for packet capture or data analysis besides nccl_test?

@kiskra-nvidia
Member

I'm not sure if I understand the scenario. Do you find your custom code to be slower/less predictable than NCCL, or do you find that the NCCL code performs differently on different servers/GPUs?

I don't think we have anything for packet capturing, but in terms of performance analysis, you can try the NVIDIA Nsight tool. NCCL 2.23 also added a new profiler plugin API that you can leverage for customized performance analysis.
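One way to act on the Nsight suggestion is to wrap a run in an Nsight Systems capture. The Python sketch below only builds (and optionally runs) the command line; it assumes the `nsys` CLI is installed, and the report name and the nccl-tests binary path (`./build/all_reduce_perf`) are hypothetical placeholders for the local setup. Tracing `nvtx` alongside `cuda` is intended to show NCCL's collective calls, which NCCL annotates with NVTX ranges, on the same timeline as the GPU kernels.

```python
import shlex
import subprocess


def nsys_command(app_cmd, report="nccl_report"):
    """Return an argv that runs app_cmd under an Nsight Systems capture."""
    # --trace=cuda,nvtx: record CUDA activity plus NVTX ranges.
    # -o: output report name (hypothetical default; any path works).
    return ["nsys", "profile", "--trace=cuda,nvtx", "-o", report, *app_cmd]


def profile(app_cmd, report="nccl_report"):
    """Run the capture; needs nsys and a GPU, so it is not called here."""
    return subprocess.run(nsys_command(app_cmd, report), check=True)


# Hypothetical nccl-tests invocation: all-reduce from 8 B to 128 MB on 8 GPUs.
ALL_REDUCE = ["./build/all_reduce_perf", "-b", "8", "-e", "128M", "-f", "2", "-g", "8"]

if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(nsys_command(ALL_REDUCE)))
```

Running the same capture on two servers and comparing the durations of the NCCL ranges gives a per-call view of where the latency differs, rather than only the end-to-end numbers.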

@gabbychen
Author

Hi, I see both scenarios:
my custom code is slower than NCCL,
and the NCCL code also performs differently across servers, and on the same server with different GPUs.

@kiskra-nvidia
Member

There could be many reasons for NCCL to perform non-uniformly, but the topology differences (the connectivity between the GPUs, and between GPUs and NICs) are probably the most common. nvidia-smi topo -m will give a quick overview of the topology within a node. NCCL can provide similar (though more detailed) output when run with NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,GRAPH.
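A minimal Python sketch of the topology comparison described above, assuming a node with `nvidia-smi` on the PATH. It parses the GPU-to-GPU block of `nvidia-smi topo -m` and lists GPU pairs whose link is not NVLink (`NV#`), which is one common reason otherwise identical runs differ. The helper names are hypothetical, and the parser assumes the usual layout in which the first line is the header and GPU columns come first; the exact format may vary between driver versions.

```python
import os
import re
import subprocess

# nvidia-smi may decorate header cells with ANSI escape codes; strip them.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
GPU_RE = re.compile(r"^GPU\d+$")


def read_topology():
    """Run `nvidia-smi topo -m` on this node and return its text output."""
    result = subprocess.run(["nvidia-smi", "topo", "-m"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_gpu_links(text):
    """Map (gpu_a, gpu_b) -> link type such as 'NV12', 'SYS' or 'PIX'.

    Only the GPU-to-GPU block is parsed; NIC rows/columns, CPU/NUMA
    affinity columns and the legend are ignored.
    """
    rows = [ANSI_RE.sub("", ln).split() for ln in text.splitlines()]
    rows = [toks for toks in rows if toks]
    header, body = rows[0], rows[1:]
    gpu_cols = []
    for tok in header:  # GPU columns are the leading header cells
        if not GPU_RE.match(tok):
            break
        gpu_cols.append(tok)
    links = {}
    for toks in body:
        if not GPU_RE.match(toks[0]):
            continue  # NIC row or legend line
        for col, cell in zip(gpu_cols, toks[1:1 + len(gpu_cols)]):
            if col != toks[0]:  # skip the 'X' self entry
                links[(toks[0], col)] = cell
    return links


def non_nvlink_pairs(links):
    """GPU pairs connected through PCIe/host paths rather than NVLink."""
    return sorted(pair for pair, link in links.items() if not link.startswith("NV"))


def nccl_debug_env(base=None):
    """Copy of the environment with NCCL INIT/GRAPH logging enabled."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = "INIT,GRAPH"
    return env
```

For example, run `non_nvlink_pairs(parse_gpu_links(read_topology()))` on each server, and pass `env=nccl_debug_env()` to `subprocess.run` when launching the workload or an nccl-tests binary, so the topology matrices and NCCL's INIT/GRAPH logs can be compared side by side.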
