Hi,
When I was using GPUs for AI inference, I found that communication accounts for a large share of the latency (compared to the open-source NCCL code).
Even with the same H/W & S/W configuration, the latency differs across servers and across GPUs.
May I know if there are any tools I can use for packet capture or data analysis besides nccl-tests?
I'm not sure if I understand the scenario. Do you find your custom code to be slower/less predictable than NCCL, or do you find that the NCCL code performs differently on different servers/GPUs?
I don't think we have anything for packet capturing, but in terms of performance analysis, you can try the NVIDIA Nsight tools (e.g., Nsight Systems). NCCL 2.23 also added a new profiler plugin API that you can leverage for customized performance analysis.
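For a quick first-order measurement without any plugin, you can also time individual collectives yourself with CUDA events. Below is a minimal sketch (my own illustration, not the profiler plugin API): a single process drives all local GPUs through one communicator per device and times an in-place ncclAllReduce. The message size is an arbitrary assumption; adjust it to match your workload.

```c
// Minimal sketch: time an in-place ncclAllReduce across all local GPUs
// using CUDA events. Single process, one communicator per device.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
  printf("CUDA error %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); exit(1); } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
  printf("NCCL error %s:%d: %s\n", __FILE__, __LINE__, ncclGetErrorString(r)); exit(1); } } while (0)

int main() {
  int nDev = 0;
  CHECK_CUDA(cudaGetDeviceCount(&nDev));

  size_t count = 1 << 20;  // 1M floats (~4 MB); an assumption, tune to your message size
  ncclComm_t* comms = malloc(nDev * sizeof(ncclComm_t));
  float** buf = malloc(nDev * sizeof(float*));
  cudaStream_t* streams = malloc(nDev * sizeof(cudaStream_t));

  for (int i = 0; i < nDev; i++) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaMalloc((void**)&buf[i], count * sizeof(float)));
    CHECK_CUDA(cudaStreamCreate(&streams[i]));
  }
  // One communicator per local GPU; NULL device list means devices 0..nDev-1.
  CHECK_NCCL(ncclCommInitAll(comms, nDev, NULL));

  // Bracket the collective with CUDA events on device 0's stream.
  cudaEvent_t start, stop;
  CHECK_CUDA(cudaSetDevice(0));
  CHECK_CUDA(cudaEventCreate(&start));
  CHECK_CUDA(cudaEventCreate(&stop));
  CHECK_CUDA(cudaEventRecord(start, streams[0]));

  // Group the per-device calls so NCCL can launch them together.
  CHECK_NCCL(ncclGroupStart());
  for (int i = 0; i < nDev; i++)
    CHECK_NCCL(ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  CHECK_NCCL(ncclGroupEnd());

  CHECK_CUDA(cudaEventRecord(stop, streams[0]));
  CHECK_CUDA(cudaEventSynchronize(stop));
  float ms = 0;
  CHECK_CUDA(cudaEventElapsedTime(&ms, start, stop));
  printf("allreduce of %zu floats took %.3f ms\n", count, ms);

  for (int i = 0; i < nDev; i++) ncclCommDestroy(comms[i]);
  return 0;  // buffers/streams leaked for brevity; free them in real code
}
```

For a warmed-up number, run the collective a few times before timing and average over several iterations; the first call includes connection setup and is not representative of steady-state latency.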
Hi, I see both scenarios:
my custom code is slower than NCCL,
and the NCCL code also performs differently across servers, and across GPUs on the same server.
There could be many reasons for NCCL to perform non-uniformly, but the topology differences (the connectivity between the GPUs, and between GPUs and NICs) are probably the most common. nvidia-smi topo -m will give a quick overview of the topology within a node. NCCL can provide similar (though more detailed) output when run with NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,GRAPH.
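As a complement to nvidia-smi topo -m, here is a small sketch (my own illustration, not an NCCL utility) that uses the CUDA runtime to check which GPU pairs on a node support direct peer-to-peer access. Pairs without P2P typically fall back to slower paths through host memory, which is a common source of per-GPU latency differences.

```c
// Sketch: print whether each GPU pair on this node supports direct P2P access.
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess) {
    printf("failed to query device count\n");
    return 1;
  }
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      if (i == j) continue;
      int canAccess = 0;
      // Reports whether device i can directly access device j's memory.
      cudaDeviceCanAccessPeer(&canAccess, i, j);
      printf("GPU %d -> GPU %d: P2P %s\n", i, j, canAccess ? "yes" : "no");
    }
  }
  return 0;
}
```

If the P2P matrix differs between your "fast" and "slow" servers, or between GPU pairs on the same server, that usually lines up with what nvidia-smi topo -m and the NCCL_DEBUG=INFO output show about NVLink/PCIe connectivity.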