Hi,
When I was using GPUs for AI inference, I found that communication accounts for a large share of the latency (compared to the open-source NCCL code).
Even with the same H/W & S/W configuration, the latency differs across servers and across GPUs.
May I know if there are any tools I can use for packet capture or data analysis besides nccl-tests?
I'm not sure if I understand the scenario. Do you find your custom code to be slower/less predictable than NCCL, or do you find that the NCCL code performs differently on different servers/GPUs?
I don't think we have anything for packet capturing, but in terms of performance analysis, you can try the NVIDIA Nsight tools (e.g., Nsight Systems). NCCL 2.23 also added a new profiler plugin API that you can leverage for customized performance analysis.
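For a quick first-order measurement without any plugin, you can also time individual collectives yourself with CUDA events. Below is a minimal sketch (my own illustration, not the profiler plugin API): a single process drives all local GPUs through one communicator per device and times an in-place ncclAllReduce. The message size is an arbitrary assumption; adjust it to match your workload.

```c
// Minimal sketch: time an in-place ncclAllReduce across all local GPUs
// using CUDA events. Single process, one communicator per device.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
  printf("CUDA error %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); exit(1); } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
  printf("NCCL error %s:%d: %s\n", __FILE__, __LINE__, ncclGetErrorString(r)); exit(1); } } while (0)

int main() {
  int nDev = 0;
  CHECK_CUDA(cudaGetDeviceCount(&nDev));

  size_t count = 1 << 20;  // 1M floats (~4 MB); an assumption, tune to your message size
  ncclComm_t* comms = malloc(nDev * sizeof(ncclComm_t));
  float** buf = malloc(nDev * sizeof(float*));
  cudaStream_t* streams = malloc(nDev * sizeof(cudaStream_t));

  for (int i = 0; i < nDev; i++) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaMalloc((void**)&buf[i], count * sizeof(float)));
    CHECK_CUDA(cudaStreamCreate(&streams[i]));
  }
  // One communicator per local GPU; NULL device list means devices 0..nDev-1.
  CHECK_NCCL(ncclCommInitAll(comms, nDev, NULL));

  // Bracket the collective with CUDA events on device 0's stream.
  cudaEvent_t start, stop;
  CHECK_CUDA(cudaSetDevice(0));
  CHECK_CUDA(cudaEventCreate(&start));
  CHECK_CUDA(cudaEventCreate(&stop));
  CHECK_CUDA(cudaEventRecord(start, streams[0]));

  // Group the per-device calls so NCCL can launch them together.
  CHECK_NCCL(ncclGroupStart());
  for (int i = 0; i < nDev; i++)
    CHECK_NCCL(ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  CHECK_NCCL(ncclGroupEnd());

  CHECK_CUDA(cudaEventRecord(stop, streams[0]));
  CHECK_CUDA(cudaEventSynchronize(stop));
  float ms = 0;
  CHECK_CUDA(cudaEventElapsedTime(&ms, start, stop));
  printf("allreduce of %zu floats took %.3f ms\n", count, ms);

  for (int i = 0; i < nDev; i++) ncclCommDestroy(comms[i]);
  return 0;  // buffers/streams leaked for brevity; free them in real code
}
```

For a warmed-up number, run the collective a few times before timing and average over several iterations; the first call includes connection setup and is not representative of steady-state latency.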
Hi, I see both scenarios:
my custom code is slower than NCCL,
and the NCCL code also performs differently across servers, and across GPUs on the same server.
There could be many reasons for NCCL to perform non-uniformly, but the topology differences (the connectivity between the GPUs, and between GPUs and NICs) are probably the most common. nvidia-smi topo -m will give a quick overview of the topology within a node. NCCL can provide similar (though more detailed) output when run with NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,GRAPH.
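As a complement to nvidia-smi topo -m, here is a small sketch (my own illustration, not an NCCL utility) that uses the CUDA runtime to check which GPU pairs on a node support direct peer-to-peer access. Pairs without P2P typically fall back to slower paths through host memory, which is a common source of per-GPU latency differences.

```c
// Sketch: print whether each GPU pair on this node supports direct P2P access.
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess) {
    printf("failed to query device count\n");
    return 1;
  }
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      if (i == j) continue;
      int canAccess = 0;
      // Reports whether device i can directly access device j's memory.
      cudaDeviceCanAccessPeer(&canAccess, i, j);
      printf("GPU %d -> GPU %d: P2P %s\n", i, j, canAccess ? "yes" : "no");
    }
  }
  return 0;
}
```

If the P2P matrix differs between your "fast" and "slow" servers, or between GPU pairs on the same server, that usually lines up with what nvidia-smi topo -m and the NCCL_DEBUG=INFO output show about NVLink/PCIe connectivity.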