Add NCCL collective sequence number (seq_num) to Kineto profiler traces #1294
Closed
mdlogic wants to merge 1 commit into pytorch:main
mdlogic added a commit to mdlogic/kineto that referenced this pull request Mar 10, 2026
…es (pytorch#1294)

Summary:
X-link: pytorch/pytorch#177080

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Differential Revision: D94566477
0885e94 to 1bf350e
mdlogic added a commit to mdlogic/HolisticTraceAnalysis that referenced this pull request Mar 10, 2026

Summary:
X-link: pytorch/pytorch#177080
X-link: pytorch/kineto#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Differential Revision: D94566477
mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 10, 2026
…es (pytorch#177080)

Summary:
X-link: pytorch/kineto#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Test Plan:
* Added new tests:
```
[marvindz@devvm12146.ncg0 /data/repos/fbsource (e3e89a12f0)]$ buck2 test fbcode//mode/dev-nosan fbcode//caffe2/test/distributed/fb:test_c10d_nccl_seq_num_trace
File changed: fbcode//caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
File changed: fbsource//xplat/caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
Buck UI: https://www.internalfb.com/buck2/56eecce1-7093-4f2f-841d-d9215077e212
Test UI: https://www.internalfb.com/intern/testinfra/testrun/17732923687903807
Network: Up: 0B Down: 2.6KiB (reSessionID-fbf39a12-a6aa-4a8b-a5f2-74f91464714d)
Executing actions. Remaining 0/2 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 1:55.8s
Tests finished: Pass 2. Fail 0. Timeout 0. Fatal 0. Skip 0. Omit 0. Infra Failure 0. Build failure 0
[marvindz@devvm12146.ncg0 /data/repos/fbsource (bf2aaca219)]$
```

Differential Revision: D94566477
…es (pytorch#1294)

Summary:
X-link: pytorch/pytorch#177080

Pull Request resolved: pytorch#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Differential Revision: D94566477
mdlogic added a commit to mdlogic/HolisticTraceAnalysis that referenced this pull request Mar 10, 2026

Summary:
X-link: pytorch/pytorch#177080
X-link: pytorch/kineto#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Differential Revision: D94566477
1bf350e to 6396d35
mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 10, 2026
…es (pytorch#177080)

Summary:
Pull Request resolved: pytorch#177080

X-link: pytorch/kineto#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Test Plan:
* Added new tests:
```
[marvindz@devvm12146.ncg0 /data/repos/fbsource (e3e89a12f0)]$ buck2 test fbcode//mode/dev-nosan fbcode//caffe2/test/distributed/fb:test_c10d_nccl_seq_num_trace
File changed: fbcode//caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
File changed: fbsource//xplat/caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
Buck UI: https://www.internalfb.com/buck2/56eecce1-7093-4f2f-841d-d9215077e212
Test UI: https://www.internalfb.com/intern/testinfra/testrun/17732923687903807
Network: Up: 0B Down: 2.6KiB (reSessionID-fbf39a12-a6aa-4a8b-a5f2-74f91464714d)
Executing actions. Remaining 0/2 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 1:55.8s
Tests finished: Pass 2. Fail 0. Timeout 0. Fatal 0. Skip 0. Omit 0. Infra Failure 0. Build failure 0
[marvindz@devvm12146.ncg0 /data/repos/fbsource (bf2aaca219)]$
```

Differential Revision: D94566477
meta-codesync bot pushed a commit to facebookresearch/HolisticTraceAnalysis that referenced this pull request Mar 11, 2026
…es (#320)

Summary:
Pull Request resolved: #320

X-link: pytorch/pytorch#177080
X-link: pytorch/kineto#1294

Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).

Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column

Differential Revision: D94566477

fbshipit-source-id: 857cd6ceab9c380258bb2d74d711faa896d795f9
This pull request has been merged in a7c5f4d.
This pull request has been reverted by e2e7e97.
Summary:
Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.
This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).
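Changes:
- ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
  ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
  macros to populate them from the existing seq tuple.
- util.h: Add kSeqNum constant ("Seq")
- util.cpp: Emit seq_num in saveNcclMeta() when available
- default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
  new field into a seq_num column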
Differential Revision: D94566477
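
Illustrative only, not part of the diff: a minimal Python sketch of the cross-rank matching that seq_num is meant to enable. It assumes each rank's trace has already been loaded into a list of event dicts, that the sequence number appears in an event's args under the key "Seq" (following the kSeqNum constant above), and that the process group is stored under "Process Group Name". Both key names, the helper names match_collectives and start_skew, and the sample events are assumptions made for this example, not confirmed trace output.

```python
from collections import defaultdict

# Assumed arg keys. "Seq" follows the kSeqNum constant named in this PR;
# the process-group key is a guess and may differ in real traces.
PG_KEY = "Process Group Name"
SEQ_KEY = "Seq"


def match_collectives(events_by_rank, pg_key=PG_KEY, seq_key=SEQ_KEY):
    """Group events from all ranks by (process group, sequence number)."""
    matched = defaultdict(dict)
    for rank, events in events_by_rank.items():
        for event in events:
            args = event.get("args", {})
            if pg_key not in args or seq_key not in args:
                continue  # not a collective, or the trace carries no seq_num
            matched[(args[pg_key], args[seq_key])][rank] = event
    return dict(matched)


def start_skew(matched):
    """Spread between earliest and latest start timestamp per collective."""
    skew = {}
    for key, per_rank in matched.items():
        if len(per_rank) < 2:
            continue  # only one rank reported this collective
        starts = [event["ts"] for event in per_rank.values()]
        skew[key] = max(starts) - min(starts)
    return skew


# Hand-written sample events for two ranks (hypothetical, not real output).
sample = {
    0: [
        {"name": "allreduce", "ts": 100, "args": {PG_KEY: "0", SEQ_KEY: 1}},
        {"name": "allreduce", "ts": 300, "args": {PG_KEY: "0", SEQ_KEY: 2}},
    ],
    1: [
        {"name": "allreduce", "ts": 140, "args": {PG_KEY: "0", SEQ_KEY: 1}},
        {"name": "allreduce", "ts": 310, "args": {PG_KEY: "0", SEQ_KEY: 2}},
        {"name": "matmul", "ts": 50, "args": {}},
    ],
}

matched = match_collectives(sample)
print(start_skew(matched))
```

Keying on the pair (process group, seq_num) rather than on seq_num alone reflects the statement above that the number is only shared within a process group.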