Add NCCL collective sequence number (seq_num) to Kineto profiler traces#1294

Closed
mdlogic wants to merge 1 commit into pytorch:main from mdlogic:export-D94566477

@mdlogic mdlogic commented Mar 10, 2026

Summary:
Thread the per-process-group sequence number from ProcessGroupNCCL through
ParamCommsDebugInfo into the Kineto trace JSON output.

This enables cross-rank correlation of collective operations: all ranks
participating in the same collective instance share the same seq_num
within a process group. Without this, there is no way to match collective
events across ranks in production trace data (gpu_comms_events Scuba table).
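
A minimal sketch of the correlation this enables, not code from this PR. It assumes each rank's trace has already been loaded into a list of event dicts, that the sequence number appears in `args` under the `Seq` key named below, and that the process group is available under a key called `Process Group Name` (an assumption about the trace layout). `correlate_collectives` is a hypothetical helper name.

```python
from collections import defaultdict

# "Seq" is the constant named in this PR; the process group key is an
# assumption about the trace layout and may differ in real traces.
SEQ_KEY = "Seq"
PG_KEY = "Process Group Name"


def correlate_collectives(events_by_rank):
    """Group events from all ranks by (process group, seq_num).

    events_by_rank: dict mapping rank -> list of trace event dicts.
    Returns: dict mapping (pg_name, seq_num) -> {rank: event}.
    """
    matched = defaultdict(dict)
    for rank, events in events_by_rank.items():
        for event in events:
            args = event.get("args", {})
            # seq_num is only emitted when available, so skip events without it.
            if SEQ_KEY not in args or PG_KEY not in args:
                continue
            matched[(args[PG_KEY], args[SEQ_KEY])][rank] = event
    return dict(matched)


# Hypothetical two-rank input: the same all-reduce observed on both ranks.
events_by_rank = {
    0: [{"name": "nccl:all_reduce", "ts": 100, "args": {PG_KEY: "0", SEQ_KEY: 7}}],
    1: [{"name": "nccl:all_reduce", "ts": 130, "args": {PG_KEY: "0", SEQ_KEY: 7}}],
}
matched = correlate_collectives(events_by_rank)
```

The key includes the process group because seq_num is only shared within a process group. If a rank reports more than one event for the same key, this sketch keeps the last one.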

Changes:

  • ParamCommsUtils.hpp: Add sequenceNumber_/isP2P_ fields and setter to
    ParamCommsDebugInfo. Update RECORD_PARAM_COMMS and RECORD_PARAM_COMMS_DATA
    macros to populate them from the existing seq tuple.
  • util.h: Add kSeqNum constant ("Seq")
  • util.cpp: Emit seq_num in saveNcclMeta() when available
  • default_event_args.py (HTA): Add nccl::seq mapping so HTA parses the
    new field into a seq_num column
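
To illustrate the last bullet, here is a rough sketch of turning the raw `Seq` argument into a `seq_num` column. It mirrors the idea of the mapping only: `ARG_MAPPING`, `extract_columns`, and the `-1` default for events without a sequence number are all hypothetical and do not reflect HTA's actual data structures.

```python
# Hypothetical mapping: short name -> (raw args key, output column, default).
# The -1 default for a missing sequence number is an assumption.
ARG_MAPPING = {
    "nccl::seq": ("Seq", "seq_num", -1),
}


def extract_columns(event, mapping=ARG_MAPPING):
    """Return {column: value} for one trace event, using defaults for missing keys."""
    args = event.get("args", {})
    return {
        column: args.get(raw_key, default)
        for raw_key, column, default in mapping.values()
    }


# An event that carries the new field, and one that does not.
with_seq = extract_columns({"name": "nccl:all_reduce", "args": {"Seq": 42}})
without_seq = extract_columns({"name": "aten::add", "args": {}})
```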

Differential Revision: D94566477

@meta-cla meta-cla bot added the cla signed label Mar 10, 2026
mdlogic added a commit to mdlogic/kineto that referenced this pull request Mar 10, 2026
…es (pytorch#1294)

Summary:
X-link: pytorch/pytorch#177080


mdlogic added a commit to mdlogic/HolisticTraceAnalysis that referenced this pull request Mar 10, 2026
Summary:
X-link: pytorch/pytorch#177080

X-link: pytorch/kineto#1294

mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 10, 2026
…es (pytorch#177080)

Summary:

X-link: pytorch/kineto#1294

Test Plan:
* Added new tests:
```
[marvindz@devvm12146.ncg0 /data/repos/fbsource (e3e89a12f0)]$ buck2 test fbcode//mode/dev-nosan   fbcode//caffe2/test/distributed/fb:test_c10d_nccl_seq_num_trace
File changed: fbcode//caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
File changed: fbsource//xplat/caffe2/test/distributed/fb/test_c10d_nccl_seq_num_trace.py
Buck UI: https://www.internalfb.com/buck2/56eecce1-7093-4f2f-841d-d9215077e212
Test UI: https://www.internalfb.com/intern/testinfra/testrun/17732923687903807
Network: Up: 0B  Down: 2.6KiB  (reSessionID-fbf39a12-a6aa-4a8b-a5f2-74f91464714d)
Executing actions. Remaining     0/2                                                     0.1s exec time total
Command: test.     Finished 1 local                                                                                                                           
Time elapsed: 1:55.8s
Tests finished: Pass 2. Fail 0. Timeout 0. Fatal 0. Skip 0. Omit 0. Infra Failure 0. Build failure 0
[marvindz@devvm12146.ncg0 /data/repos/fbsource (bf2aaca219)]$ 
```

Differential Revision: D94566477
…es (pytorch#1294)

Summary:
X-link: pytorch/pytorch#177080

Pull Request resolved: pytorch#1294

mdlogic added a commit to mdlogic/HolisticTraceAnalysis that referenced this pull request Mar 10, 2026
Summary:
X-link: pytorch/pytorch#177080

X-link: pytorch/kineto#1294

meta-codesync bot commented Mar 10, 2026

@mdlogic has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94566477.

mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 10, 2026
…es (pytorch#177080)

Summary:
Pull Request resolved: pytorch#177080

X-link: pytorch/kineto#1294

meta-codesync bot pushed a commit to facebookresearch/HolisticTraceAnalysis that referenced this pull request Mar 11, 2026
…es (#320)

Summary:
Pull Request resolved: #320

X-link: pytorch/pytorch#177080

X-link: pytorch/kineto#1294

fbshipit-source-id: 857cd6ceab9c380258bb2d74d711faa896d795f9
@meta-codesync meta-codesync bot closed this in a7c5f4d Mar 11, 2026

meta-codesync bot commented Mar 11, 2026

This pull request has been merged in a7c5f4d.

@facebook-github-bot

This pull request has been reverted by e2e7e97.
