TraceLens is a Python library for automating the analysis of trace files and turning them into rich performance insights. Designed with simplicity and extensibility in mind, it provides tools that streamline profiling and debugging of complex distributed training and inference systems.
- **Hierarchical Performance Breakdowns** — Pinpoint bottlenecks with a top-down view, moving from the overall GPU timeline (idle/busy) to operator categories, individual operators, and right down to unique argument shapes.
- **Compute & Roofline Modeling** — Automatically translate raw timings into efficiency metrics like TFLOP/s and TB/s for popular operations. Determine whether an op is compute- or memory-bound and see how effectively your code uses the hardware (see the worked sketch after this list).
- **Multi-GPU Communication Analysis** — Accurately diagnose scaling issues by dissecting collective operations. TraceLens separates pure communication time from synchronization skew and calculates effective bandwidth on your real workload.
- **Trace Comparison** — Quantify the impact of your changes with powerful trace diffing. By analyzing performance at the CPU dispatch level, TraceLens enables meaningful side-by-side comparisons across different hardware and software versions.
- **Event Replay** — Isolate any operation for focused debugging. TraceLens generates minimal, self-contained replay scripts directly from trace metadata, making it simple to share IP-safe test cases with kernel developers.
- **Extensible SDK** — Get started instantly with ready-to-use scripts, then build your own custom workflows using a flexible and hackable Python API.
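To make the roofline idea concrete, here is a minimal worked example of the arithmetic behind compute- vs. memory-bound classification for a GEMM. The peak numbers and the helper function are illustrative placeholders for this sketch, not part of the TraceLens API or its defaults.

```python
# Illustrative roofline arithmetic for a GEMM C[M,N] = A[M,K] @ B[K,N] (fp16 inputs).
# PEAK_* values are hypothetical placeholders for your GPU, not TraceLens defaults.
PEAK_TFLOPS = 200.0   # peak fp16 TFLOP/s (hypothetical)
PEAK_TBPS = 1.6       # peak HBM bandwidth in TB/s (hypothetical)

def gemm_roofline(M, N, K, time_us, bytes_per_elem=2):
    flops = 2 * M * N * K                                   # multiply-accumulate count
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)  # read A and B, write C
    tflops = flops / (time_us * 1e-6) / 1e12                # achieved TFLOP/s
    tbps = bytes_moved / (time_us * 1e-6) / 1e12            # achieved TB/s
    intensity = flops / bytes_moved                          # FLOP per byte
    machine_balance = (PEAK_TFLOPS * 1e12) / (PEAK_TBPS * 1e12)
    bound = "compute-bound" if intensity > machine_balance else "memory-bound"
    return tflops, tbps, bound

print(gemm_roofline(M=4096, N=4096, K=4096, time_us=900))
```

TraceLens performs this kind of translation automatically per operator; the sketch only shows where numbers like TFLOP/s, TB/s, and the compute-/memory-bound label come from.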
```bash
pip install git+https://github.com/AMD-AGI/TraceLens.git
```

```bash
TraceLens_generate_perf_report_pytorch --profile_json_path path/to/your/trace.json
```

This produces an Excel workbook with GPU timeline breakdown, ops summary, roofline metrics, and more. See Performance Report Column Definitions for what each column means.
Don't have a trace yet? Follow the PyTorch profiling guide, or use the demo traces bundled in the repo.
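For reference, a trace suitable for the command above can be captured with the standard torch.profiler API and exported as Chrome-trace JSON. This is a minimal sketch (assuming a CUDA/ROCm-capable GPU is available), not a TraceLens helper — see the PyTorch profiling guide for the recommended settings.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

# Capture CPU + GPU activity with shapes recorded so per-shape breakdowns
# and roofline metrics can be computed from the trace.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
    with_stack=True,
) as prof:
    for _ in range(10):
        model(x)
    torch.cuda.synchronize()

# Export the Chrome-trace JSON consumed by TraceLens_generate_perf_report_pytorch.
prof.export_chrome_trace("trace.json")
```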
For the full CLI reference (JAX, rocprofv3, trace comparison, multi-rank collectives), see Supported Profile Formats below.
Hands-on notebooks that walk through the core TraceLens features:
| Example | What it covers |
|---|---|
| Trace2Tree | Navigate the hierarchical event tree — link Python ops, CPU dispatches, and GPU kernels |
| TreePerf | GPU timeline breakdown, per-op performance, and roofline metrics via the SDK |
| NN Module View | See GPU time broken down by nn.Module — useful for model developers |
| NCCL Analyser | Multi-rank collective analysis: latency, bandwidth, skew |
| Trace Diff | Morphological comparison of two trace trees to pinpoint structural divergences |
| Event Replay | Extract and replay operations for isolated debugging |
| Trace Fusion | Merge multi-rank PyTorch traces into a single file for Perfetto visualization |
| Roofline Plots | Build roofline-style visualizations for specific operators |
| JAX NCCL Analyser | Bandwidth analysis for JAX collective operations from XPlane traces |
For community-contributed utilities — including interactive trace dashboards (traceMap), roofline analysis tooling, and a Streamlit UI — see examples/custom_workflows/.
| Format | Tool | Documentation |
|---|---|---|
| PyTorch | torch.profiler | docs/generate_perf_report.md |
| JAX | XPlane protobuf | docs/jax_analyses.md |
| rocprofv3 JSON | AMD ROCm rocprofiler-sdk | docs/generate_perf_report_rocprof.md |
| rocprofv3 pftrace | Perfetto-style | docs/generate_perf_report_rocprof_pftrace.md |
```bash
TraceLens_generate_perf_report_pytorch --profile_json_path path/to/your/trace.json
```

Detailed docs: generate_perf_report.md. Supports compressed traces (.zip, .gz).
```bash
TraceLens_compare_perf_reports_pytorch \
    baseline.xlsx candidate.xlsx \
    --names baseline candidate \
    --sheets all \
    -o comparison.xlsx
```

Detailed docs: compare_perf_reports_pytorch.md.
```bash
TraceLens_generate_multi_rank_collective_report_pytorch \
    --trace_dir /path/to/traces \
    --world_size 8
```

Detailed docs: generate_multi_rank_collective_report_pytorch.md.
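For intuition on the bandwidth figures in such a collective report, the widely used ring all-reduce bus-bandwidth arithmetic (the NCCL-tests convention) is sketched below. This is background math under that convention, not necessarily TraceLens's exact formula — consult the NCCL Analyser docs for the definitions TraceLens reports.

```python
def allreduce_bandwidth(bytes_per_rank, time_s, world_size):
    """Algorithm vs. bus bandwidth for a ring all-reduce (NCCL-tests convention)."""
    alg_bw = bytes_per_rank / time_s                      # bytes/s as seen by the caller
    bus_bw = alg_bw * 2 * (world_size - 1) / world_size   # traffic actually on the links
    return alg_bw / 1e9, bus_bw / 1e9                     # GB/s

# Example: a 256 MiB all-reduce across 8 GPUs finishing in 5 ms.
print(allreduce_bandwidth(256 * 2**20, 5e-3, 8))
```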
For `*_results.json` files from rocprofv3:
```bash
TraceLens_generate_perf_report_rocprof \
    --profile_json_path trace_results.json \
    --short_kernel_study --kernel_details
```

Detailed docs: generate_perf_report_rocprof.md.
For Perfetto-style traces (e.g. `rocprofv3 --output-format pftrace`):
```bash
# Record a pftrace
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace --rccl-trace \
    --output-format pftrace -d ./v3_traces -- python3 your_app.py

# Activity report (NSYS-style category summary, optional Markdown)
TraceLens_generate_perf_report_pftrace_hip_activity --trace_path sample.pftrace --write_md

# API↔Kernel report (latency breakdown T = A + Q + K)
TraceLens_generate_perf_report_pftrace_hip_api --trace_path sample.pftrace

# Memory copy report
TraceLens_generate_perf_report_pftrace_memory_copy --trace_path sample.pftrace
```

`.pftrace` is Perfetto's binary format and needs to be converted to JSON for parsing. `traceconv` (a Perfetto tool) is optional — if it is not on PATH, it is downloaded automatically. You can also pass `--traceconv /path/to/traceconv` explicitly.
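If you want to pre-convert a pftrace yourself (for example, to inspect the JSON), Perfetto's traceconv takes a mode, an input file, and an output file. The snippet below is a minimal sketch that shells out to a `traceconv` binary assumed to already be on PATH; TraceLens does not require this manual step.

```python
import subprocess

# Convert a Perfetto binary trace to legacy Chrome JSON with traceconv.
# Assumes a traceconv binary is on PATH; TraceLens can instead download it
# automatically or accept --traceconv /path/to/traceconv.
subprocess.run(
    ["traceconv", "json", "sample.pftrace", "sample.json"],
    check=True,
)
```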
Deeper dives on the core modules:
| Module | Doc |
|---|---|
| Trace2Tree | docs/Trace2Tree.md |
| TreePerf | docs/TreePerf.md |
| NCCL Analyser | docs/NcclAnalyser.md |
| TraceDiff | docs/TraceDiff.md |
| Event Replay | docs/EventReplay.md |
| TraceFusion | docs/TraceFusion.md |
| GPU Event Analyser | docs/gpu_event_analyser.md |
| JAX Analyses | docs/jax_analyses.md |
| pftrace Reports | docs/generate_perf_report_rocprof_pftrace.md |
| Performance Report Columns | docs/perf_report_columns.md |
```bash
git clone https://github.com/AMD-AGI/TraceLens.git && cd TraceLens
pip install -e .[dev]
python -m pytest tests/ -v
```

We welcome contributions across the entire project — new analysis modules, performance models, docs, examples, or bug fixes. Whether you're adding a new metric or building a custom workflow, the SDK is designed to make that easy.
Please see CONTRIBUTING.md for guidelines on branching, commit style, and project structure.