You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
runtime/pprof: add debug=3 goroutine profile with goid and labels
This adds a new goroutine profile debug mode, debug=3.
This mode emits in the same binary proto output format as debug=0, with
the only difference being that it does not aggregate matching stack/label
combinations into a single count, and instead mits a sample per goroutine
with additional synthesized labels to communicate some of the details of
each goroutine, specifically:
- go::goroutine: the goroutine's ID
- go::goroutine_created_by: ID of the goroutine's creator (if any)
- go::goroutine_state: current state of the goroutine (e.g. runnable)
- go::goroutine_wait_minutes: approximate minutes goroutine has spent
waiting (if applicable)
These are synthesized as labels, side-by-side with any user-set labels
thus the 'go::' prefix to namespace these synthetic labels somewhat.
We aim to minimize collisions with user-set labels as while the format
does allow for multi-valued labels, meaning a collision would not
immediately overwrite a user-set label, the upstream proto does warn
that not all tools handle multi-valued labels well, so the namespace
prefix aims to avoid them if possible. The form 'go::' is chosen as the
upstream proto reserves 'pprof::', which seems to establish this format
as a convention.
Previously the debug=2 mode was the only way to get this kind of
per-goroutine information that is sometimes vital to understanding the
state of a process. However debug=2 had two major drawbacks:
1) its collection incurs a potentially lengthy and disruptive
stop-the-world pause due to it not (and perhaps being unable to)
utilizing the concurrent collection mechanism used by debug=0/1, and
2) it does not (currently) include user-set labels in the same profile.
This new mode that uses the same concurrent collection mechanism as 0/1,
with its minimal STW penalty, but also includes the per-goroutine details
that make debug=2 so useful. Changing debug=2, which already included
this per-goroutine detail, in-place to use the concurrent collection
mechanism could avoid introducing a new mode, but it is not clear that
this is possible with breaking changes to its output: debug=2 includes
argument _values_ in its output would would require collection in both a
concurrent and consistent manner. Thus the use of a new mode instead.
The difference in service latency observed by running goroutines during
profile collection is demonstrated by BenchmarkGoroutineProfileLatencyImpact:
│ debug=2 │ debug=3 │
│ max_latency_ns │ max_latency_ns vs base │
goroutines=100x3-14 508.4k ± 17% 166.7k ± 45% -67.21% (p=0.002 n=6)
goroutines=100x10-14 727.4k ± 9% 176.3k ± 34% -75.76% (p=0.002 n=6)
goroutines=100x50-14 1979.1k ± 6% 201.7k ± 18% -89.81% (p=0.002 n=6)
goroutines=1000x3-14 6454.9k ± 3% 235.6k ± 36% -96.35% (p=0.002 n=6)
goroutines=1000x10-14 10620.8k ± 47% 259.8k ± 165% -97.55% (p=0.002 n=6)
goroutines=1000x50-14 35170.8k ± 1% 228.7k ± 108% -99.35% (p=0.002 n=6)
goroutines=10000x3-14 63.836M ± 49% 1.961M ± 470% -96.93% (p=0.002 n=6)
goroutines=10000x10-14 58.962M ± 82% 4.073M ± 134% -93.09% (p=0.002 n=6)
goroutines=10000x50-14 259.723M ± 33% 4.730M ± 91% -98.18% (p=0.002 n=6)
geomean 10.62M 527.0k -95.04%
0 commit comments