You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
runtime/pprof: add debug=3 goroutine profile with goid and labels
This adds a new goroutine profile debug mode, debug=3.
This mode emits in the same binary proto output format as debug=0, with
the only difference being that it does not aggregate matching
stack/label combinations into a single count, and instead emits a sample
per goroutine with additional synthesized labels to communicate some of
the details of each goroutine, specifically:
- go::goroutine: the goroutine's ID
- go::goroutine_created_by: ID of the goroutine's creator (if any)
- go::goroutine_state: current state of the goroutine (e.g. runnable)
- go::goroutine_wait_minutes: approximate minutes goroutine has spent
waiting (if applicable)
These are synthesized as labels, side-by-side with any user-set labels
thus the 'go::' prefix to namespace these synthetic labels somewhat.
We aim to minimize collisions with user-set labels as while the format
does allow for multi-valued labels, meaning a collision would not
immediately overwrite a user-set label, the upstream proto does warn
that not all tools handle multi-valued labels well, so the namespace
prefix aims to avoid them if possible. The form 'go::' is chosen as the
upstream proto reserves 'pprof::', which seems to establish this format
as a convention.
Previously the debug=2 mode was the only way to get this kind of
per-goroutine information that is sometimes vital to understanding the
state of a process. However debug=2 had two major drawbacks:
1) its collection incurs a potentially lengthy and disruptive
stop-the-world pause due to it not (and perhaps being unable to)
utilizing the concurrent collection mechanism used by debug=0/1, and
2) it does not (currently) include user-set labels in the same profile.
This new mode that uses the same concurrent collection mechanism as 0/1,
with its minimal STW penalty, but also includes the per-goroutine
details that make debug=2 so useful. Changing debug=2, which already
included this per-goroutine detail, in-place to use the concurrent
collection mechanism could avoid introducing a new mode, but it is not
clear that this is possible with breaking changes to its output: debug=2
includes argument _values_ in its output would require collection in
both a concurrent and consistent manner. Thus the use of a new mode
instead.
The difference in service latency observed by running goroutines during
profile collection is demonstrated by BenchmarkGoroutineProfileLatencyImpact:
│ debug=2 │ debug=3 │
│ max_latency_ns │ max_latency_ns vs base │
goroutines=100x3-14 422.2k ± 13% 190.3k ± 38% -54.93% (p=0.002 n=6)
goroutines=100x10-14 619.7k ± 10% 171.1k ± 43% -72.38% (p=0.002 n=6)
goroutines=100x50-14 1423.6k ± 7% 174.3k ± 44% -87.76% (p=0.002 n=6)
goroutines=1000x3-14 2424.8k ± 8% 298.6k ± 106% -87.68% (p=0.002 n=6)
goroutines=1000x10-14 7378.4k ± 2% 268.2k ± 146% -96.36% (p=0.002 n=6)
goroutines=1000x50-14 23372.5k ± 10% 330.1k ± 173% -98.59% (p=0.002 n=6)
goroutines=10000x3-14 42.802M ± 47% 1.991M ± 105% -95.35% (p=0.002 n=6)
goroutines=10000x10-14 36668.2k ± 95% 743.1k ± 72% -97.97% (p=0.002 n=6)
goroutines=10000x50-14 120639.1k ± 2% 188.2k ± 2582% -99.84% (p=0.002 n=6)
geomean 6.760M 326.2k -95.18%
0 commit comments