Skip to content

Commit ce667bc

Browse files
nickrobinson251vilterpkpamnanyNHDalyvchuravy
authored
Add per-Task cpu time metric (JuliaLang#56320) (#194)
* Add per-task metrics (JuliaLang#56320) Close JuliaLang#47351 (builds on top of JuliaLang#48416) Adds two per-task metrics: - running time = amount of time the task was actually running (according to our scheduler). Note: currently inclusive of GC time, but would be good to be able to separate that out (in a future PR) - wall time = amount of time between the scheduler becoming aware of this task and the task entering a terminal state (i.e. done or failed). We record running time in `wait()`, where the scheduler stops running the task as well as in `yield(t)`, `yieldto(t)` and `throwto(t)`, which bypass the scheduler. Other places where a task stops running (for `Channel`, `ReentrantLock`, `Event`, `Timer` and `Semaphore` are all implemented in terms of `wait(Condition)`, which in turn calls `wait()`. `LibuvStream` similarly calls `wait()`. This should capture everything (albeit, slightly over-counting task CPU time by including any enqueuing work done before we hit `wait()`). The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now i've just put them directly in `jl_task_t`. They are all atomic, except the `metrics_enabled` flag itself (which we now have to check on task start/switch/done even if metrics are not enabled) which is set on task construction and marked `const` on the julia side. In future PRs we could add more per-task metrics, e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially the number of yields. Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by `@time` (e.g. writing this same timing data to a ScopedValue like in JuliaLang#55103 but only for tasks lexically scoped to inside the `@time` block). Timings are off by default but can be turned on globally via starting Julia with `--task-metrics=yes` or calling `Base.Experimental.task_metrics(true)`. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings via `Base.Experimental.task_metrics` does not affect existing `Task`s, only new `Task`s. The other new APIs are `Base.Experimental.task_running_time_ns(::Task)` and `Base.Experimental.task_wall_time_ns(::Task)` for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). All these are in `Base.Experimental` to give us room to change up the APIs as we add more metrics in future PRs (without worrying about release timelines). cc @NHDaly @kpamnany @d-netto --------- Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: K Pamnany <[email protected]> Co-authored-by: Nathan Daly <[email protected]> Co-authored-by: Valentin Churavy <[email protected]> * Address review comments --------- Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: K Pamnany <[email protected]> Co-authored-by: Nathan Daly <[email protected]> Co-authored-by: Valentin Churavy <[email protected]>
1 parent 9344ba8 commit ce667bc

19 files changed

+551
-37
lines changed

base/boot.jl

+23-5
Original file line numberDiff line numberDiff line change
@@ -156,15 +156,33 @@
156156
#end
157157

158158
#mutable struct Task
159-
# parent::Task
159+
# next::Any
160+
# queue::Any
160161
# storage::Any
161-
# state::Symbol
162162
# donenotify::Any
163163
# result::Any
164-
# exception::Any
165-
# backtrace::Any
166-
# logstate::Any
164+
# scope::Any
167165
# code::Any
166+
# @atomic _state::UInt8
167+
# sticky::UInt8
168+
# priority::UInt16
169+
# @atomic _isexception::UInt8
170+
# pad00::UInt8
171+
# pad01::UInt8
172+
# pad02::UInt8
173+
# rngState0::UInt64
174+
# rngState1::UInt64
175+
# rngState2::UInt64
176+
# rngState3::UInt64
177+
# rngState4::UInt64
178+
# const metrics_enabled::Bool
179+
# pad10::UInt8
180+
# pad11::UInt8
181+
# pad12::UInt8
182+
# @atomic first_enqueued_at::UInt64
183+
# @atomic last_started_running_at::UInt64
184+
# @atomic running_time_ns::UInt64
185+
# @atomic finished_at::UInt64
168186
#end
169187

170188
export

base/experimental.jl

+74
Original file line numberDiff line numberDiff line change
@@ -368,4 +368,78 @@ adding them to the global method table.
368368
"""
369369
:@MethodTable
370370

371+
### Task metrics
372+
373+
"""
374+
Base.Experimental.task_metrics(::Bool)
375+
376+
Enable or disable the collection of per-task metrics.
377+
A `Task` created when `Base.Experimental.task_metrics(true)` is in effect will have
378+
[`Base.Experimental.task_running_time_ns`](@ref) and [`Base.Experimental.task_wall_time_ns`](@ref)
379+
timing information available.
380+
381+
!!! note
382+
Task metrics can be enabled at start-up via the `--task-metrics=yes` command line option.
383+
"""
384+
function task_metrics(b::Bool)
385+
if b
386+
ccall(:jl_task_metrics_enable, Cvoid, ())
387+
else
388+
ccall(:jl_task_metrics_disable, Cvoid, ())
389+
end
390+
return nothing
391+
end
392+
393+
"""
394+
Base.Experimental.task_running_time_ns(t::Task) -> Union{UInt64, Nothing}
395+
396+
Return the total nanoseconds that the task `t` has spent running.
397+
This metric is only updated when `t` yields or completes unless `t` is the current task, in
398+
which it will be updated continuously.
399+
See also [`Base.Experimental.task_wall_time_ns`](@ref).
400+
401+
Returns `nothing` if task timings are not enabled.
402+
See [`Base.Experimental.task_metrics`](@ref).
403+
404+
!!! note "This metric is from the Julia scheduler"
405+
A task may be running on an OS thread that is descheduled by the OS
406+
scheduler, this time still counts towards the metric.
407+
408+
!!! compat "Julia 1.12"
409+
This method was added in Julia 1.12.
410+
"""
411+
function task_running_time_ns(t::Task=current_task())
412+
t.metrics_enabled || return nothing
413+
if t == current_task()
414+
# These metrics fields can't update while we're running.
415+
# But since we're running we need to include the time since we last started running!
416+
return t.running_time_ns + (time_ns() - t.last_started_running_at)
417+
else
418+
return t.running_time_ns
419+
end
371420
end
421+
422+
"""
423+
Base.Experimental.task_wall_time_ns(t::Task) -> Union{UInt64, Nothing}
424+
425+
Return the total nanoseconds that the task `t` was runnable.
426+
This is the time since the task first entered the run queue until the time at which it
427+
completed, or until the current time if the task has not yet completed.
428+
See also [`Base.Experimental.task_running_time_ns`](@ref).
429+
430+
Returns `nothing` if task timings are not enabled.
431+
See [`Base.Experimental.task_metrics`](@ref).
432+
433+
!!! compat "Julia 1.12"
434+
This method was added in Julia 1.12.
435+
"""
436+
function task_wall_time_ns(t::Task=current_task())
437+
t.metrics_enabled || return nothing
438+
start_at = t.first_enqueued_at
439+
start_at == 0 && return UInt64(0)
440+
end_at = t.finished_at
441+
end_at == 0 && return time_ns() - start_at
442+
return end_at - start_at
443+
end
444+
445+
end # module

base/options.jl

+1
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,7 @@ struct JLOptions
6060
heap_size_hint::UInt64
6161
trace_compile_timing::Int8
6262
safe_crash_log_file::Ptr{UInt8}
63+
task_metrics::Int8
6364
end
6465

6566
# This runs early in the sysimage != is not defined yet

base/task.jl

+50-3
Original file line numberDiff line numberDiff line change
@@ -810,7 +810,11 @@ function enq_work(t::Task)
810810
return t
811811
end
812812

813-
schedule(t::Task) = enq_work(t)
813+
function schedule(t::Task)
814+
# [task] created -scheduled-> wait_time
815+
maybe_record_enqueued!(t)
816+
enq_work(t)
817+
end
814818

815819
"""
816820
schedule(t::Task, [val]; error=false)
@@ -857,6 +861,8 @@ function schedule(t::Task, @nospecialize(arg); error=false)
857861
t.queue === nothing || Base.error("schedule: Task not runnable")
858862
setfield!(t, :result, arg)
859863
end
864+
# [task] created -scheduled-> wait_time
865+
maybe_record_enqueued!(t)
860866
enq_work(t)
861867
return t
862868
end
@@ -888,9 +894,15 @@ A fast, unfair-scheduling version of `schedule(t, arg); yield()` which
888894
immediately yields to `t` before calling the scheduler.
889895
"""
890896
function yield(t::Task, @nospecialize(x=nothing))
891-
(t._state === task_state_runnable && t.queue === nothing) || error("yield: Task not runnable")
897+
ct = current_task()
898+
t === ct && throw(ConcurrencyViolationError("Cannot yield to currently running task!"))
899+
(t._state === task_state_runnable && t.queue === nothing) || throw(ConcurrencyViolationError("yield: Task not runnable"))
900+
# [task] user_time -yield-> wait_time
901+
record_running_time!(ct)
902+
# [task] created -scheduled-> wait_time
903+
maybe_record_enqueued!(t)
892904
t.result = x
893-
enq_work(current_task())
905+
enq_work(ct)
894906
set_next_task(t)
895907
return try_yieldto(ensure_rescheduled)
896908
end
@@ -904,13 +916,18 @@ call to `yieldto`. This is a low-level call that only switches tasks, not consid
904916
or scheduling in any way. Its use is discouraged.
905917
"""
906918
function yieldto(t::Task, @nospecialize(x=nothing))
919+
ct = current_task()
907920
# TODO: these are legacy behaviors; these should perhaps be a scheduler
908921
# state error instead.
909922
if t._state === task_state_done
910923
return x
911924
elseif t._state === task_state_failed
912925
throw(t.result)
913926
end
927+
# [task] user_time -yield-> wait_time
928+
record_running_time!(ct)
929+
# [task] created -scheduled-unfairly-> wait_time
930+
maybe_record_enqueued!(t)
914931
t.result = x
915932
set_next_task(t)
916933
return try_yieldto(identity)
@@ -924,6 +941,10 @@ function try_yieldto(undo)
924941
rethrow()
925942
end
926943
ct = current_task()
944+
# [task] wait_time -(re)started-> user_time
945+
if ct.metrics_enabled
946+
@atomic :monotonic ct.last_started_running_at = time_ns()
947+
end
927948
if ct._isexception
928949
exc = ct.result
929950
ct.result = nothing
@@ -937,6 +958,11 @@ end
937958

938959
# yield to a task, throwing an exception in it
939960
function throwto(t::Task, @nospecialize exc)
961+
ct = current_task()
962+
# [task] user_time -yield-> wait_time
963+
record_running_time!(ct)
964+
# [task] created -scheduled-unfairly-> wait_time
965+
maybe_record_enqueued!(t)
940966
t.result = exc
941967
t._isexception = true
942968
set_next_task(t)
@@ -989,6 +1015,9 @@ checktaskempty = Partr.multiq_check_empty
9891015
end
9901016

9911017
function wait()
1018+
ct = current_task()
1019+
# [task] user_time -yield-or-done-> wait_time
1020+
record_running_time!(ct)
9921021
GC.safepoint()
9931022
W = workqueue_for(Threads.threadid())
9941023
poptask(W)
@@ -1003,3 +1032,21 @@ if Sys.iswindows()
10031032
else
10041033
pause() = ccall(:pause, Cvoid, ())
10051034
end
1035+
1036+
# update the `running_time_ns` field of `t` to include the time since it last started running.
1037+
function record_running_time!(t::Task)
1038+
if t.metrics_enabled && !istaskdone(t)
1039+
@atomic :monotonic t.running_time_ns += time_ns() - t.last_started_running_at
1040+
end
1041+
return t
1042+
end
1043+
1044+
# if this is the first time `t` has been added to the run queue
1045+
# (or the first time it has been unfairly yielded to without being added to the run queue)
1046+
# then set the `first_enqueued_at` field to the current time.
1047+
function maybe_record_enqueued!(t::Task)
1048+
if t.metrics_enabled && t.first_enqueued_at == 0
1049+
@atomic :monotonic t.first_enqueued_at = time_ns()
1050+
end
1051+
return t
1052+
end

doc/man/julia.1

+4
Original file line numberDiff line numberDiff line change
@@ -270,6 +270,10 @@ Print precompile statements for methods compiled during execution or save to a p
270270
--trace-compile-timing=
271271
If --trace-compile is enabled show how long each took to compile in ms
272272

273+
.TP
274+
--task-metrics={yes|no*}
275+
Enable the collection of per-task metrics.
276+
273277
.TP
274278
-image-codegen
275279
Force generate code in imaging mode

doc/src/base/multi-threading.md

+8
Original file line numberDiff line numberDiff line change
@@ -71,3 +71,11 @@ These building blocks are used to create the regular synchronization objects.
7171
```@docs
7272
Base.Threads.SpinLock
7373
```
74+
75+
## Task metrics (Experimental)
76+
77+
```@docs
78+
Base.Experimental.task_metrics
79+
Base.Experimental.task_running_time_ns
80+
Base.Experimental.task_wall_time_ns
81+
```

doc/src/manual/command-line-interface.md

+1
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,7 @@ The following is a complete list of command-line switches available when launchi
130130
|`--code-coverage=tracefile.info` |Append coverage information to the LCOV tracefile (filename supports format tokens).|
131131
|`--track-allocation[={none*\|user\|all}]` |Count bytes allocated by each source line (omitting setting is equivalent to "user")|
132132
|`--track-allocation=@<path>` |Count bytes but only in files that fall under the given file path/directory. The `@` prefix is required to select this option. A `@` with no path will track the current directory.|
133+
|`--task-metrics={yes\|no*}` |Enable the collection of per-task metrics|
133134
|`--bug-report=KIND` |Launch a bug report session. It can be used to start a REPL, run a script, or evaluate expressions. It first tries to use BugReporting.jl installed in current environment and falls back to the latest compatible BugReporting.jl if not. For more information, see `--bug-report=help`.|
134135
|`--compile={yes*\|no\|all\|min}` |Enable or disable JIT compiler, or request exhaustive or minimal compilation|
135136
|`--output-o <name>` |Generate an object file (including system image data)|

src/init.c

+4
Original file line numberDiff line numberDiff line change
@@ -851,6 +851,10 @@ JL_DLLEXPORT void julia_init(JL_IMAGE_SEARCH rel)
851851
#if defined(_COMPILER_GCC_) && __GNUC__ >= 12
852852
#pragma GCC diagnostic ignored "-Wdangling-pointer"
853853
#endif
854+
if (jl_options.task_metrics == JL_OPTIONS_TASK_METRICS_ON) {
855+
// enable before creating the root task so it gets timings too.
856+
jl_atomic_fetch_add(&jl_task_metrics_enabled, 1);
857+
}
854858
// warning: this changes `jl_current_task`, so be careful not to call that from this function
855859
jl_task_t *ct = jl_init_root_task(ptls, stack_lo, stack_hi);
856860
#pragma GCC diagnostic pop

src/jlapi.c

+30
Original file line numberDiff line numberDiff line change
@@ -509,6 +509,36 @@ JL_DLLEXPORT uint64_t jl_cumulative_recompile_time_ns(void)
509509
return jl_atomic_load_relaxed(&jl_cumulative_recompile_time);
510510
}
511511

512+
/**
513+
* @brief Enable per-task timing.
514+
*/
515+
JL_DLLEXPORT void jl_task_metrics_enable(void)
516+
{
517+
// Increment the flag to allow reentrant callers.
518+
jl_atomic_fetch_add(&jl_task_metrics_enabled, 1);
519+
}
520+
521+
/**
522+
* @brief Disable per-task timing.
523+
*/
524+
JL_DLLEXPORT void jl_task_metrics_disable(void)
525+
{
526+
// Prevent decrementing the counter below zero
527+
uint8_t enabled = jl_atomic_load_relaxed(&jl_task_metrics_enabled);
528+
while (enabled > 0) {
529+
if (jl_atomic_cmpswap(&jl_task_metrics_enabled, &enabled, enabled-1))
530+
break;
531+
}
532+
}
533+
534+
/**
535+
* @brief Retrieve floating-point environment constants.
536+
*
537+
* Populates an array with constants related to the floating-point environment,
538+
* such as rounding modes and exception flags.
539+
*
540+
* @param ret An array of integers to be populated with floating-point environment constants.
541+
*/
512542
JL_DLLEXPORT void jl_get_fenv_consts(int *ret)
513543
{
514544
ret[0] = FE_INEXACT;

src/jloptions.c

+27-11
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,7 @@ JL_DLLEXPORT void jl_init_options(void)
9393
0, // heap-size-hint
9494
0, // trace_compile_timing
9595
NULL, // safe_crash_log_file
96+
0, // task_metrics
9697
};
9798
jl_options_initialized = 1;
9899
}
@@ -206,17 +207,22 @@ static const char opts_hidden[] =
206207
" --strip-metadata Remove docstrings and source location info from system image\n"
207208
" --strip-ir Remove IR (intermediate representation) of compiled functions\n\n"
208209

209-
// compiler debugging (see the devdocs for tips on using these options)
210-
" --output-unopt-bc <name> Generate unoptimized LLVM bitcode (.bc)\n"
211-
" --output-bc <name> Generate LLVM bitcode (.bc)\n"
212-
" --output-asm <name> Generate an assembly file (.s)\n"
213-
" --output-incremental={yes|no*}\n"
214-
" Generate an incremental output file (rather than complete)\n"
215-
" --trace-compile={stderr,name}\n"
216-
" Print precompile statements for methods compiled during execution or save to a path\n"
217-
" --trace-compile-timing If --trace-compile is enabled show how long each took to compile in ms\n"
218-
" --image-codegen Force generate code in imaging mode\n"
219-
" --permalloc-pkgimg={yes|no*} Copy the data section of package images into memory\n"
210+
// compiler debugging and experimental (see the devdocs for tips on using these options)
211+
" --experimental Enable the use of experimental (alpha) features\n"
212+
" --output-unopt-bc <name> Generate unoptimized LLVM bitcode (.bc)\n"
213+
" --output-bc <name> Generate LLVM bitcode (.bc)\n"
214+
" --output-asm <name> Generate an assembly file (.s)\n"
215+
" --output-incremental={yes|no*} Generate an incremental output file (rather than\n"
216+
" complete)\n"
217+
" --trace-compile={stderr|name} Print precompile statements for methods compiled\n"
218+
" during execution or save to stderr or a path. Methods that\n"
219+
" were recompiled are printed in yellow or with a trailing\n"
220+
" comment if color is not supported\n"
221+
" --trace-compile-timing If --trace-compile is enabled show how long each took to\n"
222+
" compile in ms\n"
223+
" --task-metrics={yes|no*} Enable collection of per-task timing data.\n"
224+
" --image-codegen Force generate code in imaging mode\n"
225+
" --permalloc-pkgimg={yes|no*} Copy the data section of package images into memory\n"
220226
;
221227

222228
JL_DLLEXPORT void jl_parse_opts(int *argcp, char ***argvp)
@@ -239,6 +245,7 @@ JL_DLLEXPORT void jl_parse_opts(int *argcp, char ***argvp)
239245
opt_trace_compile,
240246
opt_trace_compile_timing,
241247
opt_trace_dispatch,
248+
opt_task_metrics,
242249
opt_math_mode,
243250
opt_worker,
244251
opt_bind_to,
@@ -317,6 +324,7 @@ JL_DLLEXPORT void jl_parse_opts(int *argcp, char ***argvp)
317324
{ "trace-compile", required_argument, 0, opt_trace_compile },
318325
{ "trace-compile-timing", no_argument, 0, opt_trace_compile_timing },
319326
{ "trace-dispatch", required_argument, 0, opt_trace_dispatch },
327+
{ "task-metrics", required_argument, 0, opt_task_metrics },
320328
{ "math-mode", required_argument, 0, opt_math_mode },
321329
{ "handle-signals", required_argument, 0, opt_handle_signals },
322330
// hidden command line options
@@ -873,6 +881,14 @@ JL_DLLEXPORT void jl_parse_opts(int *argcp, char ***argvp)
873881
if (jl_options.safe_crash_log_file == NULL)
874882
jl_error("julia: failed to allocate memory for --safe-crash-log-file");
875883
break;
884+
case opt_task_metrics:
885+
if (!strcmp(optarg, "no"))
886+
jl_options.task_metrics = JL_OPTIONS_TASK_METRICS_OFF;
887+
else if (!strcmp(optarg, "yes"))
888+
jl_options.task_metrics = JL_OPTIONS_TASK_METRICS_ON;
889+
else
890+
jl_errorf("julia: invalid argument to --task-metrics={yes|no} (%s)", optarg);
891+
break;
876892
default:
877893
jl_errorf("julia: unhandled option -- %c\n"
878894
"This is a bug, please report it.", c);

src/jloptions.h

+1
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ typedef struct {
6464
uint64_t heap_size_hint;
6565
int8_t trace_compile_timing;
6666
const char *safe_crash_log_file;
67+
int8_t task_metrics;
6768
} jl_options_t;
6869

6970
#endif

0 commit comments

Comments
 (0)