Skip to content

Commit 27688c9

Browse files
staticfloatmaleadt
authored andcommitted
Instrument GC with memory profiler implementation
This adds C support for a memory profiler within the GC, tracking locations of allocations, deallocations, etc... It operates in a similar manner as the time profiler with single large buffers setup beforehand through an initialization function, reducing the need for expensive allocations while the program being measured is running. The memory profiler instruments the GC in all locations that the GC statistics themselves are being modified (e.g. `gc_num.allocd` and `gc_num.freed`) by introducing new helper functions `jl_gc_count_{allocd,freed,reallocd}()`. Those utility functions call the `jl_memprofile_track_{de,}alloc()` method to register an address, a size and a tag with the memory profiler. We also track type information as this can be critically helpful when debugging, and to do so without breaking API guarantees we insert methods to set the type of a chunk of memory after allocating it where necessary. The tagging system allows the memory profiler to disambiguate, at profile time, between e.g. pooled allocations and the "big" allocator. It also allows the memory allocator to support tracking multiple "memory domains", e.g. a GPU support package could manually call `jl_memprofile_track_alloc()` any time a chunk of memory is allocated on the GPU so as to use the same system. By default, all values are tracked, however one can set a `memprof_tag_filter` value to track only the values you are most interested in. (E.g. only CPU domain big allocations) To disambiguate the memory and time profilers, we split them out into separate modules.
1 parent c47f8a3 commit 27688c9

24 files changed

+1887
-1029
lines changed

base/error.jl

Lines changed: 0 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -85,31 +85,6 @@ function _reformat_bt(bt, bt2)
8585
ret
8686
end
8787

88-
# Same as above, but inspects a single, stacked array to parse out interepreter frames
89-
function _reformat_bt(bt)
90-
get_ip(frame::Ptr) = UInt(frame)
91-
get_ip(frame::UInt) = frame
92-
get_ptr(frame::UInt) = Ptr{Cvoid}(frame)
93-
get_ptr(frame::Ptr) = frame
94-
95-
ret = Vector{Union{InterpreterIP,Ptr{Cvoid},UInt}}()
96-
i = 1
97-
while i <= length(bt)
98-
# Find all interpreter frames
99-
if get_ip(bt[i]) == (-1)%UInt
100-
push!(ret, InterpreterIP(
101-
unsafe_pointer_to_objref(get_ptr(bt[i+1])),
102-
bt[i+2])
103-
)
104-
i += 3
105-
else
106-
push!(ret, bt[i])
107-
i += 1
108-
end
109-
end
110-
return ret
111-
end
112-
11388
"""
11489
backtrace()
11590

doc/src/manual/profile.md

Lines changed: 21 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -53,15 +53,16 @@ Now we're ready to profile this function:
5353

5454
```julia-repl
5555
julia> using Profile
56+
julia> using Profile: Time
5657
57-
julia> @profile myfunc()
58+
julia> Time.@profile myfunc()
5859
```
5960

6061
To see the profiling results, there is a [graphical browser](https://github.com/timholy/ProfileView.jl)
6162
available, but here we'll use the text-based display that comes with the standard library:
6263

6364
```julia-repl
64-
julia> Profile.print()
65+
julia> Profile.Time.print()
6566
80 ./event.jl:73; (::Base.REPL.##1#2{Base.REPL.REPLBackend})()
6667
80 ./REPL.jl:97; macro expansion
6768
80 ./REPL.jl:66; eval_user_input(::Any, ::Base.REPL.REPLBackend)
@@ -93,7 +94,7 @@ In this example, we can see that the top level function called is in the file `e
9394
function that runs the REPL when you launch Julia. If you examine line 97 of `REPL.jl`,
9495
you'll see this is where the function `eval_user_input()` is called. This is the function that evaluates
9596
what you type at the REPL, and since we're working interactively these functions were invoked
96-
when we entered `@profile myfunc()`. The next line reflects actions taken in the [`@profile`](@ref)
97+
when we entered `Time.@profile myfunc()`. The next line reflects actions taken in the [`Profile.Time.@profile`](@ref)
9798
macro.
9899

99100
The first line shows that 80 backtraces were taken at line 73 of `event.jl`, but it's not that
@@ -128,9 +129,9 @@ as finding the maximum element. We could increase our confidence in this result
128129
collecting more samples:
129130

130131
```julia-repl
131-
julia> @profile (for i = 1:100; myfunc(); end)
132+
julia> Profile.Time.@profile (for i = 1:100; myfunc(); end)
132133
133-
julia> Profile.print()
134+
julia> Profile.Time.print()
134135
[....]
135136
3821 ./REPL[1]:2; myfunc()
136137
3511 ./random.jl:431; rand!(::MersenneTwister, ::Array{Float64,3}, ::Int64, ::Type...
@@ -152,7 +153,7 @@ This illustrates the default "tree" dump; an alternative is the "flat" dump, whi
152153
counts independent of their nesting:
153154

154155
```julia-repl
155-
julia> Profile.print(format=:flat)
156+
julia> Profile.Time.print(format=:flat)
156157
Count File Line Function
157158
6714 ./<missing> -1 anonymous
158159
6714 ./REPL.jl 66 eval_user_input(::Any, ::Base.REPL.REPLBackend)
@@ -200,13 +201,13 @@ useful way to view the results.
200201

201202
## Accumulation and clearing
202203

203-
Results from [`@profile`](@ref) accumulate in a buffer; if you run multiple pieces of code under
204-
[`@profile`](@ref), then [`Profile.print()`](@ref) will show you the combined results. This can
205-
be very useful, but sometimes you want to start fresh; you can do so with [`Profile.clear()`](@ref).
204+
Results from [`Profile.Time.@profile`](@ref) accumulate in a buffer; if you run multiple pieces of code under
205+
[`Profile.Time.@profile`](@ref), then [`Profile.Time.print()`](@ref) will show you the combined results. This can
206+
be very useful, but sometimes you want to start fresh; you can do so with [`Profile.Time.clear()`](@ref).
206207

207208
## Options for controlling the display of profile results
208209

209-
[`Profile.print`](@ref) has more options than we've described so far. Let's see the full declaration:
210+
[`Profile.Time.print`](@ref) has more options than we've described so far. Let's see the full declaration:
210211

211212
```julia
212213
function print(io::IO = stdout, data = fetch(); kwargs...)
@@ -216,15 +217,15 @@ Let's first discuss the two positional arguments, and later the keyword argument
216217

217218
* `io` -- Allows you to save the results to a buffer, e.g. a file, but the default is to print to `stdout`
218219
(the console).
219-
* `data` -- Contains the data you want to analyze; by default that is obtained from [`Profile.fetch()`](@ref),
220+
* `data` -- Contains the data you want to analyze; by default that is obtained from [`Profile.Time.fetch()`](@ref),
220221
which pulls out the backtraces from a pre-allocated buffer. For example, if you want to profile
221222
the profiler, you could say:
222223

223224
```julia
224-
data = copy(Profile.fetch())
225-
Profile.clear()
226-
@profile Profile.print(stdout, data) # Prints the previous results
227-
Profile.print() # Prints results from Profile.print()
225+
data = copy(Profile.Time.fetch())
226+
Profile.Time.clear()
227+
Profile.Time.@profile Profile.Time.print(stdout, data) # Prints the previous results
228+
Profile.Time.print() # Prints results from Profile.Time.print()
228229
```
229230

230231
The keyword arguments can be any combination of:
@@ -233,7 +234,7 @@ The keyword arguments can be any combination of:
233234
with (default, `:tree`) or without (`:flat`) indentation indicating tree
234235
structure.
235236
* `C` -- If `true`, backtraces from C and Fortran code are shown (normally they are excluded). Try running the introductory
236-
example with `Profile.print(C = true)`. This can be extremely helpful in deciding whether it's
237+
example with `Profile.Time.print(C = true)`. This can be extremely helpful in deciding whether it's
237238
Julia code or C code that is causing a bottleneck; setting `C = true` also improves the interpretability
238239
of the nesting, at the cost of longer profile dumps.
239240
* `combine` -- Some lines of code contain multiple operations; for example, `s += A[i]` contains both an array
@@ -256,13 +257,13 @@ to save to a file using a wide `displaysize` in an [`IOContext`](@ref):
256257

257258
```julia
258259
open("/tmp/prof.txt", "w") do s
259-
Profile.print(IOContext(s, :displaysize => (24, 500)))
260+
Profile.Time.print(IOContext(s, :displaysize => (24, 500)))
260261
end
261262
```
262263

263264
## Configuration
264265

265-
[`@profile`](@ref) just accumulates backtraces, and the analysis happens when you call [`Profile.print()`](@ref).
266+
[`Profile.Time.@profile`](@ref) just accumulates backtraces, and the analysis happens when you call [`Profile.Time.print()`](@ref).
266267
For a long-running computation, it's entirely possible that the pre-allocated buffer for storing
267268
backtraces will be filled. If that happens, the backtraces stop but your computation continues.
268269
As a consequence, you may miss some important profiling data (you will get a warning when that
@@ -271,8 +272,8 @@ happens).
271272
You can obtain and configure the relevant parameters this way:
272273

273274
```julia
274-
Profile.init() # returns the current settings
275-
Profile.init(n = 10^7, delay = 0.01)
275+
Profile.Time.init() # returns the current settings
276+
Profile.Time.init(n = 10^7, delay = 0.01)
276277
```
277278

278279
`n` is the total number of instruction pointers you can store, with a default value of `10^6`.

src/Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ RUNTIME_C_SRCS := \
4545
simplevector runtime_intrinsics precompile \
4646
threading partr stackwalk gc gc-debug gc-pages gc-stacks method \
4747
jlapi signal-handling safepoint jloptions timing subtype \
48-
crc32c
48+
crc32c memprofile
4949
RUNTIME_SRCS := APInt-C runtime_ccall processor rtutils $(RUNTIME_C_SRCS)
5050
SRCS := $(RUNTIME_SRCS)
5151

src/array.c

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -98,6 +98,7 @@ static jl_array_t *_new_array_(jl_value_t *atype, uint32_t ndims, size_t *dims,
9898
tsz += tot;
9999
tsz = JL_ARRAY_ALIGN(tsz, JL_SMALL_BYTE_ALIGNMENT); // align whole object
100100
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
101+
jl_memprofile_set_typeof(a, atype);
101102
// No allocation or safepoint allowed after this
102103
a->flags.how = 0;
103104
data = (char*)a + doffs;
@@ -107,12 +108,14 @@ static jl_array_t *_new_array_(jl_value_t *atype, uint32_t ndims, size_t *dims,
107108
else {
108109
tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object
109110
data = jl_gc_managed_malloc(tot);
111+
jl_memprofile_set_typeof(data, atype);
110112
// Allocate the Array **after** allocating the data
111113
// to make sure the array is still young
112114
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
113115
// No allocation or safepoint allowed after this
114116
a->flags.how = 2;
115117
jl_gc_track_malloced_array(ptls, a);
118+
jl_memprofile_set_typeof(a, atype);
116119
if (!isunboxed || isunion)
117120
// need to zero out isbits union array selector bytes to ensure a valid type index
118121
memset(data, 0, tot);
@@ -334,7 +337,9 @@ JL_DLLEXPORT jl_array_t *jl_ptr_to_array_1d(jl_value_t *atype, void *data,
334337
if (own_buffer) {
335338
a->flags.how = 2;
336339
jl_gc_track_malloced_array(ptls, a);
337-
jl_gc_count_allocd(nel*elsz + (elsz == 1 ? 1 : 0));
340+
jl_gc_count_allocd(a, nel*elsz + (elsz == 1 ? 1 : 0),
341+
JL_MEMPROF_TAG_DOMAIN_CPU | JL_MEMPROF_TAG_ALLOC_STDALLOC);
342+
jl_memprofile_set_typeof(a, atype);
338343
}
339344
else {
340345
a->flags.how = 0;
@@ -401,7 +406,9 @@ JL_DLLEXPORT jl_array_t *jl_ptr_to_array(jl_value_t *atype, void *data,
401406
if (own_buffer) {
402407
a->flags.how = 2;
403408
jl_gc_track_malloced_array(ptls, a);
404-
jl_gc_count_allocd(nel*elsz + (elsz == 1 ? 1 : 0));
409+
jl_gc_count_allocd(a, nel*elsz + (elsz == 1 ? 1 : 0),
410+
JL_MEMPROF_TAG_DOMAIN_CPU | JL_MEMPROF_TAG_ALLOC_STDALLOC);
411+
jl_memprofile_set_typeof(a, atype);
405412
}
406413
else {
407414
a->flags.how = 0;
@@ -669,6 +676,7 @@ static int NOINLINE array_resize_buffer(jl_array_t *a, size_t newlen)
669676
a->flags.how = 1;
670677
jl_gc_wb_buf(a, a->data, nbytes);
671678
}
679+
jl_memprofile_set_typeof(a->data, jl_typeof(a));
672680
}
673681
if (JL_ARRAY_IMPL_NUL && elsz == 1 && !isbitsunion)
674682
memset((char*)a->data + oldnbytes - 1, 0, nbytes - oldnbytes + 1);

0 commit comments

Comments
 (0)