Commit 9c7b930

update performance
1 parent ab0283a commit 9c7b930

1 file changed: +77 -198 lines changed

PERFORMANCE.md

Lines changed: 77 additions & 198 deletions
@@ -80,219 +80,98 @@ The `malloc_cmp_test` build target will build 2 different versions of the test/t
 The following test was run in an Ubuntu 20.04.3 LTS (Focal Fossa) ARM64 Docker container with libc version 2.31-0ubuntu9.2 on a macOS host. The kernel used was `Linux f7f23ca7dc44 5.10.76-linuxkit`.

 ```
-Running IsoAlloc Performance Test
-
+IsoAlloc
 build/tests
-iso_alloc/iso_free 1441616 tests completed in 0.168293 seconds
-iso_calloc/iso_free 1441616 tests completed in 0.171274 seconds
-iso_realloc/iso_free 1441616 tests completed in 0.231350 seconds
-
-Running glibc/ptmalloc Performance Test
-
-malloc/free 1441616 tests completed in 0.166813 seconds
-calloc/free 1441616 tests completed in 0.223232 seconds
-realloc/free 1441616 tests completed in 0.306684 seconds
-
-Running jemalloc Performance Test
-
-LD_PRELOAD=/code/mimalloc-bench/extern/jemalloc/lib/libjemalloc.so build/malloc_tests
-malloc/free 1441616 tests completed in 0.064520 seconds
-calloc/free 1441616 tests completed in 0.178228 seconds
-realloc/free 1441616 tests completed in 0.271620 seconds
-
-Running mimalloc Performance Test
-
-LD_PRELOAD=/code/mimalloc-bench/extern/mimalloc/out/release/libmimalloc.so build/malloc_tests
-malloc/free 1441616 tests completed in 0.085471 seconds
-calloc/free 1441616 tests completed in 0.099644 seconds
-realloc/free 1441616 tests completed in 0.143821 seconds
-
-Running mimalloc-secure Performance Test
-
-LD_PRELOAD=/code/mimalloc-bench/extern/mimalloc/out/secure/libmimalloc-secure.so build/malloc_tests
-malloc/free 1441616 tests completed in 0.128479 seconds
-calloc/free 1441616 tests completed in 0.148797 seconds
-realloc/free 1441616 tests completed in 0.191719 seconds
-
-Running tcmalloc Performance Test
-
-LD_PRELOAD=/code/mimalloc-bench/extern/gperftools/.libs/libtcmalloc_minimal.so build/malloc_tests
-malloc/free 1441616 tests completed in 0.093779 seconds
-calloc/free 1441616 tests completed in 0.103634 seconds
-realloc/free 1441616 tests completed in 0.131152 seconds
-
-Running scudo Performance Test
-
-LD_PRELOAD=/code/mimalloc-bench/extern/scudo/compiler-rt/lib/scudo/standalone/libscudo.so build/malloc_tests
-malloc/free 1441616 tests completed in 0.227757 seconds
-calloc/free 1441616 tests completed in 0.204610 seconds
-realloc/free 1441616 tests completed in 0.258962 seconds
+iso_alloc/iso_free 1834784 tests completed in 0.081202 seconds
+iso_calloc/iso_free 1834784 tests completed in 1.041517 seconds
+iso_realloc/iso_free 1834784 tests completed in 0.828665 seconds
+
+jemalloc
+LD_PRELOAD=./libjemalloc.so build/tests
+iso_alloc/iso_free 1834784 tests completed in 0.084586 seconds
+iso_calloc/iso_free 1834784 tests completed in 1.461562 seconds
+iso_realloc/iso_free 1834784 tests completed in 0.779396 seconds
+
+scudo
+LD_PRELOAD=./libscudo.so build/malloc_tests
+malloc/free 1834784 tests completed in 0.717936 seconds
+calloc/free 1834784 tests completed in 2.706141 seconds
+realloc/free 1834784 tests completed in 1.840283 seconds
+
+system malloc
+malloc/free 1834784 tests completed in 0.662565 seconds
+calloc/free 1834784 tests completed in 2.728955 seconds
+realloc/free 1834784 tests completed in 1.943556 seconds

 ```
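The raw timings above are easier to compare as throughput. The following is a minimal sketch, not part of IsoAlloc, that parses lines in the `<name> <count> tests completed in <seconds> seconds` format shown above and prints tests per second. The helper name `parse_timings` and the `SAMPLE` string are hypothetical, introduced only for illustration.

```python
import re

# Matches lines such as:
#   iso_alloc/iso_free 1834784 tests completed in 0.081202 seconds
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<count>\d+) tests completed in (?P<secs>\d+\.\d+) seconds$"
)


def parse_timings(text):
    """Return {test name: (test count, seconds)} for every matching line."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match["name"]] = (int(match["count"]), float(match["secs"]))
    return results


# Sample input copied from the IsoAlloc run above.
SAMPLE = """\
iso_alloc/iso_free 1834784 tests completed in 0.081202 seconds
iso_calloc/iso_free 1834784 tests completed in 1.041517 seconds
iso_realloc/iso_free 1834784 tests completed in 0.828665 seconds
"""

if __name__ == "__main__":
    for name, (count, secs) in parse_timings(SAMPLE).items():
        # Report throughput in millions of tests per second.
        print(f"{name:22s} {count / secs / 1e6:8.2f} M tests/sec")
```

Feeding the output of each allocator run through the same function gives numbers that can be compared directly, since every run executes the same number of tests.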

-The same test run on an AWS t2.xlarge Ubuntu 20.04 instance with 4 `Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz` CPUs and 16 gb of memory:
-
-```
-Running IsoAlloc Performance Test
-
-iso_alloc/iso_free 1441616 tests completed in 0.147336 seconds
-iso_calloc/iso_free 1441616 tests completed in 0.161482 seconds
-iso_realloc/iso_free 1441616 tests completed in 0.244981 seconds
-
-Running glibc malloc Performance Test
-
-malloc/free 1441616 tests completed in 0.182437 seconds
-calloc/free 1441616 tests completed in 0.246065 seconds
-realloc/free 1441616 tests completed in 0.332292 seconds
-```
-
-Here is the same test as above on Mac OS 12.1
-
-```
-Running IsoAlloc Performance Test
-
-build/tests
-iso_alloc/iso_free 1441616 tests completed in 0.149818 seconds
-iso_calloc/iso_free 1441616 tests completed in 0.183772 seconds
-iso_realloc/iso_free 1441616 tests completed in 0.274413 seconds
-
-Running system malloc Performance Test
-
-build/malloc_tests
-malloc/free 1441616 tests completed in 0.084803 seconds
-calloc/free 1441616 tests completed in 0.194901 seconds
-realloc/free 1441616 tests completed in 0.240934 seconds
-```
-
 The same test can be run under the `perf` utility to measure basic statistics such as page faults and CPU utilization. The output below was captured on an ARM64 system, as the `armv8_pmuv3` counter names indicate.

 ```
-$ perf stat build/tests
+$ sudo perf stat build/tests

-iso_alloc/iso_free 1441616 tests completed in 0.416603 seconds
-iso_calloc/iso_free 1441616 tests completed in 0.575822 seconds
-iso_realloc/iso_free 1441616 tests completed in 0.679546 seconds
+iso_alloc/iso_free 1834784 tests completed in 0.075247 seconds
+iso_calloc/iso_free 1834784 tests completed in 1.100221 seconds
+iso_realloc/iso_free 1834784 tests completed in 0.901481 seconds

 Performance counter stats for 'build/tests':

-1709.07 msec task-clock # 1.000 CPUs utilized
-7 context-switches # 0.004 K/sec
-0 cpu-migrations # 0.000 K/sec
-145562 page-faults # 0.085 M/sec
-
-1.709414837 seconds time elapsed
-
-1.405068000 seconds user
-0.304239000 seconds sys
-
-$ perf stat build/malloc_tests
-
-malloc/free 1441616 tests completed in 0.359380 seconds
-calloc/free 1441616 tests completed in 0.569044 seconds
-realloc/free 1441616 tests completed in 0.597936 seconds
-
-Performance counter stats for 'build/malloc_tests':
-
-1528.51 msec task-clock # 1.000 CPUs utilized
-5 context-switches # 0.003 K/sec
-0 cpu-migrations # 0.000 K/sec
-433055 page-faults # 0.283 M/sec
-
-1.528795324 seconds time elapsed
-
-0.724352000 seconds user
-0.804371000 seconds sys
-
+2,082,958,624 task-clock # 0.874 CPUs utilized
+26 context-switches # 12.482 /sec
+0 cpu-migrations # 0.000 /sec
+576,000 page-faults # 276.530 K/sec
+<not counted> armv8_pmuv3_0/instructions/ (0.00%)
+10,522,826,144 armv8_pmuv3_1/instructions/ # 1.33 insn per cycle
+# 0.61 stalled cycles per insn
+<not counted> armv8_pmuv3_0/cycles/ (0.00%)
+7,910,463,871 armv8_pmuv3_1/cycles/ # 3.798 GHz
+<not counted> armv8_pmuv3_0/stalled-cycles-frontend/ (0.00%)
+149,093,549 armv8_pmuv3_1/stalled-cycles-frontend/ # 1.88% frontend cycles idle
+<not counted> armv8_pmuv3_0/stalled-cycles-backend/ (0.00%)
+6,432,153,136 armv8_pmuv3_1/stalled-cycles-backend/ # 81.31% backend cycles idle
+<not counted> armv8_pmuv3_0/branches/ (0.00%)
+1,914,734,216 armv8_pmuv3_1/branches/ # 919.238 M/sec
+<not counted> armv8_pmuv3_0/branch-misses/ (0.00%)
+3,870,559 armv8_pmuv3_1/branch-misses/ # 0.20% of all branches
+
+2.382450831 seconds time elapsed
+
+1.325971000 seconds user
+1.055181000 seconds sys
 ```
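The derived figures that `perf` prints after each `#` can be reproduced from the raw counters. The sketch below does that arithmetic for the newer `sudo perf stat build/tests` output above. It assumes the unlabeled `task-clock` value is in nanoseconds, which is consistent with the reported 0.874 CPUs utilized; the variable names are illustrative only.

```python
# Raw counters copied from the `sudo perf stat build/tests` output above.
task_clock_ns = 2_082_958_624      # assumption: task-clock is in nanoseconds
elapsed_s = 2.382450831            # "seconds time elapsed"
instructions = 10_522_826_144      # armv8_pmuv3_1/instructions/
cycles = 7_910_463_871             # armv8_pmuv3_1/cycles/
stalled_backend = 6_432_153_136    # armv8_pmuv3_1/stalled-cycles-backend/
page_faults = 576_000

task_clock_s = task_clock_ns / 1e9

# CPU time divided by wall-clock time.
cpus_utilized = task_clock_s / elapsed_s
# Instructions retired per cycle.
ipc = instructions / cycles
# Share of cycles in which the backend was stalled.
backend_idle_pct = 100 * stalled_backend / cycles
# Page faults per second of CPU time.
page_faults_per_sec = page_faults / task_clock_s
# Cycles per nanosecond is the effective clock in GHz.
clock_ghz = cycles / task_clock_ns

if __name__ == "__main__":
    print(f"CPUs utilized:        {cpus_utilized:.3f}")
    print(f"insn per cycle:       {ipc:.2f}")
    print(f"backend cycles idle:  {backend_idle_pct:.2f}%")
    print(f"page faults per sec:  {page_faults_per_sec:,.0f}")
    print(f"clock:                {clock_ghz:.3f} GHz")
```

With these counters the results round to 0.874 CPUs utilized, 1.33 instructions per cycle, 81.31% backend cycles idle, and 3.798 GHz, matching the annotations in the `perf` output.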

-The following benchmarks were collected from [mimalloc-bench](https://github.com/daanx/mimalloc-bench) with the default configuration of IsoAlloc. As you can see from the data IsoAlloc is competitive with jemalloc, tcmalloc, and glibc/ptmalloc for some benchmarks but clearly falls behind in the Redis benchmark. For any benchmark that IsoAlloc scores poorly on I was able to tweak its build to improve the CPU time and memory consumption. It's worth noting that IsoAlloc was able to stay competitive even with performing many security checks not present in other allocators. Please note these are 'best case' measurements, not averages.
+The following benchmarks were collected from [mimalloc-bench](https://github.com/daanx/mimalloc-bench) with the default configuration of IsoAlloc. The data shows that IsoAlloc (`iso`) is competitive with jemalloc (`je`) and mimalloc (`mi`) on some benchmarks, such as barnes and rocksdb, but falls clearly behind on others, most notably larsonN and redis. For every benchmark where IsoAlloc scored poorly, I was able to tweak its build to improve CPU time and memory consumption. IsoAlloc stays competitive on those benchmarks while performing many security checks not present in other allocators. Please note these are 'best case' measurements, not averages.

 ```
-# benchmark allocator elapsed rss user sys page-faults page-reclaims
-
-cfrac jemalloc 03.47 3948 3.46 0.00 0 422
-cfrac mimalloc 03.19 2688 3.18 0.00 0 337
-cfrac smimalloc 03.57 2860 3.56 0.00 0 375
-cfrac tcmalloc 03.25 7392 3.24 0.00 0 1325
-cfrac scudo 06.00 3920 5.99 0.00 0 561
-cfrac isoalloc 05.69 12920 5.58 0.10 0 3016
-
-espresso jemalloc 03.61 4508 3.56 0.00 5 553
-espresso mimalloc 03.43 3828 3.40 0.01 1 1299
-espresso smimalloc 03.65 5760 3.60 0.01 0 2682
-espresso tcmalloc 03.43 8132 3.39 0.01 0 1485
-espresso scudo 04.53 4028 4.49 0.01 0 514
-espresso isoalloc 04.59 48984 4.49 0.07 0 24276
-
-barnes jemalloc 01.93 59412 1.91 0.01 3 16646
-barnes mimalloc 01.91 57860 1.89 0.01 0 16539
-barnes smimalloc 01.98 57928 1.96 0.01 0 16557
-barnes tcmalloc 01.91 62664 1.89 0.01 0 17515
-barnes scudo 01.92 58940 1.91 0.01 0 16595
-barnes isoalloc 01.92 58328 1.91 0.01 0 16714
-
-redis jemalloc 5.019 31280 2.37 0.17 0 7268
-redis mimalloc 4.487 29204 2.07 0.20 0 6825
-redis smimalloc 4.909 30992 2.28 0.20 0 7410
-redis tcmalloc 4.675 37336 2.17 0.20 0 8682
-redis scudo 6.105 36968 2.85 0.23 0 8623
-redis iso 7.967 105332 3.48 0.54 0 112953
-
-cache-thrash1 jemalloc 01.28 3648 1.27 0.00 1 240
-cache-thrash1 mimalloc 01.28 3408 1.28 0.00 0 197
-cache-thrash1 smimalloc 01.28 3256 1.27 0.00 0 202
-cache-thrash1 tcmalloc 01.27 7100 1.26 0.00 0 1127
-cache-thrash1 scudo 01.27 3240 1.26 0.00 0 200
-cache-thrash1 isoalloc 01.26 3460 1.26 0.00 0 363
-
-cache-thrashN jemalloc 00.21 3936 1.64 0.00 0 360
-cache-thrashN mimalloc 00.21 3516 1.63 0.00 0 239
-cache-thrashN smimalloc 00.22 3584 1.68 0.01 0 249
-cache-thrashN tcmalloc 02.74 6992 20.36 0.00 0 1151
-cache-thrashN scudo 00.61 3164 2.53 0.00 0 237
-cache-thrashN isoalloc 00.71 4032 5.63 0.00 0 472
-
-larsonN jemalloc 4.892 84172 39.71 0.20 1 52478
-larsonN mimalloc 4.360 98504 39.61 0.17 0 26372
-larsonN smimalloc 6.546 105724 39.77 0.16 3 27432
-larsonN tcmalloc 4.450 63464 39.57 0.21 0 15299
-larsonN scudo 44.707 33104 28.92 4.80 0 7826
-larsonN isoalloc 249.791 63996 7.09 17.51 0 15567
-
-larsonN-sized jemalloc 4.872 84428 39.56 0.22 1 52874
-larsonN-sized mimalloc 4.335 95388 39.82 0.13 0 25625
-larsonN-sized smimalloc 6.332 106372 39.71 0.17 0 27642
-larsonN-sized tcmalloc 4.230 64956 39.59 0.15 0 15669
-larsonN-sized scudo 44.601 32900 28.68 4.65 0 7793
-larsonN-sized isoalloc 363.176 70240 39.59 0.29 0 17222
-
-mstressN jemalloc 00.92 139772 2.10 1.76 1 984466
-mstressN mimalloc 00.43 352132 1.56 0.15 0 88171
-mstressN smimalloc 00.62 352204 1.85 0.67 0 95538
-mstressN tcmalloc 00.51 147680 1.80 0.25 0 37111
-mstressN scudo 01.38 142068 3.23 1.63 0 616639
-mstressN isoalloc 03.11 225352 4.90 5.91 0 722991
-
-xmalloc-testN jemalloc 2.307 64460 25.13 5.14 1 22975
-xmalloc-testN mimalloc 0.513 82212 36.03 1.05 0 26689
-xmalloc-testN smimalloc 0.857 73504 36.58 1.05 0 28285
-xmalloc-testN tcmalloc 6.055 40824 9.31 18.77 0 9642
-xmalloc-testN scudo 13.416 56708 10.06 14.30 0 16560
-xmalloc-testN isoalloc 10.049 15268 7.19 20.91 0 3668
-
-glibc-simple jemalloc 01.96 2984 1.95 0.00 1 313
-glibc-simple mimalloc 01.50 1900 1.49 0.00 0 212
-glibc-simple smimalloc 01.77 2032 1.76 0.00 0 229
-glibc-simple tcmalloc 01.52 6880 1.52 0.00 0 1212
-glibc-simple scudo 04.58 2776 4.58 0.00 0 281
-glibc-simple isoalloc 04.21 10892 4.12 0.09 0 4674
-
-glibc-thread jemalloc 6.772 4160 15.98 0.00 1 457
-glibc-thread mimalloc 3.759 3320 15.98 0.00 0 585
-glibc-thread smimalloc 9.012 17144 15.89 0.02 0 4018
-glibc-thread tcmalloc 10.434 8508 15.99 0.00 0 1580
-glibc-thread scudo 80.979 4076 15.90 0.01 0 582
-glibc-thread isoalloc 374.692 2240 2.56 5.14 0 348
+#------------------------------------------------------------------
+# test    alloc  time     rss     user   sys   page-faults  page-reclaims
+cfrac     je     02.99    4912    2.99   0.00  0            454
+cfrac     mi     03.01    2484    3.00   0.00  0            346
+cfrac     iso    05.84    26616   5.75   0.09  0            6502
+
+espresso  je     02.52    4872    2.50   0.01  0            538
+espresso  mi     02.46    3060    2.45   0.01  0            3637
+espresso  iso    03.65    69876   3.56   0.09  0            21695
+
+barnes    je     01.62    60268   1.59   0.02  0            16687
+barnes    mi     01.71    57672   1.68   0.02  0            16550
+barnes    iso    01.66    74628   1.62   0.03  0            20851
+
+gs        je     00.16    37592   0.15   0.01  0            5808
+gs        mi     00.16    32588   0.13   0.02  0            5109
+gs        iso    00.23    71152   0.16   0.07  0            19698
+
+larsonN   je     1.171    266596  98.81  0.92  0            409842
+larsonN   mi     1.016    299768  99.38  0.44  0            83755
+larsonN   iso    918.582  126528  99.64  0.37  0            31368
+
+rocksdb   je     02.48    162424  2.05   0.63  0            38384
+rocksdb   mi     02.48    159812  2.04   0.66  0            37464
+rocksdb   iso    02.74    197220  2.49   0.55  0            46815
+
+redis     je     3.180    9496    0.14   0.02  0            1538
+redis     mi     3.080    7088    0.12   0.03  0            1256
+redis     iso    6.880    52816   0.31   0.05  0            16317
 ```
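One way to read the newer table above is to normalize each allocator's time against the fastest allocator on the same benchmark. The sketch below does this for the `time` column, assuming that column is elapsed wall-clock seconds; the `ELAPSED` dictionary and the `slowdown_vs_best` helper are hypothetical names used only for this illustration.

```python
# Values copied from the `time` column of the table above
# (je = jemalloc, mi = mimalloc, iso = IsoAlloc).
ELAPSED = {
    "cfrac":    {"je": 2.99,  "mi": 3.01,  "iso": 5.84},
    "espresso": {"je": 2.52,  "mi": 2.46,  "iso": 3.65},
    "barnes":   {"je": 1.62,  "mi": 1.71,  "iso": 1.66},
    "gs":       {"je": 0.16,  "mi": 0.16,  "iso": 0.23},
    "larsonN":  {"je": 1.171, "mi": 1.016, "iso": 918.582},
    "rocksdb":  {"je": 2.48,  "mi": 2.48,  "iso": 2.74},
    "redis":    {"je": 3.180, "mi": 3.080, "iso": 6.880},
}


def slowdown_vs_best(elapsed):
    """Map each benchmark to {allocator: time / fastest time on that benchmark}."""
    ratios = {}
    for bench, times in elapsed.items():
        best = min(times.values())
        ratios[bench] = {alloc: t / best for alloc, t in times.items()}
    return ratios


if __name__ == "__main__":
    for bench, row in slowdown_vs_best(ELAPSED).items():
        print(f"{bench:10s} iso takes {row['iso']:.2f}x the fastest time")
```

On this data IsoAlloc is within about 3% of the fastest allocator on barnes and roughly 900 times slower on larsonN, which is the gap the surrounding text refers to.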

 IsoAlloc isn't quite ready for performance-sensitive server workloads. However, it's more than fast enough for client-side mobile/desktop applications with risky C/C++ attack surfaces. These environments have threat models similar to what IsoAlloc was designed for.
