-
Notifications
You must be signed in to change notification settings - Fork 35
Postpone freeing a tracker entry, add a reference counter #1270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Postpone freeing a tracker entry, add a reference counter #1270
Conversation
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Summary(Emphasized values are the best results) Improved 40 (threshold 2.00%)
Regressed 6 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 (6)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 (6)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 (6)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 (6)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 (6)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 (6)
DetailsBenchmark details contain too many chars to display |
"Exception: The directory /home/test-user/bench_workdir_umf exists but is not a benchmark work directory." |
c23a254
to
fa661de
Compare
The last commit accde78 is not finished yet ! |
fa661de
to
accde78
Compare
What is the purpose of this PR, could you please describe the scenario and how these changes help? |
But how is it possible that one thread has some pointer which belongs to the entry in the memory tracker and another thread removes that entry from the tracker? The first thing that came to my mind is the following:
But it is an ill-formed client's code. What is the real scenario? |
Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found. |
That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R. |
@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2). This is the real scenario: |
Thank you, @ldorau, for the clear description of the scenario. That makes sense. |
947a237
to
a2f62b5
Compare
65c4126
to
f0a0881
Compare
@lplewa please review |
1 similar comment
@lplewa please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be continued....
src/critnib/critnib.c
Outdated
@@ -189,6 +200,15 @@ struct critnib *critnib_new(void) { | |||
*/ | |||
static void delete_node(struct critnib *c, struct critnib_node *__restrict n) { | |||
if (is_leaf(n)) { | |||
// call the callback freeing the leaf | |||
if (c->cb_free_leaf && to_leaf(n)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to_leaf(n) here - are we afraid 0x1 ptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it is not required here. to_leaf(n)
removed.
Done
src/critnib/critnib.c
Outdated
@@ -90,7 +90,7 @@ | |||
#define SLNODES (1 << SLICE) | |||
|
|||
typedef uintptr_t word; | |||
typedef unsigned char sh_t; | |||
typedef uint64_t sh_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we could use 64-bit atomic load/store on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are atomic functions for each variable size. If we are missing one, in our utils, we can always add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so please tell me how can I do 8-bit atomic load on Windows?
There is InterlockedExchange8
function to set an 8-bit variable to the specified value as an atomic operation,
but there is no InterlockedCompareExchange8
function ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is _InterlockedCompareExchange8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Done.
src/critnib/critnib.c
Outdated
utils_atomic_load_acquire_ptr(&k->to_be_freed, &to_be_freed); | ||
if (to_be_freed) { | ||
utils_atomic_store_release_ptr(&k->to_be_freed, NULL); | ||
if (c->cb_free_leaf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we do refcounting if CB is NULL? Is it just a waste of the cpu cycles for all refcounting?
imho If CB is null, we just should never update refcount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, changed.
Done
src/critnib/critnib.c
Outdated
utils_atomic_load_acquire_u64(&k->ref_count, &ref_count); | ||
if (ref_count == 0) { | ||
return -1; | ||
} | ||
|
||
if (utils_atomic_increment_u64(&k->ref_count) == 1) { | ||
utils_atomic_decrement_u64(&k->ref_count); | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we check twice if ref_count is zero?
also isn't it a race, between increment and decrement? If between both operation we have very long sleep, this node might be all ready reused, and count be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref_count
can be incremented only if ref_count > 0
- that's why the first check is needed.
After having incremented ref_count
, we have to check if ref_count == 1
to make sure it wasn't equal 0 before incrementing. If ref_count == 1
we have to decrement it and exit. There is no other way to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why we have first check second check anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you should use compare and swap instead increment and decrement to eliminate race
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean like that?
static inline int increment_ref_count(struct critnib_leaf *k) {
uint64_t expected;
uint64_t desired;
do {
utils_atomic_load_acquire_u64(&k->ref_count, &expected);
if (expected == 0) {
return -1;
}
desired = expected + 1;
} while (!utils_compare_exchange_u64(&k->ref_count, &expected, &desired));
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
9b93669
to
3fd0f1b
Compare
@lplewa re-review please |
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Summary(Emphasized values are the best results) Improved 1 (threshold 2.00%)
Regressed 5 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
DetailsBenchmark details contain too many chars to display |
3fd0f1b
to
3937f38
Compare
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
@lplewa re-review please |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Summary(Emphasized values are the best results) Improved 2 (threshold 2.00%)
Regressed 11 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
DetailsBenchmark details contain too many chars to display |
3937f38
to
74f0d0e
Compare
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Failures
Summary(Emphasized values are the best results) Improved 4 (threshold 2.00%)
Regressed 7 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
DetailsBenchmark details - environment, command...multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so |
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Failures
Summary(Emphasized values are the best results) Improved 4 (threshold 2.00%)
Regressed 8 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
DetailsBenchmark details - environment, command...multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so |
Add utils_atomic_load_acquire_u8 and utils_atomic_store_release_u8 to utils_concurrency.h. Signed-off-by: Lukasz Dorau <[email protected]>
Signed-off-by: Lukasz Dorau <[email protected]>
Signed-off-by: Lukasz Dorau <[email protected]>
Add a reference counter and critnib_release() function. When cb_free_leaf() is SET in critnib_new() the following 4 functions: - critnib_remove(), - critnib_get(), - critnib_find_le() and - critnib_find() return a reference (void *ref) to the returned value, that MUST be released by calling critnib_release() when it is no longer used and can be freed using the cb_free_leaf() callback. Fixes: oneapi-src#1233 Signed-off-by: Lukasz Dorau <[email protected]>
74f0d0e
to
97bf8c3
Compare
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
@lplewa re-review please |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Summary(Emphasized values are the best results) Improved 4 (threshold 2.00%)
Regressed 20 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
DetailsBenchmark details contain too many chars to display |
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): |
Compute Benchmarks run ( --compare 'Baseline_PVC'): Failures
Summary(Emphasized values are the best results) Improved 3 (threshold 2.00%)
Regressed 6 (threshold 2.00%)
Performance change in benchmark groupsUMFRelative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
DetailsBenchmark details - environment, command...multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemallocCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libjemalloc.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxyCommand:/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv Environment Variables:LD_PRELOAD=libtbbmalloc_proxy.so |
if (c->cb_free_leaf) { | ||
uint64_t ref_count; | ||
utils_atomic_load_acquire_u64(&k->ref_count, &ref_count); | ||
if (ref_count == 0) { | ||
utils_atomic_store_release_ptr(&k->value, c->deleted_leaf); | ||
utils_atomic_store_release_ptr((void **)&c->deleted_leaf, k); | ||
return; | ||
} | ||
|
||
// ref_count > 0, so add k to the pending_del_leaf list | ||
utils_atomic_store_release_ptr((void **)&k->key, c->pending_del_leaf); | ||
utils_atomic_store_release_ptr((void **)&c->pending_del_leaf, k); | ||
} else { | ||
utils_atomic_store_release_ptr((void **)&k->value, c->deleted_leaf); | ||
utils_atomic_store_release_ptr((void **)&c->deleted_leaf, k); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!cb_free_leaf)
ref_count = 0
else
atomic_load......
utils_atomic_store_release_ptr(&k->value, c->deleted_leaf); | ||
utils_atomic_store_release_ptr((void **)&c->deleted_leaf, k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how corectly we update lock free single headed list.
If two threads adds to this list concurently they can have the same "next" node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the way critnib has been doing it so far and everybody have considered critnib very thread-safe so far.
How do you propose to fix it?
|
||
/* pool of freed nodes: singly linked list, next at child[0] */ | ||
struct critnib_node *deleted_node; | ||
struct critnib_leaf *deleted_leaf; | ||
struct critnib_leaf *pending_del_leaf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this list? We only add to this list, and clean only during critnib delete
int i_del = 0; | ||
uint64_t ref_count = 0; | ||
// We look for a leaf with ref_count == 0. | ||
// If we do not find one we take the first one that was checked. | ||
// In such case (extremely unlikely) we might not be able to free the value | ||
// and a memory leak might happen. Whether it really happened or not | ||
// it is verified and logged in alloc_leaf(). | ||
do { | ||
del = (utils_atomic_increment_u64(&c->remove_count) - 1) % | ||
DELETED_LIFE; | ||
if (++i_del == (DELETED_LIFE + 1)) { | ||
break; | ||
} | ||
|
||
k = c->pending_del_leaves[del]; | ||
if (k) { | ||
utils_atomic_load_acquire_u64(&k->ref_count, &ref_count); | ||
} | ||
} while (k && (ref_count > 0)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not huge fan of this algoritm.
- As comment says it can memleak
- Causes that remove count can grow faster, what can reduce performance, as it will cause more retries.
Can we just use old algorithm then:
- if refcount is non zero we can just remove this node from deleted array(but not free value yet)
- and when refcount == 0, at last unref, we can add this node back to the deleted(free) list.
* The value of the leaf saved in to_be_freed is freed in critnib_release() | ||
* when the reference count of the leaf drops to 0. | ||
*/ | ||
void *to_be_freed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not see a reason/usecase to have two separate ptrs to keep the same data. Are you sure that we need two ptrs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is described above in the comment. If value is set it will be returned to a user. We have to move it somewhere else to be able to free it when the reference counter drops to 0. Yes, I am sure.
Description
Postpone freeing a tracker entry until it is really removed from the tracker.
Fixes: #1233
Checklist