
Use eight level of critnibs in the tracker #1143

Merged


@ldorau (Contributor) commented Feb 27, 2025

Description

Use eight levels of critnibs in the tracker.
Multilevel maps are needed to support the case
when one memory pool acts as a memory provider
for another memory pool (nested memory pooling).
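A small model shows why one map is not enough. If pool B uses pool A as its memory provider, a chunk that A hands to B can start at the same address as the region A obtained from its own provider, so a single map keyed by start address cannot hold both entries. The sketch below is hypothetical: linear arrays stand in for critnibs, the names (`ml_tracker_add` and friends) are invented, and it is not UMF's code. It simply records each region at the first level that has no enclosing (parent) region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ML_LEVELS 8     /* eight levels, as in the PR title */
#define ML_PER_LEVEL 64 /* hypothetical capacity of each level */

typedef struct {
    uintptr_t start; /* region start address (the map key) */
    size_t size;     /* region length in bytes */
} ml_region_t;

typedef struct {
    ml_region_t regions[ML_LEVELS][ML_PER_LEVEL];
    size_t count[ML_LEVELS];
} ml_tracker_t;

/* Does some region already recorded at `level` enclose [ptr, ptr + size)? */
static int ml_level_encloses(const ml_tracker_t *t, int level, uintptr_t ptr,
                             size_t size) {
    for (size_t i = 0; i < t->count[level]; i++) {
        const ml_region_t *r = &t->regions[level][i];
        if (ptr >= r->start && ptr + size <= r->start + r->size)
            return 1;
    }
    return 0;
}

/* Record a region at the first level with no enclosing (parent) region.
 * Returns the level used, or -1 if nesting is too deep or the level is full. */
static int ml_tracker_add(ml_tracker_t *t, uintptr_t ptr, size_t size) {
    assert(t != NULL);
    int level = 0;
    while (level < ML_LEVELS && ml_level_encloses(t, level, ptr, size))
        level++;
    if (level == ML_LEVELS || t->count[level] == ML_PER_LEVEL)
        return -1;
    t->regions[level][t->count[level]++] = (ml_region_t){ptr, size};
    return level;
}
```

In this sketch, adding a 64 KiB region at 0x10000, then a 4 KiB chunk at the same address, then a 256-byte chunk at the same address places them at levels 0, 1 and 2, while an unrelated region at 0x40000 goes back to level 0. Regions nested more than eight deep cannot be recorded.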

Depends on:

Requires:

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@ldorau requested a review from a team as a code owner February 27, 2025 10:00
@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 48ba101 to 6c90359 February 27, 2025 10:05
@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 6c90359 to a67d245 February 27, 2025 10:15
@ldorau (Contributor, Author) commented Feb 27, 2025

@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from a67d245 to 7a9b4b1 February 27, 2025 14:11
@ldorau (Contributor, Author) commented Feb 27, 2025

More tests added in #1144

@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 7a9b4b1 to 17969da February 28, 2025 10:27
@ldorau (Contributor, Author) commented Feb 28, 2025

Please re-review

@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 17969da to 1b2c84a February 28, 2025 12:00
@ldorau requested a review from vinser52 February 28, 2025 12:01
@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 1b2c84a to 716b562 February 28, 2025 13:38
@ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch 3 times, most recently from 59a5d56 to 43481c1 March 3, 2025 11:45
@ldorau (Contributor, Author) commented Mar 7, 2025

Lock added.
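Conceptually, the lock discussed in this thread makes the lookup-then-insert in umfMemoryTrackerAdd() and the removal in umfMemoryTrackerRemove() mutually exclusive. The sketch below only illustrates that idea under stated assumptions: the tracker is a hypothetical toy array, the `locked_tracker_*` names are invented, and a C11 `atomic_flag` spinlock stands in for whatever mutex UMF actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCKED_CAPACITY 32 /* hypothetical fixed capacity for this sketch */

typedef struct {
    atomic_flag lock; /* guards every field below */
    uintptr_t keys[LOCKED_CAPACITY];
    size_t sizes[LOCKED_CAPACITY];
    size_t count;
} locked_tracker_t;

static void locked_tracker_init(locked_tracker_t *t) {
    atomic_flag_clear(&t->lock);
    t->count = 0;
}

static void tracker_lock(locked_tracker_t *t) {
    /* spin until the previous holder clears the flag */
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire)) {
    }
}

static void tracker_unlock(locked_tracker_t *t) {
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* The duplicate check and the insert form one critical section, so a
 * concurrent remove cannot modify the map in between.
 * Returns 0 on success, -1 if ptr is already tracked or the map is full. */
static int locked_tracker_add(locked_tracker_t *t, uintptr_t ptr, size_t size) {
    assert(t != NULL);
    int ret = -1;
    int found = 0;
    tracker_lock(t);
    for (size_t i = 0; i < t->count && !found; i++)
        found = (t->keys[i] == ptr);
    if (!found && t->count < LOCKED_CAPACITY) {
        t->keys[t->count] = ptr;
        t->sizes[t->count] = size;
        t->count++;
        ret = 0;
    }
    tracker_unlock(t);
    return ret;
}

/* Remove under the same lock. Returns 0 on success, -1 if ptr is absent. */
static int locked_tracker_remove(locked_tracker_t *t, uintptr_t ptr) {
    assert(t != NULL);
    int ret = -1;
    tracker_lock(t);
    for (size_t i = 0; i < t->count; i++) {
        if (t->keys[i] == ptr) {
            t->count--; /* move the last entry into the freed slot */
            t->keys[i] = t->keys[t->count];
            t->sizes[i] = t->sizes[t->count];
            ret = 0;
            break;
        }
    }
    tracker_unlock(t);
    return ret;
}
```

The cost of this approach is that the lookup on the add path, which critnib would otherwise perform without taking a lock, is now serialized with every removal.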

@vinser52 (Contributor) commented Mar 7, 2025

> @vinser52 @bratpiorka @lplewa @igchor It works correctly without a lock 99% of the time, but sporadically it segfaults like this: https://github.com/ldorau/unified-memory-framework/actions/runs/13696170520/job/38298989389

> @vinser52 @bratpiorka @lplewa @igchor Frankly, I do not know how to fix it without adding a lock. It seems that we have to add the lock to umfMemoryTrackerAdd() and umfMemoryTrackerRemove().

@ldorau do you know the root cause of the data race? Could you describe the scenario?

@ldorau (Contributor, Author) commented Mar 7, 2025

> @ldorau do you know the root cause of the data race? Could you describe the scenario?

From the log:

[PID:1859749 TID:1859787 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fbb0000, size=65536
[PID:1859749 TID:1859785 DEBUG UMF] umfMemoryTrackerAddAtLevel: memory region is added, tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fba2000, size=4096
[PID:1859749 TID:1859786 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fb40000, size=262144
[PID:1859749 TID:1859784 DEBUG UMF] slab_find_first_available_chunk_idx: idx: 0
[PID:1859749 TID:1859785 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fbcc000, size=4096
[PID:1859749 TID:1859787 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fbd0000, size=65536
[PID:1859749 TID:1859784 DEBUG UMF] slab_free_chunk: chunk_idx: 0, num_chunks_allocated: 0, first_free_chunk_idx: 0
[PID:1859749 TID:1859784 DEBUG UMF] slab_free_chunk: chunk_idx: 0, num_chunks_allocated: 0, first_free_chunk_idx: 0
[PID:1859749 TID:1859785 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fba5000, size=4096
[PID:1859749 TID:1859784 DEBUG UMF] slab_free_chunk: chunk_idx: 0, num_chunks_allocated: 0, first_free_chunk_idx: 0
[PID:1859749 TID:1859784 DEBUG UMF] slab_free_chunk: chunk_idx: 0, num_chunks_allocated: 0, first_free_chunk_idx: 0
[PID:1859749 TID:1859786 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5f340000, size=262144
[PID:1859749 TID:1859787 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fa80000, size=65536
[PID:1859749 TID:1859786 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fac0000, size=262144
[PID:1859749 TID:1859785 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fba8000, size=4096
[PID:1859749 TID:1859786 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5b880000, size=262144
[PID:1859749 TID:1859787 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fa60000, size=65536
[PID:1859749 TID:1859785 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7ffb601a4068, level=0, pool=0x7ffb5fc32768, ptr=0x7ffb5fba2000, size=4096
Segmentation fault (core dumped)

and the backtrace:

Reading symbols from ./test/test_memoryPool...
[New LWP 1859784]
[New LWP 1859785]
[New LWP 1859749]
[New LWP 1859788]
[New LWP 1859786]
[New LWP 1859787]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./test/test_memoryPool'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ffb601dde0b in utils_atomic_load_acquire_ptr (ptr=0x78, out=0x7ffb56bf4a60) at /home/ldorau/work/unified-memory-framework/src/utils/utils_concurrency.h:193
193         __atomic_load((uintptr_t *)ptr, (uintptr_t *)out, memory_order_acquire);
[Current thread is 1 (Thread 0x7ffb56bf7640 (LWP 1859784))]
(gdb) bt
#0  0x00007ffb601dde0b in utils_atomic_load_acquire_ptr (ptr=0x78, out=0x7ffb56bf4a60) at /home/ldorau/work/unified-memory-framework/src/utils/utils_concurrency.h:193
#1  0x00007ffb601deb56 in find_predecessor (n=0x0) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:528
#2  0x00007ffb601ded60 in find_le (n=0x7ffb5fc31968, key=140717619724288) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:607
#3  0x00007ffb601decee in find_le (n=0x7ffb5fc36e68, key=140717619724288) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#4  0x00007ffb601decee in find_le (n=0x7ffb5fc31c68, key=140717619724288) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#5  0x00007ffb601decee in find_le (n=0x7ffb5fc34868, key=140717619724288) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#6  0x00007ffb601df14d in critnib_find (c=0x7ffb601a2008, key=140717619724288, dir=FIND_LE, rkey=0x7ffb56bf4c88, rvalue=0x7ffb56bf4c80)
    at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:735
#7  0x00007ffb601db99f in umfMemoryTrackerAdd (hTracker=0x7ffb601a4068, pool=0x7ffb5fc32768, ptr=0x7ffb5fbcc000, size=4096)
    at /home/ldorau/work/unified-memory-framework/src/provider/provider_tracking.c:187
#8  0x00007ffb601dc130 in trackingAlloc (hProvider=0x7ffb601a90a8, size=4096, alignment=4096, _ptr=0x7ffb56bf4d80)
    at /home/ldorau/work/unified-memory-framework/src/provider/provider_tracking.c:369
#9  0x00007ffb601ce82e in umfMemoryProviderAlloc (hProvider=0x7ffb5fc32a68, size=4096, alignment=4096, ptr=0x7ffb56bf4d80)
    at /home/ldorau/work/unified-memory-framework/src/memory_provider.c:216
#10 0x00007ffb601e2498 in disjoint_pool_aligned_malloc (pool=0x7ffb601a4568, size=4096, alignment=4096) at /home/ldorau/work/unified-memory-framework/src/pool/pool_disjoint.c:719
#11 0x00007ffb601cddc6 in umfPoolAlignedMalloc (hPool=0x7ffb5fc32768, size=4096, alignment=4096) at /home/ldorau/work/unified-memory-framework/src/memory_pool.c:163
#12 0x000055a026e98132 in pow2AlignedAllocHelper (pool=0x7ffb5fc32768) at /home/ldorau/work/unified-memory-framework/test/poolFixtures.hpp:190

I see that many umfMemoryTrackerRemove() -> critnib_remove() calls were made from several different threads (see the log), and at the same time umfMemoryTrackerAdd() -> critnib_find() was called in another thread (see the gdb backtrace), which ended up calling find_predecessor(NULL) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:528. It looks like critnib_find() conflicts with critnib_remove() when they are called in parallel from many threads.
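The faulting address in frame #0 (ptr=0x78), together with n=0x0 in frame #1, is consistent with loading a child pointer through a NULL node. Assuming the PMDK-style critnib node layout (an array of 16 child pointers at the start of the node, 8-byte pointers) and a predecessor scan that starts at the last child, the first load from a NULL node hits offset 15 × 8 = 120 = 0x78. The struct below is a hypothetical mirror for checking that arithmetic, not UMF's actual definition. (Separately, key=140717619724288 in frames #2 to #6 is just ptr=0x7ffb5fbcc000 from frame #7 written in decimal.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE 4              /* assumed: key bits consumed per tree level */
#define SLNODES (1 << SLICE) /* 16 children per inner node */
#define NIB (SLNODES - 1)    /* index of the last child */

/* Hypothetical mirror of a PMDK-style critnib inner node. */
struct toy_critnib_node {
    struct toy_critnib_node *child[SLNODES];
    uint64_t path;
    unsigned char shift;
};

/* Address that &n->child[NIB] evaluates to when n == NULL, i.e. the first
 * pointer a predecessor scan starting at the last child would load. */
static uintptr_t null_node_first_load_address(void) {
    return (uintptr_t)(offsetof(struct toy_critnib_node, child) +
                       NIB * sizeof(struct toy_critnib_node *));
}
```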

@vinser52 (Contributor) commented Mar 7, 2025


Instead of a lock, could you please try increasing DELETED_LIFE in critnib.c?
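For context on this suggestion: in the PMDK-derived critnib, removed nodes and leaves are, roughly, not recycled immediately but parked in a ring of DELETED_LIFE slots, so a lock-free reader still holding a pointer to a removed node keeps seeing valid memory for a while. A minimal sketch of that deferred-reuse pattern follows; the `graveyard_*` names are hypothetical, not critnib's, and writers are assumed to be serialized already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DELETED_LIFE 16 /* default value discussed in the thread */

/* Hypothetical ring of recently removed nodes awaiting reuse. */
typedef struct {
    void *pending[DELETED_LIFE];
    uint64_t remove_count;
} graveyard_t;

/* Park a just-removed node and return the node it displaces. The returned
 * node was removed DELETED_LIFE removals ago and is treated as safe to free
 * or reuse; NULL is returned until the ring has filled up once.
 * Assumes callers (writers) are serialized, e.g. by a mutex. */
static void *graveyard_retire(graveyard_t *g, void *node) {
    assert(g != NULL && node != NULL);
    size_t slot = (size_t)(g->remove_count++ % DELETED_LIFE);
    void *expired = g->pending[slot];
    g->pending[slot] = node;
    return expired;
}
```

In this scheme, raising DELETED_LIFE lengthens the grace period only in terms of the number of removals; it does not bound how long a reader can be delayed, so a sufficiently stalled reader can still observe a recycled node.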


github-actions bot commented Mar 7, 2025

Compute Benchmarks run (with params: ):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/13719547199

@ldorau (Contributor, Author) commented Mar 7, 2025

> Instead of a lock, could you please try increasing DELETED_LIFE in critnib.c?

I increased DELETED_LIFE from 16 to 64, and it still segfaults.


github-actions bot commented Mar 7, 2025

Compute Benchmarks run ():
https://github.com/oneapi-src/unified-memory-framework/actions/runs/13719547199
Job status: success. Test status: success.

Summary

(Emphasized values are the best results)

Improved 15 (threshold 2.00%)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>` | 937.686000 ns | 1823.830 ns | 94.50% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>` | 1206.500000 ns | 2276.500 ns | 88.69% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc` | 1192.720000 ns | 1888.790 ns | 58.36% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy` | 1756.930000 ns | 2284.250 ns | 30.01% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<fixed_provider>` | 1252.610000 ns | 1591.680 ns | 27.07% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<fixed_provider>` | 1608.410000 ns | 1976.860 ns | 22.91% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy` | 1592.520000 ns | 1937.540 ns | 21.67% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<fixed_provider>` | 1508.990000 ns | 1823.570 ns | 20.85% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>` | 36580.900000 ns | 40106.300 ns | 9.64% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc` | 1461.640000 ns | 1599.030 ns | 9.40% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc` | 507.310000 ns | 554.615 ns | 9.32% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy` | 1268.100000 ns | 1370.540 ns | 8.08% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc` | 451.982000 ns | 472.222 ns | 4.48% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>` | 9421.710000 ns | 9723.900 ns | 3.21% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc` | 550.769000 ns | 565.154 ns | 2.61% |

Regressed 11 (threshold 2.00%)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>` | 1795.050 ns | 1071.940000 ns | -40.28% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 proxy_pool<fixed_provider>` | 4539.460 ns | 3846.930000 ns | -15.26% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<fixed_provider>` | 1858.890 ns | 1722.190000 ns | -7.35% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider>` | 11830.700 ns | 11011.000000 ns | -6.93% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>` | 2719.700 ns | 2573.040000 ns | -5.39% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy` | 733.849 ns | 703.001000 ns | -4.20% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>` | 49657.500 ns | 47886.400000 ns | -3.57% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<fixed_provider>` | 1086.140 ns | 1050.190000 ns | -3.31% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy` | 5386.530 ns | 5214.550000 ns | -3.19% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>` | 28088.200 ns | 27349.100000 ns | -2.63% |
| `multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy` | 4713.800 ns | 4602.730000 ns | -2.36% |
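The Change column is consistent with measuring the baseline time relative to this PR's time, (baseline - this PR) / this PR × 100, so positive values mean this PR is faster. This formula is inferred from the published numbers, not taken from the benchmark scripts, and the helper name below is hypothetical.

```c
#include <assert.h>

/* Percentage change of the baseline time relative to this PR's time:
 * positive means this PR is faster (lower ns per operation).
 * Inferred from the published numbers, not from the benchmark scripts. */
static double relative_change_pct(double this_pr_ns, double baseline_ns) {
    assert(this_pr_ns > 0.0);
    return (baseline_ns - this_pr_ns) / this_pr_ns * 100.0;
}
```

For example, the largest improvement, (1823.830 - 937.686) / 937.686 ≈ 94.50%, and the largest regression, (1071.940 - 1795.050) / 1795.050 ≈ -40.28%, both match the tables.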

Performance change in benchmark groups

UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (9)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `glibc` | 1192.720000 ns | 1888.790 ns | 58.36% |
| `tbbProxy` | 1592.520000 ns | 1937.540 ns | 21.67% |
| `scalable_pool<fixed_provider>` | 1508.990000 ns | 1823.570 ns | 20.85% |
| `disjoint_pool<os_provider>` | 36580.900000 ns | 40106.300 ns | 9.64% |
| `jemalloc` | 451.982000 ns | 472.222 ns | 4.48% |
| `disjoint_pool<fixed_provider>` | 35758.700 ns | 35742.100000 ns | -0.05% |
| `umfProxy` | 5386.530 ns | 5214.550000 ns | -3.19% |
| `jemalloc_pool<os_provider>` | 49657.500 ns | 47886.400000 ns | -3.57% |
| `scalable_pool<os_provider>` | 1795.050 ns | 1071.940000 ns | -40.28% |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (11)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `glibc` | 364.493000 ns | 369.927 ns | 1.49% |
| `jemalloc_pool<os_provider>` | 1634.700000 ns | 1647.770 ns | 0.80% |
| `tbbProxy` | 495.417000 ns | 497.886 ns | 0.50% |
| `scalable_pool<os_provider>` | 615.791000 ns | 616.342 ns | 0.09% |
| `scalable_pool<fixed_provider>` | 598.438 ns | 598.297000 ns | -0.02% |
| `jemalloc` | 72.875 ns | 72.658400 ns | -0.30% |
| `umfProxy` | 1054.240 ns | 1048.620000 ns | -0.53% |
| `disjoint_pool<os_provider>` | 1125.420 ns | 1114.780000 ns | -0.95% |
| `fixed_provider` | 2174.500 ns | 2139.710000 ns | -1.60% |
| `disjoint_pool<fixed_provider>` | 1086.140 ns | 1050.190000 ns | -3.31% |
| `proxy_pool<fixed_provider>` | 4539.460 ns | 3846.930000 ns | -15.26% |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (9)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `scalable_pool<os_provider>` | 1206.500000 ns | 2276.500 ns | 88.69% |
| `tbbProxy` | 1756.930000 ns | 2284.250 ns | 30.01% |
| `scalable_pool<fixed_provider>` | 1608.410000 ns | 1976.860 ns | 22.91% |
| `glibc` | 1461.640000 ns | 1599.030 ns | 9.40% |
| `jemalloc_pool<os_provider>` | 9421.710000 ns | 9723.900 ns | 3.21% |
| `jemalloc` | 550.769000 ns | 565.154 ns | 2.61% |
| `umfProxy` | 4991.050 ns | 4953.580000 ns | -0.75% |
| `disjoint_pool<os_provider>` | 28088.200 ns | 27349.100000 ns | -2.63% |
| `disjoint_pool<fixed_provider>` | 19640.500000 ns | - | |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (9)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `scalable_pool<os_provider>` | 630.771000 ns | 635.656 ns | 0.77% |
| `scalable_pool<fixed_provider>` | 632.887000 ns | 635.733 ns | 0.45% |
| `jemalloc_pool<os_provider>` | 1127.920000 ns | 1132.280 ns | 0.39% |
| `umfProxy` | 1094.250000 ns | 1095.590 ns | 0.12% |
| `glibc` | 840.792000 ns | 841.782 ns | 0.12% |
| `tbbProxy` | 555.262 ns | 554.512000 ns | -0.14% |
| `jemalloc` | 156.886 ns | 156.602000 ns | -0.18% |
| `disjoint_pool<os_provider>` | 2719.700 ns | 2573.040000 ns | -5.39% |
| `disjoint_pool<fixed_provider>` | 1858.890 ns | 1722.190000 ns | -7.35% |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (9)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `scalable_pool<os_provider>` | 937.686000 ns | 1823.830 ns | 94.50% |
| `scalable_pool<fixed_provider>` | 1252.610000 ns | 1591.680 ns | 27.07% |
| `glibc` | 507.310000 ns | 554.615 ns | 9.32% |
| `tbbProxy` | 1268.100000 ns | 1370.540 ns | 8.08% |
| `jemalloc` | 496.027000 ns | 505.881 ns | 1.99% |
| `disjoint_pool<os_provider>` | 12356.800000 ns | 12369.400 ns | 0.10% |
| `jemalloc_pool<os_provider>` | 8918.900000 ns | 8922.790 ns | 0.04% |
| `umfProxy` | 4713.800 ns | 4602.730000 ns | -2.36% |
| `disjoint_pool<fixed_provider>` | 12402.300000 ns | - | |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (9)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `glibc` | 177.071000 ns | 179.171 ns | 1.19% |
| `scalable_pool<fixed_provider>` | 378.101000 ns | 381.762 ns | 0.97% |
| `scalable_pool<os_provider>` | 377.795000 ns | 380.365 ns | 0.68% |
| `disjoint_pool<os_provider>` | 874.222000 ns | 877.584 ns | 0.38% |
| `disjoint_pool<fixed_provider>` | 872.819000 ns | 875.070 ns | 0.26% |
| `jemalloc` | 94.943400 ns | 95.131 ns | 0.20% |
| `tbbProxy` | 338.742 ns | 338.287000 ns | -0.13% |
| `jemalloc_pool<os_provider>` | 700.923 ns | 699.454000 ns | -0.21% |
| `umfProxy` | 733.849 ns | 703.001000 ns | -4.20% |

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (2)

| Benchmark | This PR | Baseline | Change |
|---|---|---|---|
| `os_provider` | 9422.500 ns | 9406.070000 ns | -0.17% |
| `proxy_pool<os_provider>` | 11830.700 ns | 11011.000000 ns | -6.93% |

Details

Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 proxy_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 fixed_provider

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

@ldorau
Contributor Author

ldorau commented Mar 7, 2025

A one-liner to reproduce the segfault (run from the build directory):
$ while LD_PRELOAD=./lib/libumf_proxy.so ./test/test_memoryPool ; do date ; done
The loop reruns the test until it fails; the segfault shows up within 1-2 minutes at most.

@ldorau
Contributor Author

ldorau commented Mar 7, 2025

> Instead of a lock, could you please try increasing DELETED_LIFE in critnib.c?

I increased DELETED_LIFE from 16 to 64, and it still segfaults.

It segfaults even with DELETED_LIFE set to 1024, in exactly the same way:

Core was generated by `./test/test_memoryPool'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f68b4341e0b in utils_atomic_load_acquire_ptr (ptr=0x78, out=0x7f68a7df0a90) at /home/ldorau/work/unified-memory-framework/src/utils/utils_concurrency.h:193
193         __atomic_load((uintptr_t *)ptr, (uintptr_t *)out, memory_order_acquire);
[Current thread is 1 (Thread 0x7f68a7df3640 (LWP 2066815))]
(gdb) bt
#0  0x00007f68b4341e0b in utils_atomic_load_acquire_ptr (ptr=0x78, out=0x7f68a7df0a90) at /home/ldorau/work/unified-memory-framework/src/utils/utils_concurrency.h:193
#1  0x00007f68b4342b6a in find_predecessor (n=0x0) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:528
#2  0x00007f68b4342ca0 in find_le (n=0x7f68b3d2c110, key=140087670162432) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:572
#3  0x00007f68b4342d02 in find_le (n=0x7f68b3d2b210, key=140087670162432) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#4  0x00007f68b4342d02 in find_le (n=0x7f68b3d9ab68, key=140087670162432) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#5  0x00007f68b4342d02 in find_le (n=0x7f68b3d9ad68, key=140087670162432) at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:587
#6  0x00007f68b4343164 in critnib_find (c=0x7f68b3d90008, key=140087670162432, dir=FIND_LE, rkey=0x7f68a7df0cb8, rvalue=0x7f68a7df0cb0)
    at /home/ldorau/work/unified-memory-framework/src/critnib/critnib.c:735

@ldorau ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch 2 times, most recently from e7c26ba to e3fdd5b Compare March 10, 2025 12:10
@ldorau
Contributor Author

ldorau commented Mar 10, 2025

Requires: #1176

@ldorau ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from e3fdd5b to 4d17c47 Compare March 10, 2025 13:16
@ldorau
Contributor Author

ldorau commented Mar 10, 2025

Lock removed

@ldorau ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 4d17c47 to 8531428 Compare March 10, 2025 13:24
@vinser52 vinser52 mentioned this pull request Mar 10, 2025
ldorau added 7 commits March 11, 2025 08:11
Use atomics in find_successor() like in find_predecessor().

Ref: oneapi-src#1175

Signed-off-by: Lukasz Dorau <[email protected]>
Multilevel maps are needed to support the case
when one memory pool acts as a memory provider
for another memory pool (nested memory pooling).

Signed-off-by: Lukasz Dorau <[email protected]>
@ldorau ldorau force-pushed the Use_eight_level_of_critnibs_in_the_tracker branch from 8531428 to c1b9f1b Compare March 11, 2025 07:11
@ldorau
Contributor Author

ldorau commented Mar 11, 2025

@vinser52 Lock removed. Please re-review and resolve your review comments.

@bratpiorka bratpiorka merged commit 53a318f into oneapi-src:main Mar 11, 2025
82 checks passed
@ldorau ldorau deleted the Use_eight_level_of_critnibs_in_the_tracker branch March 11, 2025 10:42
@ldorau ldorau mentioned this pull request Mar 13, 2025
5 participants