[TEST] test old UMF perf #17635

Draft: wants to merge 2 commits into base branch unify-benchmark-ci

bratpiorka (Contributor)

test old UMF perf

bratpiorka changed the base branch from sycl to unify-benchmark-ci (March 25, 2025 13:10)
bratpiorka force-pushed the review/rrudnick/old_umf_perf branch 5 times, most recently from 00e26d8 to d38ee71 (March 25, 2025 14:46)

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/intel/llvm/actions/runs/14062662803

Benchmarks level_zero_v2 run ():
https://github.com/intel/llvm/actions/runs/14062662803
Job status: success. Test status: success.

Failures

Name Failure
SYCL-Bench Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128.
llama.cpp bench Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2.
api_overhead_benchmark_sycl SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_ur SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=EmptyKernel', '--csv', '--noHeaders', '--iterations=10000', '--wgs=256', '--wgc=256']' died with <Signals.SIGABRT: 6>.
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=KernelSwitch', '--csv', '--noHeaders', '--iterations=1000', '--count=8', '--kernelTime=200', '--barrier=0', '--hostVisible=0', '--ioq=1', '--ctrBasedEvents=1']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Host', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=StreamMemory', '--csv', '--noHeaders', '--iterations=10000', '--type=Triad', '--size=10240', '--memoryPlacement=Device', '--useEvents=0', '--contents=Zeros', '--multiplier=1', '--vectorSize=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=0', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Device', '--dst=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=1', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Host', '--dst=Host', '--size=1024']' died with <Signals.SIGABRT: 6>.
miscellaneous_benchmark_sycl VectorSum Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl', '--test=VectorSum', '--csv', '--noHeaders', '--iterations=1000', '--numberOfElementsX=512', '--numberOfElementsY=256', '--numberOfElementsZ=256']' died with <Signals.SIGABRT: 6>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=1', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=0', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=1024', '--NumThreads=4', '--NumOpsPerThread=4096', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=256', '--size=4096', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=32', '--size=4194304', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmRandomMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--operationCount=256', '--minSize=4096', '--maxSize=33554432', '--sizeDistribution=LogUniform', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
Velocity-Bench Hashtable Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl', '--no-verify']' returned non-zero exit status 1.
Velocity-Bench Bitcracker Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/bitcracker/bitcracker', '-f', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt', '-d', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt', '-b', '60000']' died with <Signals.SIGABRT: 6>.
Velocity-Bench CudaSift Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/cudaSift/cudaSift']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Easywave Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' died with <Signals.SIGABRT: 6>.
Velocity-Bench QuickSilver Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/QuickSilver/qs', '-i', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Sobel Filter Benchmark run failure: {self.class.name}: Failed to parse benchmark output.
Velocity-Bench dl-cifar Benchmark run failure: Failed to parse benchmark output.
Velocity-Bench dl-mnist Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl', '-conv_algo', 'ONEDNN_AUTO']' died with <Signals.SIGABRT: 6>.
Velocity-Bench svm Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/svm/svm_sycl', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m']' died with <Signals.SIGABRT: 6>.
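The failure messages above fall into a few recurring shapes: a process killed by a signal (`died with <Signals.SIGABRT: 6>`), a non-zero exit status, and an output-parsing error. The wording of the first two appears to match the string form of Python's `subprocess.CalledProcessError`, but that is an inference from the text, not something the report states. The following is a minimal sketch for tallying a list of such messages by kind; the function names `classify_failure` and `tally` are hypothetical and are not part of the benchmark scripts.

```python
import re
from collections import Counter

# Patterns taken from the message shapes seen in the Failures table.
SIGNAL_RE = re.compile(r"died with <Signals\.(SIG[A-Z]+): \d+>")
EXIT_RE = re.compile(r"returned non-zero exit status (\d+)")


def classify_failure(message: str) -> str:
    """Map one failure message to a coarse category (hypothetical helper)."""
    match = SIGNAL_RE.search(message)
    if match:
        return match.group(1)  # e.g. "SIGABRT" or "SIGSEGV"
    match = EXIT_RE.search(message)
    if match:
        return f"exit {match.group(1)}"  # e.g. "exit 128"
    if "pars" in message.lower():  # covers "parsing" and "parse"
        return "parse error"
    return "other"


def tally(messages):
    """Count failure messages per category."""
    return Counter(classify_failure(m) for m in messages)


if __name__ == "__main__":
    sample = [
        "Command '['git', 'checkout', '31fc70be']' returned non-zero exit status 128.",
        "Command '['api_overhead_benchmark_sycl']' died with <Signals.SIGABRT: 6>.",
        "Command '['api_overhead_benchmark_ur']' died with <Signals.SIGSEGV: 11>.",
        "Error parsing output: could not convert string to float: 'ERROR'",
    ]
    for kind, count in sorted(tally(sample).items()):
        print(f"{kind}: {count}")
```

In this run the `_sycl` binaries consistently abort (SIGABRT) while the `_ur` binaries segfault (SIGSEGV), so grouping by signal is a quick way to separate the two failure populations before reading individual commands.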

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (2)

| Benchmark | This PR |
|---|---|
| api_overhead_benchmark_l0 SubmitKernel out of order | 11.850000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order | 12.001000 μs |

Relative perf in group SubmitKernel With Completion (2)

| Benchmark | This PR |
|---|---|
| api_overhead_benchmark_l0 SubmitKernel out of order with measure completion | 15.540000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order with measure completion | 18.312000 μs |

Relative perf in group SinKernelGraph 5 (2)

| Benchmark | This PR |
|---|---|
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 | 26.180000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 | 28.682000 μs |

Relative perf in group SinKernelGraph 100 (2)

| Benchmark | This PR |
|---|---|
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 | 253.956000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 | 252.135000 μs |

Relative perf in group EmptyKernel 1000 256 (1)

| Benchmark | This PR |
|---|---|
| ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 | 4.232000 μs |

Relative perf in group KernelSwitch 8 200 (1)

| Benchmark | This PR |
|---|---|
| ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 | 1.165000 μs |
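
Since the report has no baseline to diff against, the only comparison available within this run is between groups. As a reading aid, the sketch below computes how much the "with measure completion" variant adds over plain SubmitKernel for each queue mode, using the four `api_overhead_benchmark_l0` values reported above. The dictionary and function names are hypothetical, and these are single-run numbers, so the differences are indicative only.

```python
# Timings in microseconds, copied from the api_overhead_benchmark_l0 rows
# in the SubmitKernel and SubmitKernel With Completion groups.
submit_us = {"out of order": 11.850, "in order": 12.001}
submit_with_completion_us = {"out of order": 15.540, "in order": 18.312}


def completion_overhead(base, with_completion):
    """Return the extra microseconds per queue mode, rounded to 3 decimals."""
    return {
        mode: round(with_completion[mode] - base[mode], 3)
        for mode in base
    }


overhead = completion_overhead(submit_us, submit_with_completion_us)

if __name__ == "__main__":
    for mode, extra in overhead.items():
        print(f"{mode}: +{extra:.3f} us")
```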

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

bratpiorka force-pushed the review/rrudnick/old_umf_perf branch from d38ee71 to 6c126a5 (March 25, 2025 14:55)

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/intel/llvm/actions/runs/14062867842

Benchmarks level_zero_v2 run ():
https://github.com/intel/llvm/actions/runs/14062867842
Job status: success. Test status: success.

Failures

Name Failure
SYCL-Bench Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128.
llama.cpp bench Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2.
api_overhead_benchmark_sycl SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_ur SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=EmptyKernel', '--csv', '--noHeaders', '--iterations=10000', '--wgs=256', '--wgc=256']' died with <Signals.SIGABRT: 6>.
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=KernelSwitch', '--csv', '--noHeaders', '--iterations=1000', '--count=8', '--kernelTime=200', '--barrier=0', '--hostVisible=0', '--ioq=1', '--ctrBasedEvents=1']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Host', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=StreamMemory', '--csv', '--noHeaders', '--iterations=10000', '--type=Triad', '--size=10240', '--memoryPlacement=Device', '--useEvents=0', '--contents=Zeros', '--multiplier=1', '--vectorSize=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=0', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Device', '--dst=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=1', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Host', '--dst=Host', '--size=1024']' died with <Signals.SIGABRT: 6>.
miscellaneous_benchmark_sycl VectorSum Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl', '--test=VectorSum', '--csv', '--noHeaders', '--iterations=1000', '--numberOfElementsX=512', '--numberOfElementsY=256', '--numberOfElementsZ=256']' died with <Signals.SIGABRT: 6>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=1', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=0', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=1024', '--NumThreads=4', '--NumOpsPerThread=4096', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=256', '--size=4096', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=32', '--size=4194304', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmRandomMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--operationCount=256', '--minSize=4096', '--maxSize=33554432', '--sizeDistribution=LogUniform', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
Velocity-Bench Hashtable Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl', '--no-verify']' returned non-zero exit status 1.
Velocity-Bench Bitcracker Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/bitcracker/bitcracker', '-f', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt', '-d', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt', '-b', '60000']' died with <Signals.SIGABRT: 6>.
Velocity-Bench CudaSift Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/cudaSift/cudaSift']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Easywave Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' died with <Signals.SIGABRT: 6>.
Velocity-Bench QuickSilver Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/QuickSilver/qs', '-i', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Sobel Filter Benchmark run failure: {self.class.name}: Failed to parse benchmark output.
Velocity-Bench dl-cifar Benchmark run failure: Failed to parse benchmark output.
Velocity-Bench dl-mnist Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl', '-conv_algo', 'ONEDNN_AUTO']' died with <Signals.SIGABRT: 6>.
Velocity-Bench svm Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/svm/svm_sycl', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m']' died with <Signals.SIGABRT: 6>.
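
For triage, the Failure column above can be grouped by how each process ended. The following is a minimal sketch, not part of the benchmark scripts: `classify_failure` and its category names are hypothetical, and the patterns are inferred only from the message formats visible in this report.

```python
import re
from collections import Counter


def classify_failure(message: str) -> str:
    """Map one entry of the Failure column to a coarse category (hypothetical helper)."""
    signal = re.search(r"died with <Signals\.(SIG[A-Z0-9]+): \d+>", message)
    if signal:
        return signal.group(1)  # e.g. "SIGABRT" or "SIGSEGV"
    if re.search(r"returned non-zero exit status \d+", message):
        return "non-zero exit"
    if "pars" in message.lower():  # "Error parsing output" and "Failed to parse"
        return "output parse error"
    return "other"


# A few Failure entries copied verbatim from the table above.
sample = [
    "Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/cudaSift/cudaSift']' died with <Signals.SIGABRT: 6>.",
    "Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl', '--no-verify']' returned non-zero exit status 1.",
    "Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'",
    "Benchmark run failure: Error parsing output: list index out of range",
    "Benchmark run failure: Failed to parse benchmark output.",
]

counts = Counter(classify_failure(m) for m in sample)
for category, count in counts.most_common():
    print(f"{category}: {count}")
```

Applied to the full table, a tally like this makes it easier to see whether the `_sycl` and `_ur` binaries fail in different ways (here, mostly SIGABRT versus SIGSEGV).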

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (2)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_l0 SubmitKernel out of order | 12.027000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order | 11.821000 μs |

Relative perf in group SubmitKernel With Completion (2)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_l0 SubmitKernel out of order with measure completion | 15.385000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order with measure completion | 18.382000 μs |

Relative perf in group SinKernelGraph 5 (2)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 | 25.876000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 | 29.114000 μs |

Relative perf in group SinKernelGraph 100 (2)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 | 247.817000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 | 258.402000 μs |

Relative perf in group EmptyKernel 1000 256 (1)

| Benchmark | This PR |
| --- | --- |
| ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 | 4.298000 μs |

Relative perf in group KernelSwitch 8 200 (1)

| Benchmark | This PR |
| --- | --- |
| ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 | 1.028000 μs |

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

@bratpiorka force-pushed the review/rrudnick/old_umf_perf branch 2 times, most recently from f141db2 to a42a419 on March 26, 2025 18:18

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/intel/llvm/actions/runs/14090647730


Benchmarks level_zero_v2 run ():
https://github.com/intel/llvm/actions/runs/14090647730
Job status: success. Test status: success.

Failures

Name Failure
SYCL-Bench Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128.
llama.cpp bench Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2.
api_overhead_benchmark_sycl SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_ur SubmitKernel out of order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel out of order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=0', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=0', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur SubmitKernel in order with measure completion Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=SubmitKernel', '--csv', '--noHeaders', '--Ioq=1', '--DiscardEvents=0', '--MeasureCompletion=1', '--iterations=100000', '--Profiling=0', '--NumKernels=10', '--KernelExecTime=1']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=0', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGSEGV: 11>.
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=EmptyKernel', '--csv', '--noHeaders', '--iterations=10000', '--wgs=256', '--wgc=256']' died with <Signals.SIGABRT: 6>.
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl', '--test=KernelSwitch', '--csv', '--noHeaders', '--iterations=1000', '--count=8', '--kernelTime=200', '--barrier=0', '--hostVisible=0', '--ioq=1', '--ctrBasedEvents=1']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 Benchmark run failure: Error parsing output: could not convert string to float: 'ERROR'
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueInOrderMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--IsCopyOnly=0', '--sourcePlacement=Host', '--destinationPlacement=Device', '--size=1024', '--count=100']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=QueueMemcpy', '--csv', '--noHeaders', '--iterations=10000', '--sourcePlacement=Device', '--destinationPlacement=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl', '--test=StreamMemory', '--csv', '--noHeaders', '--iterations=10000', '--type=Triad', '--size=10240', '--memoryPlacement=Device', '--useEvents=0', '--contents=Zeros', '--multiplier=1', '--vectorSize=1']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=0', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Device', '--dst=Device', '--size=1024']' died with <Signals.SIGABRT: 6>.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl', '--test=ExecImmediateCopyQueue', '--csv', '--noHeaders', '--iterations=100000', '--ioq=1', '--IsCopyOnly=1', '--MeasureCompletionTime=0', '--src=Host', '--dst=Host', '--size=1024']' died with <Signals.SIGABRT: 6>.
miscellaneous_benchmark_sycl VectorSum Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl', '--test=VectorSum', '--csv', '--noHeaders', '--iterations=1000', '--numberOfElementsX=512', '--numberOfElementsY=256', '--numberOfElementsZ=256']' died with <Signals.SIGABRT: 6>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=1', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=1', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=102400', '--NumThreads=1', '--NumOpsPerThread=400', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur', '--test=MemcpyExecute', '--csv', '--noHeaders', '--Ioq=1', '--UseEvents=0', '--MeasureCompletion=1', '--UseQueuePerThread=1', '--AllocSize=1024', '--NumThreads=4', '--NumOpsPerThread=4096', '--iterations=10', '--SrcUSM=0', '--DstUSM=1']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both Benchmark run failure: Error parsing output: list index out of range
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=256', '--size=4096', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmBatchMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--allocationCount=32', '--size=4194304', '--measureMode=Both', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur', '--test=UsmRandomMemoryAllocation', '--csv', '--noHeaders', '--type=Device', '--operationCount=256', '--minSize=4096', '--maxSize=33554432', '--sizeDistribution=LogUniform', '--iterations=1000']' died with <Signals.SIGSEGV: 11>.
Velocity-Bench Hashtable Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl', '--no-verify']' returned non-zero exit status 1.
Velocity-Bench Bitcracker Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/bitcracker/bitcracker', '-f', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt', '-d', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt', '-b', '60000']' died with <Signals.SIGABRT: 6>.
Velocity-Bench CudaSift Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/cudaSift/cudaSift']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Easywave Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' died with <Signals.SIGABRT: 6>.
Velocity-Bench QuickSilver Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/QuickSilver/qs', '-i', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Sobel Filter Benchmark run failure: {self.class.name}: Failed to parse benchmark output.
Velocity-Bench dl-cifar Benchmark run failure: Failed to parse benchmark output.
Velocity-Bench dl-mnist Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl', '-conv_algo', 'ONEDNN_AUTO']' died with <Signals.SIGABRT: 6>.
Velocity-Bench svm Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/svm/svm_sycl', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a', '/home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m']' died with <Signals.SIGABRT: 6>.

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (2)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_l0 SubmitKernel out of order | 12.115000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order | 12.149000 μs |

Relative perf in group SubmitKernel With Completion (2)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_l0 SubmitKernel out of order with measure completion | 15.858000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order with measure completion | 18.904000 μs |

Relative perf in group SinKernelGraph 5 (2)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 | 26.472000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 | 28.980000 μs |

Relative perf in group SinKernelGraph 100 (2)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 | 243.046000 μs |
| graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 | 255.722000 μs |

Relative perf in group EmptyKernel 1000 256 (1)

| Benchmark | This PR |
| --- | --- |
| ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 | 4.192000 μs |

Relative perf in group KernelSwitch 8 200 (1)

| Benchmark | This PR |
| --- | --- |
| ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 | 0.982000 μs |
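
Since the report has no baseline ("No diffs to calculate performance change"), the only comparison available in this thread is between the two runs posted here (14062662803 and 14090647730). A minimal sketch of that comparison follows, assuming the values are copied correctly from the two summaries; the short dictionary keys and `percent_change` are illustrative names, not part of the benchmark scripts. Each value is a single measurement and the branch was force-pushed between the runs, so small differences may well be noise.

```python
# Latencies in microseconds from the "This PR" column of each run.
run_14062662803 = {
    "SubmitKernel out of order": 12.027,
    "SubmitKernel in order": 11.821,
    "SubmitKernel out of order with measure completion": 15.385,
    "SubmitKernel in order with measure completion": 18.382,
    "SinKernelGraph graphs:0, numKernels:5": 25.876,
    "SinKernelGraph graphs:1, numKernels:5": 29.114,
    "SinKernelGraph graphs:0, numKernels:100": 247.817,
    "SinKernelGraph graphs:1, numKernels:100": 258.402,
    "EmptyKernel wgc:1000, wgs:256": 4.298,
    "KernelSwitch count 8 kernelTime 200": 1.028,
}
run_14090647730 = {
    "SubmitKernel out of order": 12.115,
    "SubmitKernel in order": 12.149,
    "SubmitKernel out of order with measure completion": 15.858,
    "SubmitKernel in order with measure completion": 18.904,
    "SinKernelGraph graphs:0, numKernels:5": 26.472,
    "SinKernelGraph graphs:1, numKernels:5": 28.980,
    "SinKernelGraph graphs:0, numKernels:100": 243.046,
    "SinKernelGraph graphs:1, numKernels:100": 255.722,
    "EmptyKernel wgc:1000, wgs:256": 4.192,
    "KernelSwitch count 8 kernelTime 200": 0.982,
}


def percent_change(old: float, new: float) -> float:
    """Relative change of new versus old, in percent (positive means slower)."""
    return (new - old) / old * 100.0


changes = {
    name: percent_change(run_14062662803[name], run_14090647730[name])
    for name in run_14062662803
}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

All of these are time-per-operation metrics, so a negative change means the later run was faster for that benchmark.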

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14173987893

Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14173987893
Job status: success. Test status: success.

Failures

| Name | Failure |
| --- | --- |
| SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128. |
| llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2. |
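
To reproduce one of the failures above locally, the quoted argument list can be turned back into a shell command. The following is a minimal sketch, not part of the benchmark scripts: `parse_failure` and `FAILURE_RE` are hypothetical names, and it assumes the list inside `Command '...'` is a printed Python list (the wording matches Python's `subprocess.CalledProcessError` message, but that is an inference from the text).

```python
import ast
import re
import shlex

# Matches the two message endings seen in the Failures table:
# "died with <Signals.X: N>." and "returned non-zero exit status N."
FAILURE_RE = re.compile(
    r"Command '(\[.*\])' (died with .*|returned non-zero exit status \d+)\.$"
)


def parse_failure(message):
    """Return (shell_command, outcome) for one failure message, or None."""
    match = FAILURE_RE.search(message)
    if match is None:
        return None
    # Assumption: the argument list is a printed Python list literal.
    argv = ast.literal_eval(match.group(1))
    # shlex.join quotes arguments where needed (Python 3.8+).
    return shlex.join(argv), match.group(2)


# Failure text copied from the table above.
example = (
    "Benchmark run failure: Command "
    "'['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', "
    "'--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', "
    "'--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', "
    "'--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>."
)

command, outcome = parse_failure(example)
print(command)
print(outcome)
```

The printed command can be pasted into a shell on a machine with the same build directory layout; the paths are specific to the CI runner.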

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel Out Of Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order 21.928000 μs
api_overhead_benchmark_l0 SubmitKernel out of order 11.808000 μs
api_overhead_benchmark_ur SubmitKernel out of order 16.230000 μs
Relative perf in group SubmitKernel Out Of Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion 26.621000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion 15.498000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion 20.386000 μs
Relative perf in group SubmitKernel In Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order 23.189000 μs
api_overhead_benchmark_l0 SubmitKernel in order 12.181000 μs
api_overhead_benchmark_ur SubmitKernel in order 17.030000 μs
Relative perf in group SubmitKernel In Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order with measure completion 27.855000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion 19.069000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.448000 μs
Relative perf in group SubmitKernel Out Of Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order CPU count 107464.000000 instr
Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count 136178.000000 instr
Relative perf in group SubmitKernel In Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order CPU count 113318.000000 instr
Relative perf in group SubmitKernel In Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 125959.000000 instr
Relative perf in group SinKernelGraph 5 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 29.483000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 25.956000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 28.942000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 33.130000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 52.284000 μs
Relative perf in group SinKernelGraph 100 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 283.154000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 244.672000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 246.839000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 271.262000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 311.316000 μs
Relative perf in group EmptyKernel 1000 256 (2)
Benchmark This PR
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 5.572000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 4.237000 μs
Relative perf in group KernelSwitch 8 200 (2)
Benchmark This PR
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 0.640000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 1.005000 μs
Relative perf in group SubmitGraph 4 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 6.566000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 30.881000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 6.894000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 38.459000 μs
Relative perf in group SubmitGraph 10 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 6.585000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 33.495000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 6.839000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 57.314000 μs
Relative perf in group SubmitGraph 32 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 6.368000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 42.383000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 6.799000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 113.908000 μs
Relative perf in group Other (10)
Benchmark This PR
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 251.753000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 125.443000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.738000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.295000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.117000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.633000 μs
miscellaneous_benchmark_sycl VectorSum 863.322000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6910.311000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7550.835000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 116780.061000 μs
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both 0.169000 μs
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both 11.178000 μs
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both 187.510000 μs
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both 367.409000 μs
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Benchmark This PR
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform 0.322000 μs
Velocity Bench
Relative perf in group Other (8)
Benchmark This PR
Velocity-Bench Hashtable 337.947286 M keys/sec
Velocity-Bench Bitcracker 35.376300 s
Velocity-Bench CudaSift 206.691000 ms
Velocity-Bench QuickSilver 117.560000 MMS/CTT
Velocity-Bench Sobel Filter 671.033000 ms
Velocity-Bench dl-cifar 23.739300 s
Velocity-Bench dl-mnist 2.670000 s
Velocity-Bench svm 0.150300 s
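
Since the report has no baseline to diff against, one way to read the SubmitKernel groups above is to compare each API layer against the Level Zero result for the same variant. The sketch below does that with the times copied from this summary; using `l0` as the baseline is an assumption made here for illustration, and `overhead_vs_l0` is a hypothetical helper, not something the benchmark scripts provide.

```python
# Reported SubmitKernel times in microseconds, copied from the summary above.
submit_kernel_us = {
    "out of order": {"sycl": 21.928, "l0": 11.808, "ur": 16.230},
    "out of order with measure completion": {"sycl": 26.621, "l0": 15.498, "ur": 20.386},
    "in order": {"sycl": 23.189, "l0": 12.181, "ur": 17.030},
    "in order with measure completion": {"sycl": 27.855, "l0": 19.069, "ur": 21.448},
}


def overhead_vs_l0(times):
    """Return each non-L0 time divided by the L0 time for the same variant."""
    baseline = times["l0"]  # assumption: Level Zero is the reference layer
    return {api: t / baseline for api, t in times.items() if api != "l0"}


for variant, times in submit_kernel_us.items():
    ratios = overhead_vs_l0(times)
    summary = ", ".join(f"{api} {ratio:.2f}x" for api, ratio in ratios.items())
    print(f"{variant}: {summary}")
```

A ratio above 1 means the layer took longer than the direct Level Zero benchmark for that variant; the numbers come from a single run, so small differences should not be over-read.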

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14174478553

Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14174478553
Job status: success. Test status: success.

Failures

Name | Failure
--- | ---
SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128.
llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>.
Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2.
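
The failure strings above have the shape that Python's `subprocess.CalledProcessError` produces. That suggests, as an assumption only (the harness source is not shown in this thread), that each command is launched with something like `subprocess.run(..., check=True)`. A minimal sketch of how the two message forms arise on a POSIX system, using stand-in child processes rather than the real benchmark binaries:

```python
import subprocess
import sys


def describe_failure(cmd):
    """Run cmd; return None on success, else the CalledProcessError text."""
    try:
        subprocess.run(cmd, check=True, capture_output=True)
    except subprocess.CalledProcessError as err:
        return str(err)
    return None


# Stand-in for a command that exits with status 2 (like the cmake build failure).
exit_msg = describe_failure([sys.executable, "-c", "import sys; sys.exit(2)"])

# Stand-in for a process that aborts (like the graph_api_benchmark_sycl runs).
# On POSIX, os.abort() raises SIGABRT, so the child's return code is -6.
abort_msg = describe_failure([sys.executable, "-c", "import os; os.abort()"])

print(exit_msg)
print(abort_msg)
```

Read this way, "returned non-zero exit status N" means the process exited on its own with an error code, while "died with <Signals.SIGABRT: 6>" means it was terminated by a signal, typically from `abort()` or a failed assertion inside the benchmark.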

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks

Relative perf in group SubmitKernel Out Of Order (3)

Benchmark | This PR
--- | ---
api_overhead_benchmark_sycl SubmitKernel out of order | 22.331000 μs
api_overhead_benchmark_l0 SubmitKernel out of order | 11.995000 μs
api_overhead_benchmark_ur SubmitKernel out of order | 16.397000 μs

Relative perf in group SubmitKernel Out Of Order With Completion (3)

Benchmark | This PR
--- | ---
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion | 27.004000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion | 15.655000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion | 20.704000 μs

Relative perf in group SubmitKernel In Order (3)

Benchmark | This PR
--- | ---
api_overhead_benchmark_sycl SubmitKernel in order | 23.320000 μs
api_overhead_benchmark_l0 SubmitKernel in order | 11.796000 μs
api_overhead_benchmark_ur SubmitKernel in order | 16.994000 μs

Relative perf in group SubmitKernel In Order With Completion (3)

Benchmark | This PR
--- | ---
api_overhead_benchmark_sycl SubmitKernel in order with measure completion | 27.749000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion | 18.295000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion | 21.841000 μs

Relative perf in group SubmitKernel Out Of Order CPU count (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur SubmitKernel out of order CPU count | 107464.000000 instr

Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count | 135416.000000 instr

Relative perf in group SubmitKernel In Order CPU count (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur SubmitKernel in order CPU count | 113318.000000 instr

Relative perf in group SubmitKernel In Order With Completion CPU count (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | 125959.000000 instr

Relative perf in group SinKernelGraph 5 (5)

Benchmark | This PR
--- | ---
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 | 28.670000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 | 25.997000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 | 28.780000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 | 33.090000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 | 51.922000 μs

Relative perf in group SinKernelGraph 100 (5)

Benchmark | This PR
--- | ---
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | 284.659000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 | 249.973000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 | 248.391000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 | 271.065000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 | 310.886000 μs

Relative perf in group EmptyKernel 1000 256 (2)

Benchmark | This PR
--- | ---
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 | 5.713000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 | 4.324000 μs

Relative perf in group KernelSwitch 8 200 (2)

Benchmark | This PR
--- | ---
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 | 0.594000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 | 1.028000 μs

Relative perf in group SubmitGraph 4 (4)

Benchmark | This PR
--- | ---
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 | 6.425000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 | 31.711000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 | 7.232000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 | 37.906000 μs

Relative perf in group SubmitGraph 10 (4)

Benchmark | This PR
--- | ---
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 | 6.412000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 | 33.471000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 | 6.824000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 | 55.726000 μs

Relative perf in group SubmitGraph 32 (4)

Benchmark | This PR
--- | ---
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 | 6.445000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 | 43.022000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 | 7.169000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 | 112.710000 μs

Relative perf in group Other (10)

Benchmark | This PR
--- | ---
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 252.026000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 123.053000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.721000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.274000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.109000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.649000 μs
miscellaneous_benchmark_sycl VectorSum | 860.076000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | 6927.450000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | 7504.809000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | 116312.918000 μs

Relative perf in group UsmMemoryAllocation Device 4096 Both (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both | 0.170000 μs

Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both | 11.008000 μs

Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both | 187.833000 μs

Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both | 361.999000 μs

Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)

Benchmark | This PR
--- | ---
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform | 0.309000 μs

Velocity Bench

Relative perf in group Other (8)

Benchmark | This PR
--- | ---
Velocity-Bench Hashtable | 350.208307 M keys/sec
Velocity-Bench Bitcracker | 35.377700 s
Velocity-Bench CudaSift | 207.485000 ms
Velocity-Bench QuickSilver | 118.400000 MMS/CTT
Velocity-Bench Sobel Filter | 633.329000 ms
Velocity-Bench dl-cifar | 23.691000 s
Velocity-Bench dl-mnist | 2.640000 s
Velocity-Bench svm | 0.149400 s
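
Since this run reports "No diffs to calculate performance change", there is no baseline to compare against. One comparison the numbers do allow is across API layers within the same run. The sketch below is illustrative only: it takes the SubmitKernel values from earlier in this summary and expresses the SYCL and UR latencies as multiples of the Level Zero (l0) latency. The dictionary layout and the `overhead_vs_l0` helper are hypothetical names, not part of the benchmark scripts, and a single run says nothing about run-to-run variance.

```python
# Latencies in microseconds, copied from the SubmitKernel groups in this summary.
results = {
    "out of order": {"sycl": 22.331, "l0": 11.995, "ur": 16.397},
    "out of order with measure completion": {"sycl": 27.004, "l0": 15.655, "ur": 20.704},
    "in order": {"sycl": 23.320, "l0": 11.796, "ur": 16.994},
    "in order with measure completion": {"sycl": 27.749, "l0": 18.295, "ur": 21.841},
}


def overhead_vs_l0(group):
    """Return each non-l0 latency as a multiple of the l0 latency."""
    base = group["l0"]
    return {api: t / base for api, t in group.items() if api != "l0"}


for name, group in results.items():
    ratios = overhead_vs_l0(group)
    print(f"{name}: sycl {ratios['sycl']:.2f}x, ur {ratios['ur']:.2f}x of l0")
```

In every SubmitKernel group the l0 value is the lowest and the sycl value the highest, so both ratios come out above 1.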

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m
