Add compute USM allocation benchmarks #17623

Open
wants to merge 1 commit into
base: unify-benchmark-ci

Conversation

staniewzki
Contributor

This PR adds USM memory allocation benchmark scenarios to the compute benchmark suite.

staniewzki requested a review from a team as a code owner on March 25, 2025 00:05
staniewzki force-pushed the benchmark-compute-usm branch 2 times, most recently from 6f63c11 to 7f13420, on March 25, 2025 00:51
Contributor

pbalcer commented Mar 25, 2025

Let's wait until #17617 is merged.

)

def explicit_group(self):
    return f"UsmMemoryAllocation {self.usm_memory_placement} {self.size} {self.measure_mode}"
Contributor

What are the results of all these benchmarks? Can't we use a single explicit group for all UsmMemoryAllocation benchmarks?
The primary consideration is whether the results fall within a similar range, so that the bar charts look OK.
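
One way to approach the range question is to compare the values a run reports for the new benchmarks. The sketch below uses the five USM results from the first level_zero run posted in this thread; the shortened labels and the `spread` helper are illustrative assumptions, not part of the benchmark scripts.

```python
# Results in microseconds, copied from the first level_zero run in this thread.
# The keys are shortened labels for readability, not the real benchmark names.
usm_results_us = {
    "UsmMemoryAllocation 4096": 0.148,
    "UsmMemoryAllocation 4194304": 10.565,
    "UsmBatchMemoryAllocation 256x4096": 188.928,
    "UsmBatchMemoryAllocation 32x4194304": 354.462,
    "UsmRandomMemoryAllocation": 0.303,
}


def spread(results):
    """Return the ratio between the largest and the smallest value."""
    values = list(results.values())
    return max(values) / min(values)


ratio = spread(usm_results_us)
print(f"max/min ratio: {ratio:.0f}x")
```

The ratio spans roughly three orders of magnitude, so on a shared linear axis the smallest bars would be barely visible.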

@@ -175,6 +175,11 @@ def benchmarks(self) -> list[Benchmark]:
MemcpyExecute(self, 400, 1, 102400, 10, 1, 1, 1),
MemcpyExecute(self, 400, 1, 102400, 10, 0, 1, 1),
MemcpyExecute(self, 4096, 4, 1024, 10, 0, 1, 0),
UsmMemoryAllocation(self, RUNTIMES.UR, "Device", 4 * 1024, "Both"),
Contributor

how long does it all take?

f"--type={self.usm_memory_placement}",
f"--size={self.size}",
f"--measureMode={self.measure_mode}",
"--iterations=1000",
Contributor

are these benchmarks stable with 1000 iterations?
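
A common way to check stability is to repeat the same benchmark several times and look at the coefficient of variation (standard deviation divided by mean). The following is a minimal sketch under that assumption; the function name and the sample values are hypothetical placeholders, not measurements from this PR.

```python
import statistics


def coefficient_of_variation(samples):
    """Return sample standard deviation divided by the mean."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean


# Placeholder values for illustration only, NOT measured results.
placeholder_runs_us = [10.0, 10.5, 9.5, 10.2, 9.8]

cv = coefficient_of_variation(placeholder_runs_us)
print(f"CV: {cv:.1%}")
```

What counts as an acceptable value is a judgment call for the maintainers; if repeated runs at `--iterations=1000` vary too much, raising the iteration count is one option to try.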

Comment on lines +180 to +182
UsmBatchMemoryAllocation(self, RUNTIMES.UR, "Device", 256, 4 * 1024, "Both"),
UsmBatchMemoryAllocation(self, RUNTIMES.UR, "Device", 32, 4 * 1024 * 1024, "Both"),
UsmRandomMemoryAllocation(self, RUNTIMES.UR, "Device", 256, 4 * 1024, 32 * 1024 * 1024, "LogUniform"),
Contributor

4 KB and 4 MB is a lot. I think we should be testing something small (like 256 bytes), medium-ish (16 KB), and large (over 64 KB, like 512 KB).
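
For reference, the suggested sizes can be written out in bytes and turned into argument lists shaped like the UsmMemoryAllocation command shown in this thread. This is a sketch only: `SUGGESTED_SIZES` and `usm_allocation_args` are hypothetical names, and the flags are copied from the command lines in the CI report rather than from the benchmark sources.

```python
# Sizes suggested in the review comment, expressed in bytes.
SUGGESTED_SIZES = {
    "small": 256,
    "medium": 16 * 1024,
    "large": 512 * 1024,
}


def usm_allocation_args(placement, size, measure_mode, iterations=1000):
    """Build an argument list shaped like the UsmMemoryAllocation command in this thread."""
    return [
        "--test=UsmMemoryAllocation",
        "--csv",
        "--noHeaders",
        f"--type={placement}",
        f"--size={size}",
        f"--measureMode={measure_mode}",
        f"--iterations={iterations}",
    ]


for label, size in SUGGESTED_SIZES.items():
    print(label, " ".join(usm_allocation_args("Device", size, "Both")))
```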

Contributor

Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14057378417

Contributor

pbalcer commented Mar 25, 2025

The PR title is missing its tags: [CI][Benchmark].

Contributor

Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14057378417
Job status: success. Test status: success.

Failures

| Name | Failure |
| --- | --- |
| SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128. |
| llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2. |

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (6)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order 22.101000 μs
api_overhead_benchmark_sycl SubmitKernel in order 22.623000 μs
api_overhead_benchmark_l0 SubmitKernel out of order 11.942000 μs
api_overhead_benchmark_l0 SubmitKernel in order 11.853000 μs
api_overhead_benchmark_ur SubmitKernel out of order 16.184000 μs
api_overhead_benchmark_ur SubmitKernel in order 17.246000 μs
Relative perf in group SubmitKernel With Completion (6)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion 26.567000 μs
api_overhead_benchmark_sycl SubmitKernel in order with measure completion 27.599000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion 15.353000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion 18.551000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion 20.252000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.697000 μs
Relative perf in group SubmitKernel CPU count (2)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order CPU count 107464.000000 instr
api_overhead_benchmark_ur SubmitKernel in order CPU count 113318.000000 instr
Relative perf in group SubmitKernel With Completion CPU count (2)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count 135543.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 125959.000000 instr
Relative perf in group SinKernelGraph 5 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 29.025000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 25.830000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 28.779000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 33.294000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 51.096000 μs
Relative perf in group SinKernelGraph 100 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 283.426000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 247.305000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 251.040000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 271.349000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 311.997000 μs
Relative perf in group EmptyKernel 1000 256 (2)
Benchmark This PR
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 5.622000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 4.345000 μs
Relative perf in group KernelSwitch 8 200 (2)
Benchmark This PR
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 0.617000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 1.005000 μs
Relative perf in group SubmitGraph 4 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 6.303000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 31.040000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 6.896000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 38.462000 μs
Relative perf in group SubmitGraph 10 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 6.399000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 32.688000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 7.061000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 55.259000 μs
Relative perf in group SubmitGraph 32 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 6.332000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 43.885000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 6.713000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 113.466000 μs
Relative perf in group Other (10)
Benchmark This PR
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 250.712000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 123.284000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.660000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.296000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.140000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.610000 μs
miscellaneous_benchmark_sycl VectorSum 860.370000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6924.114000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7490.823000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 115650.166000 μs
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both 0.148000 μs
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both 10.565000 μs
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both 188.928000 μs
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both 354.462000 μs
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Benchmark This PR
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform 0.303000 μs
Velocity Bench
Relative perf in group Other (8)
Benchmark This PR
Velocity-Bench Hashtable 316.183280 M keys/sec
Velocity-Bench Bitcracker 35.391200 s
Velocity-Bench CudaSift 206.614000 ms
Velocity-Bench QuickSilver 117.840000 MMS/CTT
Velocity-Bench Sobel Filter 723.981000 ms
Velocity-Bench dl-cifar 24.231700 s
Velocity-Bench dl-mnist 2.700000 s
Velocity-Bench svm 0.151200 s

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Contributor

Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14059995091

Contributor

Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14059995091
Job status: success. Test status: success.

Failures

| Name | Failure |
| --- | --- |
| SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128. |
| llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2. |

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (6)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order 22.286000 μs
api_overhead_benchmark_sycl SubmitKernel in order 22.725000 μs
api_overhead_benchmark_l0 SubmitKernel out of order 11.843000 μs
api_overhead_benchmark_l0 SubmitKernel in order 12.125000 μs
api_overhead_benchmark_ur SubmitKernel out of order 16.421000 μs
api_overhead_benchmark_ur SubmitKernel in order 17.140000 μs
Relative perf in group SubmitKernel With Completion (6)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion 26.661000 μs
api_overhead_benchmark_sycl SubmitKernel in order with measure completion 27.730000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion 15.483000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion 18.239000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion 20.544000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.534000 μs
Relative perf in group SubmitKernel CPU count (2)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order CPU count 107464.000000 instr
api_overhead_benchmark_ur SubmitKernel in order CPU count 113318.000000 instr
Relative perf in group SubmitKernel With Completion CPU count (2)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count 135797.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 125959.000000 instr
Relative perf in group SinKernelGraph 5 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 29.155000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 25.627000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 28.275000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 33.104000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 52.046000 μs
Relative perf in group SinKernelGraph 100 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 285.768000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 250.225000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 249.410000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 271.501000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 313.362000 μs
Relative perf in group EmptyKernel 1000 256 (2)
Benchmark This PR
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 5.664000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 4.237000 μs
Relative perf in group KernelSwitch 8 200 (2)
Benchmark This PR
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 0.640000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 1.005000 μs
Relative perf in group SubmitGraph 4 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 6.475000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 31.688000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 6.756000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 38.559000 μs
Relative perf in group SubmitGraph 10 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 6.507000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 33.529000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 7.043000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 56.008000 μs
Relative perf in group SubmitGraph 32 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 6.369000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 43.097000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 7.028000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 115.216000 μs
Relative perf in group Other (10)
Benchmark This PR
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 253.043000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 123.864000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.648000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.263000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.121000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.638000 μs
miscellaneous_benchmark_sycl VectorSum 863.618000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6929.863000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7488.907000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 116340.503000 μs
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both 0.151000 μs
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both 11.021000 μs
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both 191.010000 μs
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both 358.759000 μs
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Benchmark This PR
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform 0.306000 μs
Velocity Bench
Relative perf in group Other (8)
Benchmark This PR
Velocity-Bench Hashtable 322.728173 M keys/sec
Velocity-Bench Bitcracker 35.349600 s
Velocity-Bench CudaSift 206.799000 ms
Velocity-Bench QuickSilver 117.920000 MMS/CTT
Velocity-Bench Sobel Filter 696.402000 ms
Velocity-Bench dl-cifar 24.069300 s
Velocity-Bench dl-mnist 2.650000 s
Velocity-Bench svm 0.151100 s
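
The review asks whether the UsmMemoryAllocation results fall in a similar enough range to share one explicit group. A minimal sketch of that check, using the five USM values reported in this run; the dictionary and the `spread` helper are illustrative names and are not part of the benchmark scripts.

```python
# Values in microseconds, copied from the "This PR" column of this run.
usm_results_us = {
    "UsmMemoryAllocation Device 4096 Both": 0.151,
    "UsmMemoryAllocation Device 4194304 Both": 11.021,
    "UsmBatchMemoryAllocation Device 256 4096 Both": 191.010,
    "UsmBatchMemoryAllocation Device 32 4194304 Both": 358.759,
    "UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform": 0.306,
}


def spread(values):
    """Return the ratio between the largest and the smallest value."""
    values = list(values)
    return max(values) / min(values)


ratio = spread(usm_results_us.values())
print(f"max/min ratio across USM allocation results: {ratio:.0f}x")
```

A ratio spanning several orders of magnitude would suggest that a single shared bar chart hides the smaller bars, which is one possible argument for keeping separate groups or for grouping only scenarios of comparable size.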

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000
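
The UsmMemoryAllocation commands listed above follow directly from the benchmark parameters. A rough sketch of how such an argument list could be assembled, based on the flags visible in the logged commands; the function name `usm_memory_allocation_args` is hypothetical, and the actual implementation in the PR may be structured differently.

```python
def usm_memory_allocation_args(
    placement: str, size: int, measure_mode: str, iterations: int = 1000
) -> list[str]:
    """Build the argument list for one UsmMemoryAllocation scenario.

    The flag names mirror the logged commands; this helper itself is
    an illustration, not code from the benchmark scripts.
    """
    return [
        "--test=UsmMemoryAllocation",
        "--csv",
        "--noHeaders",
        f"--type={placement}",
        f"--size={size}",
        f"--measureMode={measure_mode}",
        f"--iterations={iterations}",
    ]


# The 4 KiB device allocation scenario from the benchmark list.
args = usm_memory_allocation_args("Device", 4 * 1024, "Both")
print(" ".join(args))
```

Joining the list with spaces reproduces the argument portion of the logged 4096-byte UsmMemoryAllocation command.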

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14174461678

Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14174461678
Job status: success. Test status: success.

Failures

| Name | Failure |
| --- | --- |
| SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128. |
| llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2. |

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel Out Of Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order 21.953000 μs
api_overhead_benchmark_l0 SubmitKernel out of order 11.780000 μs
api_overhead_benchmark_ur SubmitKernel out of order 16.230000 μs
Relative perf in group SubmitKernel Out Of Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion 26.481000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion 15.368000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion 20.175000 μs
Relative perf in group SubmitKernel In Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order 22.769000 μs
api_overhead_benchmark_l0 SubmitKernel in order 11.759000 μs
api_overhead_benchmark_ur SubmitKernel in order 17.108000 μs
Relative perf in group SubmitKernel In Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order with measure completion 27.319000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion 19.033000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.540000 μs
Relative perf in group SubmitKernel Out Of Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order CPU count 107464.000000 instr
Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count 135670.000000 instr
Relative perf in group SubmitKernel In Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order CPU count 113318.000000 instr
Relative perf in group SubmitKernel In Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 125959.000000 instr
Relative perf in group SinKernelGraph 5 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 29.308000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 25.523000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 28.632000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 33.334000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 51.293000 μs
Relative perf in group SinKernelGraph 100 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 285.449000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 250.149000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 252.357000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 271.083000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 310.757000 μs
Relative perf in group EmptyKernel 1000 256 (2)
Benchmark This PR
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 5.614000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 4.274000 μs
Relative perf in group KernelSwitch 8 200 (2)
Benchmark This PR
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 0.617000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 1.051000 μs
Relative perf in group SubmitGraph 4 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 6.457000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 31.530000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 6.713000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 38.035000 μs
Relative perf in group SubmitGraph 10 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 6.383000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 34.049000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 6.751000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 55.695000 μs
Relative perf in group SubmitGraph 32 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 6.340000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 43.034000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 6.846000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 114.104000 μs
Relative perf in group Other (10)
Benchmark This PR
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 251.932000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 123.361000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.690000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.284000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.084000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.596000 μs
miscellaneous_benchmark_sycl VectorSum 858.609000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6930.803000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7509.967000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 115152.397000 μs
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both 0.148000 μs
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both 10.942000 μs
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both 180.060000 μs
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Benchmark This PR
api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both 348.994000 μs
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Benchmark This PR
api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform 0.290000 μs
Velocity Bench
Relative perf in group Other (8)
Benchmark This PR
Velocity-Bench Hashtable 331.390205 M keys/sec
Velocity-Bench Bitcracker 35.423000 s
Velocity-Bench CudaSift 206.615000 ms
Velocity-Bench QuickSilver 117.570000 MMS/CTT
Velocity-Bench Sobel Filter 670.659000 ms
Velocity-Bench dl-cifar 24.117600 s
Velocity-Bench dl-mnist 2.680000 s
Velocity-Bench svm 0.151300 s

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000
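
For readers matching the two UsmMemoryAllocation commands above to the benchmark definitions in the diff, here is a minimal sketch of how the argument list could be assembled. The helper name `usm_allocation_args` is hypothetical, and placing `--test`, `--csv` and `--noHeaders` first is an assumption based only on the logged commands; the actual suite code may assemble them elsewhere.

```python
# Hypothetical helper, not taken from the PR: rebuilds the flags seen in the
# logged commands for the UsmMemoryAllocation scenarios.
def usm_allocation_args(test, placement, size, measure_mode, iterations=1000):
    """Return the argument list for one allocation benchmark run."""
    return [
        f"--test={test}",
        "--csv",          # assumed to be added for every compute benchmark
        "--noHeaders",    # assumed to be added for every compute benchmark
        f"--type={placement}",
        f"--size={size}",
        f"--measureMode={measure_mode}",
        f"--iterations={iterations}",
    ]


# The two scenarios listed above: 4 KiB and 4 MiB device allocations.
small = usm_allocation_args("UsmMemoryAllocation", "Device", 4 * 1024, "Both")
large = usm_allocation_args("UsmMemoryAllocation", "Device", 4 * 1024 * 1024, "Both")

print(" ".join(small))
print(" ".join(large))
```

The sizes are written as products so that 4096 and 4194304 in the logged commands are easy to recognise as 4 KiB and 4 MiB.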

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m


Compute Benchmarks level_zero run (with params: ):
https://github.com/intel/llvm/actions/runs/14174471047


Benchmarks level_zero run ():
https://github.com/intel/llvm/actions/runs/14174471047
Job status: success. Test status: success.

Failures

| Name | Failure |
| --- | --- |
| SYCL-Bench | Suite setup failure: Command '['git', 'checkout', '31fc70be6266193c4ba60eb1fe3ce26edee4ca5b']' returned non-zero exit status 128. |
| llama.cpp bench | Suite setup failure: Command '['cmake', '--build', '/home/test-user/llvm_bench_workdir/llamacpp-build', '-j', '120']' returned non-zero exit status 2. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:5 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=5', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl', '--test=SinKernelGraph', '--csv', '--noHeaders', '--iterations=10000', '--numKernels=100', '--withGraphs=1', '--withCopyOffload=1', '--immediateAppendCmdList=0']' died with <Signals.SIGABRT: 6>. |
| Velocity-Bench Easywave | Benchmark run failure: Command '['/home/test-user/llvm_bench_workdir/easywave/easyWave_sycl', '-grid', '/home/test-user/llvm_bench_workdir/data/easywave/examples/e2Asean.grd', '-source', '/home/test-user/llvm_bench_workdir/data/easywave/examples/BengkuluSept2007.flt', '-time', '120']' returned non-zero exit status 2. |

Summary

(Emphasized values are the best results)
No diffs to calculate performance change

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel Out Of Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order 22.481000 μs
api_overhead_benchmark_l0 SubmitKernel out of order 11.817000 μs
api_overhead_benchmark_ur SubmitKernel out of order 17.079000 μs
Relative perf in group SubmitKernel Out Of Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel out of order with measure completion 27.243000 μs
api_overhead_benchmark_l0 SubmitKernel out of order with measure completion 15.988000 μs
api_overhead_benchmark_ur SubmitKernel out of order with measure completion 20.964000 μs
Relative perf in group SubmitKernel In Order (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order 22.659000 μs
api_overhead_benchmark_l0 SubmitKernel in order 11.776000 μs
api_overhead_benchmark_ur SubmitKernel in order 17.719000 μs
Relative perf in group SubmitKernel In Order With Completion (3)
Benchmark This PR
api_overhead_benchmark_sycl SubmitKernel in order with measure completion 27.331000 μs
api_overhead_benchmark_l0 SubmitKernel in order with measure completion 18.521000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.648000 μs
Relative perf in group SubmitKernel Out Of Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order CPU count 107464.000000 instr
Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count 135797.000000 instr
Relative perf in group SubmitKernel In Order CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order CPU count 113318.000000 instr
Relative perf in group SubmitKernel In Order With Completion CPU count (1)
Benchmark This PR
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 125959.000000 instr
Relative perf in group SinKernelGraph 5 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5 29.438000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5 26.136000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5 28.657000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5 33.132000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5 52.510000 μs
Relative perf in group SinKernelGraph 100 (5)
Benchmark This PR
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 282.135000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100 249.910000 μs
graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100 248.637000 μs
graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100 276.517000 μs
graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100 313.298000 μs
Relative perf in group EmptyKernel 1000 256 (2)
Benchmark This PR
ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256 5.617000 μs
ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256 4.320000 μs
Relative perf in group KernelSwitch 8 200 (2)
Benchmark This PR
ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200 0.640000 μs
ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200 1.028000 μs
Relative perf in group SubmitGraph 4 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0 6.615000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1 31.908000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0 6.699000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1 38.910000 μs
Relative perf in group SubmitGraph 10 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0 6.569000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1 34.311000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0 6.786000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1 57.054000 μs
Relative perf in group SubmitGraph 32 (4)
Benchmark This PR
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0 6.668000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1 43.192000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0 7.261000 μs
graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1 116.242000 μs
Relative perf in group Other (10)
Benchmark This PR
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 252.126000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 122.775000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.679000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.275000 GB/s
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.158000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.666000 μs
miscellaneous_benchmark_sycl VectorSum 862.139000 bw GB/s
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6932.790000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7542.236000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 117253.518000 μs
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both | 0.148000 μs |

Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both | 10.930000 μs |

Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both | 181.157000 μs |

Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both | 346.849000 μs |

Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform | 0.289000 μs |

Velocity Bench

Relative perf in group Other (8)

| Benchmark | This PR |
| --- | --- |
| Velocity-Bench Hashtable | 349.608926 M keys/sec |
| Velocity-Bench Bitcracker | 35.340000 s |
| Velocity-Bench CudaSift | 206.506000 ms |
| Velocity-Bench QuickSilver | 117.870000 MMS/CTT |
| Velocity-Bench Sobel Filter | 619.336000 ms |
| Velocity-Bench dl-cifar | 24.069700 s |
| Velocity-Bench dl-mnist | 2.660000 s |
| Velocity-Bench svm | 0.150100 s |

Details

Benchmark details - environment, command...
api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform

Command:

/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable

Command:

/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar

Command:

/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Command:

/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Velocity-Bench svm

Command:

/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m

3 participants