Add compute USM allocation benchmarks #17623
base: unify-benchmark-ci
6f63c11 to 7f13420
Let's wait until #17617 is merged.
        )

    def explicit_group(self):
        return f"UsmMemoryAllocation {self.usm_memory_placement} {self.size} {self.measure_mode}"
What are the results of all these benchmarks? Can't we use a single explicit group for all UsmMemoryAllocation?
The primary consideration is whether the results are within a similar range, so that the bar charts look OK.
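A minimal sketch of what the suggestion above could look like, assuming a stand-in class rather than the real benchmark class from this PR. `UsmMemoryAllocationSketch`, `within_similar_range`, and the `max_ratio` threshold are hypothetical names and values chosen for illustration only:

```python
from dataclasses import dataclass


@dataclass
class UsmMemoryAllocationSketch:
    """Hypothetical stand-in for the benchmark class discussed above."""

    usm_memory_placement: str
    size: int
    measure_mode: str

    def explicit_group(self) -> str:
        # One shared group name, so every configuration lands on the same chart.
        return "UsmMemoryAllocation"


def within_similar_range(values: list[float], max_ratio: float = 10.0) -> bool:
    """Return True if the largest result is at most max_ratio times the smallest.

    The threshold is an assumption; pick whatever keeps the bar chart readable.
    """
    if not values or any(v <= 0 for v in values):
        return False
    return max(values) / min(values) <= max_ratio
```

If the measured results for the different sizes fail a check like this, separate groups (as in the current patch) may still be the better choice.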
@@ -175,6 +175,11 @@ def benchmarks(self) -> list[Benchmark]:
            MemcpyExecute(self, 400, 1, 102400, 10, 1, 1, 1),
            MemcpyExecute(self, 400, 1, 102400, 10, 0, 1, 1),
            MemcpyExecute(self, 4096, 4, 1024, 10, 0, 1, 0),
            UsmMemoryAllocation(self, RUNTIMES.UR, "Device", 4 * 1024, "Both"),
how long does it all take?
            f"--type={self.usm_memory_placement}",
            f"--size={self.size}",
            f"--measureMode={self.measure_mode}",
            "--iterations=1000",
are these benchmarks stable with 1000 iterations?
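One way to answer the stability question is to repeat the benchmark several times and look at the run-to-run spread. The sketch below is only an illustration: the helper names and the 5% threshold are assumptions, not part of the benchmark scripts in this PR.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean (needs at least two samples)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean of samples is zero; coefficient of variation is undefined")
    return statistics.stdev(samples) / mean


def is_stable(samples: list[float], threshold: float = 0.05) -> bool:
    # Treat the benchmark as stable when the relative spread stays under the
    # (assumed) threshold across repeated runs.
    return coefficient_of_variation(samples) <= threshold
```

Feeding it the per-run results of several invocations with `--iterations=1000` would show whether that iteration count is high enough.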
            UsmBatchMemoryAllocation(self, RUNTIMES.UR, "Device", 256, 4 * 1024, "Both"),
            UsmBatchMemoryAllocation(self, RUNTIMES.UR, "Device", 32, 4 * 1024 * 1024, "Both"),
            UsmRandomMemoryAllocation(self, RUNTIMES.UR, "Device", 256, 4 * 1024, 32 * 1024 * 1024, "LogUniform"),
4 KB and 4 MB is a lot. I think we should be testing something small (like 256 bytes), medium-ish (16 KB), and large (over 64 KB, like 512 KB).
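The size tiers proposed above could be written down roughly as follows. This is a sketch only: `PROPOSED_SIZES` and `allocation_args` are hypothetical names, and the tuples mirror the (placement, size, measure mode) arguments visible in the diff without calling the real benchmark constructors.

```python
KIB = 1024

# Sizes taken from the review comment: small, medium-ish, and large.
PROPOSED_SIZES = {
    "small": 256,
    "medium": 16 * KIB,
    "large": 512 * KIB,
}


def allocation_args(placement: str = "Device", measure_mode: str = "Both"):
    """Build one (placement, size, measure_mode) tuple per proposed size tier."""
    return [(placement, size, measure_mode) for size in PROPOSED_SIZES.values()]
```

Each tuple could then be passed to whichever benchmark class ends up taking these parameters.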
Compute Benchmarks level_zero run (with params: ):
This is missing title tags -
Benchmarks level_zero run (): Failures
Summary
(Emphasized values are the best results)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (6)
Relative perf in group SubmitKernel With Completion (6)
Relative perf in group SubmitKernel CPU count (2)
Relative perf in group SubmitKernel With Completion CPU count (2)
Relative perf in group SinKernelGraph 5 (5)
Relative perf in group SinKernelGraph 100 (5)
Relative perf in group EmptyKernel 1000 256 (2)
Relative perf in group KernelSwitch 8 200 (2)
Relative perf in group SubmitGraph 4 (4)
Relative perf in group SubmitGraph 10 (4)
Relative perf in group SubmitGraph 32 (4)
Relative perf in group Other (10)
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Velocity Bench
Relative perf in group Other (8)
Details
Benchmark details - environment, command...

api_overhead_benchmark_sycl SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable
Command: /home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker
Command: /home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift
Command: /home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver
Command: /home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
Environment Variables: QS_DEVICE=GPU

Velocity-Bench Sobel Filter
Command: /home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
Environment Variables: OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar
Command: /home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist
Command: /home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO
Environment Variables: NEOReadDebugKeys=1

Velocity-Bench svm
Command: /home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m
0Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1 graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1 graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1 graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1 memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 memory_benchmark_sycl QueueMemcpy from Device to Device, size 
1024Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1 api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 miscellaneous_benchmark_sycl VectorSumCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1 multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, 
allocSize:102400 srcUSM:0 dstUSM:1Command:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1 multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without eventsCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1 api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:BothCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000 api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:BothCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000 api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:BothCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000 api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:BothCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders 
--type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000 api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniformCommand:/home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000 Velocity-Bench HashtableCommand:/home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify Velocity-Bench BitcrackerCommand:/home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Velocity-Bench CudaSiftCommand:/home/test-user/llvm_bench_workdir/cudaSift/cudaSift Velocity-Bench QuickSilverCommand:/home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp Environment Variables:QS_DEVICE=GPU Velocity-Bench Sobel FilterCommand:/home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5 Environment Variables:OPENCV_IO_MAX_IMAGE_PIXELS=1677721600 Velocity-Bench dl-cifarCommand:/home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl Velocity-Bench dl-mnistCommand:/home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO Environment Variables:NEOReadDebugKeys=1 Velocity-Bench svmCommand:/home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m |
Compute Benchmarks level_zero run (with params: ): |
Benchmarks level_zero run (): Failures
Summary (Emphasized values are the best results)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel Out Of Order (3)
Relative perf in group SubmitKernel Out Of Order With Completion (3)
Relative perf in group SubmitKernel In Order (3)
Relative perf in group SubmitKernel In Order With Completion (3)
Relative perf in group SubmitKernel Out Of Order CPU count (1)
Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)
Relative perf in group SubmitKernel In Order CPU count (1)
Relative perf in group SubmitKernel In Order With Completion CPU count (1)
Relative perf in group SinKernelGraph 5 (5)
Relative perf in group SinKernelGraph 100 (5)
Relative perf in group EmptyKernel 1000 256 (2)
Relative perf in group KernelSwitch 8 200 (2)
Relative perf in group SubmitGraph 4 (4)
Relative perf in group SubmitGraph 10 (4)
Relative perf in group SubmitGraph 32 (4)
Relative perf in group Other (10)
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Velocity Bench
Relative perf in group Other (8)
Details
Benchmark details - environment, command...

api_overhead_benchmark_sycl SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:0, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:5
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

graph_api_benchmark_ur SinKernelGraph graphs:1, numKernels:100
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_ur --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0

ulls_benchmark_sycl EmptyKernel wgc:1000, wgs:256
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_sycl KernelSwitch count 8 kernelTime 200
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_sycl --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

ulls_benchmark_l0 EmptyKernel wgc:1000, wgs:256
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=EmptyKernel --csv --noHeaders --iterations=10000 --wgs=256 --wgc=256

ulls_benchmark_l0 KernelSwitch count 8 kernelTime 200
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/ulls_benchmark_l0 --test=KernelSwitch --csv --noHeaders --iterations=1000 --count=8 --kernelTime=200 --barrier=0 --hostVisible=0 --ioq=1 --ctrBasedEvents=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 0 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=0 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:4 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=4 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:10 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=10 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 0
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=0 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

graph_api_benchmark_sycl SubmitGraph numKernels:32 ioq 1 measureCompletion 1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitGraph --csv --noHeaders --iterations=10000 --NumKernels=32 --MeasureCompletionTime=1 --InOrderQueue=1 --Profiling=0 --KernelExecutionTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 --vectorSize=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4096 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmMemoryAllocation usmMemoryPlacement:Device size:4194304 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmMemoryAllocation --csv --noHeaders --type=Device --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:256 size:4096 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=256 --size=4096 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmBatchMemoryAllocation usmMemoryPlacement:Device allocationCount:32 size:4194304 measureMode:Both
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmBatchMemoryAllocation --csv --noHeaders --type=Device --allocationCount=32 --size=4194304 --measureMode=Both --iterations=1000

api_overhead_benchmark_ur UsmRandomMemoryAllocation usmMemoryPlacement:Device operationCount:256 minSize:4096 maxSize:33554432 sizeDistribution:LogUniform
Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=UsmRandomMemoryAllocation --csv --noHeaders --type=Device --operationCount=256 --minSize=4096 --maxSize=33554432 --sizeDistribution=LogUniform --iterations=1000

Velocity-Bench Hashtable
Command: /home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker
Command: /home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift
Command: /home/test-user/llvm_bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver
Command: /home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
Environment Variables: QS_DEVICE=GPU

Velocity-Bench Sobel Filter
Command: /home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
Environment Variables: OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Velocity-Bench dl-cifar
Command: /home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist
Command: /home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO
Environment Variables: NEOReadDebugKeys=1

Velocity-Bench svm
Command: /home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m
|
Compute Benchmarks level_zero run (with params: ):
Benchmarks level_zero run (): Failures
Summary
(Emphasized values are the best results)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel Out Of Order (3)
Relative perf in group SubmitKernel Out Of Order With Completion (3)
Relative perf in group SubmitKernel In Order (3)
Relative perf in group SubmitKernel In Order With Completion (3)
Relative perf in group SubmitKernel Out Of Order CPU count (1)
Relative perf in group SubmitKernel Out Of Order With Completion CPU count (1)
Relative perf in group SubmitKernel In Order CPU count (1)
Relative perf in group SubmitKernel In Order With Completion CPU count (1)
Relative perf in group SinKernelGraph 5 (5)
Relative perf in group SinKernelGraph 100 (5)
Relative perf in group EmptyKernel 1000 256 (2)
Relative perf in group KernelSwitch 8 200 (2)
Relative perf in group SubmitGraph 4 (4)
Relative perf in group SubmitGraph 10 (4)
Relative perf in group SubmitGraph 32 (4)
Relative perf in group Other (10)
Relative perf in group UsmMemoryAllocation Device 4096 Both (1)
Relative perf in group UsmMemoryAllocation Device 4194304 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 256 4096 Both (1)
Relative perf in group UsmBatchMemoryAllocation Device 32 4194304 Both (1)
Relative perf in group UsmRandomMemoryAllocation Device 256 4096 33554432 LogUniform (1)
Velocity Bench
Relative perf in group Other (8)
This PR introduces USM memory allocation benchmark scenarios to the compute benchmark suite.
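
For orientation, here is a minimal sketch of how the `UsmMemoryAllocation` command line seen in the benchmark run output could be assembled. The helper name `usm_allocation_args` and its signature are hypothetical and are not taken from the PR; only the flags themselves (`--test`, `--csv`, `--noHeaders`, `--type`, `--size`, `--measureMode`, `--iterations`) come from the logged commands.

```python
# Hypothetical helper (not part of the PR): rebuilds the argument list that
# the logged api_overhead_benchmark_ur invocations use for UsmMemoryAllocation.
def usm_allocation_args(
    placement: str, size: int, measure_mode: str, iterations: int = 1000
) -> list[str]:
    return [
        "--test=UsmMemoryAllocation",
        "--csv",
        "--noHeaders",
        f"--type={placement}",            # USM placement, e.g. "Device"
        f"--size={size}",                 # allocation size in bytes
        f"--measureMode={measure_mode}",  # e.g. "Both", as in the logged runs
        f"--iterations={iterations}",
    ]


# Matches the 4 KiB device-allocation scenario from the run output.
print(" ".join(usm_allocation_args("Device", 4 * 1024, "Both")))
```

The batch and random variants shown in the logs would follow the same pattern with their extra flags (`--allocationCount`, or `--operationCount`, `--minSize`, `--maxSize` and `--sizeDistribution`).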