Status: Closed
Labels: performance (Must go faster)

Description
Given the impact of the splatting penalty in other areas, I've long wondered why the promote-and-splat fallback definitions in Base's promotion.jl aren't problematic. It turns out that, in some circumstances, they are:
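Concretely, the fallback in question reads roughly like the sketch below. It is defined here under a local name so the sketch is runnable without committing type piracy on `Base.:*`:

```julia
# Stand-in for the Base fallback in promotion.jl, which reads roughly:
#     *(x::Number, y::Number) = *(promote(x, y)...)
# Defined under a local name so this sketch doesn't redefine Base.:*
mymul(x::Number, y::Number) = *(promote(x, y)...)
```

The question in this issue is why that splat lowers to a dynamic `jl_f__apply` call for `Normed` arguments (see the IR below) but not for, say, `Int`.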
julia> using FixedPointNumbers
julia> x = 0.2f0
0.2f0
julia> y = N0f8(0.2)
0.2N0f8
julia> @code_llvm x*y
define float @"julia_*_68175"(float, %Normed*) #0 !dbg !5 {
top:
%ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10896, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #3
%ptls = bitcast i8* %ptls_i8 to i8****
%2 = alloca [4 x i8**], align 8
%.sub = getelementptr inbounds [4 x i8**], [4 x i8**]* %2, i64 0, i64 0
%3 = getelementptr [4 x i8**], [4 x i8**]* %2, i64 0, i64 2
%4 = bitcast i8*** %3 to i8*
call void @llvm.memset.p0i8.i32(i8* %4, i8 0, i32 16, i32 8, i1 false)
%5 = bitcast [4 x i8**]* %2 to i64*
store i64 4, i64* %5, align 8
%6 = getelementptr [4 x i8**], [4 x i8**]* %2, i64 0, i64 1
%7 = bitcast i8* %ptls_i8 to i64*
%8 = load i64, i64* %7, align 8
%9 = bitcast i8*** %6 to i64*
store i64 %8, i64* %9, align 8
store i8*** %.sub, i8**** %ptls, align 8
%10 = getelementptr [4 x i8**], [4 x i8**]* %2, i64 0, i64 3
%11 = call [2 x float] @julia_promote_68176(float %0, %Normed* %1)
%.elt = extractvalue [2 x float] %11, 0
%.elt2 = extractvalue [2 x float] %11, 1
store i8** inttoptr (i64 139997130231784 to i8**), i8*** %3, align 8
%12 = call i8** @jl_gc_pool_alloc(i8* %ptls_i8, i32 1440, i32 16)
%13 = getelementptr i8*, i8** %12, i64 -1
%14 = bitcast i8** %13 to i8***
store i8** inttoptr (i64 139997180171536 to i8**), i8*** %14, align 8
%15 = bitcast i8** %12 to float*
store float %.elt, float* %15, align 8
%.sroa_raw_cast5 = bitcast i8** %12 to i8*
%.sroa_raw_idx6 = getelementptr inbounds i8, i8* %.sroa_raw_cast5, i64 4
%16 = bitcast i8* %.sroa_raw_idx6 to float*
store float %.elt2, float* %16, align 4
store i8** %12, i8*** %10, align 8
%17 = call i8** @jl_f__apply(i8** null, i8*** %3, i32 2)
%18 = bitcast i8** %17 to float*
%19 = load float, float* %18, align 16
%20 = load i64, i64* %9, align 8
store i64 %20, i64* %7, align 8
ret float %19
}
julia> function myprod1(x, y)
           xp, yp = promote(x, y)
           xp*yp
       end
myprod1 (generic function with 1 method)
julia> function myprod2(x, y)
           T = promote_type(typeof(x), typeof(y))
           convert(T, x) * convert(T, y)
       end
myprod2 (generic function with 1 method)
julia> @code_llvm myprod1(x, y)
define float @julia_myprod1_68215(float, %Normed*) #0 !dbg !5 {
top:
%2 = call [2 x float] @julia_promote_68176(float %0, %Normed* %1)
%.elt = extractvalue [2 x float] %2, 0
%.elt2 = extractvalue [2 x float] %2, 1
%3 = fmul float %.elt, %.elt2
ret float %3
}
julia> @code_llvm myprod2(x, y)
define float @julia_myprod2_68216(float, %Normed*) #0 !dbg !5 {
top:
%2 = getelementptr inbounds %Normed, %Normed* %1, i64 0, i32 0
%3 = load i8, i8* %2, align 1
%4 = uitofp i8 %3 to float
%5 = fmul fast float %4, 0x3F70101020000000
%6 = fmul float %5, %0
ret float %6
}
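As an aside, the constant `0x3F70101020000000` in `myprod2`'s IR is LLVM's double-hex rendering of `Float32(1/255)`, the scale factor that maps an `N0f8`'s raw `UInt8` to its float value, so the whole function reduces to `(raw / 255) * x`. A quick check of that reading of the IR (my interpretation, not from the original post):

```julia
# LLVM prints Float32 constants as the bit pattern of their Float64 widening.
# 0x3F70101020000000 should therefore be Float64(Float32(1/255)),
# i.e. the 1/(2^8 - 1) scale used by N0f8.
c = reinterpret(Float64, 0x3F70101020000000)
@assert c == Float64(Float32(1 / 255))
```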
julia> using BenchmarkTools
julia> @benchmark $x*$y seconds=1
BenchmarkTools.Trial:
memory estimate: 64 bytes
allocs estimate: 4
--------------
minimum time: 73.504 ns (0.00% GC)
median time: 74.888 ns (0.00% GC)
mean time: 79.331 ns (3.29% GC)
maximum time: 1.387 μs (87.63% GC)
--------------
samples: 10000
evals/sample: 972
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark myprod1($x, $y) seconds=1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.037 ns (0.00% GC)
median time: 5.041 ns (0.00% GC)
mean time: 5.101 ns (0.00% GC)
maximum time: 18.131 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark myprod2($x, $y) seconds=1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 3.703 ns (0.00% GC)
median time: 3.706 ns (0.00% GC)
mean time: 3.738 ns (0.00% GC)
maximum time: 14.419 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
Note that this is in stark contrast to the case with "conventional" types:
julia> z = 3
3
julia> @which x*z
*(x::Number, y::Number) in Base at promotion.jl:247
julia> @which x*y
*(x::Number, y::Number) in Base at promotion.jl:247
julia> @benchmark $x*$z seconds=1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.025 ns (0.00% GC)
median time: 2.030 ns (0.00% GC)
mean time: 2.060 ns (0.00% GC)
maximum time: 12.631 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @code_llvm x*z
define float @"julia_*_68579"(float, i64) #0 !dbg !5 {
top:
%2 = sitofp i64 %1 to float
%3 = fmul float %2, %0
ret float %3
}
Before I make the obvious fix, is there a deeper issue to understand here?
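For concreteness, the "obvious fix" would presumably look like `myprod1`: have the Base fallback destructure the result of `promote` instead of splatting it. A sketch under a stand-in name, not the actual patch:

```julia
# Sketch of a non-splatting promotion fallback;
# Base would define this on * itself rather than a new name.
function star(x::Number, y::Number)
    xp, yp = promote(x, y)
    xp * yp
end
```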
Thanks to @Evizero for the observation that led to this investigation. CC @vchuravy, since it seriously affects FPN.