Slightly inaccurate emulated fma on Float16 #57784

giordano · 2025-03-15T14:03:50Z

Looking at llvm/llvm-project#128450, I realised that our emulated Float16 FMA is inaccurate as well:

julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.400p+8), $(T)(0x1.008p+7), $(T)(0x1.000p-24))
       end
(fma)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.102e4)
(muladd)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.106e4)
(fma)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(muladd)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(fma)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605
(muladd)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605

julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.eb8p-12), $(T)(0x1.9p-11), $(T)(-0x1p-11))
       end
(fma)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.0004878)
(muladd)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.000488)
(fma)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(muladd)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(fma)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629
(muladd)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629

julia> versioninfo()
Julia Version 1.13.0-DEV.204
Commit b9ac28a645* (2025-03-12 09:49 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-19.1.7 (ORCJIT, apple-m1)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)

In both examples, the Float16 result of fma is 1 ULP off from the correctly rounded value, which is the one muladd returns.

Note that on this CPU, which natively supports the fp16 extension, muladd gives the "right" result, whereas fma does not, because it uses the emulated fma implementation (see #57783).
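
A minimal sketch of one possible explanation, assuming the emulation computes the fma in Float32 and then rounds that result to Float16 (two roundings instead of one). The helpers `fma_via_float32` and `nearest_float16` are hypothetical names written for this illustration, not functions from Base, and whether Base's emulated fma actually takes this path is an assumption:

```julia
# Hypothetical stand-in for an emulated Float16 fma: compute in Float32,
# then round to Float16. This rounds twice.
fma_via_float32(a::Float16, b::Float16, c::Float16) =
    Float16(fma(Float32(a), Float32(b), Float32(c)))

# Pick the Float16 closest to `exact` among `guess` and its two neighbours.
nearest_float16(exact::Float64, guess::Float16) =
    argmin(x -> abs(exact - Float64(x)),
           (prevfloat(guess), guess, nextfloat(guess)))

# The two sets of inputs from the examples above.
inputs = [
    (Float16(0x1.400p+8), Float16(0x1.008p+7), Float16(0x1.000p-24)),
    (Float16(0x1.eb8p-12), Float16(0x1.9p-11), Float16(-0x1p-11)),
]

for (a, b, c) in inputs
    # For these particular inputs the product and the sum are exact in Float64,
    # so `exact` is the infinitely precise value of a*b + c.
    exact = Float64(a) * Float64(b) + Float64(c)
    twice = fma_via_float32(a, b, c)      # rounded twice (Float32, then Float16)
    once = nearest_float16(exact, twice)  # rounded once, directly to Float16
    @show twice once
end
```

If the assumption holds, `twice` should reproduce the fma results above and `once` the muladd results. In the first example the Float32 intermediate is 41040.0f0, which lies exactly halfway between the adjacent Float16 values 41024 and 41056, so the second rounding resolves a tie that the exact value (slightly above 41040) does not have.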

giordano added the float16 label Mar 15, 2025
