[AMDGPU] InstCombine moving freeze instructions breaks FMA formation #141622

jayfoad · 2025-05-27T15:36:32Z

This is a code quality issue that has been affecting some graphics workloads recently. The LLPC frontend tends to insert `freeze` instructions between `cmp` and conditional `br` instructions, to avoid undefined behavior if the condition is undef or poison. Then InstCombine moves the `freeze` instructions into places where they interfere with optimizations like FMA formation.

With this test case I get this ISA including a v_fma_f32 instruction:

$ llc -mtriple=amdgcn -mcpu=gfx1010 r.txt -o -
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_fma_f32 v0, v0, v1, 1.0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]

But after running it through InstCombine, I get separate v_mul_f32 and v_add_f32 instructions:

$ opt -passes=instcombine r.txt -o - | llc -mtriple=amdgcn -mcpu=gfx1010
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_mul_f32_e32 v0, v0, v1
	v_add_f32_e32 v0, 1.0, v0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]
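
The effect can be illustrated with a small toy model in Python. This is only a sketch: `Node`, `push_freeze`, and `has_fma_candidate` are hypothetical names, not LLVM APIs, and the hoisting rule (move the freeze onto an operand as long as exactly one operand may be poison) is a simplifying assumption about what InstCombine does. The expression mirrors the shape implied by the ISA above (a multiply, an add of 1.0, a compare against 0), not the actual contents of r.txt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str                # "arg", "const", "fmul", "fadd", "fcmp" or "freeze"
    operands: tuple = ()
    name: str = ""         # only used by "arg" and "const" leaves


def may_be_poison(n):
    # Assumption of this toy model: arguments may be poison, while
    # constants and freeze results never are.
    if n.op in ("const", "freeze"):
        return False
    if n.op == "arg":
        return True
    return any(may_be_poison(o) for o in n.operands)


def push_freeze(n):
    """Rewrite freeze(op(..., x, ...)) to op(..., freeze(x), ...) while
    exactly one operand of op may be poison; otherwise leave n alone."""
    if n.op != "freeze":
        return n
    inner = n.operands[0]
    risky = [i for i, o in enumerate(inner.operands) if may_be_poison(o)]
    if len(risky) != 1:
        return n  # leaf, or zero/several risky operands: freeze stays put
    i = risky[0]
    new_ops = tuple(
        push_freeze(Node("freeze", (o,))) if j == i else o
        for j, o in enumerate(inner.operands)
    )
    return Node(inner.op, new_ops)


def has_fma_candidate(n):
    """True if some fadd has an fmul as a direct operand."""
    if n.op == "fadd" and any(o.op == "fmul" for o in n.operands):
        return True
    return any(has_fma_candidate(o) for o in n.operands)


a, b = Node("arg", name="a"), Node("arg", name="b")
one, zero = Node("const", name="1.0"), Node("const", name="0.0")

mul = Node("fmul", (a, b))
add = Node("fadd", (mul, one))
before = Node("freeze", (Node("fcmp", (add, zero)),))
after = push_freeze(before)

print(has_fma_candidate(before))  # fadd still sees the fmul directly
print(has_fma_candidate(after))   # a freeze now sits between fmul and fadd
```

In this model the freeze travels down through the compare and the add, then stops on the multiply because both of its operands may be poison. That leaves `fadd(freeze(fmul(a, b)), 1.0)`, which a matcher looking for an `fmul` feeding an `fadd` directly no longer recognizes.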

llvmbot · 2025-05-27T15:37:14Z

@llvm/issue-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

llvmbot added the new issue label May 27, 2025

jayfoad added backend:AMDGPU code-quality llvm:instcombine and removed new issue labels May 27, 2025

This was referenced May 29, 2025

[InstCombine] Avoid breaking FMA pattern when hoisting freeze #141934

Draft

[DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA combine #142250

Open
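
Going only by the title of #142250, one possible direction is for the FMA matcher to look through a `freeze` on the multiply. The Python sketch below illustrates that idea on a toy expression tree. It is not taken from the pull request: `Node`, `strip_freeze`, and `try_form_fma` are hypothetical names, and re-freezing the two multiplicands is an assumption about how the no-poison guarantee could be preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str                # "arg", "const", "fmul", "fadd", "freeze" or "fma"
    operands: tuple = ()
    name: str = ""         # only used by "arg" and "const" leaves


def strip_freeze(n):
    # Skip over any chain of freeze nodes and return what is underneath.
    while n.op == "freeze":
        n = n.operands[0]
    return n


def try_form_fma(n):
    """Return an fma node for fadd(fmul(x, y), z) or
    fadd(freeze(fmul(x, y)), z); return None if neither shape matches."""
    if n.op != "fadd":
        return None
    for i, o in enumerate(n.operands):
        m = strip_freeze(o)
        if m.op != "fmul":
            continue
        x, y = m.operands
        z = n.operands[1 - i]
        if m is not o:
            # Assumption: a freeze was skipped, so freeze both multiplicands
            # to keep the fused result from being poison.
            x, y = Node("freeze", (x,)), Node("freeze", (y,))
        return Node("fma", (x, y, z))
    return None


a, b = Node("arg", name="a"), Node("arg", name="b")
one = Node("const", name="1.0")

frozen_mul = Node("freeze", (Node("fmul", (a, b)),))
fma = try_form_fma(Node("fadd", (frozen_mul, one)))
no_match = try_form_fma(Node("fadd", (a, one)))
```

Here `fma` is an `fma` node whose first two operands are `freeze(a)` and `freeze(b)`, while `no_match` is `None` because no multiply feeds the add.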
