
[AMDGPU] InstCombine moving freeze instructions breaks FMA formation #141622


Open
jayfoad opened this issue May 27, 2025 · 1 comment · May be fixed by #142250 or #141934

@jayfoad
Contributor

jayfoad commented May 27, 2025

This is a code quality issue that has been affecting some graphics workloads recently. The LLPC frontend tends to insert freeze instructions between cmp and conditional br instructions, to avoid undefined behavior if the condition is undef or poison. Then InstCombine moves the freeze instructions into places where they interfere with optimizations like FMA formation.
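To make the pattern concrete, here is a minimal sketch of the kind of IR being described. It is a hypothetical reconstruction, not the attached r.txt: the function signature, value names, and block labels are all assumptions, and the trailing comment shows only one plausible shape of the InstCombine result rather than verified output.

```llvm
; Hypothetical input (not the test case attached to this issue).
; The frontend freezes the branch condition so that branching on
; undef/poison is not undefined behavior.
define i32 @main(float %x, float %y) {
bb:
  %mul = fmul contract float %x, %y
  %add = fadd contract float %mul, 1.000000e+00
  %cmp = fcmp ogt float %add, 0.000000e+00
  %cond = freeze i1 %cmp
  br i1 %cond, label %then, label %else

then:
  ret i32 1

else:
  ret i32 0
}

; One plausible shape after -passes=instcombine (an assumption, not
; verified output): the freeze is hoisted off the i1 and onto the product.
;   %mul    = fmul contract float %x, %y
;   %mul.fr = freeze float %mul
;   %add    = fadd contract float %mul.fr, 1.000000e+00
;   %cmp    = fcmp ogt float %add, 0.000000e+00
```

If the freeze does end up between the fmul and the fadd like this, the add no longer consumes the multiply directly, which would explain why the backend stops contracting the pair into a single v_fma_f32.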

With this test case I get this ISA including a v_fma_f32 instruction:

$ llc -mtriple=amdgcn -mcpu=gfx1010 r.txt -o -
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_fma_f32 v0, v0, v1, 1.0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]

But after running it through InstCombine, I get separate v_mul_f32 and v_add_f32 instructions:

$ opt -passes=instcombine r.txt -o - | llc -mtriple=amdgcn -mcpu=gfx1010
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_mul_f32_e32 v0, v0, v1
	v_add_f32_e32 v0, 1.0, v0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]

@llvmbot
Member

llvmbot commented May 27, 2025

@llvm/issue-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

