-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AArch64] Consider COPY between disjoint register classes as expensive #167661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-backend-aarch64 Author: Guy David (guy-david) ChangesThe motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops, where previously it didn't do so even when the RHS is a constant. Full diff: https://github.com/llvm/llvm-project/pull/167661.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 4b4073365483e..6482091c2cc70 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1043,6 +1043,27 @@ static bool isCheapImmediate(const MachineInstr &MI, unsigned BitSize) {
return Is.size() <= 2;
}
+// Check if a COPY instruction is cheap.
+static bool isCheapCopy(const MachineInstr &MI,
+ const AArch64RegisterInfo &RI) {
+ assert(MI.isCopy() && "Expected COPY instruction");
+ const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+
+ // Cross-register-class copies (e.g., between GPR and FPR) are expensive on
+ // AArch64, typically requiring an FMOV instruction with a 2-6 cycle latency.
+ auto getRegClass = [&](Register Reg) -> const TargetRegisterClass * {
+ return Reg.isVirtual() ? MRI.getRegClass(Reg)
+ : Reg.isPhysical() ? RI.getMinimalPhysRegClass(Reg)
+ : nullptr;
+ };
+ const TargetRegisterClass *DstRC = getRegClass(MI.getOperand(0).getReg());
+ const TargetRegisterClass *SrcRC = getRegClass(MI.getOperand(1).getReg());
+ if (DstRC && SrcRC && !RI.getCommonSubClass(DstRC, SrcRC))
+ return false;
+
+ return MI.isAsCheapAsAMove();
+}
+
// FIXME: this implementation should be micro-architecture dependent, so a
// micro-architecture target hook should be introduced here in future.
bool AArch64InstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
@@ -1056,6 +1077,9 @@ bool AArch64InstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
default:
return MI.isAsCheapAsAMove();
+ case TargetOpcode::COPY:
+ return isCheapCopy(MI, RI);
+
case AArch64::ADDWrs:
case AArch64::ADDXrs:
case AArch64::SUBWrs:
diff --git a/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir b/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir
new file mode 100644
index 0000000000000..287379774f519
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+# RUN: llc -mtriple=aarch64 -run-pass=early-machinelicm -verify-machineinstrs -o - %s | FileCheck %s
+
+# This test verifies that cross-register-class copies (e.g., between GPR and FPR)
+# ARE hoisted out of loops by MachineLICM, as they translate to expensive
+# instructions like FMOV (2-6 cycles) on AArch64.
+
+---
+name: cross_regclass_copy_hoisted
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: gpr64 }
+ - { id: 1, class: gpr64 }
+ - { id: 2, class: fpr64 }
+body: |
+ ; CHECK-LABEL: name: cross_regclass_copy_hoisted
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $x0, $d0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %0:gpr64 = COPY $x0
+ ; CHECK-NEXT: %2:fpr64 = COPY $d0
+ ; CHECK-NEXT: %1:gpr64 = COPY %2
+ ; CHECK-NEXT: B %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %0:gpr64 = ADDXri %0, 1, 0
+ ; CHECK-NEXT: $xzr = SUBSXri %0, 100, 0, implicit-def $nzcv
+ ; CHECK-NEXT: Bcc 11, %bb.1, implicit $nzcv
+ ; CHECK-NEXT: B %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY %1
+ ; CHECK-NEXT: RET_ReallyLR implicit $x0
+ bb.0:
+ liveins: $x0, $d0
+ %0:gpr64 = COPY $x0
+ %2:fpr64 = COPY $d0
+ B %bb.1
+
+ bb.1:
+ ; This COPY between FPR64 and GPR64 should be hoisted
+ %1:gpr64 = COPY %2:fpr64
+ %0:gpr64 = ADDXri %0:gpr64, 1, 0
+ $xzr = SUBSXri %0:gpr64, 100, 0, implicit-def $nzcv
+ Bcc 11, %bb.1, implicit $nzcv
+ B %bb.2
+
+ bb.2:
+ $x0 = COPY %1:gpr64
+ RET_ReallyLR implicit $x0
+...
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
The motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops, where previously it didn't do so even when the RHS is a constant. On most architectures, these expensive move instructions have a latency of 2-6 cycles, and certainly not cheap as a 0-1 cycle move.
258e4cf to
0714e8b
Compare
nasherm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| return Reg.isVirtual() ? MRI.getRegClass(Reg) | ||
| : Reg.isPhysical() ? RI.getMinimalPhysRegClass(Reg) | ||
| : nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be more readable by breaking up into if() with early return rather than nested ? ops
|
|
||
| // Cross-register-class copies (e.g., between GPR and FPR) are expensive on | ||
| // AArch64, typically requiring an FMOV instruction with a 2-6 cycle latency. | ||
| auto getRegClass = [&](Register Reg) -> const TargetRegisterClass * { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: llvm coding standard uses upper case first letter for varibles.
| auto getRegClass = [&](Register Reg) -> const TargetRegisterClass * { | |
| auto GetRegClass = [&](Register Reg) -> const TargetRegisterClass * { |
| %0:gpr32 = MOVi32imm 2143289344 | ||
| %1:fpr32 = COPY %0:gpr32 | ||
| %2:fpr32 = FMOVS0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a test for a physical register?
The motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops, where previously it didn't do so even when the RHS is a constant.
On most architectures, these expensive move instructions have a latency of 2-6 cycles, and certainly not cheap as a 0-1 cycle move.