Skip to content

Conversation

@vhscampos
Copy link
Member

In order to have MVE support, the same bits of the CPACR register that enable the floating-point extension must be set.

@vhscampos vhscampos marked this pull request as ready for review November 11, 2025 14:58
@llvmbot llvmbot added the libc label Nov 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2025

@llvm/pr-subscribers-libc

Author: Victor Campos (vhscampos)

Changes

In order to have MVE support, the same bits of the CPACR register that enable the floating-point extension must be set.


Full diff: https://github.com/llvm/llvm-project/pull/167338.diff

1 Files Affected:

  • (modified) libc/startup/baremetal/arm/start.cpp (+9-7)
diff --git a/libc/startup/baremetal/arm/start.cpp b/libc/startup/baremetal/arm/start.cpp
index 4740067722022..b22529f214216 100644
--- a/libc/startup/baremetal/arm/start.cpp
+++ b/libc/startup/baremetal/arm/start.cpp
@@ -131,20 +131,23 @@ namespace LIBC_NAMESPACE_DECL {
   __arm_wsr("CPSR_c", 0x13); // SVC
 #endif
 
-#ifdef __ARM_FP
-// Enable FPU
-#if __ARM_ARCH_PROFILE == 'M'
+#if __ARM_ARCH_PROFILE == 'M' &&                                               \
+    (defined(__ARM_FP) || defined(__ARM_FEATURE_MVE))
+  // Enable FPU and MVE. They can't be enabled independently: the two are
+  // governed by the same bits in CPACR.
   // Based on
   // https://developer.arm.com/documentation/dui0646/c/Cortex-M7-Peripherals/Floating-Point-Unit/Enabling-the-FPU
-  // Set CPACR cp10 and cp11
+  // Set CPACR cp10 and cp11.
   auto cpacr = (volatile uint32_t *const)0xE000ED88;
   *cpacr |= (0xF << 20);
   __dsb(0xF);
   __isb(0xF);
-#elif __ARM_ARCH_PROFILE == 'A' || __ARM_ARCH_PROFILE == 'R'
+#elif (__ARM_ARCH_PROFILE == 'A' || __ARM_ARCH_PROFILE == 'R') &&              \
+    defined(__ARM_FP)
+  // Enable FPU.
   // Based on
   // https://developer.arm.com/documentation/dui0472/m/Compiler-Coding-Practices/Enabling-NEON-and-FPU-for-bare-metal
-  // Set CPACR cp10 and cp11
+  // Set CPACR cp10 and cp11.
   uint32_t cpacr = __arm_rsr("p15:0:c1:c0:2");
   cpacr |= (0xF << 20);
   __arm_wsr("p15:0:c1:c0:2", cpacr);
@@ -154,7 +157,6 @@ namespace LIBC_NAMESPACE_DECL {
   __asm__ __volatile__("vmrs %0, FPEXC" : "=r"(fpexc) : :);
   fpexc |= (1 << 30);
   __asm__ __volatile__("vmsr FPEXC, %0" : : "r"(fpexc) :);
-#endif
 #endif
 
   // Perform the equivalent of scatterloading

@vhscampos vhscampos requested a review from lntue November 11, 2025 15:00
// https://developer.arm.com/documentation/dui0646/c/Cortex-M7-Peripherals/Floating-Point-Unit/Enabling-the-FPU
// Set CPACR cp10 and cp11
// Set CPACR cp10 and cp11.
auto cpacr = (volatile uint32_t *const)0xE000ED88;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do you mind changing this to a C++ reinterpret_cast?

Copy link
Contributor

@lntue lntue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit.

Copy link
Collaborator

@statham-arm statham-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MVE, it's also important to initialize the LTPSIZE field in FPSCR, because on at least some platforms it starts off at 0, but 0 means tail predication is currently active. I think it would be a good idea to set it to 4 ("no tail predication") at startup, if there's a piece of code that knows it's enabling MVE at all.

In order to have MVE support, the same bits of the CPACR register that
enable the floating-point extension must be set.
@vhscampos vhscampos force-pushed the libc-arm-start-code-mve branch from 3ca0f6b to 77b94bd Compare November 11, 2025 17:28
Copy link
Collaborator

@statham-arm statham-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one remaining comment nit, but otherwise LGTM.

__isb(0xF);
#elif __ARM_ARCH_PROFILE == 'A' || __ARM_ARCH_PROFILE == 'R'
#if defined(__ARM_FEATURE_MVE)
// Set FPSCR's LTPSIZE field to 4 to disable low-overhead-loop tail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nitpick, but this comment could be misleading. "Disabling" tail predication sounds like making it impossible for people to use it, and we're not doing that. We're only ensuring that it doesn't accidentally trigger at startup.

Perhaps an alternative wording along the lines of "initialize low-overhead-loop tail predication to its neutral state"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I used your suggestion verbatim as I can't come up with something better anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants