Skip to content

aarch64 op: Enable two‑stage SVE detection in component configuration #13204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vogma
Copy link

@vogma vogma commented Apr 22, 2025

This pull request refactors the build configuration for the OpenMPI aarch64 op component, aligning it with the existing approach used by the avx component.

Currently, the avx component configuration systematically tests SIMD instruction support (e.g., AVX2, AVX512) by incrementally applying compiler flags until compilation succeeds, independent from the host CPU's capabilities. In contrast, the existing aarch64 configuration lacks this mechanism, performing only basic compilation checks without utilizing additional flags or function attributes.

To address this gap, this pull request enhances the aarch64 configuration by incorporating checks using compiler function attributes (specifically, __attribute__((__target__("+sve")))). This enables SVE code and includes it in OpenMPI regardless of compiler flags provided explicitly via the command line, just like the avx component already does.

If compilation via function attribute is successful, the attribute is automatically inserted via macro into the SVE function templates within op_aarch64_functions.c. Runtime detection of processor capabilities (NEON or SVE) remains unchanged. No API's have been changed, only the build systems is modified along with a macro definition in the source files.

There is a discussion that goes into more detail in the developer user group.

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, it looks ok to squash the code changes in this PR down to a single commit.

If you'd like to preserve your whitespace changes, please separate those into a separate commit and label them as such (e.g., 11ff7f0).

AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[
AC_CACHE_CHECK([for SVE support], [op_cv_sve_support], [
AC_MSG_RESULT([])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use 4-space tabs consistently to match the style of the rest of the Open MPI code base.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried to use 4-space tabs in the code that i added to the configure.m4 file. If i have missed any let me know

Comment on lines 157 to 154
AS_IF([test "$op_cv_sve_add_flags" = "yes"],
[AC_DEFINE([OMPI_MCA_OP_SVE_EXTRA_FLAGS], [1],[SVE supported with additional compile attributes])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're extending the style of how it was already done in this component, but our style guide specifically states that we should always define boolean macros to 0 or 1 (vs. define them or not define them): https://docs.open-mpi.org/en/v5.0.x/developers/source-code.html#c-c

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the definition of the macro with 0 and adjusted the header file to check via #if ...
But if I understand correctly, all macros inside the aarch64 configuration.m4 file should always be defined and checked in the source code via #if ... and not #if defined(...). If so, do you prefer these changes in this PR or should I open another one?

@vogma vogma force-pushed the aarch64_build_update branch 2 times, most recently from d0d77f0 to 4d6bf1d Compare April 22, 2025 17:03
- Introduce AC_CACHE_CHECK probes for ARM Scalable Vector Extension (SVE)
  using both a default compile test and a second test with __attribute__((__target__("+sve"))).
- Define variables op_cv_sve_support and op_cv_sve_add_flags
- Update AM_CONDITIONAL and AC_DEFINE to expose SVE support macros
  (OMPI_MCA_OP_HAVE_SVE, OMPI_MCA_OP_SVE_EXTRA_FLAGS).
- Extend final AS_IF to enable the component when either NEON or SVE is available.
- Add a preprocessor guard around SVE-specific function attributes
- Encapsulate the +sve attribute behind OMPI_MCA_OP_SVE_EXTRA_FLAGS, ensuring
  that only builds which detected and enabled compiler SVE support will compile with
  SVE-targeted code paths.
- Simplifies later code by using SVE_ATTR in function declarations instead of
  repeating the attribute clause.
- apply SVE_ATTR macro in C source for conditional +sve targeting

Signed-off-by: Marco Vogel <[email protected]>
@vogma vogma force-pushed the aarch64_build_update branch from 4d6bf1d to 1268c51 Compare April 22, 2025 17:12
@vogma
Copy link
Author

vogma commented Apr 22, 2025

I'm working on the failed test. On my AWS Graviton Instance the Tests run fine but i get an Illegal Instruction error when openmpi was compiled with lower -march flags. Looking into it..
EDIT: Actually i have accidentally set the -march parameter too high, with -march=armv8-a the examples run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants