Commit d18f5f3

Merge 4.1 release branch into amd-fftw
2 parents 891adc6 + 02fdf40 commit d18f5f3

File tree

9 files changed

+5101
-4379
lines changed

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ SET(AMD_ARCH "znver1" CACHE STRING "select AMD zen version for Clang toolchain")
 
 if (CMAKE_C_COMPILER_ID MATCHES Clang)
 if ("${AMD_ARCH}" STREQUAL "")
-message(FATAL_ERROR "Machine arch missing! Select one of znver1, znver2 or znver3")
+message(FATAL_ERROR "Machine arch missing! Select one of znver1, znver2, znver3 or znver4")
 elseif (${AMD_ARCH} STREQUAL "znver1")
 set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=znver1")
 elseif (${AMD_ARCH} STREQUAL "znver2")
@@ -252,7 +252,7 @@ if (MSVC)
 endif(MSVC)
 
 string(TIMESTAMP TODAY "%Y%m%d")
-add_compile_definitions(AOCL_FFTW_VERSION="AOCL-FFTW 4.0 Build ${TODAY}")
+add_compile_definitions(AOCL_FFTW_VERSION="AOCL-FFTW 4.1.0 Build ${TODAY}")
 
 find_library (LIBM_LIBRARY NAMES m)
 if (LIBM_LIBRARY)
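
As a rough illustration of what the updated compile definition expands to, the shell sketch below mimics `string(TIMESTAMP TODAY "%Y%m%d")` with `date`. The function name `aocl_version_string` is hypothetical and not part of the build. Only the string format is taken from the hunk above.

```shell
#!/bin/sh
# Hypothetical helper (not part of amd-fftw): reproduces the format of the
# AOCL_FFTW_VERSION compile definition set in CMakeLists.txt.
aocl_version_string() {
    # %Y%m%d matches the CMake TIMESTAMP format (local time), e.g. 20230615.
    today=$(date +%Y%m%d)
    printf 'AOCL-FFTW 4.1.0 Build %s\n' "$today"
}

aocl_version_string
```

The date portion changes with every build day, so two builds of the same source on different days report different version strings.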

COPYRIGHT

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 /*
 * Copyright (c) 2003, 2007-14 Matteo Frigo
 * Copyright (c) 2003, 2007-14 Massachusetts Institute of Technology
-* Copyright (C) 2019-2022, Advanced Micro Devices, Inc. All Rights Reserved.
+* Copyright (C) 2019-2023, Advanced Micro Devices, Inc. All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by

README_AMD.md

Lines changed: 13 additions & 13 deletions
@@ -6,16 +6,16 @@ AMD EPYC CPUs. It is developed on top of FFTW (version fftw-3.3.10).
 All known features and functionalities of FFTW are retained and supported
 as it is with this AMD optimized FFTW library.
 
-AOCL-FFTW achieves higher performance than the original FFTW 3.3.10 due to its
-various optimizations involving improved SIMD Kernel functions, improved copy
-functions (cpy2d and cpy2d_pair used in rank-0 transform and buffering plan),
+AOCL-FFTW achieves high performance as a result of its various optimizations
+involving improved SIMD Kernel functions, improved copy functions
+(cpy2d and cpy2d_pair used in rank-0 transform and buffering plan),
 improved 256-bit kernels selection by Planner and an optional in-place
 transpose for large problem sizes. AOCL-FFTW improves the performance
-of in-place MPI FFTs over FFTW 3.3.10 by employing a faster in-place MPI
-transpose function. AOCL-FFTW provides a new fast planner mode as an
-extension to the original planner that improves planning time of various
-planning modes in general and PATIENT mode in particular. Another new planning
-mode called Top N planner is also available that minimizes single-threaded
+of in-place MPI FFTs by employing a faster in-place MPI transpose function.
+AOCL-FFTW provides a new fast planner mode as an extension to the original
+planner that improves planning time of various planning modes in general
+and PATIENT mode in particular. Another new planning mode called
+Top N planner is also available that minimizes single-threaded
 run-to-run variations. AOCL-FFTW has a feature called AMD's application
 optimization layer that speeds up HPC and scientific applications. AOCL-FFTW
 implements the dynamic dispatcher feature that can build a single portable
@@ -45,7 +45,8 @@ generation architectures.
 
 ./configure --enable-sse2 --enable-avx --enable-avx2 --enable-avx512
 --enable-mpi --enable-openmp --enable-shared
---enable-amd-opt --enable-amd-mpifft
+--enable-amd-opt --enable-amd-mpifft
+--enable-dynamic-dispatcher
 --prefix=<your-install-dir>
 make
 make install
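
The configure invocation in this hunk spans several lines, so a dry-run sketch of how the flags combine may help. The function name `print_configure_cmd` and the `/opt/aocl-fftw` prefix are hypothetical. The flags themselves are copied from the README lines above. The sketch only prints the command and does not run `./configure`.

```shell
#!/bin/sh
# Hypothetical helper: assemble the configure command from the README flags
# and print it (dry run), so it can be reviewed before being executed.
print_configure_cmd() {
    prefix=$1
    # Flags copied from README_AMD.md, including the newly added
    # --enable-dynamic-dispatcher option.
    set -- \
        --enable-sse2 --enable-avx --enable-avx2 --enable-avx512 \
        --enable-mpi --enable-openmp --enable-shared \
        --enable-amd-opt --enable-amd-mpifft \
        --enable-dynamic-dispatcher \
        --prefix="$prefix"
    echo "./configure $*"
}

# Assumed install directory; substitute your own.
print_configure_cmd /opt/aocl-fftw
```

Running the printed command from the source root, followed by `make` and `make install`, corresponds to the sequence shown in the README.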
@@ -85,7 +86,7 @@ problem types, Quad or Long double precisions, and split array format.
 
 Dynamic dispatcher achieves Function Multi-versioning by using compiler's
 attributes. Use "--enable-dynamic-dispatcher" configure option to enable this
-feature. It is supported for GCC compiler and Linux based systems for now.
+feature. It is supported for Linux based systems for now.
 The set of x86 CPUs on which the single portable library can work depends upon
 the highest level of CPU SIMD instruction set with which it is configured.
 
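
Because the portable library's reach depends on the SIMD level it was configured with, it can be useful to see which of those levels a given Linux host reports. The sketch below is one way to do that, assuming the usual `flags` line in `/proc/cpuinfo`. The helper name `simd_flags` is hypothetical, and `avx512f` is used here only as a proxy for AVX-512 support.

```shell
#!/bin/sh
# Hypothetical helper: list the SIMD flags in a cpuinfo-style file that
# correspond to the configure options --enable-sse2/avx/avx2/avx512.
simd_flags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Take the first "flags" line, split it into one word per line,
    # and keep only the SIMD-related entries.
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' |
        grep -x -E 'sse2|avx|avx2|avx512f' | LC_ALL=C sort -u
}

# /proc/cpuinfo exists only on Linux, so guard the demonstration call.
if [ -r /proc/cpuinfo ]; then
    simd_flags
fi
```

An empty result means the file has no `flags` line or none of the listed extensions. It says nothing about how a particular AOCL-FFTW build was configured.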
@@ -101,9 +102,8 @@ CONTACTS
 --------
 
 AOCL-FFTW is developed and maintained by AMD.
-You can contact us on the email-id [email protected].
-You can also raise any issue/suggestion on the git-hub repository at
-https://github.com/amd/amd-fftw/issues
+For support of these libraries and the other tools of AMD Zen Software Studio,
+see https://www.amd.com/en/developer/aocc/compiler-technical-support.html
 
 ACKNOWLEDGEMENTS
 ----------------
