feature/omptarget #1266

jxy · 2022-04-20T21:59:20Z

The OpenMP target backend here is still a work in progress. We welcome any suggestions.

As of now, this port uses a few Intel extensions, contains hacks specific to Intel architectures, and works only on Intel GPUs.

For a quick test, try

cmake\
        -DCMAKE_BUILD_TYPE=RELEASE\
        -DQUDA_TARGET_TYPE=OMPTARGET\
        -DQUDA_DOWNLOAD_USQCD=on\
        -DQUDA_QMP=on\
        -DQUDA_QIO=on\
        -DQUDA_DIRAC_DEFAULT_OFF=on\
        -DQUDA_DIRAC_STAGGERED=on\
        -DQUDA_PRECISION=8\
        -DQUDA_RECONSTRUCT=4\
        -DQUDA_FAST_COMPILE_REDUCE=on\
        -DQUDA_FAST_COMPILE_DSLASH=on\
        -DQUDA_BUILD_NATIVE_LAPACK=off\
        -DCMAKE_CXX_COMPILER=mpic++\
        -DCMAKE_C_COMPILER=mpicc\
        ../quda


jxy added 30 commits April 7, 2021 22:39

omp: fix reduction by properly initializing omp_priv

11a6747

omp: allow program to run when offloading is disabled

b76a1ed

Merge branch 'feature/generic_kernel' into omp

b652488

omp: fix last merge

7ba3c0c

omp: ignore *_ctest

3907f29

omp: add explicit specialization for zero()

08f6a52

omp: add more static functions for omp declare reduction [WIP]

e893ceb

omp: use global variable in omp target for emulating 3D kernels

b1fae7d

omp: use multiple parallel regions in target teams region for block-level reduction

a14a446

Merge branch 'feature/generic_kernel' into omp

e3e5cc1

omp: add QUDA_RT_CONSTS in coarse_op_kernel for the last merge

a93bf9c

omp: remove unused j in reduction_kernel.h

789839a

omp: fix diagnostic output in reductions

45c0f1c

omp: revert blas_test debug

027acc8

omp: add a few commented out printf in reduction

967bf34

omp: copy kernel arg to stack to for arg modifying kernels like caxpyXmazMR

80a2535

Revert "omp: workaround rng"

20600b7

This reverts commit 4a31920.

omp: use rocrand_mrg32k3a for omp target

06fa0c4

omp: better handling of memcpyDefault

7616d7d

omp: update default device parameters

0dfb5c7

omp: try allocator(omp_pteam_mem_alloc) for shared memory [WIP]

736600d

Merge branch 'feature/generic_kernel' into omp

33db744

omp: only warning if no device

f299358

omp: add omp_init/reduce to MomUpdate

a19a5b8

omp: update ompwip functions

06ca380

omp: remove cpu side debug print in reduce_helper

6a57605

omp: remove debug print in tunable_nd/reduction

83b93dc

omp: remove debug print in cuda_color_spinor_field

ca6939e

omp: target/kernel cast pointer to void* before memcpy

bf353b3

omp: target/math_helper updates and uses generic versions

5302454
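Several commits in this batch (11a6747, e893ceb) concern `omp declare reduction` and the initialization of `omp_priv`. For readers unfamiliar with the construct, here is a minimal host-side sketch of a user-defined reduction whose private copies are explicitly zeroed. The type `Pair2` and the functions `add` and `sum_pairs` are hypothetical names made up for illustration; this is not the code in this branch.

```cpp
#include <cstddef>

// Hypothetical two-component value type, standing in for a reduction type.
struct Pair2 {
  double x;
  double y;
};

// Component-wise addition used as the reduction combiner.
inline Pair2 add(Pair2 a, Pair2 b) { return Pair2{a.x + b.x, a.y + b.y}; }

// User-defined reduction. The initializer clause sets every private copy
// (omp_priv) to zero; without it the private copies of a plain struct are
// default-initialized and may hold garbage.
#pragma omp declare reduction(pair_sum : Pair2 : omp_out = add(omp_out, omp_in)) \
  initializer(omp_priv = Pair2{0.0, 0.0})

// Sum n (re, im) pairs. Runs serially if OpenMP is not enabled.
Pair2 sum_pairs(const double *re, const double *im, std::size_t n)
{
  Pair2 acc{0.0, 0.0};
#pragma omp parallel for reduction(pair_sum : acc)
  for (std::size_t i = 0; i < n; ++i) {
    acc = add(acc, Pair2{re[i], im[i]});
  }
  return acc;
}
```

The same pattern should carry over to a `target teams` region, but that is an assumption and has not been checked against any particular compiler here.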

jxy added 30 commits February 29, 2024 01:10

Merge remote-tracking branch 'upstream/develop' into omp

5b9a8b8

omp: update following upstream

a68544f

omp: provide QUDA_OMPTARGET_THREAD_ARRAY_SLM to move thread_array location

629d0b4

Merge remote-tracking branch 'upstream/develop' into omp

6522f88

Merge remote-tracking branch 'upstream/develop' into omp

a5678dd

Merge remote-tracking branch 'upstream/develop' into omp

c2b5fd7

omp: include <utility>

156d81a

omp: QUDA_OMPTARGET_DEBUG only CPU side for now

872c8bf

Merge remote-tracking branch 'upstream/develop' into omp

aded7a1

Merge remote-tracking branch 'upstream/develop' into omp

605edaf

Merge branch 'unpack_fix' into omp

86648d0

Merge remote-tracking branch 'upstream/develop' into omp

8b87595

Merge remote-tracking branch 'upstream/develop' into omp

ea0aaba

omp: fix omptarget after merge

98addc4

Merge remote-tracking branch 'upstream/hotfix/complex_template' into omp

1568397

Merge remote-tracking branch 'upstream/develop' into omp

41eec7e

omp: work around a compiler bug

16b7eab

Merge remote-tracking branch 'upstream/develop' into omp

c687a04

omp: add empty device::get_state

9a5547a

BlockKernel2D_host: create block inside omp parallel

f6fe04e

Merge remote-tracking branch 'upstream/develop' into omp

1d9db13

omp: use memcpy for vector_load/store

71231b4

Merge remote-tracking branch 'upstream/develop' into omp

891f080

Merge remote-tracking branch 'upstream/develop' into omp

c918bde

Merge remote-tracking branch 'upstream/develop' into omp

8badbd9

Merge remote-tracking branch 'upstream/develop' into omp

a67e6e3

Merge remote-tracking branch 'upstream/develop' into omp

cb647c0

omptarget/reduce_helper.h: atomic read the partial results

af51dd0

`omp flush` doesn't seem to do what `sycl::atomic_fence` does. The atomic read seems to enforce sequential consistency for the partial array. We will revisit this once we have better OpenMP support from vendors.

Merge remote-tracking branch 'upstream/develop' into omp

8794c0d

omp: omptarget/atomic_helper: add atomic_read for complex<T>

d351388
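Commit af51dd0 reads the partial results with an atomic read rather than relying on `omp flush`. A minimal host-side sketch of that idea follows, assuming the partial results sit in a plain array of `double`. The names `atomic_read_sketch` and `sum_partials` are hypothetical and are not the helpers in `omptarget/reduce_helper.h` or `omptarget/atomic_helper`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: load one element with an OpenMP atomic read using
// sequentially consistent ordering. Without OpenMP enabled the pragma is
// ignored and this is an ordinary load.
inline double atomic_read_sketch(const double *addr)
{
  double value;
#pragma omp atomic read seq_cst
  value = *addr;
  return value;
}

// Combine the per-block partial results into the final value, reading each
// entry atomically instead of with a plain load.
double sum_partials(const std::vector<double> &partial)
{
  double total = 0.0;
  for (std::size_t i = 0; i < partial.size(); ++i) {
    total += atomic_read_sketch(&partial[i]);
  }
  return total;
}
```

Whether the atomic read is enough to order the final pass after the partial writes on a given device depends on the vendor's OpenMP implementation, which is why the commit message flags it for a later revisit.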
