@HaoweiZhangIntel HaoweiZhangIntel commented Mar 30, 2023

Description

This PR mainly improves the performance of the LAPACK CUDA backend by avoiding repeated cuCtxCreate/cuCtxDestroy calls.

  • Apply the same logic that cuBlas uses for placedContext_ to cuSolver.
    This avoids calling cuCtxCreate and cuCtxDestroy on every call when multiple LAPACK APIs are used.
    For example, solving Ax=b with Cholesky factorization requires both lapack::potrf and lapack::potrs.
    cuCtxCreate/cuCtxDestroy take much longer than most GPU LAPACK kernels; see the nvvp diagnostics images below:

    Before modification:
    [image: nvvp diagnostics]
    After modification:
    [image: nvvp diagnostics]

  • Fix deprecation warnings from cuda.hpp for cuSolver (see #295, "[BLAS] fix deprecation warnings from cuda.hpp").

  • Fix the bug in dft (mklgpu => mklcpu).
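
The context-reuse idea in the first bullet can be sketched as follows. This is only an illustration of the caching pattern, not the code in this PR: `MockContext`, `mock_ctx_create`, `mock_ctx_destroy`, `ContextCache`, `naive_call` and `cached_call` are all hypothetical names, and the mock functions merely count calls instead of calling cuCtxCreate/cuCtxDestroy.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for an expensive create/destroy pair.
// They play the role of cuCtxCreate/cuCtxDestroy but do not call CUDA.
struct MockContext { int device; };
static int g_creates = 0;
static int g_destroys = 0;

MockContext* mock_ctx_create(int device) {
    ++g_creates;
    return new MockContext{device};
}

void mock_ctx_destroy(MockContext* ctx) {
    ++g_destroys;
    delete ctx;
}

// Naive pattern: every API call pays for a create and a destroy.
void naive_call(int device) {
    MockContext* ctx = mock_ctx_create(device);
    // ... a LAPACK-like kernel would be launched here ...
    mock_ctx_destroy(ctx);
}

// Cached pattern: one context per device, reused across calls and
// destroyed only when the cache itself is destroyed.
class ContextCache {
public:
    ContextCache() = default;
    ContextCache(const ContextCache&) = delete;
    ContextCache& operator=(const ContextCache&) = delete;

    MockContext* get(int device) {
        assert(device >= 0);
        auto it = cache_.find(device);
        if (it != cache_.end()) return it->second;  // reuse existing context
        MockContext* ctx = mock_ctx_create(device); // first use on this device
        cache_.emplace(device, ctx);
        return ctx;
    }

    ~ContextCache() {
        for (auto& kv : cache_) mock_ctx_destroy(kv.second);
    }

private:
    std::unordered_map<int, MockContext*> cache_;
};

void cached_call(ContextCache& cache, int device) {
    MockContext* ctx = cache.get(device);
    (void)ctx;  // ... a LAPACK-like kernel would be launched here ...
}
```

With two back-to-back calls on one device (as with lapack::potrf followed by lapack::potrs), the naive path in this sketch creates and destroys two contexts, while the cached path creates one and destroys it only when the cache goes out of scope.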

* Apply the same logic that cuBlas uses for placedContext_ to cuSolver.
  This avoids calling cuCtxCreate on every call when multiple LAPACK APIs are used.

* Fix deprecation warnings from cuda.hpp for cuSolver (uxlfoundation#295).

* Fix the bug in dft (mklgpu => mklcpu).