Commit c6071fa

feat: add hipBlas support (leejet#94)
1 parent 5c614e4 commit c6071fa

6 files changed, +113 -2 lines changed

.gitmodules

+1 -1
@@ -1,3 +1,3 @@
 [submodule "ggml"]
 	path = ggml
-	url = https://github.com/leejet/ggml.git
+	url = https://github.com/ggerganov/ggml.git

CMakeLists.txt

+14
@@ -25,6 +25,7 @@ endif()
 #option(SD_BUILD_TESTS "sd: build tests" ${SD_STANDALONE})
 option(SD_BUILD_EXAMPLES "sd: build examples" ${SD_STANDALONE})
 option(SD_CUBLAS "sd: cuda backend" OFF)
+option(SD_HIPBLAS "sd: rocm backend" OFF)
 option(SD_METAL "sd: metal backend" OFF)
 option(SD_FLASH_ATTN "sd: use flash attention for x4 less memory usage" OFF)
 option(SD_FAST_SOFTMAX "sd: x1.5 faster softmax, indeterministic (sometimes, same seed don't generate same image), cuda only" OFF)
@@ -46,6 +47,15 @@ if(SD_METAL)
     add_definitions(-DSD_USE_METAL)
 endif()

+if (SD_HIPBLAS)
+    message("Use HIPBLAS as backend stable-diffusion")
+    set(GGML_HIPBLAS ON)
+    add_definitions(-DSD_USE_CUBLAS)
+    if(SD_FAST_SOFTMAX)
+        set(GGML_CUDA_FAST_SOFTMAX ON)
+    endif()
+endif ()
+
 if(SD_FLASH_ATTN)
     message("Use Flash Attention for memory optimization")
     add_definitions(-DSD_USE_FLASH_ATTENTION)
@@ -67,6 +77,10 @@ endif()


 set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
+
+# see https://github.com/ggerganov/ggml/pull/682
+add_definitions(-DGGML_MAX_NAME=128)
+
 # deps
 add_subdirectory(ggml)
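
Note on the two new compile definitions: `add_definitions(-DSD_USE_CUBLAS)` and `add_definitions(-DGGML_MAX_NAME=128)` reach the C/C++ sources as preprocessor macros. The hipBLAS branch defines the same `SD_USE_CUBLAS` macro as the CUDA build, so the ROCm build presumably takes the same GPU code path. The sketch below is a minimal illustration of that mechanism, not code from this repository: the two `#define` lines stand in for the command-line definitions, the fallback value of 64 is an assumption, and `sd_backend_name` and `sd_max_name_len` are hypothetical helpers.

```c
#include <stddef.h>

/* Stand-ins for the command-line definitions added by CMake:
 *   -DSD_USE_CUBLAS  and  -DGGML_MAX_NAME=128 */
#define SD_USE_CUBLAS
#define GGML_MAX_NAME 128

/* Header-style default. It only applies when the build has not already
 * defined GGML_MAX_NAME; the value 64 is assumed for illustration. */
#ifndef GGML_MAX_NAME
#define GGML_MAX_NAME 64
#endif

/* Hypothetical helper: reports which backend branch was compiled in. */
const char *sd_backend_name(void) {
#if defined(SD_USE_CUBLAS)
    return "gpu";   /* shared by the CUDA and hipBLAS builds */
#elif defined(SD_USE_METAL)
    return "metal";
#else
    return "cpu";
#endif
}

/* Hypothetical helper: the name-buffer size this build was compiled with. */
size_t sd_max_name_len(void) {
    return (size_t) GGML_MAX_NAME;
}
```

Removing the two stand-in `#define` lines makes the same file fall back to the `"cpu"` branch and the guarded default, which is the behavior of a build configured without `-DSD_HIPBLAS=ON`.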

README.md

+11
@@ -117,6 +117,17 @@ cmake .. -DSD_CUBLAS=ON
 cmake --build . --config Release
 ```

+##### Using hipBLAS
+This provides BLAS acceleration on AMD GPUs through ROCm. Make sure the ROCm toolkit is installed.
+
+Windows users: refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide.
+
+```
+cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
+cmake --build . --config Release
+```
+
+
 ##### Using Metal

 Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future.

docs/hipBLAS_on_Windows.md

+85
@@ -0,0 +1,85 @@
+# Using hipBLAS on Windows
+
+To get hipBLAS in `stable-diffusion.cpp` working on Windows, go through this guide section by section.
+
+## Build Tools for Visual Studio 2022
+
+Skip this step if you already have Build Tools installed.
+
+To install Build Tools, go to [Visual Studio Downloads](https://visualstudio.microsoft.com/vs/), download `Visual Studio 2022 and other Products` and run the installer.
+
+## CMake
+
+Skip this step if you already have CMake installed: running `cmake --version` should output `cmake version x.y.z`.
+
+Download the latest `Windows x64 Installer` from [Download | CMake](https://cmake.org/download/) and run it.
+
+## ROCm
+
+Skip this step if you already have ROCm installed.
+
+The [validation tools](https://rocm.docs.amd.com/en/latest/reference/validation_tools.html) are not supported on Windows, so you have to confirm the ROCm version yourself.
+
+Fortunately, AMD provides complete documentation; follow it to install [ROCm](https://rocm.docs.amd.com/en/latest/deploy/windows/quick_start.html).
+
+>**If you encounter [AMD ROCm Windows Installation Error 215](https://github.com/RadeonOpenCompute/ROCm/issues/2363), you can ignore it. ROCm itself was installed correctly; only the Visual Studio plugin installation failed.**
+
+Next, set the ROCm compiler paths as environment variables before running cmake.
+
+If you followed the official instructions and did not change the install path, ROCm is most likely in `C:\Program Files\AMD\ROCm\5.5\bin`.
+
+This is what I use to point `CC` and `CXX` at ROCm's clang:
+```Commandline
+set CC=C:\Program Files\AMD\ROCm\5.5\bin\clang.exe
+set CXX=C:\Program Files\AMD\ROCm\5.5\bin\clang++.exe
+```
+
+## Ninja
+
+Skip this step if you already have Ninja installed: running `ninja --version` should output `1.11.1`.
+
+Download the latest `ninja-win.zip` from the [GitHub Releases Page](https://github.com/ninja-build/ninja/releases/tag/v1.11.1) and unzip it. Then set it as an environment variable. I unzipped it in `C:\Program Files\ninja`, so I set it like this:
+
+```Commandline
+set ninja=C:\Program Files\ninja\ninja.exe
+```
+## Building stable-diffusion.cpp
+
+The differences from the regular CPU build are `-DSD_HIPBLAS=ON`,
+`-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, and `-DAMDGPU_TARGETS=gfx1100`.
+
+>**Notice**: check the `clang` and `clang++` versions first:
+```Commandline
+clang --version
+clang++ --version
+```
+
+If the output of both commands looks like this, you can continue:
+```
+clang version 17.0.0 (git@github.com:Compute-Mirrors/llvm-project e3201662d21c48894f2156d302276eb1cf47c7be)
+Target: x86_64-pc-windows-msvc
+Thread model: posix
+InstalledDir: C:\Program Files\AMD\ROCm\5.5\bin
+```
+
+```
+clang version 17.0.0 (git@github.com:Compute-Mirrors/llvm-project e3201662d21c48894f2156d302276eb1cf47c7be)
+Target: x86_64-pc-windows-msvc
+Thread model: posix
+InstalledDir: C:\Program Files\AMD\ROCm\5.5\bin
+```
+
+>**Notice** that `gfx1100` is the architecture of my GPU; change it to match yours. To find your architecture, see the [LLVM Target](https://rocm.docs.amd.com/en/latest/release/windows_support.html#windows-supported-gpus) column.
+
+My GPU is an AMD Radeon™ RX 7900 XTX, so I set it to `gfx1100`.
+
+Then build:
+
+```commandline
+mkdir build
+cd build
+cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
+cmake --build . --config Release
+```
+
+If everything went OK, the file `build\bin\sd.exe` should appear.

ggml

Submodule ggml updated from 5e44969 to 2f3b12f

stable-diffusion.h

+1
@@ -71,6 +71,7 @@ enum sd_type_t {
     SD_TYPE_Q5_K = 13,
     SD_TYPE_Q6_K = 14,
     SD_TYPE_Q8_K = 15,
+    SD_TYPE_IQ2_XXS = 16,
     SD_TYPE_I8,
     SD_TYPE_I16,
     SD_TYPE_I32,
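
Note on the enum change: `SD_TYPE_I8`, `SD_TYPE_I16` and `SD_TYPE_I32` have no explicit values, so inserting `SD_TYPE_IQ2_XXS = 16` ahead of them moves each one up by one (16, 17, 18 become 17, 18, 19). This is presumably intended to keep the enum in step with the updated ggml submodule, but that is an assumption. The sketch below checks the new numbering at compile time; `sd_type_excerpt` is an abbreviated, illustrative copy of the enum tail, not the full declaration from `stable-diffusion.h`.

```c
#include <assert.h>

/* Abbreviated copy of the enum tail shown in the diff.
 * Earlier members are omitted for brevity. */
enum sd_type_excerpt {
    SD_TYPE_Q8_K    = 15,
    SD_TYPE_IQ2_XXS = 16, /* added by this commit */
    SD_TYPE_I8,           /* implicit: previous enumerator + 1 */
    SD_TYPE_I16,
    SD_TYPE_I32
};

/* Compile-time checks (C11): the integer types shift up by one. */
static_assert(SD_TYPE_I8  == 17, "SD_TYPE_I8 was 16 before the insertion");
static_assert(SD_TYPE_I16 == 18, "SD_TYPE_I16 was 17 before the insertion");
static_assert(SD_TYPE_I32 == 19, "SD_TYPE_I32 was 18 before the insertion");
```

Any caller that stored or compared the old numeric values of the integer types, rather than the enumerator names, would need to be checked after this change.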
