ggml : build backends as libraries #10256
This should be good now. There are a few remaining issues that I was not able to fix:
Other important changes:
@slaren Got it! I can start working on it.

Thanks, I will check.

During testing of the

Right, the build now produces more libraries that also need to be copied to the container. Thanks for testing, I will update the dockerfiles after the Swift and MUSA fixes are merged here.
Verified that the `light-musa` image works:

```sh
$ docker run -it -v $HOME/models:/models local/llama.cpp:light-musa \
    -m /models/llama3.2_1b_q8_0.gguf -ngl 999 -n 512 -co -cnv \
    -p "You are a helpful assistant."
```
Moves each backend to a different directory with its own build script. The ggml library is split into the target `ggml-base`, which only includes the core ggml elements, and `ggml`, which bundles `ggml-base` and all the backends included in the build.

To completely separate the build of the CPU backend, `ggml-quants.c` and `ggml-aarch64.c` have been split such that the reference quantization and dequantization functions are in `ggml-base`, and the optimized quantization and dot product functions are in `ggml-cpu`.

The build is organized as such:
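A minimal CMake sketch of this layout (illustrative only: the target names follow the description above, but the directory structure and source file lists are assumed rather than taken from the actual build scripts):

```cmake
# Illustrative sketch -- paths and file lists are assumed, not the real scripts.
# ggml-base: core ggml plus the reference (de)quantization functions.
add_library(ggml-base
    ggml.c
    ggml-alloc.c
    ggml-backend.cpp
    ggml-quants.c      # reference quantization/dequantization
    ggml-aarch64.c)    # reference parts only, after the split

# Each backend lives in its own directory with its own build script.
add_subdirectory(ggml-cpu)              # optimized quants and dot products
if (GGML_CUDA)
    add_subdirectory(ggml-cuda)
endif()

# ggml: bundles ggml-base and all backends included in the build.
add_library(ggml ggml-backend-reg.cpp)
target_link_libraries(ggml PUBLIC ggml-base PRIVATE ggml-cpu)
if (GGML_CUDA)
    target_link_libraries(ggml PRIVATE ggml-cuda)
endif()
```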
Currently, ggml needs to be linked to the backend libraries, but ultimately the goal is to load the backends dynamically at runtime, so that we can distribute a single llama.cpp package that includes all the backends, as well as multiple versions of the CPU backend compiled with different instruction sets.
Breaking changes
- Applications that use ggml and llama.cpp should not require any changes; they only need to link to the `ggml` and `llama` targets as usual. However, when building with `BUILD_SHARED_LIBS`, additional shared libraries are produced that need to be bundled with the application: in addition to `llama` and `ggml`, the `ggml-base`, `ggml-cpu`, and any other backend libraries included in the build should be added to the application package.
- `GGML_HIPBLAS` has been renamed to `GGML_HIP`, in line with a previous change to the CUDA backend.
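In practice, bundling might look like the following hypothetical consumer-side CMake fragment (the library target names come from this PR; the application name and install layout are only examples):

```cmake
# Hypothetical packaging fragment for an application using llama.cpp.
target_link_libraries(my_app PRIVATE llama ggml)   # linking is unchanged

# With BUILD_SHARED_LIBS=ON, these shared libraries must ship alongside the
# application binary, plus any other enabled backends (e.g. ggml-cuda).
install(TARGETS my_app RUNTIME DESTINATION bin)
install(FILES
    $<TARGET_FILE:llama>
    $<TARGET_FILE:ggml>
    $<TARGET_FILE:ggml-base>
    $<TARGET_FILE:ggml-cpu>
    DESTINATION lib)
```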