Distributed inference #15796
Unanswered
alejandrods
asked this question in
Q&A
Replies: 0 comments
I have 3 Macs with 512 GB of unified memory each. I have been testing the distributed inference of llama.cpp with Qwen3-30B-A3B-Instruct-2507 (just for testing) and it works great, but I have a few questions:

- Is `llama-server` also used for inference, or only for tokenization? I have been looking at `llama-server.cpp`.
- I configured `llama-server` with `--parallel 3` to process 3 requests in parallel. Does `update_slots: id 2` mean that this is request number 2?
- Why do the `rpc-server`s show me "Null buffer for tensor passed to init_tensor function" before the connection is closed?

Thank you so much!
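For reference, here is a minimal sketch of the kind of setup described above. This assumes llama.cpp was built with RPC support enabled (`-DGGML_RPC=ON`); the hostnames `mac2`/`mac3`, the port `50052`, and the model filename are placeholders, not values from the original post:

```shell
# On each worker Mac: expose the local ggml backend over RPC.
# The port is arbitrary; it just has to match what the head node uses.
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the head Mac: llama-server splits the model across the RPC workers
# and serves the HTTP API; --parallel 3 creates 3 processing slots, so
# logs such as "update_slots: id 2" refer to slot ids 0..2, not to a
# running request counter.
./build/bin/llama-server \
  -m Qwen3-30B-A3B-Instruct-2507-Q8_0.gguf \
  --rpc mac2:50052,mac3:50052 \
  --parallel 3
```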