How to tell for sure that some layers have been offloaded to CPU/RAM? #15323
remon-nashid asked this question in Q&A
... and how many of them.
With a model larger than VRAM, and with -ngl 99 specified, the llama-server logs report that all layers were offloaded to the GPU, even though they clearly were not.
Currently I watch for a clear increase in RAM usage to infer that some layers ended up in system memory, but that hardly seems like the smartest approach.
Any clues? Are there additional indicative log messages I'm missing? Ultimately, a display similar to ollama's GPU/CPU percentage split would be ideal.
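For what it's worth, the RAM-watching workaround described above can be made a bit more precise by reading the server process's resident set size directly instead of eyeballing a system monitor. This is only a minimal sketch, assuming Linux and the /proc filesystem; it simulates the "before/after load" comparison with an ordinary allocation rather than an actual model load:

```python
import os

def rss_mib(pid: int = None) -> float:
    """Return the resident set size (RAM actually used) of a process in MiB,
    read from /proc/<pid>/status on Linux."""
    if pid is None:
        pid = os.getpid()
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # VmRSS is reported in kB, e.g. "VmRSS:   123456 kB"
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found in /proc status")

# Snapshot RSS before and after an allocation. Against a real llama-server
# process (pass its PID), a jump of several GiB between startup and the end
# of model loading suggests weights landed in system RAM rather than VRAM.
before = rss_mib()
buf = bytearray(64 * 1024 * 1024)  # stand-in for a model load
after = rss_mib()
print(f"RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Comparing this against the GPU side (e.g. per-process memory from nvidia-smi) would give a rough GPU/CPU split, though a proper answer from the llama.cpp logs themselves would obviously be cleaner.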
Thank you.