Hi, I'm new to llama.cpp — I previously used Ollama.
I'm trying to run Gemma 3 with the following command:
./llama-server -hf <model> --n-gpu-layers 999 --host 0.0.0.0 --port 10000
However, whenever the conversation fills up the 4096-token context, I get the following error in the console and the model stops responding:
srv send_error: task id = 0, error: the request exceeds the available context size. try increasing the context size or enable context shift
I tried increasing the context size, which helps temporarily, but the error comes back once the larger window fills up as well. I also tried --keep and -n, but I'm still unsure how to actually enable context shifting.
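Based on the error message, my guess is that I need both a larger context window and an explicit opt-in flag. I'm assuming the flag is --context-shift on recent builds (I believe older builds had shifting on by default and only exposed --no-context-shift), so please correct me if that's wrong:

# raise the context window to 8192 tokens and explicitly opt in to context shifting
./llama-server -hf <model> --ctx-size 8192 --context-shift --n-gpu-layers 999 --host 0.0.0.0 --port 10000

My understanding is that with shifting enabled, the server discards the oldest tokens (beyond whatever --keep preserves) instead of erroring out when the context fills up, but I'd appreciate confirmation.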
Any help would be greatly appreciated — thanks!