Hi, I'm new to llama.cpp — I previously used Ollama.
I'm trying to run Gemma 3 with the following command:
./llama-server -hf <model> --n-gpu-layers 999 --host 0.0.0.0 --port 10000
However, whenever the conversation fills up the 4096-token context, I get the following error in the console and the model stops responding:
srv send_error: task id = 0, error: the request exceeds the available context size. try increasing the context size or enable context shift
I tried increasing the context size, which helps temporarily, but the error comes back once the larger window fills up as well. I also tried --keep and -n, but I'm still unsure how to actually enable context shifting.
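Based on the error message, my guess is that I need both a larger context window and an explicit opt-in flag. I'm assuming the flag is --context-shift on recent builds (I believe older builds had shifting on by default and only exposed --no-context-shift), so please correct me if that's wrong:

# raise the context window to 8192 tokens and explicitly opt in to context shifting
./llama-server -hf <model> --ctx-size 8192 --context-shift --n-gpu-layers 999 --host 0.0.0.0 --port 10000

My understanding is that with shifting enabled, the server discards the oldest tokens (beyond whatever --keep preserves) instead of erroring out when the context fills up, but I'd appreciate confirmation.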
Any help would be greatly appreciated — thanks!