Llama 3 achieves pretty good recall to 65k context w/ rope_theta set to 16M #6890
Replies: 4 comments
-
Does anyone know why rope theta is set to 16M to reach a 65k context length, as in the Twitter source? In my understanding, extending the context from the 8k that Llama 3 8B supports natively to 65k requires a multiplication factor of 8. Llama 3 8B's original rope theta is 0.5M; multiplying that by 8 gives 4M, not 16M. I can't figure out where the difference comes from. Thanks in advance.
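To make the arithmetic in the question concrete, here is a small sketch of the linear-scaling intuition using the values quoted above (whether linear scaling of the base is even the right rule is exactly what's being asked):

```python
# Naive "scale rope_theta linearly with the context extension" intuition.
# Values are the ones quoted in the question; this is a sketch, not a claim
# that linear scaling is the correct rule.

original_ctx = 8_192       # context length Llama 3 8B was trained with
target_ctx = 65_536        # desired context length (65k)
original_theta = 500_000   # Llama 3 8B rope_theta (0.5M)

factor = target_ctx / original_ctx        # 8.0
linear_theta = original_theta * factor    # 4,000,000

print(f"extension factor: {factor}")                        # 8.0
print(f"linearly scaled rope_theta: {linear_theta:,.0f}")   # 4,000,000
# The tweet uses 16,000,000, i.e. another factor of 4 beyond this estimate.
```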
-
The answer may lie in this post: https://blog.eleuther.ai/rotary-embeddings/
-
Thank you for the link. From what I read in the blog, the parameter rope-freq-base should refer to the new base value, correct? It is set to 8M here, but following the blog, shouldn't it be 0.5M (Llama 3's original base) * 4 (factor s, 32k [target ctx length] / 8k [Llama 3's ctx length]) ** (128 / 126) ≈ 2M, rather than 8M? Correct me if I'm wrong, thanks.
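For comparison, here is that calculation written out. It is a sketch using the numbers from the comment above: s is the context extension factor and d = 128 is assumed to be the head dimension of Llama 3 8B.

```python
# NTK-aware base scaling as quoted in the comment above:
#   new_base = old_base * s ** (d / (d - 2))
# where s is the context extension factor and d is the head dimension.
# Values below are the ones used in the comment (a sketch, not authoritative).

old_base = 500_000     # Llama 3 8B rope_theta
d = 128                # assumed head dimension
s = 32_768 / 8_192     # 4x extension: 32k target over the native 8k

new_base = old_base * s ** (d / (d - 2))
print(f"{new_base:,.0f}")   # ~2,044,000 -> roughly 2M, not 8M
```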
-
Your understanding is the same as mine, and so are your questions. I originally posted here to solicit some reactions and try to understand how RoPE works, but so far the experts have been rather quiet.
-
Source: https://twitter.com/winglian/status/1783122644579090600
I'm not sure how this can be applied using llama.cpp, but when I try with
-c 32768 --rope-scaling linear --rope-freq-base 8000000
I get coherent, high-quality results from the model. Am I using the right parameters? I also noticed that VRAM usage doesn't go up all that much: I can easily run the Q8_0 of the 8B version on 24GB (it uses only 16.5GB fully offloaded). In fact, I can even run the FP16 fully offloaded using 23GB with 32K context.
The performance is also quite acceptable (this is on a 4090).
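In case it helps anyone reproduce this from Python, here is a minimal sketch using the llama-cpp-python bindings (not mentioned in the thread; the model path is a placeholder, and the keyword arguments mirror the CLI flags quoted above):

```python
# Minimal sketch assuming the llama-cpp-python bindings are installed.
# The GGUF path is a hypothetical placeholder; adjust to your local file.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q8_0.gguf",  # placeholder path
    n_ctx=32_768,               # mirrors: -c 32768
    rope_freq_base=8_000_000,   # mirrors: --rope-freq-base 8000000
    n_gpu_layers=-1,            # offload all layers, as in the VRAM numbers above
)
# Note: the CLI's --rope-scaling linear flag is not reproduced in this sketch.

out = llm("Summarize the following document:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```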