Hello,

Is there any particular reason why LLAMA_MAX_SEQ is set to 64? Would, say, 128 not work, or would it degrade performance in a nonlinear way? I am thinking about a specific scenario where I would like to max out a single 80 GB GPU with as many slots as possible; with a quantized model and quantized cache, around 96 slots should be possible, but that currently isn't allowed because LLAMA_MAX_SEQ is 64.
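For context, here is a rough sketch of what I have in mind, going through the C API instead of llama-server. The function names (llama_model_load_from_file, llama_init_from_model, llama_model_free) and the n_seq_max / type_k / type_v fields are taken from llama.h as I understand it, and the model path is just a placeholder; please double-check against the header in your checkout.

```cpp
// Sketch only: requesting more parallel sequences than LLAMA_MAX_SEQ allows.
#include "llama.h"
#include <cstdio>

int main() {
    llama_model_params mparams = llama_model_default_params();
    // hypothetical quantized model file
    llama_model * model = llama_model_load_from_file("model-Q4_K_M.gguf", mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_seq_max = 96;   // the slot count I would like to run
    // type_k / type_v could be set to a quantized KV cache type to shrink
    // the cache further (assumption: those fields behave as in llama.h).

    // With the stock build I expect this to be rejected, because
    // 96 > LLAMA_MAX_SEQ (64). The question is whether simply raising that
    // compile-time constant to 128 and rebuilding would be safe.
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) {
        fprintf(stderr, "context creation failed (n_seq_max too large?)\n");
    } else {
        llama_free(ctx);
    }

    llama_model_free(model);
    return 0;
}
```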
Thanks a lot!
Cheers