Hi guys, are there other mechanisms implemented in vLLM to save GPU memory besides PagedAttention? Thank you.

Replies: 1 comment

- Another technique is continuous batching, which reduces padding memory and computation. You can read this blog to learn more; the sketch below illustrates the core idea.
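To make the padding point concrete, here is a minimal toy sketch (not vLLM's actual scheduler; all names and numbers are illustrative) comparing static batching, which pads every batch to its longest sequence, with continuous batching, which refills a freed slot as soon as a sequence finishes:

```python
# Toy comparison of static vs. continuous batching (illustrative only;
# this is not vLLM's scheduler, just the core idea).
from collections import deque

def static_batch_steps(lengths, batch_size):
    """Static batching: each batch runs until its longest sequence
    finishes, so shorter sequences sit padded and waste compute."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])  # padded to the longest
    return steps

def continuous_batch_steps(lengths, batch_size):
    """Continuous batching: when a sequence finishes, its slot is
    immediately refilled from the waiting queue, so no decode step
    is spent on padding."""
    queue = deque(lengths)
    running = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    steps = 0
    while running:
        steps += 1
        # One decode step: decrement remaining tokens, evict finished sequences.
        running = [r - 1 for r in running if r > 1]
        # Refill freed slots from the waiting queue.
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
    return steps

lengths = [3, 50, 4, 50, 5, 50, 6, 50]  # mixed short and long requests
print("static:    ", static_batch_steps(lengths, batch_size=4))
print("continuous:", continuous_batch_steps(lengths, batch_size=4))
```

With the mixed request lengths above, static batching spends 100 steps (every batch runs as long as its longest member), while continuous batching finishes in roughly half that because short requests free their slots early. In vLLM itself you don't enable this manually; the engine's scheduler applies continuous batching to incoming requests automatically.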