Do you use 1 or multiple slots?
I have a llama-server demo that has been running for about a month on an AWS Graviton instance (c8g.24xlarge). Over that time, generation speed dropped from 60 TPS to about 5 TPS. Restarting the server process brought it back to 60 TPS.
I checked the log files and didn't see any single log file growing excessively or anything similar. The only thing I noticed was that idle CPU usage was about 11% before I restarted llama-server; after the restart it dropped back to near zero.
Has anyone here run llama-server with a long uptime? Are there any known processes or files that snowball over time?
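In case it helps anyone check for the same drift, here is a minimal sketch of how idle CPU usage of a long-running process could be sampled on Linux. It reads `/proc/<pid>/stat` directly and is not part of llama.cpp; the function names (`parse_cpu_ticks`, `cpu_percent`, `sample_cpu_percent`) are hypothetical, and the result is expressed as a percentage of one core, which may not match how the 11% figure above was measured.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 of the full line), so
    # utime (field 14) and stime (field 15) sit at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_s: float, ticks_per_s: int) -> float:
    """CPU time used over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * elapsed_s)


def read_ticks(pid: int) -> int:
    """Read the current CPU tick count for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def sample_cpu_percent(pid: int, interval_s: float = 5.0) -> float:
    """Measure CPU usage of `pid` over one interval (hypothetical helper)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_ticks(pid)
    start = time.monotonic()
    time.sleep(interval_s)
    after = read_ticks(pid)
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, ticks_per_s)
```

Calling `sample_cpu_percent(pid)` from a cron job or a loop while the server has no active requests, and logging the result with a timestamp, would show whether idle CPU creeps up gradually or jumps at a specific point.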