LiteLLM Performance Roadmap #15933
AlexsanderHamir announced in Announcements
Replies: 2 comments, 2 replies
- Hi @AlexsanderHamir, can you share the ENV variables, worker configurations, and DB configurations used as well? Or, if you are using Helm, can you share values.yaml?
- @AlexsanderHamir, you are doing a great job! Kudos! I would love to see the memory issues resolved; already considering using …
Hi all,
Sharing our public roadmap on LiteLLM performance overheads.
As of v1.78.5, the LiteLLM AI Gateway adds an 8 ms median overhead and a 45 ms P99 overhead at 1K concurrent requests with 4 LiteLLM instances.
This is an ~80% improvement over v1.76.0. The roadmap has three key components, which we plan to achieve by the end of 2025.
You can read a detailed breakdown of each component below.
Roadmap & Goals
1. Reduce latency across endpoints (Target: Nov 30, 2025)
Achieve 8 ms median overhead and 45 ms P99 overhead on the setup described above for the following endpoints (a measurement sketch follows the list):
- /chat/completions
- /responses
- /embeddings
- /realtime
- /audio/speech
- /audio/transcriptions
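For reference, percentile figures like the median and P99 above can be reproduced with a small client-side harness. Below is a minimal sketch, assuming a LiteLLM proxy at localhost:4000 and a placeholder model name (both are assumptions, not the configuration behind the numbers above); gateway overhead is then the delta against direct provider calls timed the same way:

```python
# Minimal sketch: time requests through a LiteLLM proxy and report
# median / P99 latency. URL, port, and model name are placeholders.
import statistics
import time

import requests

PROXY_URL = "http://localhost:4000/chat/completions"  # assumed proxy address
PAYLOAD = {
    "model": "fake-openai-endpoint",  # placeholder model
    "messages": [{"role": "user", "content": "ping"}],
}

def timed_call() -> float:
    """Return the wall-clock latency of one request, in milliseconds."""
    start = time.perf_counter()
    requests.post(PROXY_URL, json=PAYLOAD, timeout=30)
    return (time.perf_counter() - start) * 1000.0

samples = sorted(timed_call() for _ in range(500))
median = statistics.median(samples)
p99 = statistics.quantiles(samples, n=100)[98]  # 99th-percentile cut point
print(f"median={median:.1f} ms  p99={p99:.1f} ms")
```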
2. Address memory issues (Target: Nov 30, 2025)
- Resolve all reported memory leaks.
- Lower LiteLLM’s overall memory footprint.
- Reduce LiteLLM’s memory allocations per request (see the measurement sketch after this list).
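One common way to quantify per-request allocations is the standard library’s tracemalloc module. The sketch below is illustrative only; handle_request is a hypothetical stand-in for the code path being profiled, not a LiteLLM function:

```python
# Illustrative sketch: snapshot allocations before and after one
# simulated request, then show the allocation sites that grew.
import tracemalloc

def handle_request() -> None:
    # Hypothetical stand-in for a request path; allocates some memory.
    payload = {"messages": [{"role": "user", "content": "x" * 1024}]}
    _ = [dict(payload) for _ in range(100)]

tracemalloc.start()
before = tracemalloc.take_snapshot()
handle_request()
after = tracemalloc.take_snapshot()

# Top allocation sites attributable to the request, by line number.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
tracemalloc.stop()
```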
3. Halve LiteLLM’s overhead (Target: Dec 31, 2025)
Feedback
Is there anything you'd like to see us address related to LiteLLM performance this year?
Comment below — we're happy to work with you.
Appendix: Test Setup
Load Testing Configuration
- Endpoint tested: /chat/completions
- Load: 1K concurrent requests
Environment
- 4 LiteLLM instances
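As an illustration of how a load test like this might be driven, here is a minimal Locust sketch (run with `locust -f locustfile.py`); the model name and API key are placeholders, and this is not the team’s actual test script:

```python
# Minimal Locust user hammering /chat/completions on a LiteLLM proxy.
from locust import HttpUser, task, between

class ChatCompletionsUser(HttpUser):
    # Small randomized pause between requests per simulated user.
    wait_time = between(0.1, 0.5)

    @task
    def chat_completion(self):
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",  # placeholder model
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-placeholder"},  # placeholder key
        )
```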