LiteLLM Performance Roadmap #15933
AlexsanderHamir announced in Announcements
Replies: 2 comments, 2 replies
- Hi @AlexsanderHamir, can you share the ENV variables, worker configurations, and DB configurations used as well? Or, if you are using Helm, can you share values.yaml?
- @AlexsanderHamir, you are doing a great job! Kudos! I would love to see the memory issues resolved; already considering using …
Hi all,
Sharing our public roadmap on LiteLLM performance overheads.
As of v1.78.5, the LiteLLM AI Gateway adds an 8 ms median overhead and a 45 ms P99 overhead at 1K concurrent requests with 4 LiteLLM instances.
This is an ~80% improvement over v1.76.0. The roadmap has three key components, which we plan to achieve by the end of 2025.
You can read a detailed breakdown of each component below.
Roadmap & Goals
1. Reduce latency across endpoints (Target: Nov 30, 2025)
Achieve 8 ms median overhead and 45 ms P99 overhead on the setup described above for the following endpoints (a measurement sketch follows the list):
- /chat/completions
- /responses
- /embeddings
- /realtime
- /audio/speech
- /audio/transcriptions
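For reference, percentile figures like the median and P99 above can be reproduced with a small client-side harness. Below is a minimal sketch, assuming a LiteLLM proxy at localhost:4000 and a placeholder model name (both are assumptions, not the configuration behind the numbers above); gateway overhead is then the delta against direct provider calls timed the same way:

```python
# Minimal sketch: time requests through a LiteLLM proxy and report
# median / P99 latency. URL, port, and model name are placeholders.
import statistics
import time

import requests

PROXY_URL = "http://localhost:4000/chat/completions"  # assumed proxy address
PAYLOAD = {
    "model": "fake-openai-endpoint",  # placeholder model
    "messages": [{"role": "user", "content": "ping"}],
}

def timed_call() -> float:
    """Return the wall-clock latency of one request, in milliseconds."""
    start = time.perf_counter()
    requests.post(PROXY_URL, json=PAYLOAD, timeout=30)
    return (time.perf_counter() - start) * 1000.0

samples = sorted(timed_call() for _ in range(500))
median = statistics.median(samples)
p99 = statistics.quantiles(samples, n=100)[98]  # 99th-percentile cut point
print(f"median={median:.1f} ms  p99={p99:.1f} ms")
```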
2. Address memory issues (Target: Nov 30, 2025)
- Resolve all reported memory leaks.
- Lower LiteLLM’s overall memory footprint.
- Reduce LiteLLM’s memory allocations per request (see the measurement sketch after this list).
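One common way to quantify per-request allocations is the standard library’s tracemalloc module. The sketch below is illustrative only; handle_request is a hypothetical stand-in for the code path being profiled, not a LiteLLM function:

```python
# Illustrative sketch: snapshot allocations before and after one
# simulated request, then show the allocation sites that grew.
import tracemalloc

def handle_request() -> None:
    # Hypothetical stand-in for a request path; allocates some memory.
    payload = {"messages": [{"role": "user", "content": "x" * 1024}]}
    _ = [dict(payload) for _ in range(100)]

tracemalloc.start()
before = tracemalloc.take_snapshot()
handle_request()
after = tracemalloc.take_snapshot()

# Top allocation sites attributable to the request, by line number.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
tracemalloc.stop()
```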
3. Halve LiteLLM’s overhead (Target: Dec 31, 2025)
Feedback
Is there anything you'd like to see us address related to LiteLLM performance this year?
Comment below — we're happy to work with you.
Appendix: Test Setup
Load Testing Configuration
- Endpoint tested: /chat/completions
- Load: 1K concurrent requests
Environment
- 4 LiteLLM instances
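As an illustration of how a load test like this might be driven, here is a minimal Locust sketch (run with `locust -f locustfile.py`); the model name and API key are placeholders, and this is not the team’s actual test script:

```python
# Minimal Locust user hammering /chat/completions on a LiteLLM proxy.
from locust import HttpUser, task, between

class ChatCompletionsUser(HttpUser):
    # Small randomized pause between requests per simulated user.
    wait_time = between(0.1, 0.5)

    @task
    def chat_completion(self):
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",  # placeholder model
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-placeholder"},  # placeholder key
        )
```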