Fallbacks don't trigger when a critical error is reported in streaming Responses endpoint #15910
arunmittal1 asked this question in Q&A (unanswered)
Hello,
We're using LiteLLM as our AI Gateway with model groups configured with fallback options. We've noticed an issue with error handling when streaming responses from OpenAI models.
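For context, our setup is roughly equivalent to the Router-level sketch below (the real configuration lives in the proxy config file; the model group names, deployment names, and environment variables here are placeholders):

```python
# Rough Python-Router equivalent of our gateway setup (names are placeholders).
import os
import litellm

router = litellm.Router(
    model_list=[
        {
            # Primary model group served by the gateway
            "model_name": "gpt-4o-group",
            "litellm_params": {
                "model": "azure/gpt-4o-primary",
                "api_base": os.getenv("AZURE_API_BASE"),
                "api_key": os.getenv("AZURE_API_KEY"),
            },
        },
        {
            # Group we expect requests to fail over to
            "model_name": "gpt-4o-fallback",
            "litellm_params": {"model": "openai/gpt-4o"},
        },
    ],
    # Generic fallbacks plus context-window fallbacks for the primary group
    fallbacks=[{"gpt-4o-group": ["gpt-4o-fallback"]}],
    context_window_fallbacks=[{"gpt-4o-group": ["gpt-4o-fallback"]}],
)
```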
Observed behavior:
When a critical error occurs (such as context window exceeded, a rate limit error, or a PTU-related issue), it is returned in the first chunk after the stream has started, rather than as an HTTP error before streaming begins. Because the response has already started streaming, the fallbacks configured for the model group never trigger, and the error is passed straight through to the client.
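From the client side, the behavior looks roughly like this (the gateway URL, model group name, and API key are placeholders, and the exact shape of the error event may differ):

```python
# Client-side view of the problem: the request returns 200 and a stream,
# and the failure only shows up as the first streamed event.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm-gateway.internal/v1",  # placeholder gateway URL
    api_key="sk-placeholder",
)

stream = client.responses.create(
    model="gpt-4o-group",
    input="<prompt large enough to exceed the context window>",
    stream=True,
)

for event in stream:
    # Instead of an HTTP 4xx/5xx before streaming begins, the error
    # arrives here, in the very first event of the stream.
    print(event.type, getattr(event, "error", None))
```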
Our questions:
- Is it expected that fallbacks do not trigger when the upstream error is delivered inside the stream rather than as an HTTP error before streaming starts?
- Is there a configuration option or recommended pattern to make LiteLLM detect these first-chunk errors on the streaming Responses endpoint and fail over to the fallback model group?
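In the meantime, the client-side workaround we're considering looks roughly like the sketch below (the event-type checks are assumptions about how the error surfaces in the stream):

```python
# Sketch of a client-side fallback: peek at the first streamed event and
# retry the next model group if it indicates a failure, since the gateway's
# own fallback logic doesn't fire once streaming has begun.
from openai import OpenAI

client = OpenAI(base_url="https://litellm-gateway.internal/v1", api_key="sk-placeholder")

def stream_with_manual_fallback(prompt: str, model_groups: list[str]):
    last_event = None
    for model in model_groups:
        stream = client.responses.create(model=model, input=prompt, stream=True)
        events = iter(stream)
        first = next(events, None)
        # Treat an error-like first event as a failed attempt and move on.
        if first is None or "error" in first.type or first.type == "response.failed":
            last_event = first
            continue
        yield first
        yield from events
        return
    raise RuntimeError(f"all model groups failed; last event: {last_event}")

# Usage (model group names are placeholders):
# for event in stream_with_manual_fallback("hello", ["gpt-4o-group", "gpt-4o-fallback"]):
#     ...
```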
Any guidance would be appreciated!