Ollama HTTP 50x errors, and Ollama timeouts #702

Open
esnible opened this issue Mar 6, 2025 · 3 comments
Labels: bug (Something isn't working)

esnible (Member) commented Mar 6, 2025

Describe the bug
When running examples/gsm8k/gsm8.pdl with the full 1319 iterations, PDL tries to submit all 1319 completions at nearly the same time.

Sometimes Ollama logs 503, which is "Service Unavailable":
[GIN] 2025/03/06 - 11:42:21 | 503 | 42.044125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 43.173125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 44.790209ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 45.941833ms | 127.0.0.1 | POST "/api/generate"

Also, PDL logs the following message:

gsm8.pdl:26 - Error during 'ollama/granite3.2:8b' model call: litellm.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out after 600.0 seconds.
Failure generating the trace: Error during 'ollama/granite3.2:8b' model call: litellm.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out after 600.0 seconds.

This suggests that LiteLLM or Ollama limits each response to 10 minutes, even for the 1319th entry, which won't be ready until the other 1318 entries have been processed, and that takes over an hour.

Also, when running with 256 iterations, Ollama logs the following message:

[GIN] 2025/03/06 - 12:10:19 | 500 |         9m59s |       127.0.0.1 | POST     "/api/generate"
time=2025-03-06T12:10:20.053-05:00 level=INFO source=server.go:727 msg="aborting completion request due to client closing the connection"

This suggests that LiteLLM or PDL gives up after 10 minutes and never accepts the response that Ollama eventually generates.
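
For what it's worth, the 600 seconds looks like a client-side request timeout; if it is LiteLLM's default, it can presumably be raised per call. A rough sketch, assuming litellm.completion accepts a timeout keyword (I believe it does):

```python
import litellm

# Raise the per-request timeout so queued Ollama requests are not abandoned
# after the default 600 s; 7200 s here is an arbitrary example value.
response = litellm.completion(
    model="ollama/granite3.2:8b",
    messages=[{"role": "user", "content": "Janet has 3 apples..."}],
    timeout=7200,
)
print(response.choices[0].message.content)
```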

To Reproduce
Edit gsm8.pdl to have MAX_ITERATIONS: 1319 and run gsm8.pdl.

Expected behavior
Perhaps PDL or LiteLLM should retry 503s after some delay?
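For example, a simple client-side retry with exponential backoff could look like the sketch below; the exception type is taken from the log above, and LiteLLM may already offer something like this via a num_retries argument, which would be worth checking before rolling our own loop:

```python
import time
import litellm

def completion_with_retry(max_attempts=5, backoff_s=2.0, **kwargs):
    """Retry transient Ollama failures (503s, dropped connections) with
    exponential backoff. Sketch only; adjust the exception types once we
    confirm what actually surfaces from LiteLLM here."""
    for attempt in range(1, max_attempts + 1):
        try:
            return litellm.completion(**kwargs)
        except litellm.APIConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))
```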

Desktop (please complete the following information):

  • OS: macOS (M3 Mac)
  • Ollama version: 0.5.13
esnible added the bug label on Mar 6, 2025
starpit (Member) commented Mar 6, 2025

fwiw, ollama has the following default limits; we could adjust these...

  • OLLAMA_MAX_LOADED_MODELS - the maximum number of models that can be loaded concurrently, provided they fit in available memory. The default is 3 * the number of GPUs, or 3 for CPU inference.
  • OLLAMA_NUM_PARALLEL - the maximum number of parallel requests each model will process at the same time. The default auto-selects either 4 or 1 based on available memory.
  • OLLAMA_MAX_QUEUE - the maximum number of requests Ollama will queue when busy before rejecting additional requests. The default is 512.
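
These are environment variables read by the Ollama server at startup, so they would have to be set before `ollama serve` is launched. A small sketch of launching the server with raised limits (the values are illustrative, not recommendations):

```python
import os
import subprocess

# OLLAMA_NUM_PARALLEL / OLLAMA_MAX_QUEUE must be in the server's environment
# before it starts; they cannot be changed from the client side per request.
env = dict(
    os.environ,
    OLLAMA_NUM_PARALLEL="8",   # more parallel requests per loaded model
    OLLAMA_MAX_QUEUE="2048",   # queue more requests before returning 503
)
subprocess.Popen(["ollama", "serve"], env=env)
```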

esnible changed the title from 'Ollama HTTP 503 error, also known as the "Service Unavailable"' to 'Ollama HTTP 503 errors, and Ollama timeouts' on Mar 6, 2025
esnible changed the title from 'Ollama HTTP 503 errors, and Ollama timeouts' to 'Ollama HTTP 50x errors, and Ollama timeouts' on Mar 6, 2025
esnible (Member, Author) commented Mar 7, 2025

@mandel @vazirim Any thoughts about an approach for this?

It would be straightforward to introduce a global data structure with N "tickets" for using LiteLLM, where the (N+1)th model request blocks until another model invocation completes (rough sketch below).

Perhaps the limit is per-provider, with a default of e.g. 128 for Ollama and 500 for Replicate?

Or perhaps PDL's interpreter doesn't handle this at all; instead we introduce a library that limits concurrency to N callers, and developers explicitly acquire and release permission within PDL loops?
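
A rough sketch of the "tickets" idea with per-provider limits; all names and numbers here are placeholders, not existing PDL internals:

```python
import threading
from contextlib import contextmanager

# Hypothetical per-provider concurrency caps; the numbers are examples only.
_DEFAULT_LIMITS = {"ollama": 128, "replicate": 500}
_semaphores = {
    provider: threading.BoundedSemaphore(limit)
    for provider, limit in _DEFAULT_LIMITS.items()
}

@contextmanager
def model_ticket(model: str):
    """Block until a 'ticket' for the model's provider is free, so at most
    N LiteLLM calls per provider are in flight at once."""
    provider = model.split("/", 1)[0]  # "ollama/granite3.2:8b" -> "ollama"
    sem = _semaphores.get(provider)
    if sem is None:
        yield  # unknown provider: no cap
        return
    with sem:
        yield

# Usage inside the interpreter's model call:
# with model_ticket("ollama/granite3.2:8b"):
#     response = litellm.completion(model=..., messages=...)
```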

starpit (Member) commented Mar 7, 2025

should we be using a thread pool executor?
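
e.g. something like this, where max_workers caps how many Ollama requests are in flight instead of firing all 1319 at once (a sketch, not wired into the interpreter):

```python
from concurrent.futures import ThreadPoolExecutor

import litellm

# Stand-ins for the gsm8k items; in PDL these would come from the dataset.
questions = ["Janet has 3 apples...", "A train leaves at 9am..."]

def ask(question: str):
    return litellm.completion(
        model="ollama/granite3.2:8b",
        messages=[{"role": "user", "content": question}],
    )

# max_workers bounds concurrency, so Ollama never sees more than 8 requests at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    answers = [r.choices[0].message.content for r in pool.map(ask, questions)]
```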
