Ollama HTTP 50x errors, and Ollama timeouts #702
Comments
fwiw, ollama has the following default limits; we could adjust these...
@mandel @vazirim Any thoughts about an approach for this? It would be straightforward to introduce a global data structure with N "tickets" to use LiteLLM; where the (N+1)th Perhaps the limit is per-provider, with a default of e.g. 128 for Ollama and 500 for Replicate? Perhaps PDL's interpreter doesn't handle this at all, but we introduce a library for excluding more than N readers, and developers explicitly request and release permission to read within PDL loops?
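To make the "N tickets" idea concrete, here is a minimal sketch using `asyncio.Semaphore`, one per provider. It is not how PDL or LiteLLM work today; `TicketPool`, `DEFAULT_LIMITS`, the fallback limit, and `fake_completion` are all hypothetical names for illustration, and the limits are just the numbers floated above.

```python
import asyncio

# Hypothetical per-provider limits, using the defaults suggested in this thread.
DEFAULT_LIMITS = {"ollama": 128, "replicate": 500}
FALLBACK_LIMIT = 16  # assumption: limit for providers not listed above


class TicketPool:
    """Allows at most N concurrent calls ("tickets") per provider."""

    def __init__(self, limits=None):
        self._limits = dict(DEFAULT_LIMITS if limits is None else limits)
        self._semaphores = {}

    def _semaphore(self, provider):
        # Created lazily, so the semaphore is built inside the running event loop.
        if provider not in self._semaphores:
            n = self._limits.get(provider, FALLBACK_LIMIT)
            self._semaphores[provider] = asyncio.Semaphore(n)
        return self._semaphores[provider]

    async def run(self, provider, make_call):
        # The (N+1)th caller waits here until an earlier call releases its ticket.
        async with self._semaphore(provider):
            return await make_call()


async def demo(n_requests=20, limit=4):
    """Submit n_requests fake completions and report the peak concurrency."""
    pool = TicketPool({"ollama": limit})
    in_flight = 0
    peak = 0

    async def fake_completion():
        # Stand-in for a real LiteLLM call; it only sleeps briefly.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    results = await asyncio.gather(
        *(pool.run("ollama", fake_completion) for _ in range(n_requests))
    )
    return peak, results


if __name__ == "__main__":
    peak, results = asyncio.run(demo())
    print(f"peak concurrency: {peak}, completed: {len(results)}")
```

One point in favor of gating before the request is sent: any client-side timeout would only start counting once a ticket is held, rather than while the request sits in the provider's queue.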
should we be using a thread pool executor?
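For comparison, a rough sketch of what the thread pool option could look like with the standard library's `concurrent.futures.ThreadPoolExecutor`, where `max_workers` acts as the concurrency cap. `run_bounded`, `PeakCounter`, and `fake_completion` are made-up names for this example, and the fake call stands in for a blocking model request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(call, inputs, max_workers=8):
    """Apply `call` to every input with at most `max_workers` calls in flight.

    Results come back in input order, as with the built-in map().
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, inputs))


class PeakCounter:
    """Records the highest number of simultaneous calls observed."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1


def demo(n_requests=32, max_workers=4):
    """Run fake completions through the pool and report peak concurrency."""
    counter = PeakCounter()

    def fake_completion(i):
        # Stand-in for a blocking model call.
        with counter:
            time.sleep(0.005)
            return i * 2

    results = run_bounded(fake_completion, range(n_requests), max_workers)
    return counter.peak, results


if __name__ == "__main__":
    peak, results = demo()
    print(f"peak concurrency: {peak}, completed: {len(results)}")
```

The pool queues the remaining work internally, so all 1319 items can be submitted at once while only `max_workers` requests reach the server at any moment.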
Describe the bug
When running examples/gsm8k/gsm8.pdl with the full 1319 iterations, PDL tries to submit all 1319 completions at nearly the same time.
Sometimes Ollama logs 503, which is "Service Unavailable"
[GIN] 2025/03/06 - 11:42:21 | 503 | 42.044125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 43.173125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 44.790209ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 45.941833ms | 127.0.0.1 | POST "/api/generate"
Also, PDL logs the following message:
This suggests that LiteLLM or Ollama limits us to 10 minutes for a response, even for the 1319th entry, which won't be ready until the other 1318 entries have been processed, and that takes over an hour.
Also, Ollama logs the following message:
when running with 256 iterations, suggesting that LiteLLM or PDL gives up after 10 minutes and does not accept the response that Ollama finally generates.
To Reproduce
Edit gsm8.pdl to have
MAX_ITERATIONS: 1319
and run gsm8.pdl.
Expected behavior
Perhaps PDL or LiteLLM should retry 503s after some delay?
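As a sketch of what "retry 503s after some delay" could mean, here is a generic exponential backoff loop with jitter. It deliberately avoids any real LiteLLM or Ollama API: `send` is a hypothetical callable returning a `(status, body)` pair, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


def retry_on_503(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Call `send()` until it returns a status other than 503.

    `send` is a hypothetical callable returning (status, body). The delay
    doubles after each 503, is capped at `max_delay`, and gets random
    jitter so that many clients do not retry in lockstep.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Out of attempts: hand the last 503 back to the caller.
    return status, body


if __name__ == "__main__":
    # Fake server: two 503s, then success.
    responses = iter([(503, ""), (503, ""), (200, "answer")])
    print(retry_on_503(lambda: next(responses), base_delay=0.01))
```

This would only address the 503s; the 10-minute timeout is a separate problem, since retrying does not shorten the time a request waits behind the others.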