Hey @nchammas, good questions here. There isn't really a clean solution inside of guidance today, but this is actually the focus of our current work cycle. Ideally, you'd be able to set up a guidance "server" and connect multiple clients to it, either from the same process (using async and/or threads) or from entirely separate processes and/or over HTTP. You may be able to hack something together in the meantime, though.
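In the spirit of the "hack something together" suggestion, one interim pattern is to serialize access to a single shared model behind a lock while each client thread keeps its own conversation history. This is a minimal sketch of that pattern, not guidance's actual API: `SharedEngine`, its `generate` method, and `ClientSession` are all hypothetical stand-ins for whatever backend you actually use.

```python
import threading

class SharedEngine:
    """Stand-in for a single expensive model instance (hypothetical).
    In a real app this would wrap your actual LLM backend."""
    def __init__(self):
        self._lock = threading.Lock()

    def generate(self, history):
        # Serialize access: only one client talks to the model at a time.
        with self._lock:
            # Placeholder "inference": echo the last user message.
            return f"reply to: {history[-1]}"

class ClientSession:
    """Each client keeps its own conversation state; the engine is shared."""
    def __init__(self, engine, system_prompt):
        self.engine = engine
        self.history = [system_prompt]

    def ask(self, message):
        self.history.append(message)
        reply = self.engine.generate(self.history)
        self.history.append(reply)
        return reply

engine = SharedEngine()  # loaded once for the whole process
alice = ClientSession(engine, "You are helpful.")
bob = ClientSession(engine, "You are helpful.")

alice.ask("hello from alice")
bob.ask("hello from bob")
# Both sessions share one engine, but their histories stay independent.
```

The lock is the crude part: it means one request at a time. A real server would batch or queue requests instead, which is presumably what the planned guidance server work addresses.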
I am using Guidance in the context of a Streamlit web application. I assume it would be a mistake to have every web session get its own instance of the LLM, as that would both be a large waste of memory and severely limit the number of concurrent users I can support.
What I want to do is load a model once, set its system prompt, and then allow multiple users to have independent conversations against that single model instance. So the model itself would be shared, as would the system prompt, but each web session would track its own conversation state.
I see that `Model` has a `state` attribute. To track independent conversations against a common backend, should I basically save this attribute to my web session state and load it from there? Or am I thinking about this the wrong way? I am aware of related projects like vLLM that are meant for model serving and support structured outputs, but I don't know whether I need to leverage a tool like that or can just use Guidance directly.
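For what it's worth, the shape being described, one shared backend plus per-session conversation state, can be sketched without either framework. Below, a plain dict stands in for Streamlit's `st.session_state` (one per user session), and `FakeSharedModel` is a hypothetical placeholder for the single loaded model; none of this is guidance's or Streamlit's actual API.

```python
# Hypothetical sketch: one shared model, per-session conversation state.
# A plain dict stands in for Streamlit's st.session_state (one per user).

SYSTEM_PROMPT = "You are a helpful assistant."

class FakeSharedModel:
    """Loaded once at startup; stateless with respect to conversations."""
    def complete(self, transcript):
        # Placeholder inference: report how many turns exist so far.
        return f"assistant turn {len(transcript)}"

shared_model = FakeSharedModel()  # one instance for all sessions

def get_transcript(session_state):
    # Initialize this session's conversation on first use, sharing
    # the system prompt but nothing else across sessions.
    if "transcript" not in session_state:
        session_state["transcript"] = [("system", SYSTEM_PROMPT)]
    return session_state["transcript"]

def chat(session_state, user_message):
    transcript = get_transcript(session_state)
    transcript.append(("user", user_message))
    reply = shared_model.complete(transcript)
    transcript.append(("assistant", reply))
    return reply

# Two independent "web sessions" against the same model instance:
session_a, session_b = {}, {}
chat(session_a, "hi from A")
chat(session_b, "hi from B")
```

In an actual Streamlit app, the equivalent move would be caching the expensive model load once (e.g. with `st.cache_resource`) and keeping only the lightweight per-conversation state in `st.session_state`, which is exactly the split the question is asking about.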