Added a proxy for model swapping #1645
base: concedo
Conversation
Hmm I think
Would a separate file play nicely with the PyInstaller builds?
If it communicates solely through the API, I don't see why not.
This should be doable cleanly as an entirely separate program. However, SSE streaming will be more challenging.
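For what it's worth, relaying SSE from a standalone proxy mostly comes down to forwarding the upstream chunks as they arrive. A minimal sketch, assuming Flask and `requests`; the backend URL and endpoint path here are illustrative placeholders, not confirmed koboldcpp routes:

```python
# Sketch: relaying an SSE stream through a standalone proxy process.
# BACKEND and the route below are illustrative, not the actual koboldcpp paths.
import requests
from flask import Flask, Response, request

app = Flask(__name__)
BACKEND = "http://localhost:5001"  # hypothetical koboldcpp instance

@app.post("/api/extra/generate/stream")
def proxy_stream():
    # Open a streaming connection to the backend and forward the
    # SSE chunks to the client as they arrive.
    upstream = requests.post(
        f"{BACKEND}/api/extra/generate/stream",
        json=request.get_json(),
        stream=True,
    )

    def relay():
        for chunk in upstream.iter_content(chunk_size=None):
            if chunk:
                yield chunk

    return Response(relay(), mimetype="text/event-stream")
```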
Either way it should be the regular koboldcpp setting it up. I don't want the mess of users having to start separate things for this feature. However we do it, it should be something the main koboldcpp launcher / CLI starts when admin mode is in use. Personally, I do think integrating it into koboldcpp.py makes sense.
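If it does end up as a separate program, the launcher could still own its lifecycle. A rough sketch, assuming an `args.admin` flag, a `model_proxy.py` script, and port offsets that are all hypothetical placeholders rather than existing koboldcpp options:

```python
# Sketch: the main launcher starting the proxy only when admin mode is in use.
# args.admin, the script name, and the port layout are hypothetical.
import atexit
import subprocess
import sys

def maybe_start_proxy(args):
    if not getattr(args, "admin", False):
        return None
    proxy = subprocess.Popen([
        sys.executable, "model_proxy.py",
        "--backend-port", str(args.port),
        "--listen-port", str(args.port + 1),
    ])
    # Tear the proxy down together with the main process.
    atexit.register(proxy.terminate)
    return proxy
```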
I've updated this to use the model name. I also implemented support for getting the list of models instead of only the currently active model. This was enough to get it working in Open WebUI. I've updated the original comment with a checklist.
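For context, Open WebUI only needs the OpenAI-style models list to populate its picker. A minimal sketch of what the proxy could return; discovering the available entries by scanning a directory of .kcpps files is an assumption for illustration, not how this PR necessarily does it:

```python
# Sketch: building an OpenAI-compatible /v1/models payload, one entry per
# available config. The .kcpps directory scan is an illustrative assumption.
import glob
import os

def list_models(config_dir="./configs"):
    models = []
    for path in glob.glob(os.path.join(config_dir, "*.kcpps")):
        name = os.path.splitext(os.path.basename(path))[0]
        models.append({"id": name, "object": "model", "owned_by": "koboldcpp"})
    return {"object": "list", "data": models}
```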
The idea is that it would show the configs the admin API already exposes. You get multimodal support for free since it accepts .kcpps files.
I could be missing something, but I don't think config files give us multimodal support for free. If the goal is to keep at most one model loaded per modality, a config file either comes with a drawback (implementation 1) or requires splitting configs for each modality (implementation 2). There are two ways I can think of to implement multimodality:
I've just tried to implement it this way, but I ran into a problem: the API responds before the server has switched over. If I try to connect to the server right after it responds, the old server is still active, so it errors out when the connection gets closed. I added a sleep to wait until the old server closes, but this method feels unreliable.
For this particular feature you could poll the model endpoint until it matches the model you want (or until you get "no model loaded" in case of an error).
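A minimal sketch of that polling approach; the `/api/v1/model` path and the `{"result": ...}` response shape are assumptions about the standard Kobold API that should be checked against the actual server:

```python
# Sketch: after asking the admin API to swap, poll the model endpoint until
# the expected model shows up. Endpoint path and response shape are assumptions.
import time
import requests

def wait_for_model(base_url, expected, timeout=120.0, interval=1.0):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            resp = requests.get(f"{base_url}/api/v1/model", timeout=5)
            current = resp.json().get("result", "")
            if expected in current:
                return True
            if "no model" in current.lower():
                raise RuntimeError("backend reported no model loaded")
        except requests.ConnectionError:
            # The old server may still be shutting down; keep waiting.
            pass
        time.sleep(interval)
    return False
```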
Based on what @pqnet suggested, I swapped over to the admin API.
This is a very rough version of a proxy for koboldcpp so that it can swap models for each request.
Only text models are supported for now, but that'll be fixed as well. First step towards #1623.
This can currently be used with things like Open WebUI to chat with multiple models.
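At a high level, the per-request flow would look roughly like the sketch below: compare the requested model against the one currently loaded, trigger a swap through the admin API if they differ, wait for the swap to finish, then forward the request. The `/api/admin/reload` route is a placeholder rather than the confirmed admin API endpoint, and `wait_for_model` refers to the polling helper sketched earlier:

```python
# Sketch of the per-request flow. The /api/admin/reload endpoint name and
# the .kcpps naming convention are placeholders, not confirmed routes.
import requests

BACKEND = "http://localhost:5001"
current_model = None

def ensure_model(model_name):
    global current_model
    if model_name == current_model:
        return
    requests.post(f"{BACKEND}/api/admin/reload", json={"filename": f"{model_name}.kcpps"})
    if not wait_for_model(BACKEND, model_name):  # polling helper sketched above
        raise RuntimeError(f"timed out waiting for {model_name}")
    current_model = model_name

def chat_completion(payload):
    # Swap to the requested model (if needed), then forward the request.
    ensure_model(payload.get("model", ""))
    resp = requests.post(f"{BACKEND}/v1/chat/completions", json=payload)
    return resp.json()
```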