Replies: 2 comments
-
Same here. I experimented with gpt4all and was super impressed by the speed on my MacBook Air with Wizard or Mistral models: responses in clearly under a minute, often in just a few seconds. That set my expectations for localGPT. It would be awesome to have some suggestions on where to start for smaller MPS or CPU models. I'm now very curious about my first tests with larger local document ingestion in localGPT. Ingestion is fine, and I could ingest a bigger set of documents now, but I'm having trouble picking a model with reasonable response times on my MacBook Air (CPU or MPS); I'm talking about five minutes plus per response. I've tried nearly all combinations that can be found in constants.py, but nothing comes close to the fast response times of gpt4all. A few suggestions as a starting point would be really helpful.
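For reference, the "combinations" here are just the model and embedding settings near the top of constants.py. Below is a minimal sketch of what switching to a smaller 4-bit quantized GGUF model looks like; the variable names and model identifiers are taken from one version of the repo and may differ in yours, so treat the values as an example rather than a recommendation:

```python
# constants.py (excerpt) -- sketch of selecting a smaller quantized model.
# MODEL_ID / MODEL_BASENAME / EMBEDDING_MODEL_NAME are the names used in the
# localGPT repo at the time of writing; adjust if your copy differs.

# Embedding model used during ingestion and retrieval.
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"

# A 7B model in 4-bit GGUF quantization is usually the most realistic
# choice for CPU/MPS on a MacBook Air; larger models or higher-bit
# quantizations will be noticeably slower.
MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
MODEL_BASENAME = "mistral-7b-instruct-v0.1.Q4_K_M.gguf"
```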
-
localGPT relies on transformers and llama.cpp (via its Python binding) for model loading and inference, so we are limited by their speed. I have experimented with using models served by Ollama for inference, and that seems to give much better speed. That might be the way to go if speed is a major bottleneck. Here is a video on how to use Ollama with localGPT: https://www.youtube.com/watch?v=Cp33aqjJX78
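For anyone who wants to try the Ollama route before watching the video, here is a minimal sketch of querying a locally served model from Python. It assumes Ollama is running on its default port (11434) and that the model, "mistral" in this example, has already been pulled with `ollama pull mistral`:

```python
import requests

# Minimal sketch: send a single prompt to a model served by a local
# Ollama instance and print the generated text.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Summarize the ingested document in two sentences.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Setting `stream` to True instead returns newline-delimited JSON objects, one per token, if you want incremental output.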
-
Hello,
I have been using localGPT for a week and have tried almost all of the models and embedding models listed in constants.py.
I may not have tried every combination yet, but I have not found one with an acceptable response time.
I use runpod.io for GPU resources.
Can anyone suggest a model and GPU configuration that gives reasonable response times (e.g. 4-5 seconds)?