README.md (+47, -83)
@@ -44,7 +44,7 @@ Advantages:
 1. Design three-stage pipelines of preprocess, rejection and response
    * `chat_in_group` copes with the **group chat** scenario, answering user questions without message flooding; see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/en/doc_knowledge_graph.md) and [Precision Report](./evaluation/)
    * `chat_with_repo` for **real-time streaming** chat
-2. No training required, with CPU-only, 2G, 10G, 20G and 80G configurations
+2. No training required, with CPU-only, 2G and 10G configurations
 3. Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable
 
 Check out the [scenes in which HuixiangDou is running](./huixiangdou-inside.md) and current public service status:
@@ -60,6 +60,7 @@ Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/det
 
 The Web version's API for Android also supports other devices. See [Python sample code](./tests/test_openxlab_android_api.py).
 
+- \[2025/03\] Simplified deployment, removing `--standalone`
 - \[2025/03\] [Forwarding multiple WeChat group messages](./docs/zh/doc_merge_wechat_group.md)
 - \[2024/09\] [Inverted indexer](https://github.com/InternLM/HuixiangDou/pull/387) makes LLM prefer knowledge base 🎯
 |[config-cpu.ini](./config-cpu.ini)| - | Use [siliconcloud](https://siliconflow.cn/) API <br/> for text only ||
-|[config-2G.ini](./config-2G.ini)| 2GB | Use openai API (such as [kimi](https://kimi.moonshot.cn), [deepseek](https://platform.deepseek.com/usage) and [stepfun](https://platform.stepfun.com/)) to search for text only ||
+|\[Standard Edition\][config.ini](./config.ini)| 2GB | Use openai API (such as [kimi](https://kimi.moonshot.cn), [deepseek](https://platform.deepseek.com/usage) and [stepfun](https://platform.stepfun.com/)) to search for text only ||
 |[config-multimodal.ini](./config-multimodal.ini)| 10GB | Use openai API for LLM, image and text retrieval ||
-|\[Standard Edition\][config.ini](./config.ini)| 19GB | Local deployment of LLM, single modality ||
-|[config-advanced.ini](./config-advanced.ini)| 80GB | Local LLM, anaphora resolution, single modality, practical for WeChat group ||
 # Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
 mkdir workdir
+# build knowledge base
 python3 -m huixiangdou.service.feature_store
 ```
 
-After running, test with `python3 -m huixiangdou.main --standalone`. At this point it replies to mmpose-related questions (covered by the knowledge base) while declining weather questions.
+## III. Setup LLM API and test
+Set the model and `api-key` in `config.ini`. If running the LLM locally, we recommend using `vllm`.
 > If restarting the LLM every time is too slow, first run <b>python3 -m huixiangdou.service.llm_server_hybrid</b>; then open a new window and each time only execute <b>python3 -m huixiangdou.main</b>, without restarting the LLM.
 Please update the `repodir` documents, [good_questions](./resource/good_questions.json) and [bad_questions](./resource/bad_questions.json), and try your own domain knowledge (medical, financial, power, etc.).
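The new "Setup LLM API" step above edits the `[llm]` section of `config.ini`. A minimal sketch of that section, reusing the key names and example values from the `config-2G.ini` snippet removed further down in this diff; the defaults in the shipped `config.ini` may differ.

```toml
# [llm] sketch — key names follow the config-2G.ini example shown below in this diff
[llm]
enable_local = 0                                # skip the local LLM
enable_remote = 1                               # call a remote openai-style API instead
remote_type = "siliconcloud"                    # provider named in the table above
remote_api_key = "YOUR-API-KEY-HERE"            # token from the provider's console
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"  # a model served by that provider
```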
 
-## III. Integration into Feishu, WeChat group
+## IV. Integration into Feishu, WeChat group
 
 - [**One-way** sending to Feishu group](./docs/zh/doc_send_only_lark_group.md)
 - [**Two-way** Feishu group receiving and sending, recalling](./docs/zh/doc_add_lark_group.md)
 If you find the installation too slow, a pre-installed image is provided on [Docker Hub](https://hub.docker.com/repository/docker/tpoisonooo/huixiangdou/tags). Simply replace it when starting the docker.
 
-## **2G Cost-effective Edition**
-
-If your GPU mem exceeds 1.8G, or you pursue cost-effectiveness, this configuration discards the local LLM and uses a remote LLM instead; it is otherwise the same as the standard edition.
-
-Take `siliconcloud` as an example: fill the API TOKEN applied for on the [official website](https://siliconflow.cn/) into `config-2G.ini`
-
-```toml
-# config-2G.ini
-[llm]
-enable_local = 0    # Turn off local LLM
-enable_remote = 1   # Only use remote
-..
-remote_type = "siliconcloud"            # Choose siliconcloud
-remote_api_key = "YOUR-API-KEY-HERE"    # Your API key
-remote_llm_model = "alibaba/Qwen1.5-110B-Chat"
-```
-
-> \[!NOTE\]
->
-> <div align="center">
-> Each Q&A scenario requires calling the LLM 7 times at worst; subject to the free-user RPM limit, you can modify the <b>rpm</b> parameter in config.ini
-> </div>
-
-Execute the following to get the Q&A results
-
-```shell
-python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once
-```
-
 
 ## **10G Multimodal Edition**
 
 If you have 10G GPU mem, you can further support image and text retrieval. Just modify the model used in config.ini.
@@ -354,15 +341,7 @@ Run gradio to test, see the image and text retrieval result [here](https://githu
 python3 tests/test_query_gradio.py
 ```
 
-## **80G Complete Edition**
-
-The "HuiXiangDou" in the WeChat experience group has enabled all features:
-
-- Serper search and SourceGraph search enhancement
-- Group chat images, WeChat public account parsing
-- Text coreference resolution
-- Hybrid LLM
-- Knowledge base covering openmmlab's 12 repositories (1700 documents), refusing small talk
+## **Furthermore**
 
 Please read the following topics:
 
@@ -391,21 +370,6 @@ Contributors have provided [Android tools](./android) to interact with WeChat. T
 
 LLM long text based on the transformers structure requires more memory; kv cache quantization then needs to be applied to the model, see the [lmdeploy quantization description](https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization). Then use docker to independently deploy the Hybrid LLM Service.
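For the lmdeploy route mentioned above, a minimal serving sketch; the model name, port and quantization value are placeholders, and the exact flags should be checked against the linked lmdeploy docs.

```shell
pip install lmdeploy
# --quant-policy enables kv cache quantization; valid values are listed in the lmdeploy docs
lmdeploy serve api_server internlm/internlm2_5-7b-chat --quant-policy 8 --server-port 23333
# then point the remote LLM settings in config.ini at the resulting openai-style endpoint
```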
 
-3. How to access other local LLMs / what if the effect is not ideal after access?
-
-   - Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation.
-   - Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py).
-
-4. What if the response is too slow or the request always fails?
-
-   - Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission.
-   - Replace the local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy), instead of the native huggingface/transformers.
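The removed bullet above suggests adding exponential backoff and retransmission; a minimal, dependency-free sketch of that pattern (the function and parameter names are illustrative, not taken from `llm_server_hybrid.py`):

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry request_fn, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise                                   # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, 0.5))  # jitter avoids synchronized retries
```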
-
-5. What if the GPU memory is too low?
-
-   At this time it is impossible to run a local LLM; only a remote LLM can be used together with text2vec to execute the pipeline. Please make sure that `config.ini` only uses the remote LLM and that the local LLM is turned off.
-
-
 # 🍀 Acknowledgements
 
 - [KIMI](https://kimi.moonshot.cn/): Long text LLM, supports direct file upload