
Commit c02a195

tpoisonooo authored
refactor(llm.py): remove redundant llm call (#427)
* refactor(llm.py): remove redundant llm call
* refactor(project): remove `--standalone`
* feat(project): remove `--hybrid-llm`
* feat(project): update
* feat(project): update
* test(project): gradio_ui, main, server passed
* docs(project): update
* docs(project): update

---------

Co-authored-by: tpoisonooo <[email protected]>
1 parent 9f5dee6 commit c02a195

Some content is hidden: large commits have part of the diff hidden by default.

73 files changed, +711 −1558 lines changed

.gitignore

+1 −7

@@ -1,14 +1,12 @@
 workdir/
 write_toml.py
 modeling_internlm2.py
-config.ini
 config-template.ini
 logs/
 logs/work.txt
 server.log
 **/__pycache__
 badcase.txt
-config.bak
 config.ini
 resource/prompt.txt
 build/
@@ -25,19 +23,15 @@ nohup.out
 start-web.sh
 web/proxy/config-template.ini
 web/env.sh
-config-alignment.ini
 logs/work.txt
 web/tools/query.jsonl
 query.jsonl
 tests/history_recv_send.txt
 unittest/token.json
-config.test
 wkteam/
-config-wechat.ini
 web.log
 evaluation/rejection/gt_bad.txt
 evaluation/rejection/gt_good.txt
 bm25.pkl
 repodir/
-repodir.full/
-workdir.full/
+logs/work.txt

README.md

+47 −83

@@ -44,7 +44,7 @@ Advantages:
 1. Design three-stage pipelines of preprocess, rejection and response
 * `chat_in_group` copes with **group chat** scenario, answer user questions without message flooding, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/en/doc_knowledge_graph.md) and [Precision Report](./evaluation/)
 * `chat_with_repo` for **real-time streaming** chat
-2. No training required, with CPU-only, 2G, 10G, 20G and 80G configuration
+2. No training required, with CPU-only, 2G, 10G configuration
 3. Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable
 
 Check out the [scenes in which HuixiangDou are running](./huixiangdou-inside.md) and current public service status:
@@ -60,6 +60,7 @@ Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/det
 
 The Web version's API for Android also supports other devices. See [Python sample code](./tests/test_openxlab_android_api.py).
 
+- \[2025/03\] Simplify deployment by removing `--standalone`
 - \[2025/03\] [Forwarding multiple wechat group message](./docs/zh/doc_merge_wechat_group.md)
 - \[2024/09\] [Inverted indexer](https://github.com/InternLM/HuixiangDou/pull/387) makes LLM prefer knowledge base🎯
 - \[2024/09\] [Code retrieval](./huixiangdou/service/parallel_pipeline.py)
@@ -107,11 +108,9 @@ The Web version's API for Android also supports other devices. See [Python sampl
 <tr valign="top">
 <td>
 
-- [InternLM2/InternLM2.5](https://github.com/InternLM/InternLM)
-- [Qwen1.5~2.5](https://github.com/QwenLM/Qwen2)
-- [puyu](https://internlm.openxlab.org.cn/)
-- [StepFun](https://platform.stepfun.com)
+- [vLLM](https://github.com/vllm-project/vllm)
 - [KIMI](https://kimi.moonshot.cn)
+- [StepFun](https://platform.stepfun.com)
 - [DeepSeek](https://www.deepseek.com)
 - [GLM (ZHIPU)](https://www.zhipuai.cn)
 - [SiliconCloud](https://siliconflow.cn/zh-cn/siliconcloud)
@@ -170,10 +169,8 @@ The following are the GPU memory requirements for different features, the differ
 | Configuration Example | GPU mem Requirements | Description | Verified on Linux |
 | :----------------------------------------------: | :------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------: |
 | [config-cpu.ini](./config-cpu.ini) | - | Use [siliconcloud](https://siliconflow.cn/) API <br/> for text only | ![](https://img.shields.io/badge/x86-passed-blue?style=for-the-badge) |
-| [config-2G.ini](./config-2G.ini) | 2GB | Use openai API (such as [kimi](https://kimi.moonshot.cn), [deepseek](https://platform.deepseek.com/usage) and [stepfun](https://platform.stepfun.com/) to search for text only | ![](https://img.shields.io/badge/1660ti%206G-passed-blue?style=for-the-badge) |
+| \[Standard Edition\] [config.ini](./config.ini) | 2GB | Use openai API (such as [kimi](https://kimi.moonshot.cn), [deepseek](https://platform.deepseek.com/usage) and [stepfun](https://platform.stepfun.com/)) to search for text only | ![](https://img.shields.io/badge/1660ti%206G-passed-blue?style=for-the-badge) |
 | [config-multimodal.ini](./config-multimodal.ini) | 10GB | Use openai API for LLM, image and text retrieval | ![](https://img.shields.io/badge/3090%2024G-passed-blue?style=for-the-badge) |
-| \[Standard Edition\] [config.ini](./config.ini) | 19GB | Local deployment of LLM, single modality | ![](https://img.shields.io/badge/3090%2024G-passed-blue?style=for-the-badge) |
-| [config-advanced.ini](./config-advanced.ini) | 80GB | local LLM, anaphora resolution, single modality, practical for WeChat group | ![](https://img.shields.io/badge/A100%2080G-passed-blue?style=for-the-badge) |
 
 # 🔥 Running the Standard Edition
 
@@ -198,47 +195,66 @@ pip install -r requirements.txt
 # For python3.8, install faiss-gpu instead of faiss
 ```
 
-## II. Create knowledge base and ask questions
+## II. Create knowledge base
 
 Use mmpose documents to build the mmpose knowledge base and filtering questions. If you have your own documents, just put them under `repodir`.
 
 Copy and execute all the following commands (including the '#' symbol).
 
 ```shell
-# Download the knowledge base, we only take the documents of mmpose as an example. You can put any of your own documents under `repodir`
+# Download the knowledge base; we only take some documents as an example. You can put any of your own documents under `repodir`
 cd HuixiangDou
 mkdir repodir
-git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
+cp -rf resource/data* repodir/
 
 # Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
 mkdir workdir
+# build knowledge base
 python3 -m huixiangdou.service.feature_store
 ```
 
-After running, test with `python3 -m huixiangdou.main --standalone`. At this time, reply to mmpose related questions (related to the knowledge base), while not responding to weather questions.
+## III. Setup LLM API and test
+Set the model and `api-key` in `config.ini`. If running LLM locally, we recommend using `vllm`.
 
-```bash
-python3 -m huixiangdou.main --standalone
+```text
+vllm serve /path/to/Qwen-2.5-7B-Instruct --enable-prefix-caching --served-model-name Qwen-2.5-7B-Instruct
+```
+
+Here is an example of the configured `config.ini`:
+
+```ini
+[llm.server]
+remote_type = "kimi"
+remote_api_key = "sk-dp3GriuhhLXnYo0KUuWbFUWWKOXXXXXXXXXX"
+
+# remote_type = "step"
+# remote_api_key = "5CpPyYNPhQMkIzs5SYfcdbTHXq3a72H5XXXXXXXXXXXXX"
+
+# remote_type = "deepseek"
+# remote_api_key = "sk-86db9a205aa9422XXXXXXXXXXXXXX"
+
+# remote_type = "vllm"
+# remote_api_key = "EMPTY"
+# remote_llm_model = "Qwen2.5-7B-Instruct"
+```
+
+Then run the test:
+
+```text
+# Respond to questions related to the Hundred-Plant Garden (related to the knowledge base), but do not respond to weather questions.
+python3 -m huixiangdou.main
 
-+---------------------------+---------+----------------------------+-----------------+
-| Query                     | State   | Reply                      | References      |
-+===========================+=========+============================+=================+
-| How to install mmpose?    | success | To install mmpose, plea..  | installation.md |
++-----------------------+---------+--------------------------------+-----------------+
+| Query                 | State   | Reply                          | References      |
++=======================+=========+================================+=================+
+| What is in the Hundred-Plant Garden? | success | The Hundred-Plant Garden has a rich variety of natural landscapes and life... | installation.md |
 --------------------------------------------------------------------------------------
-| How is the weather today? | unrelated.. | .. | |
+| How is the weather today? | Init state | .. | |
 +-----------------------+---------+--------------------------------+-----------------+
 🔆 Input your question here, type `bye` for exit:
 ..
 ```
 
-> \[!NOTE\]
->
-> <div align="center">
-> If restarting LLM every time is too slow, first <b>python3 -m huixiangdou.service.llm_server_hybrid</b>; then open a new window, and each time only execute <b>python3 -m huixiangdou.main</b> without restarting LLM.
-> </div>
-
-<br/>
-
 💡 Also run a simple Web UI with `gradio`:
 
 ```bash
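
A quick way to confirm the local `vllm` server is reachable before pointing `config.ini` at it is to hit its OpenAI-compatible endpoint directly. This is only a sketch, not part of the commit: it assumes vLLM's default port 8000 and reuses the `Qwen-2.5-7B-Instruct` served model name from the example above.

```shell
# Sanity-check the vLLM OpenAI-compatible endpoint (assumes the default port 8000).
# The "model" value must match the --served-model-name passed to `vllm serve`.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen-2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```

If this returns a normal chat completion, the commented `remote_type = "vllm"` block in the `config.ini` example above should work with `remote_api_key = "EMPTY"`.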
@@ -260,14 +276,14 @@ curl -X POST http://127.0.0.1:23333/huixiangdou_inference -H "Content-Type: app
 
 Please update the `repodir` documents, [good_questions](./resource/good_questions.json) and [bad_questions](./resource/bad_questions.json), and try your own domain knowledge (medical, financial, power, etc.).
 
-## III. Integration into Feishu, WeChat group
+## IV. Integration into Feishu, WeChat group
 
 - [**One-way** sending to Feishu group](./docs/zh/doc_send_only_lark_group.md)
 - [**Two-way** Feishu group receiving and sending, recalling](./docs/zh/doc_add_lark_group.md)
 - [Personal WeChat Android access](./docs/zh/doc_add_wechat_accessibility.md)
 - [Personal WeChat wkteam access](./docs/zh/doc_add_wechat_commercial.md)
 
-## IV. Deploy web front and back end
+## V. Deploy web front and backend
 
 We provide `typescript` front-end and `python` back-end source code:
 
@@ -295,42 +311,13 @@ python3 -m pip install -r requirements-cpu.txt
 # Establish knowledge base
 python3 -m huixiangdou.service.feature_store --config_path config-cpu.ini
 # Q&A test
-python3 -m huixiangdou.main --standalone --config_path config-cpu.ini
+python3 -m huixiangdou.main --config_path config-cpu.ini
 # gradio UI
 python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini
 ```
 
 If you find the installation too slow, a pre-installed image is provided in [Docker Hub](https://hub.docker.com/repository/docker/tpoisonooo/huixiangdou/tags). Simply replace it when starting the docker.
 
-## **2G Cost-effective Edition**
-
-If your GPU mem exceeds 1.8G, or you pursue cost-effectiveness. This configuration discards the local LLM and uses remote LLM instead, which is the same as the standard edition.
-
-Take `siliconcloud` as an example, fill in the API TOKEN applied from the [official website](https://siliconflow.cn/) into `config-2G.ini`
-
-```toml
-# config-2G.ini
-[llm]
-enable_local = 0 # Turn off local LLM
-enable_remote = 1 # Only use remote
-..
-remote_type = "siliconcloud" # Choose siliconcloud
-remote_api_key = "YOUR-API-KEY-HERE" # Your API key
-remote_llm_model = "alibaba/Qwen1.5-110B-Chat"
-```
-
-> \[!NOTE\]
->
-> <div align="center">
-> Each Q&A scenario requires calling the LLM 7 times at worst, subject to the free user RPM limit, you can modify the <b>rpm</b> parameter in config.ini
-> </div>
-
-Execute the following to get the Q&A results
-
-```shell
-python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once
-```
-
 ## **10G Multimodal Edition**
 
 If you have 10G GPU mem, you can further support image and text retrieval. Just modify the model used in config.ini.
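
The removed `config-2G.ini` example relied on `siliconcloud` as the remote LLM. Under the single `config.ini` this commit standardizes on, the equivalent settings would presumably move into the `[llm.server]` block shown earlier. A minimal sketch, assuming the same key names and that `siliconcloud` is still an accepted `remote_type` after the refactor:

```ini
# Hypothetical mapping of the removed config-2G.ini siliconcloud settings
# onto the new [llm.server] layout; key names follow the config.ini example above.
[llm.server]
remote_type = "siliconcloud"
remote_api_key = "YOUR-API-KEY-HERE"
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"
```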
@@ -354,15 +341,7 @@ Run gradio to test, see the image and text retrieval result [here](https://githu
 python3 tests/test_query_gradio.py
 ```
 
-## **80G Complete Edition**
-
-The "HuiXiangDou" in the WeChat experience group has enabled all features:
-
-- Serper search and SourceGraph search enhancement
-- Group chat images, WeChat public account parsing
-- Text coreference resolution
-- Hybrid LLM
-- Knowledge base is related to openmmlab's 12 repositories (1700 documents), refusing small talk
+## **Furthermore**
 
 Please read the following topics:
 
@@ -391,21 +370,6 @@ Contributors have provided [Android tools](./android) to interact with WeChat. T
 
 LLM long text based on transformers structure requires more memory. At this time, kv cache quantization needs to be done on the model, such as [lmdeploy quantization description](https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization). Then use docker to independently deploy Hybrid LLM Service.
 
-3. How to access other local LLM / After access, the effect is not ideal?
-
-   - Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
-   - Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py).
-
-4. What if the response is too slow/request always fails?
-
-   - Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission.
-   - Replace local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy), instead of the native huggingface/transformers.
-
-5. What if the GPU memory is too low?
-
-   At this time, it is impossible to run local LLM, and only remote LLM can be used in conjunction with text2vec to execute the pipeline. Please make sure that `config.ini` only uses remote LLM and turn off local LLM.
-
-
 # 🍀 Acknowledgements
 
 - [KIMI](https://kimi.moonshot.cn/): Long text LLM, supports direct file upload
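
The FAQ entries removed in the hunk above (accessing other local LLMs, slow responses, low GPU memory) were tied to the old `llm_server_hybrid` path. With the vLLM-based setup this commit moves to, the low-memory case is usually handled on the serving side instead. A rough sketch, not from the commit: it uses standard `vllm serve` options (`--gpu-memory-utilization`, `--max-model-len`) and a placeholder path to a smaller instruct model.

```shell
# If GPU memory is tight, either fall back to a remote LLM in config.ini,
# or serve a smaller model with a capped context length and KV-cache budget.
vllm serve /path/to/Qwen2.5-1.5B-Instruct \
  --served-model-name Qwen2.5-1.5B-Instruct \
  --gpu-memory-utilization 0.85 \
  --max-model-len 8192
```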
