HuixiangDou2 is a validated GraphRAG solution in the plant field. If you are interested in the effects of HuixiangDou in non-computer fields, try the new version.
HuixiangDou1 is a professional knowledge assistant based on LLM.
Advantages:
- Design three-stage pipelines of preprocess, rejection and response
  - chat_in_group copes with group chat scenarios and answers user questions without message flooding, see 2401.08772, 2405.02817, Hybrid Retrieval and Precision Report
  - chat_with_repo for real-time streaming chat
- No training required, with CPU-only, 2G, 10G configuration
- Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable
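The three-stage design named in the first bullet (preprocess, rejection, response) can be pictured with a toy example. This is only a minimal sketch, not HuixiangDou's implementation: the function names, the word-overlap score, the 0.3 threshold, the state strings and the file name `garden.md` are all assumptions made for illustration. A real deployment would use embedding retrieval and an LLM instead.

```python
from dataclasses import dataclass

# Toy knowledge base (hypothetical); a real deployment indexes the files under repodir.
KNOWLEDGE = {
    "garden.md": "the hundred-plant garden has a rich variety of natural landscapes and life",
}


@dataclass
class Reply:
    state: str      # "success" or "rejected" (hypothetical state names)
    text: str
    reference: str


def preprocess(query: str) -> set[str]:
    """Stage 1: normalize the query into lowercase word tokens."""
    cleaned = "".join(c if c.isalnum() or c == "-" else " " for c in query.lower())
    return set(cleaned.split())


def score(tokens: set[str], document: str) -> float:
    """Fraction of query tokens found in the document (stand-in for embedding similarity)."""
    if not tokens:
        return 0.0
    return len(tokens & set(document.split())) / len(tokens)


def answer(query: str, reject_threshold: float = 0.3) -> Reply:
    tokens = preprocess(query)
    # Stage 2: rejection. Drop queries that the knowledge base cannot support.
    best_name, best_score = max(
        ((name, score(tokens, doc)) for name, doc in KNOWLEDGE.items()),
        key=lambda pair: pair[1],
    )
    if best_score < reject_threshold:
        return Reply("rejected", "", "")
    # Stage 3: response. A real pipeline would prompt an LLM with the retrieved text.
    return Reply("success", KNOWLEDGE[best_name], best_name)
```

With this toy data, a question about the Hundred-Plant Garden passes the rejection stage, while a weather question is dropped before any response is generated, which is the behavior that keeps a group chat from being flooded.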
Check out the scenes in which HuixiangDou is running and the current public service status:
- readthedocs ChatWithAI (cpu-only) is available
- OpenXLab is using GPU and under continuous maintenance
- WeChat bot has a cost associated with WeChat integration. All code has been verified to be functional for one year. Please deploy it on your own for either the free or commercial version.
If this helps you, please give it a star ⭐
Our Web version has been released to OpenXLab, where you can create knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See BiliBili and YouTube !
The Web version's API for Android also supports other devices. See Python sample code.
- [2025/03] Simplify deployment and remove --standalone
- [2025/03] Forward messages from multiple WeChat groups
- [2024/09] Inverted indexer makes LLM prefer knowledge base🎯
- [2024/09] Code retrieval
- [2024/08] chat_with_readthedocs, see how to integrate 👍
- [2024/07] Image and text retrieval & Removal of langchain 👍
- [2024/07] Hybrid Knowledge Graph and Dense Retrieval improve 1.7% F1 score 🎯
- [2024/06] Evaluation of chunksize, splitter, and text2vec model 🎯
- [2024/05] wkteam WeChat access, parsing image & URL, support coreference resolution
- [2024/05] SFT LLM on NLP task, F1 increased by 29% 🎯
  🤗 LoRA-Qwen1.5-14B | LoRA-Qwen1.5-32B | alpaca data | arXiv
- [2024/04] RAG Annotation SFT Q&A Data and Examples
- [2024/04] Release Web Front and Back End Service Source Code 👍
- [2024/03] New Personal WeChat Integration and Prebuilt APK !
- [2024/02] [Experimental Feature] WeChat Group Integration of multimodal to achieve OCR
LLM | File Format | Retrieval Method | Integration | Preprocessing |
The following are the GPU memory requirements for different features. The configurations differ only in which options are turned on.
Configuration Example | GPU mem Requirements | Description | Verified on Linux |
---|---|---|---|
config-cpu.ini | - | Use siliconcloud API for text only | |
[Standard Edition] config.ini | 2GB | Use openai API (such as kimi, deepseek and stepfun) to search for text only | |
config-multimodal.ini | 10GB | Use openai API for LLM, image and text retrieval | |
We take the standard edition (local running LLM, text retrieval) as an introduction example. Other versions are just different in configuration options.
Agree to the BCE model agreement, then log in to huggingface:
huggingface-cli login
Install dependencies
# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt
# For python3.8, install faiss-gpu instead of faiss
We use some novels to build the knowledge base and filter questions. If you have your own documents, just put them under repodir.
Copy and execute all the following commands (including the '#' symbol).
# Download the knowledge base. We only take some documents as an example; you can put any of your own documents under `repodir`
cd HuixiangDou
mkdir repodir
cp -rf resource/data* repodir/
# Build knowledge base, this will save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store
Set the model and api-key in config.ini. If running the LLM locally, we recommend using vllm.
vllm serve /path/to/Qwen-2.5-7B-Instruct --enable-prefix-caching --served-model-name Qwen-2.5-7B-Instruct
Here is an example of the configured config.ini:
[llm.server]
remote_type = "kimi"
remote_api_key = "sk-dp3GriuhhLXnYo0KUuWbFUWWKOXXXXXXXXXX"
# remote_type = "step"
# remote_api_key = "5CpPyYNPhQMkIzs5SYfcdbTHXq3a72H5XXXXXXXXXXXXX"
# remote_type = "deepseek"
# remote_api_key = "sk-86db9a205aa9422XXXXXXXXXXXXXX"
# remote_type = "vllm"
# remote_api_key = "EMPTY"
# remote_llm_model = "Qwen2.5-7B-Instruct"
Then run the test:
# Respond to questions related to the Hundred-Plant Garden (related to the knowledge base), but do not respond to weather questions.
python3 -m huixiangdou.main
+-----------------------+---------+--------------------------------+-----------------+
| Query | State | Reply | References |
+=======================+=========+================================+=================+
| What is in the Hundred-Plant Garden? | success | The Hundred-Plant Garden has a rich variety of natural landscapes and life... | installation.md |
--------------------------------------------------------------------------------------
| How is the weather today? | Init state| .. | |
+-----------------------+---------+--------------------------------+-----------------+
🔆 Input your question here, type `bye` for exit:
..
💡 Also run a simple Web UI with gradio:
python3 -m huixiangdou.gradio_ui
Or run a server listening on port 23333; the default pipeline is chat_with_repo:
python3 -m huixiangdou.server
# test async API
curl -X POST http://127.0.0.1:23333/huixiangdou_stream -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
# cURL sync API
curl -X POST http://127.0.0.1:23333/huixiangdou_inference -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
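The same requests can be sent from Python. The sketch below mirrors the curl calls above with the standard library only: the URL, port, endpoint names and JSON fields come from those commands, while the helper names `build_request` and `ask` are made up here, and the response body is returned as raw text because its schema is not described in this document.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:23333"  # host and port taken from the curl examples


def build_request(text: str, image: str = "",
                  endpoint: str = "huixiangdou_inference") -> urllib.request.Request:
    """Build a POST request carrying the same JSON payload as the curl commands."""
    payload = json.dumps({"text": text, "image": image}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text: str, timeout: float = 60.0) -> str:
    """Send one question to the sync endpoint and return the raw response body."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `ask("how to install mmpose")` should return whatever the sync endpoint produces. For the streaming endpoint, pass `endpoint="huixiangdou_stream"` to `build_request` and read the response incrementally instead of calling `read()` once.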
Please update the repodir documents, good_questions and bad_questions, and try your own domain knowledge (medical, financial, power, etc.).
- One-way sending to Feishu group
- Two-way Feishu group receiving and sending, recalling
- Personal WeChat Android access and Android tool
- Personal WeChat wkteam access
We provide typescript front-end and python back-end source code:
- Multi-tenant management supported
- Zero programming access to Feishu and WeChat
- k8s friendly
Same as OpenXlab APP, please read the web deployment document.
Try the button at the bottom right of the page, and see the document.
If there is no GPU available, model inference can be completed using the siliconcloud API.
Taking docker miniconda+Python3.11 as an example, install CPU dependencies and run:
# Start container
docker run -v /path/to/huixiangdou:/huixiangdou -p 7860:7860 -p 23333:23333 -it continuumio/miniconda3 /bin/bash
# Install dependencies
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
python3 -m pip install -r requirements-cpu.txt
# Establish knowledge base
python3 -m huixiangdou.service.feature_store --config_path config-cpu.ini
# Q&A test
python3 -m huixiangdou.main --config_path config-cpu.ini
# gradio UI
python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini
If you find the installation too slow, a pre-installed image is provided in Docker Hub. Simply replace it when starting the docker.
If you have 10G GPU mem, you can further support image and text retrieval. Just modify the model used in config.ini.
# config-multimodal.ini
# !!! Download `https://huggingface.co/BAAI/bge-visualized/blob/main/Visualized_m3.pth` to `bge-m3` folder !!!
embedding_model_path = "BAAI/bge-m3"
reranker_model_path = "BAAI/bge-reranker-v2-minicpm-layerwise"
Note:
- You need to manually download Visualized_m3.pth to the bge-m3 directory
- Install FlagEmbedding from the main branch, which includes our bugfix. Here you can download bpe_simple_vocab_16e6.txt.gz
- Install requirements/multimodal.txt
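Because the first note is a manual step, a small preflight check can catch a missing weight file before the model is loaded. This is a minimal sketch: the file name comes from the note above, but the helper `has_visual_weights` is hypothetical and the directory you pass depends on where your copy of bge-m3 lives.

```python
from pathlib import Path

REQUIRED_FILE = "Visualized_m3.pth"  # file name from the note above


def has_visual_weights(model_dir: str) -> bool:
    """Return True if the manually downloaded weight file is present in model_dir."""
    return (Path(model_dir) / REQUIRED_FILE).is_file()
```

Calling `has_visual_weights("/path/to/bge-m3")` before running the gradio test gives a clear yes or no instead of a loading error later on.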
Run gradio to test, see the image and text retrieval result here.
python3 tests/test_query_gradio.py
Please read the following topics:
- Hybrid knowledge graph and dense retrieval
- Refer to config-advanced.ini configuration to improve effects
- Group chat scenario anaphora resolution training
- Use wkteam WeChat access, integrate images, public account parsing, and anaphora resolution
- Use rag.py to annotate SFT training data
- What if the robot is too cold/too chatty?
  - Fill in the questions that should be answered in the real scenario into resource/good_questions.json, and fill the ones that should be rejected into resource/bad_questions.json.
  - Adjust the theme content in repodir to ensure that the markdown documents in the main library do not contain irrelevant content.

  Re-run feature_store to update thresholds and feature libraries.

  ⚠️ You can directly modify reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
- Launch is normal, but out of memory during runtime?

  LLM long text based on transformers structure requires more memory. At this time, kv cache quantization needs to be done on the model, such as lmdeploy quantization description. Then use docker to independently deploy Hybrid LLM Service.
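For the first question above, the trade-off behind reject_throttle can be illustrated with a few numbers. The sketch assumes that a question is rejected when its similarity score against the knowledge base falls below the threshold, which matches the remark that 0.5 is high and 0.2 is low. The scores are invented for illustration, and the selection rule is a simple stand-in, not the procedure feature_store actually runs.

```python
def error_rates(good_scores, bad_scores, threshold):
    """Return (too_cold, too_chatty) for one threshold.

    too_cold:   share of good questions rejected (score below threshold)
    too_chatty: share of bad questions answered (score at or above threshold)
    """
    too_cold = sum(s < threshold for s in good_scores) / len(good_scores)
    too_chatty = sum(s >= threshold for s in bad_scores) / len(bad_scores)
    return too_cold, too_chatty


def pick_threshold(good_scores, bad_scores, candidates):
    """Choose the candidate with the smallest combined error rate."""
    return min(candidates, key=lambda t: sum(error_rates(good_scores, bad_scores, t)))


# Invented similarity scores for questions that should and should not be answered.
good = [0.62, 0.48, 0.55, 0.35]
bad = [0.12, 0.25, 0.31, 0.18]

for t in (0.2, 0.33, 0.5):
    print(t, error_rates(good, bad, t))
print("chosen:", pick_threshold(good, bad, [0.2, 0.33, 0.5]))
```

With these made-up scores, 0.2 answers half of the bad questions (too chatty) and 0.5 rejects half of the good ones (too cold), while a value in between separates the two sets. Adding more representative entries to good_questions.json and bad_questions.json gives the threshold update more evidence to work with.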
- KIMI: Long text LLM, supports direct file upload
- FlagEmbedding: BAAI RAG group
- BCEmbedding: Chinese-English bilingual feature model
- Langchain-ChatChat: Application of Langchain and ChatGLM
- GrabRedEnvelope: WeChat red packet grab
@misc{kong2024huixiangdou,
title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
year={2024},
eprint={2401.08772},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{kong2024labelingsupervisedfinetuningdata,
title={Labeling supervised fine-tuning data with the scaling law},
author={Huanjun Kong},
year={2024},
eprint={2405.02817},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.02817},
}
@misc{kong2025huixiangdou2robustlyoptimizedgraphrag,
title={HuixiangDou2: A Robustly Optimized GraphRAG Approach},
author={Huanjun Kong and Zhefan Wang and Chenyang Wang and Zhe Ma and Nanqing Dong},
year={2025},
eprint={2503.06474},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2503.06474},
}