There were no errors when building the knowledge base, but errors occur when querying. I am using Qwen2.5-7B-Instruct-GPTQ-Int4 as the large language model and bge-large-zh-v1.5 as the embedding model, with a PDF file as input.
Please help me! Thank you!
## The code is as follows:

```python
import os
import asyncio

from transformers import AutoModel, AutoTokenizer

from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete_if_cache, hf_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./dickens/"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)


async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        # Note: this reads an environment variable named "EMPTY" (None if unset);
        # vLLM-style OpenAI-compatible servers usually expect api_key="EMPTY".
        api_key=os.getenv("EMPTY"),
        base_url="http://0.0.0.0:8000/v1",
        **kwargs,
    )


async def main():
    try:
        rag = LightRAG(
            working_dir=WORKING_DIR,
            llm_model_func=llm_model_func,
            embedding_func=EmbeddingFunc(
                embedding_dim=1024,
                max_token_size=8192,
                func=lambda texts: hf_embedding(
                    texts,
                    tokenizer=AutoTokenizer.from_pretrained(
                        "bge-large-zh-v1.5", model_max_length=512
                    ),
                    embed_model=AutoModel.from_pretrained("bge-large-zh-v1.5"),
                ),
            ),
        )

        # Extract text from the PDF and insert it into the knowledge base
        import textract

        file_path = "哈利波特第一章和第二章.pdf"  # "Harry Potter, chapters 1 and 2"
        text_content = textract.process(file_path)
        await rag.ainsert(text_content.decode("utf-8"))

        # Interactive query loop
        while True:
            string = input()
            print(
                await rag.aquery(
                    string,
                    param=QueryParam(mode="hybrid"),
                )
            )
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    asyncio.run(main())
```
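The tokenizer above is capped at `model_max_length=512`, while the `EmbeddingFunc` declares `max_token_size=8192`, so LightRAG may hand the embedding side chunks longer than the model's 512-token limit. One workaround I considered is pre-splitting the text into bounded chunks before calling `ainsert`. A minimal, hypothetical sketch (the helper name and whitespace-based token counting are illustrative only; a real splitter should count tokens with the same bge-large-zh-v1.5 tokenizer):

```python
def split_into_chunks(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace tokens.

    Illustrative only: whitespace splitting approximates token counts,
    which is inaccurate for Chinese text; use the embedding model's own
    tokenizer to count tokens in practice.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# Example: 1200 words split into chunks of at most 512 words each
chunks = split_into_chunks("word " * 1200, max_tokens=512)
print(len(chunks))  # 3
```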
## Error:

```text
INFO:lightrag:Inserting 123 vectors to entities
We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.
You may ignore this warning if your `pad_token_id` (0) is identical to the `bos_token_id` (0), `eos_token_id` (2), or the `sep_token_id` (None), and your input is not padded.
INFO:lightrag:Inserting 118 vectors to relationships
INFO:lightrag:Writing graph with 126 nodes, 118 edges
你好
INFO:httpx:HTTP Request: POST http://0.0.0.0:8000/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Global query uses 59 entites, 60 relations, 3 text units
/usr/local/lib/python3.10/site-packages/lightrag/operate.py:1016: UserWarning: Low Level context is None. Return empty Low entity/relationship/source
  warnings.warn(
INFO:httpx:HTTP Request: POST http://0.0.0.0:8000/v1/chat/completions "HTTP/1.1 400 Bad Request"
An error occurred: Error code: 400 - {'object': 'error', 'message': 'could not broadcast input array from shape (535,) into shape (512,)', 'type': 'BadRequestError', 'param': None, 'code': 400}
```

(The query entered at the prompt was 你好, i.e. "Hello".)
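For context on the message itself: "could not broadcast input array from shape (535,) into shape (512,)" is the standard NumPy error raised when an array of one length is assigned into a fixed-size buffer of a shorter length, which is consistent with a 535-token input hitting a 512-slot limit somewhere on the server side. A minimal sketch reproducing the same message (pure NumPy, unrelated to the actual server internals):

```python
import numpy as np

buffer = np.zeros(512)  # fixed-size slot, e.g. a max-length field
tokens = np.ones(535)   # input longer than the slot

try:
    buffer[:] = tokens  # in-place assignment requires matching shapes
except ValueError as e:
    print(e)  # could not broadcast input array from shape (535,) into shape (512,)
```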