
Commit c2efa26

Update LangChain examples to use upstream (#12388)
* Update LangChain examples to use upstream
* Update README and fix links
* Update LangChain CPU examples to use upstream
* Update LangChain CPU voice_assistant example
* Update CPU README
* Update GPU README
* Remove GPU Langchain vLLM example and fix comments
* Change langchain -> LangChain
* Add reference for both upstream llms and embeddings
* Fix comments
* Fix comments
* Fix comments
* Fix comments
* Fix comment
1 parent 24b46b2 commit c2efa26

File tree

11 files changed: +325 -290 lines changed

python/llm/example/CPU/LangChain/README.md

+96-45
@@ -1,90 +1,141 @@
-## Langchain Examples
+# LangChain Example
 
-This folder contains examples showcasing how to use `langchain` with `ipex-llm`.
+The examples in this folder show how to use [LangChain](https://www.langchain.com/) with `ipex-llm` on Intel CPU.
 
-### Install-IPEX LLM
+> [!NOTE]
+> Please refer to [here](https://python.langchain.com/docs/integrations/llms/ipex_llm) for upstream LangChain LLM documentation with ipex-llm and [here](https://python.langchain.com/docs/integrations/text_embedding/ipex_llm/) for upstream LangChain embedding documentation with ipex-llm.
 
-Ensure `ipex-llm` is installed by following the [IPEX-LLM Installation Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_cpu.html).
+## 0. Requirements
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.
 
-### Install Dependences Required by the Examples
+## 1. Install
 
+We suggest using conda to manage the environment:
+
+On Linux:
 
 ```bash
-pip install langchain==0.0.184
-pip install -U chromadb==0.3.25
-pip install -U pandas==2.0.3
+conda create -n llm python=3.11
+conda activate llm
+
+# install ipex-llm with 'all' option
+pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
 ```
 
+On Windows:
+```cmd
+conda create -n llm python=3.11
+conda activate llm
+
+pip install --pre --upgrade ipex-llm[all]
+```
 
-### Example: Chat
+## 2. Run examples with LangChain
 
-The chat example ([chat.py](./chat.py)) shows how to use `LLMChain` to build a chat pipeline.
+### 2.1. Example: Streaming Chat
 
-To run the example, execute the following command in the current directory:
+Install LangChain dependencies:
 
 ```bash
-python chat.py -m <path_to_model> [-q <your_question>]
+pip install -U langchain langchain-community
 ```
-> Note: if `-q` is not specified, it will use `What is AI` by default.
-
-### Example: RAG (Retrival Augmented Generation)
-
-The RAG example ([rag.py](./rag.py)) shows how to load the input text into vector database, and then use `load_qa_chain` to build a retrival pipeline.
 
-To run the example, execute the following command in the current directory:
+In the current directory, run the example with the following command:
 
 ```bash
-python rag.py -m <path_to_model> [-q <your_question>] [-i <path_to_input_txt>]
+python chat.py -m MODEL_PATH -q QUESTION
 ```
-> Note: If `-i` is not specified, it will use a short introduction to Big-DL as input by default. if `-q` is not specified, `What is IPEX LLM?` will be used by default.
+**Additional Parameters for Configuration:**
+- `-m MODEL_PATH`: **required**, path to the model
+- `-q QUESTION`: question to ask. Default is `What is AI?`.
 
+### 2.2. Example: Retrieval Augmented Generation (RAG)
 
-### Example: Math
+The RAG example ([rag.py](./rag.py)) shows how to load the input text into a vector database, and then use LangChain to build a retrieval pipeline.
 
-The math example ([math.py](./llm_math.py)) shows how to build a chat pipeline specialized in solving math questions. For example, you can ask `What is 13 raised to the .3432 power?`
+Install LangChain dependencies:
 
-To run the exmaple, execute the following command in the current directory:
+```bash
+pip install -U langchain langchain-community langchain-chroma sentence-transformers==3.0.1
+```
+
+In the current directory, run the example with the following command:
 
 ```bash
-python llm_math.py -m <path_to_model> [-q <your_question>]
+python rag.py -m <path_to_llm_model> -e <path_to_embedding_model> [-q QUESTION] [-i INPUT_PATH]
 ```
-> Note: if `-q` is not specified, it will use `What is 13 raised to the .3432 power?` by default.
+**Additional Parameters for Configuration:**
+- `-m LLM_MODEL_PATH`: **required**, path to the LLM model.
+- `-e EMBEDDING_MODEL_PATH`: **required**, path to the embedding model.
+- `-q QUESTION`: question to ask. Default is `What is IPEX-LLM?`.
+- `-i INPUT_PATH`: path to the input doc.
 
 
-### Example: Voice Assistant
+### 2.3. Example: Low Bit
 
-The voice assistant example ([voiceassistant.py](./voiceassistant.py)) showcases how to use langchain to build a pipeline that takes in your speech as input in realtime, use an ASR model (e.g. [Whisper-Medium](https://huggingface.co/openai/whisper-medium)) to turn speech into text, and then feed the text into large language model to get response.
+The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use LangChain with a low_bit optimized model.
+By `save_low_bit` we save the weights of the low_bit model into the target folder.
+> [!NOTE]
+> `save_low_bit` only saves the weights of the model.
+> Users could copy the tokenizer model into the target folder or specify `tokenizer_id` during initialization.
 
-To run the exmaple, execute the following command in the current directory:
+Install LangChain dependencies:
 
 ```bash
-python voiceassistant.py -m <path_to_model> [-q <your_question>]
+pip install -U langchain langchain-community
 ```
-**Runtime Arguments Explained**:
-- `-m MODEL_PATH`: **Required**, the path to the
-- `-r RECOGNITION_MODEL_PATH`: **Required**, the path to the huggingface speech recognition model
-- `-x MAX_NEW_TOKENS`: the max new tokens of model tokens input
-- `-l LANGUAGE`: you can specify a language such as "english" or "chinese"
-- `-d True|False`: whether the model path specified in -m is saved low bit model.
-
 
-### Example: Low Bit
+In the current directory, run the example with the following command:
 
-The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use use langchain with low_bit optimized model.
-By `save_low_bit` we save the weights of low_bit model into the target folder.
-> Note: `save_low_bit` only saves the weights of the model.
-> Users could copy the tokenizer model into the target folder or specify `tokenizer_id` during initialization.
 ```bash
 python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
 ```
-**Runtime Arguments Explained**:
+**Additional Parameters for Configuration:**
 - `-m MODEL_PATH`: **Required**, the path to the model
 - `-t TARGET_PATH`: **Required**, the path to save the low_bit model
-- `-q QUESTION`: the question
+- `-q QUESTION`: question to ask. Default is `What is AI?`.
+
+### 2.4. Example: Math
 
+The math example ([math.py](./llm_math.py)) shows how to build a chat pipeline specialized in solving math questions. For example, you can ask `What is 13 raised to the .3432 power?`
+
+Install LangChain dependencies:
+
+```bash
+pip install -U langchain langchain-community
+```
+
+In the current directory, run the example with the following command:
+
+```bash
+python llm_math.py -m <path_to_model> [-q <your_question>]
+```
+
+**Additional Parameters for Configuration:**
+- `-m MODEL_PATH`: **Required**, the path to the model
+- `-q QUESTION`: question to ask. Default is `What is 13 raised to the .3432 power?`.
 
+> [!NOTE]
+> If `-q` is not specified, it will use `What is 13 raised to the .3432 power?` by default.
 
-### Legacy (Native INT4 examples)
+### 2.5. Example: Voice Assistant
 
-IPEX-LLM also provides langchain integrations using native INT4 mode. Those examples can be foud in [native_int4](./native_int4/) folder. For detailed instructions of settting up and running `native_int4` examples, refer to [Native INT4 Examples README](./README_nativeint4.md).
+The voice assistant example ([voiceassistant.py](./voiceassistant.py)) showcases how to use LangChain to build a pipeline that takes in your speech as input in real time, uses an ASR model (e.g. [Whisper-Medium](https://huggingface.co/openai/whisper-medium)) to turn speech into text, and then feeds the text into a large language model to get a response.
 
+Install LangChain dependencies:
+```bash
+pip install -U langchain langchain-community
+pip install transformers==4.36.2
+```
+
+To run the example, execute the following command in the current directory:
+
+```bash
+python voiceassistant.py -m <path_to_model> -r <path_to_recognition_model> [-q <your_question>]
+```
+**Additional Parameters for Configuration:**
+- `-m MODEL_PATH`: **Required**, the path to the model
+- `-r RECOGNITION_MODEL_PATH`: **Required**, the path to the huggingface speech recognition model
+- `-x MAX_NEW_TOKENS`: the maximum number of new tokens to generate
+- `-l LANGUAGE`: you can specify a language such as "english" or "chinese"
+- `-d True|False`: whether the model path specified in `-m` points to a saved low-bit model.
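
`rag.py` itself is not part of this diff, so as orientation for section 2.2 above, here is a minimal sketch of the kind of retrieval pipeline the new README describes, built from the upstream integrations it names (`IpexLLM`, `IpexLLMBgeEmbeddings`, `langchain-chroma`). The model paths, chunking, and prompt below are illustrative assumptions, not the example's actual code:

```python
from langchain_chroma import Chroma
from langchain_community.embeddings import IpexLLMBgeEmbeddings
from langchain_community.llms import IpexLLM
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Hypothetical local model paths -- substitute your own.
llm = IpexLLM.from_model_id(
    model_id="path/to/llama-2-7b-chat-hf",
    model_kwargs={"temperature": 0, "max_length": 512, "trust_remote_code": True},
)
embeddings = IpexLLMBgeEmbeddings(
    model_name="path/to/bge-large-en-v1.5",
    model_kwargs={},
    encode_kwargs={"normalize_embeddings": True},
)

# Naive fixed-size chunking of the input doc, indexed in an in-memory Chroma store.
with open("input.txt") as f:
    text = f.read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
retriever = Chroma.from_texts(chunks, embeddings).as_retriever()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = PromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm
print(chain.invoke("What is IPEX-LLM?"))
```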

python/llm/example/CPU/LangChain/chat.py

+14-13
@@ -20,10 +20,13 @@
 # only search the first bigdl package and end up finding only one sub-package.
 
 import argparse
+import warnings
 
-from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
-from langchain import PromptTemplate, LLMChain
-from langchain import HuggingFacePipeline
+from langchain.chains import LLMChain
+from langchain_community.llms import IpexLLM
+from langchain_core.prompts import PromptTemplate
+
+warnings.filterwarnings("ignore", category=UserWarning, message=".*padding_mask.*")
 
 
 def main(args):
@@ -38,20 +41,18 @@ def main(args):
 
     prompt = PromptTemplate(template=template, input_variables=["question"])
 
-    # llm = TransformersPipelineLLM.from_model_id(
-    #     model_id=model_path,
-    #     task="text-generation",
-    #     model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
-    # )
-
-    llm = TransformersLLM.from_model_id(
+    llm = IpexLLM.from_model_id(
         model_id=model_path,
-        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
+        model_kwargs={
+            "temperature": 0,
+            "max_length": 64,
+            "trust_remote_code": True,
+        },
     )
 
-    llm_chain = LLMChain(prompt=prompt, llm=llm)
+    llm_chain = prompt | llm
 
-    output = llm_chain.run(question)
+    output = llm_chain.invoke(question)
     print("====output=====")
     print(output)
 
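
The migration above replaces `LLMChain` with the LCEL composition `prompt | llm`. One note on the final `invoke` call: LCEL chains canonically take a dict keyed by the prompt's input variables; `llm_chain.invoke(question)` with a bare string works because recent langchain-core versions coerce a single string into `{"question": ...}` when the prompt has exactly one input variable. A sketch of the equivalent explicit form, reusing the `prompt` and `llm` built in `chat.py`:

```python
# Explicit dict input for the LCEL chain built in chat.py.
llm_chain = prompt | llm
output = llm_chain.invoke({"question": "What is AI?"})
print(output)
```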

python/llm/example/CPU/LangChain/llm_math.py

+10-3
@@ -23,19 +23,26 @@
 # Code is adapted from https://python.langchain.com/docs/modules/chains/additional/llm_math
 
 import argparse
+import warnings
 
 from langchain.chains import LLMMathChain
-from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
+from langchain_community.llms import IpexLLM
+
+warnings.filterwarnings("ignore", category=UserWarning, message=".*padding_mask.*")
 
 
 def main(args):
 
     question = args.question
     model_path = args.model_path
 
-    llm = TransformersLLM.from_model_id(
+    llm = IpexLLM.from_model_id(
         model_id=model_path,
-        model_kwargs={"temperature": 0, "max_length": 1024, "trust_remote_code": True},
+        model_kwargs={
+            "temperature": 0,
+            "max_length": 1024,
+            "trust_remote_code": True,
+        },
     )
 
     llm_math = LLMMathChain.from_llm(llm, verbose=True)
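
The hunk ends where the chain is constructed. For orientation, a sketch of how the chain built above is then run (following `LLMMathChain`'s `question` input key and `answer` output key; the chain evaluates the model's generated expression with `numexpr`, so that package must be installed):

```python
# Run the math chain: the LLM emits a math expression, LLMMathChain evaluates
# it with numexpr and returns the result under the "answer" key.
result = llm_math.invoke({"question": "What is 13 raised to the .3432 power?"})
print(result["answer"])
```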

python/llm/example/CPU/LangChain/low_bit.py

+22-9
@@ -16,9 +16,13 @@
 
 
 import argparse
-from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
-from langchain import PromptTemplate, LLMChain
-from langchain import HuggingFacePipeline
+import warnings
+
+from langchain.chains import LLMChain
+from langchain_community.llms import IpexLLM
+from langchain_core.prompts import PromptTemplate
+
+warnings.filterwarnings("ignore", category=UserWarning, message=".*padding_mask.*")
 
 
 def main(args):
@@ -29,20 +33,29 @@ def main(args):
 
     prompt = PromptTemplate(template=template, input_variables=["question"])
 
-    llm = TransformersLLM.from_model_id(
+    llm = IpexLLM.from_model_id(
         model_id=model_path,
-        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
+        model_kwargs={
+            "temperature": 0,
+            "max_length": 64,
+            "trust_remote_code": True,
+        },
    )
     llm.model.save_low_bit(low_bit_model_path)
     del llm
-    low_bit_llm = TransformersLLM.from_model_id_low_bit(
+    llm_lowbit = IpexLLM.from_model_id_low_bit(
         model_id=low_bit_model_path,
         tokenizer_id=model_path,
-        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True}
+        # tokenizer_name=saved_lowbit_model_path,  # copy the tokenizers to saved path if you want to use it this way
+        model_kwargs={
+            "temperature": 0,
+            "max_length": 64,
+            "trust_remote_code": True,
+        },
     )
-    llm_chain = LLMChain(prompt=prompt, llm=low_bit_llm)
+    llm_chain = prompt | llm_lowbit
 
-    output = llm_chain.run(question)
+    output = llm_chain.invoke(question)
     print("====output=====")
     print(output)
 
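
The commented `tokenizer_name` line points at the alternative described in the README note: instead of passing `tokenizer_id`, the low-bit folder can be made self-contained by copying the tokenizer files into it. A sketch of that route, under the assumption that `from_model_id_low_bit` falls back to loading the tokenizer from `model_id` when no tokenizer id is given (plain `transformers` calls, variables as in `low_bit.py`):

```python
from transformers import AutoTokenizer

# save_low_bit() writes only the model weights, so store the original model's
# tokenizer next to them to make the folder self-contained.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.save_pretrained(low_bit_model_path)

# Assumed fallback: with the tokenizer files in place, no tokenizer_id is needed.
llm_lowbit = IpexLLM.from_model_id_low_bit(
    model_id=low_bit_model_path,
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
```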
