The backend code in `~/src/shared/common_fn.py`, in `load_embedding_model()`, ends up loading two instances of the embedding model per worker, which drives memory usage up considerably. An all-MiniLM-L6-v2 instance takes about 90 MB, but for large embedding models such as BAAI/bge-m3 (about 3 GB; the fp16 Ollama version is about 1 GB) this becomes a serious problem. So I load the embedding model through Ollama instead: a single instance running on the GPU outside the backend container.
The IP 172.17.0.1 maps to `host.docker.internal` inside the backend container. For reasons I haven't pinned down, I can reach Ollama through the IP but not through the hostname when a proxy is set in the container.
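One plausible explanation (an assumption, not verified): with `http_proxy`/`https_proxy` set, requests to `host.docker.internal` get routed through the proxy unless the name is excluded via `NO_PROXY`, while the bare bridge IP happens to bypass it. A minimal sketch to check both forms from inside the container, assuming `requests` is available in the backend image:

```python
import os
import requests

# Assumption: the hostname fails because those requests go through the proxy.
# Exempting both the bridge IP and the hostname should make either form work.
os.environ["NO_PROXY"] = "172.17.0.1,host.docker.internal"

for base in ("http://172.17.0.1:11434", "http://host.docker.internal:11434"):
    try:
        r = requests.get(f"{base}/api/tags", timeout=5)  # lists pulled models
        print(base, "->", r.status_code)
    except requests.RequestException as exc:
        print(base, "->", exc)
```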
```python
import logging

from langchain_openai import OpenAIEmbeddings
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaEmbeddings


def load_embedding_model(embedding_model_name: str):
    if embedding_model_name == "openai":
        embeddings = OpenAIEmbeddings()
        dimension = 1536
        logging.info(f"Embedding: Using OpenAI Embeddings, Dimension: {dimension}")
    elif embedding_model_name == "vertexai":
        embeddings = VertexAIEmbeddings(model="textembedding-gecko@003")
        dimension = 768
        logging.info(f"Embedding: Using Vertex AI Embeddings, Dimension: {dimension}")
    # Added by Jean 2025/01/26: serve bge-m3 from a single Ollama instance
    # outside the container instead of loading it in every worker.
    elif embedding_model_name == "BAAI/bge-m3":
        embeddings = OllamaEmbeddings(model="bge-m3", base_url="http://172.17.0.1:11434")
        dimension = 1024
        logging.info(f"Embedding: Using Ollama BAAI/bge-m3, Dimension: {dimension}")
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2"  # , cache_folder="/embedding_model"
        )
        dimension = 384
        logging.info(f"Embedding: Using Langchain HuggingFaceEmbeddings, Dimension: {dimension}")
    return embeddings, dimension
```
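For reference, a quick sanity check of the new branch (assuming Ollama is reachable at 172.17.0.1:11434 and `bge-m3` has already been pulled there):

```python
embeddings, dimension = load_embedding_model("BAAI/bge-m3")

vector = embeddings.embed_query("hello graph builder")
assert len(vector) == dimension  # 1024 for bge-m3
```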
These two packages need to be added to the backend's `~/requirements.txt`:

```
langchain-ollama==0.2.1
datasets==3.1.0
```
Best regards
Jean