Optimization for loading big embedding models into GPU #1036

Open
icejean opened this issue Jan 26, 2025 · 0 comments
icejean commented Jan 26, 2025

The backend code of load_embedding_model() in ~/src/shared/common_fn.py ends up with two instances of the embedding model per worker, which means heavy memory usage. An all-MiniLM-L6-v2 instance takes about 90 MB, but for big embedding models such as BAAI/bge-m3, which is about 3 GB (the fp16 Ollama version is about 1 GB), this is a real problem. So I load the embedding model with Ollama instead: a single instance running on the GPU outside the backend container.
The IP 172.17.0.1 is mapped to host.docker.internal in the backend container. For a reason I haven't figured out, I can reach Ollama through the IP but not through the hostname when a proxy is set in the container.
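To verify that the endpoint is reachable from inside the backend container, a minimal check against Ollama's /api/tags endpoint (which lists the locally available models) can help; the requests dependency and the timeout value here are my own additions, not part of the project:

import requests

# Hypothetical connectivity check: 172.17.0.1 is the Docker bridge gateway
# that maps to host.docker.internal in my setup.
resp = requests.get("http://172.17.0.1:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])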

import logging

# Import paths for the existing branches may differ slightly in the project.
from langchain_openai import OpenAIEmbeddings
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaEmbeddings

def load_embedding_model(embedding_model_name: str):
    if embedding_model_name == "openai":
        embeddings = OpenAIEmbeddings()
        dimension = 1536
        logging.info(f"Embedding: Using OpenAI Embeddings , Dimension:{dimension}")
    elif embedding_model_name == "vertexai":        
        embeddings = VertexAIEmbeddings(
            model="textembedding-gecko@003"
        )
        dimension = 768
        logging.info(f"Embedding: Using Vertex AI Embeddings , Dimension:{dimension}")

    # Added by Jean 2025/01/26
    elif embedding_model_name == "BAAI/bge-m3":
        embeddings = OllamaEmbeddings(model="bge-m3", base_url="http://172.17.0.1:11434")
        dimension = 1024
        logging.info(f"Embedding: Using Ollama BAAI/bge-m3 , Dimension:{dimension}")
        
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2"#, cache_folder="/embedding_model"
        )
        dimension = 384
        logging.info(f"Embedding: Using Langchain HuggingFaceEmbeddings , Dimension:{dimension}")
    return embeddings, dimension
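
The loader is then used exactly as before; a minimal smoke test (the query string is only an illustration):

embeddings, dimension = load_embedding_model("BAAI/bge-m3")
vector = embeddings.embed_query("What is a knowledge graph?")
assert len(vector) == dimension  # 1024 for bge-m3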

These two packages need to be added to the backend's ~/requirements.txt:

langchain-ollama==0.2.1
datasets==3.1.0
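
The bge-m3 model also has to be pulled into the Ollama instance on the host beforehand. Running ollama pull bge-m3 on the host does it; the sketch below does the same through the ollama Python client, which is an extra dependency I'm assuming here:

import ollama

# Pull the model on the host itself, where the client's default
# localhost:11434 endpoint applies.
ollama.pull("bge-m3")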

Best regards
Jean
