ImportError: Dependencies for InstructorEmbedding not found. #30516

Open
5 tasks done
AryanKarumuri opened this issue Mar 27, 2025 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@AryanKarumuri

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I am encountering the following error when trying to run the code using the HuggingFaceInstructEmbeddings from the langchain_community.embeddings module:

ImportError: Dependencies for InstructorEmbedding not found.

Code:

from langchain_community.embeddings import HuggingFaceInstructEmbeddings

# Pre-trained embedding model.
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"

# Embeddings
embeddings = HuggingFaceInstructEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    embed_instruction="Represent the document for retrieval:",
    query_instruction="Represent the question for retrieving supporting documents:",
)

Installed Dependency Versions:

langchain==0.3.21
langchain-core==0.3.49
langchain-community==0.3.20
sentence-transformers==2.2.2
InstructorEmbedding==1.0.1
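
The "Dependencies for InstructorEmbedding not found" message does not say which import actually failed. A minimal diagnostic sketch, assuming the wrapper hides a lower-level exception raised while importing the package: the helper name `probe_import` is hypothetical and not part of LangChain or InstructorEmbedding.

```python
import importlib


def probe_import(module_name):
    """Try to import a module and report the underlying failure, if any.

    Returns a (succeeded, detail) tuple. The except clause is deliberately
    broad because a broken dependency may raise something other than
    ImportError while the module is being loaded.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"
```

Calling `probe_import("InstructorEmbedding")` and `probe_import("sentence_transformers")` in the affected environment should show whether the package is missing entirely or fails while importing one of its own dependencies.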

Could you please assist in resolving this issue? If there are any compatibility updates or changes needed in the dependencies or usage, it would be great to have more insight.

Error Message and Stack Trace (if applicable)

No response

Description

Expected Behavior:
The HuggingFaceInstructEmbeddings object should initialize successfully, and the embedding model should be loaded for document and query retrieval.

System Info

System Information

OS: Linux
OS Version: #138-Ubuntu SMP Sat Nov 30 22:28:23 UTC 2024
Python Version: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]

Package Information

langchain_core: 0.3.49
langchain: 0.3.21
langchain_community: 0.3.20
langsmith: 0.3.19
langchain_chroma: 0.2.2
langchain_groq: 0.3.1
langchain_huggingface: 0.1.2
langchain_text_splitters: 0.3.7

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Mar 27, 2025
@DavidSanSan110
Contributor

LangChain uses uv as its package manager, ensuring correct dependency resolution. Installing InstructorEmbedding with pip may not work correctly, but using uv does.

Solution:

Instead of pip, install with:

uv add InstructorEmbedding

or

uv pip install InstructorEmbedding

For me, this worked fine, and the issue was resolved. Try this method if you're facing the same problem! 🚀
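
One possible reason a pip install appears not to take effect is that it targeted a different environment than the one running the script; this is an assumption, not something confirmed in this thread. The sketch below reports which interpreter is running and which distributions it can see. The helper name `installed_versions` is hypothetical.

```python
import sys
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent.

    Only the environment of the interpreter running this code is inspected.
    """
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


def describe_environment(distributions):
    """Return the interpreter path together with the visible versions."""
    return {"python": sys.executable, "versions": installed_versions(distributions)}
```

Running `describe_environment(["InstructorEmbedding", "sentence-transformers", "langchain-community"])` from the same script that raises the error shows whether the packages installed with pip or uv are visible to that interpreter.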

@sanju49b

The problem appears to be in the HuggingFaceInstructEmbeddings class from the langchain_community library. The specific issue was a compatibility problem between:

  • The langchain_community embeddings implementation
  • The underlying sentence-transformers library
  • The INSTRUCTOR model

The error suggested that an unexpected 'token' argument was being passed during model loading, which caused the initialization to fail. This typically happens due to:

  • Version mismatches between libraries
  • Changes in library initialization methods
  • Incompatible configuration parameters
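
An "unexpected keyword argument" failure of this kind can be checked without loading a model by inspecting the signature of the function that receives the argument. A minimal sketch, assuming the caller passes `token` by keyword: the helper `accepts_kwarg` and the two loader functions are hypothetical stand-ins, not the real library code.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical loaders standing in for an older and a newer library version.
def old_loader(model_path):
    return model_path


def new_loader(model_path, token=None):
    return model_path
```

Here `accepts_kwarg(old_loader, "token")` is false and `accepts_kwarg(new_loader, "token")` is true, which mirrors the mismatch described above: a newer caller passes an argument that an older callee does not declare.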

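The following code may work around these gaps by loading the model through SentenceTransformer directly instead of through HuggingFaceInstructEmbeddings: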

from langchain_community.embeddings import HuggingFaceInstructEmbeddings
import traceback


def create_embeddings(
    model_name: str = "hkunlp/instructor-large",
    embed_instruction: str = "Represent the document for retrieval:",
    query_instruction: str = "Represent the question for retrieving supporting documents:",
):
    """
    Initialize instruction-based embeddings with robust error handling.

    Args:
        model_name (str): Name of the embedding model
        embed_instruction (str): Instruction for document embedding
        query_instruction (str): Instruction for query embedding

    Returns:
        CustomInstructEmbeddings: Object exposing embed_documents and embed_query
    """
    try:
        # Directly import SentenceTransformer to handle initialization
        from sentence_transformers import SentenceTransformer

        # Load the model manually
        model = SentenceTransformer(model_name)

        # Create a custom embeddings class
        class CustomInstructEmbeddings:
            def __init__(self, model, embed_instruction, query_instruction):
                self.model = model
                self.embed_instruction = embed_instruction
                self.query_instruction = query_instruction

            def embed_documents(self, texts):
                """Embed documents with instruction"""
                instructed_texts = [[self.embed_instruction, text] for text in texts]
                return self.model.encode(instructed_texts, normalize_embeddings=True).tolist()

            def embed_query(self, text):
                """Embed query with instruction"""
                instructed_query = [self.query_instruction, text]
                # Wrap the pair in a list so it is encoded as one input, then
                # take the single resulting vector.
                return self.model.encode([instructed_query], normalize_embeddings=True)[0].tolist()

        # Create custom embeddings instance
        embeddings = CustomInstructEmbeddings(
            model,
            embed_instruction,
            query_instruction,
        )

        # Verify embeddings by embedding a test document
        test_text = "This is a test document to verify embedding initialization"
        test_embedding = embeddings.embed_documents([test_text])

        print("✅ Custom instruct embeddings initialized successfully!")
        print(f"Embedding dimensions: {len(test_embedding[0])}")

        return embeddings

    except Exception as e:
        print(f"❌ Embedding initialization failed: {e}")
        print(traceback.format_exc())
        raise RuntimeError("Failed to initialize embeddings") from e

# Example usage

def main():
    try:
        # Pre-trained Embedding Model
        EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"

        # Initialize embeddings
        embeddings = create_embeddings(
            model_name=EMBEDDING_MODEL_NAME,
            embed_instruction="Represent the document for retrieval:",
            query_instruction="Represent the question for retrieving supporting documents:",
        )

        # Demonstration of embedding methods
        test_documents = [
            "The quick brown fox jumps over the lazy dog",
            "Machine learning is a subset of artificial intelligence",
        ]

        # Embed multiple documents
        document_embeddings = embeddings.embed_documents(test_documents)
        print("\nDocument Embeddings:")
        for i, emb in enumerate(document_embeddings):
            print(f"Document {i+1} embedding length: {len(emb)}")

        # Embed a query
        query_embedding = embeddings.embed_query("What is machine learning?")
        print(f"\nQuery Embedding length: {len(query_embedding)}")

    except Exception as e:
        print(f"Error in main execution: {e}")
        print(traceback.format_exc())


if __name__ == "__main__":
    main()
