Open
Description
When using `SentenceTransformer.encode` with a `Column` object from a Hugging Face `datasets` dataset (e.g., `data["train"]["text"]`), a `TypeError` occurs because the method attempts to index the `Column` object with `numpy.int64` indices, which are not supported by `datasets`. This happens during the sentence sorting step in `encode`. Converting the `Column` to a Python list (e.g., `list(data["train"]["text"])`) resolves the issue, but native support for `datasets` `Column` objects would improve compatibility.
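To illustrate the failure mode without needing either library installed, here is a minimal sketch using only the standard library. `NumpyLikeInt`, `StrictColumn`, and `sort_by_length` are hypothetical stand-ins invented for this example (they are not part of `sentence-transformers` or `datasets`), and the sort only roughly mirrors the length-based reordering described above.

```python
class NumpyLikeInt:
    """Hypothetical stand-in for numpy.int64: integer-like, but not a built-in int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Lets built-in sequences such as list accept this object as an index.
        return self._value


class StrictColumn:
    """Hypothetical stand-in for a column that only accepts built-in int indices."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def __getitem__(self, index):
        if type(index) is not int:
            raise TypeError(f"unsupported index type: {type(index).__name__}")
        return self._values[index]


def sort_by_length(sentences):
    """Reorder sentences longest-first, indexing with integer-like (non-int) objects."""
    order = sorted(range(len(sentences)), key=lambda i: -len(sentences[i]))
    # This lookup is the step that fails for StrictColumn but works for list.
    return [sentences[NumpyLikeInt(i)] for i in order]


column = StrictColumn(["a bb", "a", "a bb ccc"])

try:
    sort_by_length(column)
except TypeError as exc:
    print("column failed:", exc)

# Materializing the column as a plain list avoids the problem.
print(sort_by_length(list(column)))
```

With the real libraries, the equivalent workaround is the one already mentioned: pass `list(data["train"]["text"])` to `encode` instead of the `Column` itself.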