ChromaDB/Docs: similarity_search_* filter type hints are incorrect and API docs are incorrect #30507

Open
5 tasks done
hesreallyhim opened this issue Mar 27, 2025 · 0 comments
Labels
🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder

hesreallyhim commented Mar 27, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

chroma = Chroma(embedding_function=OpenAIEmbeddings())
docs = [
    Document(page_content="Hello world", metadata={"author": "john", "topic": "chroma"}),
    Document(page_content="Hello world 2", metadata={"author": "jack", "topic": "chroma"}),
]
chroma.add_documents(docs)
# THIS FILTER DOES NOT RAISE TYPE WARNING: `{"a": "b", "c": "d"}` is shorthand for `{"$and": [{"a": {"$eq": "b"}}, {"c": {"$eq": "d"}}]}`
results1 = chroma.similarity_search(
    "Hello world",
    k=4,
    filter={"author": "john", "topic": "chroma"},
)
# THIS FILTER RAISES TYPE WARNING:
results2 = chroma.similarity_search(
    "Hello world",
    k=4,
    # THIS CONDITION IS FROM [CHROMA DOCS](https://github.com/chroma-core/chroma/blob/main/examples/basic_functionality/where_filtering.ipynb)
    filter={"$and": [{"category": "chroma"}, {"$or": [{"author": "john"}, {"author": "jack"}]}]}
)

Error Message and Stack Trace (if applicable)

Pylance error/warning:

Argument of type "dict[str, list[dict[str, str] | dict[str, list[dict[str, str]]]]]" cannot be assigned to parameter "filter" of type "Dict[str, str] | None" in function "similarity_search"
  Type "dict[str, list[dict[str, str] | dict[str, list[dict[str, str]]]]]" is not assignable to type "Dict[str, str] | None"
    "dict[str, list[dict[str, str] | dict[str, list[dict[str, str]]]]]" is not assignable to "Dict[str, str]"
      Type parameter "_VT@dict" is invariant, but "list[dict[str, str] | dict[str, list[dict[str, str]]]]" is not the same as "str"
      Consider switching from "dict" to "Mapping" which is covariant in the value type
    "dict[str, list[dict[str, str] | dict[str, list[dict[str, str]]]]]" is not assignable to "None" (Pylance: reportArgumentType)

Mypy error:

Dict entry 0 has incompatible type "str": "list[object]"; expected "str": "str" (Mypy: dict-item)
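
To illustrate what a wider annotation could look like, here is a minimal sketch. The aliases Scalar, OperatorExpr and Where and the helper count_conditions are hypothetical names made up for this example; they are not the types shipped by langchain_chroma or chromadb. The sketch assumes a type checker that supports recursive type aliases, in which case both filters from the example code should be accepted.

```python
from typing import Dict, List, Optional, Union

# Hypothetical aliases for illustration only (not langchain_chroma's or chromadb's own types).
Scalar = Union[str, int, float, bool]
OperatorExpr = Dict[str, Union[Scalar, List[Scalar]]]  # e.g. {"$eq": "john"}
Where = Dict[str, Union[Scalar, OperatorExpr, List["Where"]]]


def count_conditions(where: Optional[Where]) -> int:
    """Count leaf field conditions, recursing through $and / $or lists."""
    if not where:
        return 0
    total = 0
    for key, value in where.items():
        if key in ("$and", "$or") and isinstance(value, list):
            # Logical operators hold a list of nested filters.
            total += sum(count_conditions(clause) for clause in value)
        else:
            # Either shorthand ({"field": "value"}) or operator form ({"field": {"$eq": ...}}).
            total += 1
    return total


# Both filters from the example code fit the wider alias.
shorthand: Where = {"author": "john", "topic": "chroma"}
nested: Where = {
    "$and": [
        {"category": "chroma"},
        {"$or": [{"author": "john"}, {"author": "jack"}]},
    ]
}
```

With an alias like this on the filter parameter, the nested $and / $or condition would no longer be compared against Dict[str, str].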

Description

The type hints for the Chroma search methods do not match the corresponding Chroma query function signatures. This applies to both filter (which maps to Chroma's where, the metadata filter) and where_document (the document-content filter). Dict[str, str] is actually a special shorthand case of the general syntax, which is:

{
    "metadata_field": {
        <Operator>: <Value>
    }
}

e.g.,

{
    "category": {
        "$eq": "LLMs"
    }
}
  • This affects almost all of the search_* methods in the Chroma module.
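
The relationship between the shorthand and the general syntax can be sketched in plain Python. expand_shorthand is a hypothetical helper written for this issue, not part of Chroma or LangChain, and it relies on the equivalence stated in the example code comment (a multi-key shorthand dict is treated as an $and of $eq conditions).

```python
from typing import Any, Dict


def expand_shorthand(where: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite shorthand equality filters in explicit operator form."""

    def expand_one(key: str, value: Any) -> Dict[str, Any]:
        if key in ("$and", "$or"):
            # Logical operators wrap a list of nested filters; expand each one.
            return {key: [expand_shorthand(clause) for clause in value]}
        if isinstance(value, dict):
            # Already in operator form, e.g. {"category": {"$eq": "LLMs"}}.
            return {key: value}
        # Shorthand: {"field": value} means {"field": {"$eq": value}}.
        return {key: {"$eq": value}}

    clauses = [expand_one(key, value) for key, value in where.items()]
    # A single condition stays as-is; several are combined with $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


expanded = expand_shorthand({"author": "john", "topic": "chroma"})
```

Here expanded holds the explicit $and form of the first filter in the example code. The point is that Dict[str, str] only describes the input side of this expansion, while the operator form on the output side is what the current hint rejects.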

Another issue

  • Although the annotation for where_document was updated with an example of an operator-style condition (e.g., here), the example is not well-formed because the operator is missing quotation marks (it should be "$contains"):

where_document
dict used to filter by the documents. E.g. {$contains: {"text": "hello"}}.

Furthermore, I think this is actually an incorrect usage of the $contains operator, even though it is drawn directly from Chroma's own docs:

where_document - A WhereDocument type dict used to filter by the documents. E.g. {$contains: {"text": "hello"}}

Our docs were copied from this page, which our API reference points to, but this usage is not consistent with the Chroma API type definitions or with usages elsewhere. It's not just the missing quotation marks: I think the expression is structurally wrong. You can see this from the types themselves, or from other examples, like:

collection.query(
    query_texts=["doc10", "thus spake zarathustra", ...],
    n_results=10,
    where={"metadata_field": "is_equal_to_this"},
    where_document={"$contains":"search_string"}
)

As the example shows, $contains maps to a string, not to a Dict (see here).
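
The structural difference can be made concrete with a small shape check. This is a sketch only: is_wellformed_where_document is a hypothetical helper, not Chroma's validator, and it assumes that where_document accepts $contains with a string operand plus $and / $or lists of nested conditions.

```python
from typing import Any


def is_wellformed_where_document(cond: Any) -> bool:
    """Hypothetical shape check for a where_document condition (not Chroma's own validation)."""
    if not isinstance(cond, dict) or len(cond) != 1:
        return False
    # Unpack the single operator/operand pair.
    ((operator, operand),) = cond.items()
    if operator == "$contains":
        # The operand is the search string itself, not a nested dict.
        return isinstance(operand, str)
    if operator in ("$and", "$or"):
        # Assumed: logical operators take a non-empty list of nested conditions.
        return (
            isinstance(operand, list)
            and len(operand) > 0
            and all(is_wellformed_where_document(clause) for clause in operand)
        )
    return False


good = is_wellformed_where_document({"$contains": "search_string"})
bad = is_wellformed_where_document({"$contains": {"text": "hello"}})
```

Under these assumptions, good is True and bad is False: the form quoted in the docstring fails the check even after the quotation marks around $contains are added.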

System Info

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000
> Python Version:  3.13.2 (main, Mar 10 2025, 18:46:41) [Clang 16.0.0 (clang-1600.0.26.6)]

Package Information
-------------------
> langchain_core: 0.3.48
> langchain: 0.3.21
> langsmith: 0.3.19
> langchain_chroma: 0.2.2
> langchain_openai: 0.3.10
> langchain_text_splitters: 0.3.7

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
> chromadb!=0.5.10,!=0.5.11,!=0.5.12,!=0.5.4,!=0.5.5,!=0.5.7,!=0.5.9,<0.7.0,>=0.4.0: Installed. No version info available.
> httpx: 0.28.1
> jsonpatch<2.0,>=1.33: Installed. No version info available.
> langchain-anthropic;: Installed. No version info available.
> langchain-aws;: Installed. No version info available.
> langchain-azure-ai;: Installed. No version info available.
> langchain-cohere;: Installed. No version info available.
> langchain-community;: Installed. No version info available.
> langchain-core!=0.3.0,!=0.3.1,!=0.3.10,!=0.3.11,!=0.3.12,!=0.3.13,!=0.3.14,!=0.3.2,!=0.3.3,!=0.3.4,!=0.3.5,!=0.3.6,!=0.3.7,!=0.3.8,!=0.3.9,<0.4.0,>=0.2.43: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.48: Installed. No version info available.
> langchain-deepseek;: Installed. No version info available.
> langchain-fireworks;: Installed. No version info available.
> langchain-google-genai;: Installed. No version info available.
> langchain-google-vertexai;: Installed. No version info available.
> langchain-groq;: Installed. No version info available.
> langchain-huggingface;: Installed. No version info available.
> langchain-mistralai;: Installed. No version info available.
> langchain-ollama;: Installed. No version info available.
> langchain-openai;: Installed. No version info available.
> langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
> langchain-together;: Installed. No version info available.
> langchain-xai;: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> langsmith<0.4,>=0.1.125: Installed. No version info available.
> langsmith<0.4,>=0.1.17: Installed. No version info available.
> numpy<2.0.0,>=1.22.4;: Installed. No version info available.
> numpy<2.0.0,>=1.26.2;: Installed. No version info available.
> openai-agents: Installed. No version info available.
> openai<2.0.0,>=1.68.2: Installed. No version info available.
> opentelemetry-api: 1.31.1
> opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
> opentelemetry-sdk: 1.31.1
> orjson: 3.10.16
> packaging: 24.2
> packaging<25,>=23.2: Installed. No version info available.
> pydantic: 2.10.6
> pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
> pydantic<3.0.0,>=2.7.4: Installed. No version info available.
> pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
> pytest: Installed. No version info available.
> PyYAML>=5.3: Installed. No version info available.
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> requests<3,>=2: Installed. No version info available.
> rich: 13.9.4
> SQLAlchemy<3,>=1.4: Installed. No version info available.
> tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
> tiktoken<1,>=0.7: Installed. No version info available.
> typing-extensions>=4.7: Installed. No version info available.
> zstandard: 0.23.0
@dosubot dosubot bot added the 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder label Mar 27, 2025