Merged
46 commits
63e66be test(chroma): use async DocumentStore mixin tests (ShubhamGond105, Apr 27, 2026)
28d58bc chore(chroma): bump minimum haystack-ai version to 2.28.0 (ShubhamGond105, Apr 28, 2026)
b8fb275 test(chroma): fix lint issues (ShubhamGond105, Apr 28, 2026)
1a238a5 test(chroma): override duplicate fail test for Chroma behaviour (ShubhamGond105, Apr 29, 2026)
c2ac6f6 test(chroma): remove WriteDocumentsAsyncTest mixin - Chroma has custo… (ShubhamGond105, Apr 29, 2026)
494a433 feat(supabase): add SupabaseGroongaDocumentStore and SupabaseGroongaR… (ShubhamGond105, May 2, 2026)
17877e7 chore: resolve merge conflict in chroma test file (ShubhamGond105, May 11, 2026)
7ddf365 fix(supabase): fix lint errors and add supabase test dependency (ShubhamGond105, May 11, 2026)
e8f53bb fix(supabase): fix mypy type errors in groonga document store (ShubhamGond105, May 11, 2026)
446088c fix(supabase): fix mypy union-attr error and count_documents implemen… (ShubhamGond105, May 11, 2026)
5c8d270 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 19, 2026)
ceb8394 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 19, 2026)
3ae853c fix(supabase): address reviewer feedback - lazy init, DocumentStore b… (ShubhamGond105, May 19, 2026)
5be352e fix(supabase): fix lint errors - imports, assert, formatting (ShubhamGond105, May 19, 2026)
d679975 fix(supabase): fix mypy type errors - CountMethod and union-attr (ShubhamGond105, May 19, 2026)
3887dff Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 20, 2026)
78231ce converting methods to static (davidsbatista, May 20, 2026)
6d9ef28 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 20, 2026)
334744f fix(supabase): fix groonga_search parameter name and add integration … (ShubhamGond105, May 20, 2026)
22005a5 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 21, 2026)
310aa13 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 21, 2026)
95808fb Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 27, 2026)
01632b5 Merge branch 'main' into feat/supabase-groonga (davidsbatista, May 29, 2026)
2b88249 using apply_filter_policy from haystack document stores (davidsbatista, May 29, 2026)
1743aef adding table_name validation (davidsbatista, May 29, 2026)
525b8fe updating tests (davidsbatista, May 29, 2026)
9e68df7 decoupling tests (davidsbatista, May 29, 2026)
7bc7a91 removing unused imports# (davidsbatista, May 29, 2026)
c2ef5ab Merge branch 'feat/supabase-groonga' of https://github.com/ShubhamGon… (ShubhamGond105, May 31, 2026)
329a2ec fix(supabase): add strict=False to supabase_key default (ShubhamGond105, May 31, 2026)
f789808 fix(supabase): skip integration tests when SUPABASE_URL not set (ShubhamGond105, May 31, 2026)
91d6ee0 fixing pyproject.toml (davidsbatista, Jun 1, 2026)
5e98a1f adding integrations tests (davidsbatista, Jun 1, 2026)
e3c0bfc adding delete_by_filter and update_by_filter (davidsbatista, Jun 1, 2026)
3a4dc55 adding Docker instance for integration tests and using mixins (davidsbatista, Jun 1, 2026)
a8a9e3b updating integration tests (davidsbatista, Jun 1, 2026)
67d52b1 solving linting issues (davidsbatista, Jun 1, 2026)
5be825c fixing filter tests and removing unit tests covered by integration tests (davidsbatista, Jun 1, 2026)
520717c formatting (davidsbatista, Jun 1, 2026)
f630129 Merge branch 'main' into feat/supabase-groonga (davidsbatista, Jun 1, 2026)
48c249d renaming the retriever since it only handles full text for consistenc… (davidsbatista, Jun 1, 2026)
d667275 formatting (davidsbatista, Jun 1, 2026)
0c38dc6 removing unused ignore (davidsbatista, Jun 1, 2026)
b8b0ef3 adding missed file (davidsbatista, Jun 1, 2026)
13c8032 documenting integration tests (davidsbatista, Jun 1, 2026)
0ce43ee Merge branch 'main' into feat/supabase-groonga (davidsbatista, Jun 1, 2026)
24 changes: 22 additions & 2 deletions .github/workflows/supabase.yml
@@ -120,11 +120,31 @@ jobs:
           name: coverage-comment-supabase
           path: python-coverage-comment-action-supabase.txt
 
-      - name: Run integration tests
+      - name: Run pgvector integration tests
         if: runner.os == 'Linux'
         env:
           SUPABASE_DB_URL: "postgresql://postgres:postgres@localhost:5432/postgres"
-        run: hatch run test:integration-cov-append-retry
+        run: hatch run test:integration-cov-append-retry --ignore=tests/test_groonga_integration.py
+
+      - name: Start PGroonga + PostgREST stack
+        if: runner.os == 'Linux'
+        run: docker compose -f docker-compose-groonga.yml up -d --build
+
+      - name: Wait for PGroonga stack to be ready
+        if: runner.os == 'Linux'
+        run: |
+          for i in $(seq 1 30); do
+            if curl -sf http://localhost:8000/rest/v1/ -H "apikey: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImV4cCI6MTk4MzgxMjk5Nn0.EGIM96RAZx35lJzdJsyH-qQwv8Hj04zWl196z2-SBc0"; then
+              echo "PGroonga stack is ready"
+              break
+            fi
+            echo "Waiting for PGroonga stack... ($i/30)"
+            sleep 5
+          done
+
+      - name: Run PGroonga integration tests
+        if: runner.os == 'Linux'
+        run: hatch run test:integration-cov-append-retry tests/test_groonga_integration.py
 
       - name: Store combined coverage
         if: github.event_name == 'push'
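The "Wait for PGroonga stack to be ready" step polls the PostgREST endpoint through nginx up to 30 times, five seconds apart. It breaks out as soon as `curl -sf` succeeds. If the stack never comes up, the loop simply ends and the next test step fails instead. The following is a minimal Python sketch of the same polling logic. `wait_until_ready` and `http_ok` are hypothetical helpers for illustration, not part of this PR. They return a boolean so a caller can fail explicitly.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_ok(url: str, headers: Dict[str, str], timeout_s: float = 5.0) -> bool:
    """Rough equivalent of `curl -sf`: True for a non-error HTTP response, else False."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP >= 400 (HTTPError) all mean "not ready".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay_s` between failed attempts."""
    for attempt in range(1, attempts + 1):
        if probe():
            print("PGroonga stack is ready")
            return True
        print(f"Waiting for PGroonga stack... ({attempt}/{attempts})")
        if attempt < attempts:
            sleep(delay_s)
    return False
```

For example: `wait_until_ready(lambda: http_ok("http://localhost:8000/rest/v1/", {}))`. The headers can be empty here because the nginx config in this PR clears `apikey` and `Authorization` before proxying.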
10 changes: 10 additions & 0 deletions integrations/supabase/Dockerfile.pgroonga
@@ -0,0 +1,10 @@
FROM postgres:17-bookworm

RUN apt-get update && \
    apt-get install -y wget gnupg2 && \
    wget -q -O /tmp/groonga-apt-source.deb \
        https://packages.groonga.org/debian/groonga-apt-source-latest-bookworm.deb && \
    dpkg -i /tmp/groonga-apt-source.deb && \
    apt-get update && \
    apt-get install -y postgresql-17-pgdg-pgroonga && \
    rm -rf /var/lib/apt/lists/* /tmp/groonga-apt-source.deb
42 changes: 42 additions & 0 deletions integrations/supabase/docker-compose-groonga.yml
@@ -0,0 +1,42 @@
services:
  pgroonga-postgres:
    build:
      context: .
      dockerfile: Dockerfile.pgroonga
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - "5433:5432"
    volumes:
      - ./init-pgroonga.sql:/docker-entrypoint-initdb.d/init-pgroonga.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 10

  postgrest:
    image: postgrest/postgrest:v12.2.0
    environment:
      PGRST_DB_URI: postgres://postgres:postgres@pgroonga-postgres:5432/postgres
      PGRST_DB_SCHEMAS: public
      # No PGRST_JWT_SECRET → JWT validation disabled; all requests run as PGRST_DB_ANON_ROLE.
      # supabase-py still sends an apikey header but PostgREST ignores it.
      PGRST_DB_ANON_ROLE: postgres
      PGRST_LOG_LEVEL: info
    ports:
      - "3000:3000"
    depends_on:
      pgroonga-postgres:
        condition: service_healthy

  nginx:
    image: nginx:alpine
    ports:
      - "8000:8000"
    volumes:
      - ./nginx-groonga.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - postgrest
62 changes: 62 additions & 0 deletions integrations/supabase/init-pgroonga.sql
@@ -0,0 +1,62 @@
-- Enable PGroonga extension
CREATE EXTENSION IF NOT EXISTS pgroonga;

-- PostgreSQL role that PostgREST switches to when a service_role JWT is presented.
-- The role must exist before PostgREST connects.
DO $$
BEGIN
    IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'service_role') THEN
        CREATE ROLE service_role NOLOGIN;
    END IF;
END
$$;

GRANT ALL ON SCHEMA public TO service_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO service_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO service_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON FUNCTIONS TO service_role;

-- exec_sql: allows the document store to create/drop tables and indexes via RPC.
CREATE OR REPLACE FUNCTION exec_sql(query TEXT)
RETURNS VOID AS $$
BEGIN
    EXECUTE query;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION exec_sql(TEXT) TO service_role;

-- groonga_search: full-text search via PGroonga, called by _groonga_retrieval().
CREATE OR REPLACE FUNCTION groonga_search(query_text TEXT, table_name TEXT, top_k INT)
RETURNS TABLE(id TEXT, content TEXT, meta JSONB, score REAL) AS $$
DECLARE
    sql TEXT;
BEGIN
    sql := format(
        'SELECT id, content, meta, pgroonga_score(tableoid, ctid)::REAL AS score
         FROM %I
         WHERE content &@~ %L
         ORDER BY score DESC
         LIMIT %s',
        table_name, query_text, top_k
    );
    RETURN QUERY EXECUTE sql;
END;
$$ LANGUAGE plpgsql;

GRANT EXECUTE ON FUNCTION groonga_search(TEXT, TEXT, INT) TO service_role;

-- Pre-create the test table so PostgREST includes it in its schema cache at startup.
-- Tests use this fixed table and clear data between runs instead of recreating the table.
CREATE TABLE IF NOT EXISTS haystack_groonga_test (
    id TEXT PRIMARY KEY,
    content TEXT,
    meta JSONB,
    score REAL
);

CREATE INDEX IF NOT EXISTS pgroonga_haystack_groonga_test_index
    ON haystack_groonga_test
    USING pgroonga (content);

GRANT ALL ON TABLE haystack_groonga_test TO postgres;
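PostgREST exposes `groonga_search` as an RPC endpoint and binds the JSON keys of the request to the SQL function's argument names. A client therefore has to send `query_text`, `table_name` and `top_k` exactly as declared above. The function quotes `table_name` with `%I` and `query_text` with `%L`, so both can be passed as plain strings. The sketch below shows such a call through supabase-py's `rpc(...).execute()` chain. The wrapper name `call_groonga_search` and its non-positive `top_k` guard are assumptions for illustration, not the document store's actual `_groonga_retrieval()`.

```python
from typing import Any, Dict, List


def call_groonga_search(client: Any, query: str, table_name: str, top_k: int) -> List[Dict[str, Any]]:
    """Invoke the groonga_search SQL function via PostgREST RPC.

    `client` is assumed to be a supabase-py Client, or any object exposing the
    same rpc(name, params).execute() chain.
    """
    # Assumption: skip the round trip for non-positive limits; Postgres rejects LIMIT < 0.
    if top_k <= 0:
        return []
    # PostgREST binds these JSON keys to the function's argument names,
    # so they must match groonga_search(query_text, table_name, top_k) exactly.
    params = {"query_text": query, "table_name": table_name, "top_k": top_k}
    response = client.rpc("groonga_search", params).execute()
    # Each row has the columns from RETURNS TABLE(id, content, meta, score),
    # already ordered by PGroonga score (highest first) on the server.
    return response.data or []
```

Turning each returned row into a Haystack `Document` is left to the caller.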
18 changes: 18 additions & 0 deletions integrations/supabase/nginx-groonga.conf
@@ -0,0 +1,18 @@
# Minimal reverse proxy so supabase-py (which appends /rest/v1/) reaches PostgREST.
events {}

http {
    server {
        listen 8000;

        location /rest/v1/ {
            rewrite ^/rest/v1/(.*)$ /$1 break;
            proxy_pass http://postgrest:3000;
            proxy_set_header Host $host;
            # Strip auth headers — PostgREST has no JWT secret configured,
            # so all requests run as the anon role (postgres).
            proxy_set_header Authorization "";
            proxy_set_header apikey "";
        }
    }
}
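supabase-py sends table and RPC requests under `/rest/v1/`, while a standalone PostgREST serves them at the root. The `rewrite ... break` rule strips that prefix before proxying. Below is a small Python model of the path mapping. It assumes the query string has already been split off, because nginx matches rewrite patterns against the URI path. `proxied_path` is a hypothetical name used only here.

```python
import re
from typing import Optional

# Same pattern as the nginx rule: rewrite ^/rest/v1/(.*)$ /$1 break;
_REST_PREFIX = re.compile(r"^/rest/v1/(.*)$")


def proxied_path(uri_path: str) -> Optional[str]:
    """Return the path PostgREST would receive, or None if `location /rest/v1/` does not match."""
    # `location /rest/v1/` is a prefix match, so other paths never reach this block.
    if not uri_path.startswith("/rest/v1/"):
        return None
    match = _REST_PREFIX.match(uri_path)
    # `.*` may match nothing, in which case the request maps to "/".
    return "/" + match.group(1) if match else None
```

The `break` flag stops further rewrite processing and keeps the request in the same location. The rewritten path, such as `/rpc/groonga_search`, is then what `proxy_pass http://postgrest:3000` forwards.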
4 changes: 3 additions & 1 deletion integrations/supabase/pyproject.toml
@@ -23,7 +23,7 @@ classifiers = [
   "Programming Language :: Python :: Implementation :: CPython",
   "Programming Language :: Python :: Implementation :: PyPy",
 ]
-dependencies = ["haystack-ai>=2.26.1", "pgvector-haystack>=6.3.0", "supabase>=2.9.0"]
+dependencies = ["haystack-ai>=2.26.1", "pgvector-haystack>=6.3.0", "supabase>=2.23.0"]
 
 [project.urls]
 Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/supabase#readme"
@@ -58,6 +58,7 @@ dependencies = [
   "pytest-rerunfailures",
   "mypy",
   "pip",
+  "supabase",
 ]
 
 [tool.hatch.envs.test.scripts]
@@ -153,6 +154,7 @@ show_missing = true
 exclude_lines = ["no cov", "if __name__ == .__main__.:", "if TYPE_CHECKING:"]
 
 [tool.pytest.ini_options]
+asyncio_mode = "auto"
 addopts = "--strict-markers"
 markers = [
   "integration: integration tests",
Empty file added integrations/supabase/pytest
Empty file.
@@ -3,6 +3,11 @@
 # SPDX-License-Identifier: Apache-2.0
 
 from .embedding_retriever import SupabasePgvectorEmbeddingRetriever
+from .groonga_bm25_retriever import SupabaseGroongaBM25Retriever
 from .keyword_retriever import SupabasePgvectorKeywordRetriever
 
-__all__ = ["SupabasePgvectorEmbeddingRetriever", "SupabasePgvectorKeywordRetriever"]
+__all__ = [
+    "SupabaseGroongaBM25Retriever",
+    "SupabasePgvectorEmbeddingRetriever",
+    "SupabasePgvectorKeywordRetriever",
+]
@@ -0,0 +1,153 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <info@deepset.ai>
#
# SPDX-License-Identifier: Apache-2.0

import copy
from typing import Any

from haystack import component, default_from_dict, default_to_dict
from haystack.dataclasses import Document
from haystack.document_stores.types import FilterPolicy
from haystack.document_stores.types.filter_policy import apply_filter_policy

from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore


@component
class SupabaseGroongaBM25Retriever:
    """
    Retrieves documents from SupabaseGroongaDocumentStore using PGroonga full-text search.

    This retriever works without embeddings — it searches documents using plain text queries.
    It can be used alongside SupabasePgvectorEmbeddingRetriever in hybrid search pipelines.

    Note: async operations are not supported as the supabase-py sync client does not expose
    awaitable query methods. Use the sync run() method instead.

    Example usage:

    ```python
    from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
    from haystack_integrations.components.retrievers.supabase import SupabaseGroongaBM25Retriever
    from haystack.utils import Secret

    document_store = SupabaseGroongaDocumentStore(
        supabase_url="https://<project>.supabase.co",
        supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
        table_name="haystack_fts_documents",
    )
    document_store.warm_up()

    retriever = SupabaseGroongaBM25Retriever(document_store=document_store, top_k=10)
    result = retriever.run(query="python programming")
    print(result["documents"])
    ```
    """

    def __init__(
        self,
        *,
        document_store: SupabaseGroongaDocumentStore,
        filters: dict[str, Any] | None = None,
        top_k: int = 10,
        filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
    ) -> None:
        """
        Initialize the SupabaseGroongaBM25Retriever.

        :param document_store: An instance of SupabaseGroongaDocumentStore.
        :param filters: Optional filters applied to retrieved Documents.
        :param top_k: Maximum number of Documents to return. Defaults to 10.
        :param filter_policy: Policy to determine how filters are applied.
        :raises ValueError: If document_store is not an instance of SupabaseGroongaDocumentStore.
        """
        if not isinstance(document_store, SupabaseGroongaDocumentStore):
            msg = "document_store must be an instance of SupabaseGroongaDocumentStore"
            raise ValueError(msg)

        self.document_store = document_store
        self.filters = filters or {}
        self.top_k = top_k
        self.filter_policy = (
            filter_policy if isinstance(filter_policy, FilterPolicy) else FilterPolicy.from_str(filter_policy)
        )

    @component.output_types(documents=list[Document])
    def run(
        self,
        query: str,
        filters: dict[str, Any] | None = None,
        top_k: int | None = None,
    ) -> dict[str, list[Document]]:
        """
        Runs the retriever on the given query.

        :param query: The text query to search for.
        :param filters: Optional runtime filters. Merged or replaced based on filter_policy.
        :param top_k: Optional override for maximum number of documents to return.
        :returns: Dictionary with key "documents" containing list of matching Documents.
        """
        if not query:
            return {"documents": []}

        merged_filters = apply_filter_policy(self.filter_policy, self.filters, filters)
        effective_top_k = top_k if top_k is not None else self.top_k

        documents = self.document_store._groonga_retrieval(
            query=query,
            top_k=effective_top_k,
            filters=merged_filters,
        )

        return {"documents": documents}

    @component.output_types(documents=list[Document])
    async def run_async(
        self,
        query: str,
        filters: dict[str, Any] | None = None,
        top_k: int | None = None,
    ) -> dict[str, list[Document]]:
        """
        Async version of run().

        Note: supabase-py's sync client does not support native async queries.
        This method runs the synchronous retrieval and returns the result.
        For fully async support, consider using acreate_client() from supabase-py
        and refactoring the document store accordingly.

        :param query: The text query to search for.
        :param filters: Optional runtime filters. Merged or replaced based on filter_policy.
        :param top_k: Optional override for maximum number of documents to return.
        :returns: Dictionary with key "documents" containing list of matching Documents.
        """
        return self.run(query=query, filters=filters, top_k=top_k)

    def to_dict(self) -> dict[str, Any]:
        """
        Serializes the component to a dictionary.

        :returns: Dictionary with serialized data.
        """
        return default_to_dict(
            self,
            filters=self.filters,
            top_k=self.top_k,
            filter_policy=self.filter_policy.value,
            document_store=self.document_store.to_dict(),
        )

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SupabaseGroongaBM25Retriever":
        """
        Deserializes the component from a dictionary.

        :param data: Dictionary to deserialize from.
        :returns: Deserialized component.
        """
        data = copy.deepcopy(data)
        doc_store_params = data["init_parameters"]["document_store"]
        data["init_parameters"]["document_store"] = SupabaseGroongaDocumentStore.from_dict(doc_store_params)
        if filter_policy := data["init_parameters"].get("filter_policy"):
            data["init_parameters"]["filter_policy"] = FilterPolicy.from_str(filter_policy)
        return default_from_dict(cls, data)
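`run()` resolves its arguments in two steps. First, `apply_filter_policy` combines the init-time and runtime filters according to `filter_policy`. Second, `top_k` falls back to the init value only when the runtime value is `None`, so an explicit `top_k=0` is respected. The sketch below models that resolution, using plain strings in place of the `FilterPolicy` enum. `resolve_run_args` is hypothetical. Its merge branch is a simplification of Haystack's `apply_filter_policy`, which also handles nested logical filters.

```python
from typing import Any, Dict, Optional, Tuple

Filters = Optional[Dict[str, Any]]


def resolve_run_args(
    policy: str,
    init_filters: Filters,
    init_top_k: int,
    runtime_filters: Filters = None,
    runtime_top_k: Optional[int] = None,
) -> Tuple[Filters, int]:
    """Simplified model of how run() picks the filters and top_k it passes to the store.

    "replace": runtime filters, when given, replace the init filters.
    "merge": when both are given, they are AND-ed together (simplified).
    """
    if policy == "merge" and init_filters and runtime_filters:
        filters: Filters = {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    else:
        # Empty or missing runtime filters fall back to the init filters.
        filters = runtime_filters or init_filters
    # `is not None` rather than truthiness, so top_k=0 is not silently replaced.
    top_k = runtime_top_k if runtime_top_k is not None else init_top_k
    return filters, top_k
```

Checking `is not None` rather than truthiness mirrors the retriever's own `effective_top_k` line.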
@@ -2,5 +2,9 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 from .document_store import SupabasePgvectorDocumentStore
+from .groonga_document_store import SupabaseGroongaDocumentStore
 
-__all__ = ["SupabasePgvectorDocumentStore"]
+__all__ = [
+    "SupabaseGroongaDocumentStore",
+    "SupabasePgvectorDocumentStore",
+]