tensorchord/vechord

Turn PostgreSQL into your search engine in a Pythonic way.

Installation

pip install vechord

Features

Examples

simple.py: for people who are familiar with specialized vector database APIs
beir.py: the most flexible way to use the library (loading, indexing, querying and evaluation)
web.py: build a web application from the defined tables and pipeline
essay.py: extract the content from Paul Graham's essays and evaluate the search results with LLM-generated queries
contextual.py: contextual retrieval example
hybrid.py: hybrid search that reranks the results from vector search with keyword search

User Guide

For the API reference, check our documentation.

Define the table

from typing import Annotated, Optional
from vechord.spec import Table, Vector, PrimaryKeyAutoIncrease, ForeignKey

# use a 768-dimensional vector
DenseVector = Vector[768]

class Document(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None  # auto-increase id, no need to set
    link: str = ""
    text: str

class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    doc_id: Annotated[int, ForeignKey[Document.uid]]  # reference to `Document.uid`
    vector: DenseVector  # this comes with a default vector index
    text: str

Inject with decorator

import httpx
from vechord.registry import VechordRegistry
from vechord.extract import SimpleExtractor
from vechord.embedding import GeminiDenseEmbedding

vr = VechordRegistry(namespace="test", url="postgresql://postgres:postgres@127.0.0.1:5432/")
# create the tables and indexes if they do not exist yet
vr.register([Document, Chunk])
extractor = SimpleExtractor()
emb = GeminiDenseEmbedding()

@vr.inject(output=Document)  # dump to the `Document` table
# function parameters are free to define since `inject(input=...)` is not set
def add_document(url: str) -> Document:  # the return type is `Document`
    with httpx.Client() as client:
        resp = client.get(url)
        text = extractor.extract_html(resp.text)
        return Document(link=url, text=text)

@vr.inject(input=Document, output=Chunk)  # load from the `Document` table and dump to the `Chunk` table
# function parameters are the attributes of the `Document` table, only defined attributes
# will be loaded from the `Document` table
def add_chunk(uid: int, text: str) -> list[Chunk]:  # the return type is `list[Chunk]`
    chunks = text.split("\n")
    return [Chunk(doc_id=uid, vector=emb.vectorize_chunk(t), text=t) for t in chunks]

if __name__ == "__main__":
    add_document("https://paulgraham.com/best.html")  # add arguments as usual
    add_chunk()  # omit the arguments since the `input` will be loaded from the `Document` table
    vr.insert(Document(text="hello world"))  # insert manually
    print(vr.select_by(Document.partial_init()))  # select all the columns from table `Document`
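
To make the decorator behaviour described above more concrete, the following is a minimal in-memory sketch of the idea: a function with `output=...` has its return value stored, and a function with `input=...` is called once per stored row, receiving only the attributes named in its signature. `ToyRegistry`, `Doc` and `Piece` are hypothetical stand-ins written for this example; this is not vechord's implementation and it does not touch PostgreSQL.

```python
import inspect
from dataclasses import dataclass
from functools import wraps


@dataclass
class Doc:
    uid: int
    text: str


@dataclass
class Piece:
    doc_id: int
    text: str


class ToyRegistry:
    """Hypothetical in-memory registry, for illustration only."""

    def __init__(self):
        self.rows = {}  # table class -> list of stored rows

    def inject(self, input=None, output=None):
        def decorator(func):
            wanted = list(inspect.signature(func).parameters)

            @wraps(func)
            def wrapper(*args, **kwargs):
                if input is None:
                    # No input table: the caller supplies the arguments.
                    results = [func(*args, **kwargs)]
                else:
                    # Call once per stored row, passing only the requested attributes.
                    results = [
                        func(**{name: getattr(row, name) for name in wanted})
                        for row in self.rows.get(input, [])
                    ]
                if output is not None:
                    for res in results:
                        items = res if isinstance(res, list) else [res]
                        self.rows.setdefault(output, []).extend(items)
                return results

            return wrapper

        return decorator


registry = ToyRegistry()


@registry.inject(output=Doc)
def add_doc(uid: int, text: str) -> Doc:
    return Doc(uid=uid, text=text)


@registry.inject(input=Doc, output=Piece)
def add_pieces(uid: int, text: str) -> list[Piece]:
    return [Piece(doc_id=uid, text=line) for line in text.split("\n")]


add_doc(1, "first line\nsecond line")
add_pieces()  # no arguments: rows come from the stored Doc rows
print(registry.rows[Piece])
```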

Transaction

To guarantee data consistency, users can use the VechordRegistry.run method to run multiple functions in a single transaction.

Within this transaction, each function only loads the data that was inserted during the same transaction. Users can therefore focus on the data processing itself without tracking which part of the data has not been processed yet.

vr.set_pipeline([add_document, add_chunk])
vr.run("https://paulgraham.com/best.html")  # only accepts the arguments for the first function
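
The all-or-nothing behaviour of running several steps in one transaction can be sketched with the standard library's `sqlite3` module. This is only an analogy under stated assumptions: `run_pipeline`, the table layout and the step functions are hypothetical, SQLite stands in for PostgreSQL, and nothing here reflects how `VechordRegistry.run` is implemented.

```python
import sqlite3


def run_pipeline(conn, steps, arg):
    """Run every step inside one transaction; roll back all writes on error."""
    with conn:  # commits on success, rolls back if an exception escapes
        result = arg
        for step in steps:
            result = step(conn, result)
    return result


def add_document(conn, text):
    cur = conn.execute("INSERT INTO document (text) VALUES (?)", (text,))
    return cur.lastrowid  # hand the new row id to the next step


def add_chunks(conn, doc_id):
    (text,) = conn.execute(
        "SELECT text FROM document WHERE uid = ?", (doc_id,)
    ).fetchone()
    lines = [line for line in text.split("\n") if line]
    conn.executemany(
        "INSERT INTO chunk (doc_id, text) VALUES (?, ?)",
        [(doc_id, line) for line in lines],
    )
    return len(lines)


def failing_step(conn, doc_id):
    raise RuntimeError("simulated failure")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (uid INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT)")
conn.execute(
    "CREATE TABLE chunk (uid INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER, text TEXT)"
)
conn.commit()

inserted = run_pipeline(conn, [add_document, add_chunks], "alpha\nbeta")

try:
    run_pipeline(conn, [add_document, failing_step], "gamma")
except RuntimeError:
    pass  # the "gamma" document is rolled back together with the failed run

doc_count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
chunk_count = conn.execute("SELECT COUNT(*) FROM chunk").fetchone()[0]
print(inserted, doc_count, chunk_count)
```

Because the second run fails before committing, only the first document and its chunks remain in the tables.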

Search

print(vr.search_by_vector(Chunk, emb.vectorize_query("startup")))
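
Conceptually, a vector search ranks stored rows by their distance to the query vector and returns the closest ones. The sketch below does this by brute force in pure Python. `toy_search` and the sample rows are hypothetical, and cosine distance is picked only for illustration; the source does not state which distance `search_by_vector` uses by default.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def toy_search(rows, query, topk=2):
    """Rank (text, vector) pairs by distance to the query and keep the top k."""
    return sorted(rows, key=lambda row: cosine_distance(row[1], query))[:topk]


rows = [
    ("startup advice", [0.9, 0.1, 0.0]),
    ("cooking pasta", [0.0, 0.2, 0.9]),
    ("founder stories", [0.8, 0.3, 0.1]),
]
hits = toy_search(rows, [1.0, 0.0, 0.0], topk=2)
print([text for text, _ in hits])
```

A database-side vector index avoids scanning every row like this, which is what makes the approach practical for large tables.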

Customized Index Configuration

from vechord.spec import VectorIndex

class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    vector: Annotated[DenseVector, VectorIndex(distance="cos", lists=128)]
    text: str
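
The source does not explain the `lists` parameter. Assuming it is the number of partitions in an inverted-file (IVF) style index, the general idea can be sketched as follows: each vector is assigned to the list of its nearest centroid, and a query scans only the lists closest to it. All names here are hypothetical, the centroids are fixed by hand instead of being learned, and this is not a description of the index vechord actually builds.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(vectors, centroids, lists, query, probes=1, topk=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:topk]


vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # hand-picked; a real index would learn them
lists = build_lists(vectors, centroids)
nearest = ivf_search(vectors, centroids, lists, [4.9, 5.0], probes=1, topk=1)
print(lists, nearest)
```

Under this assumption, more lists mean fewer vectors scanned per query, at the cost of possibly missing neighbours that fall into lists that were not probed.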

HTTP Service

This creates a WSGI application that can be served by any WSGI server.

Open the OpenAPI Endpoint to check the API documentation.

from vechord.service import create_web_app
from wsgiref.simple_server import make_server

app = create_web_app(vr)
with make_server("", 8000, app) as server:
    server.serve_forever()
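
For readers new to WSGI: an application is simply a callable that takes `environ` and `start_response`, which is why any WSGI server can host it. The sketch below shows a tiny hand-written app and a way to call it directly without starting a server. `toy_app`, `call` and the `/health` route are hypothetical and unrelated to the routes that `create_web_app` exposes.

```python
import json
from wsgiref.util import setup_testing_defaults


def toy_app(environ, start_response):
    """A WSGI application is a callable taking (environ, start_response)."""
    if environ.get("PATH_INFO") == "/health":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(app, path):
    """Invoke a WSGI app directly, without starting a server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call(toy_app, "/health"))
```

Calling the app this way is handy in tests, since no port or server process is needed.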

Development

docker run --rm -d --name vdb -e POSTGRES_PASSWORD=postgres -p 5432:5432 ghcr.io/tensorchord/vchord_bm25-postgres:pg17-v0.1.1
envd up
# inside the envd env, sync all the dependencies
make sync
# format the code
make format
