Helix-DB | Homepage | Documentation | PyPi
helix-py is a Python library for interacting with helix-db, a powerful graph-vector database written in Rust. It provides both a simple query interface and a PyTorch-like front-end for defining and executing custom graph queries and vector-based operations. This makes it well-suited for use cases such as similarity search, knowledge graph construction, and machine learning pipelines.
uv add helix-py
or pip install helix-py
See getting started for more information on installing helix-db:
curl -sSL "https://install.helix-db.com" | bash
helix install
helix-py allows users to define queries in a PyTorch-like manner, similar to how you would define a neural network's forward pass. You can use the built-in queries in helix/client.py to get started with inserting and searching vectors, or define your own queries for more complex workflows.
PyTorch-like Query
QUERY add_user(name: String, age: I64) =>
    usr <- AddV<User>({name: name, age: age})
    RETURN usr
You can define a matching Python class:
from helix import Query
from helix.types import Payload
class add_user(Query):
    def __init__(self, name: str, age: int):
        super().__init__()
        self.name = name
        self.age = age

    def query(self) -> Payload:
        return [{"name": self.name, "age": self.age}]

    def response(self, response):
        return response
Make sure that the Query.query method returns a list of objects.
To set up a simple Client to interface with a running helix instance:
import helix
from helix.client import hnswinsert, hnswsearch
db = helix.Client(local=True, verbose=True)
data = helix.Loader("path/to/data", cols=["vecs"])
[db.query(hnswinsert(d)) for d in data] # build hnsw index
my_query = [0.32, ..., -1.321]
nearest = db.query(hnswsearch(my_query)) # query hnsw index
# Calling your own query
# Standard
db.query('add_user', {"name": "John", "age": 20})
# PyTorch-like
db.query(add_user("John", 20))
The default port is 6969, but you can change it by passing in the port parameter. For cloud instances, you can pass in the api_endpoint parameter.
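For example, a minimal sketch (the non-default port and endpoint below are placeholders, and the use of local=False for a cloud client is an assumption):

import helix

# local instance on a non-default port (placeholder value)
db_local = helix.Client(local=True, port=6970)

# cloud instance (placeholder endpoint)
db_cloud = helix.Client(local=False, api_endpoint="https://your-instance.example.com")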
To set up a simple Instance that automatically starts and stops a helix instance with the lifetime of the program, using a helixdb-cfg directory you already have:
from helix.instance import Instance
helix_instance = Instance("helixdb-cfg", 6969, verbose=True)
helixdb-cfg is the directory where the configuration files are stored. From there you can interact with the instance using Client. The instance is stopped automatically when the script exits.
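A minimal end-to-end sketch, assuming the instance starts on construction (as described above) and that the Client connects to the same port:

import helix
from helix.instance import Instance

helix_instance = Instance("helixdb-cfg", 6969, verbose=True)  # starts the instance
db = helix.Client(local=True, port=6969)                      # connect to it
db.query('add_user', {"name": "John", "age": 20})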
Helix has LLM interfaces for popular LLM providers.
Available providers:
OpenAIProvider
GeminiProvider
AnthropicProvider
All providers expose two methods:
- enable_mcps(name: str, url: str=...) -> bool to enable Helix MCP tools
- generate(messages, response_model: BaseModel | None=None) -> str | BaseModel
The generate method supports messages in two formats:
- Free-form text: pass a string
- Message lists: pass a list of dict or provider-specific Message models
It also supports structured outputs by passing a Pydantic model to get validated results.
Examples (see examples/llm_providers/providers.ipynb for full demos):
from pydantic import BaseModel
# OpenAI
from helix.providers.openai_client import OpenAIProvider
openai_llm = OpenAIProvider(
    name="openai-llm",
    instructions="You are a helpful assistant.",
    model="gpt-5-nano",
    history=True
)
print(openai_llm.generate("Hello!"))
class Person(BaseModel):
    name: str
    age: int
    occupation: str
print(openai_llm.generate([{"role": "user", "content": "Who am I?"}], Person))
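The Gemini and Anthropic providers used below are constructed the same way; the module paths and constructor arguments in this sketch are assumptions by analogy with the OpenAI provider above:

# assumed module paths, mirroring helix.providers.openai_client
from helix.providers.gemini_client import GeminiProvider
from helix.providers.anthropic_client import AnthropicProvider

gemini_llm = GeminiProvider(name="gemini-llm")           # assumed constructor arguments
anthropic_llm = AnthropicProvider(name="anthropic-llm")  # assumed constructor arguments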
To enable MCP tools with a running Helix MCP server (see MCP section):
openai_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
gemini_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
anthropic_llm.enable_mcps("helix-mcp", url="https://your-remote-mcp/...")
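Once enabled, subsequent generate calls can invoke the MCP tools; for example (illustrative prompt only):

print(openai_llm.generate("Use the Helix tools to look up nodes of type User."))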
Notes:
- OpenAI GPT-5 family models support reasoning, while other models use temperature.
- Anthropic local streamable MCP is not supported; use a URL-based MCP.
Helix has embedder interfaces for popular embedding providers.
Available embedders:
OpenAIEmbedder
GeminiEmbedder
VoyageAIEmbedder
Each embedder implements:
- embed(text: str, **kwargs): returns a vector ([F64])
- embed_batch(texts: List[str], **kwargs): returns a list of vectors ([F64])
Examples (see examples/llm_providers/providers.ipynb for more):
from helix.embedding.openai_client import OpenAIEmbedder
openai_embedder = OpenAIEmbedder() # requires OPENAI_API_KEY
vec = openai_embedder.embed("Hello world")
batch = openai_embedder.embed_batch(["a", "b", "c"])
from helix.embedding.gemini_client import GeminiEmbedder
gemini_embedder = GeminiEmbedder()
vec = gemini_embedder.embed("doc text", task_type="RETRIEVAL_DOCUMENT")
from helix.embedding.voyageai_client import VoyageAIEmbedder
voyage_embedder = VoyageAIEmbedder()
vec = voyage_embedder.embed("query text", input_type="query")
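These embedders pair naturally with the built-in vector queries shown earlier; a minimal sketch, assuming a running local instance and OPENAI_API_KEY set:

import helix
from helix.client import hnswinsert, hnswsearch
from helix.embedding.openai_client import OpenAIEmbedder

db = helix.Client(local=True)
embedder = OpenAIEmbedder()

# embed a document and insert it into the HNSW index
db.query(hnswinsert(embedder.embed("Hello world")))

# embed a query and search for its nearest neighbors
nearest = db.query(hnswsearch(embedder.embed("greeting")))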
Helix includes a ready-to-run MCP server exposing graph traversal and search tools from your helix instance.
Key classes are in helix/mcp.py:
- MCPServer(name, client, tool_config=ToolConfig(), embedder=None, embedder_args={})
- ToolConfig to enable/disable tools (e.g., search_vector, search_vector_text, search_keyword, traversal tools)
Starting a server (simple):
# examples/mcp_server.py
from helix.client import Client
from helix.mcp import MCPServer, ToolConfig
from helix.embedding.openai_client import OpenAIEmbedder
helix_client = Client(local=True)
openai_embedder = OpenAIEmbedder() # needs OPENAI_API_KEY
tool_config = ToolConfig() # default tool set
mcp_server = MCPServer("helix-mcp", helix_client, tool_config=tool_config, embedder=openai_embedder)
mcp_server.run() # streamable-http on http://127.0.0.1:8000/mcp/
Alternative app entry (with explicit host/port):
# apps/mcp_server.py
from helix.client import Client
from helix.mcp import MCPServer
from helix.embedding.openai_client import OpenAIEmbedder
client = Client(local=True, port=6969)
openai_embedder = OpenAIEmbedder()
mcp_server = MCPServer("helix-mcp", client, embedder=openai_embedder)
if __name__ == "__main__":
    mcp_server.run(transport="streamable-http", host="127.0.0.1", port=8000)
Configure Claude Desktop to use the local MCP server (adjust paths):
{
  "mcpServers": {
    "helix-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/your/app/folder",
        "run",
        "mcp_server.py"
      ]
    }
  }
}
Environment variables: OPENAI_API_KEY, GEMINI_API_KEY, VOYAGEAI_API_KEY, ANTHROPIC_API_KEY as needed.
MCP tools overview (enabled via ToolConfig):
- Traversal: n_from_type, e_from_type, out_step, out_e_step, in_step, in_e_step, filter_items
- Search: search_vector (requires embedder), search_vector_text (server-side embedding), search_keyword
To dynamically create, load, and edit your HelixDB schema, you can use the Schema class. To get started, create a Schema instance and optionally pass in the path to your config file:
from helix.loader import Schema
schema = Schema()
This will either create a new schema.hx file if you don't have one, or load the existing one.
To interact with the schema, you can use various methods, including:
schema.create_node("User", {"name": "String", "age": "U32"})
schema.create_edge("Follows", "User", "User")
schema.create_vector("Vec", {"vec": "Vec32"})
To save the schema to your configs folder, use the save method:
schema.save()
Helix uses Chonkie chunking methods to split text into manageable pieces for processing and embedding:
from helix import Chunk

# fixed-size token chunks
text = "Your long document text here..."
chunks = Chunk.token_chunk(text)

# semantically coherent chunks
semantic_chunks = Chunk.semantic_chunk(text)

# language-aware chunking for source code
code_text = "def hello(): print('world')"
code_chunks = Chunk.code_chunk(code_text, language="python")

# sentence chunking over a batch of documents
texts = ["Document 1...", "Document 2...", "Document 3..."]
batch_chunks = Chunk.sentence_chunk(texts)
You can find all of the chunking examples in the documentation.
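Chunking composes with the embedders and client queries above; a minimal chunk-embed-insert sketch, assuming a running local instance and OPENAI_API_KEY set:

import helix
from helix import Chunk
from helix.client import hnswinsert
from helix.embedding.openai_client import OpenAIEmbedder

db = helix.Client(local=True)
embedder = OpenAIEmbedder()

chunks = Chunk.token_chunk("Your long document text here...")
# chunks may be chunk objects or plain strings; take the text either way (assumption)
chunk_texts = [getattr(c, "text", c) for c in chunks]
for vec in embedder.embed_batch(chunk_texts):
    db.query(hnswinsert(vec))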
The loader (helix/loader.py) currently supports .parquet, .fvecs, and .csv data. Simply pass in the path to your file or files and the columns you want to process; the loader does the rest and is easy to integrate with your queries.
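For example, a short sketch mirroring the earlier client example (the file path is a placeholder):

import helix
from helix.client import hnswinsert

db = helix.Client(local=True)
data = helix.Loader("path/to/data.parquet", cols=["vecs"])
for d in data:
    db.query(hnswinsert(d))  # insert each loaded vector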
helix-py is licensed under the AGPL (Affero General Public License).