Commit 9ef644b

Add Apache Cassandra vector store support (#3578)
1 parent 7afbaae commit 9ef644b

8 files changed: +1075 -1 lines changed

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
---
title: Apache Cassandra
---

[Apache Cassandra](https://cassandra.apache.org/) is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers with no single point of failure. It supports vector storage for semantic search in AI applications, and its capacity and throughput scale roughly linearly as nodes are added.

### Usage

```python
import os
from mem0 import Memory

os.environ["OPENAI_API_KEY"] = "sk-xx"

config = {
    "vector_store": {
        "provider": "cassandra",
        "config": {
            "contact_points": ["127.0.0.1"],
            "port": 9042,
            "username": "cassandra",
            "password": "cassandra",
            "keyspace": "mem0",
            "collection_name": "memories",
        }
    }
}

m = Memory.from_config(config)
messages = [
    {"role": "user", "content": "I'm planning to watch a movie tonight. Any recommendations?"},
    {"role": "assistant", "content": "How about thriller movies? They can be quite engaging."},
    {"role": "user", "content": "I'm not a big fan of thriller movies but I love sci-fi movies."},
    {"role": "assistant", "content": "Got it! I'll avoid thriller recommendations and suggest sci-fi movies in the future."}
]
m.add(messages, user_id="alice", metadata={"category": "movies"})
```

#### Using DataStax Astra DB

For managed Cassandra with DataStax Astra DB:

```python
config = {
    "vector_store": {
        "provider": "cassandra",
        "config": {
            "contact_points": ["dummy"],  # Not used with secure connect bundle
            "username": "token",
            "password": "AstraCS:...",  # Your Astra DB application token
            "keyspace": "mem0",
            "collection_name": "memories",
            "secure_connect_bundle": "/path/to/secure-connect-bundle.zip"
        }
    }
}
```

<Note>
When using DataStax Astra DB, provide the path to the secure connect bundle. `contact_points` is still a required field, so pass a placeholder value; it is ignored when a secure connect bundle is provided.
</Note>

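The local and Astra DB configurations above differ in only a few keys, so one helper can build either. The following is a minimal sketch and not part of the commit: `build_cassandra_config` and the environment variable names (`CASSANDRA_HOSTS`, `CASSANDRA_PORT`, `CASSANDRA_USERNAME`, `CASSANDRA_PASSWORD`, `CASSANDRA_KEYSPACE`, `ASTRA_BUNDLE_PATH`) are hypothetical. Only the config keys come from the examples above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_cassandra_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build a mem0 vector_store config for local Cassandra or Astra DB.

    The environment variable names are hypothetical; the config keys
    mirror the examples in this document.
    """
    env = os.environ if env is None else env
    store: Dict[str, Any] = {
        "keyspace": env.get("CASSANDRA_KEYSPACE", "mem0"),
        "collection_name": "memories",
    }

    # Only set credentials when both halves are present.
    username = env.get("CASSANDRA_USERNAME")
    password = env.get("CASSANDRA_PASSWORD")
    if username and password:
        store["username"] = username
        store["password"] = password

    bundle = env.get("ASTRA_BUNDLE_PATH")
    if bundle:
        # Astra DB: contact_points is required by the config class but
        # ignored when a secure connect bundle is given, so use a placeholder.
        store["contact_points"] = ["dummy"]
        store["secure_connect_bundle"] = bundle
    else:
        hosts = env.get("CASSANDRA_HOSTS", "127.0.0.1")
        store["contact_points"] = [h.strip() for h in hosts.split(",") if h.strip()]
        store["port"] = int(env.get("CASSANDRA_PORT", "9042"))

    return {"vector_store": {"provider": "cassandra", "config": store}}
```

With this helper, `Memory.from_config(build_cassandra_config())` would pick the Astra DB path whenever `ASTRA_BUNDLE_PATH` is set and fall back to a local node otherwise.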
### Config

Here are the parameters available for configuring Apache Cassandra:

| Parameter | Description | Default Value |
| --- | --- | --- |
| `contact_points` | List of contact point addresses (IP addresses or hostnames) | Required |
| `port` | Cassandra port | `9042` |
| `username` | Database username | `None` |
| `password` | Database password | `None` |
| `keyspace` | Keyspace name | `"mem0"` |
| `collection_name` | Table name for storing vectors | `"memories"` |
| `embedding_model_dims` | Dimensions of embedding vectors | `1536` |
| `secure_connect_bundle` | Path to Astra DB secure connect bundle | `None` |
| `protocol_version` | CQL protocol version | `4` |
| `load_balancing_policy` | Custom load balancing policy | `None` |

### Setup

#### Option 1: Local Cassandra Setup using Docker

```bash
# Pull and run the Cassandra container
docker run --name mem0-cassandra \
  -p 9042:9042 \
  -e CASSANDRA_CLUSTER_NAME="Mem0Cluster" \
  -d cassandra:latest

# Wait for Cassandra to start (may take 1-2 minutes), then create the keyspace.
# The CREATE KEYSPACE statement is CQL, so it is passed to cqlsh with -e.
docker exec mem0-cassandra cqlsh -e \
  "CREATE KEYSPACE IF NOT EXISTS mem0 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"
```

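The "wait 1-2 minutes" step can be automated by polling the CQL port until it accepts TCP connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, not something the commit provides. An open port does not guarantee that Cassandra is ready to serve queries, so treat this as a coarse readiness check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out)
            # until something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the Docker setup above, calling `wait_for_port("127.0.0.1", 9042)` before `Memory.from_config(config)` avoids connecting while the container is still starting.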
#### Option 2: DataStax Astra DB (Managed Cloud)

1. Sign up at [DataStax Astra](https://astra.datastax.com/)
2. Create a new database
3. Download the secure connect bundle
4. Generate an application token

<Tip>
For production deployments, use DataStax Astra DB for fully managed Cassandra with automatic scaling, backups, and security.
</Tip>

#### Option 3: Install Cassandra Locally

**Ubuntu/Debian:**
```bash
# Add Apache Cassandra repository
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

# Install Cassandra
sudo apt-get update
sudo apt-get install cassandra

# Start Cassandra
sudo systemctl start cassandra

# Verify installation
nodetool status
```

**macOS:**
```bash
# Using Homebrew
brew install cassandra

# Start Cassandra
brew services start cassandra

# Connect to CQL shell
cqlsh
```

### Python Client Installation

Install the required Python package:

```bash
pip install cassandra-driver
```

### Performance Considerations

- **Replication Factor**: For production, use a replication factor of at least 3.
- **Consistency Level**: Balance consistency against performance (`QUORUM` is recommended).
- **Partitioning**: Cassandra automatically distributes data across nodes.
- **Scaling**: Add nodes to increase capacity and throughput roughly linearly.

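The replication factor and consistency level recommendations interact through simple arithmetic: a quorum is a majority of replicas, and reads are guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor. The helper names below are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


def is_strongly_consistent(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, `quorum(3)` is 2, so one replica can be unavailable without failing `QUORUM` reads or writes. With a replication factor of 1, as in the Docker example, no failure is tolerated.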
### Advanced Configuration

```python
from cassandra.policies import DCAwareRoundRobinPolicy

config = {
    "vector_store": {
        "provider": "cassandra",
        "config": {
            "contact_points": ["node1.example.com", "node2.example.com", "node3.example.com"],
            "port": 9042,
            "username": "mem0_user",
            "password": "secure_password",
            "keyspace": "mem0_prod",
            "collection_name": "memories",
            "protocol_version": 4,
            "load_balancing_policy": DCAwareRoundRobinPolicy(local_dc='DC1')
        }
    }
}
```

<Warning>
For production use, configure appropriate replication strategies and consistency levels based on your availability and consistency requirements.
</Warning>

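For a multi-node cluster like the one in the advanced example, the keyspace would typically use `NetworkTopologyStrategy` with a per-datacenter replication factor rather than the `SimpleStrategy` shown in the Docker setup. The sketch below only builds the CQL statement as a string. `create_keyspace_cql` is a hypothetical helper, and the datacenter name `DC1` is taken from the `local_dc` value above.

```python
import re
from typing import Mapping

_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def create_keyspace_cql(keyspace: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    if not _NAME.match(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        # Reject names that would break out of the quoted CQL string.
        if not _NAME.match(dc):
            raise ValueError(f"invalid datacenter name: {dc!r}")
        if rf < 1:
            raise ValueError(f"replication factor for {dc} must be at least 1")
        options.append(f"'{dc}': {rf}")

    body = ", ".join(options)
    return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = {" + body + "};"
```

The resulting string for `create_keyspace_cql("mem0_prod", {"DC1": 3})` can be pasted into `cqlsh` or passed to a driver session.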

docs/docs.json

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,7 @@
   "icon": "star",
   "pages": [
     "platform/features/platform-overview",
-    "platform/features/v2-memory-filters",
+    "platform/features/v2-memory-filters",
     "platform/features/contextual-add",
     "platform/features/async-client",
     "platform/features/async-mode-default-change",
@@ -153,6 +153,7 @@
   "pages": [
     "components/vectordbs/dbs/qdrant",
     "components/vectordbs/dbs/chroma",
+    "components/vectordbs/dbs/cassandra",
     "components/vectordbs/dbs/pgvector",
     "components/vectordbs/dbs/milvus",
     "components/vectordbs/dbs/pinecone",
Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, model_validator


class CassandraConfig(BaseModel):
    """Configuration for Apache Cassandra vector database."""

    contact_points: List[str] = Field(
        ...,
        description="List of contact point addresses (e.g., ['127.0.0.1', '127.0.0.2'])"
    )
    port: int = Field(9042, description="Cassandra port")
    username: Optional[str] = Field(None, description="Database username")
    password: Optional[str] = Field(None, description="Database password")
    keyspace: str = Field("mem0", description="Keyspace name")
    collection_name: str = Field("memories", description="Table name")
    embedding_model_dims: int = Field(1536, description="Dimensions of the embedding model")
    secure_connect_bundle: Optional[str] = Field(
        None,
        description="Path to secure connect bundle for DataStax Astra DB"
    )
    protocol_version: int = Field(4, description="CQL protocol version")
    load_balancing_policy: Optional[Any] = Field(
        None,
        description="Custom load balancing policy object"
    )

    @model_validator(mode="before")
    @classmethod
    def check_auth(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Validate authentication parameters."""
        username = values.get("username")
        password = values.get("password")

        # Both username and password must be provided together or not at all
        if (username and not password) or (password and not username):
            raise ValueError(
                "Both 'username' and 'password' must be provided together for authentication"
            )

        return values

    @model_validator(mode="before")
    @classmethod
    def check_connection_config(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Validate connection configuration."""
        secure_connect_bundle = values.get("secure_connect_bundle")
        contact_points = values.get("contact_points")

        # Either secure_connect_bundle or contact_points must be provided
        if not secure_connect_bundle and not contact_points:
            raise ValueError(
                "Either 'contact_points' or 'secure_connect_bundle' must be provided"
            )

        return values

    @model_validator(mode="before")
    @classmethod
    def validate_extra_fields(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Validate that no extra fields are provided."""
        allowed_fields = set(cls.model_fields.keys())
        input_fields = set(values.keys())
        extra_fields = input_fields - allowed_fields

        if extra_fields:
            raise ValueError(
                f"Extra fields not allowed: {', '.join(extra_fields)}. "
                f"Please input only the following fields: {', '.join(allowed_fields)}"
            )

        return values

    class Config:
        arbitrary_types_allowed = True

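The three `before` validators above can be summarized as plain checks on the input dictionary. The sketch below is a standard-library stand-in written for illustration. It is not the pydantic class itself, and `check_cassandra_config` and `ALLOWED_FIELDS` are hypothetical names. Separately from these validators, `contact_points` is declared with `Field(...)`, which makes it a required field in pydantic; that is presumably why the Astra DB example passes a placeholder value.

```python
from typing import Any, Dict

# Field names copied from the CassandraConfig class above.
ALLOWED_FIELDS = {
    "contact_points", "port", "username", "password", "keyspace",
    "collection_name", "embedding_model_dims", "secure_connect_bundle",
    "protocol_version", "load_balancing_policy",
}


def check_cassandra_config(values: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the same three input checks as the validators, in plain Python."""
    # 1. Username and password must be given together or not at all.
    if bool(values.get("username")) != bool(values.get("password")):
        raise ValueError(
            "Both 'username' and 'password' must be provided together for authentication"
        )

    # 2. At least one way of locating the cluster must be present.
    if not values.get("secure_connect_bundle") and not values.get("contact_points"):
        raise ValueError(
            "Either 'contact_points' or 'secure_connect_bundle' must be provided"
        )

    # 3. Unknown keys are rejected instead of being silently ignored.
    extra = set(values) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"Extra fields not allowed: {', '.join(sorted(extra))}")

    return values
```

Rejecting unknown keys means a typo such as `"key_space"` fails fast at configuration time instead of silently falling back to the default keyspace.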

mem0/utils/factory.py

Lines changed: 1 addition & 0 deletions
@@ -184,6 +184,7 @@ class VectorStoreFactory:
         "langchain": "mem0.vector_stores.langchain.Langchain",
         "s3_vectors": "mem0.vector_stores.s3_vectors.S3Vectors",
         "baidu": "mem0.vector_stores.baidu.BaiduDB",
+        "cassandra": "mem0.vector_stores.cassandra.CassandraDB",
         "neptune": "mem0.vector_stores.neptune_analytics.NeptuneAnalyticsVector",
     }

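The hunk above only shows the provider-to-class-path mapping. How `VectorStoreFactory` resolves those strings is not visible in this commit, so the following is a sketch under the assumption that it imports the module and looks up the class by name. `load_class`, `create`, and `DEMO_REGISTRY` are hypothetical, and the registry uses standard-library classes so the example runs without mem0 installed.

```python
import importlib
from typing import Any, Dict


def load_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def create(provider: str, registry: Dict[str, str], *args: Any, **kwargs: Any) -> Any:
    """Instantiate the class registered under `provider`."""
    if provider not in registry:
        raise ValueError(f"Unsupported provider: {provider}")
    return load_class(registry[provider])(*args, **kwargs)


# Stand-in registry; the real mapping holds entries such as
# "cassandra": "mem0.vector_stores.cassandra.CassandraDB".
DEMO_REGISTRY = {
    "ordered": "collections.OrderedDict",
    "fraction": "fractions.Fraction",
}
```

Under this design, adding a provider only requires a new registry entry, and the provider's module (and its driver dependency) is imported only when that provider is requested.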
