fix(qdrant): write isDeleted to nested meta payload path in markDeleted/markActive#364
Conversation
…ed/markActive
markDeleted() and markActive() wrote {'isDeleted': True/False} to the
top-level Qdrant payload, but _convertFilter() and the payload index
both reference 'meta.isDeleted'. This mismatch meant soft-deleted
documents still appeared in every search and query result.
Change the payload to {'meta': {'isDeleted': True/False}} so the flag
is written to the same nested path the filters read from. Qdrant's
set_payload merges the nested dict, updating only meta.isDeleted
without overwriting other meta fields.
📝 WalkthroughWalkthroughThe Qdrant store's soft-delete mechanism is updated to use nested metadata structure ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/qdrant/qdrant.py`:
- Line 541: The set_payload calls in the Qdrant node are overwriting the entire
meta object; change both calls (used in the methods that markDeleted/markActive)
to call self.client.set_payload with key='meta' and payload={'isDeleted': True}
(or False) and keep points=filter_objectId so only meta.isDeleted is updated;
then update nodes/test/test_qdrant_mark_deleted.py assertions to expect
payload={'isDeleted': True/False} and assert that set_payload was invoked with
key='meta' rather than expecting the nested meta wrapper.
In `@nodes/test/test_qdrant_mark_deleted.py`:
- Around line 55-89: The test currently installs module-level mocks into
sys.modules using sys.modules.setdefault(...) which pollutes the global import
cache; change this to use patch.dict(sys.modules, {...}, clear=False) around the
import logic so the mocks apply only during the dynamic import of _qdrant_path;
specifically, replace the for-loop that sets mocks and the subsequent
_spec/_qdrant_mod import/exec_module sequence with a with
patch.dict(sys.modules, {...}, clear=False): block that contains the
spec_from_file_location, module_from_spec and
_spec.loader.exec_module(_qdrant_mod) calls, ensuring mocks are removed after
the block.
- Around line 33-52: Update the _FakeDocumentStoreBase stubs to satisfy Ruff by
adding explicit type annotations: change __init__(self, *a, **kw) to
__init__(self, *_a: Any, **_kw: Any) and annotate its return type as None;
change doesCollectionExist, _doesCollectionExist, _checkCollectionExists, and
createCollection signatures to accept *_a: Any, **_kw: Any and return -> bool
(or None where semantically appropriate), and import Any from typing at the top;
also rename variadic parameters to underscored names (_a, _kw) to suppress
unused-argument warnings while keeping the same behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a90dbed5-63b3-47d8-b3ff-1ef4484fe47a
📒 Files selected for processing (2)
nodes/src/nodes/qdrant/qdrant.pynodes/test/test_qdrant_mark_deleted.py
|
|
||
| # Set all the objects with the given objectId to true | ||
| self.client.set_payload(collection_name=self.collection, payload={'isDeleted': True}, points=filter_objectId) | ||
| self.client.set_payload(collection_name=self.collection, payload={'meta': {'isDeleted': True}}, points=filter_objectId) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and explore the file structure
git ls-files nodes/src/nodes/qdrant/ | head -20Repository: rocketride-org/rocketride-server
Length of output: 326
🏁 Script executed:
# Check if the file exists and get its line count
wc -l nodes/src/nodes/qdrant/qdrant.pyRepository: rocketride-org/rocketride-server
Length of output: 112
🏁 Script executed:
# Read the specific lines mentioned (541 and 559)
sed -n '535,565p' nodes/src/nodes/qdrant/qdrant.pyRepository: rocketride-org/rocketride-server
Length of output: 1353
🏁 Script executed:
# Check the test file mentioned
ls -la nodes/test/test_qdrant_mark_deleted.py 2>/dev/null || echo "Test file not found"Repository: rocketride-org/rocketride-server
Length of output: 163
🏁 Script executed:
# Search for all set_payload calls in the codebase
rg 'set_payload' nodes/Repository: rocketride-org/rocketride-server
Length of output: 1020
🌐 Web query:
Qdrant python client set_payload API key parameter nested updates
💡 Result:
- API key:
api_keyis not a parameter ofset_payload(); you pass it when creating the client (or via client-level headers), e.g.QdrantClient(..., api_key="..."). Qdrant Cloud also accepts the key as theapi-keyheader (orAuthorization: Bearer ...). [2] - Python
set_payload()nested updates: use thekeyargument to target a nested path (dot-separated). The providedpayloadis applied at that nested object, adding/updating fields there while leaving the rest of the payload intact. [1] - Replacing vs updating:
- Update/merge:
set_payload(...)(optionally withkey=...) updates fields without wiping other payload keys. [1] - Full replace: use
overwrite_payload(...)to replace the entire payload for the point(s). [3]
- Update/merge:
Example (nested update):
from qdrant_client import QdrantClient
client = QdrantClient(url="https://<cluster>:6333", api_key="<QDRANT_API_KEY>") # [2]
client.set_payload(
collection_name="my_collection",
points=[42],
key="recipe.fruits", # nested path [1]
payload={"cinnamon": "2g"}, # merges into recipe.fruits [1]
)Sources: [1] Qdrant Python client set_payload docs, [2] Qdrant Cloud authentication docs, [3] Qdrant payload docs (overwrite payload section).
🏁 Script executed:
cat -n nodes/test/test_qdrant_mark_deleted.pyRepository: rocketride-org/rocketride-server
Length of output: 6467
🏁 Script executed:
cat -n nodes/test/mocks/qdrant_client/__init__.py | head -100Repository: rocketride-org/rocketride-server
Length of output: 5432
🏁 Script executed:
# Check if there are any other uses of set_payload or key parameter in the codebase
rg 'key=' nodes/src/nodes/qdrant/ -A 2 -B 2Repository: rocketride-org/rocketride-server
Length of output: 7586
🏁 Script executed:
# Check the requirements to see which version of qdrant-client is used
cat nodes/src/nodes/qdrant/requirements.txtRepository: rocketride-org/rocketride-server
Length of output: 150
Use Qdrant's nested-key update for set_payload on lines 541 and 559; current code risks losing metadata.
Qdrant's set_payload() without the key parameter replaces the entire top-level payload object. With payload={'meta': {'isDeleted': True/False}}, this overwrites the meta object completely, discarding other nested fields (objectId, parent, nodeId, isTable, tableId, etc.) already stored there.
To safely update only meta.isDeleted while preserving sibling fields, use the key parameter:
self.client.set_payload(
collection_name=self.collection,
payload={'isDeleted': True}, # or False for markActive
key='meta',
points=filter_objectId,
)The test file nodes/test/test_qdrant_mark_deleted.py currently verifies the wrong behavior (lines 127 and 137). Update those assertions to check payload={'isDeleted': True/False} instead of {'meta': {'isDeleted': True/False}}, and verify that the key parameter is passed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/qdrant/qdrant.py` at line 541, The set_payload calls in the
Qdrant node are overwriting the entire meta object; change both calls (used in
the methods that markDeleted/markActive) to call self.client.set_payload with
key='meta' and payload={'isDeleted': True} (or False) and keep
points=filter_objectId so only meta.isDeleted is updated; then update
nodes/test/test_qdrant_mark_deleted.py assertions to expect
payload={'isDeleted': True/False} and assert that set_payload was invoked with
key='meta' rather than expecting the nested meta wrapper.
| class _FakeDocumentStoreBase: | ||
| """Minimal stand-in for ai.common.store.DocumentStoreBase.""" | ||
|
|
||
| def __init__(self, *a, **kw): | ||
| self.vectorSize = 0 | ||
| self.modelName = '' | ||
| self.threshold_search = 0.5 | ||
| self.collectionLock = threading.Lock() | ||
|
|
||
| def doesCollectionExist(self, *a, **kw): | ||
| return True | ||
|
|
||
| def _doesCollectionExist(self): | ||
| return True | ||
|
|
||
| def _checkCollectionExists(self): | ||
| return True | ||
|
|
||
| def createCollection(self, *a, **kw): | ||
| return True |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f test_qdrant_mark_deleted.pyRepository: rocketride-org/rocketride-server
Length of output: 114
🏁 Script executed:
cat -n nodes/test/test_qdrant_mark_deleted.py | sed -n '1,60p'Repository: rocketride-org/rocketride-server
Length of output: 2257
🏁 Script executed:
rg -t py 'ruff\|flake8\|\[tool.ruff\]' pyproject.toml ruff.toml .ruff.toml 2>/dev/null | head -30Repository: rocketride-org/rocketride-server
Length of output: 58
🏁 Script executed:
find . -maxdepth 2 -name 'pyproject.toml' -o -name 'ruff.toml' | head -5Repository: rocketride-org/rocketride-server
Length of output: 104
🏁 Script executed:
cat ruff.tomlRepository: rocketride-org/rocketride-server
Length of output: 2502
🏁 Script executed:
cd nodes/test && ruff check test_qdrant_mark_deleted.py --select ANN,ARGRepository: rocketride-org/rocketride-server
Length of output: 10459
Add annotations to the fake base methods so this file stays Ruff-clean.
These stubs trigger ANN/ARG warnings from Ruff. Add explicit return types and underscore-prefixed variadic parameter names to resolve missing type annotations and unused argument warnings. As per coding guidelines, nodes/**/*.py requires ruff for linting/formatting.
🧹 Proposed fix
- def __init__(self, *a, **kw):
+ def __init__(self, *_args: object, **_kwargs: object) -> None:
self.vectorSize = 0
self.modelName = ''
self.threshold_search = 0.5
self.collectionLock = threading.Lock()
- def doesCollectionExist(self, *a, **kw):
+ def doesCollectionExist(self, *_args: object, **_kwargs: object) -> bool:
return True
- def _doesCollectionExist(self):
+ def _doesCollectionExist(self) -> bool:
return True
- def _checkCollectionExists(self):
+ def _checkCollectionExists(self) -> bool:
return True
- def createCollection(self, *a, **kw):
+ def createCollection(self, *_args: object, **_kwargs: object) -> bool:
return True📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class _FakeDocumentStoreBase: | |
| """Minimal stand-in for ai.common.store.DocumentStoreBase.""" | |
| def __init__(self, *a, **kw): | |
| self.vectorSize = 0 | |
| self.modelName = '' | |
| self.threshold_search = 0.5 | |
| self.collectionLock = threading.Lock() | |
| def doesCollectionExist(self, *a, **kw): | |
| return True | |
| def _doesCollectionExist(self): | |
| return True | |
| def _checkCollectionExists(self): | |
| return True | |
| def createCollection(self, *a, **kw): | |
| return True | |
| class _FakeDocumentStoreBase: | |
| """Minimal stand-in for ai.common.store.DocumentStoreBase.""" | |
| def __init__(self, *_args: object, **_kwargs: object) -> None: | |
| self.vectorSize = 0 | |
| self.modelName = '' | |
| self.threshold_search = 0.5 | |
| self.collectionLock = threading.Lock() | |
| def doesCollectionExist(self, *_args: object, **_kwargs: object) -> bool: | |
| return True | |
| def _doesCollectionExist(self) -> bool: | |
| return True | |
| def _checkCollectionExists(self) -> bool: | |
| return True | |
| def createCollection(self, *_args: object, **_kwargs: object) -> bool: | |
| return True |
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 36-36: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 36-36: Missing type annotation for *a
(ANN002)
[warning] 36-36: Unused method argument: a
(ARG002)
[warning] 36-36: Missing type annotation for **kw
(ANN003)
[warning] 36-36: Unused method argument: kw
(ARG002)
[warning] 42-42: Missing return type annotation for private function doesCollectionExist
Add return type annotation: bool
(ANN202)
[warning] 42-42: Missing type annotation for *a
(ANN002)
[warning] 42-42: Unused method argument: a
(ARG002)
[warning] 42-42: Missing type annotation for **kw
(ANN003)
[warning] 42-42: Unused method argument: kw
(ARG002)
[warning] 45-45: Missing return type annotation for private function _doesCollectionExist
Add return type annotation: bool
(ANN202)
[warning] 48-48: Missing return type annotation for private function _checkCollectionExists
Add return type annotation: bool
(ANN202)
[warning] 51-51: Missing return type annotation for private function createCollection
Add return type annotation: bool
(ANN202)
[warning] 51-51: Missing type annotation for *a
(ANN002)
[warning] 51-51: Unused method argument: a
(ARG002)
[warning] 51-51: Missing type annotation for **kw
(ANN003)
[warning] 51-51: Unused method argument: kw
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/test/test_qdrant_mark_deleted.py` around lines 33 - 52, Update the
_FakeDocumentStoreBase stubs to satisfy Ruff by adding explicit type
annotations: change __init__(self, *a, **kw) to __init__(self, *_a: Any, **_kw:
Any) and annotate its return type as None; change doesCollectionExist,
_doesCollectionExist, _checkCollectionExists, and createCollection signatures to
accept *_a: Any, **_kw: Any and return -> bool (or None where semantically
appropriate), and import Any from typing at the top; also rename variadic
parameters to underscored names (_a, _kw) to suppress unused-argument warnings
while keeping the same behavior.
| # Install module-level stubs before any qdrant imports | ||
| _mock_store_mod = MagicMock() | ||
| _mock_store_mod.DocumentStoreBase = _FakeDocumentStoreBase | ||
|
|
||
| _mock_config_mod = MagicMock() | ||
| _mock_schema_mod = MagicMock() | ||
|
|
||
| for name, mock in { | ||
| 'depends': MagicMock(), | ||
| 'engLib': MagicMock(), | ||
| 'rocketlib': MagicMock(), | ||
| 'numpy': MagicMock(), | ||
| 'ai': MagicMock(), | ||
| 'ai.common': MagicMock(), | ||
| 'ai.common.schema': _mock_schema_mod, | ||
| 'ai.common.store': _mock_store_mod, | ||
| 'ai.common.config': _mock_config_mod, | ||
| 'qdrant_client': MagicMock(), | ||
| 'qdrant_client.models': MagicMock(), | ||
| 'qdrant_client.http': MagicMock(), | ||
| 'qdrant_client.http.models': MagicMock(), | ||
| 'qdrant_client.conversions': MagicMock(), | ||
| 'qdrant_client.conversions.common_types': MagicMock(), | ||
| }.items(): | ||
| sys.modules.setdefault(name, mock) | ||
|
|
||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Import qdrant.py DIRECTLY (bypassing __init__.py which pulls IEndpoint etc.) | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| _qdrant_path = NODES_SRC / 'qdrant' / 'qdrant.py' | ||
| _spec = importlib.util.spec_from_file_location('_qdrant_store', str(_qdrant_path)) | ||
| _qdrant_mod = importlib.util.module_from_spec(_spec) | ||
| _spec.loader.exec_module(_qdrant_mod) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the test file
find . -path "*/nodes/test/test_qdrant_mark_deleted.py" -type fRepository: rocketride-org/rocketride-server
Length of output: 116
🏁 Script executed:
# Check the full content of the test file to understand setup/teardown and sys.modules handling
cat -n nodes/test/test_qdrant_mark_deleted.py | head -200Repository: rocketride-org/rocketride-server
Length of output: 6467
🏁 Script executed:
# Search for any cleanup/restoration of sys.modules in the file
rg "sys.modules|patch|setUp|tearDown|fixture|autouse" nodes/test/test_qdrant_mark_deleted.py -A 2 -B 2Repository: rocketride-org/rocketride-server
Length of output: 598
🏁 Script executed:
# Check for any pytest conftest or module-level cleanup
fd conftest.py | xargs cat -n 2>/dev/null | head -100Repository: rocketride-org/rocketride-server
Length of output: 3729
Isolate the sys.modules mocks to prevent test pollution.
The mock modules installed at lines 62–79 persist indefinitely in the global import cache because sys.modules.setdefault() is never reversed. If this test module runs alongside other tests, later test runs will unexpectedly receive cached MagicMock instances for numpy, qdrant_client, and ai.* instead of real modules, breaking isolation.
Wrap the mock installation and exec_module call in a patch.dict context manager:
Example fix
with patch.dict(sys.modules, {
'depends': MagicMock(),
'engLib': MagicMock(),
# ... other mocks ...
'qdrant_client.conversions.common_types': MagicMock(),
}, clear=False):
_spec = importlib.util.spec_from_file_location('_qdrant_store', str(_qdrant_path))
_qdrant_mod = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_qdrant_mod)This ensures mocks are only active during the import and are removed immediately afterward.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/test/test_qdrant_mark_deleted.py` around lines 55 - 89, The test
currently installs module-level mocks into sys.modules using
sys.modules.setdefault(...) which pollutes the global import cache; change this
to use patch.dict(sys.modules, {...}, clear=False) around the import logic so
the mocks apply only during the dynamic import of _qdrant_path; specifically,
replace the for-loop that sets mocks and the subsequent _spec/_qdrant_mod
import/exec_module sequence with a with patch.dict(sys.modules, {...},
clear=False): block that contains the spec_from_file_location, module_from_spec
and _spec.loader.exec_module(_qdrant_mod) calls, ensuring mocks are removed
after the block.
|
Evidence hardening pass for this PR. Reproduction (before)
Validation (after)
CI Evidence
Why this is distinct
|
Summary
markDeleted()andmarkActive()in the Qdrant vector store node to writeisDeletedto the correct nestedmeta.isDeletedpayload path instead of a top-levelisDeletedfieldBug
markDeleted()(line 541) andmarkActive()(line 559) innodes/src/nodes/qdrant/qdrant.pywrite{'isDeleted': True/False}to the top-level Qdrant payload. However, all query filters (_convertFilter, line 216) and the payload index (_createCollection, line 175) referencemeta.isDeleted— the nested path under themetaJSON object.Because these paths don't match, calling
markDeleted()creates a new top-level fieldisDeletedinstead of updatingmeta.isDeleted. The filter conditionmeta.isDeleted == Falsealways matches the original (unchanged) value, so soft-deleted documents continue appearing in every search, keyword query, and semantic query.Steps to Reproduce
meta.isDeleted = FalsemarkDeleted(['<objectId>'])on the storemeta.isDeleted(stillFalse), while the update wrote to top-levelisDeleted(a separate field)Static reproduction (code path analysis):
Root Cause
Qdrant stores document metadata as a nested JSON object under the
metakey in the payload (e.g.,meta.objectId,meta.nodeId,meta.isDeleted). ThemarkDeleted()andmarkActive()methods usedpayload={'isDeleted': True/False}, which writes to a top-levelisDeletedfield — a completely different payload path thanmeta.isDeleted.All other metadata operations in the Qdrant store correctly reference the
meta.*path:_createCollectioncreates a payload index onmeta.isDeleted_convertFilteradds filter conditions onmeta.isDeletedaddChunksstores the flag inside themetadictOnly
markDeletedandmarkActivewere inconsistent.Cross-reference: other stores handle this correctly — Chroma updates
res['metadata']['isDeleted'], Pinecone uses flat metadata whereisDeletedis already at the right level, and Weaviate hasisDeletedas a top-level property (not nested).Fix
Changed the
set_payloadcall in both methods to use a nested dict:Why This Works
Qdrant's
set_payloadwith a nested dict like{'meta': {'isDeleted': True}}performs a partial merge — it updates onlymeta.isDeletedwithin the existingmetaobject without overwriting other fields (nodeId,objectId,parent, etc.). This matches the path used by_convertFilter(meta.isDeleted) and the payload index created in_createCollection.Testing / Validation
New Tests (
nodes/test/test_qdrant_mark_deleted.py)test_markDeleted_writes_meta_isDeleted_true— verifiesset_payloadreceives{'meta': {'isDeleted': True}}test_markActive_writes_meta_isDeleted_false— verifiesset_payloadreceives{'meta': {'isDeleted': False}}test_markDeleted_skips_when_collection_missing— verifies no-op behaviortest_markActive_skips_when_collection_missing— verifies no-op behaviorAll 4 tests pass. All 312 existing contract tests pass.
Type
fix
Checklist
./builder test— contract tests pass (312 passed); fullbuilder testrequires C++ build + running server (not feasible locally without full dev setup), but all Python-level tests passRelated Issues
None (previously undiscovered bug)
Summary by CodeRabbit
#frontier-tower-hackathon
Impact
This fixes soft-delete correctness so deleted documents are actually excluded from retrieval and search as intended.
CI Evidence