fix(chroma): honor DocFilter.isTable=false in _convertFilter#325
fix(chroma): honor DocFilter.isTable=false in _convertFilter#325kgarg2468 wants to merge 5 commits intorocketride-org:developfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdated Chroma Store filter construction to add the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/test/test_chroma_filter_semantics.py`:
- Around line 16-95: Tests install stub modules via _install_stubs() which
mutates sys.modules globally and leaks between tests; implement a
`@contextmanager` named _scoped_stubs() (with the docstring: "Temporarily install
stubs, restoring original modules on exit.") that snapshots sys.modules, calls
_install_stubs(), yields to the caller, and on exit restores the original
sys.modules state (removing any newly added stub modules and restoring replaced
ones); update _load_store_class() to use the _scoped_stubs() context manager
around the import so stubs are automatically scoped and restored after each
test.
- Line 19: Update the helper scaffolding to satisfy ruff ANN/ARG/PT rules:
rename unused lambda parameters in mod_depends.depends from *args, **kwargs to
*_args, **_kwargs; add explicit return type -> None and annotate parameters
*_args: object, **_kwargs: object on the three helper __init__ methods
referenced in the diff; add type annotations provider: object, connConfig:
object and return type -> dict to getNodeConfig and rename any unused params
with a leading underscore; add -> type return annotation to _load_store_class;
replace the compound assertion with two assertions (assert spec is not None and
assert spec.loader is not None) where spec is checked; annotate _doc_filter to
accept **overrides: object and return -> types.SimpleNamespace; and add -> None
return type annotations to the test functions at the bottom (lines 115+). Ensure
these changes reference the exact function/class names in the file so linters
pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 223c3f28-8743-4211-b936-fc1764096f08
📒 Files selected for processing (2)
nodes/src/nodes/chroma/chroma.pynodes/test/test_chroma_filter_semantics.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/test/test_chroma_filter_semantics.py`:
- Line 15: Replace the typing import: change the import statement that currently
reads "from typing import Iterator" to import Iterator from collections.abc
instead (i.e., use collections.abc.Iterator) in the
test_chroma_filter_semantics.py file so it satisfies Ruff UP035; update any
references if necessary and run the linter/tests to confirm the warning is
resolved.
- Around line 182-203: Replace the duplicated hardcoded stub module list in
test_load_store_class_does_not_leak_stub_modules with the module-level constant
_STUB_MODULE_NAMES so the test always uses the canonical list; locate the test
function test_load_store_class_does_not_leak_stub_modules, remove the local
stub_modules assignment and use _STUB_MODULE_NAMES when building the before
mapping, and ensure _load_store_class is still invoked and subsequent assertions
remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b226271f-9ab7-4f9b-bc96-c311ea927efa
📒 Files selected for processing (1)
nodes/test/test_chroma_filter_semantics.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/test/test_chroma_filter_semantics.py`:
- Around line 128-142: Add a PEP 257 docstring to the helper function
_doc_filter describing its purpose (constructing a SimpleNamespace representing
a document filter with default keys), listing the accepted keyword overrides and
return type (types.SimpleNamespace), and noting that keys like nodeId, isTable,
tableIds, parent, permissions, objectIds, isDeleted, chunkIds, minChunkId, and
maxChunkId are initialized to None by default; place the docstring immediately
below the def _doc_filter(...) line.
- Around line 116-125: Add a PEP 257 docstring to the helper function
_load_store_class(): immediately beneath def _load_store_class() add a short
triple-quoted description that explains the function’s purpose (dynamically load
the chroma.Store class for tests), any important behavior (uses _scoped_stubs
and resolves the chroma.py path) and what it returns (the Store class type).
Keep the docstring concise and consistent with other helpers in the module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ea4b0c8b-8d03-4fc4-8678-3ba647c761a0
📒 Files selected for processing (1)
nodes/test/test_chroma_filter_semantics.py
Isolate the Chroma filter regression tests from runtime-only modules so they run reliably in CI and local environments without full engine dependencies. Made-with: Cursor
|
Addressed the remaining review friction with one focused follow-up commit: What changed
Why this addresses the concern
Validation (after)
CI evidence
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
nodes/test/test_chroma_filter_semantics.py (1)
140-140:⚠️ Potential issue | 🟡 MinorSplit the compound assertion at Line 140 for clearer pytest failures and lint compliance.
Use two assertions so failures report the exact missing condition.
♻️ Proposed fix
- assert spec is not None and spec.loader is not None + assert spec is not None + assert spec.loader is not NoneAs per coding guidelines,
nodes/**/*.pyshould use ruff for linting/formatting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/test/test_chroma_filter_semantics.py` at line 140, Split the compound assertion into two separate assertions so failures are clearer and lint-compliant: replace the single `assert spec is not None and spec.loader is not None` with `assert spec is not None` followed by `assert spec.loader is not None`, referencing the same `spec` and `spec.loader` variables in the test function to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/test/test_chroma_filter_semantics.py`:
- Around line 47-133: Extract the setup and teardown logic out of the long
_scoped_imports contextmanager into two small helpers (e.g.,
_install_scoped_imports() and _restore_scoped_imports(original_sys_path,
original_modules, original_stub_modules)), keeping _scoped_imports as a thin
wrapper that calls the installer, yields, then calls the restorer; move path
mutations, sys.modules stub creation (the depends/rocketlib/ai/numpy stubs and
assignments), and importlib.invalidate_caches() into _install_scoped_imports,
and move restoring sys.path and module cleanup/restoration (using
_is_scoped_module and original_modules/original_stub_modules captured via
_capture_scoped_modules/_STUB_MODULE_NAMES) into _restore_scoped_imports so
behavior is identical but readability and testability improve.
---
Duplicate comments:
In `@nodes/test/test_chroma_filter_semantics.py`:
- Line 140: Split the compound assertion into two separate assertions so
failures are clearer and lint-compliant: replace the single `assert spec is not
None and spec.loader is not None` with `assert spec is not None` followed by
`assert spec.loader is not None`, referencing the same `spec` and `spec.loader`
variables in the test function to preserve behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8b721b8e-0a34-4697-95cb-d8c5e6219c79
📒 Files selected for processing (1)
nodes/test/test_chroma_filter_semantics.py
| @contextmanager | ||
| def _scoped_imports() -> Iterator[None]: | ||
| """Temporarily prepend canonical Chroma mock paths and restore import state.""" | ||
| original_sys_path = list(sys.path) | ||
| original_modules = _capture_scoped_modules() | ||
| original_stub_modules = {name: sys.modules.get(name) for name in _STUB_MODULE_NAMES} | ||
|
|
||
| test_dir = Path(__file__).resolve().parent | ||
| mock_path = test_dir / 'mocks' | ||
| nodes_path = test_dir.parent / 'src' / 'nodes' | ||
|
|
||
| sys.path.insert(0, str(nodes_path)) | ||
| sys.path.insert(0, str(mock_path)) | ||
| # `nodes/src/nodes/chroma/chroma.py` imports runtime dependencies at module | ||
| # import time; install lightweight stubs so tests stay hermetic. | ||
| depends_module = ModuleType('depends') | ||
| depends_module.depends = lambda *_a, **_kw: None # type: ignore[attr-defined] | ||
| sys.modules['depends'] = depends_module | ||
|
|
||
| rocketlib_module = ModuleType('rocketlib') | ||
| rocketlib_module.debug = lambda *_a, **_kw: None # type: ignore[attr-defined] | ||
| sys.modules['rocketlib'] = rocketlib_module | ||
|
|
||
| ai_module = ModuleType('ai') | ||
| ai_common_module = ModuleType('ai.common') | ||
| ai_common_module.__path__ = [] # type: ignore[attr-defined] | ||
| ai_schema_module = ModuleType('ai.common.schema') | ||
| ai_store_module = ModuleType('ai.common.store') | ||
| ai_config_module = ModuleType('ai.common.config') | ||
| ai_transform_module = ModuleType('ai.common.transform') | ||
| numpy_module = ModuleType('numpy') | ||
|
|
||
| class _Doc: | ||
| pass | ||
|
|
||
| class _DocFilter: | ||
| pass | ||
|
|
||
| class _DocMetadata: | ||
| pass | ||
|
|
||
| class _QuestionText: | ||
| pass | ||
|
|
||
| class _DocumentStoreBase: | ||
| def __init__(self, *_a: object, **_kw: object) -> None: | ||
| pass | ||
|
|
||
| class _Config: | ||
| @staticmethod | ||
| def getNodeConfig(_provider: object, _connConfig: object) -> dict[str, object]: | ||
| return {} | ||
|
|
||
| class _IEndpointTransform: | ||
| pass | ||
|
|
||
| ai_schema_module.Doc = _Doc | ||
| ai_schema_module.DocFilter = _DocFilter | ||
| ai_schema_module.DocMetadata = _DocMetadata | ||
| ai_schema_module.QuestionText = _QuestionText | ||
| ai_store_module.DocumentStoreBase = _DocumentStoreBase | ||
| ai_config_module.Config = _Config | ||
| ai_transform_module.IEndpointTransform = _IEndpointTransform | ||
|
|
||
| sys.modules['ai'] = ai_module | ||
| sys.modules['ai.common'] = ai_common_module | ||
| sys.modules['ai.common.schema'] = ai_schema_module | ||
| sys.modules['ai.common.store'] = ai_store_module | ||
| sys.modules['ai.common.config'] = ai_config_module | ||
| sys.modules['ai.common.transform'] = ai_transform_module | ||
| sys.modules['numpy'] = numpy_module | ||
| importlib.invalidate_caches() | ||
| try: | ||
| yield | ||
| finally: | ||
| sys.path[:] = original_sys_path | ||
| for module_name in list(sys.modules): | ||
| if _is_scoped_module(module_name) and module_name not in original_modules: | ||
| sys.modules.pop(module_name, None) | ||
| for module_name, module in original_modules.items(): | ||
| sys.modules[module_name] = module | ||
| for module_name, module in original_stub_modules.items(): | ||
| if module is None: | ||
| sys.modules.pop(module_name, None) | ||
| else: | ||
| sys.modules[module_name] = module | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider splitting _scoped_imports into install/restore helpers for readability and safer maintenance.
Line 48 introduces a long context manager that does path mutation, stub injection, cache invalidation, and cleanup in one block. Extracting setup/teardown into small helpers would reduce cognitive load and make future changes less error-prone.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 48-48: Too many statements (57 > 50)
(PLR0915)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/test/test_chroma_filter_semantics.py` around lines 47 - 133, Extract
the setup and teardown logic out of the long _scoped_imports contextmanager into
two small helpers (e.g., _install_scoped_imports() and
_restore_scoped_imports(original_sys_path, original_modules,
original_stub_modules)), keeping _scoped_imports as a thin wrapper that calls
the installer, yields, then calls the restorer; move path mutations, sys.modules
stub creation (the depends/rocketlib/ai/numpy stubs and assignments), and
importlib.invalidate_caches() into _install_scoped_imports, and move restoring
sys.path and module cleanup/restoration (using _is_scoped_module and
original_modules/original_stub_modules captured via
_capture_scoped_modules/_STUB_MODULE_NAMES) into _restore_scoped_imports so
behavior is identical but readability and testability improve.
|
Looks like there is PR #373 that is more comprehensive. |
|
Hi @kgarg2468, thanks for catching the |
Summary
DocFilter(isTable=False)was ignored in_convertFilter.isTable=TrueandisTable=None.isTablefilter semantics, including combined filters.Bug
nodes/src/nodes/chroma/chroma.pyused a truthy check:This skipped the filter when
isTable=False, so "exclude tables" requests were not applied.Steps to Reproduce
Expected: filter contains
{'isTable': {'$eq': False}}whenDocFilter.isTable=False.Actual (before fix):
isTableclause omitted.Root Cause
_convertFiltertreatedFalseas "not set" because it used a truthiness check instead of an explicitNonecheck.Fix
Changed the conditional in
nodes/src/nodes/chroma/chroma.py:if docFilter.isTable:if docFilter.isTable is not None:Added tests in
nodes/test/test_chroma_filter_semantics.py:isTable=False-> includes{'isTable': {'$eq': False}}isTable=True-> includes{'isTable': {'$eq': True}}isTable=None-> omitsisTableclausenodeId + isTable=False-> preserves both clausesWhy This Works
is not Nonecorrectly distinguishes explicit boolean values (True/False) from "unset" (None), matchingDocFiltersemantics and preventing false-value drops.Testing / Validation
Type
fix
Checklist
./builder testpassesRelated Issues
Summary by CodeRabbit
Bug Fixes
Tests
Tag: #frontier-tower-hackathon
Impact
This restores correct filtering semantics so explicit
isTable=Falsequeries no longer return incorrect results.CI Evidence