@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 42% (0.42x) speedup for _FilterToGRPC.convert in weaviate/collections/filters.py

⏱️ Runtime: 11.5 microseconds → 8.09 microseconds (best of 10 runs)

📝 Explanation and details

The optimization replaces isinstance() calls with faster type() comparisons in two key locations:

Primary optimization in convert() method:

  • Changed isinstance(weav_filter, _FilterValue) to type(weav_filter) is _FilterValue
  • This eliminates the overhead of inheritance checking since we're testing for exact type matches
  • Line profiler shows the time spent on the type check dropping from 195,560ns to 71,439ns (about 63% less time on that line)
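The change described above can be tried in isolation with a small timing sketch. The classes here are hypothetical stand-ins defined only for this example, not the real weaviate classes, and absolute timings will vary by machine and Python version, so no expected numbers are given.

```python
import timeit


class _FilterValue:  # hypothetical stand-in, not the weaviate class
    def __init__(self, operator, value, target):
        self.operator = operator
        self.value = value
        self.target = target


class _OtherFilter:  # hypothetical type that should not match
    pass


def check_isinstance(obj):
    # Original style: also accepts instances of subclasses.
    return isinstance(obj, _FilterValue)


def check_exact_type(obj):
    # Optimized style: accepts only exact _FilterValue instances.
    return type(obj) is _FilterValue


match = _FilterValue("Equal", 1, "name")
miss = _OtherFilter()

# Both checks agree for exact instances and for unrelated types.
assert check_isinstance(match) and check_exact_type(match)
assert not check_isinstance(miss) and not check_exact_type(miss)


def time_check(check, obj, number=100_000):
    """Return the best-of-5 total time in seconds for `number` calls."""
    return min(timeit.repeat(lambda: check(obj), number=number, repeat=5))


if __name__ == "__main__":
    for label, obj in (("match", match), ("miss", miss)):
        t_isinstance = time_check(check_isinstance, obj)
        t_type = time_check(check_exact_type, obj)
        print(f"{label}: isinstance={t_isinstance:.4f}s  type-is={t_type:.4f}s")
```

Timing both the matching and the non-matching object is deliberate: the two checks may differ by different amounts depending on whether the exact type matches.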

Secondary optimization in __and_or_not_filter() method:

  • Replaced multiple isinstance() calls with type(weav_filter) in (_FilterAnd, _FilterOr, _FilterNot)
  • Using type() with tuple membership testing is more efficient than chained isinstance() calls
  • The assert statement time drops from 116,741ns to 106,142ns (9% improvement)

Why this works:

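  • isinstance() has to account for subclasses: when the object's type is not exactly the class being tested, it falls back to walking the inheritance hierarchy
  • type() returns the exact type directly, so comparing it with is amounts to a single identity check
  • Tuple membership testing with in is implemented at the C level in Python
  • Since the code only needs exact type matching (not subclass compatibility), type() is the faster choice; note that instances of subclasses no longer pass these checks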
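The exact-match assumption in the last point is the one behavioral difference between the two checks. The sketch below illustrates it with a hypothetical subclass; whether any such subclass exists in weaviate is not stated in this PR.

```python
class _FilterValue:  # hypothetical stand-in for the real class
    pass


class _CustomFilterValue(_FilterValue):  # hypothetical subclass
    pass


def matches_isinstance(obj):
    return isinstance(obj, _FilterValue)


def matches_exact_type(obj):
    return type(obj) is _FilterValue


base = _FilterValue()
sub = _CustomFilterValue()

# An exact instance passes both checks.
assert matches_isinstance(base) and matches_exact_type(base)

# A subclass instance passes isinstance() but fails the exact type check.
assert matches_isinstance(sub)
assert not matches_exact_type(sub)
```

If subclass instances ever need to be accepted, `isinstance()` remains the right check; the exact-type form is only a drop-in replacement when they do not.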

The optimization is most effective for filter-heavy workloads where type checking occurs frequently, as evidenced by the 42% overall speedup. Most generated test cases ran 20-51% faster, although one convert(None) measurement was 6.93% slower.
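The reported percentages can be checked with a few lines of arithmetic. The formulas below are inferred from the published numbers and are an assumption, not a definition taken from Codeflash: the headline and per-test figures appear to use `before / after - 1`, while the line-profiler figures read as the share of time saved.

```python
def speedup(before, after):
    """Relative speedup: how much longer the old runtime is than the new one."""
    return before / after - 1.0


def time_saved(before, after):
    """Fraction of the original time that the change removed."""
    return (before - after) / before


# Headline: 11.5 microseconds -> 8.09 microseconds.
overall = speedup(11.5, 8.09)

# Line-profiler totals for the two rewritten checks (nanoseconds).
convert_check = time_saved(195_560, 71_439)
assert_check = time_saved(116_741, 106_142)

# The one slower test case: 510ns -> 548ns gives a negative speedup.
slower_case = speedup(510, 548)

print(f"overall speedup:      {overall:.1%}")
print(f"convert() type check: {convert_check:.1%} less time")
print(f"assert statement:     {assert_check:.1%} less time")
print(f"slower test case:     {slower_case:.2%}")
```

Under these assumed formulas the headline figure rounds to 42%, the two profiler lines to 63% and 9%, and the slower case to -6.93%, matching the values quoted in the report.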

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 7 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 80.0% |

🌀 Generated Regression Tests and Runtime
import datetime
import uuid
# Operator Enum
from enum import Enum

# imports
import pytest
from weaviate.collections.filters import _FilterToGRPC


# Mocks for proto classes (simulate weaviate.proto.v1.base_pb2)
class Filters:
    OPERATOR_EQUAL = 1
    OPERATOR_NOT_EQUAL = 2
    OPERATOR_LESS_THAN = 3
    OPERATOR_LESS_THAN_EQUAL = 4
    OPERATOR_GREATER_THAN = 5
    OPERATOR_GREATER_THAN_EQUAL = 6
    OPERATOR_LIKE = 7
    OPERATOR_IS_NULL = 8
    OPERATOR_CONTAINS_ANY = 9
    OPERATOR_CONTAINS_ALL = 10
    OPERATOR_CONTAINS_NONE = 11
    OPERATOR_WITHIN_GEO_RANGE = 12
    OPERATOR_AND = 13
    OPERATOR_OR = 14
    OPERATOR_NOT = 15

    def __init__(
        self,
        operator=None,
        value_text=None,
        value_int=None,
        value_boolean=None,
        value_number=None,
        value_int_array=None,
        value_number_array=None,
        value_text_array=None,
        value_boolean_array=None,
        value_geo=None,
        target=None,
        filters=None,
    ):
        self.operator = operator
        self.value_text = value_text
        self.value_int = value_int
        self.value_boolean = value_boolean
        self.value_number = value_number
        self.value_int_array = value_int_array
        self.value_number_array = value_number_array
        self.value_text_array = value_text_array
        self.value_boolean_array = value_boolean_array
        self.value_geo = value_geo
        self.target = target
        self.filters = filters or []

class TextArray:
    def __init__(self, values):
        self.values = values

class BooleanArray:
    def __init__(self, values):
        self.values = values

class NumberArray:
    def __init__(self, values):
        self.values = values

class IntArray:
    def __init__(self, values):
        self.values = values

class GeoCoordinatesFilter:
    def __init__(self, latitude, longitude, distance):
        self.latitude = latitude
        self.longitude = longitude
        self.distance = distance

class _Operator(str, Enum):
    EQUAL = "Equal"
    NOT_EQUAL = "NotEqual"
    LESS_THAN = "LessThan"
    LESS_THAN_EQUAL = "LessThanEqual"
    GREATER_THAN = "GreaterThan"
    GREATER_THAN_EQUAL = "GreaterThanEqual"
    LIKE = "Like"
    IS_NULL = "IsNull"
    CONTAINS_ANY = "ContainsAny"
    CONTAINS_ALL = "ContainsAll"
    CONTAINS_NONE = "ContainsNone"
    WITHIN_GEO_RANGE = "WithinGeoRange"
    AND = "And"
    OR = "Or"
    NOT = "Not"

    def _to_grpc(self):
        mapping = {
            _Operator.EQUAL: Filters.OPERATOR_EQUAL,
            _Operator.NOT_EQUAL: Filters.OPERATOR_NOT_EQUAL,
            _Operator.LESS_THAN: Filters.OPERATOR_LESS_THAN,
            _Operator.LESS_THAN_EQUAL: Filters.OPERATOR_LESS_THAN_EQUAL,
            _Operator.GREATER_THAN: Filters.OPERATOR_GREATER_THAN,
            _Operator.GREATER_THAN_EQUAL: Filters.OPERATOR_GREATER_THAN_EQUAL,
            _Operator.LIKE: Filters.OPERATOR_LIKE,
            _Operator.IS_NULL: Filters.OPERATOR_IS_NULL,
            _Operator.CONTAINS_ANY: Filters.OPERATOR_CONTAINS_ANY,
            _Operator.CONTAINS_ALL: Filters.OPERATOR_CONTAINS_ALL,
            _Operator.CONTAINS_NONE: Filters.OPERATOR_CONTAINS_NONE,
            _Operator.WITHIN_GEO_RANGE: Filters.OPERATOR_WITHIN_GEO_RANGE,
            _Operator.AND: Filters.OPERATOR_AND,
            _Operator.OR: Filters.OPERATOR_OR,
            _Operator.NOT: Filters.OPERATOR_NOT,
        }
        return mapping[self]

# Filter classes
class _FilterValue:
    def __init__(self, operator, value, target):
        self.operator = operator
        self.value = value
        self.target = target

class _FilterAnd:
    def __init__(self, filters):
        self.operator = _Operator.AND
        self.filters = filters

class _FilterOr:
    def __init__(self, filters):
        self.operator = _Operator.OR
        self.filters = filters

class _FilterNot:
    def __init__(self, filters):
        self.operator = _Operator.NOT
        self.filters = filters

class _GeoCoordinateFilter:
    def __init__(self, latitude, longitude, distance):
        self.latitude = latitude
        self.longitude = longitude
        self.distance = distance

class _CountRef:
    def __init__(self, link_on):
        self.link_on = link_on

class _SingleTargetRef:
    def __init__(self, link_on, target):
        self.link_on = link_on
        self.target = target

class _MultiTargetRef:
    def __init__(self, link_on, target, target_collection):
        self.link_on = link_on
        self.target = target
        self.target_collection = target_collection

# ------------------- UNIT TESTS -------------------

# BASIC TEST CASES

def test_convert_none_returns_none():
    # Should return None when input is None
    codeflash_output = _FilterToGRPC.convert(None) # 510ns -> 548ns (6.93% slower)
    assert codeflash_output is None

def test_convert_invalid_target_type():
    # Should raise AssertionError for invalid target types in Single/Multi
    single_ref = _SingleTargetRef("posts", None)
    with pytest.raises(AssertionError):
        _FilterToGRPC.convert(_FilterValue(_Operator.LIKE, "Hello", single_ref)) # 3.33μs -> 2.21μs (51.1% faster)

    multi_ref = _MultiTargetRef("comments", None, "CommentCollection")
    with pytest.raises(AssertionError):
        _FilterToGRPC.convert(_FilterValue(_Operator.LIKE, "Nice", multi_ref)) # 1.09μs -> 822ns (33.0% faster)

def test_convert_invalid_and_or_not_filter_type():
    # Should raise AssertionError for invalid filter type in __and_or_not_filter
    class DummyFilter:
        pass
    with pytest.raises(AssertionError):
        _FilterToGRPC._FilterToGRPC__and_or_not_filter(DummyFilter())

#------------------------------------------------
import uuid
from datetime import datetime
from typing import List, Optional

# imports
import pytest
from weaviate.collections.filters import _FilterToGRPC

# --- Minimal stubs for required classes and functions ---

# Simulate proto.v1.base_pb2 module
class base_pb2:
    class Filters:
        OPERATOR_EQUAL = 1
        OPERATOR_NOT_EQUAL = 2
        OPERATOR_LESS_THAN = 3
        OPERATOR_LESS_THAN_EQUAL = 4
        OPERATOR_GREATER_THAN = 5
        OPERATOR_GREATER_THAN_EQUAL = 6
        OPERATOR_LIKE = 7
        OPERATOR_IS_NULL = 8
        OPERATOR_CONTAINS_ANY = 9
        OPERATOR_CONTAINS_ALL = 10
        OPERATOR_CONTAINS_NONE = 11
        OPERATOR_WITHIN_GEO_RANGE = 12
        OPERATOR_AND = 13
        OPERATOR_OR = 14
        OPERATOR_NOT = 15

        def __init__(
            self,
            operator=None,
            value_text=None,
            value_int=None,
            value_boolean=None,
            value_number=None,
            value_int_array=None,
            value_number_array=None,
            value_text_array=None,
            value_boolean_array=None,
            value_geo=None,
            target=None,
            filters=None,
        ):
            self.operator = operator
            self.value_text = value_text
            self.value_int = value_int
            self.value_boolean = value_boolean
            self.value_number = value_number
            self.value_int_array = value_int_array
            self.value_number_array = value_number_array
            self.value_text_array = value_text_array
            self.value_boolean_array = value_boolean_array
            self.value_geo = value_geo
            self.target = target
            self.filters = filters or []

    class FilterTarget:
        def __init__(
            self,
            property=None,
            count=None,
            single_target=None,
            multi_target=None,
        ):
            self.property = property
            self.count = count
            self.single_target = single_target
            self.multi_target = multi_target

    class FilterReferenceCount:
        def __init__(self, on):
            self.on = on

    class FilterReferenceSingleTarget:
        def __init__(self, on, target):
            self.on = on
            self.target = target

    class FilterReferenceMultiTarget:
        def __init__(self, on, target, target_collection):
            self.on = on
            self.target = target
            self.target_collection = target_collection

    class GeoCoordinatesFilter:
        def __init__(self, latitude, longitude, distance):
            self.latitude = latitude
            self.longitude = longitude
            self.distance = distance

    class TextArray:
        def __init__(self, values):
            self.values = values

    class BooleanArray:
        def __init__(self, values):
            self.values = values

    class NumberArray:
        def __init__(self, values):
            self.values = values

    class IntArray:
        def __init__(self, values):
            self.values = values

# --- Classes for filters ---
class _Operator(str):
    EQUAL = "Equal"
    NOT_EQUAL = "NotEqual"
    LESS_THAN = "LessThan"
    LESS_THAN_EQUAL = "LessThanEqual"
    GREATER_THAN = "GreaterThan"
    GREATER_THAN_EQUAL = "GreaterThanEqual"
    LIKE = "Like"
    IS_NULL = "IsNull"
    CONTAINS_ANY = "ContainsAny"
    CONTAINS_ALL = "ContainsAll"
    CONTAINS_NONE = "ContainsNone"
    WITHIN_GEO_RANGE = "WithinGeoRange"
    AND = "And"
    OR = "Or"
    NOT = "Not"

    def _to_grpc(self):
        mapping = {
            _Operator.EQUAL: base_pb2.Filters.OPERATOR_EQUAL,
            _Operator.NOT_EQUAL: base_pb2.Filters.OPERATOR_NOT_EQUAL,
            _Operator.LESS_THAN: base_pb2.Filters.OPERATOR_LESS_THAN,
            _Operator.LESS_THAN_EQUAL: base_pb2.Filters.OPERATOR_LESS_THAN_EQUAL,
            _Operator.GREATER_THAN: base_pb2.Filters.OPERATOR_GREATER_THAN,
            _Operator.GREATER_THAN_EQUAL: base_pb2.Filters.OPERATOR_GREATER_THAN_EQUAL,
            _Operator.LIKE: base_pb2.Filters.OPERATOR_LIKE,
            _Operator.IS_NULL: base_pb2.Filters.OPERATOR_IS_NULL,
            _Operator.CONTAINS_ANY: base_pb2.Filters.OPERATOR_CONTAINS_ANY,
            _Operator.CONTAINS_ALL: base_pb2.Filters.OPERATOR_CONTAINS_ALL,
            _Operator.CONTAINS_NONE: base_pb2.Filters.OPERATOR_CONTAINS_NONE,
            _Operator.WITHIN_GEO_RANGE: base_pb2.Filters.OPERATOR_WITHIN_GEO_RANGE,
            _Operator.AND: base_pb2.Filters.OPERATOR_AND,
            _Operator.OR: base_pb2.Filters.OPERATOR_OR,
            _Operator.NOT: base_pb2.Filters.OPERATOR_NOT,
        }
        return mapping[self]

class _FilterValue:
    def __init__(self, operator, value, target):
        self.operator = operator
        self.value = value
        self.target = target

class _GeoCoordinateFilter:
    def __init__(self, latitude, longitude, distance):
        self.latitude = latitude
        self.longitude = longitude
        self.distance = distance

class _CountRef:
    def __init__(self, link_on):
        self.link_on = link_on

class _SingleTargetRef:
    def __init__(self, link_on, target):
        self.link_on = link_on
        self.target = target

class _MultiTargetRef:
    def __init__(self, link_on, target, target_collection):
        self.link_on = link_on
        self.target = target
        self.target_collection = target_collection

class _FilterAnd:
    def __init__(self, filters):
        self.operator = _Operator.AND
        self.filters = filters

class _FilterOr:
    def __init__(self, filters):
        self.operator = _Operator.OR
        self.filters = filters

class _FilterNot:
    def __init__(self, filters):
        self.operator = _Operator.NOT
        self.filters = filters

# --- Unit tests for convert ---

# Helper for easier access
convert = _FilterToGRPC.convert

# -----------------------------
# BASIC TEST CASES
# -----------------------------

def test_convert_none_returns_none():
    # Should return None for None input
    codeflash_output = convert(None) # 577ns -> 481ns (20.0% faster)
    assert codeflash_output is None

def test_convert_invalid_target_type_raises():
    # Should raise AssertionError for invalid target types (simulate by passing int)
    f = _FilterValue(_Operator.EQUAL, 1, 123)
    with pytest.raises(AssertionError):
        convert(f) # 3.31μs -> 2.25μs (47.0% faster)

def test_convert_invalid_and_or_not_type_raises():
    # Should raise AssertionError for invalid AND/OR/NOT types
    class Dummy: pass
    with pytest.raises(AssertionError):
        convert(Dummy()) # 2.66μs -> 1.78μs (49.7% faster)

# -----------------------------
# LARGE SCALE TEST CASES
# -----------------------------

#------------------------------------------------
from weaviate.collections.classes.filters import _FilterOr
from weaviate.collections.filters import _FilterToGRPC

def test__FilterToGRPC_convert():
    _FilterToGRPC.convert(_FilterOr([]))


def test__FilterToGRPC_convert_2():
    _FilterToGRPC.convert(None)

Timer unit: 1e-09 s

To edit these changes, run `git checkout codeflash/optimize-_FilterToGRPC.convert-mh3c1qhd` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 11:22
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025