@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 8% (0.08x) speedup for PhoneNumber._to_dict in weaviate/collections/classes/types.py

⏱️ Runtime: 952 microseconds → 879 microseconds (best of 259 runs)

📝 Explanation and details

The optimization achieves an 8% speedup by eliminating dictionary mutation and reducing attribute access overhead.

Key changes:

  1. Single attribute access: Stores self.default_country in a local variable instead of accessing it twice, avoiding repeated attribute lookups.
  2. Direct dictionary construction: Instead of creating a base dict and conditionally adding keys, it directly constructs the appropriate dictionary in each branch, eliminating the overhead of dictionary mutation operations.
  3. Early return pattern: Uses early return for the None case, reducing code path complexity.
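The three changes above can be sketched as a before/after pair. This is a minimal illustration, not the actual weaviate source: PhoneNumberSketch is a hypothetical plain-Python stand-in for the Pydantic model, the method names to_dict_before and to_dict_after are invented for the comparison, and the mapping of number to the "input" key is an assumption based on the dict shape quoted in this description.

```python
from typing import Dict, Mapping, Optional


class PhoneNumberSketch:
    """Hypothetical stand-in for weaviate's PhoneNumber (not the real class)."""

    def __init__(self, number: str, default_country: Optional[str] = None) -> None:
        self.number = number
        self.default_country = default_country

    def to_dict_before(self) -> Mapping[str, str]:
        # Assumed original shape: build a base dict, then mutate it conditionally.
        # self.default_country is read twice on the non-None path.
        out: Dict[str, str] = {"input": self.number}
        if self.default_country is not None:
            out["defaultCountry"] = self.default_country
        return out

    def to_dict_after(self) -> Mapping[str, str]:
        # Read the attribute once and keep it in a local variable.
        default_country = self.default_country
        if default_country is None:
            # Early return for the None case.
            return {"input": self.number}
        # Build the complete dict in a single literal instead of mutating it.
        return {"input": self.number, "defaultCountry": default_country}
```

Both methods return a fresh dict on every call, and an empty-string default_country is still included because only None is treated as "missing".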

Why this is faster:

  • Dictionary mutation (out["defaultCountry"] = value) requires Python to perform additional internal checks and memory operations compared to direct construction
  • Attribute access (self.default_country) involves Python's descriptor protocol, so caching it locally saves repeated lookups
  • Direct dict construction {"input": ..., "defaultCountry": ...} is more efficient than incremental building
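The claims above can be checked locally with a small micro-benchmark. The sketch below uses the standard-library timeit module on two free functions, build_incrementally and build_directly, which are hypothetical names that only mirror the two dict-building styles; it does not measure the Pydantic attribute access of the real class, and the numbers it prints will vary with the machine and Python version.

```python
import timeit
from typing import Dict, Optional


def build_incrementally(number: str, country: Optional[str]) -> Dict[str, str]:
    # Create a base dict, then add the optional key by mutation.
    out = {"input": number}
    if country is not None:
        out["defaultCountry"] = country
    return out


def build_directly(number: str, country: Optional[str]) -> Dict[str, str]:
    # Construct the final dict in one literal per branch.
    if country is None:
        return {"input": number}
    return {"input": number, "defaultCountry": country}


def compare(repeat: int = 5, number: int = 200_000) -> Dict[str, float]:
    """Return the best observed time per call, in nanoseconds, for each style."""
    results: Dict[str, float] = {}
    for func in (build_incrementally, build_directly):
        timings = timeit.repeat(
            lambda: func("1234567890", "US"), repeat=repeat, number=number
        )
        # Take the minimum, as is conventional for timeit measurements.
        results[func.__name__] = min(timings) / number * 1e9
    return results


if __name__ == "__main__":
    for name, ns_per_call in compare().items():
        print(f"{name}: {ns_per_call:.1f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes, but differences of a few nanoseconds should still be treated with caution.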

Performance characteristics:
In the generated tests below, the gains are largest when default_country is not None: most single-call tests run roughly 40-55% faster. For None cases the improvement is smaller and more variable (about 3-34%), since the original code already did less work on that path. The 1000-instance loops improve by about 2-13%, which is closer to the 8% overall figure. The optimization therefore matters most for callers that set default_country.

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | ✅ 8101 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
🌀 Generated Regression Tests and Runtime
# imports
import pytest
from weaviate.collections.classes.types import PhoneNumber

# unit tests

# 1. Basic Test Cases

def test_basic_with_number_and_default_country():
    # Test with both number and default_country provided
    pn = PhoneNumber(number="1234567890", default_country="US")
    codeflash_output = pn._to_dict(); result = codeflash_output # 919ns -> 836ns (9.93% faster)

def test_basic_with_number_only():
    # Test with only number provided, default_country is None
    pn = PhoneNumber(number="1234567890")
    codeflash_output = pn._to_dict(); result = codeflash_output # 748ns -> 557ns (34.3% faster)

def test_basic_with_number_and_default_country_none_explicit():
    # Test with default_country explicitly set to None
    pn = PhoneNumber(number="1234567890", default_country=None)
    codeflash_output = pn._to_dict(); result = codeflash_output # 692ns -> 534ns (29.6% faster)

# 2. Edge Test Cases

def test_edge_empty_number():
    # Test with empty string for number
    pn = PhoneNumber(number="", default_country="CA")
    codeflash_output = pn._to_dict(); result = codeflash_output # 837ns -> 566ns (47.9% faster)

def test_edge_empty_default_country():
    # Test with empty string for default_country (should be included)
    pn = PhoneNumber(number="9876543210", default_country="")
    codeflash_output = pn._to_dict(); result = codeflash_output # 797ns -> 557ns (43.1% faster)

def test_edge_number_with_special_characters():
    # Test with special characters in number
    pn = PhoneNumber(number="+1 (234) 567-8900", default_country="US")
    codeflash_output = pn._to_dict(); result = codeflash_output # 793ns -> 549ns (44.4% faster)

def test_edge_default_country_lowercase():
    # Test with lowercase default_country (should not be altered)
    pn = PhoneNumber(number="5551234", default_country="gb")
    codeflash_output = pn._to_dict(); result = codeflash_output # 766ns -> 530ns (44.5% faster)

def test_edge_default_country_numeric():
    # Test with numeric default_country (not a valid ISO code, but should be handled)
    pn = PhoneNumber(number="5551234", default_country="12")
    codeflash_output = pn._to_dict(); result = codeflash_output # 754ns -> 533ns (41.5% faster)

def test_edge_number_and_default_country_both_empty():
    # Both number and default_country are empty strings
    pn = PhoneNumber(number="", default_country="")
    codeflash_output = pn._to_dict(); result = codeflash_output # 759ns -> 534ns (42.1% faster)

def test_edge_default_country_is_none_string():
    # default_country is string "None", should be included
    pn = PhoneNumber(number="123", default_country="None")
    codeflash_output = pn._to_dict(); result = codeflash_output # 749ns -> 503ns (48.9% faster)

# 3. Large Scale Test Cases

def test_large_scale_long_number():
    # Test with a very long phone number string (999 digits)
    long_number = "1" * 999
    pn = PhoneNumber(number=long_number, default_country="US")
    codeflash_output = pn._to_dict(); result = codeflash_output # 746ns -> 506ns (47.4% faster)

def test_large_scale_long_default_country():
    # Test with a very long default_country string (999 chars)
    long_country = "A" * 999
    pn = PhoneNumber(number="1234567890", default_country=long_country)
    codeflash_output = pn._to_dict(); result = codeflash_output # 795ns -> 515ns (54.4% faster)

def test_large_scale_many_instances():
    # Create 1000 PhoneNumber instances and verify their dicts
    for i in range(1000):
        number = str(i)
        country = f"CC{i}"
        pn = PhoneNumber(number=number, default_country=country)
        codeflash_output = pn._to_dict(); result = codeflash_output # 245μs -> 216μs (13.4% faster)

def test_large_scale_many_instances_with_none():
    # Create 1000 PhoneNumber instances with default_country=None
    for i in range(1000):
        number = str(i)
        pn = PhoneNumber(number=number, default_country=None)
        codeflash_output = pn._to_dict(); result = codeflash_output # 208μs -> 204μs (1.87% faster)

# Additional edge: ensure no mutation of output dict
def test_output_is_new_dict_each_time():
    # The returned dict should not be the same object across calls
    pn = PhoneNumber(number="123", default_country="US")
    codeflash_output = pn._to_dict(); out1 = codeflash_output # 850ns -> 595ns (42.9% faster)
    codeflash_output = pn._to_dict(); out2 = codeflash_output # 383ns -> 269ns (42.4% faster)
    # Mutating one should not affect the other
    out1["input"] = "mutated"

# Additional edge: ensure dict keys are exactly as expected
def test_dict_keys_are_exact():
    pn = PhoneNumber(number="123", default_country="DE")
    codeflash_output = pn._to_dict(); d = codeflash_output # 733ns -> 496ns (47.8% faster)
    pn2 = PhoneNumber(number="123")
    codeflash_output = pn2._to_dict(); d2 = codeflash_output # 320ns -> 312ns (2.56% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import pytest  # used for our unit tests
from weaviate.collections.classes.types import PhoneNumber

# unit tests

# 1. Basic Test Cases

def test_to_dict_basic_with_default_country():
    # Test with both number and default_country provided
    pn = PhoneNumber(number="0123456789", default_country="DE")
    codeflash_output = pn._to_dict(); result = codeflash_output # 745ns -> 481ns (54.9% faster)

def test_to_dict_basic_without_default_country():
    # Test with only number provided, default_country is None
    pn = PhoneNumber(number="0123456789")
    codeflash_output = pn._to_dict(); result = codeflash_output # 577ns -> 461ns (25.2% faster)

def test_to_dict_basic_empty_number():
    # Test with empty string as number
    pn = PhoneNumber(number="", default_country="US")
    codeflash_output = pn._to_dict(); result = codeflash_output # 727ns -> 473ns (53.7% faster)

def test_to_dict_basic_empty_number_and_no_country():
    # Test with empty string as number and no default_country
    pn = PhoneNumber(number="")
    codeflash_output = pn._to_dict(); result = codeflash_output # 571ns -> 466ns (22.5% faster)

# 2. Edge Test Cases


def test_to_dict_edge_default_country_empty_string():
    # Test with empty string as default_country
    pn = PhoneNumber(number="0123456789", default_country="")
    codeflash_output = pn._to_dict(); result = codeflash_output # 1.06μs -> 705ns (49.9% faster)

def test_to_dict_edge_default_country_numeric():
    # Test with numeric string as default_country
    pn = PhoneNumber(number="0123456789", default_country="123")
    codeflash_output = pn._to_dict(); result = codeflash_output # 854ns -> 554ns (54.2% faster)

def test_to_dict_edge_special_characters_in_number():
    # Test with special characters in number
    pn = PhoneNumber(number="+1 (800) 123-4567", default_country="US")
    codeflash_output = pn._to_dict(); result = codeflash_output # 830ns -> 573ns (44.9% faster)

def test_to_dict_edge_special_characters_in_default_country():
    # Test with special characters in default_country
    pn = PhoneNumber(number="0123456789", default_country="U$-!")
    codeflash_output = pn._to_dict(); result = codeflash_output # 828ns -> 533ns (55.3% faster)

def test_to_dict_edge_default_country_none_explicit():
    # Test with explicit None for default_country
    pn = PhoneNumber(number="0123456789", default_country=None)
    codeflash_output = pn._to_dict(); result = codeflash_output # 584ns -> 536ns (8.96% faster)

# 3. Large Scale Test Cases

def test_to_dict_large_many_instances():
    # Test creating many instances and calling _to_dict on each
    numbers = [f"{i:010d}" for i in range(1000)]  # 1000 unique numbers, zero-padded
    countries = ["US", "DE", "FR", "JP", None]
    results = []
    for i, num in enumerate(numbers):
        country = countries[i % len(countries)]
        pn = PhoneNumber(number=num, default_country=country)
        codeflash_output = pn._to_dict(); result = codeflash_output # 235μs -> 219μs (7.34% faster)
        results.append(result)

def test_to_dict_large_long_number_and_country():
    # Test with very long number and very long default_country
    long_number = "1" * 512  # 512 digits
    long_country = "X" * 256  # 256 characters
    pn = PhoneNumber(number=long_number, default_country=long_country)
    codeflash_output = pn._to_dict(); result = codeflash_output # 854ns -> 552ns (54.7% faster)

def test_to_dict_large_all_empty_strings():
    # Test with many instances all with empty strings for number and default_country
    for _ in range(1000):
        pn = PhoneNumber(number="", default_country="")
        codeflash_output = pn._to_dict(); result = codeflash_output # 237μs -> 219μs (8.26% faster)

def test_to_dict_large_varied_inputs():
    # Test with a mix of valid, empty, and special values
    numbers = ["", "123", "+44 20 7123 4567", "0000000000", "A" * 50]
    countries = [None, "", "GB", "U$-!", "Z" * 50]
    for num in numbers:
        for country in countries:
            pn = PhoneNumber(number=num, default_country=country)
            codeflash_output = pn._to_dict(); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from weaviate.collections.classes.types import PhoneNumber

def test_PhoneNumber__to_dict():
    PhoneNumber._to_dict(PhoneNumber(number='', default_country=''))

Timer unit: 1e-09 s

To edit these changes, run git checkout codeflash/optimize-PhoneNumber._to_dict-mh2z0jtu, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 05:17
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025