⚡️ Speed up function `_pstdev` by 507% #279

codeflash-ai · 2026-01-24T10:12:34Z

📄 507% speedup (about 6× faster) for `_pstdev` in `unstructured/metrics/utils.py`

⏱️ Runtime : 10.4 milliseconds → 1.72 milliseconds (best of 93 runs)

📝 Explanation and details

The optimized code achieves a 507% speedup by removing three sources of overhead in the original implementation:

Key Optimizations

1. Single-Pass Computation with Welford's Algorithm
The original code made two full passes over the data:

First pass: List comprehension to filter out None values (scores = [score for score in scores if score is not None])
Second pass: statistics.pstdev() internally iterates again to compute mean and variance
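For reference, the two-pass shape described above can be sketched as follows. This is a reconstruction from the quoted list comprehension and from the `not rounding` behavior exercised by the generated tests. It is not the verbatim source of `unstructured/metrics/utils.py`, and the name `pstdev_two_pass` is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def pstdev_two_pass(scores: Sequence[Optional[float]], rounding: Optional[int] = 3) -> Optional[float]:
    """Hypothetical reconstruction of the original two-pass `_pstdev`."""
    # Pass 1: materialize a new list without the None entries.
    scores = [score for score in scores if score is not None]
    # Fewer than two values: no standard deviation is reported.
    if len(scores) <= 1:
        return None
    # Pass 2: statistics.pstdev iterates over the filtered list again.
    if not rounding:
        # rounding of 0, None, or False means "return full precision".
        return statistics.pstdev(scores)
    return round(statistics.pstdev(scores), rounding)
```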

The optimized version uses Welford's online algorithm to compute the population standard deviation in a single pass, updating running statistics (count, mean, sum of squared differences) incrementally as it encounters each non-None value.
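A minimal sketch of a single-pass, None-skipping population standard deviation using Welford's update is shown below. The PR's optimized code itself is not reproduced on this page, so this is an illustration under assumptions: the name `pstdev_welford`, the default `rounding=3`, and the "return None for fewer than two values" rule are inferred from the generated tests in this PR.

```python
import math
from typing import Iterable, Optional


def pstdev_welford(scores: Iterable[Optional[float]], rounding: Optional[int] = 3) -> Optional[float]:
    """Hypothetical single-pass population std dev that skips None values."""
    n = 0        # number of non-None values seen so far
    mean = 0.0   # running mean of those values
    m2 = 0.0     # running sum of squared differences from the mean
    for x in scores:
        if x is None:
            continue
        n += 1
        delta = x - mean
        mean += delta / n
        # Welford's M2 update multiplies the old-mean delta by the new-mean delta.
        m2 += delta * (x - mean)
    if n <= 1:
        return None
    # Population variance divides by n, not n - 1.
    std = math.sqrt(m2 / n)
    if not rounding:
        return std
    return round(std, rounding)
```

Welford's update is generally preferred over the one-pass textbook formula (sum of squares minus n times the squared mean) because it avoids catastrophic cancellation when values are large and close together. Its float result can differ from `statistics.pstdev` in the last few bits, which is only visible when rounding is disabled.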

2. Eliminated Intermediate List Allocation
The original code allocates a new filtered list. For a 1000-element input with no None values, that is a second list holding 1000 references. The optimized version processes elements on the fly without allocating any intermediate collection, reducing memory pressure and allocation costs.
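The allocation difference can be observed with the standard-library `tracemalloc` module. This is an illustrative sketch, not part of the PR: the helper names `filter_then_count`, `stream_count`, and `peak_bytes` are hypothetical, and the exact byte counts depend on the Python version, so no specific output is claimed.

```python
import tracemalloc
from typing import Callable, List, Optional


def filter_then_count(scores: List[Optional[float]]) -> int:
    # Materializes a second list holding every non-None value.
    filtered = [s for s in scores if s is not None]
    return len(filtered)


def stream_count(scores: List[Optional[float]]) -> int:
    # Visits each element once and keeps only a running counter.
    n = 0
    for s in scores:
        if s is not None:
            n += 1
    return n


def peak_bytes(fn: Callable[[List[Optional[float]]], int], scores: List[Optional[float]]) -> int:
    """Peak memory traced by tracemalloc while fn runs."""
    tracemalloc.start()
    try:
        fn(scores)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


data = [float(i) if i % 2 == 0 else None for i in range(1000)]
print(peak_bytes(filter_then_count, data), peak_bytes(stream_count, data))
```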

3. Direct Math Operations
Instead of calling statistics.pstdev(), which adds input validation, type handling, and (in CPython) exact rational arithmetic for accuracy, the optimized code computes variance = M2 / n and std = math.sqrt(variance) directly with floats.
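As a worked check of the final step, the snippet below computes M2 directly for the classic example list used in the tests and then applies variance = M2 / n and std = math.sqrt(variance). M2 is the same quantity Welford's algorithm accumulates incrementally; here it is computed in two passes only to make the arithmetic easy to follow.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n                      # 40 / 8 = 5.0
m2 = sum((x - mean) ** 2 for x in values)   # 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32.0
variance = m2 / n                           # 32.0 / 8 = 4.0 (population variance)
std = math.sqrt(variance)                   # 2.0

print(std, statistics.pstdev(values))
```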

Performance Impact by Scenario

Line profiler data shows the original code spent 90% of time in round(statistics.pstdev(scores), rounding), making this the critical hot spot.

Test results demonstrate consistent speedups across scenarios:

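Small lists (2-5 elements): roughly 5-10× faster (e.g., test_basic_two_elements: 50.0μs → 7.75μs, about 6.5×)
Medium lists (100-500 elements): roughly 5-6× faster in most tests (e.g., test_large_scale_500_elements: 490μs → 85.7μs, about 5.7×)
Large lists (about 1000 elements): roughly 4-11× faster depending on the values (e.g., test_large_scale_1000_elements: 961μs → 168μs, about 5.7×; test_large_scale_decimals: 1.79ms → 166μs, about 10.8×)
Lists with many None values: roughly 3.4-5.2× faster on the large None-heavy tests (test_large_scale_with_many_none_entries: 274μs → 80.0μs; test_large_scale_with_many_none_values: 407μs → 78.1μs). The larger 910-1057% figures in test_filtering_none_preserves_order_invariance come from the second and third calls within that test; test_order_independence, which contains no None values, shows similar gains (904-1009%) on its repeated calls, so those figures reflect repeated calls rather than None filtering.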
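To reproduce the comparison locally, a rough `timeit` harness like the one below can be used. It is a sketch, not the measurement setup behind the numbers above: `two_pass` and `one_pass` are hypothetical stand-ins for the original and optimized functions (rounding omitted), and the printed ratios will vary by machine and Python version.

```python
import math
import statistics
import timeit
from typing import List, Optional


def two_pass(scores: List[Optional[float]]) -> Optional[float]:
    # Filter into a new list, then let statistics.pstdev iterate again.
    filtered = [s for s in scores if s is not None]
    return statistics.pstdev(filtered) if len(filtered) > 1 else None


def one_pass(scores: List[Optional[float]]) -> Optional[float]:
    # Welford's update, skipping None values in the same loop.
    n, mean, m2 = 0, 0.0, 0.0
    for x in scores:
        if x is None:
            continue
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n) if n > 1 else None


inputs = {
    "small (5)": [1.0, 2.0, 3.0, 4.0, 5.0],
    "large (1000)": [float(i) for i in range(1000)],
    "half None (800)": [float(i) if i % 2 == 0 else None for i in range(800)],
}

for label, data in inputs.items():
    # Both versions should agree up to floating-point noise.
    assert math.isclose(two_pass(data), one_pass(data), rel_tol=1e-9)
    t_two = timeit.timeit(lambda: two_pass(data), number=100)
    t_one = timeit.timeit(lambda: one_pass(data), number=100)
    print(f"{label}: {t_two / t_one:.1f}x")
```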

The optimization is particularly effective when:

Input lists are large (>100 elements), where the absolute time saved per call is largest
Many None values need filtering (no filtered copy is allocated)
The function is called repeatedly in metrics computation pipelines (savings accumulate per call)

Edge case note: Single-element lists are slightly slower (20-26%, e.g., 1.75μs → 2.38μs) because the optimized code still performs the loop setup, whereas the original code returns right after the filter check. The absolute cost is under 1μs per call, and the function returns None for single elements in either version.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	✅ 5 Passed
🌀 Generated Regression Tests	✅ 66 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	✅ 1 Passed
📊 Tests Coverage	100.0%

⚙️ Existing Unit Tests

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`metrics/test_utils.py::test_stats`	54.8μs	13.3μs	313%✅

🌀 Generated Regression Tests

import copy  # used to verify the input list is not mutated
import statistics  # used to compute expected results

# imports
from unstructured.metrics.utils import _pstdev


def test_basic_simple_list_default_rounding():
    # Basic scenario: small list of floats, default rounding (3)
    scores = [1.0, 2.0, 3.0]  # simple increasing sequence
    # expected value computed using the same std function then rounded to 3 decimals
    expected = round(statistics.pstdev(scores), 3)
    # the harness compares codeflash_output between the original and optimized implementations
    codeflash_output = _pstdev(scores)  # 28.6μs -> 4.71μs (508% faster)


def test_basic_integers_rounding_zero_returns_raw():
    # Basic scenario: integer inputs and rounding=0 should be treated as "no rounding"
    scores = [2, 4, 4, 4, 5, 5, 7, 9]  # classic example from statistics docs
    # statistics.pstdev should be returned directly when rounding is falsy (0)
    expected_raw = statistics.pstdev(scores)
    codeflash_output = _pstdev(scores, rounding=0)
    result = codeflash_output  # 29.5μs -> 5.59μs (428% faster)


def test_rounding_none_is_treated_as_no_rounding():
    # Edge scenario: passing rounding=None should behave like "no rounding" since None is falsy
    scores = [10.0, 12.0, 23.0, 23.0, 16.0]
    expected_raw = statistics.pstdev(scores)
    # None should be treated as falsy => returns the raw standard deviation
    codeflash_output = _pstdev(scores, rounding=None)  # 30.0μs -> 4.25μs (606% faster)


def test_ignore_none_values_and_compute():
    # Edge scenario: input list contains None values that should be filtered out
    scores = [None, 1.0, None, 2.0]  # after filtering => [1.0, 2.0]
    # population stdev of [1.0, 2.0] is 0.5, rounded to default 3 decimals stays 0.5
    expected = round(statistics.pstdev([1.0, 2.0]), 3)
    codeflash_output = _pstdev(scores)  # 27.3μs -> 4.62μs (491% faster)


def test_all_none_or_empty_returns_none():
    # Edge scenario: no valid numeric entries => return None
    codeflash_output = _pstdev([])  # 1.53μs -> 909ns (68.8% faster)
    codeflash_output = _pstdev([None, None])  # 831ns -> 649ns (28.0% faster)


def test_single_element_returns_none():
    # Edge scenario: after filtering if only one numeric value remains => return None
    codeflash_output = _pstdev([5.0])  # 1.75μs -> 2.38μs (26.5% slower)
    codeflash_output = _pstdev([None, 5.0, None])  # 902ns -> 1.18μs (23.4% slower)


def test_negative_rounding_supported():
    # Edge scenario: negative rounding should be passed directly to round() (e.g., -1 rounds to tens)
    scores = [10, 20, 30]
    # compute expected using statistics.pstdev and round with -1
    expected = round(statistics.pstdev(scores), -1)
    codeflash_output = _pstdev(scores, rounding=-1)  # 28.3μs -> 5.63μs (403% faster)


def test_large_scale_sequence_under_1000_elements():
    # Large-scale scenario: use a sequence near the 1000-element limit to check performance/scalability
    # Use 999 elements (0..998) to stay under the 1000-element recommendation
    scores = list(range(999))
    # expected value computed with statistics.pstdev and rounded to default 3 decimals
    expected = round(statistics.pstdev(scores), 3)
    # the result is checked against the original implementation via codeflash_output
    codeflash_output = _pstdev(scores)  # 738μs -> 192μs (285% faster)


def test_large_scale_with_many_none_entries():
    # Large-scale scenario with many entries but many are None
    # Build 999-length list where only every 3rd element is numeric
    data = [i if (i % 3 == 0) else None for i in range(999)]
    # Build the filtered numeric list to compute expected
    filtered = [i for i in range(999) if (i % 3 == 0)]
    expected = round(statistics.pstdev(filtered), 3)
    codeflash_output = _pstdev(data)  # 274μs -> 80.0μs (243% faster)


def test_input_list_not_mutated_by_function():
    # Behavior check: the function should not modify the incoming list (immutability of input)
    original = [1.0, None, 2.0, None]
    original_copy = copy.deepcopy(original)  # make a deep copy to compare after call
    codeflash_output = _pstdev(original)
    _ = codeflash_output  # 48.9μs -> 7.69μs (536% faster)


def test_boolean_values_treated_as_numbers():
    # Edge-case: booleans are instances of int in Python and should be treated as numeric values
    scores = [True, False, True]  # equivalent to [1, 0, 1]
    expected = round(statistics.pstdev([1, 0, 1]), 3)
    codeflash_output = _pstdev(scores)  # 28.7μs -> 5.16μs (456% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import statistics

from unstructured.metrics.utils import _pstdev


def test_basic_two_elements():
    """Test basic functionality with two elements."""
    scores = [1.0, 2.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 50.0μs -> 7.75μs (545% faster)


def test_basic_multiple_elements():
    """Test basic functionality with multiple elements."""
    scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 52.4μs -> 8.12μs (546% faster)
    # Expected: pstdev of [1, 2, 3, 4, 5] is sqrt(2), rounded to 3 decimals
    expected = round(statistics.pstdev(scores), 3)


def test_basic_identical_elements():
    """Test with all identical elements."""
    scores = [5.0, 5.0, 5.0, 5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 50.7μs -> 7.95μs (538% faster)


def test_basic_negative_numbers():
    """Test with negative numbers."""
    scores = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 53.6μs -> 8.42μs (537% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_basic_mixed_positive_negative():
    """Test with mixed positive and negative numbers."""
    scores = [-10.0, 0.0, 10.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 51.6μs -> 7.46μs (593% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_basic_small_decimals():
    """Test with small decimal values."""
    scores = [0.1, 0.2, 0.3]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 71.6μs -> 7.81μs (818% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_basic_large_numbers():
    """Test with large numbers."""
    scores = [1000.0, 2000.0, 3000.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 51.9μs -> 7.85μs (561% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_basic_default_rounding():
    """Test that default rounding is 3 decimal places."""
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 49.9μs -> 7.74μs (545% faster)


def test_basic_custom_rounding_1():
    """Test with custom rounding to 1 decimal place."""
    scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    codeflash_output = _pstdev(scores, rounding=1)
    result = codeflash_output  # 52.6μs -> 8.52μs (517% faster)
    expected = round(statistics.pstdev(scores), 1)


def test_basic_custom_rounding_5():
    """Test with custom rounding to 5 decimal places."""
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores, rounding=5)
    result = codeflash_output  # 50.3μs -> 8.36μs (501% faster)
    expected = round(statistics.pstdev(scores), 5)


def test_edge_empty_list():
    """Test with empty list."""
    scores = []
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.60μs -> 922ns (74.1% faster)


def test_edge_single_element():
    """Test with single element."""
    scores = [5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.82μs -> 2.42μs (24.8% slower)


def test_edge_all_none_values():
    """Test with list containing only None values."""
    scores = [None, None, None]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.81μs -> 1.10μs (64.4% faster)


def test_edge_single_non_none_value():
    """Test with multiple None values but only one non-None value."""
    scores = [None, 5.0, None, None]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 2.01μs -> 2.55μs (21.2% slower)


def test_edge_mixed_none_and_values():
    """Test with mixed None and numeric values."""
    scores = [None, 1.0, None, 2.0, None, 3.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 50.9μs -> 8.16μs (524% faster)
    # After filtering None, we have [1.0, 2.0, 3.0]
    expected = round(statistics.pstdev([1.0, 2.0, 3.0]), 3)


def test_edge_rounding_zero():
    """Test with rounding=0 (should return full precision)."""
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores, rounding=0)
    result = codeflash_output  # 47.1μs -> 5.30μs (789% faster)
    # When rounding=0, `not rounding` is True, so return without rounding
    expected = statistics.pstdev(scores)


def test_edge_rounding_none():
    """Test with rounding=None (should return full precision)."""
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores, rounding=None)
    result = codeflash_output  # 47.5μs -> 5.16μs (822% faster)
    # When rounding=None, `not rounding` is True, so return without rounding
    expected = statistics.pstdev(scores)


def test_edge_rounding_false():
    """Test with rounding=False (should return full precision)."""
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores, rounding=False)
    result = codeflash_output  # 46.6μs -> 5.18μs (799% faster)
    # When rounding=False, `not rounding` is True, so return without rounding
    expected = statistics.pstdev(scores)


def test_edge_very_small_rounding():
    """Test with a high rounding precision (10 decimal places)."""
    scores = [1.123456789, 2.987654321, 3.456789123]
    codeflash_output = _pstdev(scores, rounding=10)
    result = codeflash_output  # 84.5μs -> 8.78μs (862% faster)
    expected = round(statistics.pstdev(scores), 10)


def test_edge_negative_single_value():
    """Test with single negative value."""
    scores = [-5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.86μs -> 2.33μs (20.5% slower)


def test_edge_zero_values():
    """Test with all zero values."""
    scores = [0.0, 0.0, 0.0, 0.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 51.2μs -> 8.02μs (538% faster)


def test_edge_two_identical_elements():
    """Test with exactly two identical elements."""
    scores = [7.0, 7.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 49.5μs -> 7.48μs (562% faster)


def test_edge_two_different_elements():
    """Test with exactly two different elements."""
    scores = [1.0, 3.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 49.3μs -> 7.70μs (541% faster)
    expected = round(statistics.pstdev([1.0, 3.0]), 3)


def test_edge_very_close_values():
    """Test with very close floating point values."""
    scores = [1.0000001, 1.0000002, 1.0000003]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 74.0μs -> 8.10μs (813% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_edge_extreme_value_range():
    """Test with extreme range of values."""
    scores = [0.000001, 1000000.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 72.4μs -> 7.51μs (864% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_edge_rounding_higher_than_precision():
    """Test when rounding decimal places exceed the natural precision."""
    scores = [1.0, 2.0]
    codeflash_output = _pstdev(scores, rounding=15)
    result = codeflash_output  # 49.2μs -> 8.44μs (483% faster)
    expected = round(statistics.pstdev(scores), 15)


def test_edge_none_at_start_and_end():
    """Test with None values at start and end of list."""
    scores = [None, 2.0, 3.0, None]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 49.0μs -> 7.61μs (544% faster)
    expected = round(statistics.pstdev([2.0, 3.0]), 3)


def test_edge_consecutive_none_values():
    """Test with consecutive None values in middle."""
    scores = [1.0, None, None, None, 5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 48.9μs -> 7.72μs (532% faster)
    expected = round(statistics.pstdev([1.0, 5.0]), 3)


def test_large_scale_100_elements():
    """Test with 100 elements to verify scalability."""
    scores = [float(i) for i in range(100)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 133μs -> 22.6μs (490% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_500_elements():
    """Test with 500 elements."""
    scores = [float(i) for i in range(500)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 490μs -> 85.7μs (473% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_1000_elements():
    """Test with 1000 elements."""
    scores = [float(i) for i in range(1000)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 961μs -> 168μs (471% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_repeated_pattern():
    """Test with large list of repeated pattern."""
    # Create pattern [1, 2, 1, 2, ...] repeated 250 times
    scores = [1.0, 2.0] * 250
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 452μs -> 86.3μs (424% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_with_many_none_values():
    """Test with large list containing many None values."""
    # Create list with 800 elements, alternating between None and float
    scores = [float(i) if i % 2 == 0 else None for i in range(800)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 407μs -> 78.1μs (422% faster)
    filtered = [float(i) for i in range(800) if i % 2 == 0]
    expected = round(statistics.pstdev(filtered), 3)


def test_large_scale_negative_range():
    """Test with large list of negative to positive range."""
    scores = [float(i) for i in range(-500, 500)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 939μs -> 165μs (467% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_decimals():
    """Test with large list of decimal values."""
    scores = [i * 0.1 for i in range(1000)]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.79ms -> 166μs (973% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_identical_large_list():
    """Test large list with all identical values."""
    scores = [42.0] * 500
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 447μs -> 86.8μs (415% faster)


def test_large_scale_two_clusters():
    """Test large list with two distinct clusters of values."""
    # 400 values around 10, 600 values around 20
    scores = [10.0] * 400 + [20.0] * 600
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 857μs -> 168μs (410% faster)
    expected = round(statistics.pstdev(scores), 3)


def test_large_scale_custom_rounding():
    """Test large scale with custom rounding."""
    scores = [float(i) * 1.5 for i in range(500)]
    codeflash_output = _pstdev(scores, rounding=2)
    result = codeflash_output  # 516μs -> 87.1μs (493% faster)
    expected = round(statistics.pstdev(scores), 2)


def test_large_scale_full_precision():
    """Test large scale returning full precision (no rounding)."""
    scores = [float(i) / 3.0 for i in range(100)]
    codeflash_output = _pstdev(scores, rounding=0)
    result = codeflash_output  # 288μs -> 19.9μs (1347% faster)
    expected = statistics.pstdev(scores)


def test_type_return_is_float_or_none():
    """Test that return type is always float or None."""
    # With valid data
    scores = [1.0, 2.0, 3.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 50.0μs -> 7.87μs (535% faster)

    # With single element
    scores = [5.0]
    codeflash_output = _pstdev(scores)
    result = codeflash_output  # 1.15μs -> 1.09μs (5.34% faster)


def test_return_precision_matches_rounding_parameter():
    """Test that returned value precision matches rounding parameter."""
    scores = [1.111111, 2.222222, 3.333333]

    # Test rounding=1
    codeflash_output = _pstdev(scores, rounding=1)
    result = codeflash_output  # 73.8μs -> 8.19μs (801% faster)

    # Test rounding=2
    codeflash_output = _pstdev(scores, rounding=2)
    result = codeflash_output  # 46.5μs -> 2.96μs (1471% faster)

    # Test rounding=4
    codeflash_output = _pstdev(scores, rounding=4)
    result = codeflash_output  # 42.4μs -> 2.26μs (1778% faster)


def test_filtering_none_preserves_order_invariance():
    """Test that filtering None values doesn't affect result."""
    values = [1.0, 2.0, 3.0]

    # Same values in different positions with None
    scores1 = [None, 1.0, None, 2.0, 3.0, None]
    scores2 = [1.0, None, 2.0, None, 3.0]
    scores3 = [1.0, 2.0, 3.0, None, None]

    codeflash_output = _pstdev(scores1)
    result1 = codeflash_output  # 49.8μs -> 7.73μs (544% faster)
    codeflash_output = _pstdev(scores2)
    result2 = codeflash_output  # 27.1μs -> 2.68μs (910% faster)
    codeflash_output = _pstdev(scores3)
    result3 = codeflash_output  # 24.2μs -> 2.09μs (1057% faster)
    expected = round(statistics.pstdev(values), 3)


def test_consistency_with_statistics_pstdev():
    """Test that results are consistent with statistics.pstdev."""
    scores = [1.5, 2.7, 3.2, 4.8, 5.1]

    # Get result from our function
    codeflash_output = _pstdev(scores, rounding=10)
    result = codeflash_output  # 85.5μs -> 8.67μs (886% faster)

    # Get expected from statistics module
    expected = statistics.pstdev(scores)


def test_float_vs_int_inputs():
    """Test that integer and float inputs are handled the same."""
    int_scores = [1, 2, 3, 4, 5]
    float_scores = [1.0, 2.0, 3.0, 4.0, 5.0]

    codeflash_output = _pstdev(int_scores)
    result_int = codeflash_output  # 50.0μs -> 8.37μs (497% faster)
    codeflash_output = _pstdev(float_scores)
    result_float = codeflash_output  # 30.6μs -> 2.98μs (927% faster)


def test_order_independence():
    """Test that order of elements doesn't affect result."""
    scores1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    scores2 = [5.0, 1.0, 3.0, 2.0, 4.0]
    scores3 = [5.0, 4.0, 3.0, 2.0, 1.0]

    codeflash_output = _pstdev(scores1)
    result1 = codeflash_output  # 51.9μs -> 7.84μs (562% faster)
    codeflash_output = _pstdev(scores2)
    result2 = codeflash_output  # 28.3μs -> 2.82μs (904% faster)
    codeflash_output = _pstdev(scores3)
    result3 = codeflash_output  # 25.6μs -> 2.31μs (1009% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from unstructured.metrics.utils import _pstdev


def test__pstdev():
    _pstdev([], rounding=0)

🔎 Concolic Coverage Tests

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`codeflash_concolic_xdo_puqm/tmpouw5j67n/test_concolic_coverage.py::test__pstdev`	1.84μs	1.40μs	32.0%✅

To edit these changes, run `git checkout codeflash/optimize-_pstdev-mks5i0w6` and push.


codeflash-ai bot requested a review from aseembits93 January 24, 2026 10:12

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Jan 24, 2026
