⚡️ Speed up function _pstdev by 507%
#279
Open
📄 507% (5.07x) speedup for `_pstdev` in `unstructured/metrics/utils.py`
⏱️ Runtime: 10.4 milliseconds → 1.72 milliseconds (best of 93 runs)
📝 Explanation and details
The optimized code achieves a 507% speedup by eliminating the main performance bottlenecks in the original implementation:
Key Optimizations
1. Single-Pass Computation with Welford's Algorithm
The original code made two full passes over the data:
- A list comprehension filters out `None` values (`scores = [score for score in scores if score is not None]`)
- `statistics.pstdev()` internally iterates again to compute mean and variance
The optimized version uses Welford's online algorithm to compute the population standard deviation in a single pass, updating running statistics (count, mean, sum of squared differences) incrementally as it encounters each non-`None` value.
2. Eliminated Intermediate List Allocation
The original code allocates a new filtered list in memory. For inputs with 1000 elements, this creates a new list structure with associated overhead. The optimized version processes elements on-the-fly without allocating any intermediate collections, reducing memory pressure and allocation costs.
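The single-pass approach described in points 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `pstdev_single_pass`, the `rounding` default, and the handling of empty input are assumptions made for the example.

```python
import math
from typing import Iterable, Optional


def pstdev_single_pass(
    scores: Iterable[Optional[float]], rounding: Optional[int] = 3
) -> Optional[float]:
    """Population standard deviation of the non-None scores, in one pass.

    Hypothetical sketch: name, signature, and defaults are assumptions.
    """
    n = 0       # count of non-None values seen so far
    mean = 0.0  # running mean
    m2 = 0.0    # running sum of squared differences from the mean

    for score in scores:
        if score is None:
            continue  # skip missing values without building a filtered list
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)  # Welford's update uses the new mean

    if n <= 1:
        # Assumption: fewer than two usable values yields None,
        # mirroring the single-element behavior described in the PR.
        return None

    std = math.sqrt(m2 / n)  # population variance divides by n, not n - 1
    return std if rounding is None else round(std, rounding)
```

For example, `pstdev_single_pass([2, None, 4, 4, 4, 5, 5, 7, 9])` skips the `None` entry and returns `2.0`, which matches `statistics.pstdev` on the same eight numbers.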
3. Direct Math Operations
Instead of calling `statistics.pstdev()` (which has its own overhead for parameter validation and general-purpose handling), the optimized code directly computes `variance = M2 / n` and `std = math.sqrt(variance)`, avoiding the function call overhead.
Performance Impact by Scenario
Line profiler data shows the original code spent 90% of its time in `round(statistics.pstdev(scores), rounding)`, making this the critical hot spot.
Test results demonstrate consistent speedups across scenarios:
- `test_basic_two_elements`: 50.0μs → 7.75μs
- `test_large_scale_500_elements`: 490μs → 85.7μs
- `test_large_scale_1000_elements`: 961μs → 168μs
- Inputs with `None` values: up to 10× faster (e.g., `test_filtering_none_preserves_order_invariance`: 910-1057% speedup) because the original code still allocates the filtered list even if most elements are `None`
The optimization is particularly effective when:
- `None` values need filtering (avoids allocating sparse lists)
Edge case note: Single-element lists are slightly slower (20-26%) because the optimized code still performs the loop setup, whereas the original code quickly returns after the filter check. This minor regression is negligible given the function returns `None` for single elements anyway.
✅ Correctness verification report:
⚙️ Existing Unit Tests
- `metrics/test_utils.py::test_stats`
🌀 Generated Regression Tests
🔎 Concolic Coverage Tests
- `codeflash_concolic_xdo_puqm/tmpouw5j67n/test_concolic_coverage.py::test__pstdev`
To edit these changes, `git checkout codeflash/optimize-_pstdev-mks5i0w6` and push.