⚡️ Speed up function get_user_labels by 21% #42

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-get_user_labels-mgzf6y4o


codeflash-ai bot commented Oct 20, 2025

📄 21% speedup (1.21x) for get_user_labels in pr_agent/algo/utils.py

⏱️ Runtime: 7.16 milliseconds → 5.90 milliseconds (best of 287 runs)
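Codeflash reports speedup as a percentage gain. Assuming it is computed as (original - optimized) / optimized, the headline figure follows directly from the runtimes above, and the same formula reproduces the 206% quoted for the large mixed test case. The helper name speedup_pct is hypothetical:

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Percent speedup, assuming the (original - optimized) / optimized convention."""
    return (original - optimized) / optimized * 100


# Headline runtimes from the report, in milliseconds
print(f"{speedup_pct(7.16, 5.90):.1f}%")  # 21.4%
# Large mixed test case, in microseconds
print(f"{speedup_pct(456, 149):.1f}%")  # 206.0%
```

Under this reading, "206% faster" means the optimized call took roughly a third of the original time (456 / 149 ≈ 3.06).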

📝 Explanation and details

The optimization achieves a 21% speedup through three key improvements:

1. Eliminated Redundant Settings Calls
The original code called get_settings() twice: once to read enable_custom_labels and once to read custom_labels. The optimization fetches the settings object once and reuses it from a local variable, reducing context lookups from 67 calls to 34 (roughly a 50% reduction).
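A minimal sketch of the call-count difference, assuming a hypothetical FakeSettings object and a counting get_settings() stand-in (neither is pr_agent's real implementation). Reading both values through repeated get_settings() calls costs two lookups per invocation, while holding the object in a local variable costs one, which is consistent with the drop from 67 to 34 calls:

```python
from typing import Any, Dict, Tuple


class FakeSettings:
    """Hypothetical stand-in for the object returned by get_settings()."""

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {"enable_custom_labels": True}
        self._values: Dict[str, Any] = {"custom_labels": ["custom1", "custom2"]}

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


_SETTINGS = FakeSettings()
call_count = 0


def get_settings() -> FakeSettings:
    """Hypothetical accessor that records every call."""
    global call_count
    call_count += 1
    return _SETTINGS


def read_config_twice() -> Tuple[bool, Any]:
    # Original pattern: one get_settings() call per value read
    enable = get_settings().config.get("enable_custom_labels", False)
    custom = get_settings().get("custom_labels", [])
    return enable, custom


def read_config_once() -> Tuple[bool, Any]:
    # Optimized pattern: fetch the settings object once, then reuse it
    settings = get_settings()
    enable = settings.config.get("enable_custom_labels", False)
    custom = settings.get("custom_labels", [])
    return enable, custom


def count_calls(fn) -> int:
    """Return how many get_settings() calls fn() makes."""
    before = call_count
    fn()
    return call_count - before


print(count_calls(read_config_twice))  # 2
print(count_calls(read_config_once))  # 1
```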

2. O(1) Set Lookups vs O(n) List Lookups

  • Added a constant _SYSTEM_LABELS set for reserved label checking, replacing the inline list ['bug fix', 'tests', ...]
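  • Converted custom_labels to a set, so each membership test is O(1) on average instead of a scan over the whole custom-label list
  • The largest gain appears in the large mixed test case with 1000 labels: 206% faster (456μs → 149μs), a figure that reflects all three changes together rather than this one alone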
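The two lookup structures can be sketched as follows. The members of _SYSTEM_LABELS are taken from the reserved labels used in the generated tests (assumed to match the PR's constant), and is_filtered is a hypothetical helper, not code from the PR. Reserved labels are compared after lower() and custom labels exactly, matching the behavior the tests describe:

```python
from typing import FrozenSet, List, Set

# Reserved labels as they appear in the generated tests (assumed to match the PR's constant)
_SYSTEM_LABELS: FrozenSet[str] = frozenset(
    {"bug fix", "tests", "enhancement", "documentation", "other"}
)


def is_filtered(label: str, custom: Set[str], enable_custom_labels: bool) -> bool:
    """Hypothetical helper: True if the label should be dropped."""
    # Reserved labels match case-insensitively
    if label.lower() in _SYSTEM_LABELS:
        return True
    # Custom labels match exactly (case-sensitive), and only when enabled
    return enable_custom_labels and label in custom


custom_labels: List[str] = ["mylabel", "feature"]
custom_set = set(custom_labels)  # built once per call: O(m), then O(1) average per lookup

labels = ["Bug Fix", "mylabel", "MyLabel", "user1"]
print([label for label in labels if not is_filtered(label, custom_set, True)])
# ['MyLabel', 'user1']
```

Building the set costs O(m) for m custom labels, so the gain shows up when many labels are checked against a long custom list; with only a handful of labels the difference is small.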

3. List Comprehension Instead of Manual Loop
Replaced the explicit for-loop with a list comprehension that combines both filtering conditions in a single expression. This avoids looking up and calling the list's append method for every kept label, because a comprehension appends through a dedicated bytecode instruction.
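A sketch of this rewrite, using the hypothetical names loop_version and comprehension_version and the reserved labels from the tests. Both conditions (reserved and custom) fold into one filter expression, and the two versions return the same list:

```python
from typing import Iterable, List, Set

SYSTEM = frozenset({"bug fix", "tests", "enhancement", "documentation", "other"})


def loop_version(labels: Iterable[str], custom: Set[str], enable: bool) -> List[str]:
    # Explicit loop: two early-continue checks, then an append call per kept label
    kept: List[str] = []
    for label in labels:
        if label.lower() in SYSTEM:
            continue
        if enable and label in custom:
            continue
        kept.append(label)
    return kept


def comprehension_version(labels: Iterable[str], custom: Set[str], enable: bool) -> List[str]:
    # Same filter as one comprehension; CPython appends via LIST_APPEND, not a method call
    return [
        label
        for label in labels
        if label.lower() not in SYSTEM and not (enable and label in custom)
    ]


sample = ["bug fix", "custom1", "user1", "Tests", "user2"]
print(comprehension_version(sample, {"custom1"}, True))  # ['user1', 'user2']
print(comprehension_version(sample, {"custom1"}, False))  # ['custom1', 'user1', 'user2']
```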

The optimizations are particularly effective for:

  • Large label lists: roughly 11-15% faster in most of the 1000-label tests
  • Mixed label scenarios: Up to 206% faster when filtering both system and custom labels
  • High custom label counts: 11.5% improvement with 500 custom labels

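The changes preserve the function's output across all 33 generated regression tests while improving performance through O(1) set lookups and fewer function calls. The one measured regression is the exception path (test_exception_in_settings), which runs 6.13% slower.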
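Codeflash checks equivalence by comparing codeflash_output from the original and optimized code. A self-contained sketch of the same idea, assuming simplified stand-ins original_filter and optimized_filter (hypothetical, with the settings and logging calls left out), compares both on seeded random inputs that include case variants, padded labels, and empty strings:

```python
import random
from typing import List, Sequence

_SYSTEM_LABELS = frozenset({"bug fix", "tests", "enhancement", "documentation", "other"})


def original_filter(labels: Sequence[str], custom: List[str], enable: bool) -> List[str]:
    # Original shape: inline list, list membership for custom labels, explicit loop
    kept = []
    for label in labels:
        if label.lower() in ["bug fix", "tests", "enhancement", "documentation", "other"]:
            continue
        if enable and label in custom:
            continue
        kept.append(label)
    return kept


def optimized_filter(labels: Sequence[str], custom: List[str], enable: bool) -> List[str]:
    # Optimized shape: constant set, custom labels converted to a set, one comprehension
    custom_set = set(custom)
    return [
        label
        for label in labels
        if label.lower() not in _SYSTEM_LABELS and not (enable and label in custom_set)
    ]


POOL = ["bug fix", "Bug Fix", "TESTS", "other", "custom1", "Custom1", "user1", " bug fix ", ""]


def differential_check(trials: int = 1000, seed: int = 0) -> bool:
    """Return True if both filters agree on every random input."""
    rng = random.Random(seed)
    for _ in range(trials):
        labels = [rng.choice(POOL) for _ in range(rng.randint(0, 10))]
        custom = rng.sample(POOL, k=rng.randint(0, 3))
        enable = rng.random() < 0.5
        if original_filter(labels, custom, enable) != optimized_filter(labels, custom, enable):
            return False
    return True


print(differential_check())  # True
```

This covers only the pure filtering logic; the exception fallback (returning the input labels when get_settings() raises) is exercised separately by test_exception_in_settings.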

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 33 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import List
# We'll need to patch get_settings and get_logger for deterministic behavior.
from unittest.mock import MagicMock, patch

# imports
import pytest
from pr_agent.algo.utils import get_user_labels

# Patch targets for get_settings and get_logger
# We'll use patch as a decorator for each test to keep tests isolated and clear.

# =========================
# Basic Test Cases
# =========================

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_empty_input(mock_get_settings, mock_get_logger):
    # Test with no current_labels (None)
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    codeflash_output = get_user_labels(); result = codeflash_output # 30.0μs -> 13.4μs (124% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_no_labels(mock_get_settings, mock_get_logger):
    # Test with empty list
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    codeflash_output = get_user_labels([]); result = codeflash_output # 25.8μs -> 11.7μs (122% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_all_reserved_labels(mock_get_settings, mock_get_logger):
    # All labels are reserved, should return empty
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    reserved = ['bug fix', 'tests', 'enhancement', 'documentation', 'other']
    codeflash_output = get_user_labels(reserved); result = codeflash_output # 26.5μs -> 12.4μs (113% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_case_insensitive_reserved_labels(mock_get_settings, mock_get_logger):
    # Reserved labels with different case
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    reserved = ['Bug Fix', 'TESTS', 'Enhancement', 'Documentation', 'Other']
    codeflash_output = get_user_labels(reserved); result = codeflash_output # 25.1μs -> 11.9μs (111% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_mixed_labels_no_custom(mock_get_settings, mock_get_logger):
    # Mixture of reserved and user labels, no custom labels enabled
    mock_get_settings.return_value.config = {'enable_custom_labels': False}
    mock_get_settings.return_value.get.return_value = []
    labels = ['bug fix', 'mylabel', 'Enhancement', 'feature', 'other']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 43.5μs -> 32.5μs (33.9% faster)

# =========================
# Edge Test Cases
# =========================

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_custom_labels_enabled_exclude_custom(mock_get_settings, mock_get_logger):
    # Custom labels enabled, should exclude those in custom_labels
    mock_get_settings.return_value.config = {'enable_custom_labels': True}
    mock_get_settings.return_value.get.return_value = ['mylabel', 'feature']
    labels = ['mylabel', 'feature', 'user1', 'documentation']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 40.8μs -> 39.9μs (2.38% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_custom_labels_enabled_no_overlap(mock_get_settings, mock_get_logger):
    # Custom labels enabled, but none overlap
    mock_get_settings.return_value.config = {'enable_custom_labels': True}
    mock_get_settings.return_value.get.return_value = ['custom1', 'custom2']
    labels = ['user1', 'user2']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 39.6μs -> 37.8μs (4.50% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_custom_labels_disabled(mock_get_settings, mock_get_logger):
    # Custom labels disabled, should not exclude any custom labels
    mock_get_settings.return_value.config = {'enable_custom_labels': False}
    mock_get_settings.return_value.get.return_value = ['mylabel']
    labels = ['mylabel', 'user1']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 39.4μs -> 27.1μs (45.4% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_label_with_whitespace(mock_get_settings, mock_get_logger):
    # Label with leading/trailing whitespace (should not match reserved)
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    labels = [' bug fix ', 'user1']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 40.8μs -> 27.8μs (46.6% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_label_substring_reserved(mock_get_settings, mock_get_logger):
    # Label is a substring of reserved, should not match
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    labels = ['bug', 'fix', 'enhance', 'documentation', 'other']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 40.3μs -> 28.5μs (41.0% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_exception_in_settings(mock_get_settings, mock_get_logger):
    # get_settings raises exception, should return input as output
    mock_get_settings.side_effect = Exception("Settings error")
    labels = ['user1', 'user2']
    codeflash_output = get_user_labels(labels); result = codeflash_output # 25.8μs -> 27.5μs (6.13% slower)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_large_input_all_user_labels(mock_get_settings, mock_get_logger):
    # Large number of user labels
    mock_get_settings.return_value.config = {'enable_custom_labels': False}
    mock_get_settings.return_value.get.return_value = []
    labels = [f"user_label_{i}" for i in range(1000)]
    codeflash_output = get_user_labels(labels); result = codeflash_output # 129μs -> 114μs (12.6% faster)

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_large_input_mixed_labels(mock_get_settings, mock_get_logger):
    # Large number of mixed labels
    mock_get_settings.return_value.config = {'enable_custom_labels': True}
    # Custom labels: every 10th label is custom
    custom_labels = [f"user_label_{i}" for i in range(0, 1000, 10)]
    mock_get_settings.return_value.get.return_value = custom_labels
    labels = []
    for i in range(1000):
        if i % 20 == 0:
            labels.append('bug fix')  # reserved
        elif i % 10 == 0:
            labels.append(f"user_label_{i}")  # custom
        else:
            labels.append(f"other_label_{i}")  # user
    codeflash_output = get_user_labels(labels); result = codeflash_output # 456μs -> 149μs (206% faster)
    # Should remove reserved and custom, keep user
    expected = [f"other_label_{i}" for i in range(1000) if i % 10 != 0 and i % 20 != 0]
    assert result == expected

@patch("pr_agent.algo.utils.get_logger", return_value=MagicMock())
@patch("pr_agent.algo.utils.get_settings")
def test_large_input_all_reserved(mock_get_settings, mock_get_logger):
    # Large number of reserved labels
    mock_get_settings.return_value.config = {}
    mock_get_settings.return_value.get.return_value = []
    reserved = ['bug fix', 'tests', 'enhancement', 'documentation', 'other']
    labels = [reserved[i % len(reserved)] for i in range(1000)]
    codeflash_output = get_user_labels(labels); result = codeflash_output # 55.9μs -> 43.2μs (29.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import List

# imports
import pytest
from pr_agent.algo.utils import get_user_labels

# get_user_labels is imported from pr_agent/algo/utils.py above.
# Minimal stubs for get_settings and get_logger are defined below.


# --- Dependency stubs/mocks for isolated testing ---

class DummyLogger:
    def __init__(self):
        self.debug_calls = []
        self.exception_calls = []
    def debug(self, msg):
        self.debug_calls.append(msg)
    def exception(self, msg):
        self.exception_calls.append(msg)

class DummySettings:
    def __init__(self, config=None, custom_labels=None):
        self.config = config or {}
        self._custom_labels = custom_labels if custom_labels is not None else []
    def get(self, key, default=None):
        if key == "custom_labels":
            return self._custom_labels
        return default

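# Note: these stubs are not patched into pr_agent.algo.utils, so the tests below
# run against the real get_settings()/get_logger(); assigning to _test_settings
# does not change what get_user_labels sees.
_test_settings = DummySettings()
_test_logger = DummyLogger()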

# --- Basic Test Cases ---

def test_empty_labels_returns_empty_list():
    # No labels provided
    codeflash_output = get_user_labels([]) # 279μs -> 235μs (18.4% faster)

def test_none_labels_returns_empty_list():
    # None as input should be treated as empty list
    codeflash_output = get_user_labels(None) # 249μs -> 209μs (19.0% faster)

def test_removes_predefined_labels_case_insensitive():
    # Should remove all predefined labels regardless of case
    labels = ['Bug Fix', 'tests', 'ENHANCEMENT', 'Documentation', 'other']
    codeflash_output = get_user_labels(labels) # 243μs -> 207μs (17.6% faster)

def test_mixed_predefined_and_user_labels():
    # Should remove only predefined labels, keep user labels
    labels = ['Bug Fix', 'custom1', 'ENHANCEMENT', 'featureX']
    codeflash_output = get_user_labels(labels) # 332μs -> 290μs (14.2% faster)

def test_all_user_labels_are_kept():
    # No predefined or custom labels, all should be kept
    labels = ['label1', 'label2', 'label3']
    codeflash_output = get_user_labels(labels) # 317μs -> 273μs (16.1% faster)

def test_logger_debug_called_when_user_labels_kept():
    # If user labels are kept, logger.debug should be called
    labels = ['label1']
    get_user_labels(labels) # 313μs -> 271μs (15.6% faster)

def test_logger_debug_not_called_when_no_user_labels():
    # If no user labels, logger.debug should not be called
    labels = ['bug fix', 'tests']
    get_user_labels(labels) # 251μs -> 212μs (18.7% faster)

# --- Edge Test Cases ---

def test_labels_with_leading_trailing_spaces():
    # Surrounding spaces prevent a match; only exact (case-insensitive) matches are removed
    labels = [' bug fix ', 'tests ', ' enhancement', 'userlabel']
    # None of these is an exact match for a reserved label, so all four should remain
    codeflash_output = get_user_labels(labels) # 324μs -> 276μs (17.3% faster)

def test_label_is_substring_of_predefined():
    # 'bug' is not 'bug fix', so it should be kept
    labels = ['bug', 'fix', 'bug fix']
    codeflash_output = get_user_labels(labels) # 314μs -> 272μs (15.4% faster)

def test_custom_labels_are_removed_when_enabled():
    # Custom labels are removed if enable_custom_labels is True
    _test_settings.config['enable_custom_labels'] = True
    _test_settings._custom_labels = ['custom1', 'custom2']
    labels = ['custom1', 'custom2', 'userlabel']
    codeflash_output = get_user_labels(labels) # 316μs -> 272μs (16.0% faster)

def test_custom_labels_are_kept_when_disabled():
    # Custom labels are not removed if enable_custom_labels is False
    _test_settings.config['enable_custom_labels'] = False
    _test_settings._custom_labels = ['custom1', 'custom2']
    labels = ['custom1', 'custom2', 'userlabel']
    codeflash_output = get_user_labels(labels) # 314μs -> 271μs (15.7% faster)

def test_custom_labels_case_sensitive():
    # Custom label matching is case-sensitive
    _test_settings.config['enable_custom_labels'] = True
    _test_settings._custom_labels = ['Custom1']
    labels = ['custom1', 'Custom1']
    # Only 'Custom1' matches and is removed, 'custom1' remains
    codeflash_output = get_user_labels(labels) # 315μs -> 269μs (17.3% faster)

def test_duplicate_labels():
    # Duplicates are preserved if not filtered
    labels = ['label1', 'label1', 'bug fix', 'label1']
    codeflash_output = get_user_labels(labels) # 313μs -> 271μs (15.6% faster)

def test_labels_with_special_characters():
    # Special characters should not interfere
    labels = ['!@#', 'bug fix', 'label_123', 'tests']
    codeflash_output = get_user_labels(labels) # 311μs -> 270μs (15.4% faster)

def test_labels_with_empty_string():
    # Empty string is not a predefined label, so it should be kept
    labels = ['', 'bug fix', 'label']
    codeflash_output = get_user_labels(labels) # 312μs -> 271μs (14.8% faster)



def test_large_number_of_labels_performance():
    # Test with 1000 labels, half predefined, half user
    user_labels = [f'userlabel{i}' for i in range(500)]
    predefined_labels = ['bug fix', 'tests', 'enhancement', 'documentation', 'other'] * 100
    labels = user_labels + predefined_labels
    # Should keep all user labels, remove all predefined
    codeflash_output = get_user_labels(labels); result = codeflash_output # 425μs -> 382μs (11.0% faster)

def test_large_number_of_custom_labels():
    # 500 custom labels, 500 user labels, enable_custom_labels=True
    _test_settings.config['enable_custom_labels'] = True
    _test_settings._custom_labels = [f'custom{i}' for i in range(500)]
    labels = [f'custom{i}' for i in range(500)] + [f'user{i}' for i in range(500)]
    codeflash_output = get_user_labels(labels); result = codeflash_output # 408μs -> 366μs (11.5% faster)

def test_large_number_of_duplicates():
    # 1000 labels, all the same user label
    labels = ['userlabel'] * 1000
    codeflash_output = get_user_labels(labels); result = codeflash_output # 400μs -> 350μs (14.2% faster)

def test_large_mixed_labels():
    # 250 user, 250 predefined, 250 custom, 250 duplicates
    _test_settings.config['enable_custom_labels'] = True
    _test_settings._custom_labels = [f'custom{i}' for i in range(250)]
    labels = (
        [f'user{i}' for i in range(250)] +
        ['bug fix', 'tests', 'enhancement', 'documentation', 'other'] * 50 +
        [f'custom{i}' for i in range(250)] +
        ['user0'] * 250
    )
    codeflash_output = get_user_labels(labels); result = codeflash_output # 396μs -> 345μs (14.7% faster)
    # Only user labels and duplicates of 'user0' should remain
    expected = [f'user{i}' for i in range(250)] + ['user0'] * 250
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-get_user_labels-mgzf6y4o and push.

codeflash-ai bot requested a review from mashraf-222 on October 20, 2025 17:39
codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Oct 20, 2025