⚡️ Speed up function find_common_tags by 64% #306

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-find_common_tags-mn00300f

codeflash-ai bot commented Mar 21, 2026

📄 64% (0.64x) speedup for find_common_tags in src/algorithms/string.py

⏱️ Runtime: 11.3 milliseconds → 6.88 milliseconds (best of 71 runs)
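
The headline figure reads as a speedup relative to the optimized runtime, that is (original - optimized) / optimized. The quick check below assumes that definition; it matches the reported numbers, but the PR does not spell the formula out.

```python
# Runtimes reported in the PR summary, in milliseconds.
original_ms = 11.3
optimized_ms = 6.88

# Assumed definition: time saved, relative to the optimized runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

print(f"{speedup:.2f}x ({speedup:.0%} faster)")  # prints "0.64x (64% faster)"
```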

📝 Explanation and details

Reduced runtime from 11.3ms to 6.88ms (64% faster) by initializing the common set from the article with the fewest tags and avoiding the articles[1:] slice. Because set.intersection_update costs scale with the size of the left-hand set and slicing allocates a new list, starting from the smallest tag set cuts the amount of work and memory churn. The line profiler shows intersection_update's share of time falling from ~54.8% to ~34.8%, and the total profiled time for the routine roughly halving. The trade-off is one extra O(n) scan to find the smallest tag list plus a per-iteration skip check. This can regress on inputs consisting of many articles with uniformly tiny tag lists (about 15-16% slower in the two large uniform benchmarks, test_large_number_of_articles and test_performance_with_large_data), but it delivers clear wins for imbalanced or large tag collections.
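
A minimal sketch of the change described above, assuming the original seeded the set from articles[0] and looped over articles[1:]. The actual diff is not shown on this page, so both function bodies and their names (find_common_tags_baseline, find_common_tags_smallest_first) are illustrative assumptions rather than the PR's code.

```python
from __future__ import annotations


def find_common_tags_baseline(articles: list[dict[str, list]]) -> set:
    """Assumed shape of the original: seed from the first article."""
    if not articles:
        return set()
    common = set(articles[0]["tags"])
    for article in articles[1:]:  # the slice copies the list
        common.intersection_update(article["tags"])
    return common


def find_common_tags_smallest_first(articles: list[dict[str, list]]) -> set:
    """Assumed shape of the optimization: seed from the shortest tag list."""
    if not articles:
        return set()
    # One extra O(n) pass to locate the article with the fewest tags.
    smallest_idx = min(range(len(articles)), key=lambda i: len(articles[i]["tags"]))
    common = set(articles[smallest_idx]["tags"])
    for i, article in enumerate(articles):  # no slice, so no list copy
        if i == smallest_idx:  # the per-iteration skip check
            continue
        common.intersection_update(article["tags"])
    return common


articles = [
    {"tags": ["python", "coding"]},
    {"tags": ["python"]},
    {"tags": ["python", "coding", "tutorial"]},
]
print(find_common_tags_smallest_first(articles))  # prints {'python'}
```

The skip check and the extra scan are pure overhead when every tag list has the same small size, which is consistent with the regressions reported for the large uniform inputs.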

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 2 Passed |
| 🌀 Generated Regression Tests | 29 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_common_tags.py::test_common_tags_1 | 4.04μs | 3.12μs | 29.3%✅ |

🌀 Generated Regression Tests

# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 1.29μs -> 1.04μs (24.1% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]},
    ]
    codeflash_output = find_common_tags(articles)  # 2.12μs -> 1.88μs (13.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 541ns -> 292ns (85.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [{"tags": ["python"]}, {"tags": ["java"]}, {"tags": ["c++"]}]
    codeflash_output = find_common_tags(articles)  # 1.58μs -> 1.58μs (0.000% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": ["python"]}, {"tags": ["python", "java"]}]
    codeflash_output = find_common_tags(articles)  # 1.33μs -> 500ns (167% faster)
    # Outputs were verified to be equal to the original implementation


def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": []}, {"tags": []}]
    codeflash_output = find_common_tags(articles)  # 1.38μs -> 458ns (200% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [{"tags": ["python!", "coding"]}, {"tags": ["python!", "data"]}]
    codeflash_output = find_common_tags(articles)  # 1.67μs -> 1.58μs (5.24% faster)
    # Outputs were verified to be equal to the original implementation


def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [{"tags": ["Python", "coding"]}, {"tags": ["python", "data"]}]
    codeflash_output = find_common_tags(articles)  # 1.62μs -> 1.54μs (5.38% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 182μs -> 217μs (16.0% slower)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]},
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 110μs -> 59.7μs (84.5% faster)
    assert codeflash_output == expected
    # Outputs were verified to be equal to the original implementation


def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.88μs -> 1.96μs (4.24% slower)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_different_data_types():
    # Tags of different types are never equal: 123 and "123" are distinct tags
    articles = [{"tags": ["python", 123]}, {"tags": ["python", "123"]}]
    codeflash_output = find_common_tags(articles)  # 1.50μs -> 1.54μs (2.72% slower)
    # Outputs were verified to be equal to the original implementation


def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 1.83ms -> 2.16ms (15.2% slower)
    # Outputs were verified to be equal to the original implementation


def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [
        {"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)
    ]
    codeflash_output = find_common_tags(articles)  # 547μs -> 433μs (26.5% faster)
    # Outputs were verified to be equal to the original implementation
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 792ns -> 333ns (138% faster)
    # Outputs were verified to be equal to the original implementation


def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags(
        [{"tags": ["python", "coding", "development"]}]
    )  # 1.62μs -> 1.21μs (34.5% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 667ns -> 292ns (128% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]},
    ]
    codeflash_output = find_common_tags(articles)  # 3.00μs -> 2.08μs (44.0% faster)

    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.25μs -> 1.08μs (15.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.83μs -> 1.62μs (12.9% faster)

    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]},
    ]
    codeflash_output = find_common_tags(articles)  # 959ns -> 917ns (4.58% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]},
    ]
    codeflash_output = find_common_tags(articles)  # 2.12μs -> 1.96μs (8.53% faster)

    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.25μs -> 1.12μs (11.1% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.50μs -> 1.58μs (5.24% slower)

    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 833ns -> 917ns (9.16% slower)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]},
    ]
    codeflash_output = find_common_tags(articles)  # 2.17μs -> 2.00μs (8.30% faster)

    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.21μs -> 1.12μs (7.47% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 5.76ms -> 3.96ms (45.5% faster)
    assert codeflash_output == expected_output

    # Test with large scale input where no tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)] + [
        {"tags": ["unique_tag"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.84ms -> 20.2μs (14002% faster)
    # Outputs were verified to be equal to the original implementation
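
The generated tests above record timings and note that outputs were verified against the original implementation, but that comparison is not shown in the listing. The sketch below shows one way such an equivalence check could be written. Everything in it is hypothetical: reference_common_tags, candidate_common_tags, and outputs_match are stand-ins invented for illustration, not Codeflash's verification mechanism or the PR's code.

```python
from __future__ import annotations

from typing import Callable


def reference_common_tags(articles: list[dict[str, list]]) -> set:
    """Hypothetical stand-in for the original implementation."""
    if not articles:
        return set()
    common = set(articles[0]["tags"])
    for article in articles[1:]:
        common &= set(article["tags"])
    return common


def candidate_common_tags(articles: list[dict[str, list]]) -> set:
    """Hypothetical stand-in for an optimized implementation."""
    if not articles:
        return set()
    # Intersect all tag sets, smallest first.
    tag_sets = sorted((set(a["tags"]) for a in articles), key=len)
    return set.intersection(*tag_sets)


def outputs_match(
    reference: Callable[[list], set],
    candidate: Callable[[list], set],
    cases: list[list],
) -> bool:
    """Return True when both implementations agree on every input."""
    return all(reference(case) == candidate(case) for case in cases)


CASES = [
    [],
    [{"tags": []}, {"tags": ["python"]}],
    [{"tags": ["python", "coding"]}, {"tags": ["python", "data"]}],
    [{"tags": ["Python", "coding"]}, {"tags": ["python", "data"]}],
    [{"tags": [None, "news"]}, {"tags": ["tech", None]}],
]

print(outputs_match(reference_common_tags, candidate_common_tags, CASES))  # prints True
```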

To edit these changes git checkout codeflash/optimize-find_common_tags-mn00300f and push.

codeflash-ai bot requested a review from KRRT7 March 21, 2026 07:22
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Mar 21, 2026