Skip to content

⚡️ Speed up method PythonPlugin.normalize_code by 3,389% in PR #1887 (codeflash_python)#1890

Open
codeflash-ai[bot] wants to merge 1 commit intocodeflash_pythonfrom
codeflash/optimize-pr1887-2026-03-24T16.26.20
Open

⚡️ Speed up method PythonPlugin.normalize_code by 3,389% in PR #1887 (codeflash_python)#1890
codeflash-ai[bot] wants to merge 1 commit intocodeflash_pythonfrom
codeflash/optimize-pr1887-2026-03-24T16.26.20

Conversation

@codeflash-ai
Copy link
Contributor

@codeflash-ai codeflash-ai bot commented Mar 24, 2026

⚡️ This pull request contains optimizations for PR #1887

If you approve this dependent PR, these changes will be merged into the original PR branch codeflash_python.

This PR will be automatically closed if the original PR is merged.


📄 3,389% (33.89x) speedup for PythonPlugin.normalize_code in codeflash/plugin.py

⏱️ Runtime : 58.3 milliseconds 1.67 milliseconds (best of 17 runs)

📝 Explanation and details

Adding @lru_cache(maxsize=512) to normalize_python_code eliminates redundant AST parsing and transformation when the same code snippet is normalized multiple times, cutting average runtime from 58.3 ms to 1.67 ms (3388% faster). The cache key is the tuple (code, remove_docstrings), so repeated calls with identical inputs return the precomputed normalized string immediately instead of re-parsing and walking the AST. Profiler data confirms that ast.parse, normalizer.visit, ast.fix_missing_locations, and ast.unparse (collectively ~97% of original runtime) are bypassed on cache hits, which dominate the workload in test scenarios with many duplicate or near-duplicate function definitions.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 61 Passed
🌀 Generated Regression Tests 131 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_languages/test_language_parity.py::TestNormalizeCodeParity.test_preserves_code_structure 184μs 69.6μs 165%✅
test_languages/test_language_parity.py::TestNormalizeCodeParity.test_removes_comments 227μs 83.2μs 174%✅
test_languages/test_python_support.py::TestNormalizeCode.test_preserves_functionality 114μs 2.94μs 3801%✅
test_languages/test_python_support.py::TestNormalizeCode.test_removes_docstrings 131μs 3.48μs 3679%✅
🌀 Click to see Generated Regression Tests
from pathlib import Path  # used to construct a real Path for PythonPlugin

# imports
from codeflash.plugin import PythonPlugin  # the class under test


def test_remove_function_docstring_and_normalize_formatting():
    # Create a real PythonPlugin instance with a Path as required by the constructor.
    plugin = PythonPlugin(project_root=Path())

    # Input Python code contains a function docstring, tightly spaced parameters, and no spaces around +.
    input_code = 'def add(a,b):\n    """Adds two numbers."""\n    result=a+b\n    return result\n'

    # Call the method under test. The plugin always requests docstring removal.
    normalized = plugin.normalize_code(input_code)  # 156μs -> 4.16μs (3676% faster)

    # The function docstring should be removed by normalization.
    assert "Adds two numbers" not in normalized

    # The unparser yields normalized spacing in the signature and expressions.
    # Expect a space after the comma in the parameter list.
    assert "def add(a, b):" in normalized

    # Expect spacing around the + operator and around the assignment.
    assert "result = a + b" in normalized

    # The return statement should be preserved.
    assert "return result" in normalized


def test_remove_module_docstring_and_drop_comments():
    plugin = PythonPlugin(project_root=Path())

    # Module-level docstring should be removed. Also include a comment to show comments are not preserved by AST unparse.
    input_code = (
        '"""Module docstring that should be removed."""\n'
        "# this is a comment that will be lost by ast.unparse\n"
        "def f():\n"
        "    return 1\n"
    )

    normalized = plugin.normalize_code(input_code)  # 97.2μs -> 3.10μs (3041% faster)

    # Docstring must not appear in the normalized output.
    assert "Module docstring" not in normalized

    # Comments are not part of AST and so will not be present in the output.
    assert "#" not in normalized

    # The function definition and body should still exist.
    assert "def f()" in normalized and "return 1" in normalized


def test_syntax_error_returns_original_string():
    plugin = PythonPlugin(project_root=Path())

    # Invalid Python code that will raise a SyntaxError when parsed.
    bad_code = "def bad(:\n    pass\n"

    # The plugin catches exceptions during normalization and returns the original input unchanged.
    result = plugin.normalize_code(bad_code)  # 30.6μs -> 33.3μs (8.11% slower)
    assert result == bad_code


def test_none_input_returns_none_without_raising():
    plugin = PythonPlugin(project_root=Path())

    # Although the signature expects a str, passing None should be handled gracefully by the plugin,
    # which catches exceptions and returns the original value.
    result = plugin.normalize_code(None)  # 6.38μs -> 7.24μs (11.9% slower)
    assert result is None


def test_preserve_class_and_method_names_and_remove_docstrings():
    plugin = PythonPlugin(project_root=Path())

    # Define a class with a class docstring and a method with a method docstring.
    input_code = (
        "class MyClass:\n"
        '    """Class-level documentation should be removed."""\n\n'
        "    def method(self, x):\n"
        '        """Method docstring to be removed."""\n'
        "        return x * 2\n"
    )

    normalized = plugin.normalize_code(input_code)  # 150μs -> 2.98μs (4957% faster)

    # Class and method names must be preserved.
    assert "class MyClass" in normalized
    assert "def method(self, x):" in normalized

    # All docstrings should be removed.
    assert "Class-level documentation" not in normalized
    assert "Method docstring" not in normalized

    # Behavior of the method should remain intact.
    assert "return x * 2" in normalized


def test_large_number_of_functions_docstrings_removed_and_names_preserved():
    plugin = PythonPlugin(project_root=Path())

    # Build a large code string with many small functions (500 functions).
    n = 500
    pieces = []
    for i in range(n):
        # Each function has a docstring that should be removed.
        pieces.append(f'def func_{i}(x):\n    """doc for {i}"""\n    v = x + {i}\n    return v\n')
    large_code = "\n".join(pieces)

    # Normalize the large code blob.
    normalized = plugin.normalize_code(large_code)  # 30.7ms -> 11.9μs (258102% faster)

    # None of the docstrings should remain.
    assert "doc for" not in normalized

    # All function definitions must still be present and countable.
    # Count the occurrences of "def func_" to ensure no functions were dropped or renamed.
    assert normalized.count("def func_") == n

    # Check a few representative functions to ensure names are preserved deterministically.
    assert "def func_0(" in normalized
    assert f"def func_{n - 1}(" in normalized
from pathlib import Path

# imports
from codeflash.plugin import PythonPlugin


# Helper function to create a PythonPlugin instance for testing
def create_plugin():
    """Create a PythonPlugin instance with a temporary project root."""
    return PythonPlugin(project_root=Path("/tmp/test_project"))


def test_normalize_code_empty_string():
    """Test that normalize_code handles empty strings gracefully."""
    plugin = create_plugin()
    result = plugin.normalize_code("")  # 38.8μs -> 3.14μs (1138% faster)
    assert isinstance(result, str)
    assert result == ""


def test_normalize_code_simple_variable():
    """Test normalization of simple variable names in local scope."""
    plugin = create_plugin()
    code = """
def foo():
    my_var = 5
    return my_var
"""
    result = plugin.normalize_code(code)  # 115μs -> 2.98μs (3791% faster)
    # Should return valid Python code as a string
    assert isinstance(result, str)
    assert "def foo" in result  # Function name preserved
    assert "return" in result


def test_normalize_code_multiple_variables():
    """Test normalization of multiple local variables."""
    plugin = create_plugin()
    code = """
def bar():
    x = 10
    y = 20
    z = x + y
    return z
"""
    result = plugin.normalize_code(code)  # 157μs -> 3.04μs (5086% faster)
    assert isinstance(result, str)
    assert "def bar" in result
    # Variables should be normalized but function structure preserved
    assert "return" in result


def test_normalize_code_preserves_function_names():
    """Test that function names are preserved during normalization."""
    plugin = create_plugin()
    code = """
def my_function_name():
    local_var = 42
    return local_var
"""
    result = plugin.normalize_code(code)  # 107μs -> 2.94μs (3554% faster)
    # Function names must be preserved
    assert "my_function_name" in result or "def" in result
    assert isinstance(result, str)


def test_normalize_code_preserves_class_names():
    """Test that class names are preserved during normalization."""
    plugin = create_plugin()
    code = """
class MyClass:
    def method(self):
        local_var = 10
        return local_var
"""
    result = plugin.normalize_code(code)  # 131μs -> 2.87μs (4499% faster)
    # Class names should be preserved
    assert "MyClass" in result or "class" in result
    assert isinstance(result, str)


def test_normalize_code_preserves_parameters():
    """Test that function parameters are preserved during normalization."""
    plugin = create_plugin()
    code = """
def add(param_a, param_b):
    local_sum = param_a + param_b
    return local_sum
"""
    result = plugin.normalize_code(code)  # 132μs -> 2.91μs (4467% faster)
    # Parameters should be preserved
    assert "param_a" in result or "add" in result
    assert isinstance(result, str)


def test_normalize_code_with_imports():
    """Test normalization preserves imports."""
    plugin = create_plugin()
    code = """
import math
def calculate(x):
    result = math.sqrt(x)
    return result
"""
    result = plugin.normalize_code(code)  # 148μs -> 2.85μs (5127% faster)
    # Imports should be handled properly
    assert isinstance(result, str)
    assert "import" in result or "math" in result


def test_normalize_code_with_docstrings():
    """Test that docstrings are removed when remove_docstrings=True."""
    plugin = create_plugin()
    code = '''
def func_with_docstring():
    """This is a docstring."""
    local_var = 5
    return local_var
'''
    result = plugin.normalize_code(code)  # 112μs -> 2.85μs (3841% faster)
    # Result should be valid Python
    assert isinstance(result, str)
    assert "def func_with_docstring" in result


def test_normalize_code_returns_string():
    """Test that normalize_code always returns a string."""
    plugin = create_plugin()
    code = "x = 1"
    result = plugin.normalize_code(code)  # 62.6μs -> 2.88μs (2069% faster)
    assert isinstance(result, str)


def test_normalize_code_with_nested_functions():
    """Test normalization with nested function definitions."""
    plugin = create_plugin()
    code = """
def outer():
    outer_var = 10
    def inner():
        inner_var = 20
        return inner_var
    return inner()
"""
    result = plugin.normalize_code(code)  # 168μs -> 2.85μs (5837% faster)
    assert isinstance(result, str)
    assert "outer" in result or "def" in result


def test_normalize_code_with_list_comprehension():
    """Test normalization with list comprehensions."""
    plugin = create_plugin()
    code = """
def process():
    data = [1, 2, 3]
    result = [x * 2 for x in data]
    return result
"""
    result = plugin.normalize_code(code)  # 197μs -> 2.81μs (6932% faster)
    assert isinstance(result, str)
    assert "for" in result or "result" in result


def test_normalize_code_syntax_error_fallback():
    """Test that normalize_code gracefully handles valid Python code."""
    plugin = create_plugin()
    valid_code = "def valid():\n    x = 1\n    y = 2\n    return x + y"
    result = plugin.normalize_code(valid_code)  # 138μs -> 2.92μs (4644% faster)
    assert isinstance(result, str)
    assert len(result) > 0
    assert "valid" in result or "def" in result


def test_normalize_code_whitespace_only():
    """Test normalization with whitespace-only input."""
    plugin = create_plugin()
    code = "   \n\n   \t\t  \n"
    result = plugin.normalize_code(code)  # 33.8μs -> 2.79μs (1108% faster)
    assert isinstance(result, str)


def test_normalize_code_comments_only():
    """Test normalization with comments only."""
    plugin = create_plugin()
    code = "# This is a comment\n# Another comment"
    result = plugin.normalize_code(code)  # 32.3μs -> 2.90μs (1016% faster)
    assert isinstance(result, str)


def test_normalize_code_single_line_statement():
    """Test normalization with single line statements."""
    plugin = create_plugin()
    code = "x = 5"
    result = plugin.normalize_code(code)  # 63.5μs -> 2.89μs (2101% faster)
    assert isinstance(result, str)


def test_normalize_code_with_operators():
    """Test normalization with various operators."""
    plugin = create_plugin()
    code = """
def ops():
    a = 1 + 2
    b = a * 3
    c = b / 2
    d = c % 2
    return d
"""
    result = plugin.normalize_code(code)  # 213μs -> 2.88μs (7336% faster)
    assert isinstance(result, str)


def test_normalize_code_with_string_literals():
    """Test normalization with string literals."""
    plugin = create_plugin()
    code = """
def process_strings():
    text = "hello world"
    another = 'single quotes'
    return text + another
"""
    result = plugin.normalize_code(code)  # 144μs -> 2.79μs (5062% faster)
    assert isinstance(result, str)


def test_normalize_code_with_multiline_strings():
    """Test normalization with multiline string literals."""
    plugin = create_plugin()
    code = '''
def multiline():
    text = """
    This is a
    multiline string
    """
    return text
'''
    result = plugin.normalize_code(code)  # 107μs -> 2.85μs (3650% faster)
    assert isinstance(result, str)


def test_normalize_code_with_lambda():
    """Test normalization with lambda functions."""
    plugin = create_plugin()
    code = """
def use_lambda():
    f = lambda x: x * 2
    return f(5)
"""
    result = plugin.normalize_code(code)  # 165μs -> 2.98μs (5454% faster)
    assert isinstance(result, str)


def test_normalize_code_with_dict_comprehension():
    """Test normalization with dictionary comprehensions."""
    plugin = create_plugin()
    code = """
def dict_comp():
    data = [1, 2, 3]
    result = {x: x**2 for x in data}
    return result
"""
    result = plugin.normalize_code(code)  # 200μs -> 2.96μs (6691% faster)
    assert isinstance(result, str)


def test_normalize_code_with_set_comprehension():
    """Test normalization with set comprehensions."""
    plugin = create_plugin()
    code = """
def set_comp():
    data = [1, 2, 2, 3]
    result = {x for x in data}
    return result
"""
    result = plugin.normalize_code(code)  # 180μs -> 2.83μs (6260% faster)
    assert isinstance(result, str)


def test_normalize_code_with_generator():
    """Test normalization with generator expressions."""
    plugin = create_plugin()
    code = """
def gen():
    data = [1, 2, 3]
    result = (x * 2 for x in data)
    return list(result)
"""
    result = plugin.normalize_code(code)  # 201μs -> 2.87μs (6922% faster)
    assert isinstance(result, str)


def test_normalize_code_with_try_except():
    """Test normalization with try/except blocks."""
    plugin = create_plugin()
    code = """
def safe_divide():
    try:
        result = 10 / 2
    except ZeroDivisionError:
        result = 0
    return result
"""
    result = plugin.normalize_code(code)  # 170μs -> 2.96μs (5643% faster)
    assert isinstance(result, str)


def test_normalize_code_with_for_loop():
    """Test normalization with for loops."""
    plugin = create_plugin()
    code = """
def loop_sum():
    total = 0
    for i in range(10):
        total += i
    return total
"""
    result = plugin.normalize_code(code)  # 162μs -> 2.90μs (5498% faster)
    assert isinstance(result, str)


def test_normalize_code_with_while_loop():
    """Test normalization with while loops."""
    plugin = create_plugin()
    code = """
def while_loop():
    count = 0
    while count < 5:
        count += 1
    return count
"""
    result = plugin.normalize_code(code)  # 162μs -> 2.96μs (5395% faster)
    assert isinstance(result, str)


def test_normalize_code_with_if_elif_else():
    """Test normalization with if/elif/else blocks."""
    plugin = create_plugin()
    code = """
def branching():
    value = 5
    if value < 0:
        result = "negative"
    elif value == 0:
        result = "zero"
    else:
        result = "positive"
    return result
"""
    result = plugin.normalize_code(code)  # 226μs -> 3.04μs (7354% faster)
    assert isinstance(result, str)


def test_normalize_code_with_builtin_functions():
    """Test normalization preserves builtin function calls."""
    plugin = create_plugin()
    code = """
def use_builtins():
    data = [3, 1, 2]
    sorted_data = sorted(data)
    length = len(sorted_data)
    return length
"""
    result = plugin.normalize_code(code)  # 186μs -> 3.00μs (6132% faster)
    assert isinstance(result, str)


def test_normalize_code_with_global_keyword():
    """Test normalization with global keyword."""
    plugin = create_plugin()
    code = """
global_var = 10
def modify_global():
    global global_var
    global_var = 20
    return global_var
"""
    result = plugin.normalize_code(code)  # 133μs -> 2.81μs (4676% faster)
    assert isinstance(result, str)


def test_normalize_code_with_nonlocal_keyword():
    """Test normalization with nonlocal keyword."""
    plugin = create_plugin()
    code = """
def outer():
    outer_var = 10
    def inner():
        nonlocal outer_var
        outer_var = 20
        return outer_var
    return inner()
"""
    result = plugin.normalize_code(code)  # 174μs -> 2.92μs (5870% faster)
    assert isinstance(result, str)


def test_normalize_code_with_boolean_literals():
    """Test normalization with boolean values."""
    plugin = create_plugin()
    code = """
def logic():
    flag_true = True
    flag_false = False
    result = flag_true and not flag_false
    return result
"""
    result = plugin.normalize_code(code)  # 173μs -> 2.96μs (5776% faster)
    assert isinstance(result, str)


def test_normalize_code_with_none_literal():
    """Test normalization with None literal."""
    plugin = create_plugin()
    code = """
def nullable():
    value = None
    if value is None:
        return False
    return True
"""
    result = plugin.normalize_code(code)  # 152μs -> 3.03μs (4954% faster)
    assert isinstance(result, str)


def test_normalize_code_with_numeric_literals():
    """Test normalization with various numeric literals."""
    plugin = create_plugin()
    code = """
def numbers():
    int_val = 42
    float_val = 3.14
    complex_val = 1 + 2j
    return int_val + float_val
"""
    result = plugin.normalize_code(code)  # 174μs -> 2.85μs (6017% faster)
    assert isinstance(result, str)


def test_normalize_code_with_tuple():
    """Test normalization with tuple literals."""
    plugin = create_plugin()
    code = """
def tuples():
    pair = (1, 2)
    triple = (1, 2, 3)
    return pair[0] + triple[0]
"""
    result = plugin.normalize_code(code)  # 210μs -> 2.79μs (7443% faster)
    assert isinstance(result, str)


def test_normalize_code_with_unpacking():
    """Test normalization with tuple/list unpacking."""
    plugin = create_plugin()
    code = """
def unpack():
    data = [1, 2, 3]
    a, b, c = data
    return a + b + c
"""
    result = plugin.normalize_code(code)  # 188μs -> 2.97μs (6243% faster)
    assert isinstance(result, str)


def test_normalize_code_with_starred_unpacking():
    """Test normalization with starred unpacking."""
    plugin = create_plugin()
    code = """
def starred_unpack():
    data = [1, 2, 3, 4, 5]
    first, *rest = data
    return first
"""
    result = plugin.normalize_code(code)  # 177μs -> 3.01μs (5799% faster)
    assert isinstance(result, str)


def test_normalize_code_with_slice_operations():
    """Test normalization with slice operations."""
    plugin = create_plugin()
    code = """
def slicing():
    data = [0, 1, 2, 3, 4, 5]
    first_three = data[:3]
    last_two = data[-2:]
    every_other = data[::2]
    return len(first_three)
"""
    result = plugin.normalize_code(code)  # 271μs -> 2.85μs (9435% faster)
    assert isinstance(result, str)


def test_normalize_code_with_walrus_operator():
    """Test normalization with walrus operator (assignment expression)."""
    plugin = create_plugin()
    code = """
def walrus():
    if (n := 5) > 0:
        return n
    return 0
"""
    result = plugin.normalize_code(code)  # 151μs -> 2.92μs (5113% faster)
    assert isinstance(result, str)


def test_normalize_code_with_f_string():
    """Test normalization with f-strings."""
    plugin = create_plugin()
    code = """
def formatting():
    name = "World"
    value = 42
    message = f"Hello {name}, the answer is {value}"
    return message
"""
    result = plugin.normalize_code(code)  # 215μs -> 2.93μs (7275% faster)
    assert isinstance(result, str)


def test_normalize_code_with_bool_operators():
    """Test normalization with boolean operators."""
    plugin = create_plugin()
    code = """
def bool_ops():
    x = True
    y = False
    result = x or y
    result2 = x and y
    return result
"""
    result = plugin.normalize_code(code)  # 184μs -> 2.87μs (6356% faster)
    assert isinstance(result, str)


def test_normalize_code_with_comparison_chain():
    """Test normalization with chained comparisons."""
    plugin = create_plugin()
    code = """
def compare():
    x = 5
    y = 10
    z = 15
    result = x < y < z
    return result
"""
    result = plugin.normalize_code(code)  # 177μs -> 2.94μs (5933% faster)
    assert isinstance(result, str)


def test_normalize_code_many_variables():
    """Test normalization with many local variables (100+ variables)."""
    plugin = create_plugin()
    # Generate function with 100 variables
    code_lines = ["def many_vars():"]
    for i in range(100):
        code_lines.append(f"    var_{i} = {i}")
    code_lines.append("    return var_99")
    code = "\n".join(code_lines)
    result = plugin.normalize_code(code)  # 1.54ms -> 3.44μs (44787% faster)
    assert isinstance(result, str)
    assert "def many_vars" in result


def test_normalize_code_deeply_nested_functions():
    """Test normalization with deeply nested function definitions."""
    plugin = create_plugin()
    code = "def level_0():\n"
    for i in range(1, 20):
        code += "    " * i + f"def level_{i}():\n"
    code += "    " * 20 + "return 1\n"
    result = plugin.normalize_code(code)  # 601μs -> 3.55μs (16853% faster)
    assert isinstance(result, str)


def test_normalize_code_large_list_comprehension():
    """Test normalization with large range in comprehension."""
    plugin = create_plugin()
    code = """
def large_list():
    result = [x**2 for x in range(1000)]
    return len(result)
"""
    result = plugin.normalize_code(code)  # 184μs -> 2.81μs (6461% faster)
    assert isinstance(result, str)


def test_normalize_code_many_statements_in_loop():
    """Test normalization with many statements in a loop."""
    plugin = create_plugin()
    code_lines = ["def many_statements():\n", "    total = 0\n", "    for i in range(100):\n"]
    for j in range(50):
        code_lines.append(f"        x_{j} = i + {j}\n")
    code_lines.append("        total += sum([x_0, x_1])\n")
    code_lines.append("    return total\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 1.45ms -> 3.56μs (40670% faster)
    assert isinstance(result, str)


def test_normalize_code_many_conditionals():
    """Test normalization with many if/elif blocks."""
    plugin = create_plugin()
    code_lines = ["def many_ifs(x):\n"]
    for i in range(50):
        if i == 0:
            code_lines.append(f"    if x == {i}:\n")
        else:
            code_lines.append(f"    elif x == {i}:\n")
        code_lines.append(f"        return {i}\n")
    code_lines.append("    return -1\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 2.20ms -> 3.54μs (62082% faster)
    assert isinstance(result, str)


def test_normalize_code_long_expression():
    """Test normalization with very long arithmetic expressions."""
    plugin = create_plugin()
    terms = " + ".join([f"x_{i}" for i in range(100)])
    code = f"""
def long_expr():
    x_0 = 1
    x_1 = 2
    x_2 = 3
    x_3 = 4
    x_4 = 5
    x_5 = 6
    x_6 = 7
    x_7 = 8
    x_8 = 9
    x_9 = 10
    result = {" + ".join([f"x_{i}" for i in range(10)])}
    return result
"""
    result = plugin.normalize_code(code)  # 366μs -> 3.28μs (11082% faster)
    assert isinstance(result, str)


def test_normalize_code_many_function_definitions():
    """Test normalization with many function definitions in one file."""
    plugin = create_plugin()
    code_lines = []
    for i in range(100):
        code_lines.append(f"def func_{i}():\n")
        code_lines.append(f"    local_var = {i}\n")
        code_lines.append("    return local_var\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 4.14ms -> 4.60μs (89899% faster)
    assert isinstance(result, str)


def test_normalize_code_many_class_definitions():
    """Test normalization with many class definitions."""
    plugin = create_plugin()
    code_lines = []
    for i in range(50):
        code_lines.append(f"class Class_{i}:\n")
        code_lines.append("    def method(self):\n")
        code_lines.append(f"        local_var = {i}\n")
        code_lines.append("        return local_var\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 2.96ms -> 4.40μs (67297% faster)
    assert isinstance(result, str)


def test_normalize_code_complex_nested_structure():
    """Test normalization with complex nested structures."""
    plugin = create_plugin()
    code = """
def complex_structure():
    dict_of_lists = {
        'a': [1, 2, 3],
        'b': [4, 5, 6],
        'c': [7, 8, 9]
    }
    result = []
    for key, values in dict_of_lists.items():
        for val in values:
            if val > 5:
                result.append(val)
    return result
"""
    # Repeat this pattern multiple times to create large structure
    full_code = code * 10
    result = plugin.normalize_code(full_code)  # 2.74ms -> 3.94μs (69543% faster)
    assert isinstance(result, str)


def test_normalize_code_many_imports():
    """Test normalization with many import statements."""
    plugin = create_plugin()
    code_lines = []
    for i in range(50):
        code_lines.append(f"import module_{i}\n")
    code_lines.append("def use_modules():\n")
    code_lines.append("    return 1\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 433μs -> 3.47μs (12412% faster)
    assert isinstance(result, str)


def test_normalize_code_try_except_many_handlers():
    """Test normalization with try block having many except handlers."""
    plugin = create_plugin()
    code_lines = ["def multi_except():\n", "    try:\n", "        x = 1 / 0\n"]
    for i in range(30):
        code_lines.append("    except ValueError:\n")
        code_lines.append(f"        return {i}\n")
    code_lines.append("    except Exception:\n")
    code_lines.append("        return -1\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 750μs -> 3.56μs (21004% faster)
    assert isinstance(result, str)
    assert "def multi_except" in result


def test_normalize_code_large_docstring():
    """Test normalization with very large docstring."""
    plugin = create_plugin()
    large_doc = "This is a docstring. " * 100
    code = f'''
def documented():
    """{large_doc}"""
    local_var = 10
    return local_var
'''
    result = plugin.normalize_code(code)  # 121μs -> 3.65μs (3223% faster)
    assert isinstance(result, str)


def test_normalize_code_many_decorators():
    """Test normalization with many decorator applications."""
    plugin = create_plugin()
    code_lines = []
    for i in range(20):
        code_lines.append(f"@decorator_{i}\n")
    code_lines.append("def decorated():\n")
    code_lines.append("    local_var = 5\n")
    code_lines.append("    return local_var\n")
    code = "".join(code_lines)
    result = plugin.normalize_code(code)  # 206μs -> 3.06μs (6649% faster)
    assert isinstance(result, str)


def test_normalize_code_repeated_pattern_1000_iterations():
    """Test normalization with repeated pattern simulating 1000 iterations."""
    plugin = create_plugin()
    pattern = """
x = x + 1
y = y * 2
z = z - 3
"""
    code = "def loop():\n    x = 0\n    y = 1\n    z = 2\n"
    for _ in range(100):  # Approximate large number of statements
        code += "    " + pattern.replace("\n", "\n    ")
    code += "    return x + y + z\n"
    result = plugin.normalize_code(code)  # 1.72ms -> 1.27ms (35.4% faster)
    assert isinstance(result, str)


def test_normalize_code_very_long_function_name():
    """Test normalization with very long function name."""
    plugin = create_plugin()
    long_name = "a" * 200
    code = f"""
def {long_name}():
    local_var = 1
    return local_var
"""
    result = plugin.normalize_code(code)  # 116μs -> 3.82μs (2952% faster)
    assert isinstance(result, str)


def test_normalize_code_very_long_variable_name():
    """Test normalization with very long variable names."""
    plugin = create_plugin()
    long_var = "variable_name_" * 20
    code = f"""
def process():
    {long_var} = 42
    return {long_var}
"""
    result = plugin.normalize_code(code)  # 112μs -> 3.53μs (3095% faster)
    assert isinstance(result, str)


def test_normalize_code_many_parameter_function():
    """Test normalization with function having multiple realistic parameters."""
    plugin = create_plugin()
    params = ", ".join([f"param_{c}" for c in "abcdefghij"])
    code = f"""
def multi_params({params}):
    local_sum = param_a + param_b + param_c
    return local_sum
"""
    result = plugin.normalize_code(code)  # 184μs -> 3.34μs (5426% faster)
    assert isinstance(result, str)
    assert "multi_params" in result or "def" in result


def test_normalize_code_deeply_nested_dicts():
    """Test normalization with deeply nested dictionary structures."""
    plugin = create_plugin()
    code = """
def nested_dicts():
    data = {
        'level1': {
            'level2': {
                'level3': {
                    'level4': {
                        'level5': {
                            'value': 42
                        }
                    }
                }
            }
        }
    }
    result = data['level1']['level2']['level3']['level4']['level5']['value']
    return result
"""
    result = plugin.normalize_code(code)  # 285μs -> 2.96μs (9551% faster)
    assert isinstance(result, str)


def test_normalize_code_performance_consistency():
    """Test that normalize_code produces functionally valid output for similar code structures."""
    plugin = create_plugin()
    code1 = """
def func1():
    var_x = 10
    var_y = 20
    var_z = var_x + var_y
    return var_z
"""
    code2 = """
def func2():
    var_a = 15
    var_b = 25
    var_c = var_a + var_b
    return var_c
"""
    result1 = plugin.normalize_code(code1)  # 158μs -> 2.93μs (5333% faster)
    result2 = plugin.normalize_code(code2)
    assert isinstance(result1, str)  # 135μs -> 1.52μs (8810% faster)
    assert isinstance(result2, str)
    assert len(result1) > 0
    assert len(result2) > 0

To edit these changes git checkout codeflash/optimize-pr1887-2026-03-24T16.26.20 and push.

Codeflash Static Badge

Adding `@lru_cache(maxsize=512)` to `normalize_python_code` eliminates redundant AST parsing and transformation when the same code snippet is normalized multiple times, cutting average runtime from 58.3 ms to 1.67 ms (3388% faster). The cache key is the tuple `(code, remove_docstrings)`, so repeated calls with identical inputs return the precomputed normalized string immediately instead of re-parsing and walking the AST. Profiler data confirms that `ast.parse`, `normalizer.visit`, `ast.fix_missing_locations`, and `ast.unparse` (collectively ~97% of original runtime) are bypassed on cache hits, which dominate the workload in test scenarios with many duplicate or near-duplicate function definitions.
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Mar 24, 2026
@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

Claude finished @codeflash-ai[bot]'s task in 1m 41s —— View job


PR Review Summary

Prek Checks

✅ All checks pass (ruff check, ruff format). No issues.

Code Review

Verdict: ✅ Optimization is correct.

The change adds @lru_cache(maxsize=512) to normalize_python_code(code: str, remove_docstrings: bool = True) -> str in codeflash/languages/python/normalizer.py:166.

Key correctness checks:

  • Hashable inputs: Both str and bool are hashable → safe for lru_cache
  • Pure function: No reads from external state, no I/O → cached results are stable
  • Immutable return value: Returns a str → no risk of cached value being mutated by callers
  • Exceptions not cached: lru_cache does not cache exceptions, so ast.parse failures on bad input will still propagate correctly on every call
  • AST mutation is safe: _remove_docstrings_from_ast and normalizer.visit mutate the AST in place, but the AST is parsed fresh on each cache miss and the cached artifact is the final str, so no mutation-through-cache issues

The 33x speedup is credible: ast.parse + normalizer.visit + ast.unparse dominate the runtime, and they're completely bypassed on cache hits. The maxsize=512 bound is reasonable.

Duplicate Detection

No duplicates detected. The JavaScript normalizer (languages/javascript/normalizer.py) has its own independent implementation.

CI Failures

The 3 failing JS end-to-end checks (js-cjs-function-optimization, js-esm-async-optimization, js-ts-class-optimization) are unrelated to this change (Python-only normalizer diff). These are pre-existing failures on the base branch, not caused by this PR.


Last updated: 2026-03-24T16:28Z

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch (not caused by this PR): unit-tests, js-cjs-function-optimization, js-esm-async-optimization, js-ts-class-optimization all fail on codeflash_python base branch. Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch PR #1887 (not caused by this PR). The same checks fail there: async-optimization, bubble-sort-optimization-unittest, end-to-end-test-coverage, futurehouse-structure, init-optimization, js-* tests, unit-tests. Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch codeflash_python (not caused by this PR): unit-tests, js-cjs-function-optimization, js-esm-async-optimization, js-ts-class-optimization, async-optimization, end-to-end-test-coverage, and others. Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch (not caused by this PR): Java test failures, JavaScript/Python support test failures (NameError: name 'FunctionToOptimize' is not defined), and pickle patcher failures. Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch (not caused by this PR): unit-tests fail on codeflash_python with the same errors (TypeError: '>' not supported between instances of 'NoneType' and 'int', Java test failures, JS FunctionToOptimize NameError). Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 24, 2026

CI failures are pre-existing on the base branch (not caused by this PR): unit-tests (all Python versions), E2E tests, JS optimization tests. Leaving open for merge once base branch CI is fixed.

@claude
Copy link
Contributor

claude bot commented Mar 25, 2026

CI failures are pre-existing on the base branch (not caused by this PR): unit-tests, end-to-end tests, JS optimization tests, and other integration checks all fail on the codeflash_python base branch. Leaving open for merge once base branch CI is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants