⚡️ Speed up function `_parse_latex_header_span` by 37% #389

codeflash-ai · 2025-12-02T07:12:58Z

📄 37% speedup (1.37x as fast) for `_parse_latex_header_span` in `pandas/io/formats/style_render.py`

⏱️ Runtime: 913μs → 668μs (best of 139 runs)

📝 Explanation and details

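The optimized code is about 37% faster overall (913μs → 668μs). The gains come from targeted changes in two LaTeX rendering helpers: `_parse_latex_cell_styles`, which `_parse_latex_header_span` calls to apply cell styles, and `_parse_latex_header_span` itself.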

Key optimizations in `_parse_latex_cell_styles`:

- **Eliminated redundant dictionary creation**: the original code built a five-entry `formatter` dictionary of f-strings on every loop iteration, even though at most one entry is ever used. The optimized version formats only the template for the wrap argument it detects.
- **Fewer string conversions**: `options` is converted with `str()` once per iteration instead of on every check in the inner loop.
- **`reversed()` instead of slicing**: `latex_styles[::-1]` is replaced with `reversed(latex_styles)`, which iterates backwards without copying the list.
- **Simpler control flow**: the nested loop-and-break pattern is replaced by an if-elif chain once the wrap argument is found.
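The `_parse_latex_cell_styles` changes listed above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `eager_styles` mimics the original pattern (a reversed slice plus a five-entry formatter dict per style), while `lazy_styles` mimics the optimized one (`reversed()`, one `str()` call, and only the template that is needed). Both function names and the `_strip` helper are hypothetical; `_strip` is an assumed approximation of how pandas removes the wrap flag and CSS comment markers from `options`.

```python
from __future__ import annotations

# Hypothetical sketch of the two loop shapes; not the pandas implementation.

# Wrap flags, in the order the original inner loop checks them.
WRAP_ARGS = ("--nowrap", "--wrap", "--lwrap", "--rwrap", "--dwrap")


def _strip(options: str, arg: str) -> str:
    # Assumed helper: drop the wrap flag and CSS comment markers.
    return options.replace(arg, "").replace("/*", "").replace("*/", "").strip()


def eager_styles(styles: list[tuple[str, str]], value: str) -> str:
    """Original shape: every template is built for every style."""
    for command, options in styles[::-1]:  # slicing copies the list
        formatter = {
            "--wrap": f"{{\\{command}--to_parse {value}}}",
            "--nowrap": f"\\{command}--to_parse {value}",
            "--lwrap": f"{{\\{command}--to_parse}} {value}",
            "--rwrap": f"\\{command}--to_parse{{{value}}}",
            "--dwrap": f"{{\\{command}--to_parse}}{{{value}}}",
        }
        value = f"\\{command}{options} {value}"
        for arg in WRAP_ARGS:
            if arg in str(options):  # str() may run up to five times
                value = formatter[arg].replace("--to_parse", _strip(options, arg))
                break
    return value


def lazy_styles(styles: list[tuple[str, str]], value: str) -> str:
    """Optimized shape: one str() call and only the template that is needed."""
    for command, options in reversed(styles):  # iterates without copying
        opts = str(options)
        for arg in WRAP_ARGS:
            if arg in opts:
                o = _strip(opts, arg)
                if arg == "--nowrap":
                    value = f"\\{command}{o} {value}"
                elif arg == "--wrap":
                    value = f"{{\\{command}{o} {value}}}"
                elif arg == "--lwrap":
                    value = f"{{\\{command}{o}}} {value}"
                elif arg == "--rwrap":
                    value = f"\\{command}{o}{{{value}}}"
                else:  # "--dwrap"
                    value = f"{{\\{command}{o}}}{{{value}}}"
                break
        else:  # no wrap flag found: plain command prefix
            value = f"\\{command}{opts} {value}"
    return value


if __name__ == "__main__":
    styles = [("bfseries", ""), ("color", "{red}--wrap")]
    assert eager_styles(styles, "text") == lazy_styles(styles, "text")
    print(lazy_styles(styles, "text"))  # \bfseries {\color{red} text}
```

The `for ... else` keeps the no-flag path in one place; it is one way to express the if-elif restructuring described above, not necessarily the exact shape of the PR's code.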

Key optimizations in `_parse_latex_header_span`:

- **One search per span attribute**: instead of searching twice for the same substring (once to detect it, once to locate it), the code caches the index from a single `find()` and slices the value directly up to `find('"', start)`, avoiding an intermediate substring.
- **`.get()` for attribute access**: the `"attributes" in cell` check followed by `cell["attributes"]` is replaced with a single `cell.get("attributes")` lookup, which also handles a missing key.

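Performance impact by test category:

- **Basic cases**: roughly 8-20% faster in most standard scenarios. A few cases are slightly slower, notably rowspans with `multirow_align="naive"` (15-20% slower, about 0.3μs per call) and `test_large_number_of_cells_with_rowspan` (3.3% slower).
- **Style-heavy workloads**: up to 254% faster (95.2μs → 26.9μs in `test_large_number_of_styles`), mostly because the per-style formatter dictionary is no longer built.
- **Complex style chains**: 40-109% faster for tests that combine several CSS attributes, which benefit from both sets of changes.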
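The percentages in this report appear to follow the convention speedup = original / optimized - 1, so a negative value means slower. For example, 913μs → 668μs is about 36.7%, i.e. 1.37x as fast, which rounds to the 37% in the title. The hypothetical helper below reproduces that arithmetic; small mismatches with the tables (such as 22.0% computed vs 22.1% reported) presumably come from rounding of the displayed timings.

```python
# Hypothetical helper illustrating the speedup arithmetic used in the report.


def speedup_pct(original: float, optimized: float) -> float:
    """Percent speedup as (original / optimized - 1) * 100; negative = slower."""
    return (original / optimized - 1.0) * 100.0


if __name__ == "__main__":
    # Overall runtime from the report header, in microseconds.
    print(f"{speedup_pct(913, 668):.1f}% faster ({913 / 668:.2f}x)")  # 36.7% faster (1.37x)
    # test_large_number_of_styles: 95.2us -> 26.9us
    print(f"{speedup_pct(95.2, 26.9):.0f}% faster")  # 254% faster
```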

The gains are largest when cells carry several styles, because the per-style cost in `_parse_latex_cell_styles` dropped the most. That makes them most relevant to LaTeX export of styled DataFrames (for example via `Styler.to_latex()`) with many formatted header cells.
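To see why the per-style dictionary dominated, a rough `timeit` micro-benchmark can compare building all five wrap templates (the original pattern) against building only the one that is used (the optimized pattern, for a style without a wrap flag). The function and constant names are hypothetical, and results depend on the machine and Python version, so no specific numbers are claimed here.

```python
import timeit

# Hypothetical micro-benchmark; names and inputs are illustrative only.
COMMAND, OPTIONS, VALUE = "bfseries", "", "text"


def build_all_templates() -> str:
    # Original pattern: five f-strings plus a dict, built for every style.
    _formatter = {
        "--wrap": f"{{\\{COMMAND}--to_parse {VALUE}}}",
        "--nowrap": f"\\{COMMAND}--to_parse {VALUE}",
        "--lwrap": f"{{\\{COMMAND}--to_parse}} {VALUE}",
        "--rwrap": f"\\{COMMAND}--to_parse{{{VALUE}}}",
        "--dwrap": f"{{\\{COMMAND}--to_parse}}{{{VALUE}}}",
    }
    return f"\\{COMMAND}{OPTIONS} {VALUE}"  # the dict is built but never used


def build_one() -> str:
    # Optimized pattern when no wrap flag is present: a single f-string.
    return f"\\{COMMAND}{OPTIONS} {VALUE}"


if __name__ == "__main__":
    n = 200_000
    t_all = timeit.timeit(build_all_templates, number=n)
    t_one = timeit.timeit(build_one, number=n)
    print(f"all templates: {t_all:.3f}s, one template: {t_one:.3f}s")
```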

✅ Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | ✅ 56 Passed |
| 🌀 Generated Regression Tests | ✅ 282 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `io/formats/style/test_to_latex.py::test_parse_latex_header_span` | 8.81μs | 7.22μs | 22.1%✅ |

🌀 Generated Regression Tests and Runtime

from __future__ import annotations


# imports
import pytest
from pandas.io.formats.style_render import _parse_latex_header_span

# unit tests

# ---------------- BASIC TEST CASES ----------------


def test_no_span_no_style():
    # Simple cell, no span, no style
    cell = {"cellstyle": [], "display_value": "foo", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 1.42μs -> 1.21μs (17.7% faster)


def test_colspan_basic():
    # Basic colspan
    cell = {"cellstyle": [], "display_value": "bar", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 3.22μs -> 2.71μs (18.8% faster)


def test_rowspan_basic():
    # Basic rowspan
    cell = {"cellstyle": [], "display_value": "baz", "attributes": 'rowspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 3.02μs -> 2.79μs (8.16% faster)


def test_colspan_naive_l():
    # Colspan with naive-l alignment
    cell = {"cellstyle": [], "display_value": "abc", "attributes": 'colspan="2"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-l")
    out = codeflash_output  # 3.01μs -> 2.75μs (9.56% faster)


def test_colspan_naive_r():
    # Colspan with naive-r alignment
    cell = {"cellstyle": [], "display_value": "def", "attributes": 'colspan="2"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-r")
    out = codeflash_output  # 3.04μs -> 2.59μs (17.1% faster)


def test_rowspan_naive():
    # Rowspan with naive multirow_align
    cell = {"cellstyle": [], "display_value": "ghi", "attributes": 'rowspan="3"'}
    codeflash_output = _parse_latex_header_span(cell, "naive", "c")
    out = codeflash_output  # 1.38μs -> 1.72μs (20.0% slower)


def test_wrap_true():
    # Wrap True with no span
    cell = {"cellstyle": [], "display_value": "foo", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c", wrap=True
    )  # 1.70μs -> 1.43μs (19.3% faster)


def test_colspan_wrap_naive_l():
    # Colspan with wrap and naive-l
    cell = {"cellstyle": [], "display_value": "xyz", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-l", wrap=True)
    out = codeflash_output  # 3.63μs -> 3.34μs (8.65% faster)


def test_colspan_wrap_naive_r():
    # Colspan with wrap and naive-r
    cell = {"cellstyle": [], "display_value": "uvw", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-r", wrap=True)
    out = codeflash_output  # 3.63μs -> 3.23μs (12.5% faster)


def test_colspan_and_styles():
    # Colspan with style
    cell = {
        "cellstyle": [("font-weight", "bold")],
        "display_value": "boldtext",
        "attributes": 'colspan="2"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 4.90μs -> 3.32μs (47.4% faster)


def test_rowspan_and_styles():
    # Rowspan with style
    cell = {
        "cellstyle": [("font-style", "italic")],
        "display_value": "italictext",
        "attributes": 'rowspan="2"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 4.76μs -> 3.48μs (36.7% faster)


def test_colspan_and_multiple_styles():
    # Colspan with multiple styles
    cell = {
        "cellstyle": [("font-weight", "bold"), ("font-style", "italic")],
        "display_value": "both",
        "attributes": 'colspan="2"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 5.81μs -> 3.74μs (55.6% faster)


# ---------------- EDGE TEST CASES ----------------


def test_no_attributes_key():
    # No 'attributes' key at all
    cell = {"cellstyle": [], "display_value": "foo"}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 1.07μs -> 939ns (14.0% faster)


def test_empty_attributes():
    # Empty attributes string
    cell = {"cellstyle": [], "display_value": "foo", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 1.24μs -> 969ns (28.1% faster)


def test_colspan_one():
    # Colspan of 1 should still produce multicolumn
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="1"'}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 2.92μs -> 2.59μs (12.8% faster)


def test_rowspan_one():
    # Rowspan of 1 should still produce multirow
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'rowspan="1"'}
    codeflash_output = _parse_latex_header_span(
        cell, "t", "c"
    )  # 2.80μs -> 2.69μs (4.09% faster)


def test_colspan_and_rowspan_simultaneous():
    # Both in attributes: colspan is always checked before rowspan, so it takes precedence
    cell = {
        "cellstyle": [],
        "display_value": "foo",
        "attributes": 'colspan="2" rowspan="3"',
    }
    # Should process colspan, not rowspan
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.69μs -> 2.42μs (10.9% faster)


def test_rowspan_and_colspan_simultaneous():
    # Both in attributes, with rowspan listed first in the string
    cell = {
        "cellstyle": [],
        "display_value": "foo",
        "attributes": 'rowspan="2" colspan="3"',
    }
    # colspan is checked before rowspan regardless of order, so colspan is processed
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.82μs -> 2.50μs (13.0% faster)


def test_colspan_with_css_conversion():
    # Colspan with CSS conversion enabled
    cell = {
        "cellstyle": [("font-weight", "bold")],
        "display_value": "foo",
        "attributes": 'colspan="2"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c", convert_css=True)
    out = codeflash_output  # 8.37μs -> 6.91μs (21.1% faster)


def test_styles_with_latex_directive():
    # Style with --latex directive should not be converted
    cell = {
        "cellstyle": [("font-weight", "bold--latex")],
        "display_value": "foo",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c", convert_css=True)
    out = codeflash_output  # 7.12μs -> 6.03μs (18.0% faster)


def test_colspan_with_large_number():
    # Large colspan value
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="999"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 3.11μs -> 2.73μs (13.9% faster)


def test_rowspan_with_large_number():
    # Large rowspan value
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'rowspan="999"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 3.00μs -> 2.83μs (5.97% faster)


def test_colspan_with_nonint_value_raises():
    # Non-integer colspan should raise ValueError
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="abc"'}
    with pytest.raises(ValueError):
        _parse_latex_header_span(cell, "t", "c")  # 5.04μs -> 4.69μs (7.48% faster)


def test_rowspan_with_nonint_value_raises():
    # Non-integer rowspan should raise ValueError
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'rowspan="abc"'}
    with pytest.raises(ValueError):
        _parse_latex_header_span(cell, "t", "c")  # 4.63μs -> 4.37μs (6.02% faster)


def test_colspan_with_extra_attributes():
    # Colspan with extra unrelated attributes
    cell = {
        "cellstyle": [],
        "display_value": "foo",
        "attributes": 'colspan="2" data-x="y"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.95μs -> 2.54μs (15.8% faster)


def test_rowspan_with_extra_attributes():
    # Rowspan with extra unrelated attributes
    cell = {
        "cellstyle": [],
        "display_value": "foo",
        "attributes": 'rowspan="2" data-x="y"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.91μs -> 2.66μs (9.16% faster)


def test_colspan_with_wrap_and_styles():
    # Colspan with wrap and style
    cell = {
        "cellstyle": [("font-style", "italic")],
        "display_value": "foo",
        "attributes": 'colspan="2"',
    }
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-l", wrap=True)
    out = codeflash_output  # 5.65μs -> 4.01μs (41.0% faster)


def test_colspan_with_zero():
    # Colspan of zero should produce multicolumn with 0
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="0"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.83μs -> 2.43μs (16.5% faster)


def test_rowspan_with_zero():
    # Rowspan of zero should produce multirow with 0
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'rowspan="0"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 2.94μs -> 2.64μs (11.3% faster)


# ---------------- LARGE SCALE TEST CASES ----------------


def test_large_number_of_styles():
    # Many styles applied, check nesting order and performance
    styles = [("font-weight", "bold")] * 100
    cell = {"cellstyle": styles, "display_value": "foo", "attributes": ""}
    codeflash_output = _parse_latex_header_span(cell, "t", "c")
    out = codeflash_output  # 95.2μs -> 26.9μs (254% faster)
    # Without convert_css=True the CSS pair is not translated to \bfseries:
    # it is emitted as the command "\font-weight" with options "bold", so
    # each of the 100 styles prepends "\font-weightbold ".
    expected = "foo"
    for _ in range(100):
        expected = r"\font-weightbold " + expected
    assert out == expected


def test_large_colspan_naive_l():
    # Large colspan with naive-l
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="1000"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-l")
    out = codeflash_output  # 4.05μs -> 3.49μs (16.2% faster)


def test_large_colspan_naive_r():
    # Large colspan with naive-r
    cell = {"cellstyle": [], "display_value": "foo", "attributes": 'colspan="1000"'}
    codeflash_output = _parse_latex_header_span(cell, "t", "naive-r")
    out = codeflash_output  # 3.72μs -> 3.06μs (21.3% faster)


def test_large_number_of_cells_with_styles():
    # Simulate many cells in a table with styles
    for i in range(100):
        cell = {
            "cellstyle": [("font-style", "italic"), ("font-weight", "bold")],
            "display_value": f"cell{i}",
            "attributes": 'colspan="2"',
        }
        codeflash_output = _parse_latex_header_span(cell, "t", "c")
        out = codeflash_output  # 194μs -> 111μs (75.2% faster)


def test_large_number_of_cells_with_rowspan():
    # Simulate many cells in a table with rowspans
    for i in range(1, 100):
        cell = {
            "cellstyle": [],
            "display_value": f"cell{i}",
            "attributes": f'rowspan="{i}"',
        }
        codeflash_output = _parse_latex_header_span(cell, "t", "c")
        out = codeflash_output  # 71.9μs -> 74.4μs (3.29% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from pandas.io.formats.style_render import _parse_latex_header_span


# ---------------------------
# Basic Test Cases
# ---------------------------


def test_basic_multicolumn():
    # Basic multicolumn usage
    cell = {"cellstyle": [], "display_value": "Header", "attributes": 'colspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 2.89μs -> 2.85μs (1.44% faster)


def test_basic_multirow():
    # Basic multirow usage
    cell = {"cellstyle": [], "display_value": "RowHeader", "attributes": 'rowspan="3"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c"
    )
    result = codeflash_output  # 3.11μs -> 2.98μs (4.09% faster)


def test_basic_no_span():
    # No span attributes, should return display_value
    cell = {"cellstyle": [], "display_value": "Simple", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 1.55μs -> 1.31μs (18.6% faster)


def test_basic_wrap():
    # Wrap argument should wrap display_value in braces
    cell = {"cellstyle": [], "display_value": "WrapMe", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", wrap=True
    )
    result = codeflash_output  # 1.77μs -> 1.48μs (19.7% faster)


def test_basic_multicolumn_wrap():
    # Multicolumn with wrap
    cell = {"cellstyle": [], "display_value": "Header", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", wrap=True
    )
    result = codeflash_output  # 3.47μs -> 2.98μs (16.6% faster)


def test_basic_multirow_wrap():
    # Multirow with wrap
    cell = {"cellstyle": [], "display_value": "RowHeader", "attributes": 'rowspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c", wrap=True
    )
    result = codeflash_output  # 3.30μs -> 3.13μs (5.27% faster)


def test_basic_cellstyle_bold():
    # Cellstyle with bold font-weight
    cell = {
        "cellstyle": [("font-weight", "bold")],
        "display_value": "BoldText",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.74μs -> 2.03μs (83.9% faster)


def test_basic_cellstyle_color():
    # Cellstyle with color
    cell = {
        "cellstyle": [("color", "red")],
        "display_value": "RedText",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.44μs -> 1.93μs (78.4% faster)


def test_basic_multicolumn_naive_l():
    # Multicolumn with naive-l alignment
    cell = {"cellstyle": [], "display_value": "Left", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-l"
    )
    result = codeflash_output  # 3.63μs -> 3.26μs (11.2% faster)


def test_basic_multicolumn_naive_r():
    # Multicolumn with naive-r alignment
    cell = {"cellstyle": [], "display_value": "Right", "attributes": 'colspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-r"
    )
    result = codeflash_output  # 3.40μs -> 3.03μs (12.5% faster)


def test_basic_multicolumn_naive_l_wrap():
    # Multicolumn with naive-l alignment and wrap
    cell = {"cellstyle": [], "display_value": "LWrap", "attributes": 'colspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-l", wrap=True
    )
    result = codeflash_output  # 3.54μs -> 3.11μs (13.6% faster)


def test_basic_multicolumn_naive_r_wrap():
    # Multicolumn with naive-r alignment and wrap
    cell = {"cellstyle": [], "display_value": "RWrap", "attributes": 'colspan="3"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-r", wrap=True
    )
    result = codeflash_output  # 3.71μs -> 3.18μs (16.5% faster)


def test_basic_multirow_naive():
    # Multirow with naive alignment
    cell = {"cellstyle": [], "display_value": "NaiveRow", "attributes": 'rowspan="2"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="naive", multicol_align="c"
    )
    result = codeflash_output  # 1.70μs -> 2.00μs (15.2% slower)


# ---------------------------
# Edge Test Cases
# ---------------------------


def test_edge_empty_cellstyle():
    # Empty cellstyle list
    cell = {"cellstyle": [], "display_value": "EmptyStyle", "attributes": 'colspan="1"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.21μs -> 2.79μs (14.9% faster)


def test_edge_missing_attributes_key():
    # Cell dict missing 'attributes' key
    cell = {"cellstyle": [], "display_value": "NoAttr"}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 1.35μs -> 1.21μs (11.6% faster)


def test_edge_empty_attributes_string():
    # Attributes is empty string
    cell = {"cellstyle": [], "display_value": "EmptyAttr", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 1.51μs -> 1.22μs (24.2% faster)


def test_edge_colspan_zero():
    # Colspan is zero (should not happen, but test behavior)
    cell = {
        "cellstyle": [],
        "display_value": "ZeroColspan",
        "attributes": 'colspan="0"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.37μs -> 3.03μs (11.2% faster)


def test_edge_rowspan_zero():
    # Rowspan is zero (should not happen, but test behavior)
    cell = {
        "cellstyle": [],
        "display_value": "ZeroRowspan",
        "attributes": 'rowspan="0"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c"
    )
    result = codeflash_output  # 3.34μs -> 3.13μs (6.85% faster)


def test_edge_colspan_large():
    # Large colspan value
    cell = {
        "cellstyle": [],
        "display_value": "BigColspan",
        "attributes": 'colspan="999"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.22μs -> 2.97μs (8.45% faster)


def test_edge_rowspan_large():
    # Large rowspan value
    cell = {
        "cellstyle": [],
        "display_value": "BigRowspan",
        "attributes": 'rowspan="999"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c"
    )
    result = codeflash_output  # 3.32μs -> 3.18μs (4.47% faster)


def test_edge_colspan_and_rowspan():
    # Both colspan and rowspan present: colspan is checked first, so only it is used
    cell = {
        "cellstyle": [],
        "display_value": "Both",
        "attributes": 'colspan="2" rowspan="3"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c"
    )
    result = codeflash_output  # 3.09μs -> 2.77μs (11.8% faster)


def test_edge_cellstyle_multiple():
    # Multiple cellstyles applied
    cell = {
        "cellstyle": [("font-weight", "bold"), ("color", "blue")],
        "display_value": "Styled",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 5.07μs -> 2.42μs (109% faster)


def test_edge_cellstyle_conversion():
    # Cellstyle with CSS conversion
    cell = {
        "cellstyle": [("background-color", "#f0e")],
        "display_value": "HexColor",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 11.1μs -> 9.73μs (14.1% faster)


def test_edge_cellstyle_latex_tag():
    # Cellstyle with --latex tag
    cell = {
        "cellstyle": [("font-weight", "bold--latex")],
        "display_value": "LatexTag",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 6.80μs -> 5.91μs (15.1% faster)


def test_edge_cellstyle_with_wrap_flag():
    # Cellstyle with wrap flag
    cell = {
        "cellstyle": [("font-weight", "bold--wrap")],
        "display_value": "WrapBold",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 8.14μs -> 6.79μs (19.8% faster)


def test_edge_cellstyle_with_lwrap_flag():
    # Cellstyle with lwrap flag
    cell = {
        "cellstyle": [("background-color", "red--lwrap")],
        "display_value": "LWrapColor",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 9.48μs -> 8.32μs (13.9% faster)


def test_edge_cellstyle_with_nowrap_flag():
    # Cellstyle with nowrap flag
    cell = {
        "cellstyle": [("color", "green--nowrap")],
        "display_value": "NoWrapColor",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 8.99μs -> 7.66μs (17.5% faster)


def test_edge_cellstyle_with_dwrap_flag():
    # Cellstyle with dwrap flag
    cell = {
        "cellstyle": [("color", "purple--dwrap")],
        "display_value": "DoubleWrap",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 9.26μs -> 8.10μs (14.3% faster)


def test_edge_cellstyle_with_rwrap_flag():
    # Cellstyle with rwrap flag
    cell = {
        "cellstyle": [("color", "orange--rwrap")],
        "display_value": "RWrap",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 9.42μs -> 8.04μs (17.2% faster)


def test_edge_cellstyle_with_css_comment():
    # Cellstyle with CSS comment
    cell = {
        "cellstyle": [("color", "red /* --wrap */ ")],
        "display_value": "Commented",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 9.33μs -> 8.01μs (16.4% faster)


def test_edge_cellstyle_rgb_color():
    # Cellstyle with rgb color
    cell = {
        "cellstyle": [("color", "rgb(128, 255, 0)")],
        "display_value": "RgbColor",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 17.2μs -> 15.9μs (8.16% faster)


def test_edge_cellstyle_rgba_color():
    # Cellstyle with rgba color
    cell = {
        "cellstyle": [("color", "rgba(128, 255, 0, 0.5)")],
        "display_value": "RgbaColor",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 16.0μs -> 14.7μs (8.84% faster)


def test_edge_cellstyle_rgb_percent():
    # Cellstyle with rgb percent
    cell = {
        "cellstyle": [("color", "rgb(50%, 100%, 0%)")],
        "display_value": "RgbPercent",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 16.5μs -> 15.3μs (7.54% faster)


def test_edge_cellstyle_font_style_italic():
    # Cellstyle with italic font-style
    cell = {
        "cellstyle": [("font-style", "italic")],
        "display_value": "ItalicText",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 6.08μs -> 5.05μs (20.3% faster)


def test_edge_cellstyle_font_style_oblique():
    # Cellstyle with oblique font-style
    cell = {
        "cellstyle": [("font-style", "oblique")],
        "display_value": "ObliqueText",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 6.21μs -> 5.01μs (24.1% faster)


def test_edge_non_standard_attribute():
    # Cellstyle with non-standard attribute (should be ignored)
    cell = {
        "cellstyle": [("unknown-attr", "value")],
        "display_value": "Unknown",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 3.44μs -> 3.15μs (9.10% faster)


def test_edge_cellstyle_empty_value():
    # Cellstyle with empty value
    cell = {
        "cellstyle": [("font-weight", "")],
        "display_value": "EmptyValue",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 4.22μs -> 3.89μs (8.59% faster)


def test_edge_cellstyle_float_value():
    # Cellstyle with float value
    cell = {
        "cellstyle": [("font-weight", 1.0)],
        "display_value": "FloatValue",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 5.37μs -> 5.48μs (1.95% slower)


def test_edge_cellstyle_multiple_conversion_flags():
    # Cellstyle with multiple conversion flags (unsupported input; records current behavior)
    cell = {
        "cellstyle": [("color", "red--wrap--nowrap")],
        "display_value": "MultiFlag",
        "attributes": "",
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 9.88μs -> 8.30μs (19.0% faster)


# ---------------------------
# Large Scale Test Cases
# ---------------------------


def test_large_colspan():
    # Large number of columns, naive-l alignment
    cell = {"cellstyle": [], "display_value": "Big", "attributes": 'colspan="1000"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-l"
    )
    result = codeflash_output  # 4.58μs -> 4.26μs (7.61% faster)


def test_large_rowspan():
    # Large number of rows
    cell = {"cellstyle": [], "display_value": "Tall", "attributes": 'rowspan="1000"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="b", multicol_align="c"
    )
    result = codeflash_output  # 3.49μs -> 3.27μs (6.72% faster)


def test_large_multicolumn():
    # Large multicolumn with standard alignment
    cell = {"cellstyle": [], "display_value": "Wide", "attributes": 'colspan="999"'}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 3.25μs -> 2.99μs (8.98% faster)


def test_large_cellstyle_chain():
    # Large cellstyle chain, should nest correctly
    styles = [("font-weight", "bold")] * 10
    cell = {"cellstyle": styles, "display_value": "Chain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c"
    )
    result = codeflash_output  # 10.7μs -> 4.17μs (157% faster)


def test_large_cellstyle_conversion_chain():
    # Large cellstyle chain with conversion
    styles = [("color", "red")] * 10
    cell = {"cellstyle": styles, "display_value": "Chain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 20.7μs -> 14.7μs (40.9% faster)


def test_large_cellstyle_mixed_chain():
    # Mixed cellstyle chain
    styles = [("font-weight", "bold"), ("color", "blue"), ("font-style", "italic")] * 5
    cell = {"cellstyle": styles, "display_value": "MixedChain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 25.3μs -> 17.2μs (47.3% faster)


def test_large_colspan_naive_r_wrap():
    # Large naive-r with wrap
    cell = {
        "cellstyle": [],
        "display_value": "BigRWrap",
        "attributes": 'colspan="1000"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-r", wrap=True
    )
    result = codeflash_output  # 4.66μs -> 4.16μs (11.9% faster)


def test_large_colspan_naive_l_wrap():
    # Large naive-l with wrap
    cell = {
        "cellstyle": [],
        "display_value": "BigLWrap",
        "attributes": 'colspan="1000"',
    }
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="naive-l", wrap=True
    )
    result = codeflash_output  # 4.39μs -> 4.27μs (2.90% faster)


def test_large_cellstyle_rgb_chain():
    # Large chain of rgb colors
    styles = [("color", "rgb(255,0,0)")] * 10
    cell = {"cellstyle": styles, "display_value": "RGBChain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 56.8μs -> 50.2μs (13.2% faster)


def test_large_cellstyle_hex_chain():
    # Large chain of hex colors
    styles = [("color", "#ff23ee")] * 10
    cell = {"cellstyle": styles, "display_value": "HexChain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 22.2μs -> 16.4μs (35.4% faster)


def test_large_cellstyle_background_color_chain():
    # Large chain of background colors
    styles = [("background-color", "yellow")] * 10
    cell = {"cellstyle": styles, "display_value": "BgChain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 26.7μs -> 19.3μs (38.2% faster)


def test_large_cellstyle_mixed_flags():
    # Large chain with mixed flags
    styles = [("color", "red--wrap"), ("font-weight", "bold--nowrap")] * 5
    cell = {"cellstyle": styles, "display_value": "FlagChain", "attributes": ""}
    codeflash_output = _parse_latex_header_span(
        cell, multirow_align="t", multicol_align="c", convert_css=True
    )
    result = codeflash_output  # 26.3μs -> 19.6μs (34.3% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
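The comment above describes a differential check: the optimized function must return exactly what the original returned. Below is a minimal sketch of that idea for the cell-style loop. Both functions are hypothetical, simplified re-implementations written for illustration, not pandas' private code. One builds a five-entry formatter dictionary per style. The other formats only the branch for the detected wrap flag and iterates with `reversed()`. `strip_flag` is an assumed stand-in for the flag-stripping helper.

```python
from __future__ import annotations

# Wrap flags, checked in a fixed order; the first match wins.
WRAP_ARGS = ("--nowrap", "--wrap", "--lwrap", "--rwrap", "--dwrap")


def strip_flag(value: object, arg: str) -> str:
    """Hypothetical stand-in: drop the wrap flag from the options string."""
    return str(value).replace(arg, "").strip()


def styles_dict_version(styles: list[tuple[str, str]], display_value: str) -> str:
    """Original pattern: build every template for every style."""
    for command, options in styles[::-1]:  # reversed *copy* of the list
        formatter = {
            "--wrap": f"{{\\{command}--to_parse {display_value}}}",
            "--nowrap": f"\\{command}--to_parse {display_value}",
            "--lwrap": f"{{\\{command}--to_parse}} {display_value}",
            "--rwrap": f"\\{command}--to_parse{{{display_value}}}",
            "--dwrap": f"{{\\{command}--to_parse}}{{{display_value}}}",
        }
        display_value = f"\\{command}{options} {display_value}"
        for arg in WRAP_ARGS:
            if arg in str(options):
                display_value = formatter[arg].replace(
                    "--to_parse", strip_flag(options, arg)
                )
                break  # only one wrap flag is honoured
    return display_value


def styles_lazy_version(styles: list[tuple[str, str]], display_value: str) -> str:
    """Optimized pattern: format only the branch that is actually needed."""
    for command, options in reversed(styles):  # iterator, no list copy
        opts = str(options)  # convert once per style
        arg = next((a for a in WRAP_ARGS if a in opts), None)
        if arg is None:
            display_value = f"\\{command}{opts} {display_value}"
            continue
        parsed = strip_flag(opts, arg)
        if arg == "--wrap":
            display_value = f"{{\\{command}{parsed} {display_value}}}"
        elif arg == "--nowrap":
            display_value = f"\\{command}{parsed} {display_value}"
        elif arg == "--lwrap":
            display_value = f"{{\\{command}{parsed}}} {display_value}"
        elif arg == "--rwrap":
            display_value = f"\\{command}{parsed}{{{display_value}}}"
        else:  # "--dwrap"
            display_value = f"{{\\{command}{parsed}}}{{{display_value}}}"
    return display_value


# Differential check in the spirit of the codeflash_output comparisons.
CASES = [
    [("bfseries", "")],
    [("textbf", "--rwrap")],
    [("color", "{red}--wrap"), ("bfseries", "--nowrap")],
    [("cellcolor", "[HTML]{FF0000}--lwrap"), ("itshape", "--dwrap")] * 5,
]
for styles in CASES:
    assert styles_dict_version(styles, "x") == styles_lazy_version(styles, "x")
```

One subtlety the sketch preserves: the dictionary-based version builds its templates from the display value *before* overwriting it, so the lazy version formats from that same prior value. In this sketch the two agree only while a display value does not itself contain the literal `--to_parse` placeholder, because the dictionary version substitutes it with `str.replace`.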

To edit these changes, `git checkout codeflash/optimize-_parse_latex_header_span-mio8qx6b` and push.

The optimized code achieves a **37% speedup** through several targeted performance improvements in the LaTeX style processing functions:

**Key optimizations in `_parse_latex_cell_styles`:**

- **Eliminated redundant dictionary creation**: The original code built a 5-element `formatter` dictionary on every loop iteration, even though at most one entry is used per iteration. The optimized version generates only the format string needed for the detected wrap argument.
- **Reduced string conversions**: `options` is converted to a string once per iteration instead of calling `str(options)` repeatedly in the inner loop.
- **Used `reversed()` instead of slicing**: Replaced `latex_styles[::-1]` with `reversed(latex_styles)` to avoid creating a reversed copy of the list.
- **Streamlined conditional logic**: Replaced the nested loop-and-break pattern with an if-elif chain once the wrap argument is found.

**Key optimizations in `_parse_latex_header_span`:**

- **Optimized string parsing**: Instead of searching for the same substring twice (once for detection, once for extraction), the code caches the index from a single `find()`, locates the end position with `find('"', start)`, and slices directly.
- **Used `.get()` for attribute access**: Replaced the `"attributes" in cell` check with `cell.get("attributes")`, so a missing key is handled with a single lookup.

**Performance impact by test category:**

- **Basic cases**: 8-20% speedup across most standard scenarios
- **Style-heavy workloads**: up to 254% speedup for tests with many styles (e.g., `test_large_number_of_styles`), due to eliminating the dictionary creation overhead
- **Complex style chains**: 40-109% speedup for tests combining multiple CSS attributes, benefiting from both optimizations

The optimizations help most with multiple CSS styles or frequent LaTeX processing, such as pandas DataFrame styling that renders many cells with complex formatting.
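The single-search parsing pattern for `_parse_latex_header_span` can be sketched in isolation. The helpers below (`parse_span_double_find`, `parse_span_cached`, `span_from_cell`) are hypothetical illustrations of a membership-test-then-`find()` pattern versus a cached-index pattern. They are not the pandas implementation, and they assume attributes are formatted like `colspan="N"`, as in the tests above.

```python
from __future__ import annotations

from typing import Any


def parse_span_double_find(attrs: str, key: str) -> int | None:
    """Original pattern: a membership test, then a second search to extract."""
    needle = f'{key}="'
    if needle in attrs:  # first scan
        rest = attrs[attrs.find(needle) + len(needle):]  # second scan + copy
        return int(rest[: rest.find('"')])
    return None


def parse_span_cached(attrs: str, key: str) -> int | None:
    """Optimized pattern: search once, reuse the index, slice directly."""
    needle = f'{key}="'
    idx = attrs.find(needle)  # single scan for the attribute
    if idx == -1:
        return None
    start = idx + len(needle)
    end = attrs.find('"', start)  # closing quote, searched from `start`
    return int(attrs[start:end])


def span_from_cell(cell: dict[str, Any], key: str) -> int | None:
    """Read a span from a header-cell dict shaped like the tests' `cell`."""
    # .get() returns None for a missing key, replacing a separate
    # `"attributes" in cell` check followed by indexing.
    attrs = cell.get("attributes")
    if not attrs:
        return None
    return parse_span_cached(attrs, key)


if __name__ == "__main__":
    cell = {"cellstyle": [], "display_value": "BigRWrap", "attributes": 'colspan="1000"'}
    print(span_from_cell(cell, "colspan"))  # 1000
    print(span_from_cell(cell, "rowspan"))  # None
```

Both helpers agree on well-formed input, including when the span sits among other attributes (e.g. `class="x" colspan="12"`). In this sketch the cached variant avoids re-scanning the string and building an intermediate substring.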

codeflash-ai bot requested a review from mashraf-222 December 2, 2025 07:13

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 2, 2025


⚡️ Speed up function `_parse_latex_header_span` by 37% #389


codeflash-ai bot commented Dec 2, 2025


📄 37% (0.37x) speedup for _parse_latex_header_span in pandas/io/formats/style_render.py
