@codeflash-ai codeflash-ai bot commented Dec 2, 2025

📄 35% (0.35x) speedup for _maybe_wrap_formatter in pandas/io/formats/style_render.py

⏱️ Runtime : 223 microseconds -> 164 microseconds (best of 183 runs)

📝 Explanation and details

The optimization achieves a 35% speedup through targeted fast-path optimizations and reduced function call overhead:

Key Optimizations:

  1. get_option Fast Path: Added a direct lookup optimization for common option patterns. Instead of always calling _get_single_key() and _get_root() (which handle regex matching and validation), the code now:

    • Checks for simple keys without dots and does direct _global_config lookup
    • For dotted keys, manually splits and traverses the nested dict structure
    • Only falls back to the original expensive pattern-matching logic when needed
  2. _default_formatter String Creation: Moved f-string template creation outside the formatting call to avoid repeated string construction, reducing per-call overhead for float/complex formatting.

  3. _maybe_wrap_formatter Closure Optimization:

    • Replaced conditional get_option() call with explicit if-else to avoid the expensive config lookup when precision is already provided
    • Converted lambda closures to proper function definitions to reduce closure creation overhead
    • For the na_rep wrapper, created a function with default parameters to avoid repeated attribute lookups in the closure
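As a rough illustration of the first optimization and the explicit precision check, the sketch below walks a nested dict directly for a fully qualified dotted key and only falls back to a slower pattern-matching lookup when the direct walk fails. The names `_CONFIG`, `_REGISTERED`, `_slow_lookup`, `lookup_option` and `resolve_precision` are hypothetical stand-ins invented for this example; this is not the pandas implementation or the code in this PR.

```python
import re

# Hypothetical stand-in for a nested options registry (not pandas' real data).
_CONFIG = {"styler": {"format": {"precision": 6}}, "mode": {"copy_on_write": True}}
_REGISTERED = ["styler.format.precision", "mode.copy_on_write"]


def _slow_lookup(pattern):
    """Fallback path: regex search over all registered keys, unique match required."""
    matches = [k for k in _REGISTERED if re.search(pattern, k, re.I)]
    if len(matches) != 1:
        raise KeyError(f"No unique option matches {pattern!r}")
    node = _CONFIG
    for part in matches[0].split("."):
        node = node[part]
    return node


def lookup_option(key):
    """Fast path: walk the nested dict directly; fall back only when that fails."""
    node = _CONFIG
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _slow_lookup(key)  # partial or unknown key: use the slow path
        node = node[part]
    if isinstance(node, dict):
        return _slow_lookup(key)  # key names a group, not a leaf option
    return node


def resolve_precision(precision=None):
    """Only consult the options registry when no precision was supplied."""
    if precision is None:
        return lookup_option("styler.format.precision")
    return precision
```

In this sketch, `resolve_precision(2)` never touches the registry, while `resolve_precision()` takes the direct walk. A partial key such as `"format.precision"` still resolves, but only through the slower fallback.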

Performance Impact Based on Tests:

  • Biggest gains (roughly 35-58% faster) occur when formatter=None and no precision is passed, so get_option() is called and the fast-path config lookup removes most of the overhead
  • Moderate gains (roughly 4-11% faster in most of the tests below) for cases with explicit precision values, due to reduced closure overhead
  • Minimal impact (within about ±4%, including a few runs that are 1-3% slower) for cases using custom formatters, since they bypass the optimized paths
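The per-test percentages in the annotations below appear consistent with a speedup computed as original runtime divided by optimized runtime, minus one. That formula is inferred from the reported numbers and is an assumption, not a documented Codeflash definition. The helper name `speedup_pct` is made up for this sketch.

```python
def speedup_pct(original_us: float, optimized_us: float) -> float:
    """Relative speedup in percent: (original / optimized - 1) * 100."""
    return (original_us / optimized_us - 1.0) * 100.0


# Figures taken from one test annotation: 4.90μs -> 3.33μs
print(f"{speedup_pct(4.90, 3.33):.1f}%")  # prints "47.1%"
```

Because the displayed runtimes are rounded, recomputing from them can differ slightly from the reported figure. The headline 223 -> 164 microseconds, for example, works out to about 36% against the reported 35%.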

Hot Path Relevance: The function references show _maybe_wrap_formatter is called in pandas styling operations (format(), format_index(), format_index_names()) which are frequently used in data presentation workflows, making these optimizations particularly valuable for styling-heavy applications.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 58 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import math
from typing import Callable, Union

# imports
import pytest
from pandas.io.formats.style_render import _maybe_wrap_formatter

# --- Function and minimal dependencies under test ---


def isna(x):
    # Simple isna implementation for test purposes
    return x is None or (isinstance(x, float) and math.isnan(x))


def _str_escape(x, escape: str = "html"):
    # Simple escape implementation
    if isinstance(x, str):
        if escape == "html":
            return (
                x.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#x27;")
                .replace("/", "&#x2F;")
            )
        elif escape == "latex":
            return x.replace("_", "\\_")
    return x


def _render_href(x, format="html"):
    # Simple hyperlink rendering
    if isinstance(x, str) and x.startswith("http"):
        if format == "html":
            return f'<a href="{x}">{x}</a>'
        elif format == "latex":
            return f"\\href{{{x}}}{{{x}}}"
    return x


BaseFormatter = Union[str, Callable]

# --- Unit tests ---

# 1. Basic Test Cases


def test_formatter_with_str_template():
    # Should use str.format on the input
    codeflash_output = _maybe_wrap_formatter(formatter="Value: {}")
    f = codeflash_output  # 1.41μs -> 1.38μs (2.61% faster)


def test_formatter_with_callable():
    # Should use the provided callable
    codeflash_output = _maybe_wrap_formatter(formatter=lambda x: f"**{x}**")
    f = codeflash_output  # 1.35μs -> 1.30μs (3.86% faster)


def test_formatter_with_none_default_precision():
    # Should use _default_formatter with default precision (from get_option)
    codeflash_output = _maybe_wrap_formatter(formatter=None)
    f = codeflash_output  # 4.90μs -> 3.33μs (47.1% faster)


def test_formatter_with_none_custom_precision():
    # Should use _default_formatter with given precision
    codeflash_output = _maybe_wrap_formatter(formatter=None, precision=2)
    f = codeflash_output  # 1.93μs -> 1.75μs (10.5% faster)


def test_formatter_with_thousands_separator():
    # Should insert thousands separator for int/float
    codeflash_output = _maybe_wrap_formatter(formatter=None, thousands=",")
    f = codeflash_output  # 4.87μs -> 3.26μs (49.4% faster)


def test_formatter_with_na_rep():
    # Should replace None/NaN with na_rep
    codeflash_output = _maybe_wrap_formatter(formatter=None, na_rep="MISSING")
    f = codeflash_output  # 4.94μs -> 3.28μs (50.5% faster)


def test_formatter_with_escape_html():
    # Should escape HTML special chars in string input
    codeflash_output = _maybe_wrap_formatter(formatter=None, escape="html")
    f = codeflash_output  # 4.84μs -> 3.23μs (50.0% faster)


def test_formatter_with_hyperlinks_html():
    # Should wrap http links in <a> tag
    codeflash_output = _maybe_wrap_formatter(formatter=None, hyperlinks="html")
    f = codeflash_output  # 4.69μs -> 3.28μs (42.9% faster)


def test_formatter_with_all_features():
    # Compose all features together
    codeflash_output = _maybe_wrap_formatter(
        formatter="{:.1f}",
        na_rep="NA",
        decimal=",",
        thousands=".",
        escape="html",
        hyperlinks="html",
    )
    f = codeflash_output  # 2.36μs -> 2.34μs (1.03% faster)
    # Hyperlink rendering and escaping
    url = "http://foo.com?a=1&b=2"
    # The formatter is "{:.1f}", so for a string it will error; so test with default formatter
    codeflash_output = _maybe_wrap_formatter(
        formatter=None,
        na_rep="NA",
        decimal=",",
        thousands=".",
        escape="html",
        hyperlinks="html",
    )
    f2 = codeflash_output  # 4.73μs -> 3.16μs (49.7% faster)


# 2. Edge Test Cases


def test_formatter_with_empty_string():
    # Should handle empty string input
    codeflash_output = _maybe_wrap_formatter(formatter=None)
    f = codeflash_output  # 4.51μs -> 3.08μs (46.5% faster)


def test_formatter_with_zero_and_negative_numbers():
    codeflash_output = _maybe_wrap_formatter(formatter=None, precision=2)
    f = codeflash_output  # 1.79μs -> 1.63μs (10.2% faster)


def test_formatter_with_large_numbers_and_custom_separators():
    codeflash_output = _maybe_wrap_formatter(
        formatter=None, precision=1, decimal=",", thousands=" "
    )
    f = codeflash_output  # 2.37μs -> 2.35μs (0.724% faster)


def test_formatter_with_bool_input():
    # Should treat bool as not int
    codeflash_output = _maybe_wrap_formatter(formatter=None)
    f = codeflash_output  # 4.72μs -> 3.05μs (54.9% faster)


def test_formatter_with_complex_numbers():
    codeflash_output = _maybe_wrap_formatter(formatter=None, precision=3)
    f = codeflash_output  # 1.78μs -> 1.67μs (6.83% faster)
    val = complex(1.23456, 7.89012)


def test_formatter_with_custom_callable_and_all_args():
    # Custom callable should override all formatting
    codeflash_output = _maybe_wrap_formatter(
        formatter=lambda x: f"!!{x}!!",
        na_rep="NONE",
        decimal=",",
        thousands=".",
        escape="latex",
        hyperlinks="latex",
    )
    f = codeflash_output  # 2.36μs -> 2.41μs (2.24% slower)


def test_formatter_with_nonstandard_decimal_only():
    codeflash_output = _maybe_wrap_formatter(formatter=None, decimal=",")
    f = codeflash_output  # 5.39μs -> 3.72μs (44.8% faster)


def test_formatter_with_nonstandard_thousands_only():
    codeflash_output = _maybe_wrap_formatter(formatter=None, thousands=" ")
    f = codeflash_output  # 5.15μs -> 3.68μs (40.1% faster)


def test_formatter_type_error():
    # Should raise TypeError for invalid formatter type
    with pytest.raises(TypeError):
        _maybe_wrap_formatter(formatter=123.456)  # 2.58μs -> 2.48μs (4.12% faster)


def test_formatter_with_nan_string():
    # Should not treat "nan" string as NA
    codeflash_output = _maybe_wrap_formatter(formatter=None, na_rep="NA")
    f = codeflash_output  # 4.72μs -> 3.48μs (35.8% faster)


def test_formatter_with_escape_latex():
    codeflash_output = _maybe_wrap_formatter(formatter=None, escape="latex")
    f = codeflash_output  # 4.79μs -> 3.40μs (40.9% faster)


def test_formatter_with_hyperlinks_latex():
    codeflash_output = _maybe_wrap_formatter(formatter=None, hyperlinks="latex")
    f = codeflash_output  # 4.92μs -> 3.38μs (45.4% faster)


# 3. Large Scale Test Cases


def test_formatter_large_list_of_numbers():
    # Test performance and correctness with a large list
    codeflash_output = _maybe_wrap_formatter(formatter=None, precision=2, thousands=",")
    f = codeflash_output  # 1.94μs -> 1.76μs (10.3% faster)
    inputs = list(range(1000))
    outputs = [f(x) for x in inputs]
    # All outputs should be str and match expected formatting
    for i, out in enumerate(outputs):
        pass


def test_formatter_large_list_of_floats():
    codeflash_output = _maybe_wrap_formatter(formatter=None, precision=3)
    f = codeflash_output  # 1.88μs -> 1.72μs (9.31% faster)
    inputs = [x + 0.123 for x in range(1000)]
    outputs = [f(x) for x in inputs]
    for inp, out in zip(inputs, outputs):
        pass


def test_formatter_large_list_with_na_rep():
    codeflash_output = _maybe_wrap_formatter(formatter=None, na_rep="NA")
    f = codeflash_output  # 5.12μs -> 3.52μs (45.4% faster)
    inputs = [None if i % 10 == 0 else i for i in range(1000)]
    outputs = [f(x) for x in inputs]
    for i, out in enumerate(outputs):
        if i % 10 == 0:
            pass
        else:
            pass


def test_formatter_large_list_with_escape():
    codeflash_output = _maybe_wrap_formatter(formatter=None, escape="html")
    f = codeflash_output  # 5.00μs -> 3.44μs (45.2% faster)
    # Mix of strings with escapable chars and numbers
    inputs = ["<foo>" if i % 2 == 0 else i for i in range(1000)]
    outputs = [f(x) for x in inputs]
    for i, out in enumerate(outputs):
        if i % 2 == 0:
            pass
        else:
            pass


def test_formatter_large_list_with_hyperlinks():
    codeflash_output = _maybe_wrap_formatter(formatter=None, hyperlinks="html")
    f = codeflash_output  # 4.98μs -> 3.49μs (42.6% faster)
    inputs = ["http://site.com/page" if i % 5 == 0 else f"item{i}" for i in range(1000)]
    outputs = [f(x) for x in inputs]
    for i, out in enumerate(outputs):
        if i % 5 == 0:
            pass
        else:
            pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import math
from typing import Callable, Union

# imports
import pytest
from pandas.io.formats.style_render import _maybe_wrap_formatter

# --- Begin: Minimal stubs and helpers to make the function testable in isolation ---


# Minimal isna implementation for test
def isna(x):
    # Treat None, float('nan'), math.nan, and pandas.NA-like as NA
    if x is None:
        return True
    try:
        return math.isnan(x)
    except Exception:
        return False


# Minimal _str_escape stub
def _str_escape(x, escape):
    # For test, just append escape string to value
    return f"{x}__{escape}"


# Minimal _render_href stub
def _render_href(x, format):
    # For test, just wrap value in <a> tags
    return f"<a href='{x}'>{x}</a>"


# --- Begin: Function under test (copied from above, using our stubs) ---


BaseFormatter = Union[str, Callable]

# --- Begin: Unit tests ---

# 1. BASIC TEST CASES


def test_formatter_with_none_returns_default_float():
    # Default precision is 6 (from get_option stub)
    codeflash_output = _maybe_wrap_formatter(None)
    func = codeflash_output  # 4.92μs -> 3.22μs (52.7% faster)


def test_formatter_with_none_returns_default_int():
    codeflash_output = _maybe_wrap_formatter(None)
    func = codeflash_output  # 4.76μs -> 3.01μs (58.4% faster)


def test_formatter_with_none_returns_default_str():
    codeflash_output = _maybe_wrap_formatter(None)
    func = codeflash_output  # 4.55μs -> 2.94μs (54.9% faster)


def test_formatter_with_str_template():
    codeflash_output = _maybe_wrap_formatter("{:.2f}")
    func = codeflash_output  # 1.27μs -> 1.25μs (1.76% faster)


def test_formatter_with_callable():
    codeflash_output = _maybe_wrap_formatter(lambda x: f"Value: {x}")
    func = codeflash_output  # 1.11μs -> 1.10μs (0.820% faster)


def test_formatter_with_na_rep_and_none():
    codeflash_output = _maybe_wrap_formatter(None, na_rep="MISSING")
    func = codeflash_output  # 5.13μs -> 3.59μs (42.9% faster)


def test_formatter_with_na_rep_and_custom_callable():
    codeflash_output = _maybe_wrap_formatter(lambda x: f"X{x}", na_rep="NA")
    func = codeflash_output  # 1.46μs -> 1.48μs (1.42% slower)


def test_formatter_with_precision_override():
    codeflash_output = _maybe_wrap_formatter(None, precision=2)
    func = codeflash_output  # 1.89μs -> 1.76μs (7.20% faster)


def test_formatter_with_thousands_separator():
    codeflash_output = _maybe_wrap_formatter(None, thousands=",")
    func = codeflash_output  # 4.83μs -> 3.29μs (46.7% faster)


def test_formatter_with_decimal_and_thousands():
    codeflash_output = _maybe_wrap_formatter(None, decimal=",", thousands=".")
    func = codeflash_output  # 5.32μs -> 3.85μs (37.9% faster)


def test_formatter_with_escape():
    codeflash_output = _maybe_wrap_formatter(None, escape="html")
    func = codeflash_output  # 4.96μs -> 3.37μs (47.5% faster)


def test_formatter_with_hyperlinks():
    codeflash_output = _maybe_wrap_formatter(None, hyperlinks="html")
    func = codeflash_output  # 4.76μs -> 3.31μs (43.7% faster)


def test_formatter_with_all_options():
    codeflash_output = _maybe_wrap_formatter(
        "{:.1f}",
        na_rep="NA",
        precision=1,
        decimal=",",
        thousands=".",
        escape="esc",
        hyperlinks="hl",
    )
    func = codeflash_output  # 2.42μs -> 2.49μs (2.70% slower)
    # Should format 12345.6 as '12.345,6', escape, hyperlink, but not use na_rep
    result = func(12345.6)


# 2. EDGE TEST CASES


def test_formatter_with_invalid_formatter_type_raises():
    with pytest.raises(TypeError):
        _maybe_wrap_formatter(123)  # 2.35μs -> 2.34μs (0.555% faster)


def test_formatter_with_nonstandard_decimal_only():
    codeflash_output = _maybe_wrap_formatter(None, decimal=",")
    func = codeflash_output  # 6.09μs -> 4.33μs (40.7% faster)


def test_formatter_with_nonstandard_thousands_only():
    codeflash_output = _maybe_wrap_formatter(None, thousands=" ")
    func = codeflash_output  # 5.13μs -> 3.77μs (36.1% faster)


def test_formatter_with_thousands_as_none():
    codeflash_output = _maybe_wrap_formatter(None, thousands=None)
    func = codeflash_output  # 4.67μs -> 3.19μs (46.2% faster)


def test_formatter_with_thousands_as_empty_string():
    codeflash_output = _maybe_wrap_formatter(None, thousands="")
    func = codeflash_output  # 5.21μs -> 3.68μs (41.4% faster)


def test_formatter_with_decimal_and_thousands_same():
    codeflash_output = _maybe_wrap_formatter(None, decimal=".", thousands=".")
    func = codeflash_output  # 5.32μs -> 3.64μs (46.3% faster)


def test_formatter_with_bool_input():
    codeflash_output = _maybe_wrap_formatter(None)
    func = codeflash_output  # 4.40μs -> 2.92μs (50.6% faster)


def test_formatter_with_complex_input():
    codeflash_output = _maybe_wrap_formatter(None, precision=2)
    func = codeflash_output  # 1.91μs -> 1.79μs (6.69% faster)


def test_formatter_with_large_precision():
    codeflash_output = _maybe_wrap_formatter(None, precision=15)
    func = codeflash_output  # 1.86μs -> 1.70μs (9.36% faster)


def test_formatter_with_very_large_int():
    codeflash_output = _maybe_wrap_formatter(None, thousands=",")
    func = codeflash_output  # 4.86μs -> 3.22μs (50.9% faster)
    val = 10**18 + 123456789


def test_formatter_with_non_string_na_rep():
    codeflash_output = _maybe_wrap_formatter(None, na_rep=123)
    func = codeflash_output  # 4.79μs -> 3.30μs (45.1% faster)


def test_formatter_with_escape_and_na_rep():
    codeflash_output = _maybe_wrap_formatter(None, escape="esc", na_rep="NA")
    func = codeflash_output  # 4.89μs -> 3.39μs (44.4% faster)


def test_formatter_with_hyperlinks_and_na_rep():
    codeflash_output = _maybe_wrap_formatter(None, hyperlinks="hl", na_rep="NA")
    func = codeflash_output  # 4.74μs -> 3.52μs (34.6% faster)


# 3. LARGE SCALE TEST CASES


def test_formatter_large_list_of_floats():
    codeflash_output = _maybe_wrap_formatter(None, precision=3)
    func = codeflash_output  # 1.85μs -> 1.77μs (4.18% faster)
    data = [i + 0.123456 for i in range(1000)]
    # All results should have 3 decimals
    result = [func(x) for x in data]
    for i, val in enumerate(result):
        pass


def test_formatter_large_list_of_ints_with_thousands():
    codeflash_output = _maybe_wrap_formatter(None, thousands=",")
    func = codeflash_output  # 4.81μs -> 3.28μs (46.7% faster)
    data = [i * 1000 for i in range(1000)]
    result = [func(x) for x in data]
    for i, val in enumerate(result):
        expected = f"{i * 1000:,}"


def test_formatter_large_list_with_na_rep():
    codeflash_output = _maybe_wrap_formatter(None, na_rep="NA")
    func = codeflash_output  # 4.80μs -> 3.40μs (41.2% faster)
    data = [i if i % 100 else None for i in range(1000)]
    result = [func(x) for x in data]
    for i, val in enumerate(result):
        if i % 100 == 0:
            pass
        else:
            pass


def test_formatter_large_list_with_all_options():
    codeflash_output = _maybe_wrap_formatter(
        "{:.2f}",
        na_rep="NA",
        precision=2,
        decimal=",",
        thousands=" ",
        escape="esc",
        hyperlinks="hl",
    )
    func = codeflash_output  # 2.45μs -> 2.53μs (2.89% slower)
    data = [i * 1000 + 0.5 for i in range(1000)]
    result = [func(x) for x in data]
    for i, val in enumerate(result):
        # "{:.2f}".format(x) -> e.g. "1000.50"
        # thousands=" ", decimal=",", so "1 000,50"
        expected_val = (
            f"{i * 1000 + 0.5:,.2f}".replace(",", "§_§-")
            .replace(".", ",")
            .replace("§_§-", " ")
        )
        expected_val = f"<a href='{expected_val}'>{expected_val}</a>"


def test_formatter_large_list_with_many_nans():
    codeflash_output = _maybe_wrap_formatter(None, na_rep="NA")
    func = codeflash_output  # 5.05μs -> 3.52μs (43.6% faster)
    data = [float("nan") if i % 2 == 0 else i for i in range(1000)]
    result = [func(x) for x in data]
    for i, val in enumerate(result):
        if i % 2 == 0:
            pass
        else:
            pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-_maybe_wrap_formatter-mio6rkw9 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 2, 2025 06:17
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 2, 2025