@codeflash-ai codeflash-ai bot commented Dec 2, 2025

📄 1,093% (10.93x) speedup for label_from_attrs in xarray/plot/utils.py

⏱️ Runtime: 308 milliseconds → 25.8 milliseconds (best of 29 runs)
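
The headline percentage appears to be the time saved divided by the new runtime. That formula is inferred from the two runtimes above, not taken from Codeflash documentation, so treat this as a sanity check under that assumption:

```python
# Runtimes reported above, in milliseconds.
original_ms = 308.0
optimized_ms = 25.8

# Ratio of old runtime to new runtime ("times as fast").
ratio = original_ms / optimized_ms

# Relative speedup: time saved per unit of new runtime (assumed formula).
speedup = (original_ms - optimized_ms) / optimized_ms

print(f"ratio: {ratio:.2f}x")
print(f"speedup: {speedup:.1%}")
```

Under this reading the ratio is always exactly one more than the speedup, so the reported 1,093% corresponds to the optimized code running at roughly twelve times the original speed.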

📝 Explanation and details

The optimization moves the expensive DuckArrayModule("pint").type lookup from inside the _get_units_from_attrs function to module-level initialization, where it's cached as _pint_array_type. This provides a 1093% speedup by eliminating repeated expensive imports.

Key optimization: The original code called DuckArrayModule("pint").type on every function call (5,057 times in the profiler), taking 908ms out of 915ms total runtime (99.3%). The optimized version performs this lookup only once at module import time and caches the result.

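Performance impact by test case:

  • Single calls: roughly 5% to 1,382% faster. The smallest gains (4.68% and 9.25%) are for None input, which takes under a microsecond in both versions.
  • Large-scale tests: 775% to 1,853% faster across the 1000-DataArray loops, with the largest gains (1,850% and 1,853%) in the name-only and DummyPintArray loops.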

Why this works: DuckArrayModule involves dynamic imports and type checking that's expensive to repeat. Since the pint array type doesn't change during program execution, caching it at module level is safe and eliminates redundant work.

Real-world impact: Based on the function references, label_from_attrs is called extensively in plotting workflows - from line(), hist(), scatter(), and other plot functions for axis labeling, legends, and colorbars. This optimization will significantly speed up any plotting operation that generates labels, especially when creating multiple plots or faceted plots where the function gets called repeatedly.

Error handling: The optimization includes proper exception handling during module initialization to gracefully handle cases where pint is not available, maintaining the same behavior as the original code.
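
The actual guard used in the PR is not shown on this page. One way such an import-time fallback could look is sketched below; `load_optional_type` and `has_native_units` are hypothetical helpers, and `"pint"` / `"Quantity"` are the assumed module and class names:

```python
import importlib


def load_optional_type(module_name, attr_name):
    """Return a 1-tuple holding the requested class, or () if unavailable."""
    try:
        module = importlib.import_module(module_name)
        return (getattr(module, attr_name),)
    except (ImportError, AttributeError):
        return ()


# Evaluated once at import time; falls back to () when pint is missing.
_ARRAY_TYPE = load_optional_type("pint", "Quantity")


def has_native_units(data):
    # isinstance against an empty tuple is always False, so callers
    # need no separate "is pint installed?" branch.
    return isinstance(data, _ARRAY_TYPE)
```

Returning an empty tuple rather than `None` keeps every later `isinstance` call valid whether or not the optional dependency is installed.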

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 418 Passed |
| 🌀 Generated Regression Tests | 5051 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 95.7% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_plot.py::TestPlot.test_label_from_attrs | 573μs | 64.4μs | 791% ✅ |

🌀 Generated Regression Tests and Runtime
import textwrap

# imports
import pytest

# function to test
from xarray.plot.utils import label_from_attrs


class DummyPintArray:
    """Dummy class to simulate a Pint array with units attribute."""

    def __init__(self, units):
        self.units = units


# Minimal DataArray mock for testing
class DataArray:
    def __init__(self, name=None, attrs=None, data=None):
        self.name = name
        self.attrs = attrs if attrs is not None else {}
        self.data = data


# unit tests

# --------------------- Basic Test Cases ---------------------


def test_none_input_returns_empty_string():
    # Test that None returns empty string
    codeflash_output = label_from_attrs(None)  # 626ns -> 573ns (9.25% faster)


def test_name_only():
    # Test DataArray with only name
    da = DataArray(name="temperature")
    codeflash_output = label_from_attrs(da)  # 147μs -> 15.0μs (883% faster)


def test_long_name_precedence():
    # long_name should take precedence over name and standard_name
    da = DataArray(
        name="temp",
        attrs={"long_name": "Air Temperature", "standard_name": "air_temperature"},
    )
    codeflash_output = label_from_attrs(da)  # 116μs -> 16.8μs (593% faster)


def test_standard_name_used_if_no_long_name():
    # standard_name should be used if long_name missing
    da = DataArray(name="temp", attrs={"standard_name": "air_temperature"})
    codeflash_output = label_from_attrs(da)  # 111μs -> 13.8μs (706% faster)


def test_units_from_attrs():
    # Test units from attrs
    da = DataArray(name="temp", attrs={"long_name": "Air Temperature", "units": "K"})
    codeflash_output = label_from_attrs(da)  # 117μs -> 17.3μs (575% faster)


def test_unit_from_attrs():
    # Test 'unit' key instead of 'units'
    da = DataArray(
        name="temp", attrs={"long_name": "Air Temperature", "unit": "Kelvin"}
    )
    codeflash_output = label_from_attrs(da)  # 113μs -> 17.9μs (535% faster)


def test_pint_array_units():
    # Test units from pint array type
    da = DataArray(
        name="temp", attrs={"long_name": "Air Temperature"}, data=DummyPintArray("degC")
    )
    codeflash_output = label_from_attrs(da)  # 112μs -> 16.0μs (604% faster)


def test_extra_appended():
    # Test extra string is appended
    da = DataArray(name="temp", attrs={"long_name": "Air Temperature"})
    codeflash_output = label_from_attrs(
        da, extra=" (surface)"
    )  # 116μs -> 17.4μs (566% faster)


def test_no_name_or_attrs():
    # Test DataArray with no name or attrs
    da = DataArray()
    codeflash_output = label_from_attrs(da)  # 104μs -> 7.48μs (1292% faster)


# --------------------- Edge Test Cases ---------------------


def test_long_name_latex_wrapping():
    # Test latex string with even number of $ and wrapping
    long_latex = "$x^2 + y^2 = z^2$"
    da = DataArray(name="temp", attrs={"long_name": long_latex})
    # Should wrap at 60 chars, but this is short so no wrapping
    codeflash_output = label_from_attrs(da)  # 115μs -> 19.2μs (503% faster)


def test_long_latex_with_extra_and_units():
    # Test latex string with extra and units, wrapping at 60 chars
    long_latex = "$" + "a" * 70 + "$"
    da = DataArray(name="temp", attrs={"long_name": long_latex, "units": "m"})
    codeflash_output = label_from_attrs(da, extra=" extra")
    label = codeflash_output  # 121μs -> 23.5μs (415% faster)
    # Should wrap at 60 chars, splitting latex with $\n$, and include units
    expected = "$\n$".join(
        textwrap.wrap(long_latex + " extra [m]", 60, break_long_words=False)
    )


def test_odd_number_of_dollar_signs():
    # Test latex string with odd number of $ (should NOT use latex wrapping)
    long_latex = "$x^2 + y^2 = z^2"
    da = DataArray(name="temp", attrs={"long_name": long_latex})
    # Should wrap at 30 chars, not latex wrapping
    expected = "\n".join(textwrap.wrap(long_latex, 30))
    codeflash_output = label_from_attrs(da)  # 109μs -> 10.0μs (991% faster)


def test_long_name_exactly_30_chars():
    # Test wrapping at exactly 30 chars
    long_name = "a" * 30
    da = DataArray(name="temp", attrs={"long_name": long_name})
    # Should not insert any newline
    codeflash_output = label_from_attrs(da)  # 109μs -> 13.5μs (714% faster)


def test_long_name_just_over_30_chars():
    # Test wrapping at just over 30 chars
    long_name = "a" * 31
    da = DataArray(name="temp", attrs={"long_name": long_name})
    expected = "\n".join(textwrap.wrap(long_name, 30))
    codeflash_output = label_from_attrs(da)  # 108μs -> 9.25μs (1075% faster)


def test_long_name_with_spaces_wrap():
    # Test wrapping with spaces, should break at spaces
    long_name = "This is a very long variable name that should wrap nicely"
    da = DataArray(name="temp", attrs={"long_name": long_name})
    expected = "\n".join(textwrap.wrap(long_name, 30))
    codeflash_output = label_from_attrs(da)  # 117μs -> 14.9μs (687% faster)


def test_long_name_with_extra_and_units_wrap():
    # Test wrapping with extra and units, total string > 30 chars
    da = DataArray(
        name="temp", attrs={"long_name": "Temperature"}, data=DummyPintArray("degC")
    )
    codeflash_output = label_from_attrs(da, extra=" at surface")
    label = codeflash_output  # 116μs -> 16.6μs (599% faster)
    expected = "\n".join(textwrap.wrap("Temperature at surface [degC]", 30))


def test_empty_attrs_and_name_none():
    # DataArray with name=None and no attrs
    da = DataArray(name=None, attrs={})
    codeflash_output = label_from_attrs(da)  # 105μs -> 7.18μs (1369% faster)


def test_units_and_unit_both_present():
    # If both units and unit present, units should take precedence
    da = DataArray(
        name="temp", attrs={"long_name": "Temp", "units": "K", "unit": "Kelvin"}
    )
    codeflash_output = label_from_attrs(da)  # 114μs -> 15.8μs (626% faster)


def test_units_and_pint_array_present():
    # Both a pint-like object and attrs["units"] are present. DummyPintArray is
    # not a real pint type, so the isinstance check is not expected to match it.
    da = DataArray(
        name="temp",
        attrs={"long_name": "Temp", "units": "K"},
        data=DummyPintArray("degF"),
    )
    codeflash_output = label_from_attrs(da)  # 114μs -> 15.5μs (637% faster)


def test_extra_long_extra_string():
    # Test with a very long extra string
    da = DataArray(name="temp", attrs={"long_name": "Temperature"})
    extra = " " + "extra" * 20
    codeflash_output = label_from_attrs(da, extra=extra)
    label = codeflash_output  # 121μs -> 26.7μs (355% faster)
    expected = "\n".join(textwrap.wrap("Temperature" + extra, 30))


def test_non_string_units():
    # Test units as non-string type (e.g., int)
    da = DataArray(name="temp", attrs={"long_name": "Temp", "units": 123})
    codeflash_output = label_from_attrs(da)  # 111μs -> 15.5μs (616% faster)


def test_non_string_long_name():
    # Test long_name as non-string type
    da = DataArray(name="temp", attrs={"long_name": 456})
    codeflash_output = label_from_attrs(da)  # 109μs -> 12.6μs (766% faster)


def test_non_string_standard_name():
    # Test standard_name as non-string type
    da = DataArray(name="temp", attrs={"standard_name": 789})
    codeflash_output = label_from_attrs(da)  # 110μs -> 13.1μs (745% faster)


def test_non_string_unit():
    # Test 'unit' as non-string type
    da = DataArray(name="temp", attrs={"long_name": "Temp", "unit": 42})
    codeflash_output = label_from_attrs(da)  # 115μs -> 16.3μs (607% faster)


def test_non_string_name():
    # Test name as non-string type
    da = DataArray(name=101)
    codeflash_output = label_from_attrs(da)  # 109μs -> 13.3μs (728% faster)


# --------------------- Large Scale Test Cases ---------------------


def test_many_dataarrays_unique_names():
    # Test scalability with many DataArrays with unique names
    for i in range(1000):
        da = DataArray(name=f"var_{i}")
        codeflash_output = label_from_attrs(da)  # 56.5ms -> 2.90ms (1850% faster)


def test_many_dataarrays_long_names_and_units():
    # Test scalability with many DataArrays with long names and units
    for i in range(1000):
        long_name = "Variable " + str(i) + " " + ("x" * 25)
        da = DataArray(
            name=f"var_{i}", attrs={"long_name": long_name, "units": f"unit_{i}"}
        )
        expected = "\n".join(textwrap.wrap(f"{long_name} [unit_{i}]", 30))
        codeflash_output = label_from_attrs(da)  # 65.1ms -> 7.43ms (775% faster)


def test_many_dataarrays_latex_names():
    # Test scalability with many latex names that should wrap at 60 chars
    for i in range(1000):
        latex_name = "$" + "x" * (50 + i % 10) + "$"
        da = DataArray(
            name=f"var_{i}", attrs={"long_name": latex_name, "units": f"u_{i}"}
        )
        expected = "$\n$".join(
            textwrap.wrap(latex_name + f" [u_{i}]", 60, break_long_words=False)
        )
        codeflash_output = label_from_attrs(da)  # 63.6ms -> 6.91ms (820% faster)


def test_large_dataarray_with_long_extra():
    # Test a single DataArray with a very long name and extra
    long_name = "A" * 500
    extra = "B" * 400
    da = DataArray(name="var", attrs={"long_name": long_name, "units": "mm"})
    codeflash_output = label_from_attrs(da, extra=extra)
    label = codeflash_output  # 190μs -> 88.7μs (115% faster)
    expected = "\n".join(textwrap.wrap(long_name + extra + " [mm]", 30))


def test_large_dataarray_latex_long_extra():
    # Test a single DataArray with a very long latex name and extra
    latex_name = "$" + "A" * 500 + "$"
    extra = "B" * 400
    da = DataArray(name="var", attrs={"long_name": latex_name, "units": "mm"})
    codeflash_output = label_from_attrs(da, extra=extra)
    label = codeflash_output  # 159μs -> 61.8μs (158% faster)
    expected = "$\n$".join(
        textwrap.wrap(latex_name + extra + " [mm]", 60, break_long_words=False)
    )


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
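
The `expected` values in the tests above encode a wrapping rule: plain labels wrap at 30 characters, and LaTeX labels wrap at 60 characters with lines joined by `$\n$`. A minimal sketch of that rule follows; `wrap_label` is a hypothetical helper, and the requirement that the text start with `$` is an assumption (the test comments mention only an even number of `$`):

```python
import textwrap


def wrap_label(text):
    """Wrap a label the way the tests' `expected` values are built."""
    # Treat the text as LaTeX only if it starts with "$" (assumed) and
    # contains an even number of "$" characters.
    if text.startswith("$") and text.count("$") % 2 == 0:
        return "$\n$".join(textwrap.wrap(text, 60, break_long_words=False))
    return "\n".join(textwrap.wrap(text, 30))
```

With `break_long_words=False`, an over-long LaTeX token stays intact on its own line instead of being split mid-expression.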
import textwrap

# imports
import pytest
from xarray.plot.utils import label_from_attrs


# Minimal DataArray mock for testing (only name, attrs, and data are needed)
class DataArray:
    def __init__(self, name=None, attrs=None, data=None):
        self.name = name
        self.attrs = attrs if attrs is not None else {}
        self.data = data


# Minimal stand-in for a pint array (exposes only a units attribute)
class DummyPintArray:
    def __init__(self, units):
        self.units = units


# unit tests

# --- Basic Test Cases ---


def test_none_input_returns_empty_string():
    # Should return empty string if da is None
    codeflash_output = label_from_attrs(None)  # 559ns -> 534ns (4.68% faster)


def test_label_from_long_name_and_units():
    # Uses long_name and units from attrs
    da = DataArray(name="foo", attrs={"long_name": "Temperature", "units": "K"})
    codeflash_output = label_from_attrs(da)  # 118μs -> 17.2μs (588% faster)


def test_label_from_standard_name_and_unit():
    # Uses standard_name and unit from attrs
    da = DataArray(name="foo", attrs={"standard_name": "Pressure", "unit": "Pa"})
    codeflash_output = label_from_attrs(da)  # 114μs -> 16.5μs (594% faster)


def test_label_from_name_fallback():
    # Uses name if no long_name or standard_name
    da = DataArray(name="humidity", attrs={})
    codeflash_output = label_from_attrs(da)  # 108μs -> 13.9μs (686% faster)


def test_label_from_name_and_no_units():
    # Uses name, no units
    da = DataArray(name="wind_speed", attrs={})
    codeflash_output = label_from_attrs(da)  # 112μs -> 13.5μs (733% faster)


def test_label_with_extra_string():
    # Appends extra string to label
    da = DataArray(name="foo", attrs={"long_name": "Temperature", "units": "K"})
    codeflash_output = label_from_attrs(da, extra=" (max)")
    result = codeflash_output  # 117μs -> 18.4μs (542% faster)


def test_label_from_pint_array_units():
    # Uses units from pint array type
    da = DataArray(name="foo", attrs={}, data=DummyPintArray("m/s"))
    codeflash_output = label_from_attrs(da)  # 113μs -> 12.7μs (797% faster)


# --- Edge Test Cases ---


def test_label_with_missing_name_and_attrs():
    # No name and no attrs
    da = DataArray(name=None, attrs={})
    codeflash_output = label_from_attrs(da)  # 105μs -> 7.14μs (1382% faster)


def test_label_with_empty_long_name():
    # long_name is empty string
    da = DataArray(name="foo", attrs={"long_name": ""})
    codeflash_output = label_from_attrs(da)  # 103μs -> 7.59μs (1267% faster)


def test_label_with_long_name_and_unit_and_units():
    # Both 'unit' and 'units' present, should prefer 'units'
    da = DataArray(name="foo", attrs={"long_name": "Temp", "units": "K", "unit": "C"})
    codeflash_output = label_from_attrs(da)  # 113μs -> 16.0μs (607% faster)


def test_label_with_latex_sequence():
    # Name is a latex sequence, should wrap at 60 chars and use $ separators
    latex_name = "$x = y + z$"
    da = DataArray(name=None, attrs={"long_name": latex_name, "units": "m"})
    codeflash_output = label_from_attrs(da)
    result = codeflash_output  # 118μs -> 19.2μs (518% faster)


def test_label_with_latex_sequence_and_extra_long():
    # Long latex string should wrap at 60 chars
    latex_name = "$" + "a" * 70 + "$"
    da = DataArray(name=None, attrs={"long_name": latex_name, "units": "kg"})
    codeflash_output = label_from_attrs(da)
    result = codeflash_output  # 121μs -> 21.9μs (452% faster)


def test_label_with_long_nonlatex_name_wrap():
    # Non-latex long name should wrap at 30 chars
    long_name = "VeryLongName_" + "x" * 40
    da = DataArray(name=None, attrs={"long_name": long_name, "units": "kg"})
    codeflash_output = label_from_attrs(da)
    result = codeflash_output  # 119μs -> 22.2μs (438% faster)


def test_label_with_extra_and_long_name_wrap():
    # Extra string causes wrapping
    long_name = "VeryLongName_" + "x" * 25
    da = DataArray(name=None, attrs={"long_name": long_name, "units": "kg"})
    codeflash_output = label_from_attrs(da, extra=" plus some extra info")
    result = codeflash_output  # 123μs -> 26.4μs (366% faster)


def test_label_with_units_and_unit_missing():
    # Neither units nor unit present
    da = DataArray(name="foo", attrs={"long_name": "Bar"})
    codeflash_output = label_from_attrs(da)  # 109μs -> 12.8μs (750% faster)


def test_label_with_units_and_unit_both_missing():
    # Both units and unit missing, and no data units
    da = DataArray(name="foo", attrs={})
    codeflash_output = label_from_attrs(da)  # 110μs -> 12.6μs (780% faster)


def test_label_with_units_as_none():
    # units is None, should not crash
    da = DataArray(name="foo", attrs={"units": None})
    codeflash_output = label_from_attrs(da)  # 115μs -> 16.7μs (590% faster)


def test_label_with_unit_as_none():
    # unit is None, should not crash
    da = DataArray(name="foo", attrs={"unit": None})
    codeflash_output = label_from_attrs(da)  # 115μs -> 16.1μs (616% faster)


def test_label_with_non_string_units():
    # units is a number, should convert to string
    da = DataArray(name="foo", attrs={"units": 123})
    codeflash_output = label_from_attrs(da)  # 114μs -> 15.9μs (621% faster)


def test_label_with_non_string_unit():
    # unit is a number, should convert to string
    da = DataArray(name="foo", attrs={"unit": 456})
    codeflash_output = label_from_attrs(da)  # 113μs -> 16.3μs (597% faster)


def test_label_with_extra_empty_string():
    # Extra is empty string, should not affect label
    da = DataArray(name="foo", attrs={"long_name": "Bar", "units": "kg"})
    codeflash_output = label_from_attrs(da, extra="")  # 116μs -> 16.3μs (617% faster)


# --- Large Scale Test Cases ---


def test_label_large_scale_many_elements():
    # Test with many DataArray objects to check performance/scalability
    das = [
        DataArray(
            name=f"name_{i}", attrs={"long_name": f"LongName_{i}", "units": f"unit_{i}"}
        )
        for i in range(1000)
    ]
    for i, da in enumerate(das):
        expected = f"LongName_{i} [unit_{i}]"
        codeflash_output = label_from_attrs(da)  # 59.6ms -> 4.68ms (1174% faster)


def test_label_large_scale_long_names():
    # Test with very long names to trigger wrapping
    long_name = "A" * 100
    da = DataArray(name=None, attrs={"long_name": long_name, "units": "kg"})
    codeflash_output = label_from_attrs(da)
    result = codeflash_output  # 128μs -> 27.1μs (376% faster)
    # Should wrap every 30 chars (non-latex)
    lines = result.split("\n")


def test_label_large_scale_long_latex_names():
    # Test with very long latex names to trigger wrapping at 60 chars
    latex_name = "$" + "B" * 130 + "$"
    da = DataArray(name=None, attrs={"long_name": latex_name, "units": "m"})
    codeflash_output = label_from_attrs(da)
    result = codeflash_output  # 124μs -> 25.0μs (397% faster)
    # Should wrap every 60 chars (latex)
    lines = result.split("$\n$")


def test_label_large_scale_extra_long_extra():
    # Test with extra long 'extra' string
    long_name = "A" * 50
    extra = "B" * 50
    da = DataArray(name=None, attrs={"long_name": long_name, "units": "kg"})
    codeflash_output = label_from_attrs(da, extra=extra)
    result = codeflash_output  # 125μs -> 26.0μs (382% faster)
    # Should wrap at 30 chars (non-latex)
    lines = result.split("\n")


def test_label_large_scale_pint_arrays():
    # Test with many pint arrays
    for i in range(1000):
        da = DataArray(name=f"foo_{i}", attrs={}, data=DummyPintArray(f"unit_{i}"))
        expected = f"foo_{i} [unit_{i}]"
        codeflash_output = label_from_attrs(da)  # 56.8ms -> 2.91ms (1853% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
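
How Codeflash performs that comparison internally is not shown here. The idea can be sketched as running both versions on the same inputs and collecting any disagreements. All names below (`find_mismatches`, `label_before`, `label_after`) are hypothetical stand-ins, not part of xarray or Codeflash:

```python
def find_mismatches(original, optimized, cases):
    """Return the argument tuples on which the two callables disagree."""
    mismatches = []
    for args in cases:
        if original(*args) != optimized(*args):
            mismatches.append(args)
    return mismatches


# Hypothetical stand-in for the pre-optimization function.
def label_before(name, units):
    return f"{name} [{units}]" if units else name


_TEMPLATE = "{} [{}]"  # hoisted to module level, mirroring the caching idea


# Hypothetical stand-in for the post-optimization function.
def label_after(name, units):
    return _TEMPLATE.format(name, units) if units else name


cases = [("temp", "K"), ("temp", ""), ("humidity", None)]
print(find_mismatches(label_before, label_after, cases))
```

An empty result means the two versions agree on every supplied case. It says nothing about inputs outside that list, which is why the report above also lists coverage.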
Timer unit: 1e-09 s

To edit these changes, run git checkout codeflash/optimize-label_from_attrs-mio9uazb and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 2, 2025 07:43
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Dec 2, 2025