Skip to content

Commit 8694818

Browse files
max-sixtypre-commit-ci[bot]keewisClaude
authored
Allow drop_conflicts to handle non-bool returns from __eq__ (#10741)
* Fix drop_conflicts to handle non-bool returns from __eq__ When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from #10726 * Add changelog entry * Suppress DeprecationWarning for ambiguous array truth values The CI was failing because NumPy raises a DeprecationWarning before ValueError when evaluating ambiguous truth values. Since we're properly handling the ValueError, we suppress the warning in this specific context. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression to equivalent function Rather than suppressing the warning in merge_attrs, handle it at the source in equivalent() where the 'or' operation happens. This is cleaner and includes a clear comment about when the suppression can be removed (when minimum numpy version >= 2.0). The warning only occurs in numpy < 2.0; numpy 2.0+ raises ValueError directly, which we already handle properly in merge_attrs. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression from library to test The warning only occurs in our specific test case with custom objects that return numpy arrays from __eq__. This is a very edge case scenario, so it's more appropriate to suppress the warning in the test rather than in the library code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify drop_conflicts implementation Based on code review, simplified the logic to: - Process attributes in a single loop instead of nested operations - Eliminate intermediate dictionary creation - Make the control flow clearer and more efficient * Update whats-new.rst Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * Update whats-new.rst Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * Implement equivalent_attrs function as suggested by @keewis - Create dedicated equivalent_attrs function that wraps equivalent() - Returns False when ValueError is raised (e.g., for numpy arrays) - Simplifies merge_attrs logic by encapsulating error handling - Makes future behavior changes easier * Add pathological test cases suggested by @shoyer - Test NumPy object arrays with nested arrays - Test xarray.Dataset objects as attributes - Test Pandas Series as attributes - Update equivalent_attrs to catch TypeError in addition to ValueError - Ensure equivalent_attrs always returns a boolean (not Dataset/array-like) These cases all properly get dropped when they can't be reliably compared. * Use truthiness in equivalent_attrs for consistency with Python - Choose truthiness over strict boolean checking for consistency with Python's standard 'if a == b:' behavior - Accept edge cases with non-standard __eq__ methods as a deliberate tradeoff - Add comprehensive tests documenting the behavior and edge cases - Add clear documentation explaining the design choice and its implications This approach is more permissive and works better with numpy scalars and similar types, while still catching truly ambiguous cases (ValueError/TypeError). * Implement @shoyer's stricter equivalent() function - Rewrite equivalent() to reject non-boolean comparison results - Accept numpy bool scalars (np.bool_) but reject other non-bool types - Simplify equivalent_attrs() since equivalent() now handles non-bool cases - Update tests to reflect stricter behavior with non-standard __eq__ methods This makes comparisons more predictable by rejecting ambiguous cases like Dataset comparisons, custom objects with weird __eq__, etc. The tradeoff is being less permissive than Python's standard 'if a == b:' behavior. * Address @shoyer's review comments - Fix comment to say 'nested numpy arrays' instead of 'numpy arrays' - Use del instead of pop() to avoid creating unused return value - Split long test methods into smaller, focused test methods - Move all imports (warnings, pandas) to top of file Co-authored-by: Claude <claude@anthropic.com> * Address @shoyer's review comments - Split long test methods into smaller, focused test cases: - test_merge_attrs_drop_conflicts_ambiguous_array_returns split to add test_merge_attrs_drop_conflicts_all_true_array - test_merge_attrs_drop_conflicts_non_boolean_eq_returns split into test_merge_attrs_drop_conflicts_eq_returns_string and test_merge_attrs_drop_conflicts_eq_returns_number - Ran formatter which removed a blank line Co-authored-by: Claude <claude@anthropic.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin <keewis@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com>
1 parent 8c5706d commit 8694818

File tree

4 files changed

+316
-14
lines changed

4 files changed

+316
-14
lines changed

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,11 +53,15 @@ Bug fixes
5353
- Fix error when encoding an empty :py:class:`numpy.datetime64` array
5454
(:issue:`10722`, :pull:`10723`). By `Spencer Clark
5555
<https://github.com/spencerkclark>`_.
56+
- Propagate coordinate attrs in :py:meth:`xarray.Dataset.map` (:issue:`9317`, :pull:`10602`).
5657
- Fix error from ``to_netcdf(..., compute=False)`` when using Dask Distributed
5758
(:issue:`10725`).
5859
By `Stephan Hoyer <https://github.com/shoyer>`_.
5960
- Propagation coordinate attrs in :py:meth:`xarray.Dataset.map` (:issue:`9317`, :pull:`10602`).
6061
By `Justus Magin <https://github.com/keewis>`_.
62+
- Allow ``combine_attrs="drop_conflicts"`` to handle objects with ``__eq__`` methods that return
63+
non-bool values (e.g., numpy arrays) without raising ``ValueError`` (:pull:`10726`).
64+
By `Maximilian Roos <https://github.com/max-sixty>`_.
6165

6266
Documentation
6367
~~~~~~~~~~~~~

xarray/core/utils.py

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -239,17 +239,34 @@ def equivalent(first: T, second: T) -> bool:
239239
"""Compare two objects for equivalence (identity or equality), using
240240
array_equiv if either object is an ndarray. If both objects are lists,
241241
equivalent is sequentially called on all the elements.
242+
243+
Returns False for any comparison that doesn't return a boolean,
244+
making this function safer to use with objects that have non-standard
245+
__eq__ implementations.
242246
"""
243247
# TODO: refactor to avoid circular import
244248
from xarray.core import duck_array_ops
245249

246250
if first is second:
247251
return True
252+
248253
if isinstance(first, np.ndarray) or isinstance(second, np.ndarray):
249254
return duck_array_ops.array_equiv(first, second)
255+
250256
if isinstance(first, list) or isinstance(second, list):
251257
return list_equiv(first, second) # type: ignore[arg-type]
252-
return (first == second) or (pd.isnull(first) and pd.isnull(second)) # type: ignore[call-overload]
258+
259+
# For non-array/list types, use == but require boolean result
260+
result = first == second
261+
if not isinstance(result, bool):
262+
# Accept numpy bool scalars as well
263+
if isinstance(result, np.bool_):
264+
return bool(result)
265+
# Reject any other non-boolean type (Dataset, Series, custom objects, etc.)
266+
return False
267+
268+
# Check for NaN equivalence
269+
return result or (pd.isnull(first) and pd.isnull(second)) # type: ignore[call-overload]
253270

254271

255272
def list_equiv(first: Sequence[T], second: Sequence[T]) -> bool:

xarray/structure/merge.py

Lines changed: 30 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -607,6 +607,25 @@ def merge_coords(
607607
return variables, out_indexes
608608

609609

610+
def equivalent_attrs(a: Any, b: Any) -> bool:
611+
"""Check if two attribute values are equivalent.
612+
613+
Returns False if the comparison raises ValueError or TypeError.
614+
This handles cases like numpy arrays with ambiguous truth values
615+
and xarray Datasets which can't be directly converted to numpy arrays.
616+
617+
Since equivalent() now handles non-boolean returns by returning False,
618+
this wrapper mainly catches exceptions from comparisons that can't be
619+
evaluated at all.
620+
"""
621+
try:
622+
return equivalent(a, b)
623+
except (ValueError, TypeError):
624+
# These exceptions indicate the comparison is truly ambiguous
625+
# (e.g., nested numpy arrays that would raise "ambiguous truth value")
626+
return False
627+
628+
610629
def merge_attrs(variable_attrs, combine_attrs, context=None):
611630
"""Combine attributes from different variables according to combine_attrs"""
612631
if not variable_attrs:
@@ -633,20 +652,18 @@ def merge_attrs(variable_attrs, combine_attrs, context=None):
633652
elif combine_attrs == "drop_conflicts":
634653
result = {}
635654
dropped_keys = set()
655+
636656
for attrs in variable_attrs:
637-
result.update(
638-
{
639-
key: value
640-
for key, value in attrs.items()
641-
if key not in result and key not in dropped_keys
642-
}
643-
)
644-
result = {
645-
key: value
646-
for key, value in result.items()
647-
if key not in attrs or equivalent(attrs[key], value)
648-
}
649-
dropped_keys |= {key for key in attrs if key not in result}
657+
for key, value in attrs.items():
658+
if key in dropped_keys:
659+
continue
660+
661+
if key not in result:
662+
result[key] = value
663+
elif not equivalent_attrs(result[key], value):
664+
del result[key]
665+
dropped_keys.add(key)
666+
650667
return result
651668
elif combine_attrs == "identical":
652669
result = dict(variable_attrs[0])

xarray/tests/test_merge.py

Lines changed: 264 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
11
from __future__ import annotations
22

3+
import warnings
4+
35
import numpy as np
6+
import pandas as pd
47
import pytest
58

69
import xarray as xr
@@ -235,6 +238,267 @@ def test_merge_attrs_drop_conflicts(self):
235238
expected = xr.Dataset(attrs={"a": 0, "d": 0, "e": 0})
236239
assert_identical(actual, expected)
237240

241+
def test_merge_attrs_drop_conflicts_numpy_arrays(self):
242+
"""Test drop_conflicts with numpy arrays."""
243+
# Test with numpy arrays (which return arrays from ==)
244+
arr1 = np.array([1, 2, 3])
245+
arr2 = np.array([1, 2, 3])
246+
arr3 = np.array([4, 5, 6])
247+
248+
ds1 = xr.Dataset(attrs={"arr": arr1, "scalar": 1})
249+
ds2 = xr.Dataset(attrs={"arr": arr2, "scalar": 1}) # Same array values
250+
ds3 = xr.Dataset(attrs={"arr": arr3, "other": 2}) # Different array values
251+
252+
# Arrays are considered equivalent if they have the same values
253+
actual = xr.merge([ds1, ds2], combine_attrs="drop_conflicts")
254+
assert "arr" in actual.attrs # Should keep the array since they're equivalent
255+
assert actual.attrs["scalar"] == 1
256+
257+
# Different arrays cause the attribute to be dropped
258+
actual = xr.merge([ds1, ds3], combine_attrs="drop_conflicts")
259+
assert "arr" not in actual.attrs # Should drop due to conflict
260+
assert "other" in actual.attrs
261+
262+
def test_merge_attrs_drop_conflicts_custom_eq_returns_array(self):
263+
"""Test drop_conflicts with custom objects that return arrays from __eq__."""
264+
265+
# Test with custom objects that return non-bool from __eq__
266+
class CustomEq:
267+
"""Object whose __eq__ returns a non-bool value."""
268+
269+
def __init__(self, value):
270+
self.value = value
271+
272+
def __eq__(self, other):
273+
if not isinstance(other, CustomEq):
274+
return False
275+
# Return a numpy array (truthy if all elements are non-zero)
276+
return np.array([self.value == other.value])
277+
278+
def __repr__(self):
279+
return f"CustomEq({self.value})"
280+
281+
obj1 = CustomEq(42)
282+
obj2 = CustomEq(42) # Same value
283+
obj3 = CustomEq(99) # Different value
284+
285+
ds4 = xr.Dataset(attrs={"custom": obj1, "x": 1})
286+
ds5 = xr.Dataset(attrs={"custom": obj2, "x": 1})
287+
ds6 = xr.Dataset(attrs={"custom": obj3, "y": 2})
288+
289+
# Suppress DeprecationWarning from numpy < 2.0 about ambiguous truth values
290+
# when our custom __eq__ returns arrays that are evaluated in boolean context
291+
with warnings.catch_warnings():
292+
warnings.filterwarnings("ignore", category=DeprecationWarning)
293+
294+
# Objects returning arrays are dropped (non-boolean return)
295+
actual = xr.merge([ds4, ds5], combine_attrs="drop_conflicts")
296+
assert "custom" not in actual.attrs # Dropped - returns array, not bool
297+
assert actual.attrs["x"] == 1
298+
299+
# Different values also dropped (returns array, not bool)
300+
actual = xr.merge([ds4, ds6], combine_attrs="drop_conflicts")
301+
assert "custom" not in actual.attrs # Dropped - returns non-boolean
302+
assert actual.attrs["x"] == 1
303+
assert actual.attrs["y"] == 2
304+
305+
def test_merge_attrs_drop_conflicts_ambiguous_array_returns(self):
306+
"""Test drop_conflicts with objects returning ambiguous arrays from __eq__."""
307+
308+
# Test edge case: object whose __eq__ returns empty array (ambiguous truth value)
309+
class EmptyArrayEq:
310+
def __eq__(self, other):
311+
if not isinstance(other, EmptyArrayEq):
312+
return False
313+
return np.array([]) # Empty array has ambiguous truth value
314+
315+
def __repr__(self):
316+
return "EmptyArrayEq()"
317+
318+
empty_obj1 = EmptyArrayEq()
319+
empty_obj2 = EmptyArrayEq()
320+
321+
ds7 = xr.Dataset(attrs={"empty": empty_obj1})
322+
ds8 = xr.Dataset(attrs={"empty": empty_obj2})
323+
324+
# With new behavior: ambiguous truth values are treated as non-equivalent
325+
# So the attribute is dropped instead of raising an error
326+
with warnings.catch_warnings():
327+
warnings.filterwarnings("ignore", category=DeprecationWarning)
328+
actual = xr.merge([ds7, ds8], combine_attrs="drop_conflicts")
329+
assert "empty" not in actual.attrs # Dropped due to ambiguous comparison
330+
331+
# Test with object that returns multi-element array (also ambiguous)
332+
class MultiArrayEq:
333+
def __eq__(self, other):
334+
if not isinstance(other, MultiArrayEq):
335+
return False
336+
return np.array([True, False]) # Multi-element array is ambiguous
337+
338+
def __repr__(self):
339+
return "MultiArrayEq()"
340+
341+
multi_obj1 = MultiArrayEq()
342+
multi_obj2 = MultiArrayEq()
343+
344+
ds9 = xr.Dataset(attrs={"multi": multi_obj1})
345+
ds10 = xr.Dataset(attrs={"multi": multi_obj2})
346+
347+
# With new behavior: ambiguous arrays are treated as non-equivalent
348+
with warnings.catch_warnings():
349+
warnings.filterwarnings("ignore", category=DeprecationWarning)
350+
actual = xr.merge([ds9, ds10], combine_attrs="drop_conflicts")
351+
assert "multi" not in actual.attrs # Dropped due to ambiguous comparison
352+
353+
def test_merge_attrs_drop_conflicts_all_true_array(self):
354+
"""Test drop_conflicts with all-True multi-element array from __eq__."""
355+
356+
# Test with all-True multi-element array (unambiguous truthy)
357+
class AllTrueArrayEq:
358+
def __eq__(self, other):
359+
if not isinstance(other, AllTrueArrayEq):
360+
return False
361+
return np.array([True, True, True]) # All True, but still multi-element
362+
363+
def __repr__(self):
364+
return "AllTrueArrayEq()"
365+
366+
alltrue1 = AllTrueArrayEq()
367+
alltrue2 = AllTrueArrayEq()
368+
369+
ds11 = xr.Dataset(attrs={"alltrue": alltrue1})
370+
ds12 = xr.Dataset(attrs={"alltrue": alltrue2})
371+
372+
# Multi-element arrays are ambiguous even if all True
373+
actual = xr.merge([ds11, ds12], combine_attrs="drop_conflicts")
374+
assert "alltrue" not in actual.attrs # Dropped due to ambiguous comparison
375+
376+
def test_merge_attrs_drop_conflicts_nested_arrays(self):
377+
"""Test drop_conflicts with NumPy object arrays containing nested arrays."""
378+
# Test 1: NumPy object arrays with nested arrays
379+
# These can have complex comparison behavior
380+
x = np.array([None], dtype=object)
381+
x[0] = np.arange(3)
382+
y = np.array([None], dtype=object)
383+
y[0] = np.arange(10, 13)
384+
385+
ds1 = xr.Dataset(attrs={"nested_array": x, "common": 1})
386+
ds2 = xr.Dataset(attrs={"nested_array": y, "common": 1})
387+
388+
# Different nested arrays should cause attribute to be dropped
389+
actual = xr.merge([ds1, ds2], combine_attrs="drop_conflicts")
390+
assert (
391+
"nested_array" not in actual.attrs
392+
) # Dropped due to different nested arrays
393+
assert actual.attrs["common"] == 1
394+
395+
# Test with identical nested arrays
396+
# Note: Even identical nested arrays will be dropped because comparison
397+
# raises ValueError due to ambiguous truth value
398+
z = np.array([None], dtype=object)
399+
z[0] = np.arange(3) # Same as x
400+
ds3 = xr.Dataset(attrs={"nested_array": z, "other": 2})
401+
402+
actual = xr.merge([ds1, ds3], combine_attrs="drop_conflicts")
403+
assert (
404+
"nested_array" not in actual.attrs
405+
) # Dropped due to ValueError in comparison
406+
assert actual.attrs["other"] == 2
407+
408+
def test_merge_attrs_drop_conflicts_dataset_attrs(self):
409+
"""Test drop_conflicts with xarray.Dataset objects as attributes."""
410+
# xarray.Dataset objects as attributes (raises TypeError in equivalent)
411+
attr_ds1 = xr.Dataset({"foo": 1})
412+
attr_ds2 = xr.Dataset({"bar": 1}) # Different dataset
413+
attr_ds3 = xr.Dataset({"foo": 1}) # Same as attr_ds1
414+
415+
ds4 = xr.Dataset(attrs={"dataset_attr": attr_ds1, "scalar": 42})
416+
ds5 = xr.Dataset(attrs={"dataset_attr": attr_ds2, "scalar": 42})
417+
ds6 = xr.Dataset(attrs={"dataset_attr": attr_ds3, "other": 99})
418+
419+
# Different datasets raise TypeError and should be dropped
420+
actual = xr.merge([ds4, ds5], combine_attrs="drop_conflicts")
421+
assert "dataset_attr" not in actual.attrs # Dropped due to TypeError
422+
assert actual.attrs["scalar"] == 42
423+
424+
# Identical datasets are also dropped (comparison returns Dataset, not bool)
425+
actual = xr.merge([ds4, ds6], combine_attrs="drop_conflicts")
426+
assert "dataset_attr" not in actual.attrs # Dropped - returns Dataset, not bool
427+
assert actual.attrs["other"] == 99
428+
429+
def test_merge_attrs_drop_conflicts_pandas_series(self):
430+
"""Test drop_conflicts with Pandas Series as attributes."""
431+
# Pandas Series (raises ValueError due to ambiguous truth value)
432+
series1 = pd.Series([1, 2])
433+
series2 = pd.Series([3, 4]) # Different values
434+
series3 = pd.Series([1, 2]) # Same as series1
435+
436+
ds7 = xr.Dataset(attrs={"series": series1, "value": "a"})
437+
ds8 = xr.Dataset(attrs={"series": series2, "value": "a"})
438+
ds9 = xr.Dataset(attrs={"series": series3, "value": "a"})
439+
440+
# Suppress potential warnings from pandas comparisons
441+
with warnings.catch_warnings():
442+
warnings.filterwarnings("ignore", category=DeprecationWarning)
443+
warnings.filterwarnings("ignore", category=FutureWarning)
444+
445+
# Different series raise ValueError and get dropped
446+
actual = xr.merge([ds7, ds8], combine_attrs="drop_conflicts")
447+
assert "series" not in actual.attrs # Dropped due to ValueError
448+
assert actual.attrs["value"] == "a"
449+
450+
# Even identical series raise ValueError in equivalent() and get dropped
451+
# because Series comparison returns another Series with ambiguous truth value
452+
actual = xr.merge([ds7, ds9], combine_attrs="drop_conflicts")
453+
assert "series" not in actual.attrs # Dropped due to ValueError
454+
assert actual.attrs["value"] == "a"
455+
456+
def test_merge_attrs_drop_conflicts_eq_returns_string(self):
457+
"""Test objects whose __eq__ returns strings are dropped."""
458+
459+
# Case 1: Objects whose __eq__ returns non-boolean strings
460+
class ReturnsString:
461+
def __init__(self, value):
462+
self.value = value
463+
464+
def __eq__(self, other):
465+
# Always returns a string (non-boolean)
466+
return "comparison result"
467+
468+
obj1 = ReturnsString("A")
469+
obj2 = ReturnsString("B") # Different object
470+
471+
ds1 = xr.Dataset(attrs={"obj": obj1})
472+
ds2 = xr.Dataset(attrs={"obj": obj2})
473+
474+
actual = xr.merge([ds1, ds2], combine_attrs="drop_conflicts")
475+
476+
# Strict behavior: drops attribute because __eq__ returns non-boolean
477+
assert "obj" not in actual.attrs
478+
479+
def test_merge_attrs_drop_conflicts_eq_returns_number(self):
480+
"""Test objects whose __eq__ returns numbers are dropped."""
481+
482+
# Case 2: Objects whose __eq__ returns numbers
483+
class ReturnsZero:
484+
def __init__(self, value):
485+
self.value = value
486+
487+
def __eq__(self, other):
488+
# Always returns 0 (non-boolean)
489+
return 0
490+
491+
obj3 = ReturnsZero("same")
492+
obj4 = ReturnsZero("same") # Different object, same value
493+
494+
ds3 = xr.Dataset(attrs={"zero": obj3})
495+
ds4 = xr.Dataset(attrs={"zero": obj4})
496+
497+
actual = xr.merge([ds3, ds4], combine_attrs="drop_conflicts")
498+
499+
# Strict behavior: drops attribute because __eq__ returns non-boolean
500+
assert "zero" not in actual.attrs
501+
238502
def test_merge_attrs_no_conflicts_compat_minimal(self):
239503
"""make sure compat="minimal" does not silence errors"""
240504
ds1 = xr.Dataset({"a": ("x", [], {"a": 0})})

0 commit comments

Comments
 (0)