Commit 3ac64a7

[backport 2.3.x] BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61946) (#62116)
Co-authored-by: Aniket <148300120+Aniketsy@users.noreply.github.com>
1 parent 1f2dc4f commit 3ac64a7

3 files changed: +17 -2 lines changed

doc/source/whatsnew/v2.3.2.rst

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ Bug fixes
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
-- Fixed ``~Series.str.match`` and ``~Series.str.fullmatch`` with compiled regex
-  for the Arrow-backed string dtype (:issue:`61964`)
+- Fixed ``~Series.str.match``, ``~Series.str.fullmatch`` and ``~Series.str.contains``
+  with compiled regex for the Arrow-backed string dtype (:issue:`61964`, :issue:`61942`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:

pandas/core/arrays/string_arrow.py

Lines changed: 2 additions & 0 deletions
@@ -355,6 +355,8 @@ def _str_contains(
     ):
         if flags:
             return super()._str_contains(pat, case, flags, na, regex)
+        if isinstance(pat, re.Pattern):
+            pat = pat.pattern
 
         return ArrowStringArrayMixin._str_contains(self, pat, case, flags, na, regex)
 
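The two added lines replace a compiled pattern with its source string before handing it to the Arrow-backed implementation. A minimal standard-library sketch of that unwrapping step follows; `unwrap_pattern` is a hypothetical helper name used only for illustration and is not part of pandas. It also shows that, in plain Python, flags passed to `re.compile` live on the `Pattern` object and are not part of `.pattern`.

```python
import re


def unwrap_pattern(pat):
    """Hypothetical helper: return the source string of a compiled regex.

    Plain strings are passed through unchanged.
    """
    if isinstance(pat, re.Pattern):
        return pat.pattern
    return pat


compiled = re.compile("ba.")
plain = unwrap_pattern(compiled)   # source string of the compiled regex
same = unwrap_pattern("ba.")       # already a string, returned as-is

# Flags given at compile time are stored on the Pattern object,
# not inside the .pattern string.
insensitive = re.compile("BA.", re.IGNORECASE)
flag_on_object = bool(insensitive.flags & re.IGNORECASE)
unwrapped_insensitive = unwrap_pattern(insensitive)

print(plain, same, flag_on_object, unwrapped_insensitive)
```

Whether compile-time flags are honored on this code path is not visible in this hunk, so the sketch makes no claim about pandas behavior in that case.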

pandas/tests/strings/test_find_replace.py

Lines changed: 13 additions & 0 deletions
@@ -290,6 +290,19 @@ def test_contains_nan(any_string_dtype):
     tm.assert_series_equal(result, expected)
 
 
+def test_contains_compiled_regex(any_string_dtype):
+    # GH#61942
+    ser = Series(["foo", "bar", "baz"], dtype=any_string_dtype)
+    pat = re.compile("ba.")
+    result = ser.str.contains(pat)
+
+    expected_dtype = (
+        np.bool_ if is_object_or_nan_string_dtype(any_string_dtype) else "boolean"
+    )
+    expected = Series([False, True, True], dtype=expected_dtype)
+    tm.assert_series_equal(result, expected)
+
+
 # --------------------------------------------------------------------------------------
 # str.startswith
 # --------------------------------------------------------------------------------------
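For reference, the expected values in the new test can be reproduced without pandas. The sketch below assumes that `str.contains` with a regex behaves like `re.search` applied to each element (a match anywhere in the string counts); it uses only the standard library and does not exercise the pandas code path itself.

```python
import re

# Same inputs as test_contains_compiled_regex
values = ["foo", "bar", "baz"]
pat = re.compile("ba.")

# "contains" semantics: a match anywhere in the string is enough,
# so re.search is used rather than re.match or re.fullmatch.
expected = [pat.search(v) is not None for v in values]

print(expected)  # [False, True, True]
```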
