
Commit cc3e221

Merge branch 'main' into bug-bdate_range-with-cbh-fails
2 parents a4d86bd + ce3298b commit cc3e221

50 files changed (+985 / -509 lines)

.github/workflows/docbuild-and-upload.yml

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ jobs:
 run: mv doc/build/html web/build/docs
 
 - name: Save website as an artifact
-uses: actions/upload-artifact@v4
+uses: actions/upload-artifact@v5
 with:
 name: website
 path: web/build

.github/workflows/wheels.yml

Lines changed: 5 additions & 5 deletions
@@ -64,7 +64,7 @@ jobs:
 python -m pip install build
 python -m build --sdist
 
-- uses: actions/upload-artifact@v4
+- uses: actions/upload-artifact@v5
 with:
 name: sdist
 path: ./dist/*
@@ -138,7 +138,7 @@ jobs:
 # removes unnecessary files from the release
 - name: Download sdist (not macOS)
 #if: ${{ matrix.buildplat[1] != 'macosx_*' }}
-uses: actions/download-artifact@v5
+uses: actions/download-artifact@v6
 with:
 name: sdist
 path: ./dist
@@ -196,7 +196,7 @@ jobs:
 shell: bash -el {0}
 run: for whl in $(ls wheelhouse); do wheel unpack wheelhouse/$whl -d /tmp; done
 
-- uses: actions/upload-artifact@v4
+- uses: actions/upload-artifact@v5
 with:
 name: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
 path: ./wheelhouse/*.whl
@@ -238,11 +238,11 @@ jobs:
 
 steps:
 - name: Download all artefacts
-uses: actions/download-artifact@v5
+uses: actions/download-artifact@v6
 with:
 path: dist # everything lands in ./dist/**
 
-# TODO: This step can be probably be achieved by actions/download-artifact@v5
+# TODO: This step can be probably be achieved by actions/download-artifact@v6
 # by specifying merge-multiple: true, and a glob pattern
 - name: Collect files
 run: |

doc/source/reference/aliases.rst

Lines changed: 2 additions & 2 deletions
@@ -65,7 +65,7 @@ Alias Meaning
 :py:type:`NaPosition` Argument type for ``na_position`` in :meth:`sort_index` and :meth:`sort_values`
 :py:type:`NsmallestNlargestKeep` Argument type for ``keep`` in :meth:`nlargest` and :meth:`nsmallest`
 :py:type:`OpenFileErrors` Argument type for ``errors`` in :meth:`to_hdf` and :meth:`to_csv`
-:py:type:`Ordered` Return type for :py:attr:`ordered`` in :class:`CategoricalDtype` and :class:`Categorical`
+:py:type:`Ordered` Return type for :py:attr:`ordered` in :class:`CategoricalDtype` and :class:`Categorical`
 :py:type:`ParquetCompressionOptions` Argument type for ``compression`` in :meth:`DataFrame.to_parquet`
 :py:type:`QuantileInterpolation` Argument type for ``interpolation`` in :meth:`quantile`
 :py:type:`ReadBuffer` Additional argument type corresponding to buffers for various file reading methods
@@ -89,7 +89,7 @@ Alias Meaning
 :py:type:`ToTimestampHow` Argument type for ``how`` in :meth:`to_timestamp` and ``convention`` in :meth:`resample`
 :py:type:`UpdateJoin` Argument type for ``join`` in :meth:`DataFrame.update`
 :py:type:`UsecolsArgType` Argument type for ``usecols`` in :meth:`pandas.read_clipboard`, :meth:`pandas.read_csv` and :meth:`pandas.read_excel`
-:py:type:`WindowingRankType` Argument type for ``method`` in :meth:`rank`` in rolling and expanding window operations
+:py:type:`WindowingRankType` Argument type for ``method`` in :meth:`rank` in rolling and expanding window operations
 :py:type:`WriteBuffer` Additional argument type corresponding to buffers for various file output methods
 :py:type:`WriteExcelBuffer` Additional argument type corresponding to buffers for :meth:`to_excel`
 :py:type:`XMLParsers` Argument type for ``parser`` in :meth:`DataFrame.to_xml` and :meth:`pandas.read_xml`

doc/source/user_guide/io.rst

Lines changed: 1 addition & 46 deletions
@@ -2366,52 +2366,7 @@ Read a URL with no options:
 
 The data from the above URL changes every Monday so the resulting data above may be slightly different.
 
-Read a URL while passing headers alongside the HTTP request:
-
-.. code-block:: ipython
-
-In [322]: url = 'https://www.sump.org/notes/request/' # HTTP request reflector
-
-In [323]: pd.read_html(url)
-Out[323]:
-[ 0 1
-0 Remote Socket: 51.15.105.256:51760
-1 Protocol Version: HTTP/1.1
-2 Request Method: GET
-3 Request URI: /notes/request/
-4 Request Query: NaN,
-0 Accept-Encoding: identity
-1 Host: www.sump.org
-2 User-Agent: Python-urllib/3.8
-3 Connection: close]
-
-In [324]: headers = {
-.....: 'User-Agent':'Mozilla Firefox v14.0',
-.....: 'Accept':'application/json',
-.....: 'Connection':'keep-alive',
-.....: 'Auth':'Bearer 2*/f3+fe68df*4'
-.....: }
-
-In [325]: pd.read_html(url, storage_options=headers)
-Out[325]:
-[ 0 1
-0 Remote Socket: 51.15.105.256:51760
-1 Protocol Version: HTTP/1.1
-2 Request Method: GET
-3 Request URI: /notes/request/
-4 Request Query: NaN,
-0 User-Agent: Mozilla Firefox v14.0
-1 AcceptEncoding: gzip, deflate, br
-2 Accept: application/json
-3 Connection: keep-alive
-4 Auth: Bearer 2*/f3+fe68df*4]
-
-.. note::
-
-We see above that the headers we passed are reflected in the HTTP request.
-
-Read in the content of the file from the above URL and pass it to ``read_html``
-as a string:
+Read in HTML content from a file using ``read_html``:
 
 .. ipython:: python
 
doc/source/whatsnew/v3.0.0.rst

Lines changed: 8 additions & 2 deletions
@@ -201,6 +201,7 @@ Other enhancements
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
+- :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
 - :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
 - :meth:`DataFrame.iloc` and :meth:`Series.iloc` now support boolean masks in ``__getitem__`` for more consistent indexing behavior (:issue:`60994`)
@@ -737,6 +738,7 @@ Other Deprecations
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.prior_deprecations:
@@ -981,6 +983,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
+- Bug in :meth:`DatetimeIndex.where` and :meth:`TimedeltaIndex.where` failing to set ``freq=None`` in some cases (:issue:`24555`)
 - Bug in :meth:`Index.union` with a ``pyarrow`` timestamp dtype incorrectly returning ``object`` dtype (:issue:`58421`)
 - Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
 - Bug in :meth:`Timestamp.normalize` and :meth:`DatetimeArray.normalize` returning incorrect results instead of raising on integer overflow for very small (distant past) values (:issue:`60583`)
@@ -997,7 +1000,6 @@ Datetimelike
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)
 
-
 Timedelta
 ^^^^^^^^^
 - Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
@@ -1036,6 +1038,7 @@ Conversion
 
 Strings
 ^^^^^^^
+- Bug in :meth:`Series.str.replace` raising an error on valid group references (``\1``, ``\2``, etc.) on series converted to PyArrow backend dtype (:issue:`62653`)
 - Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
 - Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
 - Bug in multiplication with a :class:`StringDtype` incorrectly allowing multiplying by bools; explicitly cast to integers instead (:issue:`62595`)
@@ -1112,7 +1115,7 @@ I/O
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
-- Bug in :meth:`read_csv` with ``engine="c"`` reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
+- Bug in :meth:`read_csv` with ``c`` and ``python`` engines reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="pyarrow"`` and ``dtype="Int64"`` losing precision (:issue:`56136`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
@@ -1152,6 +1155,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupby.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
 - Bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
+- Bug in :meth:`.DataFrameGroupBy` reductions where non-Boolean values were allowed for the ``numeric_only`` argument; passing a non-Boolean value will now raise (:issue:`62778`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
 - Bug in :meth:`.Series.rolling` when used with a :class:`.BaseIndexer` subclass and computing min/max (:issue:`46726`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
@@ -1173,6 +1177,7 @@ Groupby/resample/rolling
 
 Reshaping
 ^^^^^^^^^
+- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
@@ -1208,6 +1213,7 @@ ExtensionArray
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
 - Bug in various :class:`DataFrame` reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (:issue:`59234`)
+- Fixed flex arithmetic with :class:`ExtensionArray` operands raising when ``fill_value`` was passed. (:issue:`62467`)
 
 Styler
 ^^^^^^

pandas/_libs/arrays.pyx

Lines changed: 4 additions & 0 deletions
@@ -100,6 +100,10 @@ cdef class NDArrayBacked:
 if len(state) == 1 and isinstance(state[0], dict):
 self.__setstate__(state[0])
 return
+elif len(state) == 2:
+# GH#62820: Handle missing attrs dict during auto-unpickling
+self.__setstate__((*state, {}))
+return
 raise NotImplementedError(state) # pragma: no cover
 
 data, dtype = state[:2]
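The new len(state) == 2 branch pads a two-element state with an empty attrs dict and re-enters __setstate__. The pure-Python sketch below illustrates that padding pattern on a hypothetical BackedArray class (not a pandas API); it stores plain lists instead of NumPy arrays and models only the tuple-state path, so it is a simplified illustration under those assumptions rather than the Cython implementation.

```python
import pickle


class BackedArray:
    """Hypothetical stand-in for NDArrayBacked, for illustration only."""

    def __init__(self, data, dtype, attrs=None):
        self._data = list(data)
        self._dtype = dtype
        self.attrs = {} if attrs is None else attrs

    def __getstate__(self):
        # Current pickle format in this sketch: (data, dtype, attrs).
        return (self._data, self._dtype, self.attrs)

    def __setstate__(self, state):
        if len(state) == 2:
            # Older format without the attrs dict: pad with {} and retry.
            self.__setstate__((*state, {}))
            return
        if len(state) != 3:
            raise NotImplementedError(state)
        self._data, self._dtype, self.attrs = state


# A normal pickle round trip uses the three-element state.
arr = BackedArray([1, 2], "int64", {"source": "demo"})
restored = pickle.loads(pickle.dumps(arr))

# Simulate restoring an object from an older two-element state.
legacy = BackedArray.__new__(BackedArray)
legacy.__setstate__(([3, 4], "int64"))
```

Recursing with the padded tuple keeps a single code path for unpacking, which mirrors how the diff delegates back to self.__setstate__ instead of duplicating the restore logic.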

pandas/_libs/lib.pyx

Lines changed: 21 additions & 5 deletions
@@ -1386,6 +1386,7 @@ cdef class Seen:
 bint nan_ # seen_np.nan
 bint uint_ # seen_uint (unsigned integer)
 bint sint_ # seen_sint (signed integer)
+bint overflow_ # seen_overflow
 bint float_ # seen_float
 bint object_ # seen_object
 bint complex_ # seen_complex
@@ -1414,6 +1415,7 @@ cdef class Seen:
 self.nan_ = False
 self.uint_ = False
 self.sint_ = False
+self.overflow_ = False
 self.float_ = False
 self.object_ = False
 self.complex_ = False
@@ -2379,6 +2381,9 @@ def maybe_convert_numeric(
 ndarray[uint64_t, ndim=1] uints = cnp.PyArray_EMPTY(
 1, values.shape, cnp.NPY_UINT64, 0
 )
+ndarray[object, ndim=1] pyints = cnp.PyArray_EMPTY(
+1, values.shape, cnp.NPY_OBJECT, 0
+)
 ndarray[uint8_t, ndim=1] bools = cnp.PyArray_EMPTY(
 1, values.shape, cnp.NPY_UINT8, 0
 )
@@ -2421,18 +2426,24 @@ def maybe_convert_numeric(
 
 val = int(val)
 seen.saw_int(val)
+pyints[i] = val
 
 if val >= 0:
 if val <= oUINT64_MAX:
 uints[i] = val
-else:
+elif seen.coerce_numeric:
 seen.float_ = True
+else:
+seen.overflow_ = True
 
 if oINT64_MIN <= val <= oINT64_MAX:
 ints[i] = val
 
 if val < oINT64_MIN or (seen.sint_ and seen.uint_):
-seen.float_ = True
+if seen.coerce_numeric:
+seen.float_ = True
+else:
+seen.overflow_ = True
 
 elif util.is_bool_object(val):
 floats[i] = uints[i] = ints[i] = bools[i] = val
@@ -2476,6 +2487,7 @@ def maybe_convert_numeric(
 
 if maybe_int:
 as_int = int(val)
+pyints[i] = as_int
 
 if as_int in na_values:
 mask[i] = 1
@@ -2490,7 +2502,7 @@ def maybe_convert_numeric(
 if seen.coerce_numeric:
 seen.float_ = True
 else:
-raise ValueError("Integer out of range.")
+seen.overflow_ = True
 else:
 if as_int >= 0:
 uints[i] = as_int
@@ -2529,11 +2541,15 @@ def maybe_convert_numeric(
 return (floats, None)
 elif seen.int_:
 if seen.null_ and convert_to_masked_nullable:
-if seen.uint_:
+if seen.overflow_:
+return (pyints, mask.view(np.bool_))
+elif seen.uint_:
 return (uints, mask.view(np.bool_))
 else:
 return (ints, mask.view(np.bool_))
-if seen.uint_:
+if seen.overflow_:
+return (pyints, None)
+elif seen.uint_:
 return (uints, None)
 else:
 return (ints, None)
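Taken together, the lib.pyx hunks change what maybe_convert_numeric returns for integers that fit neither int64 nor uint64, or that mix negative values with values above the int64 maximum: with coerce_numeric they still become floats, otherwise the new overflow_ flag routes the result to the object-dtype pyints array (previously such values were marked as floats on one path and raised "Integer out of range." on another). The sketch below is a simplified pure-Python model of that dtype selection, assuming plain Python int inputs; convert_ints is a hypothetical name, and it ignores NaN handling, masks, bools and the other branches of the real function.

```python
import numpy as np

INT64_MIN = int(np.iinfo(np.int64).min)
INT64_MAX = int(np.iinfo(np.int64).max)
UINT64_MAX = int(np.iinfo(np.uint64).max)


def convert_ints(values, coerce_numeric=False):
    """Simplified model of the integer dtype choice in maybe_convert_numeric.

    Hypothetical helper: handles only Python ints (no NaN, masks or bools).
    """
    sint = uint = float_ = overflow = False
    for val in values:
        # Loosely mirrors the seen.sint_ / seen.uint_ bookkeeping.
        if INT64_MIN <= val < 0:
            sint = True
        if INT64_MAX < val <= UINT64_MAX:
            uint = True
        # Values outside both ranges, or a negative/uint64-only mix, cannot
        # share one fixed-width integer array.
        if val > UINT64_MAX or val < INT64_MIN or (sint and uint):
            if coerce_numeric:
                float_ = True
            else:
                overflow = True
    if float_:
        return np.array(values, dtype=np.float64)
    if overflow:
        # Keep exact Python ints in an object array (the new pyints path).
        return np.array(values, dtype=object)
    if uint:
        return np.array(values, dtype=np.uint64)
    return np.array(values, dtype=np.int64)
```

For example, convert_ints([2**64]) keeps the exact value as a Python int in an object array, while convert_ints([2**64], coerce_numeric=True) falls back to float64 and loses exactness.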

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 13 additions & 0 deletions
@@ -2026,6 +2026,19 @@ class Timedelta(_Timedelta):
 "milliseconds, microseconds, nanoseconds]"
 )
 
+if (
+unit is not None
+and not (is_float_object(value) or is_integer_object(value))
+):
+# GH#53198
+warnings.warn(
+"The 'unit' keyword is only used when the Timedelta input is "
+f"an integer or float, not {type(value).__name__}. "
+"To specify the storage unit of the output use `td.as_unit(unit)`",
+UserWarning,
+stacklevel=find_stack_level(),
+)
+
 if value is _no_input:
 if not len(kwargs):
 raise ValueError("cannot construct a Timedelta without a "

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 14 additions & 0 deletions
@@ -67,6 +67,7 @@ from pandas._libs.tslibs.dtypes cimport (
 )
 from pandas._libs.tslibs.util cimport (
 is_array,
+is_float_object,
 is_integer_object,
 )
 
@@ -2654,6 +2655,19 @@ class Timestamp(_Timestamp):
 if hasattr(ts_input, "fold"):
 ts_input = ts_input.replace(fold=fold)
 
+if (
+unit is not None
+and not (is_float_object(ts_input) or is_integer_object(ts_input))
+):
+# GH#53198
+warnings.warn(
+"The 'unit' keyword is only used when the Timestamp input is "
+f"an integer or float, not {type(ts_input).__name__}. "
+"To specify the storage unit of the output use `ts.as_unit(unit)`",
+UserWarning,
+stacklevel=find_stack_level(),
+)
+
 # GH 30543 if pd.Timestamp already passed, return it
 # check that only ts_input is passed
 # checking verbosely, because cython doesn't optimize
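The Timedelta and Timestamp hunks add the same guard (GH#53198): passing unit alongside input that is neither an integer nor a float now emits a UserWarning, since unit only affects numeric input. The following is a plain-Python sketch of that check under stated assumptions: check_unit_usage is a hypothetical helper (pandas performs the check inline in both constructors), numbers.Real approximates pandas' internal is_integer_object and is_float_object tests, and the warning text is shortened from the one in the diff.

```python
import numbers
import warnings


def check_unit_usage(value, unit=None):
    """Return True if `unit` applies to `value`; warn and return False otherwise.

    Hypothetical helper sketching the GH#53198 check from the diff.
    """
    if unit is None:
        return False
    # Approximates is_integer_object/is_float_object; bool is excluded on the
    # assumption that pandas does not treat it as an integer here.
    is_numeric = isinstance(value, numbers.Real) and not isinstance(value, bool)
    if not is_numeric:
        warnings.warn(
            "The 'unit' keyword is only used when the input is an integer "
            f"or float, not {type(value).__name__}.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

Warning rather than raising keeps existing calls such as passing a string together with unit working while pointing users at as_unit, which is the suggestion the real message in the diff makes.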

pandas/conftest.py

Lines changed: 3 additions & 0 deletions
@@ -1447,6 +1447,9 @@ def any_string_dtype(request):
 return pd.StringDtype(storage, na_value)
 
 
+any_string_dtype2 = any_string_dtype
+
+
 @pytest.fixture(params=tm.DATETIME64_DTYPES)
 def datetime64_dtype(request):
 """
