@@ -188,6 +188,14 @@ let pandas do the inference. But if you want to be specific, you can specify the
188188 This is actually compatible with pandas 2.x as well, since in pandas < 3,
189189 ``dtype="str"`` was essentially treated as an alias for object dtype.
190190
191+ .. attention::
192+
193+    While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
194+    specifying it as the dtype in :meth:`~Series.astype` runs into the issue
195+    of also stringifying missing values in pandas 2.x. See the section
196+    :ref:`string_migration_guide-astype_str` for more details.
197+
198+
191199 The missing value sentinel is now always NaN
192200 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
193201
@@ -310,52 +318,69 @@ case.
310318 Notable bug fixes
311319 ~~~~~~~~~~~~~~~~~
312320
321+ .. _string_migration_guide-astype_str:
322+
313323 ``astype(str)`` preserving missing values
314324 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
315325
316- This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
326+ The stringifying of missing values is a long standing "bug" or misfeature, as
327+ discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
328+ introduces a significant behaviour change.
317329
318- With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
319- ``astype("str")``!), the operation would convert every element to a string,
320- including the missing values:
330+ With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
331+ would convert every element to a string, including the missing values:
321332
322333 .. code-block:: python
323334
324335     # OLD behavior in pandas < 3
325-     >>> ser = pd.Series(["a", np.nan], dtype=object)
336+     >>> ser = pd.Series([1.5, np.nan])
326337     >>> ser
327-     0      a
338+     0    1.5
328339     1    NaN
329-     dtype: object
340+     dtype: float64
330-     >>> ser.astype(str)
331-     0      a
341+     >>> ser.astype("str")
342+     0    1.5
332343     1    nan
333344     dtype: object
334-     >>> ser.astype(str).to_numpy()
335-     array(['a', 'nan'], dtype=object)
345+     >>> ser.astype("str").to_numpy()
346+     array(['1.5', 'nan'], dtype=object)
336347
337348 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
338349 not the intended behavior, and it was inconsistent with how other dtypes handled
339350 missing values.
340351
341- With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
342- for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
343- the missing values:
352+ With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
353+ to the new string dtype, which preserves the missing values:
344354
345355 .. code-block:: python
346356
347357     # NEW behavior in pandas 3
348358     >>> pd.options.future.infer_string = True
349-     >>> ser = pd.Series(["a", np.nan], dtype=object)
350-     >>> ser.astype(str)
351-     0      a
359+     >>> ser = pd.Series([1.5, np.nan])
360+     >>> ser.astype("str")
361+     0    1.5
352362     1    NaN
353363     dtype: str
354-     >>> ser.astype(str).values
355-     array(['a', nan], dtype=object)
364+     >>> ser.astype("str").to_numpy()
365+     array(['1.5', nan], dtype=object)
356366
357367 If you want to preserve the old behaviour of converting every object to a
358- string, you can use ``ser.map(str)`` instead.
368+ string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
369+ while preserving the missing values in a way that works with both pandas 2.x and
370+ 3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
371+ can do ``ser.astype("str")``).
372+
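As a rough illustration of the two ``map`` variants described above, the following sketch compares them on a small float Series. The sample data and the variable names ``stringified`` and ``preserved`` are assumptions made for this example only; the resulting dtype may differ between pandas 2.x and 3.x, so only the element values are inspected.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one regular value and one missing value.
ser = pd.Series([1.5, np.nan])

# Old-style conversion: str() is applied to every element, including NaN.
stringified = ser.map(str)

# Version-independent conversion: na_action="ignore" skips missing values,
# so str() is applied only to the non-missing element.
preserved = ser.map(str, na_action="ignore")

print(stringified.tolist())
print(preserved.tolist())
```

The first result holds the literal string ``"nan"`` in the second position, while the second result keeps a real missing value there.
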
373+ If you want to convert to object or string dtype for pandas 2.x and 3.x,
374+ respectively, without needing to stringify each individual element, you will
375+ have to use a conditional check on the pandas version.
376+ For example, to convert a categorical Series with string categories to its
377+ dense non-categorical version with object or string dtype:
378+
379+ .. code-block:: python
380+
381+     >>> import pandas as pd
382+     >>> ser = pd.Series(["a", np.nan], dtype="category")
383+     >>> ser.astype(object if pd.__version__ < "3" else "str")
359384
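Note that ``pd.__version__ < "3"`` compares strings lexicographically. That is sufficient to tell 2.x from 3.x, but it would misorder a hypothetical two-digit major version. A slightly more defensive sketch, assuming the version string always starts with an integer major component, parses that component first. The names ``PANDAS_GE_3``, ``target_dtype`` and ``dense`` are illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

# Assumption: pd.__version__ begins with "<major>.", e.g. "2.2.3" or "3.0.0".
PANDAS_GE_3 = int(pd.__version__.split(".")[0]) >= 3

# Use the new string dtype on pandas 3.x and object dtype on older versions.
target_dtype = "str" if PANDAS_GE_3 else object

# Categorical Series with string categories and one missing value.
ser = pd.Series(["a", np.nan], dtype="category")

# Dense, non-categorical version; the missing value is not stringified.
dense = ser.astype(target_dtype)
print(dense.dtype)
```
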
360385
361386 ``prod()`` raising for string data