-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
CLN: Enforce deprecation of not validating na argument to string methods #62399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
d593ffa
1d262ab
1e8555d
17c4112
b29fc23
3e81813
e95b333
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -22,6 +22,7 @@ | |||||
is_bool, | ||||||
is_integer, | ||||||
) | ||||||
from pandas.core.dtypes.missing import isna | ||||||
|
||||||
BoolishT = TypeVar("BoolishT", bool, int) | ||||||
BoolishNoneT = TypeVar("BoolishNoneT", bool, int, None) | ||||||
|
@@ -269,6 +270,37 @@ def validate_bool_kwarg( | |||||
return value | ||||||
|
||||||
|
||||||
def validate_na_arg( | ||||||
value, name: str, allow_no_default: bool = False, allow_bool: bool = False | ||||||
): | ||||||
""" | ||||||
Validate na arguments. | ||||||
|
||||||
Parameters | ||||||
---------- | ||||||
value : object | ||||||
Value to validate. | ||||||
name : str | ||||||
Name of the argument, used to raise an informative error message. | ||||||
allow_no_default : bool, default False | ||||||
Whether to allow ``value`` to be ``lib.no_default``. | ||||||
allow_bool : bool, default False | ||||||
Whether to allow ``value`` to be an instance of bool. | ||||||
|
||||||
Raises | ||||||
______ | ||||||
ValueError | ||||||
When ``value`` is determined to be invalid. | ||||||
""" | ||||||
if allow_no_default and value is lib.no_default: | ||||||
return | ||||||
if allow_bool and isinstance(value, bool): | ||||||
return | ||||||
if isna(value): | ||||||
return | ||||||
raise ValueError(f"{name} must be a valid NA value; got {value}") | ||||||
|
raise ValueError(f"{name} must be a valid NA value; got {value}") | |
raise ValueError(f"{name} must be a bool (True/False) or a valid NA value; got {value}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I don't know if there is a case for which you are foreseeing allow_no_default
and allow_bool
to be used, but if they would actually be used, then the error message would also have to be updated to follow that.
(so for now could also simplify things leaving out the keywords?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about ...must be a valid value; got {value}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it is useful to give an indication about what a valid value is (or say "a valid value for dtype ..", so then if it is bool, then it is clearer that it should be True/False or a missing value (for nullable bool)?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it is clearer that it should be True/False or a missing value (for nullable bool?)
Any one of True/False/None/np.nan/pd.NA works with any dtype as far as I've seen. If there is an issue here, I think we should open a new issue and tackle in another PR.
for dtype in ["string", "str", "object"]:
for na in [True, False, None, np.nan, pd.NA]:
ser = pd.Series([np.nan], dtype=dtype)
print(f"dtype={ser.dtype}, {na=}, result={ser.str.startswith('b', na=na).iloc[0]}")
# dtype=string, na=True, result=True
# dtype=string, na=False, result=False
# dtype=string, na=None, result=<NA>
# dtype=string, na=nan, result=<NA>
# dtype=string, na=<NA>, result=<NA>
# dtype=str, na=True, result=True
# dtype=str, na=False, result=False
# dtype=str, na=None, result=False
# dtype=str, na=nan, result=False
# dtype=str, na=<NA>, result=False
# dtype=object, na=True, result=True
# dtype=object, na=False, result=False
# dtype=object, na=None, result=None
# dtype=object, na=nan, result=nan
# dtype=object, na=<NA>, result=<NA>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche - are you good here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks good!
Uh oh!
There was an error while loading. Please reload this page.