BUG: Do not ignore sort in concat for DatetimeIndex #62752
|
I'm fine with this, but given the discussion on the issue would also be OK with doing this as a deprecation if that's viable. |
|
I'm on board with deprecating instead. Two things:
|
|
Yah, IIRC deprecating that behavior was tricky because there was also a non-concat codepath that led to that particular unwanted If anyone else wants to merge this as is I won't complain. Otherwise we can bring it up at the dev call on Wednesday. |
|
Discussed on today’s dev call. Consensus was “deprecation might be nice if not too difficult but I don’t feel strongly about it and am willing to defer to Richard” |
|
Are we good with warning anytime the non-concat index is datetime and the |
|
You want that warning in place forever, not as a deprecation? I'm not wild about that. |
|
No, just to deprecate. |
|
Oh ok. Then it's a tradeoff between potentially false-positive warnings and difficult/costly is_monotonic_increasing calls? |
|
[Demonstrating here with integer index; same applies to datetimes] Yes, and note that being monotonic increasing is not necessary for the result to be unaffected by sorting:

import pandas as pd

ser1 = pd.Series([1, 2], index=[1, 2])
ser2 = pd.Series([3, 4], index=[2, 1])
# ser2's index is not monotonic, yet the concat result below is already sorted
print(ser2.index.is_monotonic_increasing)
# False
print(pd.concat([ser1, ser2], sort=False, axis=1))
#    0  1
# 1  1  4
# 2  2  3

So there are still false positives. One also needs to check that the tail element of each index is less than the head of the next, so monotonic increasing alone is not sufficient either:

ser1 = pd.Series([1, 2], index=[1, 3])
ser2 = pd.Series([3, 4], index=[2, 4])
# both indexes are monotonic, but they interleave, so the result is unsorted
print(ser1.index.is_monotonic_increasing)
# True
print(ser2.index.is_monotonic_increasing)
# True
print(pd.concat([ser1, ser2], sort=False, axis=1))
#      0    1
# 1  1.0  NaN
# 3  2.0  NaN
# 2  NaN  3.0
# 4  NaN  4.0 |
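
The two checks discussed above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names `cheap_already_sorted` and `exact_already_sorted` are hypothetical. It assumes unique index values and that the `sort=False` result index is the union of the inputs in order of appearance, which is what the integer examples above show.

```python
import pandas as pd


def cheap_already_sorted(indexes):
    """Sufficient but not necessary (hypothetical helper).

    True only if every index is monotonic increasing and each index
    ends strictly before the next one begins.
    """
    nonempty = [idx for idx in indexes if len(idx) > 0]
    if not all(idx.is_monotonic_increasing for idx in nonempty):
        return False
    return all(a[-1] < b[0] for a, b in zip(nonempty, nonempty[1:]))


def exact_already_sorted(indexes):
    """Costlier check (hypothetical helper).

    Builds the union in order of appearance and tests whether that
    union is monotonic increasing. Assumes unique index values.
    """
    result = indexes[0]
    for idx in indexes[1:]:
        result = result.union(idx, sort=False)
    return bool(result.is_monotonic_increasing)


# First example from the comment: the cheap check reports "not sorted"
# although the unsorted union [1, 2] is in fact sorted (a false positive).
first = [pd.Index([1, 2]), pd.Index([2, 1])]
print(cheap_already_sorted(first), exact_already_sorted(first))
# False True

# Second example: both indexes are monotonic but interleave, so the
# unsorted union is [1, 3, 2, 4] and both checks report "not sorted".
second = [pd.Index([1, 3]), pd.Index([2, 4])]
print(cheap_already_sorted(second), exact_already_sorted(second))
# False False
```

A warning gated on the cheap check would fire in the first case even though sorting changes nothing there. The exact check avoids that, at the cost of computing the union up front.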
|
Ok. Per the call, I’m happy to defer to you on this. Let me know when ready for review |
|
In the commits In either case, we need to handle the 60ish places where we call concat internally. I spent a while trying to produce bugs due to the sorting behavior of datetimes on main from internal uses of concat, haven't found any yet. It needs to be a situation where the indices are not equal, and many places we align prior to calling concat. Do we just pass
Admittedly not great. Edit: Here is an example of what I think is a bug.

from pandas import DataFrame, date_range

idx = date_range("1/1/2000", periods=3, freq="h")
# deliberately unsorted datetime index
df = DataFrame(
    {"a": [1, 2, 3], "b": [True, True, False]},
    index=[idx[2], idx[0], idx[1]],
)
print(df)
print(df.groupby("b")[["a"]].shift([0, 1], freq="h"))
#                      a      b
# 2000-01-01 02:00:00  1   True
# 2000-01-01 00:00:00  2   True
# 2000-01-01 01:00:00  3  False
#                      a_0  a_1
# 2000-01-01 00:00:00  2.0  NaN
# 2000-01-01 01:00:00  3.0  2.0
# 2000-01-01 02:00:00  1.0  3.0
# 2000-01-01 03:00:00  NaN  1.0 |