Skip to content

Conversation

@rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 19, 2025
@jbrockmendel
Copy link
Member

I'm fine with this, but given the discussion on the issue would also be OK with doing this as a deprecation if that's viable.

@rhshadrach
Copy link
Member Author

rhshadrach commented Oct 19, 2025

I'm on board with deprecating instead. Two things:

  • I think we should make it so that specifying sort=False does not sort when the deprecation is introduced. Only unspecified sort will still sort by default for DatetimeIndex.
  • In order to make this less noisy, it seems beneficial to determine if concat-without-sorting would give a different result and only emit the warning when it does. I'm using union_indexes, not sure yet if that will be correct in all cases. Trying to determine this deeper in the code seems hard. Regardless this will be somewhat of an expensive op. Before I sink more time, I'd like to make sure there is agreement that this is worth the computation.

@rhshadrach rhshadrach marked this pull request as draft October 19, 2025 19:03
@jbrockmendel
Copy link
Member

Yah, IIRC deprecating that behavior was tricky because there was also a non-concat codepath that led to that particular unwanted sort = True line.

If anyone else wants to merge this as is I won't complain. Otherwise we can bring it up at the dev call on Wednesday.

@jbrockmendel
Copy link
Member

Discussed on today’s dev call. Consensus was “deprecation might be nice if not too difficult but I don’t feel strongly about it and am willing to defer to Richard”

@rhshadrach
Copy link
Member Author

Are we good with warning anytime the non-concat index is datetime and the sort argument is not specified? This would be my preferred approach, contrary to what I mentioned above. It seems difficult and computationally expensive to not warn anytime the behavior wouldn't change because the result would be naturally sorted.

@jbrockmendel
Copy link
Member

you want that warning in place forever, not as a deprecation? im not wild about that

@rhshadrach
Copy link
Member Author

No, just to deprecate.

@jbrockmendel
Copy link
Member

Oh ok. Then its a tradeoff between potentially false-positive warnings and difficult/costly is_monotonic_increasing calls?

@rhshadrach
Copy link
Member Author

[Demonstrating here with integer index; same applies to datetimes]

Yes, and note that monotone increasing can be sufficient but not necessary:

ser1 = pd.Series([1, 2], index=[1, 2])
ser2 = pd.Series([3, 4], index=[2, 1])
print(ser2.index.is_monotonic_increasing)
# False
print(pd.concat([ser1, ser2], sort=False, axis=1))
#    0  1
# 1  1  4
# 2  2  3

So there are still false positives. One also needs to check that the tail element of each is less than the head of the next, so monotonic increasing alone is not sufficient.

ser1 = pd.Series([1, 2], index=[1, 3])
ser2 = pd.Series([3, 4], index=[2, 4])
print(ser1.index.is_monotonic_increasing)
# True
print(ser2.index.is_monotonic_increasing)
# True
print(pd.concat([ser1, ser2], sort=False, axis=1))
#      0    1
# 1  1.0  NaN
# 3  2.0  NaN
# 2  NaN  3.0
# 4  NaN  4.0

@jbrockmendel
Copy link
Member

Ok. Per the call, I’m happy to defer to you on this. Let me know when ready for review

@rhshadrach
Copy link
Member Author

rhshadrach commented Oct 25, 2025

In the commits Deprecate with union_indexes and More perfomrant, more warnings, there are two versions of this deprecation. One where we use union_indexes to check if sort will have an impact on the result, and one where we avoid this computation but still check two common cases. I'm 50/50 on which one to go with.

In either case, we need to handle the 60ish places where we call concat internally. I spent a while trying to produce bugs due to the sorting behavior of datetimes on main from internal uses of concat, haven't found any yet. It needs to be a situation where the indices are not equal, and many places we align prior to calling concat. Do we just pass sort=False everywhere internally? Should this be noted in the whatsnew? E.g.

There are many places where pandas internally uses concat and this behavior change can appear. While we have not been able to find any behavior differences, users with data that have an unsorted DatetimeIndex may end up with results that remain unsorted whereas in prior versions of pandas the result was sorted.

Admittedly not great.

Edit: Here is an example of what I think is a bug.

idx = date_range("1/1/2000", periods=3, freq="h")
df = DataFrame(
    {"a": [1, 2, 3], "b": [True, True, False]},
    index=[idx[2], idx[0], idx[1]],
)
print(df)
print(df.groupby("b")[["a"]].shift([0, 1], freq="h"))
#                      a      b
# 2000-01-01 02:00:00  1   True
# 2000-01-01 00:00:00  2   True
# 2000-01-01 01:00:00  3  False
#                      a_0  a_1
# 2000-01-01 00:00:00  2.0  NaN
# 2000-01-01 01:00:00  3.0  2.0
# 2000-01-01 02:00:00  1.0  3.0
# 2000-01-01 03:00:00  NaN  1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DEPR: pd.concat special cases DatetimeIndex to sort even when sort=False

2 participants