Expose PyUnicode_ToLower and friends

Hey folks! 👋 

This is about python/cpython#76535 (Unclear intention of deprecating `Py_UNICODE_TOLOWER` / `Py_UNICODE_TOUPPER`). As far as I understand, these were deprecated from 3.3 through 3.12 and undeprecated in 3.13. However, they're still pretty much unusable, because each of them returns a single character, while changing the case of a Unicode character can produce two or more characters (for example, uppercasing `ß` gives `SS`).
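To make the length problem concrete, here is a small Python sketch (not part of the proposal itself) that uses `str.upper()` / `str.lower()` to show case mappings whose result is longer than one code point. The three characters are just illustrative picks.

```python
# Case mappings that turn one code point into several.
examples = {
    "\u00df": "\u00df".upper(),  # LATIN SMALL LETTER SHARP S -> "SS"
    "\u0130": "\u0130".lower(),  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> "i" + U+0307
    "\ufb01": "\ufb01".upper(),  # LATIN SMALL LIGATURE FI -> "FI"
}

for src, dst in examples.items():
    mapped = " ".join(f"U+{ord(c):04X}" for c in dst)
    print(f"U+{ord(src):04X} -> {mapped} ({len(dst)} code points)")
```

A function that takes one character and returns one character cannot represent any of these results, which is the limitation described above.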

There are alternative APIs for this that do the right thing, `_PyUnicode_ToLowerFull` and friends. I'm suggesting renaming these to `PyUnicode_ToLower` etc. and exposing them as public API. An implementation can be found in python/cpython#136176.

**Some context**: I'm working on making NumPy string array operations faster. Up until now, these worked by converting each array item (a UCS4 buffer) to a `PyObject`, calling the corresponding `str` method, and then converting the result back to a UCS4 buffer to store in the array. Instead, we're writing fast string ufuncs that operate on the buffer directly, without the back-and-forth through `PyObject`s. We've been able to do this for almost all of the string operations by traversing the array, then traversing each UCS4 buffer and doing the operation codepoint by codepoint. The only ones missing right now are `lower` and friends, because we don't have a good way of doing the same thing for case conversion.
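As a rough illustration of the two approaches described above, here is a pure-Python sketch. It is not NumPy's implementation: `lower_roundtrip` and `lower_per_codepoint` are hypothetical helper names, the array item is modelled as little-endian UTF-32 bytes with optional NUL padding (an assumption), and `chr(cp).lower()` merely stands in for a per-codepoint full case mapping such as the one proposed here.

```python
def lower_roundtrip(buf: bytes) -> bytes:
    """Round-trip approach: UCS4 buffer -> str -> str.lower() -> UCS4 buffer."""
    text = buf.decode("utf-32-le").rstrip("\0")  # drop assumed NUL padding
    return text.lower().encode("utf-32-le")


def lower_per_codepoint(buf: bytes) -> bytes:
    """Codepoint-by-codepoint approach; each input code point may yield several."""
    out = []
    for i in range(0, len(buf), 4):
        cp = int.from_bytes(buf[i:i + 4], "little")
        if cp == 0:  # assumed NUL padding marks the end of the item
            break
        # Stand-in for a full case mapping: one code point in, 1..n out.
        out.extend(ord(c) for c in chr(cp).lower())
    return b"".join(cp.to_bytes(4, "little") for cp in out)


item = "\u0130STANBUL".encode("utf-32-le")  # 8 code points, 32 bytes
result = lower_per_codepoint(item)
print(len(item) // 4, "code points in,", len(result) // 4, "code points out")
```

Note that the output can be longer than the input, so a buffer-level implementation has to size the result from the mapped lengths rather than from the input length. This sketch also ignores context-sensitive rules such as final sigma, which `str.lower()` applies to whole strings.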


