Hey folks! 👋
This is about python/cpython#76535 (Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICODE_TOUPPER). As far as I understand, these were deprecated in versions 3.3 through 3.12 and undeprecated in 3.13. However, they're still pretty much unusable, because changing the case of a single Unicode character can produce two or more characters.
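To make the expansion concrete, here's a quick check from the Python side. It uses str.upper and str.lower on single characters, which apply the full case mappings. The variable names are just mine for illustration.

```python
# Full case mappings can turn one code point into several.
sharp_s_upper = "\u00df".upper()    # LATIN SMALL LETTER SHARP S
fi_upper = "\ufb01".upper()         # LATIN SMALL LIGATURE FI
dotted_i_lower = "\u0130".lower()   # LATIN CAPITAL LETTER I WITH DOT ABOVE

for label, result in [
    ("U+00DF upper", sharp_s_upper),
    ("U+FB01 upper", fi_upper),
    ("U+0130 lower", dotted_i_lower),
]:
    # Show how many code points came out, and which ones.
    print(label, len(result), [hex(ord(c)) for c in result])
```

An API that takes one character and returns exactly one character has no way to represent any of these results.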
There are alternative APIs that do the right thing: _PyUnicode_ToLowerFull and friends. I'm suggesting renaming these to PyUnicode_ToLower etc. and exposing them as public API. An implementation can be found in python/cpython#136176.
Some context: I'm working on making NumPy string array operations faster. Up until now, these worked by converting each array item (a UCS4 buffer) to a PyObject, calling the corresponding str method, and then converting the result back to a UCS4 buffer to store in the array. Instead, we're writing fast string ufuncs that operate on the buffer directly, without the back-and-forth through PyObjects. We've been able to do this for almost all of the string operations by traversing the array, then traversing each UCS4 buffer and doing the operation codepoint by codepoint. The only ones missing right now are lower and friends, because we don't have a good way of doing the same thing for them.
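Here's a rough pure-Python sketch of the per-codepoint traversal, just to illustrate the shape of the problem. It is not the NumPy implementation (that lives in C), and to_ucs4, from_ucs4, isalpha_ucs4 and upper_ucs4 are hypothetical helper names. The buffer is modeled as a fixed-width, NUL-padded list of code points.

```python
def to_ucs4(s, width):
    """Model an array item: a fixed-width, NUL-padded list of code points."""
    cps = [ord(c) for c in s]
    if len(cps) > width:
        raise ValueError("string does not fit in the buffer")
    return cps + [0] * (width - len(cps))


def strip_padding(buf):
    """Drop the trailing NUL padding from a buffer."""
    end = len(buf)
    while end > 0 and buf[end - 1] == 0:
        end -= 1
    return buf[:end]


def from_ucs4(buf):
    """Turn a (possibly padded) buffer back into a str."""
    return "".join(chr(cp) for cp in strip_padding(buf))


def isalpha_ucs4(buf):
    """A one-to-one check: each code point is inspected on its own."""
    cps = strip_padding(buf)
    return len(cps) > 0 and all(chr(cp).isalpha() for cp in cps)


def upper_ucs4(buf):
    """Uppercase codepoint by codepoint using the full mapping.

    Each input code point may yield several output code points,
    so the result can be longer than the input.
    """
    out = []
    for cp in strip_padding(buf):
        out.extend(ord(c) for c in chr(cp).upper())
    return out


item = to_ucs4("stra\u00dfe", 8)
print(isalpha_ucs4(item))
print(from_ucs4(upper_ucs4(item)), len(upper_ucs4(item)))
```

The isalpha-style check only ever needs one code point in and one answer out, while the uppercase step needs a mapping that can hand back more than one code point per input. In this sketch chr(cp).upper() stands in for that mapping, which is the PyObject round trip we're trying to avoid in C.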