
@arnaudgallou (Contributor) commented Sep 25, 2025

Fixes #592

Let me know if you want me to add short comments explaining each regex pattern.

I'm not quite sure how you'd like to handle acronyms in str_to_camel(). I'm more inclined to treat acronyms like any other word (e.g. userId, the current behavior), but I know some people prefer to keep two-letter acronyms in uppercase (e.g. userID).
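A minimal sketch of the two options, assuming snake_case input split on "_". The helper snake_to_camel_acronyms() and its acronyms argument are hypothetical and not part of stringr. They only illustrate that preserving uppercase needs an explicit list of words to treat as acronyms, because lowercase snake_case input no longer records which words were acronyms.

```r
library(stringi)

# Hypothetical helper (not stringr's API): convert snake_case to camelCase.
# Words listed in `acronyms` are upper-cased in full; every other word after
# the first is title-cased. With the default, acronyms are treated like any
# other word.
snake_to_camel_acronyms <- function(x, acronyms = character()) {
  words <- stri_split_fixed(x, "_")[[1]]
  rest <- words[-1]
  rest <- ifelse(
    rest %in% acronyms,
    stri_trans_toupper(rest),
    stri_trans_totitle(rest)
  )
  paste0(words[1], paste0(rest, collapse = ""))
}

snake_to_camel_acronyms("user_id")                  # returns "userId"
snake_to_camel_acronyms("user_id", acronyms = "id") # returns "userID"
```

The default keeps the current behavior; the second call shows what an opt-in acronym rule could look like.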

Some tests might be redundant.

@arnaudgallou (Contributor, Author)

I read a bit more about the differences between POSIX character classes (e.g. [[:alpha:]]) and Unicode property classes (e.g. \p{L}). The main difference is that POSIX classes are locale-dependent, whereas Unicode classes are not (see https://www.regular-expressions.info/posixbrackets.html).

I wasn't able to test it, but POSIX classes might handle the ij digraph correctly when the locale is set to Dutch.
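For context on why Unicode classes matter for case conversion, here is a minimal sketch using stringi's ICU regex engine. The input string and patterns are illustrative assumptions, not the patterns used in this PR: an ASCII range misses a lowercase-to-uppercase boundary between accented letters, while \p{Ll}\p{Lu} finds it.

```r
library(stringi)

x <- "caf\u00e9\u00c9toile"  # "caféÉtoile": lowercase é followed by uppercase É

# ASCII-only ranges: é and É fall outside [a-z] and [A-Z], so no boundary is found.
stri_replace_all_regex(x, "([a-z])([A-Z])", "$1_$2")
# returns "caféÉtoile" unchanged

# Unicode property classes: \p{Ll} (lowercase letter) followed by
# \p{Lu} (uppercase letter) matches accented and non-Latin letters too.
stri_replace_all_regex(x, "(\\p{Ll})(\\p{Lu})", "$1_$2")
# returns "café_Étoile"
```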

@arnaudgallou (Contributor, Author)

Also, how would you like to handle superscript and subscript letters and numbers? Should we drop them, or normalize them using stringi::stri_trans_nfkd()? Normalizing the input string could be useful for handling certain units like , which also raises the question of how to treat digits.
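A quick illustration of what normalization does, using stringi; the inputs are arbitrary examples. Note that NFKD also decomposes accented letters into a base letter plus a combining mark, which a rule treating everything outside [\p{L}\p{N}] as a separator would then split on. stringi::stri_trans_nfkc() applies the same compatibility mapping but recomposes accented letters, so it may be worth considering if normalization is adopted.

```r
library(stringi)

# Compatibility decomposition maps superscripts/subscripts to plain forms.
stri_trans_nfkd("x\u00b2")    # "x²"  -> returns "x2"
stri_trans_nfkd("H\u2082O")   # "H₂O" -> returns "H2O"

# NFKD also splits é into "e" + combining acute accent (U+0301) ...
stri_trans_nfkd("caf\u00e9")  # returns "cafe\u0301"

# ... while NFKC maps compatibility characters but keeps é composed.
stri_trans_nfkc("caf\u00e9\u00b2")  # "café²" -> returns "café2"

# A separator rule such as [^\p{L}\p{N}]+ treats the combining mark as a break.
stri_replace_all_regex(stri_trans_nfkd("caf\u00e9"), "[^\\p{L}\\p{N}]+", "_")
# returns "cafe_"
```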

Currently, digits are always separated from letters. Another option would be to keep digits attached when they directly follow a letter, but use the dash or underscore separator otherwise. The downside of this approach is that it makes the conversion from snake/kebab case to camel case non-reversible.
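A minimal sketch contrasting the two digit rules and the round-trip problem. The helpers camel_to_snake_split(), camel_to_snake_attach() and snake_to_camel() are hypothetical, built on stringi regex, and only approximate the behavior described above; they are not the code in this PR.

```r
library(stringi)

# Hypothetical helpers for illustration only (not stringr's implementation).

# Current behavior as described: digits are always separated from letters.
camel_to_snake_split <- function(x) {
  x <- stri_replace_all_regex(x, "(\\p{Ll})(\\p{Lu})", "$1_$2") # aB -> a_B
  x <- stri_replace_all_regex(x, "(\\p{L})(\\p{N})", "$1_$2")   # a2 -> a_2
  x <- stri_replace_all_regex(x, "(\\p{N})(\\p{L})", "$1_$2")   # 2a -> 2_a
  stri_trans_tolower(x)
}

# Alternative: a digit stays attached to the letter before it; a separator
# is only inserted after the digit.
camel_to_snake_attach <- function(x) {
  x <- stri_replace_all_regex(x, "(\\p{Ll})(\\p{Lu})", "$1_$2")
  x <- stri_replace_all_regex(x, "(\\p{N})(\\p{L})", "$1_$2")
  stri_trans_tolower(x)
}

# snake_case -> camelCase: title-case every word after the first.
snake_to_camel <- function(x) {
  words <- stri_split_fixed(x, "_")[[1]]
  paste0(words[1], paste0(stri_trans_totitle(words[-1]), collapse = ""))
}

camel_to_snake_split("version2Final")   # returns "version_2_final"
camel_to_snake_attach("version2Final")  # returns "version2_final"

# Round trip snake -> camel -> snake:
camel_to_snake_split(snake_to_camel("version_2"))   # returns "version_2"
camel_to_snake_attach(snake_to_camel("version_2"))  # returns "version2"
```

Under the attach rule, both version_2 and version2 map to the same camelCase string, version2, so the original snake case cannot be recovered.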

Successfully merging this pull request may close this issue: Consider using unicode regex in str_to_snake() and friends