Fallback to NFKC Normalization #14
Description
I was attempting to normalize some domain names that use non-ASCII Unicode characters. I was able to convert them to ASCII properly using java.net.IDN's toASCII(), but not with Junidecode, which kept throwing "java.lang.NullPointerException" errors.
Here are some sample strings that caused errors (I hope this works):
- ℰ𝒳𝒜ℳ𝓟ℒℰ
- 🄴🅇🄰🄼🄿🄻🄴
- ⓔⓧⓐⓜⓟⓛⓔ
I was eventually able to use Junidecode to convert to 7-bit ASCII, but only by checking whether a NullPointerException occurs, then normalizing the string using NFKC and reattempting Junidecode.
Here's the ColdFusion/Java function, normalizeUTF, that I used for the normalization step. (Should this be performed on all strings before attempting to convert?)
/* NFKC: Unicode Compatibility Decomposition, followed by Canonical Composition */
function normalizeUTF(inputString){
    var normalizer = createObject( "java", "java.text.Normalizer" );
    var normalizerForm = createObject( "java", "java.text.Normalizer$Form" );
    return normalizer.normalize( javaCast( "string", arguments.inputString ), normalizerForm.NFKC );
}
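For anyone working in plain Java, the retry workflow described above might look roughly like the sketch below. The class name NfkcFallback, the method transliterate, and the strictAscii stand-in are all hypothetical names chosen for illustration. The transliterator is passed in as a function so the example runs without any library on the classpath; in real use it would presumably be the Junidecode call.

```java
import java.text.Normalizer;
import java.util.function.UnaryOperator;

class NfkcFallback {

    // Try the transliterator on the raw input first. If it throws a
    // NullPointerException, normalize the input to NFKC and try once more.
    static String transliterate(String input, UnaryOperator<String> transliterator) {
        try {
            return transliterator.apply(input);
        } catch (NullPointerException e) {
            String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
            return transliterator.apply(normalized);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real transliterator: it throws a
        // NullPointerException on any non-ASCII character, which only
        // imitates the failure reported in this issue.
        UnaryOperator<String> strictAscii = s -> {
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) > 127) {
                    throw new NullPointerException("unmapped character");
                }
            }
            return s;
        };

        // Circled letters spelling "example", written as escapes to avoid
        // source-encoding problems.
        String circled = "\u24D4\u24E7\u24D0\u24DC\u24DF\u24DB\u24D4";
        System.out.println(transliterate(circled, strictAscii)); // prints "example"
    }
}
```

Catching NullPointerException is only a workaround; if normalizing every string up front turns out to be safe, the try/catch could be dropped entirely.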
I've also found that if this approach is used, you won't have to add extra character mappings to support the "Enclosed Alphanumerics" block.
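As a quick check of that point, here is a small sketch showing that NFKC alone already folds circled letters to plain ASCII. The names NfkcDemo, nfkc and isAscii are hypothetical helpers for this example only.

```java
import java.text.Normalizer;

class NfkcDemo {

    // Returns true if every character of s is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Applies NFKC: compatibility decomposition, then canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Circled letters from the Enclosed Alphanumerics block, as escapes.
        String circled = "\u24D4\u24E7\u24D0\u24DC\u24DF\u24DB\u24D4";
        String folded = nfkc(circled);
        System.out.println(folded + " ascii=" + isAscii(folded)); // example ascii=true
    }
}
```

Note that NFKC does not make every string ASCII. An accented letter such as é stays as it is, so a transliteration step is still needed after normalizing.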