Fallback to NFKC Normalization #14

@JamoCA

I was attempting to normalize some domain names that use non-ASCII Unicode characters. I was able to properly convert them to ASCII using java.net.IDN's toASCII(), but not with Junidecode, which kept throwing "java.lang.NullPointerException" errors.

Here are some sample strings that caused errors (I hope this works):

  • ℰ𝒳𝒜ℳ𝓟ℒℰ
  • 🄴🅇🄰🄼🄿🄻🄴
  • ⓔⓧⓐⓜⓟⓛⓔ

I was eventually able to use Junidecode to convert these strings to 7-bit ASCII, but only by catching the NullPointerException, normalizing the string using NFKC, and then retrying Junidecode.
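
The try, normalize, retry workaround described above can be sketched in plain Java. This is only an illustration under assumptions: the class and method names (NfkcFallback, toAsciiWithFallback, normalizeNfkc) are hypothetical, and the transliterator is passed in as a generic function rather than calling Junidecode directly, so the stand-in used in main() is not Junidecode's actual behavior.

```java
import java.text.Normalizer;
import java.util.function.UnaryOperator;

final class NfkcFallback {

    /** Applies NFKC: compatibility decomposition, then canonical composition. */
    static String normalizeNfkc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    /**
     * Tries the transliterator on the raw input first. If it throws a
     * NullPointerException, normalizes the input with NFKC and retries once.
     */
    static String toAsciiWithFallback(String input, UnaryOperator<String> transliterator) {
        try {
            return transliterator.apply(input);
        } catch (NullPointerException e) {
            return transliterator.apply(normalizeNfkc(input));
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real transliterator:
        // it fails on any character outside 7-bit ASCII.
        UnaryOperator<String> strictAscii = s -> {
            if (!s.chars().allMatch(c -> c < 128)) {
                throw new NullPointerException("unmapped character");
            }
            return s;
        };

        // Circled letters from the third sample string, written as escapes.
        String circled = "\u24D4\u24E7\u24D0\u24DC\u24DF\u24DB\u24D4";
        System.out.println(toAsciiWithFallback(circled, strictAscii)); // prints "example"
    }
}
```

In real use, the lambda would presumably be replaced with a reference to the library's transliteration method. Retrying only after a failure leaves strings that already convert cleanly untouched.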

Here's the ColdFusion/Java function, normalizeUTF, that I used to normalize the strings. (Should this normalization be performed on all strings before attempting to convert?)

/* NFKC: Unicode Compatibility Decomposition, followed by Canonical Composition */
function normalizeUTF(inputString){
    var normalizer = createObject( "java", "java.text.Normalizer" );
    var normalizerForm = createObject( "java", "java.text.Normalizer$Form" );
    return normalizer.normalize( javaCast( "string", arguments.inputString ), normalizerForm.NFKC );
}

I've also found that with this approach, you won't have to add extra character mappings to support the "Enclosed Alphanumerics" block.
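
As a rough check of that claim, the sketch below tests whether the circled Latin letters in the Enclosed Alphanumerics block (U+24B6 to U+24E9) each fold to the matching plain ASCII letter under NFKC. The class and method names are hypothetical, and it covers only the circled letters, not the whole block.

```java
import java.text.Normalizer;

final class EnclosedLettersCheck {

    /**
     * Returns true if every circled Latin letter (capital U+24B6..U+24CF,
     * small U+24D0..U+24E9) normalizes to its plain ASCII letter under NFKC.
     */
    static boolean circledLettersFoldToAscii() {
        for (int i = 0; i < 26; i++) {
            String circledUpper = String.valueOf((char) (0x24B6 + i));
            String circledLower = String.valueOf((char) (0x24D0 + i));
            String plainUpper = String.valueOf((char) ('A' + i));
            String plainLower = String.valueOf((char) ('a' + i));

            if (!Normalizer.normalize(circledUpper, Normalizer.Form.NFKC).equals(plainUpper)) {
                return false;
            }
            if (!Normalizer.normalize(circledLower, Normalizer.Form.NFKC).equals(plainLower)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(circledLettersFoldToAscii());
    }
}
```

One design point to keep in mind: NFKC is not limited to letters. Other characters in the same block expand to multi-character sequences (for example, a parenthesized digit becomes the digit wrapped in ASCII parentheses), so the normalized string may be longer than the input.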
