RA-8607: PSL data parser updates rebased onto latest upstream by digilob · Pull Request #106 · digicert/domain-control-validation

digilob · 2026-03-25T19:01:55Z

Rebased RA-8607 changes from domain-control-validation-review onto latest digicert/domain-control-validation master.

library/src/test/java/com/digicert/validation/psl/PslDataParserTest.java

Copilot

Pull request overview

Updates the Public Suffix List (PSL) parsing to better handle internationalized (Unicode) entries by inserting punycode variants, and expands test coverage for Unicode scripts and PSL rule types.

Changes:

Update PslDataParser.addToTrie to use IDN.toASCII(..., IDN.ALLOW_UNASSIGNED) and avoid failing the full parse on conversion errors.
Add parameterized tests covering multiple Unicode scripts, wildcard/exception rules, comment/empty-line skipping, and private-domain section handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
library/src/main/java/com/digicert/validation/psl/PslDataParser.java	Adjusts IDN/punycode conversion behavior (ALLOW_UNASSIGNED + exception handling) during trie insertion.
library/src/test/java/com/digicert/validation/psl/PslDataParserTest.java	Adds parameterized test suites for Unicode scripts and additional PSL parsing behaviors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-25T19:06:51Z

library/src/test/java/com/digicert/validation/psl/PslDataParserTest.java

+    private static Stream<Arguments> provideUnicodeScriptTestData() {
+        return Stream.of(
+                Arguments.of("ᬩᬮᬶ", "xn--9tfky", "Balinese script (.bali TLD)"),
+                Arguments.of("संगठन", "xn--i1b6b1a6a2e", "Devanagari script (.org in Hindi)"),
+                Arguments.of("বাংলা", "xn--54b7fta0cc", "Bengali script"),
+                Arguments.of("ελ", "xn--qxam", "Greek (.el for Greece)"),
+                Arguments.of("рф", "xn--p1ai", "Cyrillic (.rf for Russia)"),
+                Arguments.of("한국", "xn--3e0b707e", "Korean (.kr)"),
+                Arguments.of("ไทย", "xn--o3cw4h", "Thai (.th)"),
+                Arguments.of("இலங்கை", "xn--xkc2al3hye2a", "Tamil (.lk for Sri Lanka)"),
+                Arguments.of("مصر", "xn--wgbh1c", "Arabic (.eg for Egypt)"),
+                Arguments.of("קום", "xn--9dbq2a", "Hebrew (.com in Hebrew)"),
+                Arguments.of("münchen", "xn--mnchen-3ya", "German umlaut"),
+                Arguments.of("日本", "xn--wgv71a", "Japanese"),
+                Arguments.of("中国", "xn--fiqs8s", "Chinese"),
+                Arguments.of("example.com", "example.com", "ASCII domain")
+        );


The Unicode-script parameterized test hardcodes expected punycode strings. These values are derived from java.net.IDN and can change across JDK updates, making the test brittle even when the parser behavior is correct. Consider deriving the expected punycode at runtime via IDN.toASCII(unicodeEntry, IDN.ALLOW_UNASSIGNED) (or generating the punycode in the method source) and asserting that value is present in the trie, rather than pinning specific punycode literals.

Copilot · 2026-03-25T19:06:51Z

library/src/main/java/com/digicert/validation/psl/PslDataParser.java

+        // Using ALLOW_UNASSIGNED flag to handle edge cases like Balinese script (ᬩᬮᬶ)
+        try {
+            String punycode = IDN.toASCII(substring, IDN.ALLOW_UNASSIGNED);
+            if (!punycode.equals(substring)) {
+                trie.insert(punycode);
+            }
+        } catch (IllegalArgumentException e) {
+            // If conversion fails even with ALLOW_UNASSIGNED, the original substring
+            // has already been inserted above, so no entries are skipped
+            // This should be rare but we handle it gracefully
        }


addToTrie now swallows IllegalArgumentException from IDN.toASCII(...) without any logging or surfacing of the bad PSL entry. If conversion fails for an entry, the punycode form won’t be inserted, which can lead to incorrect suffix classification for punycode input while failing silently. Consider either (a) rethrowing with context (e.g., include the offending substring) to avoid loading a partially-correct PSL, or (b) logging at least once (warn/debug) so the issue can be diagnosed in production.

Copilot · 2026-03-25T19:06:51Z

library/src/main/java/com/digicert/validation/psl/PslDataParser.java

     * and validated correctly.
+     * <p>
+     * The method uses the {@code IDN.ALLOW_UNASSIGNED} flag to handle Unicode characters that are not yet assigned
+     * in the Unicode standard (such as Balinese script). This flag allows the conversion to succeed for a wider
+     * range of Unicode characters that may appear in the Public Suffix List.


The Javadoc says adding punycode “ensures that both Unicode and punycode versions ... are recognized”. With the new try/catch, punycode insertion is now best-effort (it can be skipped on IllegalArgumentException). Please update the wording to match the behavior, or make punycode insertion fail-fast so the guarantee holds.

…moving duplicate/unuseful tests

digilob · 2026-03-25T19:34:02Z

Closing in favor of a squashed, neutral branch/PR without RA ticket references.

Copilot AI review requested due to automatic review settings March 25, 2026 19:01

Copilot started reviewing on behalf of digilob March 25, 2026 19:02 View session

github-advanced-security bot found potential problems Mar 25, 2026

View reviewed changes

Copilot AI reviewed Mar 25, 2026

View reviewed changes

digilob force-pushed the RA-8607_psl_data_parser branch from e881baf to ec95178 Compare March 25, 2026 19:16

digilob added 4 commits March 25, 2026 13:17

RA-8607 Data Parsing PSL

b8dca5f

RA-8607 Data Parsing PSL - Refactoring Unit Tests

1c61558

RA-8607 Data Parsing PSL - Refactoring Unit Tests for comments and re…

7bcf333

…moving duplicate/unuseful tests

Refine PSL parser handling and stabilize unicode tests

9ea12bb

digilob force-pushed the RA-8607_psl_data_parser branch from ec95178 to 9ea12bb Compare March 25, 2026 19:17

digilob closed this Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RA-8607: PSL data parser updates rebased onto latest upstream#106

RA-8607: PSL data parser updates rebased onto latest upstream#106
digilob wants to merge 4 commits intomasterfrom
RA-8607_psl_data_parser

digilob commented Mar 25, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 25, 2026

Uh oh!

Copilot AI Mar 25, 2026

Uh oh!

Copilot AI Mar 25, 2026

Uh oh!

digilob commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

digilob commented Mar 25, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

digilob commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants