feat: flag valid-UTF-8 but non-ASCII SNI as RFC 6066 violation by Zious11 · Pull Request #55 · Zious11/wirerust

Zious11 · 2026-04-08T23:35:33Z

Summary

Adds a new Anomaly / Inconclusive / Low finding for TLS SNI hostnames that are valid UTF-8 but contain non-ASCII codepoints (e.g. raw U-labels like "café.example", "пример.example", "🦀.example"). RFC 6066 §3 requires HostName to be ASCII, with internationalized names sent as A-labels per RFC 5890 (xn--… Punycode form). Closes TLS: flag valid-UTF-8 but non-ASCII SNI as RFC 6066 violation #51.
Renames SniValue::Utf8(String) → SniValue::Ascii(String) (the old name was misleading — ASCII is also valid UTF-8) and adds a new NonAsciiUtf8 { hostname, hex } variant. The existing NonUtf8 variant is unchanged.
Flips the pin-test from test: expand TLS SNI edge-case coverage #50 (test_valid_utf8_non_ascii_sni_currently_not_flagged → test_valid_utf8_non_ascii_sni_emits_finding) and adds 3 new positive cases (Cyrillic, emoji, Punycode regression).
Code review (code-reviewer) applied pre-merge: zero critical, zero important, all suggestions explicit defers.
Follow-up TLS: flag SNI hostnames containing ASCII control codes (BEL/ESC/DEL/C0) #54 filed for the related ASCII-control-code gap (S1 — pre-existing, out of TLS: flag valid-UTF-8 but non-ASCII SNI as RFC 6066 violation #51's scope).

Real-world false-positive risk

Validated with Perplexity that all major TLS clients auto-Punycode internationalized hostnames before sending the SNI:

Client	Behavior
rustls	Rejects raw UTF-8 at the `ServerName` API level, requires Punycode upfront
Chrome / BoringSSL	Always Punycode (per RFC 3492 + 6066)
Firefox / NSS	Always Punycode (URL bar may display Unicode for user-friendliness, but the wire is always Punycode)
curl / libcurl	Auto-converts via libidn2 before calling OpenSSL
OpenSSL raw API	Passes verbatim — but expects ASCII from the caller; only legitimate path to a raw U-label SNI is a custom application using OpenSSL directly without IDNA prep

So the false-positive surface is small: a buggy custom client, an OpenSSL-direct app that forgot to Punycode, or an attacker tool. Severity is Anomaly / Inconclusive / Low — same rationale as the non-UTF-8 finding from #49.

Implementation

enum SniValue {
    Ascii(String),                                  // RFC compliant
    NonAsciiUtf8 { hostname: String, hex: String }, // valid UTF-8, non-ASCII (NEW)
    NonUtf8 { lossy: String, hex: String },         // invalid UTF-8 (existing)
}

extract_sni uses s.is_ascii() to split the UTF-8 case. handle_client_hello uses an exhaustive match on the variants, so a future variant addition (e.g. for #54's ASCII-control-code case) will fail to compile until handled. Map-keying for NonAsciiUtf8 is the bare hostname (no <non-ascii:HEX> tagging needed — valid UTF-8 strings have no collision risk because from_utf8 is a bijection on well-formed bytes).

Findings shape

The new finding mirrors the existing NonUtf8 finding pattern for consistency:

category: Anomaly, verdict: Inconclusive, confidence: Low
Summary: "TLS SNI contains non-ASCII characters (RFC 6066 requires A-labels per RFC 5890): {hostname:?}" — {:?} Debug formatter to escape any control codepoints that might survive UTF-8 decoding (e.g. U+0085 NEL)
Evidence: vec![format!("hex: {hex}")] for forensic byte-level review
mitre_technique: None (matches existing TLS findings; no clean ATT&CK technique for protocol-format violations)

Test plan

cargo test --test tls_analyzer_tests — 21/21 pass (~650ms; was 18 before)
cargo test — full suite green
cargo clippy --all-targets -- -D warnings — clean
cargo fmt --check — clean
Code review pass (code-reviewer): zero critical, zero important; one defer-only suggestion (S1 — ASCII control codes) filed as TLS: flag SNI hostnames containing ASCII control codes (BEL/ESC/DEL/C0) #54
Real-world false-positive risk Perplexity-validated against rustls, OpenSSL, Chrome/BoringSSL, Firefox/NSS, curl/libcurl
RFC 6066 §3 ASCII requirement and RFC 5890 A-label requirement reconfirmed (already validated earlier this session via direct RFC fetch)

New tests

test_valid_utf8_non_ascii_sni_emits_finding (flipped from test: expand TLS SNI edge-case coverage #50's pin-test) — "café.example", asserts category/verdict/confidence/RFC mention/A-label mention/hex evidence (636166c3a92e6578616d706c65).
test_cyrillic_sni_emits_non_ascii_finding — "пример.example" (Cyrillic 2-byte UTF-8 sequences).
test_emoji_sni_emits_non_ascii_finding — "🦀.example" (4-byte UTF-8 codepoint).
test_punycode_a_label_does_not_emit_non_ascii_finding — "xn--caf-dma.example" regression: the RFC-compliant Punycode form is pure ASCII and must NOT be flagged.

Closes #51. PR #49 surfaced non-UTF-8 SNI bytes as a finding but deliberately left a related RFC 6066 §3 violation unflagged: SNI hostnames that are valid UTF-8 but contain non-ASCII codepoints, e.g. raw U-labels like "café.example" or "пример.example". The spec requires HostName to be ASCII (with internationalized names sent as A-labels per RFC 5890 Punycode `xn--…` form), so a non-ASCII byte on the wire is a real protocol violation. Real-world false-positive risk ------------------------------ Validated via Perplexity that all major TLS clients auto-Punycode internationalized hostnames before sending the SNI: - rustls — rejects raw UTF-8 at the ServerName API level, requires Punycode upfront - Chrome / BoringSSL — always Punycode (per RFC 3492 + 6066) - Firefox / NSS — always Punycode (URL bar may display Unicode for user-friendliness, but the wire is always Punycode) - curl / libcurl — auto-converts via libidn2 before calling OpenSSL - OpenSSL raw — passes verbatim, but expects ASCII from the caller; applications using OpenSSL directly without IDNA prep are the only legitimate path to a raw U-label SNI on the wire So the false-positive surface is small: a buggy custom client, an attacker tool, or an OpenSSL-direct app that forgot to Punycode. Severity is `Anomaly` / `Inconclusive` / `Low` — same rationale as the non-UTF-8 finding. SniValue enum ------------- Renamed `Utf8(String)` → `Ascii(String)` (the old name was misleading since ASCII is also valid UTF-8) and added a new variant: enum SniValue { Ascii(String), // RFC compliant NonAsciiUtf8 { hostname: String, hex: String }, // valid UTF-8, non-ASCII NonUtf8 { lossy: String, hex: String }, // invalid UTF-8 (existing) } `extract_sni` now uses `s.is_ascii()` to split the UTF-8 case into `Ascii` and `NonAsciiUtf8`. The hostname for `NonAsciiUtf8` is the decoded String (always valid UTF-8 by definition); `hex` is the lossless representation for forensic evidence. `handle_client_hello` keys `sni_counts` on the hostname directly for both Ascii and NonAsciiUtf8 (no collision risk for valid UTF-8) and keeps the existing `<non-utf8:HEX>` tagged form for NonUtf8. The finding summary uses `{hostname:?}` Debug formatter to escape any control codepoints that might survive UTF-8 decoding (e.g. U+0085 NEL). Tests ----- - Flipped `test_valid_utf8_non_ascii_sni_currently_not_flagged` to `test_valid_utf8_non_ascii_sni_emits_finding` — asserts the new finding fires for "café.example" with the right severity, RFC reference in summary, and hex evidence (`636166c3a92e6578616d706c65`). - Added `test_cyrillic_sni_emits_non_ascii_finding` for "пример.example". - Added `test_emoji_sni_emits_non_ascii_finding` for "🦀.example" (4-byte UTF-8 codepoint, all bytes ≥ 0x80). - Added `test_punycode_a_label_does_not_emit_non_ascii_finding` for "xn--caf-dma.example" — pins that the RFC-compliant Punycode form is NOT flagged.

Copilot

Pull request overview

This PR tightens TLS SNI analysis to flag RFC 6066 violations where the SNI hostname bytes are valid UTF-8 but include non-ASCII characters (raw U-labels), while preserving existing handling for non-UTF-8 SNI.

Changes:

Refines SNI decoding by splitting valid SNI into Ascii vs NonAsciiUtf8 (with hex evidence), and emits an Anomaly / Inconclusive / Low finding for the non-ASCII UTF-8 case.
Updates SNI counting to key Ascii and NonAsciiUtf8 by the decoded hostname string, while keeping the existing hex-tag keying for non-UTF-8.
Expands and flips tests to assert the new non-ASCII SNI finding, including Cyrillic/emoji positives and a Punycode A-label regression case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
tests/tls_analyzer_tests.rs	Flips the prior pin-test and adds new cases to validate non-ASCII UTF-8 SNI finding behavior and Punycode non-flagging.
src/analyzer/tls.rs	Implements `NonAsciiUtf8` detection and finding emission; renames the ASCII-happy-path variant and updates SNI keying logic accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Addresses Copilot review on PR #55: "a RFC 6066" → "an RFC 6066". RFC is pronounced "arr-eff-see", which starts with a vowel sound, so the indefinite article is "an" — matches IETF style throughout the RFC corpus itself.

Zious11 requested a review from Copilot April 8, 2026 23:36

Copilot started reviewing on behalf of Zious11 April 8, 2026 23:36 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Comment thread tests/tls_analyzer_tests.rs Outdated

test: fix grammar in non-ASCII SNI test comment

82b31a9

Addresses Copilot review on PR #55: "a RFC 6066" → "an RFC 6066". RFC is pronounced "arr-eff-see", which starts with a vowel sound, so the indefinite article is "an" — matches IETF style throughout the RFC corpus itself.

Zious11 merged commit 51490bd into develop Apr 8, 2026
4 checks passed

Zious11 deleted the worktree-tls-non-ascii-sni branch April 8, 2026 23:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: flag valid-UTF-8 but non-ASCII SNI as RFC 6066 violation#55

feat: flag valid-UTF-8 but non-ASCII SNI as RFC 6066 violation#55
Zious11 merged 2 commits intodevelopfrom
worktree-tls-non-ascii-sni

Zious11 commented Apr 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Zious11 commented Apr 8, 2026

Summary

Real-world false-positive risk

Implementation

Findings shape

Test plan

New tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants