feat(reporter): layered output sanitization + ADR 0003 by Zious11 · Pull Request #57 · Zious11/wirerust

Zious11 · 2026-04-09T16:02:51Z

Summary

Establishes ADR 0003 (Reporting Pipeline Layering) as the architectural rule for how attacker-controlled bytes flow through the reporting pipeline. The data layer (analyzers + Finding struct) now stores raw post-from_utf8_lossy bytes; each reporter (terminal, JSON, future CSV/SQLite) escapes for its own output medium at render time. Reverses PR #49's construction-site escaping in TLS and centrally fixes an injection vulnerability in the HTTP analyzer's 7 findings.

Motivating vulnerability

During PR #49 follow-up, an audit of HTTP finding construction at src/analyzer/http.rs:183, 209, 227, 242, 257, 271, 285 found that path-traversal, web-shell, admin-panel, unusual-method, missing-Host, long-URI, and empty-UA findings all interpolated parsed.uri / parsed.method / parsed.host directly via format!("{}", ...). An attacker-supplied URI containing \x1b[31mRED\x1b[0m would reach the operator's terminal unescaped (CWE-117).

PR #49 patched the TLS path with {:?} (Debug formatter), which worked for terminal safety but (a) permanently destroyed forensic bytes in Finding.summary — Cyrillic SNIs became \u{43f}\u{440}... in JSON exports — and (b) was tribal knowledge that didn't propagate to HTTP.

Solution

Move escaping from construction-site (analyzer) to display-time (terminal reporter), using a custom helper escape_for_terminal (~15 lines) built on char::escape_default gated by char::is_ascii_control() || ('\u{80}'..='\u{9f}').contains(&c) || c == '\\'. The gate preserves valid non-ASCII Unicode (Cyrillic, CJK, emoji) while escaping C0, DEL, C1, and backslash. str::escape_default was considered and rejected after empirical verification showed it mangles all non-ASCII (see commit 45cf649 and ADR 0003 Validation section).

Behavioral changes visible to users

JSON output preserves Cyrillic / CJK / emoji as raw UTF-8. A TLS SNI of пример.рф now appears readably in JSON exports instead of \u{43f}\u{440}.... Downstream tooling gets the actual hostname.
HTTP findings gain terminal-safety. An attacker who embeds \x1b[31m in a URI can no longer recolor the operator's terminal output. Zero HTTP analyzer code changes — the fix is centralized in the terminal reporter.
Both C0 and C1 control codepoints are escaped at terminal render. C1 codepoints like U+009B (CSI) — encoded in valid UTF-8 as 0xC2 0x9B — are escaped to close the DEC S8C1T attack vector. Narrow but real threat.

Non-changes

Finding struct shape (same fields, same serde derives)
JSON reporter (already safe via serde_json's automatic RFC 8259 escaping)
HTTP analyzer source code (raw interpolation is now correct under ADR 0003)
DNS analyzer (emits no findings)
No new dependencies

Test plan

cargo test --all — 166 tests pass (up from 163 pre-PR)
cargo clippy --all-targets -- -D warnings — clean
cargo fmt -- --check — clean
Unit tests on escape_for_terminal helper (11 total): ESC, BEL, DEL, tab/newline/CR, backslash, printable ASCII, Cyrillic preservation, emoji preservation, mixed content, empty string, C1 NEL+CSI, C1 range boundaries
Reporter integration test: terminal reporter escapes ESC in summary + evidence, preserves Cyrillic
Reporter integration test: JSON reporter preserves Cyrillic as readable Unicode, round-trips via serde_json::from_str
End-to-end contract test covering all three layers (struct preserves raw, terminal escapes, JSON escapes)
TLS analyzer test: raw ESC bytes survive to Finding.summary on the non-UTF-8 path (inverted from the pre-PR contract)
TLS analyzer test: raw Cyrillic substring present in Finding.summary on the non-ASCII path (guards the {hostname:?} → {hostname} rollback)

Multi-agent PR review

Three reviewer agents (code-reviewer, pr-test-analyzer, comment-analyzer) reviewed this branch and returned APPROVED_WITH_NITS / COVERAGE_ADEQUATE / APPROVED_WITH_NITS respectively. The code-reviewer's one Important finding (C1 codepoint bypass + ADR reasoning error) was applied in 7d2cd3c; the pr-test-analyzer's top three coverage suggestions (JSON Cyrillic, TLS non-ASCII rollback, terminal Cyrillic) were applied in b9cc820. One coverage suggestion was deferred to #56 (HTTP analyzer end-to-end through terminal reporter).

Commits

14 commits on top of develop. Notable:

f4c6098 + 45cf649 — ADR 0003 (original + correction)
1143d0b — Implementation plan
1f7e556 + 755837b — Escape helper + apply in render loop
0e42aad — Roll back TLS Debug formatter
7e4f1df — End-to-end layering contract test
7d2cd3c + b9cc820 — Post-review C1 fix + coverage additions

Follow-up: test: end-to-end HTTP analyzer → terminal reporter regression test #56 (HTTP end-to-end test)
Supersedes the construction-site approach from PR fix: surface non-UTF-8 SNI hostnames instead of dropping #49
References RFC 8259 §7 (JSON control byte escaping), CWE-117 (log injection), OWASP output encoding guidance

Establishes the architectural principle that the data layer (analyzers and the Finding struct) holds raw bytes from network captures, while each reporter (terminal, JSON, future CSV/SQLite/HTML/SIEM) is responsible for all formatting that depends on its own output medium — escaping, styling, truncation, localization. The motivating consequence is terminal-safe escaping: an audit during PR #49 follow-up found 7 unprotected interpolations in the HTTP analyzer where attacker-controlled bytes could reach the operator's terminal and be interpreted as ANSI escape sequences (CWE-117). PR #49 patched TLS by escaping at the construction site, which solved the immediate vulnerability but destroyed forensic data (Cyrillic SNIs became hex blobs in JSON output) and required every new analyzer to remember the rule — the HTTP audit shows the rule never propagated. The decision: escape at display time in the reporter, using stdlib str::escape_default in the terminal reporter. Analyzers store raw bytes; JSON output stays safe via serde_json's automatic RFC 8259 escaping; future reporters apply their own format's rules. Validated against OWASP output-encoding guidance (CWE-117) and Rust stdlib behavior via Perplexity 2026-04-08. Other formatting concerns (truncation, cipher-name fallback, verdict localization) follow the same principle but are explicitly deferred per YAGNI. Implementation follows in subsequent commits.

…::escape_default An initial Perplexity answer (relied on during the first draft) claimed str::escape_default preserves multi-byte UTF-8 while escaping only C0 control bytes. Empirical verification during plan self-review — a small rustc-compiled program that fed ESC, BEL, DEL, Cyrillic, emoji, and mixed content through str::escape_default — falsified this: the method routes every character through char::escape_default, which escapes all non-ASCII as \u{...}. A Cyrillic SNI like "пример.рф" would become "\u{43f}\u{440}..." — the same UX problem as PR #49's Debug formatter. The Perplexity answer had conflated str::escape_default (method on str, operates on chars, escapes all non-ASCII) with std::ascii::escape_default (free function, operates on bytes, uses \x{hh}). These are not equivalent. Corrections in this commit: - Replaced "The mechanism: str::escape_default" with an 8-line custom helper that iterates chars() and gates char::escape_default() on is_ascii_control() || '\\'. Empirically verified against the same test inputs. - Replaced the "char::escape_debug (per-char Debug)" alternative entry with "Stdlib str::escape_default", since they're functionally equivalent for our input space and the new entry correctly documents the rejection. - Updated Rationale bullet to describe the custom helper. - Updated Validation section to record the empirical check and the corrected Perplexity claim. - Fixed an incorrect "\x1b" reference in the behavioral-changes section (char::escape_default uses \u{1b}, not \x1b). No change to the layering principle, the decision to escape at display time, the rollback of TLS PR #49, the HTTP analyzer impact, or any of the file-level consequences. Only the escape primitive description changed.

Seven bite-sized TDD tasks implementing the reporting pipeline layering decision from ADR 0003 (commits f4c6098 + 45cf649): Task 1: Cross-reference ADR 0003 from ADR 0002 Task 2: Finding::Display doc warning Task 3: escape_for_terminal helper + 9 unit tests Task 4: Apply helper in terminal reporter render loop Task 5: Roll back TLS construction-site {:?} escaping Task 6: End-to-end regression test (data layer + both reporters) Task 7: cargo test + clippy + fmt Each task has complete code blocks, exact commands, and expected outputs. Test assertions were pre-validated empirically with a small rustc-compiled program (see commit 45cf649 for the primitive correction). No placeholders, no speculative steps. Total impact: ~100 lines of production-code change, ~120 lines of test code, 6-7 commits beyond the ADR.

Point analyzer authors at the layering principle so they know output sanitization is the reporter's responsibility, not theirs.

Doc comment on the impl Display block explaining that the raw text representation is for logging/debugging only and may contain attacker-controlled control bytes. Terminal rendering should go through the terminal reporter, which escapes per ADR 0003.

Private helper in src/reporter/terminal.rs built on stdlib char::escape_default, gated by char::is_ascii_control || backslash. Escapes C0 control bytes, DEL, and backslash while preserving all valid non-ASCII Unicode (Cyrillic, CJK, emoji). Unit tested for ESC, BEL, DEL, tab/newline/CR short escapes, backslash, printable ASCII, Cyrillic, emoji, mixed content, and empty string. Not yet applied to the rendering loop — that's the next commit, so the helper can land with dedicated tests before it's integrated. See ADR 0003 for the layering rationale.

Apply escape_for_terminal to every Finding.summary and every evidence entry in the terminal reporter's rendering loop. This moves output sanitization from the analyzer construction site (where PR #49 placed it) to the display layer, per ADR 0003. JSON output remains unchanged and is already safe via serde_json's automatic RFC 8259 escaping. HTTP findings (path traversal, web shell, admin panel, unusual method, etc.) — previously vulnerable to terminal injection via attacker-controlled URIs — are fixed by this change with zero analyzer-side code modifications.

…ed_ev Minor clarity improvement from Task 4 code quality review. "safe" is ambiguous — the escaped string still contains the escape sequences in textual form, which is what the terminal wants. "escaped" names the action (control-byte escaping via char::escape_default) rather than a misleading safety property. Zero behavioral change.

Remove {:?} Debug-format interpolation in both NonAsciiUtf8 and NonUtf8 SNI finding arms. The data layer now stores raw post-from_utf8_lossy bytes; the terminal reporter is responsible for escaping at render time per ADR 0003. Inverts test_non_utf8_sni_escapes_control_bytes_in_summary — renamed to test_non_utf8_sni_preserves_raw_bytes_in_summary — to enforce the new contract: raw ESC bytes MUST survive to Finding.summary. A separate regression test (next commit) verifies the end-to-end pipeline: terminal reporter escapes, JSON reporter preserves via serde, struct holds the raw byte. Behavioral change visible to JSON consumers: Cyrillic SNIs now appear as readable Unicode ("пример.рф") instead of hex-escaped placeholders ("\u{43f}\u{440}...").

Single Finding flows through the data layer and both reporters. Asserts: 1. Finding struct preserves raw ESC byte (forensic ground truth). 2. Terminal reporter escapes ESC to \u{1b} on display. 3. JSON reporter serializes ESC as \u001b per RFC 8259. 4. JSON round-trips back to the original raw byte (reversibility). Any regression that re-introduces construction-site escaping, removes the terminal reporter's helper, or changes the JSON path will fail this test.

The escape_for_terminal helper gated only on char::is_ascii_control(), which matches C0 + DEL but NOT the C1 range (U+0080-U+009F). C1 codepoints like U+009B (CSI) and U+0085 (NEL) can appear in a valid UTF-8 String — they're encoded as two-byte sequences (e.g., 0xC2 0x9B for U+009B) that survive from_utf8_lossy. A terminal in DEC S8C1T mode interprets U+009B as the 8-bit equivalent of ESC[, so an attacker-supplied SNI or HTTP URI containing 0xC2 0x9B could bypass the escape and inject ANSI control sequences. Modern terminals in UTF-8 mode default to NOT interpreting C1 this way (verified via Perplexity for xterm, iTerm2, gnome-terminal, Windows Terminal, VS Code, tmux), but the DEC mode exists and some legacy terminals honor it. Escape C1 as cheap insurance. The ADR's Coverage of the threat model section contained a subtly wrong claim: 'a standalone byte in the 0x80-0x9f range cannot appear in a valid UTF-8 string'. That's correct for a single raw byte but irrelevant — the codepoint U+009B is a valid multi-byte UTF-8 sequence and can absolutely appear in a String. ADR updated to reflect the actual helper behavior and the corrected reasoning. Adds two new unit tests (escapes_c1_nel_and_csi, escapes_c1_range_boundaries) covering U+0080, U+0085, U+009B, U+009F, plus the U+00A0 (NBSP) boundary — which is just past C1 and must pass through unchanged.

Three additions from the multi-agent PR review triage: 1. test_json_reporter_preserves_cyrillic_as_readable_unicode — pins the ADR 0003 user-visible claim that JSON consumers now see raw Cyrillic hostnames instead of the PR #49 hex-mangled form. Includes a round-trip serde_json::from_str check proving the Cyrillic survives the full trip. 2. test_terminal_reporter_escapes_esc_bytes_in_summary augmented with Cyrillic content in the summary payload. Previously only covered ASCII + ESC — now proves the render loop preserves Cyrillic while still escaping control bytes (catches accidental double-escape bugs in the for-findings loop). 3. test_cyrillic_sni_emits_non_ascii_finding augmented with raw- Cyrillic-substring assertions on Finding.summary. This directly guards the {hostname:?} → {hostname} rollback on the NonAsciiUtf8 match arm in src/analyzer/tls.rs, which was otherwise only structurally similar (and untested) compared to the non-UTF-8 branch that has the inverted dedicated test.

Copilot

Pull request overview

Implements ADR 0003’s “layered reporting pipeline” rule by preserving attacker-controlled bytes in the data layer (Finding) while performing output-medium-specific escaping at render time (terminal), restoring forensic fidelity (Unicode preserved) and centralizing terminal-safety.

Changes:

Add terminal-layer escaping (escape_for_terminal) and apply it when rendering finding summaries/evidence.
Roll back TLS analyzer construction-site Debug escaping so Finding.summary preserves raw bytes again.
Add ADR 0003 + tests that pin the layering contract across data/terminal/JSON outputs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/reporter/terminal.rs`	Adds `escape_for_terminal` and applies it to `Finding.summary`/`evidence` during terminal rendering.
`src/analyzer/tls.rs`	Removes `{:?}`-based construction-site escaping for SNI summaries to preserve raw bytes.
`src/findings.rs`	Documents that `Finding`’s `Display` is raw and not terminal-safe.
`tests/reporter_tests.rs`	Adds integration/contract tests for terminal escaping + JSON Unicode preservation/round-trip.
`tests/tls_analyzer_tests.rs`	Inverts TLS test contract to assert raw bytes are preserved in `Finding.summary`; adds Cyrillic regression guard.
`docs/adr/0003-reporting-pipeline-layering.md`	Introduces ADR 0003 describing the layering rule and escape approach.
`docs/adr/0002-modular-protocol-analyzers.md`	Adds a guideline pointing analyzer authors to ADR 0003 (no construction-site escaping).
`docs/superpowers/plans/2026-04-09-reporting-pipeline-layering.md`	Adds an implementation plan document for the ADR 0003 work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The C1 control codepoint fix in commit 7d2cd3c updated the helper's predicate to include the U+0080..=U+009F range, but only patched one section of the ADR ('Coverage of the threat model') and left four other places describing the pre-C1 predicate. Copilot's review of #57 caught all five mismatches (four in ADR 0003, one in the plan). Fixes: - ADR 0003 'Immediate scope' section: code sample updated to include the C1 range check, and the 'helper escapes' bullet list gains an explicit C1 bullet pointing to the later threat-model paragraph. - ADR 0003 'Rationale' bullet: 'Escapes exactly the bytes that constitute the threat (C0 + DEL + backslash)' -> 'C0 + C1 + DEL + backslash'. Also bumps the line count estimate (~8 -> ~15) to reflect the helper's actual size after the C1 addition. - ADR 0003 'Consequences' table row for src/reporter/terminal.rs: description now mentions C1 controls as part of the required predicate. - Plan Architecture paragraph: clarifies that the helper gates on ASCII control + C1 control + backslash. Adds a forward reference to commit 7d2cd3c so future plan-followers know the C1 range was added post-plan and Task 3's code samples reflect the pre-C1 predicate for historical accuracy. - Plan 'File Structure' clarification: 'No files are created' was technically inaccurate since this plan doc and ADR 0003 are both new files; reworded to 'No new code or runtime files are created' with an explicit note that the two docs are prerequisites committed before the implementation tasks begin. Per Copilot on #57.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

TerminalReporter::render's ANALYZER SUMMARIES block was printing detail values (serde_json::Value) via {val} direct interpolation, relying on serde_json's Display impl to handle escaping. That works for C0 + DEL (serde_json escapes them per RFC 8259 §7) but NOT for C1 codepoints (U+0080-U+009F) — serde_json preserves those as raw UTF-8 two-byte sequences per RFC 8259, which is valid but leaves a C1 terminal injection vector open. Empirically verified: a serde_json::Value::Array containing "before\u{9b}31mafter" renders via Display to a JSON string that contains the raw 0xC2 0x9B bytes (U+009B CSI). The helper fix in commit 7d2cd3c closed this gap in the findings block but didn't reach the analyzer summaries block. Detail values include attacker-controlled strings from HTTP top_hosts / top_user_agents / recent_uris and TLS top_snis — these all come through String::from_utf8_lossy in the analyzers, so embedded C0 + C1 bytes can appear in the serde_json::Value leaves exposed to the terminal. Fix: apply escape_for_terminal to val.to_string() before writing to the output buffer, matching the findings-block pattern. The JSON rendering path now has a defense-in-depth double layer: serde_json's RFC 8259 escape for C0 (which produces \u001b that escape_for_terminal passes through as printable ASCII), plus escape_for_terminal's C1 gate catching any raw U+0080-U+009F bytes in the JSON string. Adds test_terminal_reporter_escapes_control_bytes_in_analyzer_summaries covering the C0 and C1 cases through the full ANALYZER SUMMARIES render path, plus a Cyrillic preservation assertion. Also fixes the plan's Task 2 doc-comment snippet which still mentioned str::escape_default — replaces with the correct escape_for_terminal reference (Copilot round 2 comment). Per Copilot round 2 review on #57.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

## Summary - Adds 3 end-to-end tests that drive `HttpAnalyzer::on_data` with crafted HTTP requests containing C1 CSI (U+009B) injection, then render through both `TerminalReporter` and `JsonReporter`, verifying ADR 0003's output sanitization contract. - Key discovery during issue validation: `httparse` rejects C0 control bytes (including ESC 0x1b) in URIs and header values, but **accepts** C1 codepoints (U+009B CSI = `0xC2 0x9B`) because their UTF-8 encoding uses high bytes (≥ 0x80). C1 CSI is the real viable injection vector through the HTTP analyzer. - Closes the coverage gap identified by `pr-test-analyzer` during the ADR 0003 PR review (#57). ### Tests added | Test | Vector | Path | |------|--------|------| | `test_http_finding_c1_csi_escaped_by_terminal_reporter` | C1 CSI in path-traversal URI | Finding → TerminalReporter | | `test_http_finding_c1_csi_in_json_reporter` | C1 CSI in path-traversal URI | Finding → JsonReporter round-trip | | `test_http_analyzer_summary_c1_csi_escaped_by_terminal_reporter` | C1 CSI in Host header | AnalyzerSummary → TerminalReporter | Closes #56 ## Test plan - [x] All 12 reporter tests pass (`cargo test --test reporter_tests`) - [x] Full suite passes (170+ tests, 0 failures) - [x] `cargo clippy --all-targets` clean - [x] `cargo fmt --check` clean - [x] Multi-agent PR review (test-analyzer, code-reviewer, comment-analyzer) — 0 critical issues

Zious11 added 14 commits April 9, 2026 09:59

docs(adr): cross-reference ADR 0003 from ADR 0002

902827f

Point analyzer authors at the layering principle so they know output sanitization is the reporter's responsibility, not theirs.

style: cargo fmt

96ed07d

style: cargo fmt

5bdecc7

Zious11 requested a review from Copilot April 9, 2026 16:03

Copilot started reviewing on behalf of Zious11 April 9, 2026 16:03 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Zious11 requested a review from Copilot April 9, 2026 16:12

Copilot started reviewing on behalf of Zious11 April 9, 2026 16:13 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Comment thread src/reporter/terminal.rs

Comment thread docs/superpowers/plans/2026-04-09-reporting-pipeline-layering.md Outdated

Zious11 requested a review from Copilot April 9, 2026 16:24

Copilot started reviewing on behalf of Zious11 April 9, 2026 16:24 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Zious11 merged commit e1d0a6f into develop Apr 9, 2026
8 checks passed

Zious11 deleted the worktree-adr-0003-output-sanitization branch April 9, 2026 16:31

Zious11 mentioned this pull request Apr 10, 2026

test: e2e HTTP analyzer → reporter regression tests #58

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(reporter): layered output sanitization + ADR 0003#57

feat(reporter): layered output sanitization + ADR 0003#57
Zious11 merged 16 commits intodevelopfrom
worktree-adr-0003-output-sanitization

Zious11 commented Apr 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Zious11 commented Apr 9, 2026

Summary

Motivating vulnerability

Solution

Behavioral changes visible to users

Non-changes

Test plan

Multi-agent PR review

Commits

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants