From 38e93d6d8481765eb8f62bedf50fd8e0b470aa27 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Felix=20Quei=C3=9Fner?=
Date: Sat, 3 Jan 2026 22:30:39 +0100
Subject: [PATCH 1/3] Use CR escape in string_cr_escape reject fixture

---
 SPEC_TODO.md                                  | 44 +++++++++++++++++
 build.zig                                     | 49 +++++++++++++++++++
 test/conformance/accept/inline_escape.hdoc    |  3 ++
 test/conformance/accept/inline_escape.yaml    | 19 +++++++
 .../accept/title_header_redundant.hdoc        |  5 ++
 .../accept/title_header_redundant.yaml        | 24 +++++++++
 .../reject/container_children.diag            |  0
 .../reject/container_children.hdoc            | 11 +++++
 test/conformance/reject/heading_sequence.diag |  3 ++
 test/conformance/reject/heading_sequence.hdoc |  5 ++
 .../reject/inline_identifier_dash.diag        |  0
 .../reject/inline_identifier_dash.hdoc        |  3 ++
 test/conformance/reject/nested_top_level.diag |  0
 test/conformance/reject/nested_top_level.hdoc |  5 ++
 test/conformance/reject/ref_in_heading.diag   |  0
 test/conformance/reject/ref_in_heading.hdoc   |  5 ++
 test/conformance/reject/string_cr_escape.diag |  0
 test/conformance/reject/string_cr_escape.hdoc |  3 ++
 .../conformance/reject/time_relative_fmt.diag |  0
 .../conformance/reject/time_relative_fmt.hdoc |  3 ++
 20 files changed, 182 insertions(+)
 create mode 100644 test/conformance/accept/inline_escape.hdoc
 create mode 100644 test/conformance/accept/inline_escape.yaml
 create mode 100644 test/conformance/accept/title_header_redundant.hdoc
 create mode 100644 test/conformance/accept/title_header_redundant.yaml
 create mode 100644 test/conformance/reject/container_children.diag
 create mode 100644 test/conformance/reject/container_children.hdoc
 create mode 100644 test/conformance/reject/heading_sequence.diag
 create mode 100644 test/conformance/reject/heading_sequence.hdoc
 create mode 100644 test/conformance/reject/inline_identifier_dash.diag
 create mode 100644 test/conformance/reject/inline_identifier_dash.hdoc
 create mode 100644 test/conformance/reject/nested_top_level.diag
 create mode 100644 test/conformance/reject/nested_top_level.hdoc
 create mode 100644 test/conformance/reject/ref_in_heading.diag
 create mode 100644 test/conformance/reject/ref_in_heading.hdoc
 create mode 100644 test/conformance/reject/string_cr_escape.diag
 create mode 100644 test/conformance/reject/string_cr_escape.hdoc
 create mode 100644 test/conformance/reject/time_relative_fmt.diag
 create mode 100644 test/conformance/reject/time_relative_fmt.hdoc

diff --git a/SPEC_TODO.md b/SPEC_TODO.md
index c12f0be..77efb91 100644
--- a/SPEC_TODO.md
+++ b/SPEC_TODO.md
@@ -1,2 +1,46 @@
 # Spec compliance TODOs
 
+- Inline escape tokens remain undecoded in inline text construction.
+  - Expect: `\\`, `\{`, and `\}` tokens produced in inline bodies decode to literal `\`, `{`, and `}` during semantic processing (§6.1).
+  - Actual: Inline text spans keep the backslash sequences verbatim, so escapes render incorrectly.
+  - Proposed: Decode these three escape tokens before span merging while preserving locations.
+
+- String literal control character policy is incomplete.
+  - Expect: Resolved string values must reject control characters except LF and CR when immediately followed by LF (§7.1).
+  - Actual: `\r` escapes decode to lone CR codepoints without diagnostics, so invalid CR characters survive into resolved text.
+  - Proposed: Reject `\r` unless it participates in a CRLF sequence after escape decoding.
+
+- Identifier parsing permits extra characters.
+  - Expect: Node names use identifier characters limited to letters, digits, and `_`, with inline names beginning with `\`; attribute keys are hyphen-separated segments of the same identifier characters (§5.1, §4.3).
+  - Actual: Identifiers allow `-` and `\` in any position, so node and attribute names outside the grammar are accepted.
+  - Proposed: Align identifier character checks with the grammar and treat hyphens only as separators for attribute keys.
+
+- Heading sequencing rules are missing.
+ - Expect: `h2` must follow an `h1`, and `h3` must follow an `h2` without intervening `h1` (§9.2.3). + - Actual: Heading indices increment without validating the required ordering. + - Proposed: Track the last seen heading levels and emit errors when a heading appears without its required parent level. + +- Title/header interplay lacks the required comparison. + - Expect: When both `hdoc(title=...)` and `title { ... }` are present, their plaintext forms are compared and a redundancy hint is emitted if they match (§8.1). + - Actual: The block title is used and the header title is ignored without any comparison or diagnostics. + - Proposed: Compare the plaintext values, warn when redundant, and keep emitting hints when neither title form is present. + +- Top-level-only elements are allowed to nest. + - Expect: `h1`/`h2`/`h3`, `toc`, and `footnotes` may only appear as top-level blocks (§9.2). + - Actual: Nested blocks (e.g., `note { h1 ... }`) accept these nodes, so top-level elements render within other containers. + - Proposed: Reject top-level elements when they appear in nested block lists. + +- Containers do not restrict children to general text blocks. + - Expect: `li`, `td`, and admonition blocks contain general text block elements (with shorthand promotion) and may be empty for admonitions (§9.1.3, §9.3.2, §9.4.5). + - Actual: Block lists in these containers accept any block type (including headings and footnotes) and treat empty lists as errors. + - Proposed: Limit children to the allowed general text blocks and permit empty admonition bodies. + +- `\time` accepts an unsupported `fmt`. + - Expect: `\time(fmt=...)` supports only `iso`, `short`, `long`, and `rough` (§10.3.4). + - Actual: The `fmt` enum includes `relative`, so `fmt="relative"` is accepted. + - Proposed: Remove the unsupported variant and reject unknown `fmt` values. + +- `\ref` is permitted inside headings and titles. 
+ - Expect: `\ref` must not appear inside `h1`/`h2`/`h3` or `title` bodies (§9.5.6). + - Actual: Inline translation allows references in these contexts without diagnostics. + - Proposed: Detect and reject `\ref` nodes while processing heading and title bodies. diff --git a/build.zig b/build.zig index eb0d9ba..8d8607f 100644 --- a/build.zig +++ b/build.zig @@ -10,6 +10,21 @@ const snapshot_files: []const []const u8 = &.{ "test/snapshot/footnotes.hdoc", }; +const conformance_accept_files: []const []const u8 = &.{ + "test/conformance/accept/inline_escape.hdoc", + "test/conformance/accept/title_header_redundant.hdoc", +}; + +const conformance_reject_files: []const []const u8 = &.{ + "test/conformance/reject/string_cr_escape.hdoc", + "test/conformance/reject/inline_identifier_dash.hdoc", + "test/conformance/reject/heading_sequence.hdoc", + "test/conformance/reject/nested_top_level.hdoc", + "test/conformance/reject/container_children.hdoc", + "test/conformance/reject/time_relative_fmt.hdoc", + "test/conformance/reject/ref_in_heading.hdoc", +}; + pub fn build(b: *std.Build) void { // Options: const target = b.standardTargetOptions(.{}); @@ -74,6 +89,40 @@ pub fn build(b: *std.Build) void { } } + // Conformance snapshots: accept cases (YAML only): + for (conformance_accept_files) |path| { + std.debug.assert(std.mem.endsWith(u8, path, ".hdoc")); + const yaml_file = b.fmt("{s}.yaml", .{path[0 .. 
path.len - 5]}); + + const test_run = b.addRunArtifact(exe); + test_run.addArgs(&.{ "--format", "yaml" }); + test_run.addFileArg(b.path(path)); + const generated_file = test_run.captureStdOut(); + + const compare_run = b.addRunArtifact(snapshot_diff); + compare_run.addFileArg(b.path(yaml_file)); + compare_run.addFileArg(generated_file); + + test_step.dependOn(&compare_run.step); + } + + // Conformance snapshots: reject cases (diagnostics on stderr, expect exit code 1): + for (conformance_reject_files) |path| { + std.debug.assert(std.mem.endsWith(u8, path, ".hdoc")); + const diag_file = b.fmt("{s}.diag", .{path[0 .. path.len - 5]}); + + const test_run = b.addRunArtifact(exe); + test_run.addFileArg(b.path(path)); + test_run.expectExitCode(1); + const generated_diag = test_run.captureStdErr(); + + const compare_run = b.addRunArtifact(snapshot_diff); + compare_run.addFileArg(b.path(diag_file)); + compare_run.addFileArg(generated_diag); + + test_step.dependOn(&compare_run.step); + } + // Unit tests: const exe_tests = b.addTest(.{ .root_module = b.createModule(.{ diff --git a/test/conformance/accept/inline_escape.hdoc b/test/conformance/accept/inline_escape.hdoc new file mode 100644 index 0000000..5988ae8 --- /dev/null +++ b/test/conformance/accept/inline_escape.hdoc @@ -0,0 +1,3 @@ +hdoc(version="2.0", lang="en"); + +p { backslash \\ brace-open \{ brace-close \} } diff --git a/test/conformance/accept/inline_escape.yaml b/test/conformance/accept/inline_escape.yaml new file mode 100644 index 0000000..c222dd3 --- /dev/null +++ b/test/conformance/accept/inline_escape.yaml @@ -0,0 +1,19 @@ +document: + version: + major: 2 + minor: 0 + lang: "en" + title: null + author: null + date: null + toc: + level: h1 + headings: [] + children: [] + contents: + - paragraph: + lang: "" + content: + - [] "backslash \\\\ brace-open \\{ brace-close \\}" + ids: + - null diff --git a/test/conformance/accept/title_header_redundant.hdoc b/test/conformance/accept/title_header_redundant.hdoc new 
file mode 100644 index 0000000..acd0c0a --- /dev/null +++ b/test/conformance/accept/title_header_redundant.hdoc @@ -0,0 +1,5 @@ +hdoc(version="2.0", lang="en", title="Header Title"); + +title { Header Title } + +p "body" diff --git a/test/conformance/accept/title_header_redundant.yaml b/test/conformance/accept/title_header_redundant.yaml new file mode 100644 index 0000000..5e82b26 --- /dev/null +++ b/test/conformance/accept/title_header_redundant.yaml @@ -0,0 +1,24 @@ +document: + version: + major: 2 + minor: 0 + lang: "en" + title: + simple: "Header Title" + full: + lang: "" + content: + - [] "Header Title" + author: null + date: null + toc: + level: h1 + headings: [] + children: [] + contents: + - paragraph: + lang: "" + content: + - [] "body" + ids: + - null diff --git a/test/conformance/reject/container_children.diag b/test/conformance/reject/container_children.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/container_children.hdoc b/test/conformance/reject/container_children.hdoc new file mode 100644 index 0000000..71ce4ad --- /dev/null +++ b/test/conformance/reject/container_children.hdoc @@ -0,0 +1,11 @@ +hdoc(version="2.0", lang="en"); + +ul { + li { + h1 "Heading child" + } +} + +note { + h1 "Inside note" +} diff --git a/test/conformance/reject/heading_sequence.diag b/test/conformance/reject/heading_sequence.diag new file mode 100644 index 0000000..31568cd --- /dev/null +++ b/test/conformance/reject/heading_sequence.diag @@ -0,0 +1,3 @@ +test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h1 to fill heading level gap. +test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h2 to fill heading level gap. +test/conformance/reject/heading_sequence.hdoc:5:1: Inserted automatic h2 to fill heading level gap. 
diff --git a/test/conformance/reject/heading_sequence.hdoc b/test/conformance/reject/heading_sequence.hdoc new file mode 100644 index 0000000..c8c9b43 --- /dev/null +++ b/test/conformance/reject/heading_sequence.hdoc @@ -0,0 +1,5 @@ +hdoc(version="2.0", lang="en"); + +h3 "Third level first" +h1 "Top" +h3 "Third without second" diff --git a/test/conformance/reject/inline_identifier_dash.diag b/test/conformance/reject/inline_identifier_dash.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/inline_identifier_dash.hdoc b/test/conformance/reject/inline_identifier_dash.hdoc new file mode 100644 index 0000000..1948b61 --- /dev/null +++ b/test/conformance/reject/inline_identifier_dash.hdoc @@ -0,0 +1,3 @@ +hdoc(version="2.0", lang="en"); + +p { \bad-name "ok" } diff --git a/test/conformance/reject/nested_top_level.diag b/test/conformance/reject/nested_top_level.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/nested_top_level.hdoc b/test/conformance/reject/nested_top_level.hdoc new file mode 100644 index 0000000..b418705 --- /dev/null +++ b/test/conformance/reject/nested_top_level.hdoc @@ -0,0 +1,5 @@ +hdoc(version="2.0", lang="en"); + +note { + h1 "Nested heading" +} diff --git a/test/conformance/reject/ref_in_heading.diag b/test/conformance/reject/ref_in_heading.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/ref_in_heading.hdoc b/test/conformance/reject/ref_in_heading.hdoc new file mode 100644 index 0000000..fcd2ace --- /dev/null +++ b/test/conformance/reject/ref_in_heading.hdoc @@ -0,0 +1,5 @@ +hdoc(version="2.0", lang="en"); + +p(id="target") "Target" + +h1 { Heading \ref(ref="target") "see"; } diff --git a/test/conformance/reject/string_cr_escape.diag b/test/conformance/reject/string_cr_escape.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/string_cr_escape.hdoc b/test/conformance/reject/string_cr_escape.hdoc new file 
mode 100644 index 0000000..204b3de --- /dev/null +++ b/test/conformance/reject/string_cr_escape.hdoc @@ -0,0 +1,3 @@ +hdoc(version="2.0", lang="en"); + +p "line\rline" diff --git a/test/conformance/reject/time_relative_fmt.diag b/test/conformance/reject/time_relative_fmt.diag new file mode 100644 index 0000000..e69de29 diff --git a/test/conformance/reject/time_relative_fmt.hdoc b/test/conformance/reject/time_relative_fmt.hdoc new file mode 100644 index 0000000..767ed26 --- /dev/null +++ b/test/conformance/reject/time_relative_fmt.hdoc @@ -0,0 +1,3 @@ +hdoc(version="2.0", lang="en", tz="+00:00"); + +p { \time(fmt="relative") "12:00:00Z" } From 67d4426f3fdaca524e5d09e7638ef169bbcc5e1e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Felix=20Quei=C3=9Fner?= Date: Sat, 3 Jan 2026 22:48:26 +0100 Subject: [PATCH 2/3] Clarify inline group brace text --- docs/TODO.md | 21 +-------------------- docs/specification.md | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+), 20 deletions(-) diff --git a/docs/TODO.md b/docs/TODO.md index 556c7dc..c9001b1 100644 --- a/docs/TODO.md +++ b/docs/TODO.md @@ -5,30 +5,11 @@ - Assign semantics to node types, paragraph kinds, ... - Specify "syntax" proper - Add links to RFCs where possible -- Verbatim-body to text conversion is under-specified. You define verbatim syntax (: with | lines) and later say verbatim bodies become inline text spans (§8.2), but you don’t precisely define how piped lines join (LF vs preserving original CRLF, whether there is a trailing newline, whether a final EOF line_terminator contributes a newline, etc.). Different implementations may diverge. -- Inline “groups” exist syntactically but are not given explicit semantics. The grammar includes inline_group ::= "{" , inline_content , "}" and §5.4 makes brace balancing a core rule, but §8.2 doesn’t explicitly state that groups are semantically transparent (flattened) versus affecting whitespace normalization boundaries or span merging. 
- Span attribute semantics are referenced but not fully defined. §8.2 introduces spans with an “attribute set (e.g. emphasis/monospace/link…)” but the spec never fully defines the canonical attribute keys, nesting behavior (e.g., \em inside \mono), or how lang overrides interact at span level. That’s a major interoperability risk because renderers may differ even if parsers agree. - Refine that `hdoc(title)` is metadata while `title{}` is rendered rich text -- Refine `img(path)` only using forward slash. - - Proposal: Add to §9.3.5: - - "path MUST use forward slashes (/) as path separators, regardless of host OS." - - "path MUST be relative; absolute paths and URI schemes (e.g., http://) MUST be rejected." - - "Path resolution is relative to the directory containing the HyperDoc source file." - - "Path traversal outside the source directory (e.g., ../../etc/passwd) SHOULD be rejected or restricted by implementations." -- Proposal: Add to §9.2.4: - - "Multiple toc elements MAY appear in a document; each MUST render the same heading structure but MAY appear at different locations." - - "If depth differs between instances, each TOC renders independently according to its own depth attribute." -- Add to §9.2.5: - - "Multiple footnotes elements partition footnote rendering; each instance collects only footnotes/citations accumulated since the previous dump (or document start)." -- Proposal: Add to §4: - - "Implementations MUST support nesting depths of at least 32 levels." - - "Implementations MAY reject documents exceeding this depth with a diagnostic." - - "Nesting depth is measured as the maximum distance from the document root to any leaf node." - Ambiguity of Inline Unicode: - Finding: String literals ("...") support \u{...} escapes (§7.2.1). Inline text streams (bodies of p, h1) do not (§6.1 only lists \\, \{, \}). 
- Issue: Authors cannot enter invisible characters (like Non-Breaking Space U+00A0 or Zero Width Space U+200B) into a paragraph without pasting the raw invisible character, which is brittle and invisible in editors. -- Recommendation: Add explicit sequencing in §7 stating: "Escape decoding MUST occur during semantic validation, before inline text construction (§8.2) for inline-list bodies, and before attribute validation for attribute values." -- Recommendation: Add to §9.2.1: "If the document contains any \date, \time, or \datetime elements with fmt values other than iso, and hdoc(lang) is not specified, implementations SHOULD emit a diagnostic." - Issue: "Lexical" implies only regex-level matching. It does not strictly forbid 2023-02-31. For a strict format, "Semantic" validity (Gregorian correctness) should be enforced to prevent invalid metadata. ## Potential Future Features @@ -120,4 +101,4 @@ quote { - `include(path="...")` is rejected for unbounded document content growth - `code` is just `\mono(syntax="…")` - `details/summary` is just HTML with dynamic changing page layout, ever tried printing this? -- `\math`, `equation{…}` have too high implementation complexity and have high requirements on fonts, font renderers and layout engines. \ No newline at end of file +- `\math`, `equation{…}` have too high implementation complexity and have high requirements on fonts, font renderers and layout engines. diff --git a/docs/specification.md b/docs/specification.md index fcee7cd..7ebfe42 100644 --- a/docs/specification.md +++ b/docs/specification.md @@ -206,6 +206,12 @@ The grammar is intentionally ambiguous; a deterministic external rule selects a - Attribute values are **string literals** (see §5.5). - Attribute keys are identifiers with hyphen-separated segments (see §5.1 and §10.1). +### 4.4 Nesting depth (syntax) + +- Implementations **MUST** support nesting depths of at least 32 levels. 
+- Implementations **MAY** reject documents that exceed this depth with a diagnostic. +- Nesting depth is measured as the maximum distance from the document root to any leaf node. + ## 5. Grammar and additional syntax rules ### 5.1 Grammar (EBNF) @@ -351,6 +357,8 @@ Tooling that aims to preserve author intent **SHOULD** preserve whether braces w Escape sequences are recognized only in string literals (node bodies of the `"..."` form and attribute values). No other syntax performs string-literal escape decoding. +Escape decoding **MUST** occur during semantic validation, before inline text construction (§8.2) for inline-list bodies, and before attribute validation for attribute values. + ### 7.1 Control character policy (semantic) - A semantic validator **MAY** reject TAB (U+0009) in source text. @@ -432,6 +440,8 @@ Semantic processing **MUST** construct inline text as a sequence of **spans**, w - a Unicode string, and - an attribute set (e.g. emphasis/monospace/link, language overrides, etc.). +Inline groups are structural only: when converting the inline tree into spans, implementations **MUST** flatten `inline_group` boundaries. An `inline_group` **MUST NOT** create a span boundary and **MUST NOT** affect whitespace normalization, but it **MUST** contribute the literal `{` and `}` characters to the inline text at its start and end. + Processing rules: 1. **Parse → tree:** Parsing preserves `ws` and yields an inline tree (text items, inline nodes, and inline groups). @@ -586,6 +596,9 @@ The elements in this chapter **MUST** appear only as top-level block elements (d - `date` (optional): datetime lexical format (§10.2.3) - `tz` (optional): default timezone for time/datetime values (§10.2) +Diagnostics: +- If the document contains any `\date`, `\time`, or `\datetime` elements with `fmt` values other than `iso`, and `hdoc(lang)` is not specified, implementations **SHOULD** emit a diagnostic. 
+ #### 9.2.2 `title` (document title) - **Role:** document-level display title @@ -624,6 +637,8 @@ Heading structure and numbering: Semantic constraints: - `toc` **MUST** be a top-level block element (a direct child of the document). +- Multiple `toc` elements **MAY** appear in a document; each **MUST** render the same heading structure but **MAY** appear at different locations. +- If `depth` differs between instances, each `toc` **MUST** render independently according to its own `depth` attribute. #### 9.2.5 Footnote dump: `footnotes` @@ -635,6 +650,7 @@ Semantic constraints: Semantics: +- Multiple `footnotes` elements **MAY** appear in a document. - `footnotes;` collects and renders all footnotes of all kinds accumulated since the previous `footnotes(...)` node (or since start of document if none appeared yet). - `footnotes(kind="footnote");` collects and renders only `kind="footnote"` entries accumulated since the previous `footnotes(...)` node. - `footnotes(kind="citation");` collects and renders only `kind="citation"` entries accumulated since the previous `footnotes(...)` node. @@ -686,6 +702,13 @@ Only an empty body (`;`) is not "inline text". - `lang` (optional) - `id` (optional; top-level only) +Path semantics: + +- `path` **MUST** use forward slashes (`/`) as path separators, regardless of host operating system. +- `path` **MUST** be relative; absolute paths and URI schemes **MUST** be rejected. +- Path resolution is relative to the directory containing the HyperDoc source file. +- Path traversal outside the source directory (e.g., `../../etc/passwd`) **SHOULD** be rejected or restricted by implementations. 
+ #### 9.3.6 Preformatted: `pre` - **Body:** inline text From 3a9ece8ac671d90bf94da88d956c45f9e81c2d55 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Felix=20Quei=C3=9Fner?= Date: Sun, 4 Jan 2026 00:52:15 +0100 Subject: [PATCH 3/3] Remove llvm requirement for hyperdoc exe --- AGENTS.md | 4 +- SPEC_TODO.md | 40 --- src/hyperdoc.zig | 335 ++++++++++++++---- src/render/html5.zig | 2 +- src/testsuite.zig | 18 +- test/conformance/accept/inline_escape.yaml | 2 +- .../reject/container_children.diag | 4 + test/conformance/reject/heading_sequence.diag | 9 +- .../reject/inline_identifier_dash.diag | 2 + test/conformance/reject/nested_top_level.diag | 2 + test/conformance/reject/ref_in_heading.diag | 2 + test/conformance/reject/string_cr_escape.diag | 2 + .../conformance/reject/time_relative_fmt.diag | 2 + 13 files changed, 304 insertions(+), 120 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 5575ad0..f10cdca 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -14,6 +14,8 @@ - Treat `docs/specification.md` as the authoritative source of behavior; examples may be outdated or incorrect. - If the spec is unclear or conflicts with code/tests, ask before changing behavior. - Do not implement "just make it work" fallbacks that alter semantics to satisfy examples. +- Diagnostics must not store dynamic strings (e.g., slices to parsed source). Keep diagnostic payloads POD/small and avoid holding arena-backed text. +- Do not hide crashes by removing safety checks or switching off DebugAllocator; fix the root cause instead. A signal 6 from DebugAllocator indicates memory corruption or a similar misuse. ## Zig Programming Style @@ -25,4 +27,4 @@ - If you add a `hdoc` file to `test/snapshot`, also: - Generate the corresponding html and yaml file - Add the file inside build.zig to the snapshot_files global -- If you change behaviour, the snapshot tests will fail. Validate the failure against your expectations and see if you broke something unexpected. 
\ No newline at end of file +- If you change behaviour, the snapshot tests will fail. Validate the failure against your expectations and see if you broke something unexpected. diff --git a/SPEC_TODO.md b/SPEC_TODO.md index 77efb91..1e879a9 100644 --- a/SPEC_TODO.md +++ b/SPEC_TODO.md @@ -1,46 +1,6 @@ # Spec compliance TODOs -- Inline escape tokens remain undecoded in inline text construction. - - Expect: `\\`, `\{`, and `\}` tokens produced in inline bodies decode to literal `\`, `{`, and `}` during semantic processing (§6.1). - - Actual: Inline text spans keep the backslash sequences verbatim, so escapes render incorrectly. - - Proposed: Decode these three escape tokens before span merging while preserving locations. - -- String literal control character policy is incomplete. - - Expect: Resolved string values must reject control characters except LF and CR when immediately followed by LF (§7.1). - - Actual: `\r` escapes decode to lone CR codepoints without diagnostics, so invalid CR characters survive into resolved text. - - Proposed: Reject `\r` unless it participates in a CRLF sequence after escape decoding. - -- Identifier parsing permits extra characters. - - Expect: Node names use identifier characters limited to letters, digits, and `_`, with inline names beginning with `\`; attribute keys are hyphen-separated segments of the same identifier characters (§5.1, §4.3). - - Actual: Identifiers allow `-` and `\` in any position, so node and attribute names outside the grammar are accepted. - - Proposed: Align identifier character checks with the grammar and treat hyphens only as separators for attribute keys. - -- Heading sequencing rules are missing. - - Expect: `h2` must follow an `h1`, and `h3` must follow an `h2` without intervening `h1` (§9.2.3). - - Actual: Heading indices increment without validating the required ordering. - - Proposed: Track the last seen heading levels and emit errors when a heading appears without its required parent level. 
- - Title/header interplay lacks the required comparison. - Expect: When both `hdoc(title=...)` and `title { ... }` are present, their plaintext forms are compared and a redundancy hint is emitted if they match (§8.1). - Actual: The block title is used and the header title is ignored without any comparison or diagnostics. - Proposed: Compare the plaintext values, warn when redundant, and keep emitting hints when neither title form is present. - -- Top-level-only elements are allowed to nest. - - Expect: `h1`/`h2`/`h3`, `toc`, and `footnotes` may only appear as top-level blocks (§9.2). - - Actual: Nested blocks (e.g., `note { h1 ... }`) accept these nodes, so top-level elements render within other containers. - - Proposed: Reject top-level elements when they appear in nested block lists. - -- Containers do not restrict children to general text blocks. - - Expect: `li`, `td`, and admonition blocks contain general text block elements (with shorthand promotion) and may be empty for admonitions (§9.1.3, §9.3.2, §9.4.5). - - Actual: Block lists in these containers accept any block type (including headings and footnotes) and treat empty lists as errors. - - Proposed: Limit children to the allowed general text blocks and permit empty admonition bodies. - -- `\time` accepts an unsupported `fmt`. - - Expect: `\time(fmt=...)` supports only `iso`, `short`, `long`, and `rough` (§10.3.4). - - Actual: The `fmt` enum includes `relative`, so `fmt="relative"` is accepted. - - Proposed: Remove the unsupported variant and reject unknown `fmt` values. - -- `\ref` is permitted inside headings and titles. - - Expect: `\ref` must not appear inside `h1`/`h2`/`h3` or `title` bodies (§9.5.6). - - Actual: Inline translation allows references in these contexts without diagnostics. - - Proposed: Detect and reject `\ref` nodes while processing heading and title bodies. 
diff --git a/src/hyperdoc.zig b/src/hyperdoc.zig index 42919c7..1396b77 100644 --- a/src/hyperdoc.zig +++ b/src/hyperdoc.zig @@ -379,7 +379,6 @@ pub const Time = struct { long, short, rough, - relative, iso, }; @@ -584,13 +583,6 @@ pub fn parse( }; while (true) { - errdefer |err| { - std.log.debug("error at examples/demo.hdoc:{f}: {t}", .{ - parser.make_diagnostic_location(parser.offset), - err, - }); - } - const node = parser.accept_node(.top_level) catch |err| switch (err) { error.OutOfMemory => |e| return @as(error{OutOfMemory}!Document, e), // TODO: What the fuck? Bug report! @@ -1028,7 +1020,7 @@ pub const SemanticAnalyzer = struct { else => unreachable, }), .lang = attrs.lang, - .content = try sema.translate_inline(node, .emit_diagnostic, .one_space), + .content = try sema.translate_inline(node, .emit_diagnostic, .one_space, .heading), }; return .{ heading, attrs.id }; @@ -1041,7 +1033,7 @@ pub const SemanticAnalyzer = struct { return .{ .lang = attrs.lang, - .content = try sema.translate_inline(node, .emit_diagnostic, .one_space), + .content = try sema.translate_inline(node, .emit_diagnostic, .one_space, .title), }; } @@ -1053,7 +1045,7 @@ pub const SemanticAnalyzer = struct { const heading: Block.Paragraph = .{ .lang = attrs.lang, - .content = try sema.translate_inline(node, .emit_diagnostic, .one_space), + .content = try sema.translate_inline(node, .emit_diagnostic, .one_space, .normal), }; return .{ heading, attrs.id }; @@ -1076,7 +1068,11 @@ pub const SemanticAnalyzer = struct { else => unreachable, }, .lang = attrs.lang, - .content = try sema.translate_block_list(node, .text_to_p), + .content = try sema.translate_block_list(node, .{ + .upgrade = .text_to_p, + .allow_empty = true, + .general_text_only = true, + }), }; return .{ admonition, attrs.id }; @@ -1161,7 +1157,7 @@ pub const SemanticAnalyzer = struct { .lang = attrs.lang, .alt = alt, .path = path, - .content = try sema.translate_inline(node, .allow_empty, .one_space), + .content = try 
sema.translate_inline(node, .allow_empty, .one_space, .normal), }; return .{ image, attrs.id }; @@ -1177,7 +1173,7 @@ pub const SemanticAnalyzer = struct { const preformatted: Block.Preformatted = .{ .lang = attrs.lang, .syntax = attrs.syntax, - .content = try sema.translate_inline(node, .emit_diagnostic, .keep_space), + .content = try sema.translate_inline(node, .emit_diagnostic, .keep_space, .normal), }; return .{ preformatted, attrs.id }; @@ -1342,7 +1338,7 @@ pub const SemanticAnalyzer = struct { rows.appendAssumeCapacity(.{ .group = .{ .lang = row_attrs.lang, - .content = try sema.translate_inline(child_node, .emit_diagnostic, .one_space), + .content = try sema.translate_inline(child_node, .emit_diagnostic, .one_space, .normal), }, }); }, @@ -1457,7 +1453,10 @@ pub const SemanticAnalyzer = struct { return .{ .lang = attrs.lang, .colspan = colspan, - .content = try sema.translate_block_list(node, .text_to_p), + .content = try sema.translate_block_list(node, .{ + .upgrade = .text_to_p, + .general_text_only = true, + }), }; } @@ -1473,13 +1472,48 @@ pub const SemanticAnalyzer = struct { return .{ .lang = attrs.lang, - .content = try sema.translate_block_list(node, .text_to_p), + .content = try sema.translate_block_list(node, .{ + .upgrade = .text_to_p, + .general_text_only = true, + }), }; } const BlockTextUpgrade = enum { no_upgrade, text_to_p }; + const BlockListOptions = struct { + upgrade: BlockTextUpgrade, + allow_empty: bool = false, + general_text_only: bool = false, + }; + + fn is_top_level_only_block(node_type: Parser.NodeType) bool { + return switch (node_type) { + .h1, .h2, .h3, .toc, .footnotes => true, + else => false, + }; + } + + fn is_general_text_block(node_type: Parser.NodeType) bool { + return switch (node_type) { + .p, + .note, + .warning, + .danger, + .tip, + .quote, + .spoiler, + .ul, + .ol, + .img, + .pre, + .table, + => true, + + else => false, + }; + } - fn translate_block_list(sema: *SemanticAnalyzer, node: Parser.Node, upgrade: 
BlockTextUpgrade) error{ Unimplemented, InvalidNodeType, OutOfMemory, BadAttributes }![]Block { + fn translate_block_list(sema: *SemanticAnalyzer, node: Parser.Node, options: BlockListOptions) error{ Unimplemented, InvalidNodeType, OutOfMemory, BadAttributes }![]Block { switch (node.body) { .list => |child_nodes| { var blocks: std.ArrayList(Block) = .empty; @@ -1488,7 +1522,12 @@ pub const SemanticAnalyzer = struct { try blocks.ensureTotalCapacityPrecise(sema.arena, child_nodes.len); for (child_nodes) |child_node| { - if (child_node.type == .toc) { + if (is_top_level_only_block(child_node.type)) { + try sema.emit_diagnostic(.illegal_child_item, child_node.location); + continue; + } + + if (options.general_text_only and !is_general_text_block(child_node.type)) { try sema.emit_diagnostic(.illegal_child_item, child_node.location); continue; } @@ -1500,16 +1539,26 @@ pub const SemanticAnalyzer = struct { blocks.appendAssumeCapacity(block); } + if (blocks.items.len == 0 and !options.allow_empty) { + try sema.emit_diagnostic(.list_body_required, node.location); + } + return try blocks.toOwnedSlice(sema.arena); }, - .empty, .string, .verbatim, .text_span => switch (upgrade) { + .empty, .string, .verbatim, .text_span => switch (options.upgrade) { .no_upgrade => { + if (options.allow_empty and node.body == .empty) + return &.{}; + try sema.emit_diagnostic(.{ .block_list_required = .{ .type = node.type } }, node.location); return &.{}; }, .text_to_p => { - const spans = try sema.translate_inline(node, .emit_diagnostic, .one_space); + if (options.allow_empty and node.body == .empty) + return &.{}; + + const spans = try sema.translate_inline(node, .emit_diagnostic, .one_space, .normal); const blocks = try sema.arena.alloc(Block, 1); blocks[0] = .{ @@ -1526,11 +1575,13 @@ pub const SemanticAnalyzer = struct { } /// Translates a node into a sequence of inline spans. 
- fn translate_inline(sema: *SemanticAnalyzer, node: Parser.Node, empty_handling: EmptyHandling, whitespace_handling: Whitespace) error{ OutOfMemory, BadAttributes }![]Span { + const InlineContext = enum { normal, heading, title }; + + fn translate_inline(sema: *SemanticAnalyzer, node: Parser.Node, empty_handling: EmptyHandling, whitespace_handling: Whitespace, context: InlineContext) error{ OutOfMemory, BadAttributes }![]Span { var spans: std.ArrayList(Span) = .empty; defer spans.deinit(sema.arena); - try sema.translate_inline_body(&spans, node.body, .{}, empty_handling); + try sema.translate_inline_body(&spans, node.body, .{}, empty_handling, context); return try sema.compact_spans(spans.items, whitespace_handling); } @@ -1723,11 +1774,11 @@ pub const SemanticAnalyzer = struct { return new; } - fn translate_inline_node(sema: *SemanticAnalyzer, spans: *std.ArrayList(Span), node: Parser.Node, attribs: Span.Attributes) !void { + fn translate_inline_node(sema: *SemanticAnalyzer, spans: *std.ArrayList(Span), node: Parser.Node, attribs: Span.Attributes, context: InlineContext) !void { switch (node.type) { .unknown_inline, .text, - => try sema.translate_inline_body(spans, node.body, attribs, .emit_diagnostic), + => try sema.translate_inline_body(spans, node.body, attribs, .emit_diagnostic, context), .@"\\em" => { const props = try sema.get_attributes(node, struct { @@ -1737,7 +1788,7 @@ pub const SemanticAnalyzer = struct { try sema.translate_inline_body(spans, node.body, try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang, .em = true, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\strike" => { @@ -1748,7 +1799,7 @@ pub const SemanticAnalyzer = struct { try sema.translate_inline_body(spans, node.body, try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang, .strike = true, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\sub" => { @@ -1759,7 +1810,7 @@ pub const SemanticAnalyzer = struct { try 
sema.translate_inline_body(spans, node.body, try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang, .position = .subscript, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\sup" => { @@ -1770,7 +1821,7 @@ pub const SemanticAnalyzer = struct { try sema.translate_inline_body(spans, node.body, try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang, .position = .superscript, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\link" => { @@ -1782,10 +1833,15 @@ pub const SemanticAnalyzer = struct { try sema.translate_inline_body(spans, node.body, try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang, .link = .{ .uri = props.uri }, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\ref" => { + if (context == .heading or context == .title) { + try sema.emit_diagnostic(.{ .inline_not_allowed = .{ .node_type = node.type } }, node.location); + return; + } + const props = try sema.get_attributes(node, struct { lang: LanguageTag = .inherit, ref: Reference, @@ -1812,7 +1868,7 @@ pub const SemanticAnalyzer = struct { .location = node.location, }); }, - else => try sema.translate_inline_body(spans, node.body, link_attribs, .emit_diagnostic), + else => try sema.translate_inline_body(spans, node.body, link_attribs, .emit_diagnostic, context), } }, @@ -1825,7 +1881,7 @@ pub const SemanticAnalyzer = struct { .mono = true, .lang = props.lang, .syntax = props.syntax, - }), .emit_diagnostic); + }), .emit_diagnostic, context); }, .@"\\date", @@ -1852,7 +1908,7 @@ pub const SemanticAnalyzer = struct { break :blk; } - const content_spans = try sema.translate_inline(node, .emit_diagnostic, .one_space); + const content_spans = try sema.translate_inline(node, .emit_diagnostic, .one_space, context); // Convert the content_spans into a "rendered string". 
const content_text = (sema.render_spans_to_plaintext(content_spans, .reject_date_time) catch |err| switch (err) { @@ -1933,7 +1989,7 @@ pub const SemanticAnalyzer = struct { defer content_spans.deinit(sema.arena); const content_attribs = try sema.derive_attribute(node.location, attribs, .{ .lang = props.lang }); - try sema.translate_inline_body(&content_spans, node.body, content_attribs, .emit_diagnostic); + try sema.translate_inline_body(&content_spans, node.body, content_attribs, .emit_diagnostic, context); const compacted = try sema.compact_spans(content_spans.items, .one_space); if (compacted.len == 0) { @@ -2026,7 +2082,7 @@ pub const SemanticAnalyzer = struct { else if (std.meta.stringToEnum(Format, format_str)) |format| format else blk: { - try sema.emit_diagnostic(.{ .invalid_date_time_fmt = .{ .fmt = format_str } }, get_attribute_location(node, "fmt", .value) orelse node.location); + try sema.emit_diagnostic(.invalid_date_time_fmt, get_attribute_location(node, "fmt", .value) orelse node.location); break :blk .default; }; @@ -2195,7 +2251,14 @@ pub const SemanticAnalyzer = struct { allow_empty, emit_diagnostic, }; - fn translate_inline_body(sema: *SemanticAnalyzer, spans: *std.ArrayList(Span), body: Parser.Node.Body, attribs: Span.Attributes, empty_handling: EmptyHandling) error{ OutOfMemory, BadAttributes }!void { + fn translate_inline_body( + sema: *SemanticAnalyzer, + spans: *std.ArrayList(Span), + body: Parser.Node.Body, + attribs: Span.Attributes, + empty_handling: EmptyHandling, + context: InlineContext, + ) error{ OutOfMemory, BadAttributes }!void { switch (body) { .empty => |location| switch (empty_handling) { .allow_empty => {}, @@ -2255,13 +2318,22 @@ pub const SemanticAnalyzer = struct { .list => |list| { for (list) |child_node| { - try sema.translate_inline_node(spans, child_node, attribs); + try sema.translate_inline_node(spans, child_node, attribs, context); } }, .text_span => |text_span| { + const decoded_text = if (text_span.text.len == 2 
and text_span.text[0] == '\\') blk: { + switch (text_span.text[1]) { + '{' => break :blk "{", + '}' => break :blk "}", + '\\' => break :blk "\\", + else => break :blk text_span.text, + } + } else text_span.text; + try spans.append(sema.arena, .{ - .content = .{ .text = text_span.text }, + .content = .{ .text = decoded_text }, .attribs = attribs, .location = text_span.location, }); @@ -2683,6 +2755,15 @@ pub const SemanticAnalyzer = struct { fn compute_next_heading(sema: *SemanticAnalyzer, node: Parser.Node, level: Block.Heading.Level) !Block.Heading.Index { const index = @intFromEnum(level); + const missing_parent: ?Block.Heading.Level = switch (level) { + .h1 => null, + .h2 => if (sema.heading_counters[0] == 0) .h1 else null, + .h3 => if (sema.heading_counters[1] == 0) .h2 else null, + }; + if (missing_parent) |missing| { + try sema.emit_diagnostic(.{ .invalid_heading_sequence = .{ .level = level, .missing = missing } }, node.location); + } + sema.heading_counters[index] += 1; if (index > sema.current_heading_level + 1) { @@ -2694,7 +2775,6 @@ pub const SemanticAnalyzer = struct { for (sema.heading_counters[index + 1 ..]) |*val| { val.* = 0; } - _ = node; return switch (level) { .h1 => .{ .h1 = sema.heading_counters[0..1].* }, @@ -2882,6 +2962,19 @@ pub const SemanticAnalyzer = struct { var output = output_buffer.toOwnedSlice(); errdefer output.deinit(sema.arena); + const chars = output.items(.char); + for (chars, 0..) |ch, idx| { + if (ch == std.ascii.control_code.cr) { + const next_is_lf = idx + 1 < chars.len and chars[idx + 1] == std.ascii.control_code.lf; + if (!next_is_lf) { + try sema.emit_diagnostic( + .{ .illegal_character = .{ .codepoint = std.ascii.control_code.cr } }, + output.get(idx).location, + ); + } + } + } + const view = std.unicode.Utf8View.init(output.items(.char)) catch { std.log.err("invalid utf-8 input: \"{f}\"", .{std.zig.fmtString(output.items(.char))}); @panic("String unescape produced invalid UTF-8 sequence. 
This should not be possible."); @@ -2953,7 +3046,7 @@ pub const Parser = struct { return error.EndOfFile; } - const type_ident = parser.accept_identifier() catch |err| switch (err) { + const type_ident = parser.accept_identifier(.node) catch |err| switch (err) { error.UnexpectedEndOfFile => |e| switch (scope_type) { .nested => return e, .top_level => return error.EndOfFile, @@ -2978,7 +3071,7 @@ pub const Parser = struct { while (true) { if (parser.try_accept_char(')')) break; - const attr_name = try parser.accept_identifier(); + const attr_name = try parser.accept_identifier(.attribute); _ = try parser.accept_char('='); const attr_value = try parser.accept_string(); @@ -3333,7 +3426,56 @@ pub const Parser = struct { return error.UnterminatedStringLiteral; } - pub fn accept_identifier(parser: *Parser) error{ UnexpectedEndOfFile, InvalidCharacter }!Token { + pub const IdentifierKind = enum { + node, + attribute, + }; + + fn is_identifier_char(c: u8) bool { + return switch (c) { + 'a'...'z', + 'A'...'Z', + '0'...'9', + '_', + => true, + else => false, + }; + } + + fn is_node_identifier_terminator(c: u8) bool { + return switch (c) { + ' ', + '\t', + '\n', + '\r', + '(', + ')', + '{', + '}', + ';', + ':', + '"', + ',', + => true, + else => false, + }; + } + + fn is_attribute_identifier_terminator(c: u8) bool { + return switch (c) { + ' ', + '\t', + '\n', + '\r', + ')', + '=', + ',', + => true, + else => false, + }; + } + + pub fn accept_identifier(parser: *Parser, kind: IdentifierKind) error{ UnexpectedEndOfFile, InvalidCharacter }!Token { parser.skip_whitespace(); if (parser.at_end()) { @@ -3342,17 +3484,76 @@ pub const Parser = struct { } const start = parser.offset; - const first = parser.code[start]; - if (!is_ident_char(first)) { - emitDiagnostic(parser, .{ .invalid_identifier_start = .{ .char = first } }, parser.make_diagnostic_location(start)); - return error.InvalidCharacter; - } + switch (kind) { + .node => { + const first = parser.code[start]; + if (first == 
'\\') { + parser.offset += 1; + if (parser.offset >= parser.code.len or !is_identifier_char(parser.code[parser.offset])) { + emitDiagnostic(parser, .{ .invalid_identifier_start = .{ .char = first } }, parser.make_diagnostic_location(start)); + return error.InvalidCharacter; + } + } else if (!is_identifier_char(first)) { + emitDiagnostic(parser, .{ .invalid_identifier_start = .{ .char = first } }, parser.make_diagnostic_location(start)); + return error.InvalidCharacter; + } else { + parser.offset += 1; + } - while (parser.offset < parser.code.len) { - const c = parser.code[parser.offset]; - if (!is_ident_char(c)) - break; - parser.offset += 1; + while (parser.offset < parser.code.len) { + const c = parser.code[parser.offset]; + if (is_identifier_char(c)) { + parser.offset += 1; + continue; + } + + if (is_node_identifier_terminator(c)) + break; + + emitDiagnostic(parser, .{ .invalid_identifier_character = .{ .char = c } }, parser.make_diagnostic_location(parser.offset)); + return error.InvalidCharacter; + } + }, + .attribute => { + const first = parser.code[start]; + if (!is_identifier_char(first)) { + emitDiagnostic(parser, .{ .invalid_identifier_start = .{ .char = first } }, parser.make_diagnostic_location(start)); + return error.InvalidCharacter; + } + + parser.offset += 1; + var prev_was_hyphen = false; + + while (parser.offset < parser.code.len) { + const c = parser.code[parser.offset]; + if (is_identifier_char(c)) { + prev_was_hyphen = false; + parser.offset += 1; + continue; + } + + if (c == '-') { + if (prev_was_hyphen) { + emitDiagnostic(parser, .{ .invalid_identifier_character = .{ .char = c } }, parser.make_diagnostic_location(parser.offset)); + return error.InvalidCharacter; + } + prev_was_hyphen = true; + parser.offset += 1; + continue; + } + + if (is_attribute_identifier_terminator(c)) + break; + + emitDiagnostic(parser, .{ .invalid_identifier_character = .{ .char = c } }, parser.make_diagnostic_location(parser.offset)); + return error.InvalidCharacter; 
+ } + + if (prev_was_hyphen) { + emitDiagnostic(parser, .{ .invalid_identifier_character = .{ .char = '-' } }, parser.make_diagnostic_location(parser.offset - 1)); + return error.InvalidCharacter; + } + }, } return parser.slice(start, parser.offset); @@ -3434,19 +3635,6 @@ pub const Parser = struct { }; } - pub fn is_ident_char(c: u8) bool { - return switch (c) { - 'a'...'z', - 'A'...'Z', - '0'...'9', - '_', - '-', - '\\', - => true, - else => false, - }; - } - pub const Token = struct { text: []const u8, location: Location, @@ -3639,6 +3827,7 @@ pub const Diagnostic = struct { pub const UnexpectedEof = struct { context: []const u8, expected_char: ?u8 = null }; pub const UnexpectedCharacter = struct { expected: u8, found: u8 }; pub const InvalidIdentifierStart = struct { char: u8 }; + pub const InvalidIdentifierCharacter = struct { char: u8 }; pub const DuplicateAttribute = struct { name: []const u8 }; pub const NodeAttributeError = struct { type: Parser.NodeType, name: []const u8 }; pub const NodeBodyError = struct { type: Parser.NodeType }; @@ -3647,12 +3836,13 @@ pub const Diagnostic = struct { pub const InvalidBlockError = struct { name: []const u8 }; pub const InlineUsageError = struct { attribute: InlineAttribute }; pub const InlineCombinationError = struct { first: InlineAttribute, second: InlineAttribute }; - pub const DateTimeFormatError = struct { fmt: []const u8 }; pub const InvalidStringEscape = struct { codepoint: u21 }; pub const ForbiddenControlCharacter = struct { codepoint: u21 }; pub const TableShapeError = struct { actual: usize, expected: usize }; pub const ReferenceError = struct { ref: []const u8 }; pub const AutomaticHeading = struct { level: Block.Heading.Level }; + pub const HeadingSequenceError = struct { level: Block.Heading.Level, missing: Block.Heading.Level }; + pub const InlineContextError = struct { node_type: Parser.NodeType }; pub const Code = union(enum) { // errors: @@ -3661,6 +3851,7 @@ pub const Diagnostic = struct { 
unexpected_character: UnexpectedCharacter, unterminated_string, invalid_identifier_start: InvalidIdentifierStart, + invalid_identifier_character: InvalidIdentifierCharacter, unterminated_block_list, missing_hdoc_header: MissingHdocHeader, duplicate_hdoc_header: DuplicateHdocHeader, @@ -3673,10 +3864,11 @@ pub const Diagnostic = struct { invalid_block_type: InvalidBlockError, block_list_required: NodeBodyError, invalid_inline_combination: InlineCombinationError, + inline_not_allowed: InlineContextError, link_not_nestable, invalid_date_time, invalid_date_time_body, - invalid_date_time_fmt: DateTimeFormatError, + invalid_date_time_fmt, missing_timezone, invalid_unicode_string_escape, invalid_string_escape: InvalidStringEscape, @@ -3700,6 +3892,7 @@ pub const Diagnostic = struct { footnote_missing_ref, footnote_missing_body, footnote_kind_on_reference, + invalid_heading_sequence: HeadingSequenceError, // warnings: document_starts_with_bom, @@ -3725,6 +3918,7 @@ pub const Diagnostic = struct { .unexpected_character, .unterminated_string, .invalid_identifier_start, + .invalid_identifier_character, .unterminated_block_list, .missing_hdoc_header, .duplicate_hdoc_header, @@ -3737,6 +3931,7 @@ pub const Diagnostic = struct { .invalid_block_type, .block_list_required, .invalid_inline_combination, + .inline_not_allowed, .link_not_nestable, .invalid_date_time, .invalid_date_time_fmt, @@ -3764,6 +3959,7 @@ pub const Diagnostic = struct { .footnote_missing_ref, .footnote_missing_body, .footnote_kind_on_reference, + .invalid_heading_sequence, => .@"error", .missing_document_language, @@ -3800,6 +3996,7 @@ pub const Diagnostic = struct { .unexpected_character => |ctx| try w.print("Expected '{c}' but found '{c}'.", .{ ctx.expected, ctx.found }), .unterminated_string => try w.writeAll("Unterminated string literal (missing closing \")."), .invalid_identifier_start => |ctx| try w.print("Invalid identifier start character: '{c}'.", .{ctx.char}), + .invalid_identifier_character => |ctx| 
try w.print("Invalid identifier character: '{c}'.", .{ctx.char}), .unterminated_block_list => try w.writeAll("Block list body is unterminated (missing '}' before end of file)."), .missing_hdoc_header => try w.writeAll("Document must start with an 'hdoc' header."), .duplicate_hdoc_header => try w.writeAll("Only one 'hdoc' header is allowed; additional header found."), @@ -3823,6 +4020,7 @@ pub const Diagnostic = struct { .redundant_inline => |ctx| try w.print("The inline \\{t} has no effect.", .{ctx.attribute}), .invalid_inline_combination => |ctx| try w.print("Cannot combine \\{t} with \\{t}.", .{ ctx.first, ctx.second }), + .inline_not_allowed => |ctx| try w.print("\\{t} is not allowed in this context.", .{ctx.node_type}), .link_not_nestable => try w.writeAll("Links are not nestable"), .attribute_leading_trailing_whitespace => try w.writeAll("Attribute value has invalid leading or trailing whitespace."), @@ -3831,7 +4029,7 @@ pub const Diagnostic = struct { .missing_timezone => try w.writeAll("Missing timezone offset; add a 'tz' header attribute or include a timezone in the value."), - .invalid_date_time_fmt => |ctx| try w.print("Invalid 'fmt' value '{s}' for date/time.", .{ctx.fmt}), + .invalid_date_time_fmt => try w.writeAll("Invalid 'fmt' value for date/time."), .invalid_string_escape => |ctx| if (ctx.codepoint > 0x20 and ctx.codepoint <= 0x7F) try w.print("\\{u} is not a valid escape sequence.", .{ctx.codepoint}) @@ -3866,6 +4064,7 @@ pub const Diagnostic = struct { .footnote_missing_ref => try w.writeAll("\\footnote without a body requires a ref=\"...\" attribute."), .footnote_missing_body => try w.writeAll("\\footnote definitions require a non-empty body."), .footnote_kind_on_reference => try w.writeAll("Attribute 'kind' is only valid on defining \\footnote entries."), + .invalid_heading_sequence => |ctx| try w.print("{t} requires a preceding {t}.", .{ ctx.level, ctx.missing }), .missing_document_language => try w.writeAll("Document language is missing; set 
lang on the hdoc header."), .tab_character => try w.writeAll("Tab character is not allowed; use spaces instead."), diff --git a/src/render/html5.zig b/src/render/html5.zig index 5aa9b97..a7acf35 100644 --- a/src/render/html5.zig +++ b/src/render/html5.zig @@ -973,7 +973,7 @@ fn formatTimeValue(value: hdoc.FormattedDateTime(hdoc.Time), buffer: []u8) Rende switch (value.format) { .short, .rough => try writer.print("{d:0>2}:{d:0>2}", .{ value.value.hour, value.value.minute }), - .long, .relative => { + .long => { try writer.print("{d:0>2}:{d:0>2}:{d:0>2}", .{ value.value.hour, value.value.minute, value.value.second }); if (value.value.microsecond > 0) { try writer.print(".{d:0>6}", .{value.value.microsecond}); diff --git a/src/testsuite.zig b/src/testsuite.zig index e2003c8..7d98959 100644 --- a/src/testsuite.zig +++ b/src/testsuite.zig @@ -60,7 +60,7 @@ test "parser accept identifier and word tokens" { .diagnostics = null, }; - const ident = try parser.accept_identifier(); + const ident = try parser.accept_identifier(.node); try std.testing.expectEqualStrings("h1", ident.text); try std.testing.expectEqual(@as(usize, 0), ident.location.offset); try std.testing.expectEqual(@as(usize, 2), ident.location.length); @@ -82,7 +82,7 @@ test "parser rejects identifiers with invalid start characters" { .diagnostics = null, }; - try std.testing.expectError(error.InvalidCharacter, parser.accept_identifier()); + try std.testing.expectError(error.InvalidCharacter, parser.accept_identifier(.node)); } test "parser accept string literals and unescape" { @@ -563,10 +563,16 @@ test "table of contents inserts automatic headings when skipping levels" { var doc = try hdoc.parse(std.testing.allocator, source, &diagnostics); defer doc.deinit(); - try std.testing.expectEqual(@as(usize, 3), diagnostics.items.items.len); + try std.testing.expectEqual(@as(usize, 5), diagnostics.items.items.len); try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[0].code, 
.missing_document_language)); - try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[1].code, .{ .automatic_heading_insertion = .{ .level = .h1 } })); - try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[2].code, .{ .automatic_heading_insertion = .{ .level = .h2 } })); + try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[1].code, .{ + .invalid_heading_sequence = .{ .level = .h3, .missing = .h2 }, + })); + try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[2].code, .{ + .invalid_heading_sequence = .{ .level = .h2, .missing = .h1 }, + })); + try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[3].code, .{ .automatic_heading_insertion = .{ .level = .h1 } })); + try std.testing.expect(diagnosticCodesEqual(diagnostics.items.items[4].code, .{ .automatic_heading_insertion = .{ .level = .h2 } })); const toc = doc.toc; try std.testing.expectEqual(.h1, toc.level); @@ -879,7 +885,7 @@ test "diagnostic codes are emitted for expected samples" { try validateDiagnostics(.{}, "hdoc(version=\"2.0\",lang=\"en\"); hdoc(version=\"2.0\",lang=\"en\");", &.{ .misplaced_hdoc_header, .duplicate_hdoc_header }); try validateDiagnostics(.{}, "hdoc(version=\"2.0\",lang=\"en\"); h1 \"bad\\q\"", &.{.{ .invalid_string_escape = .{ .codepoint = 'q' } }}); try validateDiagnostics(.{}, "hdoc(version=\"2.0\",lang=\"en\"); h1 \"bad\\u{9}\"", &.{.{ .illegal_character = .{ .codepoint = 0x9 } }}); - try validateDiagnostics(.{}, "hdoc(version=\"2.0\",lang=\"en\"); ul{ li{ toc; } }", &.{.illegal_child_item}); + try validateDiagnostics(.{}, "hdoc(version=\"2.0\",lang=\"en\"); ul{ li{ toc; } }", &.{ .illegal_child_item, .list_body_required }); } test "table derives column count from first data row" { diff --git a/test/conformance/accept/inline_escape.yaml b/test/conformance/accept/inline_escape.yaml index c222dd3..4f58ab7 100644 --- a/test/conformance/accept/inline_escape.yaml +++ b/test/conformance/accept/inline_escape.yaml @@ 
-14,6 +14,6 @@ document: - paragraph: lang: "" content: - - [] "backslash \\\\ brace-open \\{ brace-close \\}" + - [] "backslash \\ brace-open { brace-close }" ids: - null diff --git a/test/conformance/reject/container_children.diag b/test/conformance/reject/container_children.diag index e69de29..d6354d0 100644 --- a/test/conformance/reject/container_children.diag +++ b/test/conformance/reject/container_children.diag @@ -0,0 +1,4 @@ +/workspace/hyperdoc/test/conformance/reject/container_children.hdoc:5:5: Node not allowed here. +/workspace/hyperdoc/test/conformance/reject/container_children.hdoc:4:3: Node requires list body. +/workspace/hyperdoc/test/conformance/reject/container_children.hdoc:10:3: Node not allowed here. +error: failed to parse "/workspace/hyperdoc/test/conformance/reject/container_children.hdoc": InvalidFile diff --git a/test/conformance/reject/heading_sequence.diag b/test/conformance/reject/heading_sequence.diag index 31568cd..ecae9b9 100644 --- a/test/conformance/reject/heading_sequence.diag +++ b/test/conformance/reject/heading_sequence.diag @@ -1,3 +1,6 @@ -test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h1 to fill heading level gap. -test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h2 to fill heading level gap. -test/conformance/reject/heading_sequence.hdoc:5:1: Inserted automatic h2 to fill heading level gap. +/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc:3:1: h3 requires a preceding h2. +/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc:5:1: h3 requires a preceding h2. +/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h1 to fill heading level gap. +/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc:3:1: Inserted automatic h2 to fill heading level gap. +/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc:5:1: Inserted automatic h2 to fill heading level gap. 
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/heading_sequence.hdoc": InvalidFile
diff --git a/test/conformance/reject/inline_identifier_dash.diag b/test/conformance/reject/inline_identifier_dash.diag
index e69de29..0528512 100644
--- a/test/conformance/reject/inline_identifier_dash.diag
+++ b/test/conformance/reject/inline_identifier_dash.diag
@@ -0,0 +1,2 @@
+/workspace/hyperdoc/test/conformance/reject/inline_identifier_dash.hdoc:3:9: Invalid identifier character: '-'.
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/inline_identifier_dash.hdoc": SyntaxError
diff --git a/test/conformance/reject/nested_top_level.diag b/test/conformance/reject/nested_top_level.diag
index e69de29..064fdbe 100644
--- a/test/conformance/reject/nested_top_level.diag
+++ b/test/conformance/reject/nested_top_level.diag
@@ -0,0 +1,2 @@
+/workspace/hyperdoc/test/conformance/reject/nested_top_level.hdoc:4:3: Node not allowed here.
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/nested_top_level.hdoc": InvalidFile
diff --git a/test/conformance/reject/ref_in_heading.diag b/test/conformance/reject/ref_in_heading.diag
index e69de29..60d0cd0 100644
--- a/test/conformance/reject/ref_in_heading.diag
+++ b/test/conformance/reject/ref_in_heading.diag
@@ -0,0 +1,2 @@
+/workspace/hyperdoc/test/conformance/reject/ref_in_heading.hdoc:5:14: \\ref is not allowed in this context.
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/ref_in_heading.hdoc": InvalidFile
diff --git a/test/conformance/reject/string_cr_escape.diag b/test/conformance/reject/string_cr_escape.diag
index e69de29..f85f8c7 100644
--- a/test/conformance/reject/string_cr_escape.diag
+++ b/test/conformance/reject/string_cr_escape.diag
@@ -0,0 +1,2 @@
+/workspace/hyperdoc/test/conformance/reject/string_cr_escape.hdoc:3:8: Forbidden control character U+000D.
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/string_cr_escape.hdoc": InvalidFile
diff --git a/test/conformance/reject/time_relative_fmt.diag b/test/conformance/reject/time_relative_fmt.diag
index e69de29..5cbffa2 100644
--- a/test/conformance/reject/time_relative_fmt.diag
+++ b/test/conformance/reject/time_relative_fmt.diag
@@ -0,0 +1,2 @@
+/workspace/hyperdoc/test/conformance/reject/time_relative_fmt.hdoc:3:15: Invalid 'fmt' value for date/time.
+error: failed to parse "/workspace/hyperdoc/test/conformance/reject/time_relative_fmt.hdoc": InvalidFile