Merged
4 changes: 3 additions & 1 deletion AGENTS.md
@@ -14,6 +14,8 @@
- Treat `docs/specification.md` as the authoritative source of behavior; examples may be outdated or incorrect.
- If the spec is unclear or conflicts with code/tests, ask before changing behavior.
- Do not implement "just make it work" fallbacks that alter semantics to satisfy examples.
- Diagnostics must not store dynamic strings (e.g., slices to parsed source). Keep diagnostic payloads POD/small and avoid holding arena-backed text.
- Do not hide crashes by removing safety checks or switching off DebugAllocator; fix the root cause instead. A signal 6 (SIGABRT) from DebugAllocator indicates memory corruption or a similar misuse.

## Zig Programming Style

@@ -25,4 +27,4 @@
- If you add a `hdoc` file to `test/snapshot`, also:
  - Generate the corresponding HTML and YAML files.
  - Add the file to the `snapshot_files` global in `build.zig`.
- If you change behaviour, the snapshot tests will fail. Validate the failure against your expectations and see if you broke something unexpected.
4 changes: 4 additions & 0 deletions SPEC_TODO.md
@@ -1,2 +1,6 @@
# Spec compliance TODOs

- Title/header interplay lacks the required comparison.
  - Expect: When both `hdoc(title=...)` and `title { ... }` are present, their plaintext forms are compared and a redundancy hint is emitted if they match (§8.1).
  - Actual: The block title is used and the header title is ignored without any comparison or diagnostics.
  - Proposed: Compare the plaintext values, warn when redundant, and keep emitting hints when neither title form is present.
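
The proposed behaviour amounts to a small decision table. The following is a minimal sketch in Zig, assuming both titles have already been reduced to plaintext; `TitleHint` and `titleHint` are hypothetical names for illustration and are not taken from the code base.

```zig
const std = @import("std");

/// Hypothetical hint kinds; the project's real diagnostic codes are not shown in this diff.
const TitleHint = enum { none, redundant, missing };

/// Decides which hint (if any) applies, given the plaintext of
/// `hdoc(title=...)` and of the `title { ... }` block. `null` means "absent".
fn titleHint(header_title: ?[]const u8, block_title: ?[]const u8) TitleHint {
    if (header_title) |header| {
        if (block_title) |block| {
            // Both forms present: compare plaintext byte-for-byte.
            return if (std.mem.eql(u8, header, block)) .redundant else .none;
        }
        return .none;
    }
    // No header title: only hint when the block title is missing as well.
    return if (block_title == null) .missing else .none;
}
```

Whether the comparison should be byte-exact or apply further normalization first is left open here; the sketch assumes byte equality.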
49 changes: 49 additions & 0 deletions build.zig
@@ -10,6 +10,21 @@ const snapshot_files: []const []const u8 = &.{
    "test/snapshot/footnotes.hdoc",
};

const conformance_accept_files: []const []const u8 = &.{
    "test/conformance/accept/inline_escape.hdoc",
    "test/conformance/accept/title_header_redundant.hdoc",
};

const conformance_reject_files: []const []const u8 = &.{
    "test/conformance/reject/string_cr_escape.hdoc",
    "test/conformance/reject/inline_identifier_dash.hdoc",
    "test/conformance/reject/heading_sequence.hdoc",
    "test/conformance/reject/nested_top_level.hdoc",
    "test/conformance/reject/container_children.hdoc",
    "test/conformance/reject/time_relative_fmt.hdoc",
    "test/conformance/reject/ref_in_heading.hdoc",
};

pub fn build(b: *std.Build) void {
    // Options:
    const target = b.standardTargetOptions(.{});
@@ -74,6 +89,40 @@ pub fn build(b: *std.Build) void {
}
}

    // Conformance snapshots: accept cases (YAML only):
    for (conformance_accept_files) |path| {
        std.debug.assert(std.mem.endsWith(u8, path, ".hdoc"));
        const yaml_file = b.fmt("{s}.yaml", .{path[0 .. path.len - 5]});

        const test_run = b.addRunArtifact(exe);
        test_run.addArgs(&.{ "--format", "yaml" });
        test_run.addFileArg(b.path(path));
        const generated_file = test_run.captureStdOut();

        const compare_run = b.addRunArtifact(snapshot_diff);
        compare_run.addFileArg(b.path(yaml_file));
        compare_run.addFileArg(generated_file);

        test_step.dependOn(&compare_run.step);
    }

    // Conformance snapshots: reject cases (diagnostics on stderr, expect exit code 1):
    for (conformance_reject_files) |path| {
        std.debug.assert(std.mem.endsWith(u8, path, ".hdoc"));
        const diag_file = b.fmt("{s}.diag", .{path[0 .. path.len - 5]});

        const test_run = b.addRunArtifact(exe);
        test_run.addFileArg(b.path(path));
        test_run.expectExitCode(1);
        const generated_diag = test_run.captureStdErr();

        const compare_run = b.addRunArtifact(snapshot_diff);
        compare_run.addFileArg(b.path(diag_file));
        compare_run.addFileArg(generated_diag);

        test_step.dependOn(&compare_run.step);
    }

    // Unit tests:
    const exe_tests = b.addTest(.{
        .root_module = b.createModule(.{
21 changes: 1 addition & 20 deletions docs/TODO.md
@@ -5,30 +5,11 @@
- Assign semantics to node types, paragraph kinds, ...
- Specify "syntax" proper
- Add links to RFCs where possible
- Verbatim-body to text conversion is under-specified. You define verbatim syntax (: with | lines) and later say verbatim bodies become inline text spans (§8.2), but you don’t precisely define how piped lines join (LF vs preserving original CRLF, whether there is a trailing newline, whether a final EOF line_terminator contributes a newline, etc.). Different implementations may diverge.
- Inline “groups” exist syntactically but are not given explicit semantics. The grammar includes inline_group ::= "{" , inline_content , "}" and §5.4 makes brace balancing a core rule, but §8.2 doesn’t explicitly state that groups are semantically transparent (flattened) versus affecting whitespace normalization boundaries or span merging.
- Span attribute semantics are referenced but not fully defined. §8.2 introduces spans with an “attribute set (e.g. emphasis/monospace/link…)” but the spec never fully defines the canonical attribute keys, nesting behavior (e.g., \em inside \mono), or how lang overrides interact at span level. That’s a major interoperability risk because renderers may differ even if parsers agree.
- Refine that `hdoc(title)` is metadata while `title{}` is rendered rich text
- Refine `img(path)` only using forward slash.
- Proposal: Add to §9.3.5:
- "path MUST use forward slashes (/) as path separators, regardless of host OS."
- "path MUST be relative; absolute paths and URI schemes (e.g., http://) MUST be rejected."
- "Path resolution is relative to the directory containing the HyperDoc source file."
- "Path traversal outside the source directory (e.g., ../../etc/passwd) SHOULD be rejected or restricted by implementations."
- Proposal: Add to §9.2.4:
- "Multiple toc elements MAY appear in a document; each MUST render the same heading structure but MAY appear at different locations."
- "If depth differs between instances, each TOC renders independently according to its own depth attribute."
- Add to §9.2.5:
- "Multiple footnotes elements partition footnote rendering; each instance collects only footnotes/citations accumulated since the previous dump (or document start)."
- Proposal: Add to §4:
- "Implementations MUST support nesting depths of at least 32 levels."
- "Implementations MAY reject documents exceeding this depth with a diagnostic."
- "Nesting depth is measured as the maximum distance from the document root to any leaf node."
- Ambiguity of Inline Unicode:
- Finding: String literals ("...") support \u{...} escapes (§7.2.1). Inline text streams (bodies of p, h1) do not (§6.1 only lists \\, \{, \}).
- Issue: Authors cannot enter invisible characters (like Non-Breaking Space U+00A0 or Zero Width Space U+200B) into a paragraph without pasting the raw invisible character, which is brittle and invisible in editors.
- Recommendation: Add explicit sequencing in §7 stating: "Escape decoding MUST occur during semantic validation, before inline text construction (§8.2) for inline-list bodies, and before attribute validation for attribute values."
- Recommendation: Add to §9.2.1: "If the document contains any \date, \time, or \datetime elements with fmt values other than iso, and hdoc(lang) is not specified, implementations SHOULD emit a diagnostic."
- Issue: "Lexical" implies only regex-level matching. It does not strictly forbid 2023-02-31. For a strict format, "Semantic" validity (Gregorian correctness) should be enforced to prevent invalid metadata.

## Potential Future Features
@@ -120,4 +101,4 @@ quote {
- `include(path="...")` is rejected for unbounded document content growth
- `code` is just `\mono(syntax="…")`
- `details/summary` is just HTML with dynamic changing page layout, ever tried printing this?
- `\math`, `equation{…}` have too high implementation complexity and have high requirements on fonts, font renderers and layout engines.
23 changes: 23 additions & 0 deletions docs/specification.md
@@ -206,6 +206,12 @@ The grammar is intentionally ambiguous; a deterministic external rule selects a
- Attribute values are **string literals** (see §5.5).
- Attribute keys are identifiers with hyphen-separated segments (see §5.1 and §10.1).

### 4.4 Nesting depth (syntax)

- Implementations **MUST** support nesting depths of at least 32 levels.
- Implementations **MAY** reject documents that exceed this depth with a diagnostic.
- Nesting depth is measured as the maximum distance from the document root to any leaf node.
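
As an illustration of the measurement rule above, here is a minimal sketch in Zig. It assumes a hypothetical `Node` type that holds only a slice of children; the parser's real node type, and the names `nestingDepth` and `withinDepthLimit`, are not part of this change.

```zig
const std = @import("std");

/// Hypothetical tree node; the project's real node type is not shown here.
const Node = struct {
    children: []const Node,
};

/// Smallest limit an implementation may enforce under §4.4.
const min_supported_depth: usize = 32;

/// Maximum distance, counted in edges, from `node` down to any leaf.
fn nestingDepth(node: Node) usize {
    var deepest: usize = 0;
    for (node.children) |child| {
        deepest = @max(deepest, 1 + nestingDepth(child));
    }
    return deepest;
}

/// True when the tree fits within `limit`; `limit` is assumed to be >= 32.
fn withinDepthLimit(root: Node, limit: usize) bool {
    std.debug.assert(limit >= min_supported_depth);
    return nestingDepth(root) <= limit;
}
```

A recursive walk like this can itself exhaust the stack on hostile input, so a real parser would more likely track depth with a counter while descending and emit the diagnostic as soon as the limit is exceeded.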

## 5. Grammar and additional syntax rules

### 5.1 Grammar (EBNF)
@@ -351,6 +357,8 @@ Tooling that aims to preserve author intent **SHOULD** preserve whether braces w

Escape sequences are recognized only in string literals (node bodies of the `"..."` form and attribute values). No other syntax performs string-literal escape decoding.

Escape decoding **MUST** occur during semantic validation, before inline text construction (§8.2) for inline-list bodies, and before attribute validation for attribute values.

### 7.1 Control character policy (semantic)

- A semantic validator **MAY** reject TAB (U+0009) in source text.
@@ -432,6 +440,8 @@ Semantic processing **MUST** construct inline text as a sequence of **spans**, w
- a Unicode string, and
- an attribute set (e.g. emphasis/monospace/link, language overrides, etc.).

Inline groups are structural only: when converting the inline tree into spans, implementations **MUST** flatten `inline_group` boundaries. An `inline_group` **MUST NOT** create a span boundary and **MUST NOT** affect whitespace normalization, but it **MUST** contribute the literal `{` and `}` characters to the inline text at its start and end.

Processing rules:

1. **Parse → tree:** Parsing preserves `ws` and yields an inline tree (text items, inline nodes, and inline groups).
@@ -586,6 +596,9 @@ The elements in this chapter **MUST** appear only as top-level block elements (d
- `date` (optional): datetime lexical format (§10.2.3)
- `tz` (optional): default timezone for time/datetime values (§10.2)

Diagnostics:
- If the document contains any `\date`, `\time`, or `\datetime` elements with `fmt` values other than `iso`, and `hdoc(lang)` is not specified, implementations **SHOULD** emit a diagnostic.

#### 9.2.2 `title` (document title)

- **Role:** document-level display title
@@ -624,6 +637,8 @@ Heading structure and numbering:

Semantic constraints:
- `toc` **MUST** be a top-level block element (a direct child of the document).
- Multiple `toc` elements **MAY** appear in a document; each **MUST** render the same heading structure but **MAY** appear at different locations.
- If `depth` differs between instances, each `toc` **MUST** render independently according to its own `depth` attribute.

#### 9.2.5 Footnote dump: `footnotes`

@@ -635,6 +650,7 @@ Semantic constraints:

Semantics:

- Multiple `footnotes` elements **MAY** appear in a document.
- `footnotes;` collects and renders all footnotes of all kinds accumulated since the previous `footnotes(...)` node (or since start of document if none appeared yet).
- `footnotes(kind="footnote");` collects and renders only `kind="footnote"` entries accumulated since the previous `footnotes(...)` node.
- `footnotes(kind="citation");` collects and renders only `kind="citation"` entries accumulated since the previous `footnotes(...)` node.
@@ -686,6 +702,13 @@ Only an empty body (`;`) is not "inline text".
- `lang` (optional)
- `id` (optional; top-level only)

Path semantics:

- `path` **MUST** use forward slashes (`/`) as path separators, regardless of host operating system.
- `path` **MUST** be relative; absolute paths and URI schemes **MUST** be rejected.
- Path resolution is relative to the directory containing the HyperDoc source file.
- Path traversal outside the source directory (e.g., `../../etc/passwd`) **SHOULD** be rejected or restricted by implementations.
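
The four rules above can be checked lexically, without touching the file system. The following is a minimal sketch in Zig under stated assumptions: `validateImgPath`, `hasUriScheme`, and the `PathError` names are hypothetical, a "URI scheme" is detected using the RFC 3986 scheme syntax (which also catches Windows drive prefixes such as `C:`), and traversal is treated as a hard error even though the spec only says **SHOULD**.

```zig
const std = @import("std");

/// Hypothetical error set for this sketch; not the project's diagnostic codes.
const PathError = error{ Backslash, Absolute, UriScheme, EscapesSourceDir };

/// True if `path` starts with an RFC 3986 scheme followed by ':'
/// (ALPHA, then any of ALPHA / DIGIT / "+" / "-" / ".").
fn hasUriScheme(path: []const u8) bool {
    if (path.len == 0 or !std.ascii.isAlphabetic(path[0])) return false;
    for (path[1..]) |c| {
        if (c == ':') return true;
        if (!std.ascii.isAlphanumeric(c) and c != '+' and c != '-' and c != '.') return false;
    }
    return false;
}

/// Lexical validation of an `img(path)` value; does not access the file system.
fn validateImgPath(path: []const u8) PathError!void {
    if (std.mem.indexOfScalar(u8, path, '\\') != null) return error.Backslash;
    if (std.mem.startsWith(u8, path, "/")) return error.Absolute;
    if (hasUriScheme(path)) return error.UriScheme;

    // Track how far below the source directory we are; ".." may never go above it.
    var depth: usize = 0;
    var segments = std.mem.splitScalar(u8, path, '/');
    while (segments.next()) |segment| {
        if (std.mem.eql(u8, segment, "..")) {
            if (depth == 0) return error.EscapesSourceDir;
            depth -= 1;
        } else if (segment.len != 0 and !std.mem.eql(u8, segment, ".")) {
            depth += 1;
        }
    }
}
```

With this sketch, `images/a.png` and `a/../b.png` pass, while `img\a.png`, `/etc/passwd`, `http://example.com/a.png`, and `../../etc/passwd` are rejected. A lexical check cannot see symbolic links, so an implementation that needs a hard guarantee would still have to verify the resolved path.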

#### 9.3.6 Preformatted: `pre`

- **Body:** inline text