Group orphan raw-HTML opener/closer blocks around their markdown content #16
CommonMark "type 6" HTML blocks end at the next blank line, so markdown
like

    <details>

    **bold** content

    </details>

reaches the renderer as three blocks: an opener `RawBlock` carrying
`"<details>\n"`, a `Para`, and a closer `RawBlock` with `"</details>\n"`.
The current rawNode wraps each raw blob in its own `<rawhtml>` element
to keep xmlhtml from mangling the bytes; the side effect is that the
`<details>` open and close tags get trapped inside their wrappers and
the markdown paragraph ends up a sibling of the (empty) details element
rather than its child.
New `Heist.Extra.Splices.Pandoc.RawHtmlGroup.groupRawHtmlBlocks` pass
walks the AST (via `Text.Pandoc.Walk`, so nested block lists are covered
too) and rewrites those orphan triplets into a `B.Div` carrying the tag
in its `"tag"` attribute. The `Div` arm of `rpBlock'` already turns that
into the named element. It now also strips the `"tag"` directive before
serialising attributes so the override doesn't leak as a literal
`tag="…"` attribute on the rendered element.
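
The grouping step described above can be sketched as follows. This is a minimal illustration, not the code in `RawHtmlGroup`: the `Block` type is a cut-down stand-in for pandoc's, `classify` only recognises bare `<name>` / `</name>` blocks, and `Tag`, `classify`, `groupOrphanTags`, and `splitAtCloser` are hypothetical names. It also assumes the span between opener and closer is regrouped recursively.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isAlphaNum)
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in for pandoc's Attr: (identifier, classes, key-value pairs).
type Attr = (Text, [Text], [(Text, Text)])

-- Cut-down stand-in for pandoc's Block (assumption: Para holds plain Text).
data Block
  = Para Text
  | RawBlock Text Text -- format, raw contents
  | Div Attr [Block]
  deriving (Eq, Show)

data Tag = Open Text | Close Text

-- Deliberately naive classifier: accepts only a bare <name> or </name>.
classify :: Block -> Maybe Tag
classify (RawBlock "html" raw)
  | Just name <- bare "</" = Just (Close name)
  | Just name <- bare "<" = Just (Open name)
  where
    bare prefix = do
      inner <- T.stripPrefix prefix (T.strip raw) >>= T.stripSuffix ">"
      if not (T.null inner) && T.all isNameChar inner
        then Just (T.toLower inner)
        else Nothing
    isNameChar c = isAlphaNum c || c == '-'
classify _ = Nothing

-- Rewrite opener .. closer spans into a Div tagged with the element name.
-- An opener without a matching closer is left untouched.
groupOrphanTags :: [Block] -> [Block]
groupOrphanTags [] = []
groupOrphanTags (b : rest)
  | Just (Open name) <- classify b
  , Just (inner, after) <- splitAtCloser name rest =
      Div ("", [], [("tag", name)]) (groupOrphanTags inner)
        : groupOrphanTags after
  | otherwise = b : groupOrphanTags rest

-- Find the matching closer; depth counts only tags with the same name.
splitAtCloser :: Text -> [Block] -> Maybe ([Block], [Block])
splitAtCloser name = go (0 :: Int) []
  where
    go _ _ [] = Nothing
    go depth acc (b : bs) = case classify b of
      Just (Close n)
        | n == name ->
            if depth == 0
              then Just (reverse acc, bs)
              else go (depth - 1) (b : acc) bs
      Just (Open n) | n == name -> go (depth + 1) (b : acc) bs
      _ -> go depth (b : acc) bs
```

Loaded in GHCi, `groupOrphanTags` turns the opener, paragraph, closer triplet into a single `Div` whose only child is the paragraph, and leaves an opener with no closer as it was.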
Tests: 12 unit tests pin the parser and grouping behaviour (issue
example, empty group, orphan open/close, consecutive pairs, same-tag
nesting via depth counting, self-closing rejection, balanced-in-one-block
rejection, case-insensitive matching, attribute tolerance, hyphenated
custom-element names, mismatched-tag orphan), plus an end-to-end
integration test through `renderPandocWith` that asserts the markdown
paragraph lands inside `<details>` with no `<rawhtml>` wrapper.
Closes srid/emanote#433.
…ag (no '>') doesn't silently match
The "tag" attribute on a B.Div is a directive used by groupRawHtmlBlocks (in RawHtmlGroup) to override the rendered element name. The directive's key was scattered as a string literal across three sites: the producer in RawHtmlGroup and two consumers (getTag, dropTagAttr) buried in a where-clause inside rpBlock'.

Promote the protocol to first-class names in Render.Internal:

- tagDirectiveKey :: Text, the single source of truth for the key.
- divTag :: Text -> B.Attr -> Text, formerly the local getTag.
- stripTagDirective :: B.Attr -> B.Attr, formerly the local dropTagAttr.

RawHtmlGroup now imports tagDirectiveKey when constructing the Div, and the Div arm of rpBlock' calls divTag/stripTagDirective directly. The where-clause in rpBlock' loses two helpers and Map import; everything else is unchanged.

Addresses Hickey #2 (named protocol over implicit convention) and Lowy #2/#4 (extract tagDirectiveKey + named helper for dropTagAttr).
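
A minimal sketch of what the three named helpers might look like at this commit, using the signatures quoted above. The bodies are assumptions rather than the PR's code, and `Attr` is a local stand-in for `B.Attr` so the snippet loads without pandoc-types.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import Data.Text (Text)

-- Stand-in for B.Attr: (identifier, classes, key-value pairs).
type Attr = (Text, [Text], [(Text, Text)])

-- | The one place the directive key is spelled out.
tagDirectiveKey :: Text
tagDirectiveKey = "tag"

-- | Element name for a Div: the directive's value if present,
-- otherwise the caller-supplied default.
divTag :: Text -> Attr -> Text
divTag defaultTag (_, _, kvs) =
  fromMaybe defaultTag (Map.lookup tagDirectiveKey (Map.fromList kvs))

-- | Drop the directive so it is not serialised as a literal attribute.
stripTagDirective :: Attr -> Attr
stripTagDirective (ident, classes, kvs) =
  (ident, classes, filter ((/= tagDirectiveKey) . fst) kvs)
```

With the key behind one name, the producer and both consumers cannot drift apart through a typo in a string literal.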
openerTag and closerTag both stripped a prefix, parsed a tag-name span, and verified that nothing but whitespace followed the closing '>'. The opener has one extra check (reject self-closing); other than that the two parsers were the same shape. Extract the shared work into one parseTagAfterPrefix and let openerTag layer the void-element rejection on top.
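
One way the shared parser and its two thin wrappers could be shaped is sketched below. Only the names `parseTagAfterPrefix`, `openerTag`, and `closerTag` come from the commit message; the signatures and bodies are assumptions. The sketch rejects self-closing syntax only (it has no void-element table) and does not handle a `>` inside a quoted attribute value.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (guard)
import Data.Char (isAlphaNum, isAscii, isAsciiLower, isAsciiUpper, isSpace)
import Data.Text (Text)
import qualified Data.Text as T

-- | Strip the prefix, read a tag name, and require that only whitespace
-- follows the closing '>'. Returns the lowercased name and whatever sat
-- between the name and '>' (attributes, or a self-closing slash).
parseTagAfterPrefix :: Text -> Text -> Maybe (Text, Text)
parseTagAfterPrefix prefix raw = do
  rest <- T.stripPrefix prefix (T.stripStart raw)
  let (name, afterName) = T.span isNameChar rest
  (firstChar, _) <- T.uncons name
  guard (isAsciiLower firstChar || isAsciiUpper firstChar)
  let (inside, fromClose) = T.breakOn ">" afterName
  trailing <- T.stripPrefix ">" fromClose -- Nothing when '>' is missing
  guard (T.all isSpace trailing)
  pure (T.toLower name, inside)
  where
    isNameChar c = (isAscii c && isAlphaNum c) || c == '-'

-- | Opener: shared parse plus rejection of self-closing tags like <br />.
openerTag :: Text -> Maybe Text
openerTag raw = do
  (name, inside) <- parseTagAfterPrefix "<" raw
  guard (not ("/" `T.isSuffixOf` T.stripEnd inside))
  pure name

-- | Closer: shared parse; nothing but whitespace may precede '>'.
closerTag :: Text -> Maybe Text
closerTag raw = do
  (name, inside) <- parseTagAfterPrefix "</" raw
  guard (T.all isSpace inside)
  pure name
```

Because the trailing-whitespace check lives in the shared function, a block such as `<span>x</span>` fails as an opener without any extra rule, and a closer missing its `>` fails at the `stripPrefix` step.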
…unused defaultTag

The tag-directive scheme (key + resolver + stripper) was sitting in Render.Internal, a module whose docstring scopes it to "pure helpers extracted from Render.hs", i.e. table rendering. The producer of the directive is RawHtmlGroup, and the volatility lives there: any future change to the wire format starts at the module that decides what shape to emit.

Move the three helpers to RawHtmlGroup (the producer) and have Render import from there. Render.Internal is back to its original table-helpers scope.

While moving, also drop the unused defaultTag parameter on divTag: every call site passes "div", so bake it in. Switch divTag from Map.fromList+Map.lookup to plain Data.List.lookup, since attr lists are flat assoc lists with no duplicates in practice (no behaviour change for real input). This saves a Map allocation per Div on the rendering hot path.
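
The "no behaviour change for real input" caveat can be made concrete with a small comparison. The two functions below are hypothetical reconstructions of the before and after shapes of `divTag` (with `"div"` baked in), and `Attr` is a local stand-in for `B.Attr`. They agree whenever the key appears at most once; with duplicate keys, `Map.fromList` keeps the last value while `lookup` returns the first.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import Data.Text (Text)

-- Stand-in for B.Attr: (identifier, classes, key-value pairs).
type Attr = (Text, [Text], [(Text, Text)])

tagDirectiveKey :: Text
tagDirectiveKey = "tag"

-- Earlier shape: build a Map for every lookup.
divTagViaMap :: Attr -> Text
divTagViaMap (_, _, kvs) =
  fromMaybe "div" (Map.lookup tagDirectiveKey (Map.fromList kvs))

-- Later shape: scan the association list directly.
divTagViaList :: Attr -> Text
divTagViaList (_, _, kvs) = fromMaybe "div" (lookup tagDirectiveKey kvs)

-- Typical input: the key appears once, so both agree.
noDuplicates :: Attr
noDuplicates = ("", [], [("open", ""), ("tag", "details")])

-- Contrived input: the key appears twice, so the two diverge.
withDuplicates :: Attr
withDuplicates = ("", [], [("tag", "details"), ("tag", "section")])
```

On `noDuplicates` both return `"details"`. On `withDuplicates` the Map version yields `"section"` and the list version yields `"details"`, which is the only input shape where the switch is observable.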
- RenderSpec: comment said 'getTag' but the helper was renamed to divTag.
- RawHtmlGroup module docstring: trim the Public-surface paragraph from three sentences to one; the rest narrated cabal config that is one grep away.
- RawHtmlGroupSpec: drop the 'Block helpers' header comment that narrated what the next three lines obviously are.
Hickey/Lowy Analysis
Hickey rationale: Five findings on the diff. The one real correctness item …
Lowy rationale: Five findings on the boundary. The boundary itself passes the "almost expendable" test: if Pandoc stops splitting orphan-HTML blocks, the entire … The one "No-op" item …
…686)

**Markdown content between two blank-line-separated raw HTML tags now nests inside the surrounding element instead of escaping to be its sibling.**

Pandoc emits CommonMark "type 6" HTML blocks like `<details>` … markdown … `</details>` as three separate AST blocks (opener `RawBlock`, `Para`, closer `RawBlock`); without grouping, each raw blob ends up wrapped in its own element and the markdown paragraph drifts outside.

The actual fix is upstream in [srid/heist-extra#16](srid/heist-extra#16), which adds a `groupRawHtmlBlocks` AST-preprocess pass that rewrites those orphan triplets into a `B.Div` carrying the tag in a directive attribute; the existing `Div` renderer turns it into the named element with the markdown content as a real DOM child. This PR pins emanote to that branch and notes the fix under the `Bug fixes` section of the unreleased changelog.

> Closes #433.

### Try it locally

```sh
nix run github:srid/emanote/empty-group
```

_Generated by [`/do`](https://github.com/srid/agency) on Claude Code (model `claude-opus-4-7`)._
Brings in #16 (group orphan raw-HTML opener/closer blocks) so this PR sits on top of latest master.

# Conflicts:
#   CHANGELOG.md
#   heist-extra.cabal
#   test/Spec.hs
Pandoc splits CommonMark "type 6" raw-HTML blocks at the next blank line, so markdown like a `<details>` line, a paragraph, and a `</details>` line, each separated by a blank line, reaches the renderer as three separate blocks: an opener `RawBlock` carrying `"<details>\n"`, a `Para`, and a closer `RawBlock` carrying `"</details>\n"`. Today's `rawNode` wraps each raw blob in its own `<rawhtml>` element to keep xmlhtml's parser from mangling the bytes; the side effect is that the `<details>` open and close tags get trapped inside their wrappers, so the markdown paragraph ends up a sibling of the (now empty) details element rather than its child. See srid/emanote#433.

This PR adds a new `Heist.Extra.Splices.Pandoc.RawHtmlGroup` module exporting one function, `groupRawHtmlBlocks`. It walks a block list and, when it sees an unbalanced opening tag followed downstream by a matching closing tag (depth counted only against opens of the same tag name), replaces that span with a `B.Div` carrying the tag in a `"tag"` directive attribute. The renderer's `Div` arm already turns a `Div` with a `"tag"` attr into the named element via `divTag`, and now also calls `stripTagDirective` so the directive doesn't survive into serialised HTML as a literal `tag="…"`. `renderPandocWith` applies the pass via `Text.Pandoc.Walk.walk`, so nested block lists (`BlockQuote`, list items, etc.) are covered too.

### How the AST changes
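
As an illustration of the before and after shapes, and of why walking matters for nested block lists, here is a small sketch. The `Block` type is a simplified stand-in for pandoc's (a real `Para` holds inlines, not text), `walkBlockLists` is a hand-rolled substitute for `Text.Pandoc.Walk.walk`, and `toyPass` is a hypothetical pass that only knows this one rewrite.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Stand-in for B.Attr: (identifier, classes, key-value pairs).
type Attr = (Text, [Text], [(Text, Text)])

-- Simplified stand-in for pandoc's Block.
data Block
  = Para Text
  | RawBlock Text Text -- format, raw contents
  | Div Attr [Block]
  | BlockQuote [Block]
  deriving (Eq, Show)

-- What reaches the renderer today: three sibling blocks.
before :: [Block]
before =
  [ RawBlock "html" "<details>\n"
  , Para "bold content"
  , RawBlock "html" "</details>\n"
  ]

-- What the grouping pass is described as producing: one tagged Div.
after :: [Block]
after = [Div ("", [], [("tag", "details")]) [Para "bold content"]]

-- Apply a list-level pass to every nested block list, innermost first.
walkBlockLists :: ([Block] -> [Block]) -> [Block] -> [Block]
walkBlockLists pass = pass . map descend
  where
    descend (BlockQuote bs) = BlockQuote (walkBlockLists pass bs)
    descend (Div attr bs) = Div attr (walkBlockLists pass bs)
    descend b = b

-- Toy pass standing in for the real grouping logic.
toyPass :: [Block] -> [Block]
toyPass bs
  | bs == before = after
  | otherwise = bs
```

Applying `toyPass` only at the top level would miss a triplet sitting inside a `BlockQuote`; `walkBlockLists toyPass [BlockQuote before]` reaches it and yields `[BlockQuote after]`.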
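### Decisions worth surfacing

- Attributes on the opener (e.g. `<details open>`): tolerated when matching, but not carried over to the `Div`; deliberate scope until a real case demands otherwise.
- Self-closing tags (e.g. `<br />`): not treated as openers.
- Tags balanced within one block (e.g. `<span>x</span>`): not treated as openers.

### Where the volatility lives

`RawHtmlGroup` owns the directive scheme end-to-end: `tagDirectiveKey`, `divTag`, and `stripTagDirective` all live there, and `Render` imports them as a consumer. Future evolution (alternative wrapping strategies, attribute preservation, void-element awareness, smarter nesting heuristics) starts in `RawHtmlGroup` so the renderer's interface stays stable.

### Test coverage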
13 unit tests pin the AST transformation: the issue's exact example, empty-group case, orphan opener/closer, consecutive pairs, same-tag nesting via depth counting, self-closing rejection, balanced-in-one-block rejection, case-insensitive matching, attribute tolerance, hyphenated custom-element names, mismatched-tag orphan, and the malformed-closer (missing `>`) regression. One end-to-end integration test through `renderPandocWith` asserts the markdown paragraph lands inside `<details>` with no `<rawhtml>` wrapper. Plus a regression test pinning that the `tag` directive doesn't leak as a literal HTML attribute.

_Generated by `/do` on Claude Code (model `claude-opus-4-7`)._