More sophisticated flanking rules for math by notriddle · Pull Request #168 · jgm/commonmark-hs

notriddle · 2026-03-28T00:33:44Z

Fixes #167

With this change, $ that's preceded by word chars can't open, and followed by word chars can't close. This reduces "false positives" on existing markdown documents like this:

vim /etc/systemd/system/$UNIT.$TYPE
                        ^-----^ currently parsed as latex math
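
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the commonmark-hs implementation (which is written in Haskell). The helper names `is_word_char`, `can_open`, `can_close`, and `find_math_spans` are hypothetical, "word char" is assumed to mean a Unicode letter, digit, or underscore, and backslash escapes, `$$` display math, and any whitespace checks are deliberately left out.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: a "word char" is a Unicode letter, digit, or underscore.
    return ch.isalnum() or ch == "_"


def can_open(text: str, i: int) -> bool:
    # A `$` at index i may open a math span unless a word char precedes it.
    return i == 0 or not is_word_char(text[i - 1])


def can_close(text: str, i: int) -> bool:
    # A `$` at index i may close a math span unless a word char follows it.
    return i + 1 >= len(text) or not is_word_char(text[i + 1])


def find_math_spans(text: str) -> list:
    """Return the contents of inline math spans under the sketched rule."""
    spans = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == "$" and can_open(text, i):
            # Scan forward for the first `$` that is allowed to close.
            j = i + 1
            while j < n and not (text[j] == "$" and can_close(text, j)):
                j += 1
            if j < n and j > i + 1:  # found a closer around non-empty content
                spans.append(text[i + 1:j])
                i = j + 1
                continue
        i += 1
    return spans


if __name__ == "__main__":
    print(find_math_spans("vim /etc/systemd/system/$UNIT.$TYPE"))
    print(find_math_spans("the value $x + y$ is"))
```

Under these assumptions the shell example yields no math span, because the second `$` is followed by `T` and so can't close, while `$x + y$` surrounded by spaces is still recognized.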

I've checked a few other commonmark-based markdown flavors that accept LaTeX math, to see what they did. The rule I added is supposed to be the same as the one used in VSCode's markdown-it plugin and Discourse's math plugin:

https://github.com/microsoft/vscode-markdown-it-katex/blob/efd01d8/src/index.ts#L22-L32
https://github.com/discourse/discourse/blob/41f62aa/plugins/discourse-math/assets/javascripts/lib/discourse-markdown/discourse-math.js#L46

Full disclosure: I know of a few implementations that don't do this:

https://github.com/kivikakk/comrak/blob/0d4a9ca/src/parser/inlines.rs#L2104 (narrow special case for digits after the closer)
https://github.com/classeur/markdown-it-mathjax/blob/c1e34d4/markdown-it-mathjax.js#L80 (same narrow special case for digits)
https://github.com/jupyterlab/jupyterlab/blob/61a2db4/packages/rendermime/src/latex.ts#L17 (very permissive; even 1 $ 2 $ 3 thinks the 2 is in a math span)

jgm · 2026-03-28T13:51:08Z

Probably this is a good idea, but I do have one concern. This change would introduce an expressive blind spot that we don't have now. Currently you can express things like $\pi$ie, and after this change you wouldn't be able to express them. On the other hand, this change doesn't give any new expressive possibilities (it just allows you to omit backslashes in expressing them).

Arguably, we don't need to express $\pi$ie. But are we confident that there are no similar cases that are useful but will become impossible to express? I have myself sometimes wanted to add subscripts or superscripts to regular words, e.g. word$_\alpha$. Granted, we have a syntax for subscript that could be used in this case. But are we confident that we wouldn't be ruling out any legitimate uses?

notriddle · 2026-03-28T23:06:38Z

Most cases seem like they could be solved with \text. So $\pi$ie becomes $\pi\text{ie}$. I suppose this means there are technically no expressiveness limitations for plain adjacent text.
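
To make the \text workaround concrete, here is a small LaTeX sketch covering the two cases raised in this thread. It assumes a renderer that supports \text (in a LaTeX document that means loading amsmath), and the rewrite of the subscript example is only one possible way to apply the suggestion.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text
\begin{document}

% Before: $\pi$ie  (the closing $ is followed by a word character)
% After: the trailing letters move inside the math span.
$\pi\text{ie}$

% Before: word$_\alpha$  (the opening $ is preceded by a word character)
% After: the word moves inside the math span.
$\text{word}_\alpha$

\end{document}
```

In both rewrites the `$` delimiters no longer touch a word character on the outside, so the span would still open and close under the proposed rule.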

The other major problem is CJK writing, which doesn't use spaces between words. We should probably follow the same rules for it that are used for emphasis.

jgm · 2026-03-29T00:27:00Z

CJK is a serious concern, thanks for bringing that up. This change could break many existing CJK documents containing math, unless we implemented special rules (which are still not implemented for emphasis).
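
The CJK concern can be shown with a short Python sketch. It assumes that "word char" is decided by Unicode alphanumeric classification (as Python's `str.isalnum` does), which may differ from what the PR actually uses. The helper `dollar_flanking` and the sample sentences are made up for illustration.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: Unicode letters, digits, and underscore count as word chars.
    # CJK ideographs and kana are Unicode letters, so they qualify here.
    return ch.isalnum() or ch == "_"


def dollar_flanking(text: str, i: int):
    """Return (can_open, can_close) for the `$` at index i."""
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    can_open = not (before != "" and is_word_char(before))
    can_close = not (after != "" and is_word_char(after))
    return can_open, can_close


if __name__ == "__main__":
    english = "mass $m$ here"   # spaces separate the math from the words
    japanese = "質量は$m$です"    # no spaces around the math
    for sample in (english, japanese):
        first, last = sample.index("$"), sample.rindex("$")
        print(sample, dollar_flanking(sample, first), dollar_flanking(sample, last))
```

With spaces around the math, the first `$` can open and the last `$` can close. In the unspaced Japanese sentence neither delimiter qualifies, which is the kind of breakage the special rules would have to prevent.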

jgm · 2026-03-29T00:28:07Z

Regarding \text: yes, you could work around the expressivity issue that way, but existing docs may still break.

notriddle · 2026-03-29T13:30:16Z

> but existing docs may still break

In practice, I occasionally see evidence of markdown that’s written for a different engine than the one that’s actually been used for publishing. Either the author never previewed it at all, or they previewed it in a writing environment like GitHub or VSCode that has its own renderer. Either way, the number of documents you’d break can, surprisingly, be outnumbered by the already-broken docs that you’d fix.

jgm · 2026-03-29T14:21:14Z

> Either way, the number of documents you’d break can, surprisingly, be outnumbered by the already-broken docs that you’d fix.

It's possible. But I do feel worse about breaking documents by authors who care enough to check that their document renders properly (possibly including myself).

notriddle · 2026-03-29T16:12:05Z

That scenario only happens if the writer previewed the document using commonmark-hs, and didn't also preview the document in something else. How widespread is that, relatively speaking?

A quick search for "pandoc -f gfm" README.md unearths 135 results for document pipelines that convert GFM-formatted README files into other formats. Whether they're checking pandoc's output or not, they definitely also want it to render correctly in GitHub's web interface, which means we're safe as long as we do what it does.

That's the lower bound. The upper bound is just "pandoc -f gfm", which gives 840 results. I don't know what fraction of those document pipelines are actually being used with GitHub, but it's probably more than 135. ¹

On the other hand, "pandoc -f commonmark" only gives 158 results, and "pandoc -f commonmark_x" 128. The number of people using commonmark-hs in "non-GitHub-flavored" mode is, roughly, the same order of magnitude as the number of people using "gfm" specifically to process their README files.

It's a lot harder for me to know what's going on outside of the public eye, but I do have reasons to think that documents targeted at pandoc specifically will tend to use pandoc-flavored markdown instead of commonmark. First of all, because there are features in pandoc-flavored markdown that commonmark-hs doesn't have (citations, grid tables, ...). And, second, because of the more popular name.

¹ In any case, people who assume pandoc's "gfm" mode will parse their document the same way GitHub does aren't really doing anything wrong.

jgm · 2026-03-29T16:39:36Z

That may be true.

notriddle wants to merge 1 commit into jgm:master from notriddle:math-spans-preceding-following
