More sophisticated flanking rules for math #168

Open
notriddle wants to merge 1 commit into jgm:master from notriddle:math-spans-preceding-following

@notriddle
Contributor

Fixes #167

With this change, `$` that's preceded by word chars can't open,
and followed by word chars can't close. This reduces "false positives"
on existing markdown documents like this:

    vim /etc/systemd/system/$UNIT.$TYPE
                            ^-----^ currently parsed as latex math

I've checked a few other commonmark-based markdown flavors that accept
LaTeX math, to see what they did. The rule I added is supposed to be the
same as the one used in VSCode's markdown-it plugin and Discourse's math plugin:

- https://github.com/microsoft/vscode-markdown-it-katex/blob/efd01d8/src/index.ts#L22-L32
- https://github.com/discourse/discourse/blob/41f62aa/plugins/discourse-math/assets/javascripts/lib/discourse-markdown/discourse-math.js#L46

Full disclosure: I know of a few implementations that don't do this:

- https://github.com/kivikakk/comrak/blob/0d4a9ca/src/parser/inlines.rs#L2104 (narrow special case for digits after the closer)
- https://github.com/classeur/markdown-it-mathjax/blob/c1e34d4/markdown-it-mathjax.js#L80 (same narrow special case for digits)
- https://github.com/jupyterlab/jupyterlab/blob/61a2db4/packages/rendermime/src/latex.ts#L17 (very permissive; even `1 $ 2 $ 3` thinks the `2` is in a math span)
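
The rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR: the helper names (`is_word_char`, `can_open`, `can_close`, `find_math_spans`) are hypothetical, "word char" is assumed to mean a Unicode letter, digit, or underscore, and the sketch ignores backslash escapes, `$$` display math, and any other rules the real parser applies.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: "word char" means a Unicode letter, digit, or underscore.
    return ch.isalnum() or ch == "_"


def can_open(text: str, i: int) -> bool:
    """A `$` at index i may open math unless a word char precedes it."""
    return i == 0 or not is_word_char(text[i - 1])


def can_close(text: str, i: int) -> bool:
    """A `$` at index i may close math unless a word char follows it."""
    return i + 1 == len(text) or not is_word_char(text[i + 1])


def find_math_spans(text: str) -> list[str]:
    """Return the contents of each `$...$` span accepted by this sketch."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] == "$" and can_open(text, i):
            # Look for the nearest later `$` that is allowed to close.
            j = i + 1
            while j < len(text) and not (text[j] == "$" and can_close(text, j)):
                j += 1
            if i + 1 < j < len(text):
                spans.append(text[i + 1 : j])
                i = j + 1
                continue
        i += 1
    return spans


print(find_math_spans("vim /etc/systemd/system/$UNIT.$TYPE"))  # []
print(find_math_spans("the value $x+y$ is small"))             # ['x+y']
```

In the `vim` example the first `$` may open, but the only later `$` is followed by `T`, so it cannot close and no span is produced.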
@jgm
Owner

jgm commented Mar 28, 2026

Probably this is a good idea, but I do have one concern. This change would introduce an expressive blind spot that we don't have now. Currently you can express things like $\pi$ie, and after this change you wouldn't be able to express them. On the other hand, this change doesn't give any new expressive possibilities (it just allows you to omit backslashes in expressing them).

Arguably, we don't need to express $\pi$ie. But are we confident that there are no similar cases that are useful but will become impossible to express? I have myself sometimes wanted to add subscripts or superscripts to regular words, e.g. word$_\alpha$. Granted, we have a syntax for subscript that could be used in this case. But are we confident that we wouldn't be ruling out any legitimate uses?

@notriddle
Contributor Author

notriddle commented Mar 28, 2026

Most cases seem like they could be solved with \text. So $\pi$ie becomes $\pi\text{ie}$. I suppose this means there are technically no expressiveness limitations for plain adjacent text.
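
To make the workaround concrete, here is a small Python sketch that models the proposed rule as a regular expression and checks both spellings. The pattern and the `math_spans` helper are illustrative assumptions, not the PR's implementation, and "word char" is modeled with Python's Unicode-aware `\w`. The last case applies the same idea to the subscript example, which is an extrapolation rather than something stated in the thread.

```python
import re

# Assumption: "word char" is modeled with Python's Unicode-aware \w.
# The lookbehind blocks an opener preceded by a word char; the lookahead
# blocks a closer followed by one.
INLINE_MATH = re.compile(r"(?<!\w)\$(.+?)\$(?!\w)")


def math_spans(text: str) -> list[str]:
    """Return the contents of every `$...$` span this sketch accepts."""
    return INLINE_MATH.findall(text)


print(math_spans(r"$\pi$ie"))               # [] : closer is followed by "i"
print(math_spans(r"$\pi\text{ie}$"))        # ['\\pi\\text{ie}']
print(math_spans(r"word$_\alpha$"))         # [] : opener is preceded by "d"
print(math_spans(r"$\text{word}_\alpha$"))  # ['\\text{word}_\\alpha']
```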

The other major problem is CJK no-spaces writing. For that, we should probably follow the same rules that are used for emphasis.

@jgm
Owner

jgm commented Mar 29, 2026

CJK is a serious concern, thanks for bringing that up. This change could break many existing CJK documents containing math, unless we implemented special rules (which are still not implemented for emphasis).
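
A short Python sketch of why unspaced CJK text is affected. It assumes the rule treats any Unicode letter or digit as a "word char" (the thread does not pin this down), and the regular expressions are an illustrative model rather than the PR's code. The ASCII-only variant is a hypothetical contrast to show that the outcome depends on that definition; it is not something proposed here.

```python
import re

# Assumption: the rule treats any Unicode letter or digit as a "word char",
# which is what Python's \w does by default for str patterns.
UNICODE_RULE = re.compile(r"(?<!\w)\$(.+?)\$(?!\w)")
# Hypothetical contrast: the same rule with an ASCII-only \w.
ASCII_RULE = re.compile(r"(?<!\w)\$(.+?)\$(?!\w)", re.ASCII)

spaced = "the mass is $m$ here"
unspaced = "質量は$m$です"  # Japanese, no spaces around the math

print(UNICODE_RULE.findall(spaced))    # ['m']
print(UNICODE_RULE.findall(unspaced))  # [] : "は" counts as a word char
print(ASCII_RULE.findall(unspaced))    # ['m']
```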

@jgm
Owner

jgm commented Mar 29, 2026

Regarding \text: yes, you could work around the expressivity issue that way, but existing docs may still break.

@notriddle
Contributor Author

> but existing docs may still break

In practice, I occasionally see evidence of markdown that’s written for a different engine than the one that’s actually been used for publishing. Either the author never previewed it at all, or they previewed it in a writing environment like GitHub or VSCode that has its own renderer. Either way, the number of documents you’d break can, surprisingly, be outnumbered by the already-broken docs that you’d fix.

@jgm
Owner

jgm commented Mar 29, 2026

> Either way, the number of documents you’d break can, surprisingly, be outnumbered by the already-broken docs that you’d fix.

It's possible. But I do feel worse about breaking documents by authors who care enough to check that their document renders properly (possibly including myself).

@notriddle
Contributor Author

That scenario only happens if the writer previewed the document using commonmark-hs, and didn't also preview the document in something else. How widespread is that, relatively speaking?

A quick search for "pandoc -f gfm" README.md unearths 135 results for document pipelines that convert GFM-formatted README files into other formats. Whether they're checking pandoc's output or not, they definitely also want it to render correctly in GitHub's web interface, which means we're safe as long as we do what it does.

That's the lower bound. The upper bound is just "pandoc -f gfm", which gives 840 results. I don't know what fraction of those document pipelines are actually being used with GitHub, but it's probably more than 135. [1]

On the other hand, "pandoc -f commonmark" only gives 158 results, and "pandoc -f commonmark_x" 128. The number of people using commonmark-hs in "non-GitHub-flavored" mode is, roughly, the same order of magnitude as the number of people using "gfm" specifically to process their README files.

It's a lot harder for me to know what's going on outside of the public eye, but I do have reasons to think that documents targeted at pandoc specifically will tend to use pandoc-flavored markdown instead of commonmark. First, because there are features in pandoc-flavored markdown that commonmark-hs doesn't have (citations, grid tables, ...). And, second, because of the more popular name.

Footnotes

  1. In any case, people who assume pandoc's "gfm" mode will parse their document the same way GitHub does aren't really doing anything wrong.

@jgm
Owner

jgm commented Mar 29, 2026

That may be true.

Development

Successfully merging this pull request may close these issues.

Math environment flanking rules