More sophisticated flanking rules for math #168
With this change, a `$` that's preceded by a word character can't open a math span,
and a `$` that's followed by a word character can't close one. This reduces
"false positives" on existing markdown documents like this:
    vim /etc/systemd/system/$UNIT.$TYPE
                            ^-----^ currently parsed as latex math
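To make the rule concrete, here is a minimal Python sketch of the flanking check described above. It is not the code from this PR: the function names are hypothetical, "word char" is assumed to mean alphanumeric or underscore, and escapes, `$$` display math, and any whitespace rules are ignored.

```python
from typing import List, Optional


def is_word_char(ch: str) -> bool:
    # Assumption: a "word char" is any alphanumeric character or underscore.
    return ch.isalnum() or ch == "_"


def can_open(text: str, i: int) -> bool:
    # A `$` at index i may open math only if it is NOT preceded by a word char.
    return i == 0 or not is_word_char(text[i - 1])


def can_close(text: str, i: int) -> bool:
    # A `$` at index i may close math only if it is NOT followed by a word char.
    return i + 1 >= len(text) or not is_word_char(text[i + 1])


def find_math_spans(text: str) -> List[str]:
    """Return the contents of inline math spans found under the flanking rule."""
    spans: List[str] = []
    opener: Optional[int] = None
    for i, ch in enumerate(text):
        if ch != "$":
            continue
        if opener is None:
            if can_open(text, i):
                opener = i
        elif can_close(text, i):
            spans.append(text[opener + 1:i])
            opener = None
    return spans


print(find_math_spans("vim /etc/systemd/system/$UNIT.$TYPE"))  # []
print(find_math_spans("the sum $a+b$ is small"))               # ['a+b']
```

In the systemd path, the first `$` could still open (it follows `/`), but the second `$` is followed by `T`, so it can't close and no math span is produced.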
I've checked a few other commonmark-based markdown flavors that accept
LaTeX math, to see what they did. The rule I added is supposed to be the
same as the one used in VSCode's markdown-it plugin and Discourse's math plugin:
- https://github.com/microsoft/vscode-markdown-it-katex/blob/efd01d8/src/index.ts#L22-L32
- https://github.com/discourse/discourse/blob/41f62aa/plugins/discourse-math/assets/javascripts/lib/discourse-markdown/discourse-math.js#L46
Full disclosure: I know of a few implementations that don't do this:
- https://github.com/kivikakk/comrak/blob/0d4a9ca/src/parser/inlines.rs#L2104 (narrow special case for digits after the closer)
- https://github.com/classeur/markdown-it-mathjax/blob/c1e34d4/markdown-it-mathjax.js#L80 (same narrow special case for digits)
- https://github.com/jupyterlab/jupyterlab/blob/61a2db4/packages/rendermime/src/latex.ts#L17 (very permissive; even `1 $ 2 $ 3` thinks the `2` is in a math span)
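For comparison, the difference between the "digits after the closer" special case and the word-char rule can be sketched as below. This is a simplified model with hypothetical names, not code from any of the linked projects: it only varies the condition on the character after a candidate closer, and it leaves out the opener rule and every other check those implementations perform.

```python
from typing import Callable, List, Optional


def math_spans(text: str, blocks_close: Callable[[str], bool]) -> List[str]:
    """Pair up `$` delimiters; a `$` cannot close a span when
    `blocks_close` is true for the character right after it."""
    spans: List[str] = []
    opener: Optional[int] = None
    for i, ch in enumerate(text):
        if ch != "$":
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if opener is None:
            opener = i
        elif not blocks_close(nxt):
            spans.append(text[opener + 1:i])
            opener = None
    return spans


def digit_rule(c: str) -> bool:
    # Narrow special case: only a digit after the `$` blocks closing.
    return c.isdigit()


def word_rule(c: str) -> bool:
    # Assumed reading of this PR's rule: any word char after the `$` blocks closing.
    return c.isalnum() or c == "_"


path = "vim /etc/systemd/system/$UNIT.$TYPE"
print(math_spans(path, digit_rule))  # ['UNIT.']
print(math_spans(path, word_rule))   # []
```

Both rules reject `$5 and $10`, since the second `$` is followed by a digit, but only the word-char rule rejects the shell-variable case.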

Probably this is a good idea, but I do have one concern. This change would introduce an expressive blind spot that we don't have now. Currently you can express things like [...] Arguably, we don't need to express [...]

Most cases seem like they could be solved with [...] The other major problem is CJK no-spaces writing. We should probably follow the same rules for that as are used for emphasis.

CJK is a serious concern, thanks for bringing that up. This change could break many existing CJK documents containing math, unless we implemented special rules (which are still not implemented for emphasis).

Regarding [...]

In practice, I occasionally see evidence of markdown that’s written for a different engine than the one that’s actually been used for publishing. Either the author never previewed it at all, or they previewed it in a writing environment like GitHub or VSCode that has its own renderer. Either way, the number of documents you’d break can, surprisingly, be outnumbered by the already-broken docs that you’d fix.

It's possible. But I do feel worse about breaking documents by authors who care enough to check that their document renders properly (possibly including myself).

That scenario only happens if the writer previewed the document using commonmark-hs, and didn't also preview the document in something else. How widespread is that, relatively speaking? A quick search for [...] That's the lower bound. The upper bound is just [...] On the other hand, [...] It's a lot harder for me to know what's going on outside of the public eye, but I do have reasons to think that documents targeted at pandoc specifically will tend to use pandoc-flavored markdown instead of commonmark. First, because there are features in pandoc-flavored markdown that commonmark-hs doesn't have (citations, grid tables, ...). And second, because of the more popular name.

That may be true.

Fixes #167