feat: correctly parse malformed md syntax #1279

maximilianfalco · 2025-12-22T05:58:34Z

	Fix CX-2680

🧰 Changes

Created an AST-based post-processing plugin that runs after remarkParse and repairs malformed MD syntax so that it parses as intended. We can now handle malformed syntax like:

** bold**
__ bold__
* italic *

Note

The old markdown engine accepted the syntax above because it used remark-parse version 8. mdxish and mdx use version 11, which is built on top of micromark and is significantly stricter, so this invalid MD syntax is no longer accepted.
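
To make the idea concrete, here is a minimal sketch of the kind of post-parse repair described above. It is not the plugin from this PR: the node types are a hypothetical, stripped-down stand-in for mdast, the `splitLooseBold` helper is invented for illustration, and the pattern only covers the `**` marker.

```typescript
// Hypothetical minimal mdast-like node types, for illustration only.
type TextNode = { type: 'text'; value: string };
type StrongNode = { type: 'strong'; children: TextNode[] };
type InlineNode = TextNode | StrongNode;

// Simplified pattern (an assumption, not the PR's regex):
// "**", optional spaces, content without "*", optional spaces, "**".
const LOOSE_BOLD = /\*\*[ \t]*([^*\n]+?)[ \t]*\*\*/g;

// Splits one text node into text/strong nodes, trimming the stray
// whitespace that a strict parser refuses to treat as emphasis.
function splitLooseBold(node: TextNode): InlineNode[] {
  const out: InlineNode[] = [];
  let last = 0;
  let m: RegExpExecArray | null;

  LOOSE_BOLD.lastIndex = 0; // the /g regex is shared, so start from a clean position
  while ((m = LOOSE_BOLD.exec(node.value)) !== null) {
    if (m.index > last) {
      out.push({ type: 'text', value: node.value.slice(last, m.index) });
    }
    out.push({ type: 'strong', children: [{ type: 'text', value: m[1] }] });
    last = m.index + m[0].length;
  }
  if (last < node.value.length) {
    out.push({ type: 'text', value: node.value.slice(last) });
  }
  return out;
}

// 'a ** bold** b' becomes: text 'a ', strong 'bold', text ' b'
const repaired = splitLooseBold({ type: 'text', value: 'a ** bold** b' });
```

A real plugin would run something like this from a tree visitor and splice the result back into the parent's children.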


🧬 QA & Testing

Type in malformed MD syntax for bold and italics
With the new transformer, we should still be able to render them


kevinports

Really nice find that this "regression" actually comes down to differences in how the old and new versions of remark-parse handle this syntax. I assumed it was something we intentionally added in the legacy markdown package, but I couldn't figure out where.

I wish we didn't have to support this because it really isn't valid markdown. But as you've seen with enterprise migrations we need parity with the legacy engine to support customers. So thanks for figuring this out!

Had a handful of comments. Remark plugins aren't my forte so doing my best here.

processor/transform/mdxish/normalize-malformed-md-syntax.ts

__tests__/transformers/normalize-malformed-md-syntax.test.ts

maximilianfalco · 2025-12-23T06:57:01Z

@kevinports thanks for the review! I had another look and realized I forgot to include strikethrough syntax, which uses "~~". It looks like our editor is already pretty strict about this (i.e. if there is a space, it does not render the strikethrough), but it falls in the same ballpark as the bold syntax (i.e. ** and __).

so im just wondering if we should also loop that syntax (ie ~~) in as well or not? cc @rafegoldberg


kevinports · 2025-12-23T14:58:22Z

> so im just wondering if we should also loop that syntax (ie ~~) in as well or not?

Yeah, good point. Though I checked on a legacy project, and it looks like the legacy engine does not handle malformed strikethroughs like ~~ foo~~. So my vote is to follow suit here and only worry about ** and _. The fewer odd behaviors we have to support the better, imo.

kevinports

Noticed one more small thing to address before merging. But otherwise looks good to me.

kevinports · 2025-12-23T15:04:28Z

processor/transform/mdxish/normalize-malformed-md-syntax.ts

+    // Patterns to detect for bold (** and __) and italic (* and _) syntax:
+    // Bold: ** text**, **text **, word** text**, ** text **
+    // Italic: * text*, *text *, word* text*, * text *
+    // Same patterns for underscore variants
+    // We use separate patterns for each marker type to allow this flexibility.
+
+    // Pattern for ** bold **
+    // Groups: 1=wordBefore, 2=marker, 3=contentWithSpaceAfter, 4=trailingSpace1, 5=contentWithSpaceBefore, 6=trailingSpace2, 7=afterChar
+    // trailingSpace1 is for "** text **" pattern, trailingSpace2 is for "**text **" pattern
+    const asteriskBoldRegex =
+      /([^*\s]+)?\s*(\*\*)(?:\s+((?:[^*\n]|\*(?!\*))+?)(\s*)\2|((?:[^*\n]|\*(?!\*))+?)(\s+)\2)(\S|$)?/g;
+
+    // Pattern for __ bold __
+    const underscoreBoldRegex =
+      /([^_\s]+)?\s*(__)(?:\s+((?:[^_\n]|_(?!_))+?)(\s*)\2|((?:[^_\n]|_(?!_))+?)(\s+)\2)(\S|$)?/g;
+
+    // Pattern for * italic *
+    const asteriskItalicRegex = /([^*\s]+)?\s*(\*)(?!\*)(?:\s+([^*\n]+?)(\s*)\2|([^*\n]+?)(\s+)\2)(\S|$)?/g;
+
+    // Pattern for _ italic _
+    const underscoreItalicRegex = /([^_\s]+)?\s*(_)(?!_)(?:\s+([^_\n]+?)(\s*)\2|([^_\n]+?)(\s+)\2)(\S|$)?/g;


Can you move these regex definitions up to the module scope? Defining them in the visitor function means they are recreated for every node in the AST, right?
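
For illustration, a minimal sketch of the hoisting suggested here. It uses a simplified pattern rather than the PR's regexes, and `findBold` is a hypothetical helper name. One caveat when hoisting: a regex with the `g` flag keeps its scan position in `lastIndex` between calls, so a shared instance should be reset before each scan in case an earlier caller stopped partway through.

```typescript
// Module scope: the regex object is created once, when the module loads,
// instead of once per visited node.
// Simplified pattern (an assumption, not the PR's regex).
const BOLD = /\*\*\s*([^*\n]+?)\s*\*\*/g;

// Returns the trimmed content of every loosely written bold span.
function findBold(value: string): string[] {
  // A /g regex is stateful: exec() resumes from lastIndex.
  // Reset it so state left over from an earlier caller cannot skip matches.
  BOLD.lastIndex = 0;

  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = BOLD.exec(value)) !== null) {
    found.push(m[1]);
  }
  return found;
}

const spans = findBold('** a** and **b **'); // ['a', 'b']
```

Without the reset, a preceding `BOLD.test(...)` on a longer string could leave `lastIndex` past the end of the next input, and the first `exec` call would report no match.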

maximilianfalco added 4 commits December 22, 2025 12:52

wip: first pass

2551c62

feat: support underscore syntax

d1200fc

- added a lot of comments for dev purposes, will remove before pushing

feat: add support for italics

bc8793b

chore: code cleanup

bbd97ce

maximilianfalco marked this pull request as ready for review December 22, 2025 07:09

kevinports reviewed Dec 22, 2025

kevinports requested review from dannobytes and rafegoldberg December 22, 2025 22:11

maximilianfalco added 5 commits December 23, 2025 17:17

tests: add some more edge cases related to snake_case

7a6e4e6

add some more tests regarding trailing spaces

0d24e18

fix: add new logic to support underscores and snake_case

335140e

- also fix the trailing space issue

fix: fixed a bug where escaped markers are incorrectly rendered

fbfd1a8

fix: make sure plugin returns index and what to skip

9d5ab90

- returns new index to prevent the plugin from revisiting them
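
The idea behind this commit can be sketched as follows. This is a self-contained illustration, not the plugin's code: `walkChildren` is a hypothetical stand-in for a tree walker whose visitor may return the index to continue from, and the node shape is a simplified assumption.

```typescript
// Simplified node shape, for illustration only.
type Node = { type: string; value?: string; children?: Node[] };
type Visitor = (node: Node, index: number, parent: Node) => number | void;

// Hypothetical walker: visits a parent's children and, if the visitor
// returns a number, continues from that index instead of index + 1.
function walkChildren(parent: Node, visitor: Visitor): void {
  const children = parent.children || [];
  let i = 0;
  while (i < children.length) {
    const next = visitor(children[i], i, parent);
    i = typeof next === 'number' ? next : i + 1;
  }
}

const visited: string[] = [];
const paragraph: Node = {
  type: 'paragraph',
  children: [{ type: 'text', value: '** a** b' }],
};

walkChildren(paragraph, (node, index, parent) => {
  visited.push(node.type);
  if (node.type !== 'text' || node.value !== '** a** b') return;

  // Replace the malformed text node with its repaired equivalent.
  const replacement: Node[] = [
    { type: 'strong', children: [{ type: 'text', value: 'a' }] },
    { type: 'text', value: ' b' },
  ];
  (parent.children as Node[]).splice(index, 1, ...replacement);

  // Continue after the inserted nodes so they are not visited again.
  return index + replacement.length;
});
```

Here the visitor runs once. Had it returned nothing, the walker would have gone on to visit the two nodes it had just inserted.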

kevinports approved these changes Dec 23, 2025
