@maximilianfalco maximilianfalco commented Dec 22, 2025

PR App Fix CX-2680

🧰 Changes

Created an AST-based post-processing plugin that runs after remarkParse and repairs malformed Markdown emphasis syntax so that it still renders. We can now handle malformed syntax like:

  • ** bold**
  • __ bold__
  • * italic *
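
For reviewers less familiar with this kind of transformer, here is a rough sketch of the general idea. It is not the code in this PR. It assumes simplified mdast-like node types declared inline, a hypothetical `fixLooseBold` walker, and a deliberately simplified pattern (`LOOSE_STRONG`) that only covers `**` / `__` with stray whitespace inside the markers.

```typescript
// Minimal mdast-like shapes, declared here so the sketch is self-contained.
type TextNode = { type: "text"; value: string };
type StrongNode = { type: "strong"; children: TextNode[] };
type Inline = TextNode | StrongNode;
type ParentNode = { type: string; children: (Inline | ParentNode)[] };

// Simplified, hypothetical pattern (NOT the PR's regex): `**` or `__`,
// optional inner padding, content without marker characters, same marker again.
const LOOSE_STRONG = /(\*\*|__)[ \t]*([^\s*_](?:[^*_\n]*[^\s*_])?)[ \t]*\1/g;

// Split one text value into plain text and strong nodes.
function splitLooseBold(value: string): Inline[] {
  const out: Inline[] = [];
  let cursor = 0;
  for (const match of value.matchAll(LOOSE_STRONG)) {
    const start = match.index ?? 0;
    if (start > cursor) out.push({ type: "text", value: value.slice(cursor, start) });
    out.push({ type: "strong", children: [{ type: "text", value: match[2] }] });
    cursor = start + match[0].length;
  }
  if (cursor < value.length) out.push({ type: "text", value: value.slice(cursor) });
  return out;
}

// Walk the tree and replace text nodes with their split form.
function fixLooseBold(parent: ParentNode): void {
  const next: (Inline | ParentNode)[] = [];
  for (const child of parent.children) {
    if (child.type === "text") {
      next.push(...splitLooseBold((child as TextNode).value));
    } else {
      if ("children" in child) fixLooseBold(child as ParentNode);
      next.push(child);
    }
  }
  parent.children = next;
}
```

With this sketch, a paragraph whose only child is the text `a ** bold** b` ends up with three children: text `a `, a strong node wrapping `bold`, and text ` b`. Italics and the `word** text**` cases are left out to keep it short.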

Note

The above syntax was accepted by the old markdown engine because it used remark-parse version 8. mdxish and mdx use version 11, which is built on top of micromark and is significantly stricter, which is why this invalid Markdown syntax is no longer accepted.
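
To illustrate why a stricter parser rejects these inputs: CommonMark only lets a delimiter run open emphasis if it is not followed by whitespace, and only lets it close emphasis if it is not preceded by whitespace. The sketch below checks just that whitespace part of the rule. The punctuation conditions are omitted, and `canOpen` / `canClose` are hypothetical helper names, not micromark APIs.

```typescript
// Simplified check: whitespace part of the CommonMark flanking rules only.
// `start` is the index of the first marker character, `length` the run length.
function canOpen(text: string, start: number, length: number): boolean {
  const next = text[start + length];
  // End of line counts as whitespace, so a run at the end cannot open.
  return next !== undefined && !/\s/.test(next);
}

function canClose(text: string, start: number): boolean {
  const prev = text[start - 1];
  // Start of line counts as whitespace, so a run at the start cannot close.
  return prev !== undefined && !/\s/.test(prev);
}
```

Under this check the leading `**` in `** bold**` cannot open because a space follows it, so the trailing `**` has nothing to pair with and the whole string stays literal text.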

[Screenshots: Before | After]

🧬 QA & Testing

  1. Type in malformed MD syntax for bold and italics
  2. With the new transformer, we should still be able to render them

@maximilianfalco maximilianfalco marked this pull request as ready for review December 22, 2025 07:09

@kevinports kevinports left a comment

Really nice find that this "regression" actually comes from differences in how the old and new versions of remark-parse handle this syntax. I assumed it was something we intentionally added in the legacy markdown package but I couldn't figure out where.

I wish we didn't have to support this because it really isn't valid markdown. But as you've seen with enterprise migrations we need parity with the legacy engine to support customers. So thanks for figuring this out!

Had a handful of comments. Remark plugins aren't my forte so doing my best here.

@maximilianfalco

@kevinports thanks for the review! I had another look and I forgot to include strikethrough syntaxes using the squiggly "~~" thing. It looks like our editor is already pretty strict about this (ie if there is a space it would not render the strikethrough) but it does fall in the same ballpark as the bold syntax (ie ** and __)

so im just wondering if we should also loop that syntax (ie ~~) in as well or not? cc @rafegoldberg

@kevinports

so im just wondering if we should also loop that syntax (ie ~~) in as well or not?

Yeah good point. Though I checked on a legacy project and it looks like the legacy engine does not handle malformed strikethroughs like ~~ foo~~. So my vote is to follow suit here and only worry about ** and _. The fewer odd behaviors we have to support the better imo.


@kevinports kevinports left a comment

Noticed one more small thing to address before merging. But otherwise looks good to me.

Comment on lines +29 to +49
// Patterns to detect for bold (** and __) and italic (* and _) syntax:
// Bold: ** text**, **text **, word** text**, ** text **
// Italic: * text*, *text *, word* text*, * text *
// Same patterns for underscore variants
// We use separate patterns for each marker type to allow this flexibility.

// Pattern for ** bold **
// Groups: 1=wordBefore, 2=marker, 3=contentWithSpaceAfter, 4=trailingSpace1, 5=contentWithSpaceBefore, 6=trailingSpace2, 7=afterChar
// trailingSpace1 is for "** text **" pattern, trailingSpace2 is for "**text **" pattern
const asteriskBoldRegex =
/([^*\s]+)?\s*(\*\*)(?:\s+((?:[^*\n]|\*(?!\*))+?)(\s*)\2|((?:[^*\n]|\*(?!\*))+?)(\s+)\2)(\S|$)?/g;

// Pattern for __ bold __
const underscoreBoldRegex =
/([^_\s]+)?\s*(__)(?:\s+((?:[^_\n]|_(?!_))+?)(\s*)\2|((?:[^_\n]|_(?!_))+?)(\s+)\2)(\S|$)?/g;

// Pattern for * italic *
const asteriskItalicRegex = /([^*\s]+)?\s*(\*)(?!\*)(?:\s+([^*\n]+?)(\s*)\2|([^*\n]+?)(\s+)\2)(\S|$)?/g;

// Pattern for _ italic _
const underscoreItalicRegex = /([^_\s]+)?\s*(_)(?!_)(?:\s+([^_\n]+?)(\s*)\2|([^_\n]+?)(\s+)\2)(\S|$)?/g;

Can you move these regex definitions up to the module scope? Defining them in the visitor function means they are recreated for every node in the AST, right?
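
A rough sketch of what the hoisting could look like, assuming a simplified stand-in pattern (`HOISTED_BOLD`) and a hypothetical `visitText` callback rather than the PR's actual visitor. A regex literal inside a function body produces a new RegExp object each time that code runs, whereas a module-level constant is created once when the module loads.

```typescript
// Created once at module load. Simplified stand-in, not one of the PR's patterns:
// `**`, at least one space, lazy content, optional trailing space, `**` again.
const HOISTED_BOLD = /(\*\*)\s+([^*\n]+?)\s*\1/g;

// Hypothetical per-node callback; in the plugin this would be invoked
// once for every text node the visitor reaches.
function visitText(value: string): string {
  // String.prototype.replace starts a global regex from index 0 on every call,
  // so sharing one instance across nodes does not leak state between them.
  return value.replace(HOISTED_BOLD, "$1$2$1");
}
```

One thing to watch when sharing a regex with the `g` flag: `.test()` and `.exec()` keep position in `lastIndex` between calls, so they can give different answers for the same input on consecutive calls. `replace` and `matchAll` do not have that problem.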
