-
Notifications
You must be signed in to change notification settings - Fork 16
feat: correctly parse malformed md syntax #1279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
feat: correctly parse malformed md syntax #1279
Conversation
- added a lot of comments for dev purposes, will remove before pushing
kevinports
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice find that this "regression" is actually come differences in how old/new versions of how remark-parse handles this syntax. I assumed it was something we intentionally added in the legacy markdown package but I couldn't figure out where.
I wish we didn't have to support this because it really isn't valid markdown. But as you've seen with enterprise migrations we need parity with the legacy engine to support customers. So thanks for figuring this out!
Had a handful of comments. Remark plugins aren't my forte so doing my best here.
|
@kevinports thanks for the review! I had another look and I forgot to include strikethrough syntaxes using the squiggly "~~" thing. It looks like our editor is already pretty strict about this (ie if there is a space it would not render the strikethrough) but it does fall in the same ballpark as the bold syntax (ie ** and __) so im just wondering if we should also loop that syntax (ie ~~) in as well or not? cc @rafegoldberg |
- also fix the trailing space issue
- returns new index to prevent the plugin from revisiting them
Yeah good point. Though I checked on a legacy project and it looks like the legacy engine does not handle malformed strikethroughs like |
kevinports
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed one more small thing to address before merging. But otherwise looks good to me.
| // Patterns to detect for bold (** and __) and italic (* and _) syntax: | ||
| // Bold: ** text**, **text **, word** text**, ** text ** | ||
| // Italic: * text*, *text *, word* text*, * text * | ||
| // Same patterns for underscore variants | ||
| // We use separate patterns for each marker type to allow this flexibility. | ||
|
|
||
| // Pattern for ** bold ** | ||
| // Groups: 1=wordBefore, 2=marker, 3=contentWithSpaceAfter, 4=trailingSpace1, 5=contentWithSpaceBefore, 6=trailingSpace2, 7=afterChar | ||
| // trailingSpace1 is for "** text **" pattern, trailingSpace2 is for "**text **" pattern | ||
| const asteriskBoldRegex = | ||
| /([^*\s]+)?\s*(\*\*)(?:\s+((?:[^*\n]|\*(?!\*))+?)(\s*)\2|((?:[^*\n]|\*(?!\*))+?)(\s+)\2)(\S|$)?/g; | ||
|
|
||
| // Pattern for __ bold __ | ||
| const underscoreBoldRegex = | ||
| /([^_\s]+)?\s*(__)(?:\s+((?:[^_\n]|_(?!_))+?)(\s*)\2|((?:[^_\n]|_(?!_))+?)(\s+)\2)(\S|$)?/g; | ||
|
|
||
| // Pattern for * italic * | ||
| const asteriskItalicRegex = /([^*\s]+)?\s*(\*)(?!\*)(?:\s+([^*\n]+?)(\s*)\2|([^*\n]+?)(\s+)\2)(\S|$)?/g; | ||
|
|
||
| // Pattern for _ italic _ | ||
| const underscoreItalicRegex = /([^_\s]+)?\s*(_)(?!_)(?:\s+([^_\n]+?)(\s*)\2|([^_\n]+?)(\s+)\2)(\S|$)?/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move these regex definitions up to the module scope. Defining them in the visitor function means they are recreated for every node in the AST right?
🧰 Changes
Created a post process AST-based plugin that runs after
remarkParsethat fixes and parses malformed MD syntax. We can now handle malformed syntax like** bold**__ bold__* italic *Note
The above syntax was previously accepted in the old markdown engine because it used
remark-parseversion 8.mdxishandmdxuses version 11 which is now built on top ofmicromarkwhich is significantly more strict, hence why these invalid md syntaxes are not allowed.🧬 QA & Testing