fix(stripComments): dont escape special markdown chars *, _, # #1267

dannobytes · 2025-12-12T23:27:39Z

	Fix CX-2603

🧰 Changes

Prevents *, _ and # chars from getting escaped in the md syntax to
avoid something like **bold ** becoming \*\*bold \*\*.

Cases:

Unfortunately, whitelisting the * or _ chars introduces another issue: sometimes we do want them escaped, such as when the source puts a backslash in front of them, e.g. \* or \#.

So we need a way to match *word* but not \*word\*.
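As a rough illustration of that distinction (not what this PR implements): if the raw markdown source is available, a marker can be treated as escaped when an odd number of backslashes precedes it. `isEscaped` and `findUnescapedMarkers` are hypothetical helper names, and the sketch only looks at backslashes, not at whether a marker actually opens or closes emphasis.

```typescript
// Hypothetical helpers for illustration; not part of the PR.

// True when the character at `index` is preceded by an odd number of
// backslashes, i.e. it is backslash-escaped in the raw source.
function isEscaped(src: string, index: number): boolean {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && src.charAt(i) === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

// Offsets of every `*`, `_`, or `#` that is NOT backslash-escaped.
function findUnescapedMarkers(src: string): number[] {
  const hits: number[] = [];
  for (let i = 0; i < src.length; i++) {
    if ("*_#".indexOf(src.charAt(i)) !== -1 && !isEscaped(src, i)) {
      hits.push(i);
    }
  }
  return hits;
}
```

Called on `*word*` this reports both asterisks; called on the source text `\*word\*` it reports none. It only works on raw source, since the backslashes have to still be present.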

✅ What's working

before	after

❌ What's not?

When a literal asterisk is desired in the MD with \*here\*, the escaping backslash is getting removed, so it renders as emphasized *here* instead of showing the literal asterisks.


🧬 QA & Testing

Prevents `*`, `_` and `#` chars from getting escaped in the md syntax to avoid something like `**bold **` becoming `\*\*bold \*\*`. I'm not entirely certain this is enough to fix MD content with trailing spaces before the closing `**` failing to render, but I added some tests to verify that our strip comments transformer at least leaves these characters alone. Is there any harm in never escaping these?

dannobytes · 2025-12-12T23:31:32Z

lib/stripComments.ts

+            handlers: {
+              text(node, _, state, info) {
+                // Don't escape special markdown characters like #, *, or _.
+                if (/[#*_]/.test(node.value)) return node.value;


tbh, i'm not entirely certain if this is the correct way to be solving this but am not sure how to think of all the cases in which this may regress something.

but just wanted to throw up something as a starting point to iterate on.

i didn't get to see this make any noticeable impact on the readme app after linking, but i was having trouble getting anything in my MD repo to show up.

need to look into it a bit more and pair on it with someone, b/c i couldn't figure out why i wasn't seeing any console logs come thru.

dannobytes · 2025-12-16T19:28:58Z

i added some better tests to help point out the hard problem here 2ab0f3d

split them up into two tests:

allows compact headings with no whitespace delimiter
allows leading/trailing spaces between bold/italic markers

the biggest problem here is that any invalid markdown that doesn't fit the CommonMark spec gets converted into a text node by remark-parse.

for example, here's a simple md input along with the mdast text nodes that it gets converted to:

#Blue
\\# Literal
# Black

{
  type: 'text',
  value: '#Blue\n# Literal',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 3, column: 11, offset: 17 }
  }
}
{
  type: 'text',
  value: 'Black',
  position: {
    start: { line: 4, column: 3, offset: 20 },
    end: { line: 4, column: 8, offset: 25 }
  }
} "Black"

see how the first two lines with invalid MD get converted into a single text node instead of being split up. also notice how the backslash from \# isn't included in the node value, only the # itself. this makes sense b/c it'll eventually get sanitized/escaped.

if we then try to evaluate these text nodes to return either the raw value or a "safe" (i.e. escaped) value, how can we decide without being able to discern whether the # char was meant as a heading or as a backslashed literal?
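To make the ambiguity concrete, here is a small sketch using a simplified model of CommonMark backslash escapes (a backslash before an ASCII punctuation character yields just that character). `unescapeBackslashes` is a hypothetical helper, not remark-parse's actual implementation; it only shows that two different sources end up with the same text value.

```typescript
// Simplified model of CommonMark backslash escapes: a backslash followed by
// an ASCII punctuation character yields just that character.
// Assumption: this approximates what ends up in a text node's `value`.
const ESCAPED_PUNCTUATION = /\\([!-\/:-@\[-`{-~])/g;

function unescapeBackslashes(raw: string): string {
  return raw.replace(ESCAPED_PUNCTUATION, "$1");
}

// Two different intents in the source...
const compactHeading = "#Blue"; // author probably meant a heading
const escapedLiteral = "\\#Blue"; // author explicitly asked for a literal '#'

// ...collapse to the same text value, so the value alone cannot tell them apart.
const sameValue =
  unescapeBackslashes(compactHeading) === unescapeBackslashes(escapedLiteral);
```

Both inputs produce `#Blue`, which is why a check that only sees `node.value` has nothing to go on.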

the same problem exists for leading/trailing bold/italic markers.

single line with **bold ** text and \\*literal\\* asterisks.

**bold**
**  leading**
**trailing  **

turns into this mdast set of text nodes:

{
  type: 'text',
  value: 'single line with **bold ** text and *literal* asterisks.',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 2, column: 59, offset: 59 }
  }
}
{
  type: 'text',
  value: 'bold',
  position: {
    start: { line: 4, column: 3, offset: 63 },
    end: { line: 4, column: 7, offset: 67 }
  }
}
{
  type: 'text',
  value: '\n**  leading**\n**trailing  **',
  position: {
    start: { line: 4, column: 9, offset: 69 },
    end: { line: 6, column: 15, offset: 98 }
  }
}

how do you take a single text node like `single line with **bold ** text and *literal* asterisks.` and run a regex to determine whether to return a "safe" value or the raw one?
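One option to weigh, sketched under the assumption that the handler can reach the original source string and that the node still carries the `position` offsets from remark-parse: slice the source instead of inspecting `value`. `rawTextOf` and the minimal `TextNode` type are hypothetical names for illustration; nodes created or modified by other transformers may have no usable position.

```typescript
// Minimal structural types for illustration; a real project would use the
// `Text` node type from mdast instead.
interface Point {
  line: number;
  column: number;
  offset?: number;
}

interface TextNode {
  type: "text";
  value: string;
  position?: { start: Point; end: Point };
}

// Returns the node's text exactly as written in `source`, falling back to
// the unescaped `value` when offsets are unavailable.
function rawTextOf(source: string, node: TextNode): string {
  const start = node.position?.start.offset;
  const end = node.position?.end.offset;
  if (start === undefined || end === undefined) return node.value;
  return source.slice(start, end);
}

// The first text node from the example above, with its reported offsets.
const exampleSource =
  "\nsingle line with **bold ** text and \\*literal\\* asterisks.\n";
const exampleNode: TextNode = {
  type: "text",
  value: "single line with **bold ** text and *literal* asterisks.",
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 2, column: 59, offset: 59 },
  },
};

const exampleRaw = rawTextOf(exampleSource, exampleNode);
```

Here `exampleRaw` keeps both the unescaped `**bold **` and the backslashes in `\*literal\*`, so nothing needs to be guessed from the value. Whether the source and positions are reliably available inside the stringify handlers is an open question.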

it feels like we might be going about this the wrong way. i think we need to write a remark plugin that covers all of ReadMe-flavored markdown, much like the remark-gfm plugin does for GitHub-flavored markdown, and run it like

const file = unified()
  .use(remarkParse)
  .use(remarkReadMe)
  ...

not sure if we have anything like this already, but this feels like the right approach instead of the processing we're doing in our remarkStringify handlers

dannobytes requested a review from kevinports December 12, 2025 23:28


test: add better test to verify results

2ab0f3d
