
@dannobytes (Contributor) commented Dec 12, 2025

PR App Fix CX-2603

🧰 Changes

Prevents `*`, `_` and `#` chars from getting escaped when we stringify the md, so that something like `**bold **` doesn't become `\*\*bold \*\*`.

Cases:

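  • `**word**` should render as bold "word"
  • `**word **` should render as bold "word"
  • `\*word\*` should render as literal `*word*`
  • `_word_` should render as italic "word"
  • `_word _` should render as italic "word"
  • `\_word\_` should render as literal `_word_`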

Unfortunately, whitelisting the `*` or `_` chars introduces another issue: sometimes we do want them escaped, e.g. when they follow a backslash, as in `\*` or `\#`.

So we need a way to match `*word*` but not `\*word\*`.
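One way to express "match `*word*` but not `\*word\*`" is a regex with a negative lookbehind (ES2018+). This is a minimal sketch under a few assumptions: it runs on the raw markdown source rather than on parsed text nodes, `findUnescapedEmphasis` is a hypothetical helper name, and it skips harder cases like nested emphasis.

```javascript
// Hypothetical helper: finds emphasis-like spans whose delimiters are not
// backslash-escaped. It runs on the RAW markdown source, because parsed
// text nodes have already lost their escaping backslashes.
// Simplifications: the span content may not contain `*`, `_` or `\`, and an
// escaped backslash before a delimiter (`\\*word*`) is not handled.
const UNESCAPED_EMPHASIS = /(?<!\\)(\*\*|\*|__|_)([^*_\\]+)\1/g;

function findUnescapedEmphasis(source) {
  return [...source.matchAll(UNESCAPED_EMPHASIS)].map((match) => match[0]);
}

console.log(findUnescapedEmphasis('**bold ** and \\*literal\\*'));
// → [ '**bold **' ]
```

Note the catch: this only works while the backslashes are still there, i.e. before remark-parse has turned the source into text nodes.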

✅ What's working

[Screenshots: before | after]

❌ What's not?

When literal asterisks are wanted in the MD, as in `\*here\*`, the escaping backslashes get removed and the text is rendered as formatted `*here*` (bold) instead of with literal asterisks.

[Screenshot] ======> [Screenshot]

🧬 QA & Testing


Not entirely certain this is enough to fix MD content with trailing spaces before the closing `**` failing to render, but I added some tests to verify that our strip comments transformer at least leaves these alone.

Is there any harm in never escaping these?
```js
handlers: {
  text(node, _, state, info) {
    // Don't escape special markdown characters like #, *, or _.
    if (/[#*_]/.test(node.value)) return node.value;
    // …rest of the handler not shown in this review excerpt
  },
},
```
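A self-contained sketch of what this handler does with the "What's not?" case. Only the regex check comes from the snippet; the fallback `return state.safe(...)` and the stub `state` are assumptions standing in for mdast-util-to-markdown's real escaping.

```javascript
// The handler body from the snippet, plus an assumed fallback to the
// default escaping (the review excerpt doesn't show the rest).
function text(node, _, state, info) {
  // Don't escape special markdown characters like #, *, or _.
  if (/[#*_]/.test(node.value)) return node.value;
  return state.safe(node.value, info);
}

// Stub state (assumption): backslash-escapes every #, * and _. The real
// state.safe() in mdast-util-to-markdown escapes based on context.
const stubState = { safe: (value) => value.replace(/[#*_]/g, '\\$&') };

// remark-parse already turned the source `\*here\*` into this text node.
const node = { type: 'text', value: '*here*' };

console.log(text(node, undefined, stubState, {})); // *here*
console.log(stubState.safe(node.value));           // \*here\*
```

Because the parsed value is already `*here*`, the check short-circuits and the output re-parses as emphasis; without the check, the same escaping that turns `**bold **` into `\*\*bold \*\*` kicks in.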
Contributor Author


tbh, i'm not entirely certain this is the correct way to solve this, and i'm not sure how to think through all the cases where this may regress something.

but i just wanted to throw something up as a starting point to iterate on.

i didn't see this make any noticeable impact on the readme app after linking, but i was having trouble getting anything in my MD repo to show up.

need to look into it a bit more and pair on it with someone, b/c i couldn't figure out why i wasn't seeing any console logs come thru.

@dannobytes
Contributor Author

i added some better tests to help point out the hard problem here in 2ab0f3d

split them up into two tests:

  1. allows compact headings with no whitespace delimiter
  2. allows leading/trailing spaces between bold/italic markers

the biggest problem here is that any markdown that doesn't fit a CommonMark construct (e.g. `#Blue`, which lacks the space a heading needs) gets turned into a plain text node by remark-parse.
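To make the CommonMark rule concrete, here is a simplified sketch of the ATX-heading opener check (`isAtxHeading` is a hypothetical helper, not remark-parse's actual code). A line like `#Blue`, with no space after the `#`, fails it and ends up as paragraph text.

```javascript
// Simplified CommonMark ATX-heading opener: up to 3 spaces of indentation,
// 1 to 6 `#` characters, then a space, a tab, or the end of the line.
// Ignores everything else (closing `#` sequences, setext headings, etc.).
const ATX_OPENER = /^ {0,3}#{1,6}(?:[ \t]|$)/;

function isAtxHeading(line) {
  return ATX_OPENER.test(line);
}

for (const line of ['#Blue', '\\# Literal', '# Black']) {
  console.log(line, isAtxHeading(line));
}
// #Blue false
// \# Literal false
// # Black true
```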

for example, here's a simple md input (written as a JS template literal in the test, so `\\#` is an escaped `\#`) along with the mdast text nodes it gets converted to:

```
#Blue
\\# Literal
# Black
```

```js
{
  type: 'text',
  value: '#Blue\n# Literal',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 3, column: 11, offset: 17 }
  }
}
{
  type: 'text',
  value: 'Black',
  position: {
    start: { line: 4, column: 3, offset: 20 },
    end: { line: 4, column: 8, offset: 25 }
  }
}
```

see how the first two lines (neither is a valid heading) get merged into a single text node instead of being split up. also notice how the backslash from `\#` isn't included in the node value. this makes sense b/c it'll eventually get sanitized/escaped.
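A simplified sketch of where the backslash goes, assuming only the CommonMark rule that a backslash before an ASCII punctuation character is consumed and the character is kept as literal text (`unescapeInline` is a hypothetical helper, not remark-parse's implementation). After this step, the `#` from `#Blue` and the one from `\#` are indistinguishable in the value.

```javascript
// Simplified CommonMark backslash escapes: a `\` followed by any ASCII
// punctuation character is dropped, and the character is kept as literal text.
// Ignores code spans, code blocks, autolinks and raw HTML, where escapes
// don't apply.
const BACKSLASH_ESCAPE = /\\([!-\/:-@\[-`{-~])/g;

function unescapeInline(source) {
  return source.replace(BACKSLASH_ESCAPE, '$1');
}

// The first paragraph of the example, as the parser sees it...
const paragraph = '#Blue\n\\# Literal';
// ...and the text node value it ends up with: both `#` now look identical.
console.log(JSON.stringify(unescapeInline(paragraph))); // "#Blue\n# Literal"
```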

if we then evaluate these text nodes to decide whether to return the raw value or a "safe" (i.e. escaped) value, how can we do that without being able to tell whether a `#` char is a heading marker or a backslashed literal?

the same problem exists for leading/trailing bold/italic markers.

```
single line with **bold ** text and \\*literal\\* asterisks.

**bold**
**  leading**
**trailing  **
```

turns into this set of mdast text nodes:

```js
{
  type: 'text',
  value: 'single line with **bold ** text and *literal* asterisks.',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 2, column: 59, offset: 59 }
  }
}
{
  type: 'text',
  value: '\n**  leading**\n**trailing  **',
  position: {
    start: { line: 4, column: 9, offset: 69 },
    end: { line: 6, column: 15, offset: 98 }
  }
}
{
  type: 'text',
  value: 'bold',
  position: {
    start: { line: 4, column: 3, offset: 63 },
    end: { line: 4, column: 7, offset: 67 }
  }
}
{
  type: 'text',
  value: '\n**  leading**\n**trailing  **',
  position: {
    start: { line: 4, column: 9, offset: 69 },
    end: { line: 6, column: 15, offset: 98 }
  }
}
```
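For the bold/italic cases, a simplified sketch of why `**  leading**` and `**trailing  **` stay as text: in CommonMark a `**` run can only open strong emphasis when it is left-flanking and only close it when it is right-flanking. The helpers below are hypothetical and skip the spec's punctuation rules.

```javascript
// Simplified CommonMark flanking rules for a `**` run. Start and end of the
// line count as whitespace. Ignores the punctuation clauses of the spec and
// the extra restrictions on `_`.
const isWhitespace = (ch) => ch === undefined || /\s/.test(ch);

// Left-flanking (can open): the run is not followed by whitespace.
const canOpen = (line, index) => !isWhitespace(line[index + 2]);

// Right-flanking (can close): the run is not preceded by whitespace.
const canClose = (line, index) => !isWhitespace(line[index - 1]);

for (const line of ['**bold**', '**  leading**', '**trailing  **']) {
  const opens = canOpen(line, line.indexOf('**'));
  const closes = canClose(line, line.lastIndexOf('**'));
  console.log(line, { opens, closes });
}
```

Only `**bold**` gets `true` for both, which matches the single `bold` node inside a strong in the output.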

how do you take a single text node like `single line with **bold ** text and *literal* asterisks.` and run a regex on it to decide whether to return a "safe" value or not?
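To make the question concrete, a sketch using only the node's value (no parser involved): any emphasis-shaped regex, here a hypothetical `EMPHASIS_SHAPED`, matches the unescaped `**bold **` and the originally escaped `*literal*` alike, because the value no longer records which asterisks had backslashes.

```javascript
// The text node value from the example above: remark-parse has already
// removed the backslashes that made `*literal*` literal.
const value = 'single line with **bold ** text and *literal* asterisks.';

// Any emphasis-shaped regex over the value matches both spans alike.
const EMPHASIS_SHAPED = /(\*\*|\*)([^*]+)\1/g;
const matches = [...value.matchAll(EMPHASIS_SHAPED)].map((m) => m[0]);

console.log(matches); // [ '**bold **', '*literal*' ]
```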

it feels like we might be going about this the wrong way. i think we need to write a remark plugin that covers all of ReadMe-flavored markdown, much like remark-gfm adds support for GitHub-flavored markdown, and use it like:

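```js
import { unified } from 'unified';
import remarkParse from 'remark-parse';

const file = unified()
  .use(remarkParse)
  .use(remarkReadMe) // proposed ReadMe-flavored markdown plugin
  ...
```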

not sure if we have anything like this already, but this feels like the right approach, instead of the processing we're doing in our remarkStringify handlers.
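A rough sketch of what the plugin's entry point could look like, modeled on how remark-gfm registers extensions on the processor's `data()` rather than rewriting text nodes at stringify time. `remarkReadMe` is the proposed name from above; `readmeSyntax`, `readmeFromMarkdown` and `readmeToMarkdown` are hypothetical placeholders for the actual micromark and mdast-util extensions, which is where the real work (and the escape handling) would live.

```javascript
// Placeholder extension factories (hypothetical). A real ReadMe-flavored
// plugin would return a micromark syntax extension and mdast-util
// from-markdown / to-markdown extensions here.
function readmeSyntax(options) { return { name: 'readme-syntax', options }; }
function readmeFromMarkdown() { return { name: 'readme-from-markdown' }; }
function readmeToMarkdown(options) { return { name: 'readme-to-markdown', options }; }

// Hypothetical remarkReadMe attacher, modeled on how remark-gfm registers
// its extensions on the processor's data instead of post-processing text
// nodes in stringify handlers.
function remarkReadMe(options = {}) {
  const data = this.data();
  const add = (key, extension) => {
    const list = data[key] || (data[key] = []);
    list.push(extension);
  };
  add('micromarkExtensions', readmeSyntax(options));
  add('fromMarkdownExtensions', readmeFromMarkdown());
  add('toMarkdownExtensions', readmeToMarkdown(options));
}
```

In current versions, remark-parse reads `micromarkExtensions` and `fromMarkdownExtensions`, and remark-stringify reads `toMarkdownExtensions`, from that data, so new syntax would be recognized while tokenizing, where a backslash escape is still its own token.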
