🔒 Fix: Sanitize user content rendered to Markdown to prevent XSS/code execution#53
🔒 Fix: Sanitize user content rendered to Markdown to prevent XSS/code execution#53
Conversation
… execution 🎯 What: Added sanitizeMarkdownContent utility to sanitize title, excerpt, and notes fetched from Raindrop.⚠️ Risk: If unfixed, malicious input in Raindrop bookmarks (like inline scripts, dataview blocks, or HTML with inline events) could execute arbitrary code when rendered inside the Obsidian note. 🛡️ Solution: The sanitizeMarkdownContent function neutralizes executable code blocks, strips dangerous HTML tags and inline event handlers, and defangs dangerous URL protocols while preserving the structure of the Markdown. Co-authored-by: frostmute <989225+frostmute@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request introduces a sanitizeMarkdownContent utility to prevent XSS and arbitrary code execution within Obsidian by neutralizing dangerous HTML tags, scripts, and plugin-specific executable blocks. Feedback focuses on refining the sanitization regex to prevent bypasses via whitespace and to avoid over-matching legitimate text. Additionally, it was noted that sanitizing already YAML-escaped strings leads to rendering issues, so raw data should be used for these fields instead.
| // Handle HTML entities that might try to bypass the above (e.g. javascript:) | ||
| // We do a simple pass to decode basic entities and then check for dangerous protocols | ||
| // If we decode it and it matches a dangerous protocol, we replace the whole entity string with '' | ||
| sanitized = sanitized.replace(/(javascript|vbscript|data)(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => { |
There was a problem hiding this comment.
This regex is missing an optional whitespace check (\s*) between the protocol name and the HTML entity. While the literal colon check on line 60 includes \s*, an attacker could bypass this rule using a payload like javascript :. Additionally, note that this approach only catches obfuscation of the colon; obfuscating the protocol name itself (e.g., javascript:) would still bypass the filter.
| sanitized = sanitized.replace(/(javascript|vbscript|data)(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => { | |
| sanitized = sanitized.replace(/(javascript|vbscript|data)\s*(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => { |
| } | ||
| if (templateData.note) { | ||
| noteBody += `## Notes\n${templateData.note}\n\n`; | ||
| noteBody += `## Notes\n${sanitizeMarkdownContent(templateData.note)}\n\n`; |
There was a problem hiding this comment.
The templateData.note property has already been processed by escapeYamlString (at line 873), which is intended for YAML frontmatter. Applying sanitizeMarkdownContent to a YAML-escaped string and then appending it to the Markdown body will lead to incorrect rendering of special characters (e.g., extra backslashes or quotes). You should use the raw raindrop.note content here instead.
| noteBody += `## Notes\n${sanitizeMarkdownContent(templateData.note)}\n\n`; | |
| noteBody += `## Notes | |
| ${sanitizeMarkdownContent(raindrop.note || '')} | |
| `; |
| parts.push(`- ${sanitizeMarkdownContent(text).replace(NEWLINE_REGEX, ' ')}\n`); | ||
| if (note) { | ||
| parts.push(` *Note:* ${note.replace(NEWLINE_REGEX, ' ')}\n`); | ||
| parts.push(` *Note:* ${sanitizeMarkdownContent(note).replace(NEWLINE_REGEX, ' ')}\n`); |
There was a problem hiding this comment.
Similar to the issue with the notes field, h.text and h.note retrieved from templateData.highlights have already been YAML-escaped (lines 891-892). Sanitizing these escaped strings for use in the Markdown body will cause rendering artifacts. It is better to sanitize the raw highlight data directly from raindrop.highlights.
const rawHighlight = raindrop.highlights?.[i];
const text = typeof rawHighlight?.text === 'string' ? rawHighlight.text : '';
const note = typeof rawHighlight?.note === 'string' ? rawHighlight.note : '';
parts.push(`- ${sanitizeMarkdownContent(text).replace(NEWLINE_REGEX, ' ')}
`);
if (note) {
parts.push(` *Note:* ${sanitizeMarkdownContent(note).replace(NEWLINE_REGEX, ' ')}
`);
}| sanitized = sanitized.replace(dangerousTagsRegex, ''); | ||
|
|
||
| // 4. Remove inline event handlers (onerror, onclick, etc.) from ANY remaining HTML tags | ||
| sanitized = sanitized.replace(/\bon[a-z]+\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, ''); |
There was a problem hiding this comment.
The regex \bon[a-z]+\s*= is overly broad and will match legitimate words starting with 'on' that are followed by an equals sign, such as online=true, ongoing=false, or only=1. This can lead to unintended data loss or corruption in the generated notes. It is safer to target specific known event handlers or ensure the match is part of an HTML tag.
| sanitized = sanitized.replace(/\bon[a-z]+\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, ''); | |
| sanitized = sanitized.replace(/\b(onclick|onerror|onload|onmouseover|onfocus|onblur|onsubmit|onchange|oninput)\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, ''); |
🎯 What: Added
⚠️ Risk: If unfixed, malicious input in Raindrop bookmarks (like inline scripts, dataview blocks, or HTML with inline events) could execute arbitrary code when rendered inside the Obsidian note.
sanitizeMarkdownContentutility to sanitize title, excerpt, and notes fetched from Raindrop.🛡️ Solution: The
sanitizeMarkdownContentfunction neutralizes executable code blocks, strips dangerous HTML tags and inline event handlers, and defangs dangerous URL protocols while preserving the structure of the Markdown.PR created automatically by Jules for task 12993720941921711554 started by @frostmute