Skip to content

🔒 Fix: Sanitize user content rendered to Markdown to prevent XSS/code execution#53

Open
frostmute wants to merge 1 commit intomainfrom
jules-12993720941921711554-e9e72638
Open

🔒 Fix: Sanitize user content rendered to Markdown to prevent XSS/code execution#53
frostmute wants to merge 1 commit intomainfrom
jules-12993720941921711554-e9e72638

Conversation

@frostmute
Copy link
Copy Markdown
Owner

🎯 What: Added sanitizeMarkdownContent utility to sanitize title, excerpt, and notes fetched from Raindrop.
⚠️ Risk: If unfixed, malicious input in Raindrop bookmarks (like inline scripts, dataview blocks, or HTML with inline events) could execute arbitrary code when rendered inside the Obsidian note.
🛡️ Solution: The sanitizeMarkdownContent function neutralizes executable code blocks, strips dangerous HTML tags and inline event handlers, and defangs dangerous URL protocols while preserving the structure of the Markdown.


PR created automatically by Jules for task 12993720941921711554 started by @frostmute

… execution

🎯 What: Added sanitizeMarkdownContent utility to sanitize title, excerpt, and notes fetched from Raindrop.
⚠️ Risk: If unfixed, malicious input in Raindrop bookmarks (like inline scripts, dataview blocks, or HTML with inline events) could execute arbitrary code when rendered inside the Obsidian note.
🛡️ Solution: The sanitizeMarkdownContent function neutralizes executable code blocks, strips dangerous HTML tags and inline event handlers, and defangs dangerous URL protocols while preserving the structure of the Markdown.

Co-authored-by: frostmute <989225+frostmute@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a sanitizeMarkdownContent utility to prevent XSS and arbitrary code execution within Obsidian by neutralizing dangerous HTML tags, scripts, and plugin-specific executable blocks. Feedback focuses on refining the sanitization regex to prevent bypasses via whitespace and to avoid over-matching legitimate text. Additionally, it was noted that sanitizing already YAML-escaped strings leads to rendering issues, so raw data should be used for these fields instead.

// Handle HTML entities that might try to bypass the above (e.g. javascript&#58;)
// We do a simple pass to decode basic entities and then check for dangerous protocols
// If we decode it and it matches a dangerous protocol, we replace the whole entity string with ''
sanitized = sanitized.replace(/(javascript|vbscript|data)(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This regex is missing an optional whitespace check (\s*) between the protocol name and the HTML entity. While the literal colon check on line 60 includes \s*, an attacker could bypass this rule using a payload like javascript &colon;. Additionally, note that this approach only catches obfuscation of the colon; obfuscating the protocol name itself (e.g., j&#x61;vascript:) would still bypass the filter.

Suggested change
sanitized = sanitized.replace(/(javascript|vbscript|data)(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => {
sanitized = sanitized.replace(/(javascript|vbscript|data)\s*(?:&#[0-9]+;|&#x[0-9a-fA-F]+;|&[a-zA-Z]+;)/gi, (match) => {

Comment thread src/main.ts
}
if (templateData.note) {
noteBody += `## Notes\n${templateData.note}\n\n`;
noteBody += `## Notes\n${sanitizeMarkdownContent(templateData.note)}\n\n`;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The templateData.note property has already been processed by escapeYamlString (at line 873), which is intended for YAML frontmatter. Applying sanitizeMarkdownContent to a YAML-escaped string and then appending it to the Markdown body will lead to incorrect rendering of special characters (e.g., extra backslashes or quotes). You should use the raw raindrop.note content here instead.

Suggested change
noteBody += `## Notes\n${sanitizeMarkdownContent(templateData.note)}\n\n`;
noteBody += `## Notes
${sanitizeMarkdownContent(raindrop.note || '')}
`;

Comment thread src/main.ts
Comment on lines +1127 to +1129
parts.push(`- ${sanitizeMarkdownContent(text).replace(NEWLINE_REGEX, ' ')}\n`);
if (note) {
parts.push(` *Note:* ${note.replace(NEWLINE_REGEX, ' ')}\n`);
parts.push(` *Note:* ${sanitizeMarkdownContent(note).replace(NEWLINE_REGEX, ' ')}\n`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the issue with the notes field, h.text and h.note retrieved from templateData.highlights have already been YAML-escaped (lines 891-892). Sanitizing these escaped strings for use in the Markdown body will cause rendering artifacts. It is better to sanitize the raw highlight data directly from raindrop.highlights.

                            const rawHighlight = raindrop.highlights?.[i];
                            const text = typeof rawHighlight?.text === 'string' ? rawHighlight.text : '';
                            const note = typeof rawHighlight?.note === 'string' ? rawHighlight.note : '';

                            parts.push(`- ${sanitizeMarkdownContent(text).replace(NEWLINE_REGEX, ' ')}
`);
                            if (note) {
                                parts.push(`  *Note:* ${sanitizeMarkdownContent(note).replace(NEWLINE_REGEX, ' ')}
`);
                            }

sanitized = sanitized.replace(dangerousTagsRegex, '');

// 4. Remove inline event handlers (onerror, onclick, etc.) from ANY remaining HTML tags
sanitized = sanitized.replace(/\bon[a-z]+\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, '');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex \bon[a-z]+\s*= is overly broad and will match legitimate words starting with 'on' that are followed by an equals sign, such as online=true, ongoing=false, or only=1. This can lead to unintended data loss or corruption in the generated notes. It is safer to target specific known event handlers or ensure the match is part of an HTML tag.

Suggested change
sanitized = sanitized.replace(/\bon[a-z]+\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, '');
sanitized = sanitized.replace(/\b(onclick|onerror|onload|onmouseover|onfocus|onblur|onsubmit|onchange|oninput)\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+)/gi, '');

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant