🔒 [security fix] Fix URL Injection in SkillFetcher#5
Conversation
This commit addresses a security vulnerability where user-provided inputs (author and name) were directly formatted into URLs without proper escaping. We now use `urllib.parse.quote` with `safe=''` to sanitize these inputs, preventing potential URL injection and path traversal attacks. Affected methods: - fetch_skill_from_github - fetch_skill_from_clawhub_website - discover_author_via_github - fetch_skill (print statement) New tests in `tests/test_fetcher.py` verify the correct escaping of special characters. Co-authored-by: frostmute <989225+frostmute@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request introduces URL encoding for skill and author identifiers in the SkillFetcher class to prevent malformed requests and adds a corresponding test suite. Feedback suggests further hardening the security by explicitly encoding dots to prevent path traversal and wrapping search terms in quotes for the GitHub Search API. Additionally, it is recommended to refactor the test mocking strategy to avoid global side effects and remove unused imports.
| quoted_author = urllib.parse.quote(author, safe='') | ||
| quoted_name = urllib.parse.quote(name, safe='') | ||
| url = self.CLAW_HUB_RAW_GITHUB_URL.format(author=quoted_author, name=quoted_name) |
There was a problem hiding this comment.
While urllib.parse.quote(..., safe='') correctly encodes slashes to prevent path traversal, it does not encode dots (.) by default. This allows an attacker to provide .. as an author or name, which can still lead to path traversal within the GitHub repository (e.g., accessing files outside the skills/ directory). Explicitly encoding dots or validating the input is recommended to fully mitigate this risk.
| quoted_author = urllib.parse.quote(author, safe='') | |
| quoted_name = urllib.parse.quote(name, safe='') | |
| url = self.CLAW_HUB_RAW_GITHUB_URL.format(author=quoted_author, name=quoted_name) | |
| quoted_author = urllib.parse.quote(author, safe='').replace('.', '%2E') | |
| quoted_name = urllib.parse.quote(name, safe='').replace('.', '%2E') | |
| url = self.CLAW_HUB_RAW_GITHUB_URL.format(author=quoted_author, name=quoted_name) |
| quoted_name = urllib.parse.quote(name, safe='') | ||
| url = self.CLAW_HUB_WEBSITE_URL.format(name=quoted_name) |
There was a problem hiding this comment.
Similar to the GitHub fetcher, dots are not encoded by urllib.parse.quote, which could allow path traversal via .. sequences in the name parameter. Encoding dots ensures the request remains constrained to the intended path on the target server.
| quoted_name = urllib.parse.quote(name, safe='') | |
| url = self.CLAW_HUB_WEBSITE_URL.format(name=quoted_name) | |
| quoted_name = urllib.parse.quote(name, safe='').replace('.', '%2E') | |
| url = self.CLAW_HUB_WEBSITE_URL.format(name=quoted_name) |
| quoted_name = urllib.parse.quote(name, safe='') | ||
| url = self.GITHUB_SEARCH_API_URL.format(name=quoted_name) |
There was a problem hiding this comment.
The GitHub Search API interprets characters like spaces and + as term separators in the q parameter. Even when URL-encoded, these characters can break the path: qualifier logic. Wrapping the {name} placeholder in double quotes within the search query ensures it is treated as a single literal path segment by the search engine.
| quoted_name = urllib.parse.quote(name, safe='') | |
| url = self.GITHUB_SEARCH_API_URL.format(name=quoted_name) | |
| quoted_name = urllib.parse.quote(name, safe='') | |
| url = self.GITHUB_SEARCH_API_URL.format(name=f'"{quoted_name}"') |
| mock_requests = MagicMock() | ||
| sys.modules['requests'] = mock_requests | ||
| sys.modules['bs4'] = MagicMock() |
There was a problem hiding this comment.
Mocking dependencies by directly modifying sys.modules is a global side effect that can interfere with other tests and is generally considered brittle. It is better to use unittest.mock.patch to locally mock the requests and bs4 modules. Additionally, the patch import on line 10 is currently unused.
🎯 What:
Fixed a URL injection / unescaped user input vulnerability in
claw2manus/fetcher.py.Malicious users could potentially craft
authorornamestrings containing special characters (like../or#) to redirect requests to unexpected URLs or access sensitive files via path traversal on the target servers (GitHub, clawhub.ai).🛡️ Solution:
Used
urllib.parse.quote(..., safe='')to properly escape all user-controllable variables before they are interpolated into URL templates. This ensures that any special characters are safely URL-encoded, treating them as literal path segments rather than control characters.Verified the fix with a new test suite
tests/test_fetcher.pyand ensured core functionality is preserved.PR created automatically by Jules for task 11110538775753625202 started by @frostmute