6 changes: 3 additions & 3 deletions claw2manus/fetcher.py
@@ -11,7 +11,7 @@ class SkillFetcher:
     def fetch_skill_from_github(self, author: str, name: str) -> str | None:
         url = self.CLAW_HUB_RAW_GITHUB_URL.format(author=author, name=name)
         try:
-            response = requests.get(url)
+            response = requests.get(url, timeout=10)
Severity: medium (security)

Using a single value for timeout sets both the connection and read timeouts to that value. It is recommended to use a tuple (e.g., (3.05, 10)) to specify separate timeouts. The connection timeout of 3.05 is slightly larger than a multiple of 3 (the default TCP retransmission window), which is a common best practice. Also, consider defining this timeout as a class-level constant to avoid repeating the magic number 10 across multiple methods.

Suggested change:

-            response = requests.get(url, timeout=10)
+            response = requests.get(url, timeout=(3.05, 10))

             response.raise_for_status()  # Raise an exception for HTTP errors
             return response.text
         except requests.exceptions.RequestException as e:
@@ -22,7 +22,7 @@ def fetch_skill_from_clawhub_website(self, name: str) -> str | None:
         """Scrapes SKILL.md content from clawhub.ai."""
         url = self.CLAW_HUB_WEBSITE_URL.format(name=name)
         try:
-            response = requests.get(url)
+            response = requests.get(url, timeout=10)

Severity: medium (security)

Consider using a tuple for the timeout parameter to distinguish between connection and read timeouts, consistent with best practices for network requests.

Suggested change:

-            response = requests.get(url, timeout=10)
+            response = requests.get(url, timeout=(3.05, 10))

             response.raise_for_status()
             soup = BeautifulSoup(response.text, 'html.parser')

@@ -46,7 +46,7 @@ def discover_author_via_github(self, name: str) -> str | None:
         url = self.GITHUB_SEARCH_API_URL.format(name=name)
         headers = {"Accept": "application/vnd.github.v3+json"}
         try:
-            response = requests.get(url, headers=headers)
+            response = requests.get(url, headers=headers, timeout=10)

Severity: high

The GitHub Search API requires a User-Agent header for all requests. Without it, the API will likely return a 403 Forbidden error. It is recommended to include a descriptive User-Agent (e.g., the name of your application). Additionally, using a tuple for the timeout parameter is recommended for better granularity.

Suggested change:

-            response = requests.get(url, headers=headers, timeout=10)
+            response = requests.get(url, headers={**headers, "User-Agent": "claw2manus"}, timeout=(3.05, 10))

             response.raise_for_status()
             data = response.json()
             if data.get("total_count", 0) > 0:
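Rather than merging a `User-Agent` into the headers dict at each call site, the header could be set once on a shared `requests.Session`, so every request the fetcher makes carries it automatically. A hedged sketch; the `__init__` method and the `"claw2manus"` agent string are assumptions, not code from this PR:

```python
import requests


class SkillFetcher:
    # Sketch: a shared Session applies default headers (including the
    # User-Agent that the GitHub API requires) to every outgoing request.
    def __init__(self) -> None:
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "claw2manus",  # assumed app name
            "Accept": "application/vnd.github.v3+json",
        })
```

With this in place, call sites become `self.session.get(url, timeout=...)` and no longer need to pass `headers` explicitly; the session also reuses TCP connections across the three fetch methods.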