
Conversation

@suyash469

Summary

The scraper was crashing with an unhandled exception when a connection timeout occurred (e.g., Read timed out). Additionally, repo_scraping_utils.py contained a logic error: response.status_code was accessed inside an except block even though response may never have been assigned (the request itself can fail before the assignment completes), so the error handler raised a NameError of its own.
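The scope bug looks roughly like this (a hypothetical reconstruction for illustration; the actual function body is in the PR diff, not this description):

```python
import requests

def github_api_request_buggy(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException:
        # BUG: if requests.get() itself raises (e.g. a read timeout),
        # `response` was never assigned, so the line below raises a
        # NameError instead of handling the original error.
        print(f"Request failed with status {response.status_code}")
        return None
```

A timeout therefore escalates into an unrelated NameError, which is what crashed the scraper.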

Changes

  • Refactored github_api_request in repofinder/scraping/repo_scraping_utils.py.
  • Added proper try-except blocks to catch requests.exceptions.RequestException and other connection errors.
  • Implemented a retry mechanism (max 3 retries) with a timeout increase to 30 seconds to handle unstable connections.
  • Fixed indentation and variable scope issues preventing the script from running continuously.
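A minimal sketch of the refactored flow described above, assuming the retry count (3) and timeout (30 s) from the bullets; the backoff strategy, parameter names, and logging style here are illustrative, not the PR's exact code:

```python
import time
import requests

MAX_RETRIES = 3       # per the PR: up to 3 retries
TIMEOUT_SECONDS = 30  # per the PR: timeout raised to 30 seconds

def github_api_request(url, headers=None):
    """GET a GitHub API URL, retrying on connection errors.

    Returns the parsed JSON body, or None if all retries fail.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = requests.get(url, headers=headers, timeout=TIMEOUT_SECONDS)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as exc:
            # Deliberately avoid touching `response` here: the request
            # may have failed before it was ever assigned.
            print(f"Warning: attempt {attempt}/{MAX_RETRIES} failed: {exc}")
            if attempt < MAX_RETRIES:
                time.sleep(2 ** attempt)  # simple backoff between retries
    return None
```

Catching `requests.exceptions.RequestException` covers timeouts, connection errors, and HTTP errors raised by `raise_for_status()` in one handler, so a transient failure is logged as a warning and retried rather than killing the process.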

Testing

  • Ran the script locally; verified it now catches timeouts (logging them as warnings) instead of crashing the entire process.

