In crawler.py, there's this code:
try:
    response = urllib.request.urlopen(url)
except:
    print('404 error')
    return
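However, a 404 is not the only failure urllib.request.urlopen() can report. It raises urllib.error.HTTPError for any HTTP error status and urllib.error.URLError for connection-level problems, and the bare except labels all of them as a 404.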
Solution:
try:
    response = urllib.request.urlopen(url)
except Exception as e:
    # Print the caught exception instance, not the Exception class itself
    print(repr(e))
    return
This prints the actual exception instead of a hard-coded 404 message, which avoids confusing users. The closed issue about 404 errors for sites that do exist is an example: those failures can come from other errors, such as an SSL certificate issue.
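
If more readable messages are wanted, one possible refinement is to tell HTTP status errors apart from connection-level errors. The sketch below is only an illustration: `describe_failure` and `fetch` are hypothetical names and are not part of crawler.py, and the message wording is an assumption.

```python
import urllib.error
import urllib.request


def describe_failure(exc):
    """Return a short description of an exception raised by urlopen()."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP error {exc.code}: {exc.reason}"
    if isinstance(exc, urllib.error.URLError):
        # Covers DNS failures, refused connections, SSL certificate errors, etc.
        return f"Could not reach server: {exc.reason!r}"
    return f"Request failed: {exc!r}"


def fetch(url):
    """Hypothetical helper: return the response, or None after reporting the failure."""
    try:
        return urllib.request.urlopen(url)
    except (urllib.error.URLError, ValueError) as exc:
        # ValueError is raised for malformed URLs such as an unknown URL type.
        print(f"{url}: {describe_failure(exc)}")
        return None
```

With this, a real 404 is reported with its status code, while an SSL certificate problem shows up as a connection error with the underlying reason rather than as a misleading 404.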