Python script that crawls WARC data to find which sites disallow Googlebot.
Feel free to submit a pull request with any improvements.
Source for crawl data: https://commoncrawl.org/the-data/get-started/
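
As a rough illustration of the core check, here is a minimal sketch that uses only the Python standard library. It assumes the robots.txt bodies have already been extracted from the WARC records as `(host, text)` pairs. The function names `disallows_googlebot` and `sites_disallowing_googlebot` are hypothetical and not taken from this repository's script, and "disallow" is interpreted here as blocking Googlebot from the site root `/`, which may be narrower or broader than what the actual script checks.

```python
from urllib.robotparser import RobotFileParser


def disallows_googlebot(robots_txt: str) -> bool:
    """Return True if this robots.txt blocks Googlebot from the site root.

    Hypothetical helper: the real script may apply a different rule.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", "/")


def sites_disallowing_googlebot(records):
    """Filter (host, robots_txt) pairs down to hosts that block Googlebot.

    `records` is assumed to come from a separate WARC-reading step.
    """
    return [host for host, text in records if disallows_googlebot(text)]


if __name__ == "__main__":
    # Made-up sample data standing in for records pulled from a crawl.
    sample = [
        ("blocked.example", "User-agent: Googlebot\nDisallow: /\n"),
        ("open.example", "User-agent: *\nDisallow:\n"),
    ]
    for host in sites_disallowing_googlebot(sample):
        print(host)
```

Reading the WARC files themselves is left out of the sketch; a third-party WARC reader or a custom parser would need to supply the `(host, text)` pairs.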