Investigate Responsible Web Scraping enhancements #3

@dglttr

Description

Consider adding functionality related to responsible web scraping/crawling.

  • Respect Crawl-delay and other non-standard directives in robots.txt
  • Include info from response headers and meta tags (see here)
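For the robots.txt item, here is a minimal sketch of what reading the non-standard directives could look like, using only Python's standard-library `urllib.robotparser`. The `politeness_rules` helper, the `default_delay` fallback, and the sample robots.txt content are hypothetical and only for illustration; they are not part of this project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Request-rate: 1/10
Disallow: /private/
"""


def politeness_rules(robots_txt: str, user_agent: str, default_delay: float = 1.0) -> dict:
    """Extract crawl-delay and request-rate hints for one user agent.

    `default_delay` is an assumed fallback for sites that declare no Crawl-delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    delay = parser.crawl_delay(user_agent)   # None if the directive is absent
    rate = parser.request_rate(user_agent)   # named tuple (requests, seconds) or None

    return {
        "crawl_delay": float(delay) if delay is not None else default_delay,
        "request_rate": (rate.requests, rate.seconds) if rate is not None else None,
        "may_fetch_private": parser.can_fetch(user_agent, "https://example.com/private/page"),
    }


rules = politeness_rules(ROBOTS_TXT, "mybot")
```

A scraper could then sleep for `rules["crawl_delay"]` seconds between requests to the same host. Whether to also honor `Request-rate`, and how to combine the two, is a design decision this issue would need to settle.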

Reference: https://www.zyte.com/blog/how-to-crawl-the-web-politely-with-scrapy/
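For the response-header and meta-tag item, one possible reading is the `X-Robots-Tag` HTTP header and the `<meta name="robots">` tag. That interpretation is an assumption, since the linked page is not reproduced in this issue. Under that assumption, a rough standard-library sketch could collect the directives from both places. `robots_directives` and its helpers are hypothetical names.

```python
from html.parser import HTMLParser


def _split_directives(value: str) -> set:
    """Split a comma-separated directive string into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= _split_directives(attr.get("content", ""))


def robots_directives(headers: dict, html: str) -> set:
    """Merge directives from the X-Robots-Tag header and robots meta tags.

    Simplification: user-agent-scoped header values such as
    "googlebot: noindex" are not handled specially here.
    """
    directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives |= _split_directives(value)

    parser = _RobotsMetaParser()
    parser.feed(html)
    return directives | parser.directives


found = robots_directives(
    {"X-Robots-Tag": "noindex, nofollow"},
    '<html><head><meta name="robots" content="noarchive"></head></html>',
)
```

A scraper could skip storing a page when `"noindex"` is in the result, or avoid following its links when `"nofollow"` is present. How strictly to act on each directive is left open here.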

Labels: enhancement (New feature or request)
