scrawler already records the `steps_from_start_page` of individual websites. From this, a tree structure could be extracted that records at which depth (measured from the start page) each page appears and under which parent page it was found. This tree could then be saved, for example, as an XML file.
This could be useful for understanding the structure of web domains.
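A minimal sketch of the idea, assuming crawl results are available as hypothetical `(url, parent_url, steps_from_start_page)` tuples. scrawler's actual output format may differ, and `build_site_tree` is an illustrative name, not part of scrawler's API. Pages are nested under the page they were found on, and each `<page>` element carries its depth as an attribute:

```python
import xml.etree.ElementTree as ET


def build_site_tree(records):
    """Build an XML tree in which each <page> is nested under its parent page.

    records: iterable of hypothetical (url, parent_url, steps_from_start_page)
    tuples. The start page has parent_url None. Assumes every parent_url also
    appears as a url in records.
    """
    elements = {}
    root = None
    # Process pages in order of depth so every parent exists before its children.
    for url, parent, depth in sorted(records, key=lambda r: r[2]):
        if parent is None:
            elem = ET.Element("page", url=url, depth=str(depth))
            root = elem
        else:
            elem = ET.SubElement(elements[parent], "page", url=url, depth=str(depth))
        elements[url] = elem
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    records = [
        ("https://example.com/", None, 0),
        ("https://example.com/about", "https://example.com/", 1),
        ("https://example.com/blog", "https://example.com/", 1),
        ("https://example.com/blog/post-1", "https://example.com/blog", 2),
    ]
    tree = build_site_tree(records)
    ET.indent(tree)  # pretty-print (Python 3.9+)
    tree.write("site_structure.xml", encoding="utf-8", xml_declaration=True)
```

Because a page can be linked from several parents, this sketch attaches each page only to the one parent recorded for it. Keeping all incoming links would turn the tree into a graph, which XML represents less naturally.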