What is the best way to skip a URL during a crawl if it has already been crawled? #2
Hello Ashwanthkumar,
Thanks for your post and for your quick responses to my previous questions.
I have a new question: how can I skip specific URLs that were already crawled in a previous run? I want to crawl a particular website, but my first crawl stopped partway through. When I rerun the process, I want to skip the URLs that were already crawled the first time, because I don't want to hit the website again for those URLs.
I found this code in the `phpcrawler.class.php` file, which appears to be where each URL is requested:
```php
// Request URL (crawl())
unset($page_data);

// Fall back to an empty referer if none was recorded for this URL
if (!isset($this->urls_to_crawl[$pri_level][$key]["referer_url"]))
{
    $this->urls_to_crawl[$pri_level][$key]["referer_url"] = "";
}

$page_data = $this->pageRequest->receivePage(
    $this->urls_to_crawl[$pri_level][$key]["url_rebuild"],
    $this->urls_to_crawl[$pri_level][$key]["referer_url"]
);
```
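A minimal sketch of one possible approach, under stated assumptions: keep a persistent record of every URL that has been fetched, reload it when the crawler restarts, and consult it just before the `receivePage()` call. The `CrawledUrlStore` class, its `hasCrawled()`/`markCrawled()` methods and the `crawled_urls_demo.txt` file are hypothetical illustrations, not part of phpcrawler's API; the store simply keeps one URL per line in a plain-text file.

```php
<?php
// HYPOTHETICAL helper (not part of phpcrawler): remembers which URLs were
// already crawled by storing one URL per line in a plain-text file, so a
// restarted crawl can skip them without requesting them again.
class CrawledUrlStore
{
    /** @var array<string, bool> URLs seen so far, used as a hash set */
    private array $seen = [];
    private string $file;

    public function __construct(string $file)
    {
        $this->file = $file;
        if (is_file($file)) {
            // Load URLs recorded by a previous (possibly interrupted) run.
            $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
            foreach ($lines as $line) {
                $this->seen[$line] = true;
            }
        }
    }

    public function hasCrawled(string $url): bool
    {
        return isset($this->seen[$url]);
    }

    public function markCrawled(string $url): void
    {
        if ($this->hasCrawled($url)) {
            return;
        }
        $this->seen[$url] = true;
        // Append immediately so progress survives a crash mid-crawl.
        file_put_contents($this->file, $url . "\n", FILE_APPEND | LOCK_EX);
    }
}

// Usage: simulate an interrupted first run followed by a restart.
$file = sys_get_temp_dir() . '/crawled_urls_demo.txt';
if (is_file($file)) {
    unlink($file);
}

// First run: fetches two pages, then "stops".
$first = new CrawledUrlStore($file);
$first->markCrawled('http://example.com/a');
$first->markCrawled('http://example.com/b');

// Second run: reloads the file and skips what was already fetched.
$second = new CrawledUrlStore($file);
foreach (['http://example.com/a', 'http://example.com/b', 'http://example.com/c'] as $url) {
    if ($second->hasCrawled($url)) {
        continue; // this is where the receivePage() call would be skipped
    }
    echo "fetching $url\n"; // prints "fetching http://example.com/c"
    $second->markCrawled($url);
}
```

Inside `crawl()`, the idea would be to test `$this->urls_to_crawl[$pri_level][$key]["url_rebuild"]` with `hasCrawled()` before the request, move on to the next URL if it returns true, and call `markCrawled()` once `receivePage()` has succeeded. Exactly how to skip depends on the surrounding loop, which is not shown in the excerpt.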