Added simple implementation of code to skip URLs already processed #71
This is a simple implementation of a feature to skip URLs that have already been processed (Issue #70). It is relatively naive, but should be useful.
It adds a new command-line option (-k or --skipexisting). When enabled, quickscrape checks whether the output folder it would use for a URL already exists and, if so, skips that URL. It also skips the rate-limiting at that point (there is no need to rate-limit when nothing has actually been downloaded) and reinstates it the next time it actually downloads a URL.

This is my first PR written in JavaScript, so I may have done some completely stupid things! Feedback would be greatly appreciated.
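For reviewers, here is a rough sketch of the skip logic described above. It is not the code in this PR: `outputDirFor` and `planScrape` are hypothetical names, the URL-to-folder mapping is an assumption, and the sketch only plans what would happen for each URL rather than downloading anything.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (assumption): derive an output folder name from a URL
// by replacing characters that are awkward in file names.
function outputDirFor(baseDir, url) {
  return path.join(baseDir, url.replace(/[^a-zA-Z0-9.-]+/g, '_'));
}

// For each URL, decide whether to skip or download it, and whether the
// rate-limit delay should apply afterwards. Skipped URLs get no delay,
// because nothing was fetched for them.
function planScrape(urls, baseDir, skipExisting) {
  return urls.map((url) => {
    const dir = outputDirFor(baseDir, url);
    if (skipExisting && fs.existsSync(dir)) {
      return { url, action: 'skip', rateLimit: false };
    }
    return { url, action: 'download', rateLimit: true };
  });
}
```

With the option disabled, every URL is planned as a download with rate-limiting, which matches the behaviour before this change.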