Open
Labels: bug (Something isn't working)
Description
Currently I fetch the 500 oldest links in the HTML cache; if any of them is accessed, the cache is ignored and the page is rescraped. A large share of those entries seem to be outdated or invalid, though, so only around 120 appear to have been rescraped in the last run. With 2500 courses per semester, updating the last 2 semesters takes far too long.
This issue consists of two things:
- Adding a script to clean up all URLs in the cache that will never be accessed.
- Adjusting the system for rescraping, potentially by making use of the `flagged` attribute. It might make sense to simply flag pages to be rescraped and check that at most 500 are actually rescraped.
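A rough sketch of how the two parts could fit together, in Python for illustration only. Everything named here is an assumption and not the project's actual code: `CacheEntry`, `scraped_at`, `prune_unreachable`, `flag_oldest`, and `RescrapeBudget` are hypothetical, the cache is modelled as a plain dict keyed by URL, and only the `flagged` attribute and the limit of 500 come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

MAX_RESCRAPES_PER_RUN = 500  # limit mentioned in the issue


@dataclass
class CacheEntry:
    """Hypothetical shape of one HTML cache row."""
    url: str
    scraped_at: float      # Unix timestamp of the last scrape
    flagged: bool = False  # True means "rescrape on next access"


def prune_unreachable(cache: dict[str, CacheEntry],
                      reachable_urls: Iterable[str]) -> int:
    """Delete entries whose URL will never be requested; return how many were removed."""
    reachable = set(reachable_urls)
    stale = [url for url in cache if url not in reachable]
    for url in stale:
        del cache[url]
    return len(stale)


def flag_oldest(cache: dict[str, CacheEntry],
                limit: int = MAX_RESCRAPES_PER_RUN) -> list[str]:
    """Flag the `limit` oldest entries and return their URLs, oldest first."""
    oldest = sorted(cache.values(), key=lambda entry: entry.scraped_at)[:limit]
    for entry in oldest:
        entry.flagged = True
    return [entry.url for entry in oldest]


class RescrapeBudget:
    """Counts rescrapes that actually happen, not entries that were merely flagged."""

    def __init__(self, limit: int = MAX_RESCRAPES_PER_RUN) -> None:
        self.limit = limit
        self.used = 0

    def should_rescrape(self, entry: CacheEntry) -> bool:
        # Serve from cache if the entry is not flagged or the budget is spent.
        if not entry.flagged or self.used >= self.limit:
            return False
        self.used += 1
        entry.flagged = False  # the caller is expected to rescrape now
        return True
```

Running the cleanup first means the oldest entries that get flagged are ones that can still be accessed, so the per-run budget is not wasted on dead URLs. Entries that stay flagged after the budget is used up are simply picked up in a later run.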