Wr309567 crawl active courses #74

Open
kristian-94 wants to merge 7 commits into master from wr309567

kristian-94 (Contributor) commented Apr 1, 2019

This adds another option for limiting the scope of the crawler so that it focuses on active courses. When enabled, the crawler only crawls courses whose enddate is in the future. It also adds an option to crawl only courses that have a certain block enabled. This way we avoid crawling unnecessary pages, of which there could be many on a big site.
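The filtering described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this pull request: the `Course` record, the `active_courses` and `seed_urls` helpers, and their parameter names are all hypothetical, and treating a course with no end date (`enddate == 0`) as inactive is an assumption the description does not confirm.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Course:
    """Hypothetical stand-in for a course record; not the plugin's real schema."""
    id: int
    enddate: int  # Unix timestamp; 0 means "no end date set" in this sketch
    blocks: Set[str] = field(default_factory=set)  # names of blocks on the course


def active_courses(
    courses: Iterable[Course],
    now: int,
    only_future_enddate: bool = True,
    required_block: Optional[str] = None,
) -> List[Course]:
    """Return the courses the crawler should visit under the two options."""
    selected = []
    for course in courses:
        # Option 1: skip courses whose end date is not in the future.
        # Assumption: enddate == 0 is also skipped, since 0 <= now.
        if only_future_enddate and course.enddate <= now:
            continue
        # Option 2: skip courses that do not have the configured block.
        if required_block is not None and required_block not in course.blocks:
            continue
        selected.append(course)
    return selected


def seed_urls(courses: Iterable[Course], base_url: str) -> List[str]:
    """Build one seed URL per course (the URL pattern is an assumption)."""
    root = base_url.rstrip("/")
    return [f"{root}/course/view.php?id={c.id}" for c in courses]
```

With both options enabled, only courses that end in the future and carry the configured block would be seeded into the queue; disabling either option widens the set accordingly.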

Kristian Ringer added 4 commits April 1, 2019 15:07
    The SQL already returns a valid queue item, so we don't need to
    iterate through and validate queue items before crawling them.
… the queue

        We add all the recent courses once the crawler starts a new
        cycle. This is the appropriate place to add the seed URLs of
        each new course we want to start crawling.
kristian-94 and others added 3 commits April 2, 2019 12:18
  - Before parsing the HTML, we check whether this is a recent
  course, so we don't need a different queue query.
brendanheywood changed the title from "Wr309567" to "Wr309567 crawl active courses" on Mar 5, 2020