@ajparsons
Contributor

This PR replaces the previous scraper to handle the change to the London Mayor/Assembly website (mysociety/theyworkforyou#1687).

It also adds config files for Docker and for code linters. For now, the linters are restricted to the london-mayors-question folder.

The scraper talks to the London site in two places:

  • Scrapes the search to get the slugs of all questions in a time range.
  • Fetches the details from the question page.
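The two-step flow above can be sketched roughly as follows. This is a minimal illustration, not the scraper in this PR: `search_slugs` and `fetch_details` are hypothetical stand-ins for the HTTP calls to the London site (whose URLs and page structure aren't shown here), passed in as functions so the sketch runs without network access.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List

# Hypothetical signatures: stand-ins for the real search and detail-page requests.
SearchFn = Callable[[date, date], Iterable[str]]
DetailFn = Callable[[str], Dict[str, object]]


def scrape_range(start: date, end: date,
                 search_slugs: SearchFn,
                 fetch_details: DetailFn) -> List[Dict[str, object]]:
    """Step 1: search for question slugs in a date range.
    Step 2: fetch the detail page for each slug found."""
    # A slug may appear on more than one results page; de-duplicate, keeping order.
    slugs = list(dict.fromkeys(search_slugs(start, end)))
    return [fetch_details(slug) for slug in slugs]


if __name__ == "__main__":
    # Fake fetchers in place of real network calls.
    def fake_search(start: date, end: date) -> List[str]:
        return ["question-a", "question-b", "question-a"]

    def fake_details(slug: str) -> Dict[str, object]:
        return {"slug": slug, "answer": None}

    results = scrape_range(date(2023, 4, 10), date(2023, 4, 17),
                           fake_search, fake_details)
    print([r["slug"] for r in results])
```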

Because there is no way to know which questions have been answered, every stored question without an answer has to be re-queried to check for updates.
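A minimal sketch of that refresh rule, assuming each stored question is a dict with a hypothetical `answer` field that stays empty until an answer is published (the real cache fields may differ):

```python
from typing import Dict, List, Optional


def needs_refresh(questions: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """Return the slugs of stored questions that have no answer yet.

    The site gives no signal for which questions have been answered, so every
    question with a missing or empty answer is a candidate for re-querying.
    """
    return sorted(slug for slug, q in questions.items() if not q.get("answer"))


if __name__ == "__main__":
    stored = {
        "q-1": {"answer": "The Mayor replied ..."},
        "q-2": {"answer": None},
        "q-3": {},
    }
    print(needs_refresh(stored))
```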

The command for this looks something like:

questions.py fetch-unknown-questions --last-week fetch-unstored refresh-unanswered build-xml --outdir temp/
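That command chains several steps in a single invocation. The PR doesn't show how `questions.py` parses this; one stdlib-only way to support such chaining is to split the argument list at known command names and run each step in order. The handlers below are hypothetical placeholders that only record the options they receive.

```python
from typing import Callable, Collection, Dict, List, Tuple

Handler = Callable[[List[str]], None]


def make_handlers(log: List[str]) -> Dict[str, Handler]:
    """Hypothetical step handlers; each one just records the options it received."""
    return {
        "fetch-unknown-questions": lambda opts: log.append(f"fetch-unknown-questions {opts}"),
        "fetch-unstored": lambda opts: log.append(f"fetch-unstored {opts}"),
        "refresh-unanswered": lambda opts: log.append(f"refresh-unanswered {opts}"),
        "build-xml": lambda opts: log.append(f"build-xml {opts}"),
    }


def split_chain(argv: List[str], commands: Collection[str]) -> List[Tuple[str, List[str]]]:
    """Group argv into (command, options) pairs, starting a new group at each command name."""
    steps: List[Tuple[str, List[str]]] = []
    for arg in argv:
        if arg in commands:
            steps.append((arg, []))
        elif steps:
            steps[-1][1].append(arg)  # option belongs to the most recent command
        else:
            raise ValueError(f"option {arg!r} appears before any command")
    return steps


if __name__ == "__main__":
    argv = ("fetch-unknown-questions --last-week fetch-unstored "
            "refresh-unanswered build-xml --outdir temp/").split()
    log: List[str] = []
    handlers = make_handlers(log)
    for name, opts in split_chain(argv, handlers):
        handlers[name](opts)  # run each step in the order given
    print("\n".join(log))
```

Splitting on command names keeps each step's options separate, but it would misread an option value that happens to match a command name. Click's chained groups (`chain=True`) are a more robust off-the-shelf option for this style of CLI.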

A version of this command has replaced the commented-out lines in updatedaterange-parse.

Intermediate files are stored in a json_cache directory. An initial populate will need to be run to catch up:

questions.py fetch-unknown-questions 2020-12-20
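A minimal sketch of a per-question cache like the json_cache directory described above. The one-file-per-slug layout, the `<slug>.json` naming and the `JsonCache` class are assumptions for illustration; the PR doesn't describe the actual cache format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, Optional


class JsonCache:
    """Store one JSON file per question slug in a cache directory (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, slug: str) -> Path:
        return self.root / f"{slug}.json"

    def has(self, slug: str) -> bool:
        return self._path(slug).exists()

    def load(self, slug: str) -> Optional[Dict[str, Any]]:
        path = self._path(slug)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, slug: str, record: Dict[str, Any]) -> None:
        # Write to a temporary file, then replace, so an interrupted run
        # doesn't leave a truncated JSON file in the cache.
        tmp = self.root / f"{slug}.json.tmp"
        tmp.write_text(json.dumps(record, indent=2), encoding="utf-8")
        tmp.replace(self._path(slug))

    def slugs(self) -> Iterator[str]:
        # "*.json" doesn't match leftover "*.json.tmp" files.
        for path in sorted(self.root.glob("*.json")):
            yield path.stem


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = JsonCache(Path(d) / "json_cache")
        cache.save("q-1", {"slug": "q-1", "answer": None})
        print(cache.has("q-1"), cache.load("q-1"), list(cache.slugs()))
```

Keeping one file per question makes "fetch only what isn't stored yet" a cheap existence check, and lets a refresh overwrite a single question without rewriting the rest of the cache.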

There are some updates to the overall requirements.txt, which hopefully shouldn't cause wider problems.

Running the import for all data since 2020-12-20 seems to work fine in TWFY:

[screenshot]

ajparsons requested a review from dracos on April 18, 2023, 12:11