Scraper: 3-month backfill for missing minutes only #22

@AndreRobitaille

Description

Implement a constrained backfill policy to re-check recent meetings for missing minutes.

Acceptance criteria

  • The daily run backfills meetings from the past year that are missing minutes_pdf.
  • For those meetings, re-parse the detail page and (re)download docs as needed.
  • Backfill should be chunked or throttled to avoid timeouts (e.g., process in batches with delays between requests).
  • No automated scraping runs outside the daily cron job.
  • Document the policy in README or ops notes.
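
A minimal sketch of the selection and throttling described in the criteria above, written in Python purely for illustration (the project's actual language and models are not stated here). The `Meeting` class, `backfill_candidates`, `in_batches`, `run_backfill`, and the `recheck` callback are all hypothetical names; only `minutes_pdf` comes from this issue. The sketch pauses between batches; a per-request delay would work the same way.

```python
import time
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Iterator, List, Optional


@dataclass
class Meeting:
    # Hypothetical stand-in for the real meeting record.
    id: int
    held_on: date
    minutes_pdf: Optional[str] = None


def backfill_candidates(
    meetings: List[Meeting], today: date, window_days: int = 365
) -> List[Meeting]:
    """Return meetings inside the window that still lack minutes_pdf."""
    cutoff = today - timedelta(days=window_days)
    return [
        m for m in meetings
        if cutoff <= m.held_on <= today and m.minutes_pdf is None
    ]


def in_batches(items: List[Meeting], size: int) -> Iterator[List[Meeting]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_backfill(
    meetings: List[Meeting],
    recheck: Callable[[Meeting], None],
    today: date,
    batch_size: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-check each candidate, pausing between batches. Returns the count processed."""
    processed = 0
    candidates = backfill_candidates(meetings, today)
    for index, batch in enumerate(in_batches(candidates, batch_size)):
        if index > 0:
            sleep(delay_seconds)  # throttle: wait before every batch after the first
        for meeting in batch:
            recheck(meeting)  # assumed to re-parse the detail page and fetch docs
            processed += 1
    return processed
```

The `sleep` parameter is injectable so the throttle can be exercised in tests without waiting; the batch size and delay are placeholder values, not recommendations.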

Notes

  • Uses two-rivers.org as the source of truth.
  • 1-year window catches late-posted minutes and historical gaps.
  • Must be resilient to timeouts — consider batching or enqueueing individual meeting re-checks as separate jobs.
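
One way to read the "separate jobs" suggestion above, sketched in Python with an in-memory queue standing in for whatever job system the project uses. `enqueue_rechecks`, `drain`, and the `recheck` callback are hypothetical names. The point is only that a timeout on one meeting is recorded and skipped so the remaining re-checks still run.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def enqueue_rechecks(meeting_ids: Iterable[int]) -> Deque[int]:
    """Queue one re-check job per meeting id."""
    return deque(meeting_ids)


def drain(
    queue: Deque[int], recheck: Callable[[int], None]
) -> Tuple[List[int], List[int]]:
    """Run every queued job; a timeout fails only that job, not the run."""
    succeeded: List[int] = []
    failed: List[int] = []
    while queue:
        meeting_id = queue.popleft()
        try:
            recheck(meeting_id)
        except TimeoutError:
            # Leave this meeting for the next daily run to pick up again.
            failed.append(meeting_id)
        else:
            succeeded.append(meeting_id)
    return succeeded, failed
```

Because a failed meeting still has no minutes_pdf, it would be selected again by the next daily backfill without any extra retry bookkeeping.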
