Summary
The scraper jobs parse the Two Rivers city website HTML to discover meetings and extract agenda data. If the city website changes its HTML structure (new CMS, redesign, layout tweaks), the scrapers will produce incorrect or empty results without raising an error or triggering any alert.
Current Risk
- Scrapers::DiscoverMeetingsJob parses table rows from the meetings listing page
- Scrapers::ParseMeetingPageJob parses detail page structure for agenda items, documents, motions
- Scrapers::ParseAgendaJob parses agenda HTML format
- None of these have tests or structural validation
- A website change could result in zero meetings discovered, missing documents, or lost agenda items, all without any error or warning
Proposed Mitigations
1. Structural Assertions in Scraper Jobs
Add validation checks that raise an error or log a warning when expected HTML elements are missing:
- Meetings page: Assert table with expected columns exists
- Detail page: Assert expected sections (agenda, documents, motions) are present
- Log warnings when a scrape run produces zero results or significantly fewer results than previous runs
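A minimal sketch of the meetings-page assertion, to make the idea concrete. The error class, the method names, and the column headers are all hypothetical, since this issue does not list the real headers. It uses String#scan only to stay dependency-free; the real jobs would presumably reuse whatever HTML parser they already have.

```ruby
# Hypothetical error class; the real jobs may prefer logging over raising.
class ScraperStructureError < StandardError; end

# Assumed column headers. The real listing page's headers are not given in this issue.
EXPECTED_MEETING_COLUMNS = ["Date", "Meeting", "Agenda"].freeze

# Returns the text of every <th> cell in the page, with nested tags stripped.
def table_headers(html)
  html.scan(%r{<th\b[^>]*>(.*?)</th>}mi).map { |(cell)| cell.gsub(/<[^>]+>/, "").strip }
end

# Raises when any expected column is absent, so a layout change fails loudly
# instead of quietly yielding an empty result set.
def assert_meeting_table!(html, expected: EXPECTED_MEETING_COLUMNS)
  missing = expected - table_headers(html)
  return true if missing.empty?

  raise ScraperStructureError, "meetings table is missing columns: #{missing.join(', ')}"
end
```

Calling something like this at the top of the discovery job would turn a redesign into a visible job failure rather than a run that finds nothing.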
2. Canary Test (Integration)
A test that hits the live Two Rivers website and validates that the HTML structure matches what the scrapers expect:
- Run periodically, not on every CI run, where it would be too slow and fragile
- Validates: page loads, expected table structure exists, at least N meetings found
- Can be triggered manually:
bin/rails test test/integration/scraper_canary_test.rb
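A rough sketch of what the canary could check, assuming the listing page is a plain HTML table with one meeting per data row. The method names and the SCRAPER_CANARY_URL environment variable are hypothetical; the real URL is not stated in this issue.

```ruby
require "net/http"
require "uri"

# Returns a list of human-readable problems. An empty list means the page
# still looks the way the scrapers expect. The checks are deliberately loose:
# a table exists and it has enough rows containing <td> cells.
def canary_problems(html, min_meetings: 1)
  problems = []
  problems << "no <table> element found" unless html.match?(/<table\b/i)

  data_rows = html.scan(%r{<tr\b.*?</tr>}mi).count { |row| row.match?(/<td\b/i) }
  if data_rows < min_meetings
    problems << "found #{data_rows} meeting rows, expected at least #{min_meetings}"
  end
  problems
end

# Fetches the live listing page. Redirects are not followed in this sketch.
def fetch_listing(url = ENV.fetch("SCRAPER_CANARY_URL"))
  response = Net::HTTP.get_response(URI(url))
  raise "listing page returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

Inside test/integration/scraper_canary_test.rb, a single test could then assert that canary_problems(fetch_listing, min_meetings: 5) is empty, where 5 is only an illustrative value for N.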
3. Monitoring/Alerting
- Track meetings discovered per scrape run
- Alert if a run discovers zero meetings (likely structural change)
- Alert if document download rate drops significantly
- Consider a simple admin dashboard metric or Solid Queue job failure tracking
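One possible shape for the zero-result and drop checks, as a sketch only. The method names are hypothetical, the 0.5 threshold is an assumed definition of "significantly", and Logger stands in for whatever alerting channel the app ends up using.

```ruby
require "logger"

# Classifies a scrape run by comparing its meeting count with recent runs.
# drop_ratio is an assumed threshold: 0.5 flags a run that finds fewer than
# half of the recent average.
def classify_scrape_run(current_count, previous_counts, drop_ratio: 0.5)
  return :zero_results if current_count.zero?
  return :ok if previous_counts.empty?

  baseline = previous_counts.sum.to_f / previous_counts.size
  current_count < baseline * drop_ratio ? :significant_drop : :ok
end

# Logs the classification at a matching severity and returns it.
def report_scrape_run(current_count, previous_counts, logger: Logger.new($stderr))
  status = classify_scrape_run(current_count, previous_counts)
  case status
  when :zero_results
    logger.error("scrape discovered zero meetings; the site structure may have changed")
  when :significant_drop
    logger.warn("scrape discovered #{current_count} meetings, well below recent runs")
  else
    logger.info("scrape discovered #{current_count} meetings")
  end
  status
end
```

The same comparison would apply to the document download rate if each run also recorded how many downloads succeeded.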
4. Fixture-Based Regression Tests
- Save snapshots of real HTML pages as test fixtures
- Run scraper parsing against fixtures to catch regressions
- Update fixtures when intentional changes are made
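A sketch of the fixture approach, under the assumption that snapshots live somewhere like test/fixtures/files/scrapers (a hypothetical path). parse_meeting_rows is a stand-in for the real parsing logic in the jobs, and the two-column row shape is assumed.

```ruby
# Hypothetical fixture directory for saved HTML snapshots.
FIXTURE_DIR = File.join("test", "fixtures", "files", "scrapers")

# Reads a saved snapshot from disk.
def read_fixture(name, dir: FIXTURE_DIR)
  File.read(File.join(dir, name))
end

# Stand-in parser: one hash per table row that has at least two <td> cells.
# Header rows contain no <td> cells, so they are skipped.
def parse_meeting_rows(html)
  html.scan(%r{<tr\b.*?</tr>}mi).filter_map do |row|
    cells = row.scan(%r{<td\b[^>]*>(.*?)</td>}mi).map { |(cell)| cell.gsub(/<[^>]+>/, "").strip }
    { date: cells[0], title: cells[1] } if cells.size >= 2
  end
end
```

A regression test would then compare parse_meeting_rows(read_fixture("meetings_listing.html")) against the rows known to be in that snapshot. In a Rails test case, the built-in file_fixture helper could replace read_fixture.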
Related