
feat(contests): Cache full contest details from WA7BNM #259

Draft
arunderwood wants to merge 2 commits into main from arunderwood/contest-details

arunderwood commented Jan 2, 2026

Summary

Implements GitHub issue #196: adds a daily background job that scrapes WA7BNM contest detail pages and populates bands, modes, sponsor, officialRulesUrl, and extended metadata.

Key changes:

  • New ContestSeries entity to model WA7BNM's series concept (deduplicates scraping)
  • Daily scheduled task (ContestSeriesRefreshTask) runs at 4am UTC
  • ContestSeriesClient scrapes detail pages using Jsoup
  • Change detection via "Revision Date" parsing to avoid redundant scraping
  • Resilience4j circuit breaker and retry for fault tolerance - Avoids placing additional load on WA7BNM when requests fail
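
As an illustration of the "Revision Date" change detection, here is a minimal sketch in plain Java. The class name `RevisionDateCheck`, the method names, and the date format (`December 15, 2025` style) are assumptions for illustration; the actual page format and the logic in ContestSeriesClient may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not the class from this PR.
final class RevisionDateCheck {
    // Assumed format, e.g. "December 15, 2025"; the real page may use another one.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH);

    private RevisionDateCheck() {}

    /** Parses the raw field text; returns empty if it is missing or unparseable. */
    static Optional<LocalDate> parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(raw.trim(), FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /**
     * Returns true when the stored series should be updated: nothing is stored yet,
     * the scraped date is unknown, or the scraped date is newer than the stored one.
     */
    static boolean needsUpdate(LocalDate stored, Optional<LocalDate> scraped) {
        if (stored == null || scraped.isEmpty()) {
            return true;
        }
        return scraped.get().isAfter(stored);
    }
}
```

Treating an unparseable date as "needs update" errs on the side of refreshing rather than silently keeping stale data.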

WA7BNM Integration Details

Integration Overview

NextSkip integrates with WA7BNM Contest Calendar in two ways:

  1. iCal feed consumption (existing, every 6 hours) - Gets contest names, dates, and URLs
  2. Detail page scraping (new in this PR, daily at 04:00 UTC, off-peak) - Enriches with bands, modes, sponsor, rules URL
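
To make the daily 04:00 UTC schedule concrete, the following sketch computes the next run time with `java.time` only. It is not the db-scheduler configuration used in this PR; `DailyUtcSchedule` and `nextRun` are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper showing how a "daily at 04:00 UTC" run time can be derived.
final class DailyUtcSchedule {
    private DailyUtcSchedule() {}

    /** Returns the first instant strictly after {@code now} that falls on the given UTC time of day. */
    static Instant nextRun(Instant now, LocalTime timeOfDayUtc) {
        ZonedDateTime nowUtc = now.atZone(ZoneOffset.UTC);
        ZonedDateTime candidate = nowUtc.toLocalDate().atTime(timeOfDayUtc).atZone(ZoneOffset.UTC);
        if (!candidate.isAfter(nowUtc)) {
            // Today's slot has already passed (or is exactly now), so use tomorrow's.
            candidate = candidate.plusDays(1);
        }
        return candidate.toInstant();
    }
}
```

Pinning the calculation to UTC keeps the run time independent of the server's local time zone and of daylight saving changes.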

Rate Limiting

  • 5-second minimum delay between requests (configurable via nextskip.contests.series.rate-limit-seconds)
  • Daily batch schedule - Not continuous scraping
  • Change detection via revision date - Avoids re-scraping unchanged content
  • Circuit breaker - Stops requests to WA7BNM during outages (remains open for 300 s)
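
The minimum delay between requests can be sketched as below. This is an assumption about one way to enforce it, not the code in this PR; `MinDelayRateLimiter` and its methods are hypothetical, and the 5-second value would come from nextskip.contests.series.rate-limit-seconds.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical limiter enforcing a minimum gap between consecutive requests.
final class MinDelayRateLimiter {
    private final long minDelayNanos;
    private boolean hasPrevious;
    private long lastRequestNanos;

    MinDelayRateLimiter(Duration minDelay) {
        this.minDelayNanos = minDelay.toNanos();
    }

    /** Returns how long a caller would still have to wait at {@code nowNanos}. */
    synchronized Duration remainingDelay(long nowNanos) {
        if (!hasPrevious) {
            return Duration.ZERO; // the first request never waits
        }
        long elapsed = nowNanos - lastRequestNanos;
        return Duration.ofNanos(Math.max(0L, minDelayNanos - elapsed));
    }

    /** Records that a request was sent at {@code nowNanos}. */
    synchronized void markRequest(long nowNanos) {
        hasPrevious = true;
        lastRequestNanos = nowNanos;
    }

    /** Blocks until the minimum delay has elapsed, then records the request. */
    synchronized void acquire() throws InterruptedException {
        Duration wait = remainingDelay(System.nanoTime());
        if (!wait.isZero()) {
            // Sleeping while holding the lock is intentional: callers are served one at a time.
            TimeUnit.NANOSECONDS.sleep(wait.toNanos());
        }
        markRequest(System.nanoTime());
    }
}
```

A scraper loop would call `acquire()` before each HTTP request; the time-parameterised methods exist so the logic can be tested without real sleeping.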

Data Usage

  • Bands, modes, exchange format, rules URL for each contest series
  • All contest data links back to contestcalendar.com via calendarSourceUrl
  • No data redistribution - Used only to enrich NextSkip dashboard display

Technical Implementation

  • Jsoup for HTML parsing (no JavaScript execution needed)
  • Resilience4j for fault tolerance (circuit breaker + retry)
  • PostgreSQL persistence with ContestSeries entity
  • db-scheduler for recurring task management
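
For readers unfamiliar with the circuit breaker pattern, the sketch below shows the state transitions in plain Java. It is a hand-rolled illustration, not Resilience4j and not the configuration in this PR: the class `SimpleCircuitBreaker`, the consecutive-failure threshold, and the method names are assumptions. Only the 300-second open state comes from the description above.

```java
import java.time.Duration;

// Hypothetical, simplified breaker; Resilience4j's real implementation is more elaborate.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openNanos;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtNanos;

    SimpleCircuitBreaker(int failureThreshold, Duration waitInOpenState) {
        this.failureThreshold = failureThreshold;
        this.openNanos = waitInOpenState.toNanos();
    }

    /** Returns true if a request may be sent at {@code nowNanos}. */
    synchronized boolean allowRequest(long nowNanos) {
        if (state == State.OPEN && nowNanos - openedAtNanos >= openNanos) {
            state = State.HALF_OPEN; // wait time is over: permit trial requests
        }
        return state != State.OPEN;
    }

    /** A successful call closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed trial call, or too many consecutive failures, opens the breaker. */
    synchronized void recordFailure(long nowNanos) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtNanos = nowNanos;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

While the breaker is open, the scraper sends nothing to the remote site, which is the property the PR relies on to avoid adding load during an outage.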

Files Changed

New Files

| File | Purpose |
|---|---|
| 008-contest-series-table.yaml | Database migration for contest_series table |
| ContestSeriesEntity.java | JPA entity for series metadata |
| ContestSeriesRepository.java | Data access layer |
| ContestSeriesDto.java | DTO for scraped data |
| ContestSeriesClient.java | HTML scraper with Jsoup |
| ContestSeriesRefreshTask.java | Daily scheduled task |
| ContestSeriesClientTest.java | Unit tests with WireMock |
| ContestSeriesRefreshTaskTest.java | Task unit tests |
| ContestSeriesEntityIntegrationTest.java | Integration tests |

Modified Files

| File | Change |
|---|---|
| ContestEntity.java | Add wa7bnmRef field |
| ContestRefreshService.java | Extract wa7bnmRef from URL |
| ContestRepository.java | Add query for distinct refs |
| application.yml | Add circuit breaker/retry config |
| build.gradle | Add Jsoup dependency |
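
The "Extract wa7bnmRef from URL" change could look roughly like the sketch below. It assumes detail URLs carry the series identifier in a numeric `ref` query parameter (for example `contestdetails.php?ref=123`) and that the ref is kept as a string; both are assumptions, and `Wa7bnmRefExtractor` is a hypothetical name rather than code from ContestRefreshService.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the URL shape is an assumption about WA7BNM detail links.
final class Wa7bnmRefExtractor {
    // Matches a numeric "ref" parameter anywhere in the query string.
    private static final Pattern REF_PARAM = Pattern.compile("(?:^|&)ref=(\\d+)(?:&|$)");

    private Wa7bnmRefExtractor() {}

    /** Returns the value of the "ref" query parameter, or empty if absent or malformed. */
    static Optional<String> extractRef(String url) {
        if (url == null || url.isBlank()) {
            return Optional.empty();
        }
        try {
            String query = URI.create(url.trim()).getQuery();
            if (query == null) {
                return Optional.empty();
            }
            Matcher matcher = REF_PARAM.matcher(query);
            return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
        } catch (IllegalArgumentException e) {
            // URI.create rejects syntactically invalid URLs.
            return Optional.empty();
        }
    }
}
```

Returning `Optional` lets the caller leave wa7bnmRef unset for contests whose calendar URL does not match the expected shape.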

Test Plan

  • Unit tests for ContestSeriesClient parsing (87%+ branch coverage)
  • Unit tests for ContestSeriesRefreshTask logic (91%+ branch coverage)
  • Integration tests for ContestSeriesEntity persistence
  • All existing tests pass
  • Delta coverage meets 80% threshold (95.95% lines, 88.19% branches)
  • Manual verification: Run ./gradlew bootRun and verify no startup errors
  • Manual verification: Trigger series refresh and check logs

Add daily background job to scrape WA7BNM Contest Calendar detail pages
and populate bands, modes, sponsor, officialRulesUrl, and extended
metadata for contest series.

Key changes:
- Add ContestSeriesEntity to model WA7BNM's series concept
- Add ContestSeriesClient HTML scraper with Jsoup for parsing
- Add ContestSeriesRefreshTask scheduled daily at 4am UTC
- Add wa7bnmRef field to ContestEntity for series linkage
- Add database migration for contest_series table
- Configure Resilience4j circuit breaker for fault tolerance

Rate limiting: 5 seconds between requests to be respectful to WA7BNM.
Change detection: Uses "Revision Date" field to skip unchanged series.

Closes #196

Add comprehensive tests for fallback parsing paths:
- Definition list format (dt/dd elements)
- Bold text format (b/strong elements)
- parseContestName fallback when no h1 element
- parseRulesUrl fallback for non-table links
- Edge cases for empty fields and null parents

Extract duplicate string literals to constants.

codecov bot commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 91.27726% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.24%. Comparing base (3a3383f) to head (2937109).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...extskip/contests/internal/ContestSeriesClient.java | 89.77% | 6 Missing and 12 partials ⚠️ |
| ...s/internal/scheduler/ContestSeriesRefreshTask.java | 89.33% | 7 Missing and 1 partial ⚠️ |
| ...xtskip/contests/internal/dto/ContestSeriesDto.java | 50.00% | 0 Missing and 2 partials ⚠️ |

@@             Coverage Diff              @@
##               main     #259      +/-   ##
============================================
- Coverage     92.33%   92.24%   -0.10%     
  Complexity      768      768              
============================================
  Files           111      115       +4     
  Lines          2441     2759     +318     
  Branches        291      342      +51     
============================================
+ Hits           2254     2545     +291     
- Misses          134      146      +12     
- Partials         53       68      +15     
| Flag | Coverage Δ |
|---|---|
| backend | 92.97% <91.27%> (-0.23%) ⬇️ |
| frontend | 87.90% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| ...ests/internal/scheduler/ContestRefreshService.java | 94.87% <100.00%> (+1.53%) ⬆️ |
| ...kip/contests/persistence/entity/ContestEntity.java | 100.00% <100.00%> (ø) |
| ...ntests/persistence/entity/ContestSeriesEntity.java | 100.00% <100.00%> (ø) |
| ...xtskip/contests/internal/dto/ContestSeriesDto.java | 50.00% <50.00%> (ø) |
| ...s/internal/scheduler/ContestSeriesRefreshTask.java | 89.33% <89.33%> (ø) |
| ...extskip/contests/internal/ContestSeriesClient.java | 89.77% <89.77%> (ø) |

... and 1 file with indirect coverage changes
