Add YouTube transcript ingestion for same-day council meeting summaries #86
Merged
AndreRobitaille merged 12 commits into master on Apr 9, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t council meetings
Queries for council and work session meetings within 48 hours, fetches the Two Rivers WI YouTube channel stream list via yt-dlp, matches video titles to meeting dates, and enqueues Documents::DownloadTranscriptJob for each match. Also adds a stub DownloadTranscriptJob as a placeholder for Task 4.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
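A minimal sketch of the title-to-date matching step described in this commit. The PR does not show the channel's title format or the job's internals, so the two date patterns, the `date_from_title` and `match_videos_to_meetings` helpers, and the `{ title:, url: }` hash shape are all assumptions made for illustration.

```ruby
require "date"

# Assumed title formats; the channel's real naming scheme is not shown in the PR.
NUMERIC_DATE = %r{\b(\d{1,2})/(\d{1,2})/(\d{4})\b}          # e.g. 4/6/2026
LONG_DATE    = /\b([A-Z][a-z]+)\s+(\d{1,2}),\s*(\d{4})\b/   # e.g. April 6, 2026

# Hypothetical helper: returns a Date parsed from a stream title, or nil.
def date_from_title(title)
  if (m = title.match(NUMERIC_DATE))
    Date.new(m[3].to_i, m[1].to_i, m[2].to_i)
  elsif (m = title.match(LONG_DATE))
    Date.strptime("#{m[1]} #{m[2]} #{m[3]}", "%B %d %Y")
  end
rescue ArgumentError
  # Date::Error is a subclass of ArgumentError; treat unparseable dates as no match.
  nil
end

# Hypothetical helper: maps each meeting date to the first video whose title carries it.
# `videos` is an array of { title:, url: } hashes; `meeting_dates` is an array of Date.
def match_videos_to_meetings(videos, meeting_dates)
  videos.each_with_object({}) do |video, matches|
    date = date_from_title(video[:title])
    matches[date] ||= video[:url] if date && meeting_dates.include?(date)
  end
end
```

In the real job, each matched pair would presumably be handed to Documents::DownloadTranscriptJob; the enqueue call is omitted here because its arguments are not shown in the PR.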
Implements yt-dlp-based caption download, SRT-to-plaintext parsing, MeetingDocument creation, and conditional SummarizeMeetingJob enqueue. Includes full Minitest coverage (5 tests: happy path, idempotency, summarization gate, and yt-dlp failure handling).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
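The SRT-to-plaintext step could look roughly like the sketch below. The parser in the PR is not shown, so `srt_to_plaintext` is a hypothetical name, and collapsing consecutive repeated lines (common in rolling auto-captions) is an assumption rather than confirmed behavior.

```ruby
# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP_LINE = /\A\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}/

# Hypothetical helper: strips cue numbers, timing lines, and inline tags
# from SRT text and returns the spoken text as a single string.
def srt_to_plaintext(srt)
  lines = []
  srt.gsub("\r\n", "\n").split(/\n{2,}/).each do |cue|
    cue_lines = cue.lines.map(&:strip).reject(&:empty?)
    cue_lines.shift if cue_lines.first&.match?(/\A\d+\z/)       # cue index
    cue_lines.shift if cue_lines.first&.match?(TIMESTAMP_LINE)  # timing line
    cue_lines.each do |line|
      text = line.gsub(/<[^>]+>/, "").strip                     # drop tags like <i>
      # Assumption: skip a line that repeats the previous one (rolling captions).
      next if text.empty? || text == lines.last
      lines << text
    end
  end
  lines.join(" ")
end
```

Splitting on blank lines first means a caption line that happens to be all digits is only dropped when it is the first line of a cue, where the cue index normally sits.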
Document priority is now minutes > transcript > packet. When minutes exist, transcript text is appended as supplementary context. When only a transcript exists, it becomes the primary input with summary_type "transcript_recap". The source_type field in generation_data tracks which input combination was used.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
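The priority rules above can be sketched as a small selection function. Only `"transcript_recap"` and `"minutes_with_transcript"` appear in this PR; the `SummaryInput` struct, the `choose_summary_input` name, and the other `summary_type` and `source_type` strings are hypothetical placeholders.

```ruby
# Hypothetical value object describing what the summarizer receives.
SummaryInput = Struct.new(:primary, :supplementary, :summary_type, :source_type,
                          keyword_init: true)

# Picks the primary input by priority: minutes > transcript > packet.
# Returns nil when no document text is available.
def choose_summary_input(minutes: nil, transcript: nil, packet: nil)
  if minutes
    SummaryInput.new(
      primary: minutes,
      supplementary: transcript,            # appended as extra context when present
      summary_type: "minutes_recap",        # hypothetical name
      source_type: transcript ? "minutes_with_transcript" : "minutes" # "minutes" is hypothetical
    )
  elsif transcript
    SummaryInput.new(
      primary: transcript,
      summary_type: "transcript_recap",
      source_type: "transcript"             # hypothetical name
    )
  elsif packet
    SummaryInput.new(
      primary: packet,
      summary_type: "packet_analysis",      # hypothetical name
      source_type: "packet"                 # hypothetical name
    )
  end
end
```

Because minutes are checked first, a later run with minutes present replaces a transcript-only recap with a minutes-based summary, which matches the regeneration behavior the PR describes.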
Shows a cool-toned informational banner when the summary is based on the video recording instead of official minutes. The banner is removed automatically when minutes arrive and the summary is regenerated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validates that video_url matches a YouTube URL pattern before passing it to yt-dlp. The Brakeman warning is ignored as a false positive, since Open3.capture3 with array arguments doesn't use a shell.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
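A rough sketch of the validate-then-invoke pattern this commit describes. The actual URL pattern and yt-dlp flags used in the PR are not shown; the `YOUTUBE_URL` regex, the helper names, and the particular flag set below are assumptions.

```ruby
require "open3"

# Assumed pattern: only https watch URLs with an 11-character video id.
YOUTUBE_URL = %r{\Ahttps://(?:www\.)?youtube\.com/watch\?v=[\w-]{11}\z}

# Hypothetical helper: true only for strings matching the assumed pattern.
def valid_youtube_url?(url)
  url.is_a?(String) && url.match?(YOUTUBE_URL)
end

# Hypothetical helper: builds the argument list for yt-dlp.
# Raises before any process is spawned if the URL does not validate.
def caption_command(video_url, output_template)
  raise ArgumentError, "not a YouTube URL: #{video_url.inspect}" unless valid_youtube_url?(video_url)

  [
    "yt-dlp",
    "--skip-download",        # captions only, no video
    "--write-auto-subs",      # auto-generated captions
    "--sub-langs", "en",
    "--convert-subs", "srt",
    "-o", output_template,
    video_url
  ]
end

# Hypothetical helper: each array element is passed as a separate argument,
# so no shell ever parses the URL.
def download_captions(video_url, output_template)
  stdout, stderr, status = Open3.capture3(*caption_command(video_url, output_template))
  raise "yt-dlp failed: #{stderr}" unless status.success?
  stdout
end
```

Validation still matters without a shell: a value beginning with `--` would otherwise be read by yt-dlp as an option rather than a URL.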
- Add upper time bound to DiscoverTranscriptsJob candidate query (exclude future meetings)
- Add test for URL validation rejection in DownloadTranscriptJob
- Assert minutes_with_transcript source_type in combined minutes+transcript test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
yt-dlp
Design
- docs/superpowers/specs/2026-04-09-youtube-transcript-ingestion-design.md
- docs/superpowers/plans/2026-04-09-youtube-transcript-ingestion.md
Changes
- Scrapers::DiscoverTranscriptsJob (finds YouTube videos for recent council meetings), Documents::DownloadTranscriptJob (fetches auto-captions, creates MeetingDocument)
- SummarizeMeetingJob (transcript priority tier, supplementary context, source_type tracking), DiscoverMeetingsJob (triggers transcript discovery), MeetingsController (finds transcript summaries), meeting show view (transcript banner + document display)
- yt-dlp added to Dockerfile, Brakeman ignore for false positive on safe Open3.capture3 usage
- Meeting#document_status gains :transcript tier (minutes > packet > transcript > agenda)
How it works
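As one illustration, the `:transcript` tier added to `Meeting#document_status` can be pictured as an ordered lookup. This is a sketch under assumptions: the real method lives on the Meeting model and is not shown in the PR, and the standalone function, the symbol names for document types, and the `:none` fallback are hypothetical.

```ruby
# Tiers in the order given in the PR: minutes > packet > transcript > agenda.
DOCUMENT_STATUS_TIERS = %i[minutes packet transcript agenda].freeze

# Hypothetical standalone version: returns the highest tier present among
# the given document types, or :none (an assumed fallback) when nothing matches.
def document_status(document_types)
  present = document_types.map(&:to_sym)
  DOCUMENT_STATUS_TIERS.find { |tier| present.include?(tier) } || :none
end
```

Note that this ordering ranks packet above transcript, whereas the summarization input priority described in the commits ranks transcript above packet; the two orderings serve different purposes.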
Test plan
- bin/rails test: 437 tests, 0 failures
- bin/rubocop: 0 offenses
- bin/ci: all checks pass
🤖 Generated with Claude Code