Fix RSS discovery for direct feed URLs and rel=self links #10

Merged
JulienTant merged 1 commit into main from fix/rss-discovery-xml-content-type-and-rel-self
Apr 3, 2026

Conversation

@JulienTant
Owner

@JulienTant JulienTant commented Apr 3, 2026

Summary

  • When a URL already returns a feed content-type (application/rss+xml, application/atom+xml, application/feed+json), DiscoverFeedURL now returns it directly instead of trying to parse HTML
  • Added fallback to check rel="self" link tags in addition to rel="alternate", since some feeds (e.g. TechCrunch tag feeds) use self-referencing links
  • Added two new unit tests covering both cases
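
The content-type check in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `isFeedContentType` and `feedMediaTypes` are hypothetical names, and treating an unparseable header as "not a feed" is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"mime"
)

// feedMediaTypes holds the three feed content types named in the summary.
// The variable name is hypothetical.
var feedMediaTypes = map[string]bool{
	"application/rss+xml":   true,
	"application/atom+xml":  true,
	"application/feed+json": true,
}

// isFeedContentType is a hypothetical helper. It strips parameters such as
// "; charset=UTF-8" via mime.ParseMediaType (which also lowercases the media
// type) and reports whether the remaining media type is a feed type.
// Assumption: a header that fails to parse is treated as "not a feed".
func isFeedContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	return feedMediaTypes[mediaType]
}

func main() {
	fmt.Println(isFeedContentType("application/rss+xml; charset=UTF-8")) // true
	fmt.Println(isFeedContentType("text/html; charset=utf-8"))           // false
}
```

Parsing the header rather than comparing raw strings matters because servers commonly append a charset parameter, as in the `application/rss+xml; charset=UTF-8` case covered by the new test.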

Credit

Based on upstream Hyaxia/blogwatcher#14 by @carlotran4, adapted to our fork's patterns (context threading, testify, error handling in tests, etc.).

Test plan

  • TestDiscoverFeedURL_XMLContentType - verifies direct feed URL return for RSS content-type
  • TestDiscoverFeedURL_RelSelf - verifies discovery via rel="self" link tags
  • Full test suite passes (24 tests)
  • golangci-lint run reports 0 issues

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Improved RSS feed discovery with Content-Type header inspection and additional feed link pattern support for more reliable feed identification.

DiscoverFeedURL now handles two additional cases:

1. When a URL already returns a feed content-type (application/rss+xml,
   application/atom+xml, or application/feed+json), return it directly
   instead of trying to parse HTML.

2. When HTML pages use rel="self" instead of rel="alternate" for their
   feed link tags, check both rel values.

This fixes discovery for feeds like TechCrunch tag/category feeds which
return XML directly and may use rel="self" links.

Based on upstream PR #14 by @carlotran4 (Hyaxia/blogwatcher#14),
adapted to our fork's patterns (context threading, testify, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

The DiscoverFeedURL function now inspects the HTTP response's Content-Type header and returns the requested URL directly if the header matches a feed type. When parsing HTML for feed links, it falls back to searching for rel='self' links when no rel='alternate' links are found. Two new test cases validate this behavior.
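
The fallback order described in the walkthrough might look roughly like the sketch below. It assumes the `<link>` tags have already been extracted from the HTML into a slice, so the `link` type, `feedTypes`, and `pickFeedLink` are all hypothetical names and not the PR's code. The exact iteration order over feed types is also an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// link is a hypothetical, already-extracted <link> tag.
type link struct {
	Rel, Type, Href string
}

// feedTypes lists the feed media types mentioned in this PR.
var feedTypes = []string{
	"application/rss+xml",
	"application/atom+xml",
	"application/feed+json",
}

// pickFeedLink tries every rel="alternate" candidate first and only then
// falls back to rel="self". The matching href is resolved against pageURL
// so that relative links become absolute.
func pickFeedLink(pageURL string, links []link) (string, bool) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", false
	}
	for _, rel := range []string{"alternate", "self"} {
		for _, feedType := range feedTypes {
			for _, l := range links {
				if l.Rel != rel || l.Type != feedType || l.Href == "" {
					continue
				}
				ref, err := url.Parse(l.Href)
				if err != nil {
					continue // skip malformed hrefs, keep looking
				}
				return base.ResolveReference(ref).String(), true
			}
		}
	}
	return "", false
}

func main() {
	links := []link{{Rel: "self", Type: "application/rss+xml", Href: "/tag/ai/feed/"}}
	fmt.Println(pickFeedLink("https://example.com/tag/ai/", links))
	// https://example.com/tag/ai/feed/ true
}
```

Keeping `alternate` ahead of `self` means pages that already worked before this change should resolve to the same feed URL as before.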

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Feed Discovery Logic: internal/rss/rss.go | Modified DiscoverFeedURL to check Content-Type header for feed types (application/rss+xml, application/atom+xml, application/feed+json) and return directly on match. Added fallback to search link[rel='self'][type='<feedType>'] when rel='alternate' links are absent. |
| Discovery Tests: internal/rss/rss_test.go | Added test case for direct RSS response with explicit Content-Type: application/rss+xml; charset=UTF-8. Added test case for rel='self' link discovery and URL resolution. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hop through the feeds with better sight,
Content-Types guide us to the right,
When alternate links don't appear,
Self-relations bring us cheer! 🔗

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: fixing RSS discovery for direct feed URLs and rel=self links, which are the core improvements in the changeset. |
| Description check | ✅ Passed | The description covers all required template sections with sufficient detail: clear summary of changes, explicit test plan with checkbox completion, and proper credit attribution. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/rss/rss.go`:
- Around line 95-101: The mime.ParseMediaType call currently ignores parse
errors; update the code around mime.ParseMediaType in internal/rss/rss.go so
that when err != nil you do not silently continue but instead handle it per
project rule — either log the parse error with the package's logger (including
context like the contentType and candidate blogURL) or propagate the error to
the caller (adjust the function signature/returns if needed). Keep the existing
mediaType checks (application/rss+xml, application/atom+xml,
application/feed+json) unchanged when err == nil; only add explicit error
handling for the ParseMediaType error path.
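
One way to address the finding above is to propagate the parse error with context, sketched below. This is an illustration only: `checkFeedContentType` is a hypothetical name, the real function in internal/rss/rss.go may be shaped differently, and treating a missing header as "not a feed, no error" is an assumption.

```go
package main

import (
	"fmt"
	"mime"
)

// checkFeedContentType is a hypothetical helper that reports whether
// contentType names a feed media type. Instead of discarding the error from
// mime.ParseMediaType, it wraps it with the header value and blog URL.
func checkFeedContentType(blogURL, contentType string) (bool, error) {
	// Assumption: a missing header is not an error, just "not a feed".
	if contentType == "" {
		return false, nil
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false, fmt.Errorf("parse Content-Type %q for %s: %w", contentType, blogURL, err)
	}
	switch mediaType {
	case "application/rss+xml", "application/atom+xml", "application/feed+json":
		return true, nil
	}
	return false, nil
}

func main() {
	ok, err := checkFeedContentType("https://example.com/feed", "application/atom+xml")
	fmt.Println(ok, err) // true <nil>

	_, err = checkFeedContentType("https://example.com", "not a media type")
	fmt.Println(err != nil) // true
}
```

Note that `mime.ParseMediaType` returns the media type together with `mime.ErrInvalidMediaParameter` when only the parameters are malformed, so a more lenient variant could still use the media type in that case rather than failing discovery.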

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbc279ac-b94f-4ce6-a7bf-8a65b911f7ed

📥 Commits

Reviewing files that changed from the base of the PR and between 2972f35 and 22c44d9.

📒 Files selected for processing (2)
  • internal/rss/rss.go
  • internal/rss/rss_test.go

@JulienTant JulienTant merged commit b491488 into main Apr 3, 2026
2 checks passed
@JulienTant JulienTant deleted the fix/rss-discovery-xml-content-type-and-rel-self branch April 3, 2026 22:17