Fix duplicate feed links extracted from HTTP pages (#331)
Conversation
This change is also very welcome, but #219 was about another issue:
But don't close this PR, it is also a nice UX touch. We need it as well (I will review it tomorrow).
Maybe I misunderstood, but I remember how I found this small bug. We discussed it on Discord: if you open a site via HTTP, the feed link extracted from the anchor appears in the list as http, creating a duplicate of the same link that already exists as https. So I assumed that only this behavior needed to be fixed. Now I need to think about this subscription issue 😀
I think I see the core issue now. We use the feed URL itself as the filter key. The clean solution would be to introduce a normalized identifier, but as a simpler workaround without schema changes, we could try checking the feed with both URL variants.
Or we can first check the exact URL and, if nothing is found, make another check with the other protocol (http/https).
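The two-step lookup suggested above could be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: `findFeedByUrl` stands in for whatever storage lookup the app really uses, and `swapProtocol` is a helper invented here for the example.

```typescript
type Feed = { url: string; title: string };

// Swap http <-> https on a URL; return other URLs unchanged.
function swapProtocol(url: string): string {
  if (url.startsWith("https://")) return "http://" + url.slice("https://".length);
  if (url.startsWith("http://")) return "https://" + url.slice("http://".length);
  return url;
}

// Two-step lookup: exact match first, then retry with the other protocol.
function findFeed(
  findFeedByUrl: (url: string) => Feed | undefined,
  url: string,
): Feed | undefined {
  const exact = findFeedByUrl(url);
  if (exact) return exact;
  return findFeedByUrl(swapProtocol(url));
}
```

This keeps the stored URLs untouched (no schema change) at the cost of one extra lookup on a miss.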
Thanks for the UX improvement!
Adds a `dedupeLinks` function to ensure that feed links differing only by protocol are not duplicated.

Motivation
When a user provides an HTTP URL, the fetched HTML may reference the same feed using both absolute HTTPS links and relative links resolved from the HTTP page. This resulted in duplicate feed candidates being returned.
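A minimal sketch of what such a dedupe step could look like; the actual `dedupeLinks` in this PR may differ. The assumption here is that links differing only by protocol should collapse to one entry, preferring the https variant when both appear.

```typescript
// Collapse feed links that differ only by protocol (http vs https).
function dedupeLinks(links: string[]): string[] {
  const byKey = new Map<string, string>();
  for (const link of links) {
    // Key ignores the protocol so http/https variants collide.
    const key = link.replace(/^https?:\/\//, "");
    const existing = byKey.get(key);
    // Keep the https variant when both protocols are present.
    if (!existing || link.startsWith("https://")) {
      byKey.set(key, link);
    }
  }
  return [...byKey.values()];
}
```

Note that this only handles the protocol; other URL variations (trailing slashes, `www.` prefixes) would still produce distinct entries.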
Note
For manual testing, you can use this page. It contains three potential feeds that would appear as duplicates if you use HTTP without this fix.