@hseg hseg commented Apr 15, 2025

The URL a DOI resolves to can change over time, with redirects pointing from the old location to the new one. To make the tests more robust, fail only if the URLs still differ after following redirects.

See: https://www.crossref.org/blog/urls-and-dois-a-complicated-relationship/ and item 10 on https://pardalotus.tech/posts/2024-10-02-falsehoods-programmers-believe-about-dois/
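A minimal sketch of the comparison described above, not the code in this PR: two URLs count as equal if they match literally or if they end up at the same place once redirects are followed. The names `follow_redirects` and `same_destination` are hypothetical, and the resolver is passed in as a parameter so the check can be exercised without network access.

```python
from typing import Callable
from urllib.request import Request, urlopen


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Return the URL reached after following HTTP redirects (needs network)."""
    # Hypothetical helper: urlopen follows redirects by default, and
    # geturl() reports the URL of the resource that was finally retrieved.
    request = Request(url, headers={"User-Agent": "doi-test-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


def same_destination(
    expected: str,
    actual: str,
    resolve: Callable[[str], str] = follow_redirects,
) -> bool:
    """Compare two URLs, falling back to their post-redirect targets."""
    if expected == actual:
        return True  # identical strings, no lookup needed
    # Only when the raw URLs differ do we pay for resolving both sides.
    return resolve(expected) == resolve(actual)
```

In a test, the assertion would then read something like `assert same_destination(expected_url, resolved_url)`, which keeps passing when a publisher moves a landing page but leaves a redirect behind.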

@hseg hseg force-pushed the canonicalize_urls branch 4 times, most recently from 289b398 to c67beee Compare April 17, 2025 17:13

hseg commented Apr 17, 2025

Note that this PR is based on the master branch of https://github.com/papis/python-doi, which hasn't been merged here, so merging this PR would bring in those changes as well. I'm not sure how the two repositories relate, so I opened the PR in both.

@hseg hseg force-pushed the canonicalize_urls branch 3 times, most recently from 7cfe04a to b161e62 Compare June 29, 2025 14:40
hseg added 2 commits June 29, 2025 17:46
Also put in a fallback using requests, but it is hacky and only works
sometimes. cloudscraper stands a better chance of consistently being
able to get to the final URL.
For example, this makes it easier to spot which particular iteration breaks.
@hseg hseg force-pushed the canonicalize_urls branch from b161e62 to 8e5f3c9 Compare June 29, 2025 14:46
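The commit message above mentions cloudscraper as the preferred way to reach the final URL, with requests as a fallback. The following is only a sketch of that ordering under stated assumptions, not the committed code: `via_cloudscraper`, `via_requests`, and `final_url` are hypothetical names, and both third-party packages are imported lazily so that a missing package simply moves on to the next strategy.

```python
from typing import Callable, Optional, Sequence


def via_cloudscraper(url: str) -> str:
    """Resolve with cloudscraper (third-party; assumed to be installed)."""
    import cloudscraper  # lazy import: failure falls through to the next resolver

    scraper = cloudscraper.create_scraper()  # behaves like a requests.Session
    return scraper.get(url, timeout=10).url


def via_requests(url: str) -> str:
    """Resolve with plain requests; may be blocked by some publishers."""
    import requests  # lazy import for the same reason as above

    return requests.get(url, allow_redirects=True, timeout=10).url


def final_url(
    url: str,
    resolvers: Sequence[Callable[[str], str]] = (via_cloudscraper, via_requests),
) -> Optional[str]:
    """Try each resolver in order and return the first result, else None."""
    for resolve in resolvers:
        try:
            return resolve(url)
        except Exception:
            # Any failure (missing package, network error, blocked request)
            # means this strategy did not work; try the next one.
            continue
    return None
```

Returning `None` when every strategy fails lets a test decide whether to skip or fail, rather than reporting a network hiccup as a URL mismatch.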
4 participants