In the extinct tree, we use QIDs instead of OTTs as the primary ID, since too many taxa have no OTT.
In some cases, we end up with multiple nodes incorrectly having the same QID. For instance, both Theropoda and Averostra get mapped to Q188438, which is the correct ID for Theropoda. But Averostra has its own distinct QID, Q4828332.
To explain why this happens, let's look at how we get QIDs:
- We start with the Wikipedia (not Wikidata!) page name
- We request its data using the Wikipedia API
- We get the QID from that data
- We also get the taxon's date range from that Wikipedia data (specifically the taxobox)
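
For reference, here is a minimal sketch of the first three steps. It is not our actual code: it assumes the lookup goes through the MediaWiki `action=query` endpoint with `prop=pageprops`, and the helper names (`build_query_url`, `extract_qid`, `fetch_qid`) and the User-Agent string are made up for illustration. The taxobox/date range step is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a query URL asking for the Wikidata item linked to a page title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # follow redirects, as a browser would
        "titles": title,
        "format": "json",
        "formatversion": 2,  # 'pages' comes back as a list
    }
    return API_URL + "?" + urlencode(params)


def extract_qid(response):
    """Return the QID from a parsed API response, or None if there is none."""
    pages = response.get("query", {}).get("pages", [])
    if not pages:
        return None
    return pages[0].get("pageprops", {}).get("wikibase_item")


def fetch_qid(title):
    """Network call (hypothetical helper): fetch the page data and extract the QID."""
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "qid-lookup-sketch/0.1"},  # assumed value
    )
    with urlopen(request) as resp:
        return extract_qid(json.load(resp))
```

Note that nothing in `extract_qid` checks whether the returned page is the one that was asked for, which matters for what follows.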
The problem is that some Wikipedia pages don't actually exist, which is the case for Averostra: going to https://en.wikipedia.org/wiki/Averostra just redirects to https://en.wikipedia.org/wiki/Theropoda. This is not because the two are synonyms, but because some articles end up covering multiple taxa, which is not that uncommon. So when we look up Averostra, we end up with the QID of the page it redirects to.
We need to better detect this situation and try to map each such taxon to its correct QID.
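
One possible way to detect it, as a sketch only: when a MediaWiki `action=query` request is made with `redirects=1`, the response lists any redirect it followed under `query.redirects`, so we can compare the title we asked for with the title we got back. The function name `resolve_title` and the sample payload below are illustrative (the payload is hand-written and trimmed, not a captured response), and this assumes our lookup uses or could use that endpoint.

```python
def resolve_title(requested, response):
    """Return (final_title, was_redirected) for a parsed action=query response.

    'normalized' entries only fix capitalisation/underscores, so they are
    applied but not counted as redirects.
    """
    query = response.get("query", {})
    title = requested
    for entry in query.get("normalized", []):
        if entry["from"] == title:
            title = entry["to"]
    was_redirected = False
    for entry in query.get("redirects", []):
        if entry["from"] == title:
            title = entry["to"]
            was_redirected = True
    return title, was_redirected


# Hand-written sample in the shape returned with formatversion=2.
sample_response = {
    "query": {
        "redirects": [{"from": "Averostra", "to": "Theropoda"}],
        "pages": [
            {"title": "Theropoda", "pageprops": {"wikibase_item": "Q188438"}}
        ],
    }
}

final_title, was_redirected = resolve_title("Averostra", sample_response)
if was_redirected:
    # The QID in this response belongs to final_title, not to the taxon we
    # asked about, so it should not be assigned to the Averostra node.
    print(f"Averostra redirects to {final_title}; QID needs a separate lookup")
```

Detection alone does not give us the right QID. Once a node is flagged, it would still need a separate lookup, for instance searching Wikidata by taxon name, which is not sketched here.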
/cc @hyanwong