Some taxa on the extinct tree have duplicate IDs #111

@davidebbo

In the extinct tree, we use QIDs instead of OTTs as the primary ID, since too many taxa have no OTT.

In some cases, we end up with multiple nodes incorrectly having the same QID. For instance, both Theropoda and Averostra get mapped to Q188438, which is the correct ID for Theropoda. But there exists a distinct QID for Averostra, Q4828332.
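As a rough way to surface the affected nodes, something like the following could group taxa by QID and report any QID that is shared. This is only a sketch: the function name `find_duplicate_qids` and the `{taxon name: QID}` input shape are assumptions for illustration, not how the tree code is actually structured.

```python
from collections import defaultdict


def find_duplicate_qids(node_qids):
    """Return {qid: [taxon names]} for every QID assigned to more than one node.

    `node_qids` is a hypothetical mapping of taxon name -> QID (or None when
    no QID was found); the real tree data may be shaped differently.
    """
    names_by_qid = defaultdict(list)
    for name, qid in node_qids.items():
        if qid is not None:
            names_by_qid[qid].append(name)
    # Keep only the QIDs that more than one node claims.
    return {
        qid: sorted(names)
        for qid, names in names_by_qid.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    nodes = {"Theropoda": "Q188438", "Averostra": "Q188438"}
    print(find_duplicate_qids(nodes))
```

Nodes with no QID are skipped here, so they are never reported as duplicates of each other.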

To explain why this happens, let's look at how we get QIDs:

  • We start with the Wikipedia (not Wikidata!) page name
  • We request its data using the Wikipedia API
  • We get the QID from that data
  • We also get the taxon's date range from that Wikipedia data (specifically the taxobox)
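For illustration, the QID lookup in the steps above can be sketched against the MediaWiki Action API, which exposes a page's Wikidata item as the `wikibase_item` page property. The exact request the tree code makes is not shown in this issue, so the parameters below (including `redirects=1` and `formatversion=2`) and the helper names `build_qid_query_url` and `extract_qid` are assumptions. Taxobox parsing is left out.

```python
from urllib.parse import urlencode

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def build_qid_query_url(page_title):
    """Build an Action API URL asking for the page's Wikidata item."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # assumption: redirects are followed, as in a browser
        "titles": page_title,
        "format": "json",
        "formatversion": 2,  # makes "pages" a list rather than a dict
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


def extract_qid(response):
    """Return the QID from a decoded JSON response, or None if absent."""
    pages = response.get("query", {}).get("pages", [])
    if not pages:
        return None
    # A missing page has no "pageprops" key, so this yields None.
    return pages[0].get("pageprops", {}).get("wikibase_item")


if __name__ == "__main__":
    print(build_qid_query_url("Averostra"))
```

Note that with redirects followed, the returned QID belongs to the redirect target, which is exactly how two different page names can end up with the same QID.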

The challenge is that some Wikipedia pages don't actually exist, which is the case for Averostra: https://en.wikipedia.org/wiki/Averostra simply redirects to https://en.wikipedia.org/wiki/Theropoda. This is not because the two are synonyms, but because some articles end up covering multiple taxa, which is not that uncommon.
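One possible way to detect this: when the MediaWiki Action API is queried with `redirects=1`, the response includes a `redirects` list of `from`/`to` title pairs (and a `normalized` list if the title was rewritten first). A minimal sketch, assuming a decoded JSON response of that shape; the helper name `redirect_target` is hypothetical, the sample payload is hand-written for illustration, and only a single redirect hop is handled.

```python
def redirect_target(response, requested_title):
    """Return the title the request was redirected to, or None if it was not."""
    query = response.get("query", {})
    title = requested_title
    # The API may first normalize the title (e.g. underscores to spaces).
    for entry in query.get("normalized", []):
        if entry.get("from") == title:
            title = entry.get("to")
            break
    for entry in query.get("redirects", []):
        if entry.get("from") == title:
            return entry.get("to")
    return None


if __name__ == "__main__":
    # Hand-written sample of the response shape, not captured API output.
    sample = {
        "query": {
            "redirects": [{"from": "Averostra", "to": "Theropoda"}],
            "pages": [
                {"title": "Theropoda", "pageprops": {"wikibase_item": "Q188438"}}
            ],
        }
    }
    print(redirect_target(sample, "Averostra"))
```

A non-None result means the QID in the response belongs to a different page than the one requested, so it should not be assigned to the taxon without further checks.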

We need to better detect this situation and try to map each such taxon to its correct QID.

/cc @hyanwong
