Use OAI-PMH endpoint instead of API to retrieve datasets #2

@utsmok

Hi there, I got linked to this interesting project by my colleague Efe!
I see you're using the Pure API to retrieve the datasets by grabbing publications -> linked datasets -> datasets themselves.
Did you know you can use the public OAI-PMH endpoint of your repository to harvest this data directly, without API keys or rate limits?
For the VU, the endpoint is: https://research.vu.nl/ws/oai?verb=ListRecords&metadataPrefix=oai_cerif_openaire&set=datasets:all
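
For illustration, here is a minimal sketch of how a harvest of that URL could look in Python, using only the standard library. The function names (`build_url`, `parse_page`, `harvest`) are made up for this example and are not part of your project or of any library. The only protocol details it relies on are the OAI-PMH 2.0 XML namespace and the rule that a follow-up request carries just the verb and the resumptionToken.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://research.vu.nl/ws/oai"


def build_url(base_url, metadata_prefix=None, set_spec=None, token=None):
    """Build a ListRecords URL (hypothetical helper for this sketch)."""
    if token:
        # A resumptionToken is an exclusive argument: send only verb + token.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Download one response page as bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def parse_page(xml_bytes):
    """Return (list of <record> elements, next token or None)."""
    root = ET.fromstring(xml_bytes)
    records = root.findall(".//oai:record", OAI_NS)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An absent or empty resumptionToken marks the last page.
    return records, (token or None)


def harvest(base_url, metadata_prefix, set_spec=None, fetch_page=fetch):
    """Yield every record, following resumption tokens until the list ends."""
    token = None
    while True:
        url = build_url(base_url, metadata_prefix, set_spec, token)
        records, token = parse_page(fetch_page(url))
        yield from records
        if token is None:
            break
```

Calling `harvest(BASE_URL, "oai_cerif_openaire", "datasets:all")` would then iterate over all dataset records page by page. The `fetch_page` parameter is only there so the network call can be swapped out, for example for caching or testing.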

I'm using the metadataPrefix oai_cerif_openaire here because it includes the internal Pure UUID in each entry. That UUID can be used to retrieve more detailed or non-public info from the API if needed. If you retrieve the publications as well, you can also use the UUID to match them up via the related_to field.

Most institutes around the world have their own OAI-PMH endpoints, especially in Europe, where they facilitate OpenAIRE harvesting, but not all support the same functionality. You can check by using the basic verbs that list the available record sets and metadata formats; here they are for the VU endpoint:
https://research.vu.nl/ws/oai?verb=ListSets
https://research.vu.nl/ws/oai?verb=ListMetadataFormats
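
As a rough sketch, the same check can be scripted so you can test any endpoint before harvesting it. The helper names (`parse_sets`, `parse_metadata_formats`, `endpoint_supports`) are invented for this example. It assumes the standard OAI-PMH 2.0 response layout and reads only the first page of ListSets, so an endpoint that paginates its set list would need the resumption token handling added.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch(url):
    """Download one response as bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def parse_sets(xml_bytes):
    """Return a {setSpec: setName} dict from a ListSets response."""
    root = ET.fromstring(xml_bytes)
    return {
        s.findtext("oai:setSpec", default="", namespaces=OAI_NS):
        s.findtext("oai:setName", default="", namespaces=OAI_NS)
        for s in root.findall(".//oai:set", OAI_NS)
    }


def parse_metadata_formats(xml_bytes):
    """Return the list of metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(xml_bytes)
    return [
        f.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS)
        for f in root.findall(".//oai:metadataFormat", OAI_NS)
    ]


def endpoint_supports(base_url, metadata_prefix, set_spec, fetch_page=fetch):
    """True if the endpoint advertises both the given prefix and the given set."""
    formats_url = base_url + "?" + urllib.parse.urlencode({"verb": "ListMetadataFormats"})
    sets_url = base_url + "?" + urllib.parse.urlencode({"verb": "ListSets"})
    prefixes = parse_metadata_formats(fetch_page(formats_url))
    sets = parse_sets(fetch_page(sets_url))  # first page only in this sketch
    return metadata_prefix in prefixes and set_spec in sets
```

For example, `endpoint_supports("https://research.vu.nl/ws/oai", "oai_cerif_openaire", "datasets:all")` would tell you whether the harvest URL above is expected to work for a given repository.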

Unfortunately, in my experience not many repos supply datasets as a separate item, nor do they always include detailed metadata, but yours (and ours at https://ris.utwente.nl/ws/oai) do!

This all uses the ancient but well-documented OAI-PMH protocol. You can read more about the OpenAIRE specs for institute repos here, the (also ancient) CERIF specifications here, and the standard metadata format (Dublin Core) specs here.

I'm working on a more general harvester/aggregator for research metadata; the source can be found here. I also gave a short talk recently at the OpenAlex community meetup, which you can view here. Feel free to let me know if I can help out somewhere!
