Use OAI-PMH endpoint instead of API to retrieve datasets

Hi there, I got linked to this interesting project by my colleague Efe!
I see you're using the Pure API to retrieve the datasets by going from publications -> linked datasets -> the datasets themselves.
Did you know you can use the public OAI-PMH endpoint of your repository to harvest this data directly, without API keys or rate limits?
The endpoint for the VU is here: [https://research.vu.nl/ws/oai?verb=ListRecords&metadataPrefix=oai_cerif_openaire&set=datasets:all](https://research.vu.nl/ws/oai?verb=ListRecords&metadataPrefix=oai_cerif_openaire&set=datasets:all)

I'm using the metadataPrefix oai_cerif_openaire here because it includes the internal Pure UUID in each entry, which could be used to retrieve more detailed or non-public info from the API if needed. If you retrieve the publications as well, you can also use the UUID to match them up with the related_to field.
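As a rough illustration of what harvesting from that endpoint could look like, here is a minimal Python sketch using only the standard library. It follows the `resumptionToken` paging that the OAI-PMH protocol defines for `ListRecords`. The function names (`fetch_xml`, `harvest_records`) and the `fetch` parameter are my own inventions for this example, not part of any existing code in this project, and I have not tuned it for error handling or retries.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://research.vu.nl/ws/oai"


def fetch_xml(url):
    """Download one OAI-PMH response and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest_records(base_url=BASE_URL, metadata_prefix="oai_cerif_openaire",
                    set_spec="datasets:all", fetch=fetch_xml):
    """Yield every <record> element, following resumption tokens page by page.

    `fetch` is a hypothetical hook so the download step can be swapped out
    (for caching, retries, or offline testing).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "set": set_spec}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))

        # The protocol reports problems as an <error code="..."> element.
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")

        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)

        # A missing or empty resumptionToken marks the last page.
        token = root.findtext("oai:ListRecords/oai:resumptionToken",
                              default="", namespaces=OAI_NS).strip()
        if not token:
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Iterating over `harvest_records()` would then give one XML element per dataset record, and the header identifier can be read with `record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)`. Note that this sketch treats every `<error>` response as a failure, including `noRecordsMatch`, which you may prefer to handle as an empty result.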

Most institutes around the world have their own OAI-PMH endpoints, especially in Europe, where they facilitate OpenAIRE harvesting, but not all of them support the same functionality. You can check what an endpoint offers with the basic requests that list the available record sets and metadata formats; here they are for the VU endpoint:
[https://research.vu.nl/ws/oai?verb=ListSets](https://research.vu.nl/ws/oai?verb=ListSets)
[https://research.vu.nl/ws/oai?verb=ListMetadataFormats](https://research.vu.nl/ws/oai?verb=ListMetadataFormats)
Unfortunately, in my experience not many repos supply datasets as a separate item, nor do they always include detailed metadata, but yours (and ours at [https://ris.utwente.nl/ws/oai](https://ris.utwente.nl/ws/oai)) do!
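If you want to run that check for several institutes, something like the following Python sketch could work. It calls `ListSets` and `ListMetadataFormats` and looks for the set and prefix used above. The helper names and the `fetch` parameter are hypothetical, and for simplicity it only reads the first page of each response, so an endpoint that pages its set list with a resumption token would need a bit more work.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_xml(url):
    """Download one OAI-PMH response and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def list_set_specs(base_url, fetch=fetch_xml):
    """Return the setSpec values from the first page of a ListSets response."""
    root = ET.fromstring(fetch(f"{base_url}?verb=ListSets"))
    path = "oai:ListSets/oai:set/oai:setSpec"
    return [el.text for el in root.iterfind(path, OAI_NS)]


def list_metadata_prefixes(base_url, fetch=fetch_xml):
    """Return the metadataPrefix values the endpoint advertises."""
    root = ET.fromstring(fetch(f"{base_url}?verb=ListMetadataFormats"))
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [el.text for el in root.iterfind(path, OAI_NS)]


def supports_dataset_harvest(base_url, set_spec="datasets:all",
                             prefix="oai_cerif_openaire", fetch=fetch_xml):
    """True if the endpoint lists both the dataset set and the CERIF prefix."""
    return (set_spec in list_set_specs(base_url, fetch)
            and prefix in list_metadata_prefixes(base_url, fetch))
```

For example, `supports_dataset_harvest("https://research.vu.nl/ws/oai")` would query the VU endpoint. Other repositories may name their dataset set differently, so the default `set_spec` here is an assumption based on the VU URL above.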

This all uses the ancient but well-documented [OAI-PMH protocol](https://www.openarchives.org/OAI/openarchivesprotocol.html). You can read more about the [OpenAIRE specs for institute repos here](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/latest/introduction.html), the (also ancient) [CERIF specifications here](https://cerif.eurocris.org/vocab/html/), and the standard metadata format (Dublin Core), whose [specs are here](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/).

I'm working on a more general harvester/aggregator for research metadata; the source [can be found here](https://github.com/utsmok/MUS). I also recently gave a short talk at the OpenAlex community meetup, [which you can view here](https://mustalk.samuelmok.cc/). Feel free to let me know if I can help out somewhere!
