Hi there, I got linked to this interesting project by my colleague Efe!
I see you're using the Pure API to retrieve the datasets by grabbing publications -> linked datasets -> datasets themselves.
Did you know you can use the public OAI-PMH endpoint of your repository to harvest this data directly, without API keys or rate limits?
The endpoint is here for the VU: https://research.vu.nl/ws/oai?verb=ListRecords&metadataPrefix=oai_cerif_openaire&set=datasets:all
I'm using the metadataPrefix oai_cerif_openaire here because it includes the internal Pure UUID in each entry, which you could use to retrieve more detailed or non-public info from the API if needed. If you retrieve the publications as well, you can also use the UUID to match them up via the related_to field.
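In case it helps, here is a minimal sketch of what that harvest could look like in Python using only the standard library. The function names (`parse_page`, `harvest`, `record_identifier`) are my own for illustration and not from any library; the only protocol details I rely on are the standard OAI-PMH 2.0 namespace and the `resumptionToken` paging mechanism. It does not parse the CERIF payload itself, since the exact element layout is best checked against a real response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://research.vu.nl/ws/oai"


def _http_get(url):
    """Fetch one response body as bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def parse_page(xml_data):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_data)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An absent or empty resumptionToken element means this was the last page.
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


def record_identifier(record):
    """Return the OAI identifier from a record's <header>."""
    return record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")


def harvest(base_url=BASE_URL, metadata_prefix="oai_cerif_openaire",
            set_spec="datasets:all", fetch=_http_get):
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords",
              "metadataPrefix": metadata_prefix,
              "set": set_spec}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        records, token = parse_page(fetch(url))
        yield from records
        if token is None:
            break
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Calling `for rec in harvest(): print(record_identifier(rec))` would then walk the whole datasets set page by page. The `fetch` parameter is only there so the paging logic can be tried against canned XML without hitting the server.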
Most institutes around the world have their own OAI-PMH endpoints, especially in Europe, where they facilitate OpenAIRE harvesting, but not all of them support the same functionality. You can check what an endpoint offers with the basic OAI-PMH verbs that list the available sets of records and metadata formats. Here they are for the VU endpoint:
https://research.vu.nl/ws/oai?verb=ListSets
https://research.vu.nl/ws/oai?verb=ListMetadataFormats
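If you want to do that check from a script, a rough sketch could look like the following. Again the helper names (`parse_sets`, `parse_metadata_formats`, `describe_endpoint`) are made up for this example, and it assumes the responses fit in a single page; some endpoints paginate ListSets with a resumption token, which this ignores.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def _http_get(url):
    """Fetch one response body as bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def parse_sets(xml_data):
    """Map each setSpec to its human-readable setName."""
    root = ET.fromstring(xml_data)
    return {
        s.findtext(f"{OAI_NS}setSpec"): s.findtext(f"{OAI_NS}setName")
        for s in root.iter(f"{OAI_NS}set")
    }


def parse_metadata_formats(xml_data):
    """List the metadataPrefix values an endpoint advertises."""
    root = ET.fromstring(xml_data)
    return [
        f.findtext(f"{OAI_NS}metadataPrefix")
        for f in root.iter(f"{OAI_NS}metadataFormat")
    ]


def describe_endpoint(base_url, fetch=_http_get):
    """Return (sets, metadata_prefixes) for an OAI-PMH base URL."""
    sets = parse_sets(fetch(base_url + "?verb=ListSets"))
    prefixes = parse_metadata_formats(fetch(base_url + "?verb=ListMetadataFormats"))
    return sets, prefixes
```

With that, `describe_endpoint("https://research.vu.nl/ws/oai")` returns the sets and prefixes, so you can test whether `"datasets:all"` and `"oai_cerif_openaire"` are on offer before starting a full harvest.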
Unfortunately, in my experience not many repos expose datasets as a separate set, nor do they always include detailed metadata, but yours (and ours at https://ris.utwente.nl/ws/oai) do!
This all uses the ancient but well-documented OAI-PMH protocol. You can read more about the OpenAIRE specs for institute repos here, the (also ancient) CERIF specifications here, and the standard metadata format (Dublin Core) specs here.
I'm working on a more general harvester/aggregator for research metadata; the source can be found here. I also did a short talk recently for the OpenAlex community meetup, which you can view here. Feel free to let me know if I can help out somewhere!