Description
I noticed that the /modifyRegistrationMetadata API was forcing updates for some datasets that seemingly already had valid, up-to-date metadata registered.
The issue appears to be garbled multi-byte UTF-8 characters in the output of https://mds.datacite.org/metadata/.
It is definitely an issue with the GET API above, not with the initial registration POST, since the characters are present and properly encoded in the output of the REST API, https://api.datacite.org/dois/:
# xml from the REST API:
curl "https://api.datacite.org/dois/10.7910/DVN/ZQU5DA?detail=true&affiliation=true" | jq . > ZQU5DA.json
grep '"xml"' ZQU5DA.json | sed 's/^.*xml": "//' | sed 's/".*$//' | base64 -d > ZQU5DA_decoded.xml
# MDS API:
curl -u '...redacted...' -H "Accept: application/xml; charset=UTF-8" "https://mds.datacite.org/metadata/10.7910/DVN/ZQU5DA" > ZQU5DA_mds.xml
# compare:
diff ZQU5DA_decoded.xml ZQU5DA_mds.xml
6,7c6,7
< <creatorName nameType="Personal">Trigo da Fonseca, Jéssica</creatorName>
< <givenName>Jéssica</givenName>
---
> <creatorName nameType="Personal">Trigo da Fonseca, J??ssica</creatorName>
> <givenName>J??ssica</givenName>
10c10
< <affiliation affiliationIdentifier="https://ror.org/04q8h6b75" schemeURI="https://ror.org" affiliationIdentifierScheme="ROR">Escola Brasileira de Administração Pública e de Empresas</affiliation>
---
> <affiliation affiliationIdentifier="https://ror.org/04q8h6b75" schemeURI="https://ror.org" affiliationIdentifierScheme="ROR">Escola Brasileira de Administra????o P??blica e de Empresas</affiliation>
17c17
< <affiliation>Universidade de Brasília</affiliation>
---
> <affiliation>Universidade de Bras??lia</affiliation>
21c21
...
snipped for brevity
Setting the Accept: application/xml; charset=UTF-8 header does not fix it. The output headers imply that MDS thinks it is outputting UTF-8.
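One detail in the diff above: each garbled character is replaced by as many `?` as it has UTF-8 bytes (`é` gives `??`, `çã` gives `????`). The Python sketch below reproduces that pattern under one possible explanation, namely that the UTF-8 bytes are read as a single-byte charset and then forced into ASCII. This is only a guess that is consistent with the output; nothing here confirms what MDS actually does, and the `garble` helper is hypothetical.

```python
def garble(text: str) -> str:
    """Hypothetical reproduction of the '??' pattern, not confirmed MDS behaviour.

    Each UTF-8 byte is read as one Latin-1 character, then every character
    outside ASCII is replaced with '?'.
    """
    raw = text.encode("utf-8")          # 'é' becomes the two bytes C3 A9
    as_latin1 = raw.decode("latin-1")   # each byte becomes one character
    return as_latin1.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    # Values taken from the REST API side of the diff.
    for original in ["Jéssica", "Administração Pública", "Brasília"]:
        print(f"{original} -> {garble(original)}")
```

If that guess holds, the data stored at DataCite is intact and only the MDS response is being mangled on the way out, which matches the REST output being correct.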
A brief search did not turn up any mentions of this issue (is that simply because nobody except us is using MDS at this point?). I don't see any special parameters for character encoding in the MDS guide.
In the context of re-running modifyRegistrationMetadata, this would result in tens of thousands of unnecessary updates at IQSS, so I'm just looking for a quick and easy way to patch it. The easiest I can think of so far is to rewrite DataCiteRESTfullClient.getMetadata() to use the REST API instead, although in the long run we would benefit from switching from MDS to REST completely.
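To illustrate what a REST-based lookup would need to do, here is a minimal Python sketch of the decoding and comparison step (the real client is Java; this only shows the logic). It assumes the `/dois/` response carries the base64-encoded XML at `data.attributes.xml`, which is what the grep/base64 pipeline above relies on. The function names are hypothetical, and the sample response is built locally rather than fetched.

```python
import base64
import json


def xml_from_rest_response(body: str) -> str:
    """Return the registered XML from a /dois/<doi> JSON response body.

    Assumption: the base64-encoded XML sits at data.attributes.xml.
    """
    attributes = json.loads(body)["data"]["attributes"]
    return base64.b64decode(attributes["xml"]).decode("utf-8")


def metadata_matches(local_xml: str, body: str) -> bool:
    """True when the locally generated XML equals the registered XML.

    Only surrounding whitespace is ignored; any other difference
    (including garbled characters) counts as a mismatch.
    """
    return local_xml.strip() == xml_from_rest_response(body).strip()


if __name__ == "__main__":
    # Locally built stand-in for a REST response; no network access involved.
    registered = "<givenName>Jéssica</givenName>"
    fake_body = json.dumps(
        {"data": {"attributes": {
            "xml": base64.b64encode(registered.encode("utf-8")).decode("ascii")
        }}}
    )
    print(metadata_matches("<givenName>Jéssica</givenName>\n", fake_body))
    print(metadata_matches("<givenName>J??ssica</givenName>", fake_body))
```

Decoding the payload explicitly as UTF-8 keeps the comparison independent of whatever charset handling the HTTP layer applies to the response.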
... if there's something I'm missing, please let me know.