Multiple language tags trigger SHACL violation for multiple values #252

@schivmeister

This has been a long-standing issue, not least due to the rdf:PlainLiteral (more recently xsd:string) and rdf:langString dichotomy in ontologies like ePO, whose constraints are generated by model2owl. Data like the following:

epd:id_d9997c19-6dd6-43e2-9706-28df10f8f1eb_AwardCriterion_Y6iaTUeQDukaqhJjdTfKhV
  a epo:AwardCriterion;
  epo:hasAwardCriterionType <http://publications.europa.eu/resource/authority/award-criterion-type/quality>;
  epo:hasWeightValueType <http://publications.europa.eu/resource/authority/number-weight/per-exa>;
  cccev:weight 10.0;
  dct:description "See purchase documents"@en, "Nurodyta pirkimo dokumentuose"@lt;
  skos:prefLabel "Delivery time for goods"@en, "Delivery time for goods"@lt .

will raise a sh:MaxCountConstraintComponent ("More than 1 values") violation for dct:description and/or skos:prefLabel.

There is a simple fix for this: use sh:uniqueLang true in place of sh:maxCount 1, so that each language tag may occur at most once per property.
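
A minimal sketch of the simple variant, applied to the award criterion data above. The shape name ex:AwardCriterionTextShape and the ex: namespace are hypothetical, and the epo: namespace IRI is an assumption; this is not the shape that model2owl generates.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix epo:  <http://data.europa.eu/a4g/ontology#> .   # assumed ePO namespace
@prefix ex:   <http://example.org/shapes#> .            # hypothetical namespace

ex:AwardCriterionTextShape
  a sh:NodeShape ;
  sh:targetClass epo:AwardCriterion ;
  sh:property [
    sh:path dct:description ;
    # No sh:maxCount here: one value per language tag is allowed instead
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:path skos:prefLabel ;
    sh:uniqueLang true ;
  ] .
```

With this shape the @en and @lt values above should both be accepted, while a second @en value on the same property would be reported. Note that sh:uniqueLang only looks at language-tagged literals, so any number of plain literals would still pass.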

However, that's not the whole story. There is a more sophisticated variant that also allows plain, non-language-tagged literals to co-exist with language-tagged ones, by combining sh:uniqueLang and sh:qualifiedValueShape:

ex:MaxOneRDFLabelShape
  a sh:NodeShape ;
  sh:targetSubjectsOf rdf:type ;
  sh:property [
    sh:path rdfs:label ;
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:qualifiedMaxCount 1 ;
    sh:qualifiedValueShape [
      sh:datatype xsd:string ;
    ] ;
    sh:message "Violation of standard practice: More than one `rdfs:label` exists without a language tag" ;
  ]
.

This was implemented for the SEMIC validator (see shape and accompanying test data).

The above technique would allow the following to pass:

ex:Note a owl:Class ;
  skos:prefLabel "note"@en , "nota"@es ;
  rdfs:label "note"@en , "nota"@es ;
  rdfs:comment "note" , "notee" .

but not:

ex:Note a owl:Class ;
  skos:prefLabel "note" , "notee" , "note"@en , "notee"@en ;
  rdfs:label "note" , "notee" , "note"@en , "notee"@en .

The technique currently seen with sh:or:

...
	sh:minCount 0 ;
	sh:maxCount 1 ;
	sh:or (
		[
			sh:datatype xsd:string ;
		]
		[
			sh:datatype rdf:langString ;
		]
	) .

as implemented based on #219, does not work for multiple language tags.
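
For illustration, one way the fragment above might be adapted so that multiple language tags pass. This is a sketch under assumptions: the shape name, the ex: namespace, the target class and the epo: namespace IRI are hypothetical, and dct:description is chosen as the path only as an example, since the original path is elided.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix epo: <http://data.europa.eu/a4g/ontology#> .   # assumed ePO namespace
@prefix ex:  <http://example.org/shapes#> .            # hypothetical namespace

ex:DescriptionShape
  a sh:PropertyShape ;
  sh:targetClass epo:AwardCriterion ;   # hypothetical target
  sh:path dct:description ;             # illustrative path only
  sh:minCount 0 ;
  # sh:maxCount 1 is dropped; it counts all values regardless of language tag
  sh:uniqueLang true ;                  # at most one value per language tag
  sh:qualifiedValueShape [ sh:datatype xsd:string ] ;
  sh:qualifiedMaxCount 1 ;              # at most one plain literal
  sh:or (
    [ sh:datatype xsd:string ]
    [ sh:datatype rdf:langString ]
  ) .
```

Dropping the sh:qualifiedValueShape and sh:qualifiedMaxCount lines would give the simple variant, at the cost of no longer limiting the number of plain literals.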

Whether to implement the simple variant or the advanced one, which allows plain and language-tagged literals to co-exist, is perhaps a question for the ontology stakeholders.
