Add optional property xpath to @context and specify a DTS selector scheme for WADM #281

Imagine we have a *knowledge graph* with statements about passages of texts, and we use URIs of the document endpoint to identify these passages. E.g.:

```ttl
<https://example.com/api/dts/document?resource=https://coexist.org/b/john.xml&ref=John:1:3> my:predicate my:ClassX .
```

Much information is enclosed in the URI, e.g. which resource the passage is part of and which part it is. But we do not want to parse the URI. We want statements that describe the RDF resource identified by the URI.
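To make concrete what is enclosed in the URI, here is a small Python sketch that pulls the pieces out with the standard library. It only illustrates the URI-specific knowledge a consumer would need if there were no RDF description; the function name `enclosed_information` is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# The document endpoint URI from the example above.
uri = ("https://example.com/api/dts/document"
       "?resource=https://coexist.org/b/john.xml&ref=John:1:3")


def enclosed_information(document_uri):
    """Extract the DTS query parameters hidden in a document endpoint URI."""
    query = parse_qs(urlsplit(document_uri).query)
    # A parameter that is absent from the URI is reported as None.
    return {name: query.get(name, [None])[0]
            for name in ("resource", "ref", "tree")}


info = enclosed_information(uri)
print(info)
```

Every consumer of the graph would have to repeat this step and know the DTS parameter names, which is what explicit statements about the resource avoid.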

DTS does not provide the properties and classes for formalizing this information. But there is already an open standard for it: the [Web Annotation Data Model](https://www.w3.org/TR/annotation-model/) (WADM). By describing the RDF resource with the WADM, we also get an alignment with CIDOC-CRM, at least if we follow the proposal of the [LINCS project](https://lincsproject.ca/docs/explore-lod/understand-lincs-data/application-profiles-main/sources-metadata#annotations).

In terms of the WADM, a part of a document, as returned by the `document` endpoint, is a *specific resource*. A derived view, as returned when the `mediaType` parameter is specified, is also a *specific resource* (cf. [WADM TR, Sec. 4](https://www.w3.org/TR/annotation-model/#specific-resources)):

> While it is possible using only the constructions described above to create Annotations that reference parts of resources by using IRIs with a fragment component, there are many situations when this is not sufficient. For example, even a simple circular region of an image, or a diagonal line across it, are not possible. Selecting an arbitrary span of text in an HTML page, perhaps the simplest annotation concept, is also not supported by fragments. Furthermore, there are non-segment use cases that require a client to retrieve a specific state or representation of the resource, to style it in a particular way, to associate a role with the resource that is specific to the Annotation's use of it, or for the Annotation to only apply when the resource is used in a particular context.

> The Web Annotation Data Model uses a new type of resource to capture these Annotation-specific requirements: a SpecificResource.

How would we describe the verse `John:1:3` from the book of John in `https://coexist.org/b/john.xml`? In WADM, such a partial resource is a *specific resource*, which has two important properties: the *source* (identified by the URI of the whole resource) and a *selector* that describes the passage by some selection mechanism.

There is no fixed set of selection mechanisms in WADM. It offers many options out of the box, e.g. the [`oa:XPathSelector`](https://www.w3.org/TR/annotation-model/#xpath-selector) for DOM-based documents. We can also specify a DTS selection mechanism and provide alternative selectors that both describe the same portion of the document.

> Multiple Selectors can be given to describe the same Segment in different ways in order to maximize the chances that it will be discoverable later, and that the consuming user agent will be able to use at least one of the Selectors. [WADM 4.2](https://www.w3.org/TR/annotation-model/#selectors)

Here's what it could look like when we describe the partial resource in two ways:

```ttl
@prefix dts: <https://w3id.org/dts/api#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my: <...> .

<https://example.com/api/dts/document?resource=https://coexist.org/b/john.xml&ref=John:1:3>
    a oa:SpecificResource ;
    oa:hasSource <https://coexist.org/b/john.xml> ;
    oa:hasSelector [
        a oa:FragmentSelector ;
        dcterms:conformsTo <https://w3id.org/dts/api#> ;
        rdf:value "tree=wadm&ref=John:1:3"
        ] ;
    oa:hasSelector [
        a oa:XPathSelector ;
        rdf:value "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}l[3]"
        ] ;

    # Our analytical assertions. They might be formalized a bit
    # differently, but that's not the point here.
    my:predicate my:ClassX .
```

Note the first selector, which describes the text passage in the style we know from the DTS specifications. That is what I would suggest. We could also go with this DTS selector alone; however, I would consider that a bit obscure and not interoperable enough.

The proposed syntax of the `rdf:value` property, `tree=wadm&ref=John:1:3`, is borrowed from RFC 5147, which is used for plain-text selectors in the WADM.
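A minimal Python sketch of how a client might build and read such a selector value. The helper names `dts_selector_value` and `parse_dts_selector` are hypothetical. Escaping is left open here on purpose: the proposal does not yet say how a `ref` containing `&`, `=` or `+` should be encoded, so the sketch assumes identifiers without these characters.

```python
from urllib.parse import parse_qs


def dts_selector_value(ref, tree=""):
    """Build the proposed selector value; an empty tree means the default tree."""
    return f"tree={tree}&ref={ref}"


def parse_dts_selector(value):
    """Split a selector value back into its tree and ref parts."""
    # keep_blank_values preserves the empty tree label of the default tree.
    fields = parse_qs(value, keep_blank_values=True)
    return {"tree": fields["tree"][0], "ref": fields["ref"][0]}


value = dts_selector_value("John:1:3", tree="wadm")
print(value)
print(parse_dts_selector(value))
```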


How can we generate such an RDF-based description of a document part? Do we need another endpoint? Good news: no. We can get it from the LOD returned by the navigation endpoint by applying a SPARQL CONSTRUCT query to it. The query needs some parameters, but a client already knows them from its request to the document endpoint: 1) the query URL, 2) the citation tree, and 3) the `ref` parameter (or `start` and `end`).


Everything else needed can be provided in the citation tree, especially the value for the `oa:XPathSelector`:

```xml
<refsDecl n="wadm" default="false">
  <citeStructure unit="book" match="//body/lg" use="@n">
    <citeData use="path(.)" property="https://w3id.org/dts/api#xpath"/>
    <citeStructure unit="chapter" match="lg" use="@n" delim=":">
      <citeData use="path(.)" property="https://w3id.org/dts/api#xpath"/>
      <citeStructure unit="verse" match="l" use="@n" delim=":">
        <citeData use="path(.)" property="https://w3id.org/dts/api#xpath"/>
      </citeStructure>
    </citeStructure>
  </citeStructure>
</refsDecl>
```
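The `path(.)` expression in `use` is the XPath 3.0 function `fn:path`, which yields paths in the `Q{namespace}local[position]` notation. As a rough illustration of what a processor computes there, here is a Python sketch using only the standard library. The function `fn_path` and the inline sample document are assumptions made for this example, and only element nodes are handled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Tiny stand-in for john.xml: one book, one chapter, three verses.
SAMPLE = f"""<TEI xmlns="{TEI_NS}"><text><body>
  <lg n="John"><lg n="1"><l n="1"/><l n="2"/><l n="3"/></lg></lg>
</body></text></TEI>"""


def fn_path(root, target):
    """Return an fn:path-style expression for an element below root."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parents.get(node)
        # Position counts only siblings with the same expanded name.
        same_name = [node] if parent is None else [c for c in parent if c.tag == node.tag]
        position = same_name.index(node) + 1
        # ElementTree writes namespaced tags as "{ns}local"; no namespace becomes "{}".
        name = node.tag if node.tag.startswith("{") else "{}" + node.tag
        steps.append(f"Q{name}[{position}]")
        node = parent
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(SAMPLE)
verse3 = root.findall(f".//{{{TEI_NS}}}l")[2]
print(fn_path(root, verse3))
```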

The members of the `wadm` citation tree would look like this:

```json
  "member": [
    {
      "level": 1,
      "xpath": "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]",
      "identifier": "John",
      "parent": null,
      "citeType": "book",
      "@type": "CitableUnit"
    },
    {
      "level": 2,
      "xpath": "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]",
      "identifier": "John:1",
      "parent": "John",
      "citeType": "chapter",
      "@type": "CitableUnit"
    },
    {
      "level": 3,
      "xpath": "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}l[1]",
      "identifier": "John:1:1",
      "parent": "John:1",
      "citeType": "verse",
      "@type": "CitableUnit"
    },
    {
      "level": 3,
      "xpath": "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}l[2]",
      "identifier": "John:1:2",
      "parent": "John:1",
      "citeType": "verse",
      "@type": "CitableUnit"
    },
    {
      "level": 3,
      "xpath": "/Q{http://www.tei-c.org/ns/1.0}TEI[1]/Q{http://www.tei-c.org/ns/1.0}text[1]/Q{http://www.tei-c.org/ns/1.0}body[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}lg[1]/Q{http://www.tei-c.org/ns/1.0}l[3]",
      "identifier": "John:1:3",
      "parent": "John:1",
      "citeType": "verse",
      "@type": "CitableUnit"
    },
    /* ... */
```
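Given such a member list, the description of the document part can already be assembled on the client side. The following Python sketch builds it as a plain dictionary using the key names of the WADM JSON-LD context (`source`, `selector`, `conformsTo`, `value`). The helper names `tei_path` and `specific_resource` and the shortened member list are assumptions for this example, not part of DTS.

```python
import json

TEI = "http://www.tei-c.org/ns/1.0"


def tei_path(*steps):
    """Join steps like 'l[3]' into the Q{ns}name[n] notation used above."""
    return "".join(f"/Q{{{TEI}}}{step}" for step in steps)


# Two members as returned by the navigation endpoint (shortened).
members = [
    {"identifier": "John:1", "citeType": "chapter",
     "xpath": tei_path("TEI[1]", "text[1]", "body[1]", "lg[1]", "lg[1]")},
    {"identifier": "John:1:3", "citeType": "verse",
     "xpath": tei_path("TEI[1]", "text[1]", "body[1]", "lg[1]", "lg[1]", "l[3]")},
]


def specific_resource(document_uri, source, tree, ref, members):
    """Describe a document endpoint result as a WADM specific resource."""
    member = next(m for m in members if m["identifier"] == ref)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": document_uri,
        "type": "SpecificResource",
        "source": source,
        "selector": [
            {"type": "FragmentSelector",
             "conformsTo": "https://w3id.org/dts/api#",
             "value": f"tree={tree}&ref={ref}"},
            {"type": "XPathSelector", "value": member["xpath"]},
        ],
    }


description = specific_resource(
    "https://example.com/api/dts/document"
    "?resource=https://coexist.org/b/john.xml&ref=John:1:3",
    "https://coexist.org/b/john.xml", "wadm", "John:1:3", members)
print(json.dumps(description, indent=2))
```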

Here's the SPARQL query:

```sparql
# SPARQL for constructing a WADM selector for the output of the
# document endpoint queried with a ref parameter. The input graph must
# be the output of a navigation endpoint for the same citation tree of
# the same resource.
#
# Parameters to be set:
#
# ?PARAMTREE - the label of the citation tree, empty string for default
# ?PARAMREF - the member identifier passed to the document endpoint as ref parameter
# ?PARAMURI - the document query URL 

PREFIX dts: <https://w3id.org/dts/api#>
PREFIX oa:  <http://www.w3.org/ns/oa#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


CONSTRUCT {
  ?PARAMURI rdf:type oa:SpecificResource .
  ?PARAMURI oa:hasSource ?resource .
  ?PARAMURI oa:hasSelector _:xps .
  _:xps rdf:type oa:XPathSelector .
  _:xps rdf:value ?xpath .
  ?PARAMURI oa:hasSelector _:fgs .
  _:fgs rdf:type oa:FragmentSelector .
  _:fgs dcterms:conformsTo dts: .
  _:fgs rdf:value ?DTSSEL .

  # _:fgs dts:isMember _:m .
  # _:m dts:identifier ?PARAMREF .
  # _:m dts:citeType ?citeType .
  # _:m rdf:type dts:CitableUnit .
  # _:m dts:fromTree ?PARAMTREE .
  # _:m dts:level ?level .
  # _:m dts:parent ?parent .

  ?PARAMURI dts:citeType ?citeType .
}

WHERE {

  # ?PARAM* must be passed in as parameters
  BIND("wadm" as ?PARAMTREE) . # empty value means the default tree?
  BIND("John:1:3" as ?PARAMREF) .
  BIND(<https://example.com/api/dts/document?resource=https://coexist.org/b/john.xml&ref=John:1:3> as ?PARAMURI) .


  BIND(CONCAT("tree=", STR(?PARAMTREE), "&ref=", STR(?PARAMREF)) as ?DTSSEL) .

  ?resource rdf:type dts:Resource .
  ?member rdf:type dts:CitableUnit .
  ?member dts:identifier ?PARAMREF .
  ?member dts:xpath ?xpath .
  ?member dts:citeType ?citeType .
  ?member dts:level ?level .
  ?member dts:parent ?parent .

}
```

If we uncomment the commented lines, we also get the information enclosed in `tree=wadm&ref=John:1:3` in a more 'atomic' way. Portions of text queried with `start` and `end` would require another SPARQL query, one that constructs an `oa:RangeSelector`.
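For the `start` and `end` case, a range selector could be assembled along these lines. This is only a Python sketch with hypothetical helper names (`tei_path`, `range_selector`) and a shortened member list. One open point to keep in mind: WADM defines the end selector of a range as exclusive, so if the DTS `end` member is meant to be included in the passage, the mapping of the end point needs a decision that this sketch does not make.

```python
TEI = "http://www.tei-c.org/ns/1.0"


def tei_path(*steps):
    """Join steps like 'l[3]' into the Q{ns}name[n] notation used above."""
    return "".join(f"/Q{{{TEI}}}{step}" for step in steps)


VERSE_PARENT = ("TEI[1]", "text[1]", "body[1]", "lg[1]", "lg[1]")

# Verse members as returned by the navigation endpoint (shortened).
members = [
    {"identifier": f"John:1:{n}", "xpath": tei_path(*VERSE_PARENT, f"l[{n}]")}
    for n in (1, 2, 3)
]


def range_selector(start, end, members):
    """Build a WADM RangeSelector from two member identifiers."""
    by_id = {m["identifier"]: m for m in members}
    return {
        "type": "RangeSelector",
        "startSelector": {"type": "XPathSelector", "value": by_id[start]["xpath"]},
        # Note: WADM treats the end selector as the exclusive end of the range.
        "endSelector": {"type": "XPathSelector", "value": by_id[end]["xpath"]},
    }


selector = range_selector("John:1:1", "John:1:3", members)
print(selector)
```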

There are working code examples in the [DTS Transformations wiki](https://github.com/SCDH/dts-transformations/wiki/WADM).

DTS and WADM share some important characteristics:

1. continuous ranges: In DTS we get a continuous portion of the document, no matter whether we query it by `ref` or by `start` and `end`. As far as I can see, a WADM selector also selects a continuous range. I think that is an important constraint, and we should exploit its productive potential rather than emphasize its limitations.
2. discontinuous ranges: In DTS, multiple queries have to be made to get discontinuous, disconnected parts. In WADM, disconnected portions would be described by multiple specific resources.
3. preimage - image: In DTS the *resource* and in the WADM the *source* have an identifier that is mapped to the full document in a base format. It is a preimage (Urbild) in the mathematical sense. Portions and derivations to other media types are images (Bild) in the mathematical sense.

WADM selectors can be further refined in order to select a more specific portion. That is done with `oa:refinedBy`, and there are several refinement mechanisms, e.g. quote, string index, or even XPath again. A specific resource described by a refined selector should have a different URI than one described by an unrefined selector.
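As an illustration, a DTS selector refined by a text quote might be put together like this. It is a Python sketch using the key names of the WADM JSON-LD context. The helper `refined_by` is hypothetical, and the quoted string is a placeholder, not actual text from the resource.

```python
import copy


def refined_by(selector, refinement):
    """Return a copy of selector that is narrowed down by refinement."""
    refined = copy.deepcopy(selector)
    refined["refinedBy"] = refinement
    return refined


# The DTS selector for the whole verse, as proposed above.
verse = {
    "type": "FragmentSelector",
    "conformsTo": "https://w3id.org/dts/api#",
    "value": "tree=wadm&ref=John:1:3",
}

# Placeholder quote; a real annotation would cite words from the verse.
quote = {"type": "TextQuoteSelector", "exact": "some words of the verse"}

refined = refined_by(verse, quote)
print(refined)
```

The unrefined selector stays untouched, which matches the point that the refined and the unrefined specific resource are two different things with two different URIs.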

What specification work would need to be done for such an alignment with WADM?

1. Allow an optional `xpath` property on a `CitableUnit` object and define it in the dts namespace. Its value should be a path expression.
2. Specify DTS as a WADM selector scheme. This can be done outside of the specifications of the endpoints and is independent work.

