Description
In a very instructive talk at a workshop at Tubingen Heidelberg on 25th September, @tuurma introduced us to the current problems with citation tree construction using <citeStructure> when dealing with milestone-like markup. Here's my proposal for addressing this problem:
- Keep the specification of <citeStructure> and the algorithm for evaluating it as they currently are.
- Solve the problem by following the HATEOAS principle, i.e. provide enough information so that a client can make a follow-up query that returns the desired content. Do not build on conventions here, but make this information explicit.
- We get there by adding two optional properties to CitableUnit objects which provide identifiers of members for running a follow-up query with start and end parameters. I suggest naming these properties startMember and endMember, but you may find better names.
Example:
<refsDecl n="page">
  <citeStructure unit="page" match="//body//pb" use="@n" delim="p.">
    <citeData use="'p.' || @n || '.start'" property="https://w3id.org/dts/api#startMember"/>
    <citeData use="'p.' || @n || '.end'" property="https://w3id.org/dts/api#endMember"/>
    <citeStructure unit="page-start" match="self::pb"
                   use="''" delim=".start"/>
    <citeStructure unit="page-end" match="self::pb/following::pb[1]/preceding::node()[1]"
                   use="''" delim=".end"/>
    <!--
      to solve the last-page problem try
      (self::pb/following::pb[1]/preceding::node()[1], (//text//text())[last()] )[1]
    -->
  </citeStructure>
</refsDecl>
This results in members like these:
"member": [
  {
    "level": 1,
    "startMember": "p.1.start",
    "identifier": "p.1",
    "parent": null,
    "citeType": "page",
    "@type": "CitableUnit",
    "endMember": "p.1.end"
  },
  {
    "level": 2,
    "identifier": "p.1.start",
    "parent": "p.1",
    "citeType": "page-start",
    "@type": "CitableUnit"
  },
  {
    "level": 2,
    "identifier": "p.1.end",
    "parent": "p.1",
    "citeType": "page-end",
    "@type": "CitableUnit"
  },
  /* ... */
]
That's the minimal explicit information a client needs. From the presence of startMember and endMember it can conclude that there is a better way of getting the content than using the ref parameter, and so it queries http://...&tree=page&start=p.1.start&end=p.1.end. The client does not need to know about any conventions that might be attached to the string literal "page" provided in the citeType property.
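To make the client side concrete, here is a minimal sketch in Python of how a client might choose its follow-up query. The function name follow_up_query is hypothetical, the parameter names tree, start, end and ref are the ones used above, and the endpoint URL is deliberately left out. It assumes a member is handed over as a plain dict parsed from the JSON shown earlier.

```python
from urllib.parse import urlencode


def follow_up_query(member: dict, tree: str) -> dict:
    """Pick query parameters for fetching the content of one CitableUnit.

    Hypothetical helper: it only inspects the proposed startMember and
    endMember properties and never looks at citeType.
    """
    start = member.get("startMember")
    end = member.get("endMember")
    if start is not None and end is not None:
        # Explicit range information is present: ask for the passage
        # between the two referenced members.
        return {"tree": tree, "start": start, "end": end}
    # No range information: fall back to addressing the unit itself.
    return {"tree": tree, "ref": member["identifier"]}


page = {
    "identifier": "p.1",
    "citeType": "page",
    "startMember": "p.1.start",
    "endMember": "p.1.end",
}
page_start = {"identifier": "p.1.start", "citeType": "page-start"}

print(urlencode(follow_up_query(page, "page")))
# tree=page&start=p.1.start&end=p.1.end
print(urlencode(follow_up_query(page_start, "page")))
# tree=page&ref=p.1.start
```

The decision depends only on the presence of the two properties, which is the point of the proposal: nothing in the client is specific to pages or to <pb/>.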
The client can also still query the single <pb/> element with ref.
I think it's important that no special processing on the server side is involved. This makes the suggested solution suitable for all kinds of milestone-like and non-hierarchical markup, not only <pb/>.
You may think that startMember is redundant, because the referenced member selects the same document node as its parent member. True, in this case. At first, I added it for the beauty of symmetry. But I can think of situations where the subtree constructed for the parent differs from those of both the start and the end member, e.g. a cite structure for apparatus entries in external double-end-point variant encoding. Anyway, such cases will be legion. So I would now say that there MUST be either both of these optional properties or neither of them.
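The both-or-neither rule is easy to check mechanically. The following Python sketch is only an illustration under assumptions: the function name range_problems is hypothetical, and the additional check that the referenced identifiers exist in the same member list is my own reading of what a consistent response would look like, not something stated above.

```python
def range_problems(members: list[dict]) -> list[tuple[str, str]]:
    """Return (identifier, message) pairs for members that break the rule."""
    known = {m["identifier"] for m in members}
    problems = []
    for m in members:
        has_start = "startMember" in m
        has_end = "endMember" in m
        if has_start != has_end:
            # Exactly one of the two properties is present.
            problems.append((m["identifier"], "startMember and endMember must occur together"))
            continue
        for key in ("startMember", "endMember"):
            # Assumption: a referenced member should be part of the same list.
            if key in m and m[key] not in known:
                problems.append((m["identifier"], f"{key} refers to unknown member {m[key]!r}"))
    return problems


members = [
    {"identifier": "p.1", "startMember": "p.1.start", "endMember": "p.1.end"},
    {"identifier": "p.1.start"},
    {"identifier": "p.1.end"},
    {"identifier": "p.2", "startMember": "p.2.start"},  # endMember missing
]

print(range_problems(members))
# [('p.2', 'startMember and endMember must occur together')]
```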
BTW, DTS Transformations enabled me to investigate this subject and finally come up with this suggestion. The investigation, with working code examples, is documented in the Wiki.
What do you think about going the HATEOAS way?