The issue
Taking Example 1 from https://www.w3.org/TR/turtle/:
@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .
<#green-goblin>
rel:enemyOf <#spiderman> ;
a foaf:Person ; # in the context of the Marvel universe
foaf:name "Green Goblin" .
<#spiderman>
rel:enemyOf <#green-goblin> ;
a foaf:Person ;
foaf:name "Spiderman", "Человек-паук"@ru .
Instantiating an EasyRdf Graph from this data works fine; it parses correctly.
However, when the prefixes are not provided, an exception is thrown: 'Unable to parse data of an unknown format.'
For example:
<#green-goblin>
rel:enemyOf <#spiderman> ;
a foaf:Person ; # in the context of the Marvel universe
foaf:name "Green Goblin" .
<#spiderman>
rel:enemyOf <#green-goblin> ;
a foaf:Person ;
foaf:name "Spiderman", "Человек-паук"@ru .
Analysis
According to the Turtle grammar:
- A turtleDoc is a set of statements [1]
- A statement is a directive OR triples [2]
This means a Turtle document may, but does not have to, start with @prefix or @base directives.
However, Format::guessFormat() only recognizes a Turtle document when it finds a prefix or base directive (with or without the @):
...
} elseif (preg_match('/@prefix\s|@base\s/', $short)) {
return self::getFormat('turtle');
} elseif (
preg_match('/prefix\s|base\s/i', $short)
// see FormatTest::testGuessFormatTurtleByPrefix for an example
&& false === str_contains($short, '<?xml')
) {
return self::getFormat('turtle');
} elseif (preg_match('/^\s*<.+> <.+>/m', $short)) {
return self::getFormat('ntriples');
} else {
return null;
}
...
Solution space
I propose adding a few more ways to recognize Turtle documents.
We need to make sure we can still distinguish Turtle from N-Triples syntax. If I'm correct, Turtle is a superset of N-Triples: a valid N-Triples document is also valid Turtle, but not the other way around.
There are specific indicators that we are dealing with a turtle document, including:
- The shorthand a is used for the predicate rdf:type
- Compact URIs are used, i.e. prefix, colon, local name, enclosed by whitespace
- A semicolon is used to close and continue predicateObjectLists
None of these are allowed in N-Triples syntax, so we are good here. This might produce more false positives for Turtle when guessing the format, which would then surface as parser errors. I don't see a problem with that, because otherwise guessFormat() returns null, which also results in an exception, as described in this issue.
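
To make the proposal concrete, here is a rough sketch of how the three indicators could be checked before falling back to N-Triples. It is not a patch against EasyRdf: looksLikeTurtle() and guessRdfFormat() are hypothetical standalone functions, the 1024-byte snippet length is an assumption, and the patterns are deliberately simplified (only double-quoted strings are blanked out, and only ASCII prefixes are matched). It needs PHP 8 for str_contains().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of EasyRdf): reports whether a snippet uses
 * syntax that Turtle allows but N-Triples does not.
 */
function looksLikeTurtle(string $short): bool
{
    // Blank out string literals and IRIs so that their contents cannot
    // trigger a match. Once the IRIs are gone, any remaining '#' starts a
    // comment, so comments are dropped last.
    $skeleton = preg_replace(
        ['/"(?:[^"\\\\]|\\\\.)*"/s', '/<[^<>\s]*>/', '/#.*$/m'],
        ['""', '<>', ''],
        $short
    ) ?? '';

    // Indicator 1: the shorthand "a" for rdf:type, enclosed by whitespace.
    return preg_match('/(?<=\s)a(?=\s)/', $skeleton) === 1
        // Indicator 2: a compact URI such as foaf:name or :name. The
        // lookbehind skips blank node labels like _:b0, which N-Triples allows.
        || preg_match('/(?<![\w:])(?:[A-Za-z][\w-]*)?:\w+/', $skeleton) === 1
        // Indicator 3: a semicolon continuing a predicateObjectList.
        || str_contains($skeleton, ';');
}

/**
 * Hypothetical stand-in for the branches quoted above, returning a format
 * name instead of a Format object. The extra check runs before the
 * N-Triples branch.
 */
function guessRdfFormat(string $data): ?string
{
    // Assumption: only the beginning of the data is inspected.
    $short = substr($data, 0, 1024);

    if (preg_match('/@prefix\s|@base\s/', $short)) {
        return 'turtle';
    }
    if (preg_match('/prefix\s|base\s/i', $short) && !str_contains($short, '<?xml')) {
        return 'turtle';
    }
    if (looksLikeTurtle($short)) {
        return 'turtle';
    }
    if (preg_match('/^\s*<.+> <.+>/m', $short)) {
        return 'ntriples';
    }

    return null;
}
```

With this sketch, the prefix-less example from this issue would be guessed as turtle. An N-Triples document whose literals happen to contain " a ", a colon or a semicolon would still be guessed as ntriples, because literals and IRIs are blanked out before the indicators are checked.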