diff --git a/.github/ISSUE_TEMPLATE/data.md b/.github/ISSUE_TEMPLATE/data.md
index d5bf4bcd0d..cbc58b2db3 100644
--- a/.github/ISSUE_TEMPLATE/data.md
+++ b/.github/ISSUE_TEMPLATE/data.md
@@ -7,55 +7,32 @@ assignees: ''
---
-# Issue still valid?
-> DBpedia updates frequently in this order: 1. DIEF software (extracts data from wikidata), 2. monthly dumps, 3. online services loaded from dumps.
-> We update http://dief.tools.dbpedia.org/server/extraction/ on a daily basis from the git and it reflects the current state.
->
-> **Disclaimer:** The public SPARQL endpoints (e.g., http://dbpedia.org/sparql) and other applications build based on DBpedia's data are not in sync yet with the latest monthly extracted data.
->
-> Therefore, you can use this tool to extract an example page and check if the error persists in the latest software version, and add the link you used for verification, e.g., http://dief.tools.dbpedia.org/server/extraction/en/extract?title=United+States
+# Issue validity
+> Some explanation: a DBpedia Snapshot is produced every three months (see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1)) and is loaded into http://dbpedia.org/sparql. During these three months, Wikipedia changes and the DBpedia Information Extraction Framework (DIEF) receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that extracts one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name there, e.g. `Berlin` or `Joe_Biden`.
+> If the issue persists, please post the link from your browser here:
-# Source
-> Where did you find the data issue? Pick one, remove the others.
-
-### Web / SPARQL
-> State the service (e.g. http://dbpedia.org/sparql) and the SPARQL query
-> give a link to the web / linked data pages (e.g. http://dbpedia.org/resource/Berlin)
-
-### Release Dumps
-> DBpedia provides monthly release dumps, cf. release-dashboard.dbpedia.org
-> provide artifact & version or download link
-
-### Running the DBpedia Extraction (DIEF) software
-> Please include all necessary information.
-
-
-# Classification
-> If you have some familiarity with DBpedia, please use the classification tags at (link) to correctly file this issue. Otherwise skip this step.
-
-
-
-### Error Description
+# Error Description
> Please state the nature of your technical emergency:
+# Pinpointing the source of the error
+> Where did you find the data issue? Non-exhaustive options are:
+* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
+* Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link**
+* DIEF: you ran the software and the error occurred then, please **include all necessary information such as the extractor or log**. If you had problems running the software, use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose)
-### Error specification
-> Pick the appropriate:
+# Details
+> please post the details
-- Affected extraction artifacts (Databus artifact version or file identifiers):
- - https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/mappingbased-objects_lang=en_disjointDomain.ttl.bz2
- -
-- Example DBpedia resource URL(s) having the error (one full IRI per line):
- - http://dbpedia.org/resource/Leipzig
- -
-- Erroneous triples RDF snippet (NTRIPLES):
+> Wrong triples RDF snippet
```
```
-- Expected / corrected RDF outcome snippet (NTRIPLES):
+> Expected / corrected RDF outcome snippet
```
```
+> Example DBpedia resource URL(s)
+```
-### Additional context
-> Add any other context about the problem here.
+```
+> Other
diff --git a/README.md b/README.md
index 634adb8237..803c831824 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
**Homepage**: http://dbpedia.org
**Documentation**: http://dev.dbpedia.org/Extraction
**Get in touch with DBpedia**: https://wiki.dbpedia.org/join/get-in-touch
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace](https://dbpedia-slack.herokuapp.com/) - the main point for [developement updates](https://github.com/dbpedia/extraction-framework/blob/master/.github/workflows/maven.yml) and discussions
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions
## Contents
@@ -61,7 +61,7 @@ The DBpedia extraction framework is structured into different modules
### Core Module
-
+
@@ -76,9 +76,9 @@ The DBpedia extraction framework is structured into different modules
In addition to the core components, a number of utility packages offers essential functionality to be used by the extraction code:
-* **Ontology** Classes used to represent an ontology. Methods for both, reading and writing ontologies are provided. All classes are located in the namespace [org.dbpedia.extraction.ontology](tree/master/core/src/main/scala/org/dbpedia/extraction/ontology)
-* **DataParser** Parsers to extract data from nodes in the abstract syntax tree. All classes are located in the namespace [org.dbpedia.extraction.dataparser](tree/master/core/src/main/scala/org/dbpedia/extraction/dataparser)
-* **Util** Various utility classes. All classes are located in the namespace [org.dbpedia.extraction.util](tree/master/core/src/main/scala/org/dbpedia/extraction/util)
+* **Ontology** Classes used to represent an ontology. Methods for both reading and writing ontologies are provided. All classes are located in the namespace `org.dbpedia.extraction.ontology`.
+* **DataParser** Parsers to extract data from nodes in the abstract syntax tree. All classes are located in the namespace `org.dbpedia.extraction.dataparser`.
+* **Util** Various utility classes. All classes are located in the namespace `org.dbpedia.extraction.util`.
### Dump extraction Module
@@ -104,7 +104,7 @@ Please make sure you have read the Developer's Certificate of Origin, further do
8. Send a pull request from your branch into `extraction-framework/dev` via GitHub.
* In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123).
* Your changes will be reviewed and discussed on GitHub.
- * In addition, [Travis-CI](http://about.travis-ci.org/) will test if the merged version passes the build.
+ * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
* If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
* When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.
@@ -112,17 +112,17 @@ Please keep in mind:
- Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!
- If you already pushed a branch to GitHub, later rebased the master onto this branch and then tried to push again, GitHub won't let you saying _"To prevent you from losing history, non-fast-forward updates were rejected"_. If _(and only if)_ you are sure that nobody already pulled from this branch, add `--force` to the push command.
-[_"Don’t rebase branches you have shared with another developer."_](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)
-[_"Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed."_](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)
-[_"Never ever rebase a branch that you pushed, or that you pulled from another person_"](http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)
+ - _"[Don’t rebase branches you have shared with another developer.](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)"_
+ - _"[Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed.](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)"_
+ - _"[Never ever rebase a branch that you pushed, or that you pulled from another person](https://web.archive.org/web/20150622064245/http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)"_
- In general, we prefer Scala over Java.
More tips:
- Guides to setup your development environment for [Intellij](Setting up IntelliJ IDEA) or [Eclipse](Setting up eclipse).
-- Get help with the [Maven build](Build-from-Source-with-Maven) or another form of [installation](Installation).
-- [Download](Downloads) some data to work with.
-- How to run [from Scala/Java](Run-from-Java-or-Scala) or [from a JAR](Run-from-a-JAR).
-- Having different troubles? Check the [troubleshooting page](Troubleshooting) or post on https://forum.dbpedia.org.
+- Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html).
+- [Download](https://dumps.wikimedia.org/) some data to work with.
+- How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html).
+- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
### Important: Developer's Certificate of Origin
By sending a pull request to the [extraction-framework repository](https://github.com/dbpedia/extraction-framework) on GitHub, you implicitly accept the [Developer's Certificate of Origin 1.1](https://github.com/dbpedia/extraction-framework/blob/master/documentation/DeveloperCertificateOfOrigin.md)
diff --git a/core/doc/HowTo-release-DBpedia.txt b/core/doc/HowTo-release-DBpedia.txt
index 7903c20f9a..44e1291bc9 100644
--- a/core/doc/HowTo-release-DBpedia.txt
+++ b/core/doc/HowTo-release-DBpedia.txt
@@ -22,7 +22,7 @@ release. It might not be complete. Please also consult with the others!
- Commit the files to the hg repository
- Don't change the files anymore. The whole extraction should use the same version.
- - for AbstractExtractor: insert Wikipedia dumps into a local MySQL database using ...dump.sql.Import.scala
+ - for PlainAbstractExtractor: insert Wikipedia dumps into a local MySQL database using ...dump.sql.Import.scala
- adjust the LocalSettings.php of mw-modified: specify username+password for the database and the database prefix
TODO: more in-depth explanations about abstract extraction
diff --git a/core/src/main/scala/org/dbpedia/extraction/config/Config.scala b/core/src/main/scala/org/dbpedia/extraction/config/Config.scala
index 7f3e218edb..e10453e7d1 100644
--- a/core/src/main/scala/org/dbpedia/extraction/config/Config.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/config/Config.scala
@@ -277,7 +277,8 @@ class Config(val configPath: String) extends
shortAbstractsProperty = this.getProperty("short-abstracts-property", "rdfs:comment").trim,
longAbstractsProperty = this.getProperty("long-abstracts-property", "abstract").trim,
shortAbstractMinLength = this.getProperty("short-abstract-min-length", "200").trim.toInt,
- abstractTags = this.getProperty("abstract-tags", "query,pages,page,extract").trim
+ abstractTags = this.getProperty("abstract-tags", "query,pages,page,extract").trim,
+ removeBrokenBracketsProperty = this.getProperty("remove-broken-brackets-plain-abstracts", "false").trim.toBoolean
)
} match{
case Success(s) => s
@@ -293,7 +294,8 @@ class Config(val configPath: String) extends
writeAnchor = this.getProperty("nif-write-anchor", "false").trim.toBoolean,
writeLinkAnchor = this.getProperty("nif-write-link-anchor", "true").trim.toBoolean,
abstractsOnly = this.getProperty("nif-extract-abstract-only", "true").trim.toBoolean,
- cssSelectorMap = this.getClass.getClassLoader.getResource("nifextractionconfig.json") //static config file in core/src/main/resources
+ cssSelectorMap = this.getClass.getClassLoader.getResource("nifextractionconfig.json"), //static config file in core/src/main/resources
+ removeBrokenBracketsProperty = this.getProperty("remove-broken-brackets-html-abstracts", "false").trim.toBoolean
)
} match{
case Success(s) => s
@@ -348,7 +350,8 @@ object Config{
writeAnchor: Boolean,
writeLinkAnchor: Boolean,
abstractsOnly: Boolean,
- cssSelectorMap: URL
+ cssSelectorMap: URL,
+ removeBrokenBracketsProperty: Boolean
)
/**
@@ -369,11 +372,12 @@ object Config{
)
case class AbstractParameters(
- abstractQuery: String,
- shortAbstractsProperty: String,
- longAbstractsProperty: String,
- shortAbstractMinLength: Int,
- abstractTags: String
+ abstractQuery: String,
+ shortAbstractsProperty: String,
+ longAbstractsProperty: String,
+ shortAbstractMinLength: Int,
+ abstractTags: String,
+ removeBrokenBracketsProperty: Boolean
)
case class SlackCredentials(
diff --git a/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractorWikipedia.scala b/core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala
similarity index 94%
rename from core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractorWikipedia.scala
rename to core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala
index 14bd29d0a3..50c97b8bf4 100644
--- a/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractorWikipedia.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala
@@ -13,7 +13,7 @@ import scala.language.reflectiveCalls
* Created: 5/19/14 9:21 AM
*/
-class AbstractExtractorWikipedia(
+class HtmlAbstractExtractor(
context : {
def ontology : Ontology
def language : Language
diff --git a/core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala b/core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala
index 421ec65016..5f4aff0c5c 100644
--- a/core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala
@@ -54,7 +54,7 @@ extends PageNodeExtractor
private val language = context.language.wikiCode
- private val logger = Logger.getLogger(classOf[AbstractExtractor].getName)
+ private val logger = Logger.getLogger(classOf[PlainAbstractExtractor].getName)
//private val apiParametersFormat = "uselang="+language+"&format=xml&action=parse&prop=text&title=%s&text=%s"
private val apiParametersFormat = "uselang="+language+"&format=xml&action=query&prop=extracts&exintro=&explaintext=&titles=%s"
diff --git a/core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala b/core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala
index cca0ae316e..28e7b533db 100644
--- a/core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala
@@ -14,7 +14,7 @@ import scala.language.reflectiveCalls
/**
* Extracts page html.
*
- * Based on AbstractExtractor, major difference is the parameter
+ * Based on PlainAbstractExtractor, major difference is the parameter
 * apiParametersFormat = "action=parse&prop=text&section=0&format=xml&page=%s"
*
* This class produces all nif related datasets for the abstract as well as the short-, long-abstracts datasets.
@@ -69,7 +69,7 @@ class NifExtractor(
object NifExtractor{
//TODO check if this function is still relevant
- //copied from AbstractExtractor
+ //copied from PlainAbstractExtractor
def postProcessExtractedHtml(pageTitle: WikiTitle, text: String): String =
{
val startsWithLowercase =
diff --git a/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala b/core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala
similarity index 89%
rename from core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala
rename to core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala
index a9026c5af8..59cf57d029 100644
--- a/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala
@@ -1,13 +1,13 @@
package org.dbpedia.extraction.mappings
import java.util.logging.Logger
-
import org.dbpedia.extraction.annotations.ExtractorAnnotation
import org.dbpedia.extraction.config.Config
import org.dbpedia.extraction.config.provenance.DBpediaDatasets
import org.dbpedia.extraction.ontology.Ontology
import org.dbpedia.extraction.transform.{Quad, QuadBuilder}
-import org.dbpedia.extraction.util.{Language, MediaWikiConnector}
+import org.dbpedia.extraction.util.abstracts.AbstractUtils
+import org.dbpedia.extraction.util.{Language, MediaWikiConnector, WikiUtil}
import org.dbpedia.extraction.wikiparser._
import scala.language.reflectiveCalls
@@ -30,7 +30,7 @@ import scala.language.reflectiveCalls
@deprecated("replaced by NifExtractor.scala: which will extract the whole page content including the abstract", "2016-10")
@ExtractorAnnotation("abstract extractor")
-class AbstractExtractor(
+class PlainAbstractExtractor(
context : {
def ontology : Ontology
def language : Language
@@ -39,7 +39,7 @@ class AbstractExtractor(
)
extends WikiPageExtractor
{
- protected val logger = Logger.getLogger(classOf[AbstractExtractor].getName)
+ protected val logger = Logger.getLogger(classOf[PlainAbstractExtractor].getName)
this.getClass.getClassLoader.getResource("myproperties.properties")
@@ -50,6 +50,8 @@ extends WikiPageExtractor
//private val apiParametersFormat = "uselang="+language+"&format=xml&action=parse&prop=text&title=%s&text=%s"
protected val apiParametersFormat = context.configFile.abstractParameters.abstractQuery
+ protected val removeBrokenBrackets = context.configFile.abstractParameters.removeBrokenBracketsProperty
+
// lazy so testing does not need ontology
protected lazy val shortProperty = context.ontology.properties(context.configFile.abstractParameters.shortAbstractsProperty)
@@ -63,7 +65,6 @@ extends WikiPageExtractor
private val mwConnector = new MediaWikiConnector(context.configFile.mediawikiConnection, context.configFile.abstractParameters.abstractTags.split(","))
-
override def extract(pageNode : WikiPage, subjectUri: String): Seq[Quad] =
{
//Only extract abstracts for pages from the Main namespace
@@ -79,16 +80,22 @@ extends WikiPageExtractor
// if(abstractWikiText == "") return Seq.empty
//Retrieve page text
- val text = mwConnector.retrievePage(pageNode.title, apiParametersFormat, pageNode.isRetry) match{
- case Some(t) => AbstractExtractor.postProcessExtractedHtml(pageNode.title, replacePatterns(t))
+ val text = mwConnector.retrievePage(pageNode.title, apiParametersFormat, pageNode.isRetry) match {
+ case Some(t) => PlainAbstractExtractor.postProcessExtractedHtml(pageNode.title, replacePatterns(t))
case None => return Seq.empty
}
+ val modifiedText = if (removeBrokenBrackets) {
+ AbstractUtils.removeBrokenBracketsInAbstracts(text)
+ } else {
+ text
+ }
+
//Create a short version of the abstract
- val shortText = short(text)
+ val shortText = short(modifiedText)
//Create statements
- val quadLong = longQuad(pageNode.uri, text, pageNode.sourceIri)
+ val quadLong = longQuad(pageNode.uri, modifiedText, pageNode.sourceIri)
val quadShort = shortQuad(pageNode.uri, shortText, pageNode.sourceIri)
if (shortText.isEmpty)
@@ -140,7 +147,7 @@ extends WikiPageExtractor
private def replacePatterns(abst: String): String= {
var ret = abst
- for ((regex, replacement) <- AbstractExtractor.patternsToRemove) {
+ for ((regex, replacement) <- PlainAbstractExtractor.patternsToRemove) {
val matches = regex.pattern.matcher(ret)
if (matches.find()) {
ret = matches.replaceAll(replacement)
@@ -205,7 +212,7 @@ extends WikiPageExtractor
.filter(renderNode)
.map(_.toWikiText)
.mkString("").trim
-
+
// decode HTML entities - the result is plain text
decodeHtml(text)
}
@@ -213,7 +220,7 @@ extends WikiPageExtractor
}
-object AbstractExtractor {
+object PlainAbstractExtractor {
//TODO check if this function is still relevant
def postProcessExtractedHtml(pageTitle: WikiTitle, text: String): String =
@@ -243,6 +250,7 @@ object AbstractExtractor {
val patternsToRemove = List(
"""
""".r -> " ",
- """
""".r -> " "
+ """""".r -> " ",
+ """<normalized>.*<\/normalized>""".r -> ""
)
}
diff --git a/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala b/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
index 53a2cb4a27..2aee5ee67c 100644
--- a/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
@@ -6,180 +6,219 @@ import org.dbpedia.extraction.transform.Quad
import org.dbpedia.extraction.util.Language
import org.dbpedia.extraction.wikiparser.{AnnotationKey, PropertyNode, TemplateNode}
+import java.util.logging.Logger
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.language.reflectiveCalls
-class TemplateMapping(
+class TemplateMapping(
val mapToClass : OntologyClass,
val correspondingClass : OntologyClass, // must be public val for converting to rml
val correspondingProperty : OntologyProperty, // must be public for converting to rml
val mappings : List[PropertyMapping], // must be public val for statistics
context: {
def ontology : Ontology
- def language : Language
- }
-)
+ def language : Language
+ }
+)
extends Extractor[TemplateNode]
{
- override val datasets: Set[Dataset] = mappings.flatMap(_.datasets).toSet ++ Set(DBpediaDatasets.OntologyTypes, DBpediaDatasets.OntologyTypesTransitive, DBpediaDatasets.OntologyPropertiesObjects)
+ override val datasets: Set[Dataset] = mappings.flatMap(_.datasets).toSet ++ Set(DBpediaDatasets.OntologyTypes, DBpediaDatasets.OntologyTypesTransitive, DBpediaDatasets.OntologyPropertiesObjects)
- private val classOwlThing = context.ontology.classes("owl:Thing")
- private val propertyRdfType = context.ontology.properties("rdf:type")
+ private val classOwlThing = context.ontology.classes("owl:Thing")
+ private val propertyRdfType = context.ontology.properties("rdf:type")
+
+ /**
+ * when extractor has a pre-phase
+ */
- override def extract(node: TemplateNode, subjectUri: String): Seq[Quad] =
- {
- val pageNode = node.root
- val graph = new ArrayBuffer[Quad]
+ override def extract(node: TemplateNode, subjectUri: String): Seq[Quad] =
+ {
+ val pageNode = node.root
+ val graph = new ArrayBuffer[Quad]
- pageNode.getAnnotation(TemplateMapping.CLASS_ANNOTATION) match
- {
- case None => //So far, no template has been mapped on this page
- {
- //Add ontology instance
- createInstance(graph, subjectUri, node)
-
- //Save existing template (this is the first one)
- node.setAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION, Seq(node.title.decoded))
-
- //Extract properties
- graph ++= mappings.flatMap(_.extract(node, subjectUri))
- }
- case Some(pageClass) => //This page already has a root template.
- {
- // Depending on the following conditions we create a new "blank node" or append the data to the main resource.
- // Example case for creating new resources are the pages: enwiki:Volkswagen_Golf , enwiki:List_of_Playboy_Playmates_of_2012
- // Example case we could append to existing class are where we have to different mapped templates that one is a subclass of the other
-
- // Condition #1
- // Check if the root template has been mapped to the corresponding Class of this template
- // If the mapping already defines a corresponding class & propery then we should create a new resource
- val condition1_createCorrespondingProperty = correspondingClass != null &&
- correspondingProperty != null && pageClass.relatedClasses.contains(correspondingClass)
-
- // Condition #2
- // If we have more than one of the same template it means that we want to create multiple resources. See for example
- // the pages: enwiki:Volkswagen_Golf , enwiki:List_of_Playboy_Playmates_of_2012
- val pageTemplateSet = pageNode.getAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION).getOrElse(Seq.empty)
- val condition2_template_exists = pageTemplateSet.contains(node.title.decoded)
- if (!condition2_template_exists)
- node.setAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION, pageTemplateSet ++ Seq(node.title.decoded))
-
- // Condition #3
- // The current mapping is a subclass or a superclass of previous class or owl:Thing
- val condition3_subclass = mapToClass.relatedClasses.contains(pageClass) || pageClass.relatedClasses.contains(mapToClass) || mapToClass.equals(classOwlThing) || pageClass.equals(classOwlThing)
-
- // If all above conditions are met then use the main resource, otherwise create a new one
- val instanceUri =
- if ( (!condition1_createCorrespondingProperty) && (!condition2_template_exists) && condition3_subclass ) subjectUri
- else generateUri(subjectUri, node)
-
- //Add ontology instance
- if (instanceUri == subjectUri) {
- createMissingTypes(graph, instanceUri, node)
- }
- else {
- createInstance(graph, instanceUri, node)
- }
-
- if (condition1_createCorrespondingProperty)
- {
- //Connect new instance to the instance created from the root template
- graph += new Quad(context.language, DBpediaDatasets.OntologyPropertiesObjects, instanceUri, correspondingProperty, subjectUri, node.sourceIri)
- }
-
- //Extract properties
- graph ++= mappings.flatMap(_.extract(node, instanceUri))
- }
+ pageNode.getAnnotation(TemplateMapping.CLASS_ANNOTATION) match
+ {
+ case None => //So far, no template has been mapped on this page
+ {
+ //Add ontology instance
+ createInstance(graph, subjectUri, node)
+
+ //Save existing template (this is the first one)
+ node.setAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION, Seq(node.title.decoded))
+
+ //Extract properties
+ graph ++= mappings.flatMap(_.extract(node, subjectUri))
+ }
+ case Some(pageClass) => //This page already has a root template.
+ {
+ // Depending on the following conditions we create a new "blank node" or append the data to the main resource.
+ // Example case for creating new resources are the pages: enwiki:Volkswagen_Golf , enwiki:List_of_Playboy_Playmates_of_2012
+ // An example case where we could append to the existing class is when we have two different mapped templates and one is a subclass of the other
+
+ // Condition #1
+ // Check if the root template has been mapped to the corresponding Class of this template
+ // If the mapping already defines a corresponding class & property then we should create a new resource
+ val condition1_create_correspondingproperty = correspondingClass != null &&
+ correspondingProperty != null && pageClass.relatedClasses.contains(correspondingClass)
+
+ // Condition #2
+ // If we have more than one of the same template it means that we want to create multiple resources. See for example
+ // the pages: enwiki:Volkswagen_Golf , enwiki:List_of_Playboy_Playmates_of_2012
+ val pageTemplateSet = pageNode.getAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION).getOrElse(Seq.empty)
+ val condition2_template_exists = pageTemplateSet.contains(node.title.decoded)
+ if (!condition2_template_exists) {
+ node.setAnnotation(TemplateMapping.TEMPLATELIST_ANNOTATION, pageTemplateSet ++ Seq(node.title.decoded))
}
-
- graph
- }
- private def createMissingTypes(graph: mutable.Buffer[Quad], uri : String, node : TemplateNode): Unit =
- {
- val pageClass = node.root.getAnnotation(TemplateMapping.CLASS_ANNOTATION).getOrElse(throw new IllegalArgumentException("missing class Annotation"))
+ // Condition #3
+ // The current mapping is a subclass or a superclass of previous class or owl:Thing
+ val condition3_subclass = mapToClass.relatedClasses.contains(pageClass) || pageClass.relatedClasses.contains(mapToClass) || mapToClass.equals(classOwlThing) || pageClass.equals(classOwlThing)
- // Compute missing types, i.e. the set difference between the page classes and this TemplateMapping relatedClasses
- val diffSet = mapToClass.relatedClasses.filterNot(c => pageClass.relatedClasses.contains(c))
- // Set annotations
- node.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
- node.setAnnotation(TemplateMapping.INSTANCE_URI_ANNOTATION, uri)
+ // Condition #4
+ //If the page has more than one infobox and the name property of an infobox differs from the page title, then the infobox belongs to a different
+ //entity and needs to be saved as a new resource.
+ //This avoids assigning the properties of one entity to another, because a single page can have multiple infoboxes, each
+ //about a different entity, see dbr:Helene_Demuth
- // Set new annotation (if new map is a subclass)
- if (mapToClass.relatedClasses.contains(pageClass))
- node.root.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
+ //checking if node is an infobox
+ val isInfobox = node.title.decoded.contains("Infobox")
- // Create missing type statements
- // Here we do not split the transitive and the direct types because different types may come from different mappings
- // Splitting the types of the main resource is done at the MappingExtractor.extract()
- for (cls <- diffSet)
- graph += new Quad(context.language, DBpediaDatasets.OntologyTypes, uri, propertyRdfType, cls.uri, node.sourceIri+"&mappedTemplate="+node.title.encoded)
+ var condition4_same_entity_infoBox = true
- }
+ if(isInfobox)
+ {
+ //getting the name property from the infobox; fall back to the subject URI if the infobox has none
+ val name = node.children.find(_.key == "name").map(_.propertyNodeValueToPlainText).getOrElse(subjectUri)
- private def createInstance(graph: mutable.Buffer[Quad], uri : String, node : TemplateNode): Unit =
- {
- val classes = mapToClass.relatedClasses
+ //getting the page title, i.e. the last path segment of the subject URI
+ val pageTitle = subjectUri.split("/").last
- //Set annotations
- node.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
- node.setAnnotation(TemplateMapping.INSTANCE_URI_ANNOTATION, uri)
+ if(!name.contains(pageTitle) && !pageTitle.contains(name))
+ condition4_same_entity_infoBox = false
- if(node.root.getAnnotation(TemplateMapping.CLASS_ANNOTATION).isEmpty)
- {
- node.root.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
}
-
- //Create type statements
- for (cls <- classes) {
- // Here we split the transitive types from the direct type assignment
- val typeDataset = if (cls.equals(mapToClass)) DBpediaDatasets.OntologyTypes else DBpediaDatasets.OntologyTypesTransitive
- graph += new Quad(context.language, typeDataset, uri, propertyRdfType, cls.uri, node.sourceIri+"&mappedTemplate="+node.title.encoded)
+ // If all above conditions are met then use the main resource, otherwise create a new one
+ val instanceUri = {
+ if ( (!condition1_create_correspondingproperty) && (!condition2_template_exists) && condition3_subclass && condition4_same_entity_infoBox) subjectUri
+ else generateUri(subjectUri, node)
}
- }
-
- /**
- * Generates a new URI from a template node
- *
- * @param subjectUri The base string of the generated URI
- * @param templateNode The template for which the URI is to be generated
- * @return The generated URI
- */
- private def generateUri(subjectUri : String, templateNode : TemplateNode) : String =
- {
- val properties = templateNode.children
- //Cannot generate URIs for empty templates
- if(properties.isEmpty)
- {
- return templateNode.generateUri(subjectUri, templateNode.title.decoded)
+ //Add ontology instance
+ if (instanceUri == subjectUri) {
+ createMissingTypes(graph, instanceUri, node)
}
-
- //Try to find a property which contains 'name'
- var nameProperty : PropertyNode = null
- for(property <- properties if nameProperty == null)
- {
- if(property.key.toLowerCase.contains("name"))
- {
- nameProperty = property
- }
+ else {
+ createInstance(graph, instanceUri, node)
}
- //If no name property has been found -> Use the first property of the template
- if(nameProperty == null)
+ if (condition1_create_correspondingproperty)
{
- nameProperty = properties.head
+ //Connect new instance to the instance created from the root template
+ graph += new Quad(context.language, DBpediaDatasets.OntologyPropertiesObjects, instanceUri, correspondingProperty, subjectUri, node.sourceIri)
}
- templateNode.generateUri(subjectUri, nameProperty)
+ //Extract properties
+ graph ++= mappings.flatMap(_.extract(node, instanceUri))
+ }
}
+
+ graph
+ }
+
+ private def createMissingTypes(graph: mutable.Buffer[Quad], uri : String, node : TemplateNode): Unit =
+ {
+ val pageClass = node.root.getAnnotation(TemplateMapping.CLASS_ANNOTATION).getOrElse(throw new IllegalArgumentException("missing class Annotation"))
+
+ // Compute missing types, i.e. the set difference between the page classes and this TemplateMapping relatedClasses
+ val diffSet = mapToClass.relatedClasses.filterNot(c => pageClass.relatedClasses.contains(c))
+
+ // Set annotations
+ node.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
+ node.setAnnotation(TemplateMapping.INSTANCE_URI_ANNOTATION, uri)
+
+ // Set new annotation (if new map is a subclass)
+ if (mapToClass.relatedClasses.contains(pageClass))
+ node.root.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
+
+ // Create missing type statements
+ // Here we do not split the transitive and the direct types because different types may come from different mappings.
+ // The types of the main resource are split in MappingExtractor.extract().
+ for (cls <- diffSet)
+ graph += new Quad(context.language, DBpediaDatasets.OntologyTypes, uri, propertyRdfType, cls.uri, node.sourceIri+"&mappedTemplate="+node.title.encoded)
+
+ }
+
+ private def createInstance(graph: mutable.Buffer[Quad], uri : String, node : TemplateNode): Unit =
+ {
+ val classes = mapToClass.relatedClasses
+
+ //Set annotations
+ node.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
+ node.setAnnotation(TemplateMapping.INSTANCE_URI_ANNOTATION, uri)
+
+ if(node.root.getAnnotation(TemplateMapping.CLASS_ANNOTATION).isEmpty)
+ {
+ node.root.setAnnotation(TemplateMapping.CLASS_ANNOTATION, mapToClass)
+ }
+
+ //Create type statements
+ for (cls <- classes) {
+ // Here we split the transitive types from the direct type assignment
+ val typeDataset = if (cls.equals(mapToClass)) DBpediaDatasets.OntologyTypes else DBpediaDatasets.OntologyTypesTransitive
+ graph += new Quad(context.language, typeDataset, uri, propertyRdfType, cls.uri, node.sourceIri+"&mappedTemplate="+node.title.encoded)
+ }
+ }
+
+ /**
+ * Generates a new URI from a template node
+ *
+ * @param subjectUri The base string of the generated URI
+ * @param templateNode The template for which the URI is to be generated
+ * @return The generated URI
+ */
+ private def generateUri(subjectUri : String, templateNode : TemplateNode) : String =
+ {
+ val properties = templateNode.children
+
+ //Cannot generate URIs for empty templates
+ if(properties.isEmpty)
+ {
+ return templateNode.generateUri(subjectUri, templateNode.title.decoded)
+ }
+
+ //Try to find a property which contains 'name'
+ var nameProperty : PropertyNode = null
+ for(property <- properties if nameProperty == null)
+ {
+ if(property.key.toLowerCase.contains("name"))
+ {
+ nameProperty = property
+ }
+ }
+
+ //If no name property has been found -> Use the first property of the template
+ if(nameProperty == null)
+ {
+ nameProperty = properties.head
+ }
+
+ templateNode.generateUri(subjectUri, nameProperty)
+ }
}
private object TemplateMapping
{
- val CLASS_ANNOTATION = new AnnotationKey[OntologyClass]
- val TEMPLATELIST_ANNOTATION = new AnnotationKey[Seq[String]]
- val INSTANCE_URI_ANNOTATION = new AnnotationKey[String]
+ val CLASS_ANNOTATION = new AnnotationKey[OntologyClass]
+ val TEMPLATELIST_ANNOTATION = new AnnotationKey[Seq[String]]
+ val INSTANCE_URI_ANNOTATION = new AnnotationKey[String]
}
diff --git a/core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala b/core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala
index 1f896da295..64764d3bf5 100644
--- a/core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala
+++ b/core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala
@@ -4,6 +4,7 @@ import org.dbpedia.extraction.config.Config
import org.dbpedia.extraction.config.provenance.DBpediaDatasets
import org.dbpedia.extraction.ontology.{Ontology, OntologyProperty, RdfNamespace}
import org.dbpedia.extraction.transform.{Quad, QuadBuilder}
+import org.dbpedia.extraction.util.abstracts.AbstractUtils
import org.dbpedia.extraction.util.{Language, RecordEntry, RecordSeverity}
import org.dbpedia.extraction.wikiparser.{Namespace, WikiPage}
import org.dbpedia.extraction.wikiparser.impl.wikipedia.Namespaces
@@ -47,6 +48,8 @@ class WikipediaNifExtractor(
protected val recordAbstracts: Boolean = !context.configFile.nifParameters.isTestRun //not! will create dbpedia short and long abstracts
protected val shortAbstractLength: Int = context.configFile.abstractParameters.shortAbstractMinLength
protected val abstractsOnly: Boolean = context.configFile.nifParameters.abstractsOnly
+ protected val removeBrokenBrackets: Boolean = context.configFile.nifParameters.removeBrokenBracketsProperty
+
override protected val templateString: String = Namespaces.names(context.language).get(Namespace.Template.code) match {
case Some(x) => x
case None => "Template"
@@ -67,8 +70,15 @@ class WikipediaNifExtractor(
*/
override def extendSectionTriples(extractionResults: ExtractedSection, graphIri: String, subjectIri: String): Seq[Quad] = {
//this is only dbpedia relevant: for singling out long and short abstracts
+
if (recordAbstracts && extractionResults.section.id == "abstract" && extractionResults.getExtractedLength > 0) {
- List(longQuad(subjectIri, extractionResults.getExtractedText, graphIri), shortQuad(subjectIri, getShortAbstract(extractionResults), graphIri))
+ val (cleanLongAbstract, cleanShortAbstract) = if (removeBrokenBrackets) {
+ (AbstractUtils.removeBrokenBracketsInAbstracts(extractionResults.getExtractedText),
+ AbstractUtils.removeBrokenBracketsInAbstracts(getShortAbstract(extractionResults)))
+ } else {
+ (extractionResults.getExtractedText, getShortAbstract(extractionResults))
+ }
+ List(longQuad(subjectIri, cleanLongAbstract, graphIri), shortQuad(subjectIri, cleanShortAbstract, graphIri))
}
else
List()
@@ -219,4 +229,5 @@ class WikipediaNifExtractor(
test.addAll(doc.select(query))
test.size() > 0
}
+
}
diff --git a/core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala b/core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala
new file mode 100644
index 0000000000..b2f718eeb4
--- /dev/null
+++ b/core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala
@@ -0,0 +1,45 @@
+package org.dbpedia.extraction.util.abstracts
+
+object AbstractUtils {
+
+ /**
+ * Removes broken parenthesised fragments such as (; some info), (, some info) or () from abstract text.
+ */
+
+ def removeBrokenBracketsInAbstracts(text: String): String = {
+ var closeBrackets = 0
+ val result = new StringBuilder()
+ var bracketsWithSemicolon = 0
+ var skipBrackets = 0
+
+ for (i <- 0 until text.length) {
+ if (text(i) == '(') {
+ if ((i < text.length - 1) && (text(i + 1) == ';' || text(i + 1) == ',') && bracketsWithSemicolon == 0) {
+ bracketsWithSemicolon = 1
+ } else if (bracketsWithSemicolon > 0) {
+ bracketsWithSemicolon += 1
+ }
+ }
+ else if (text(i) == ')') {
+ if (bracketsWithSemicolon != 0) {
+ closeBrackets+=1
+ }
+ if (closeBrackets > 0 && closeBrackets == bracketsWithSemicolon) {
+ bracketsWithSemicolon = 0
+ closeBrackets = 0
+ skipBrackets += 1
+ }
+ }
+ if (bracketsWithSemicolon == 0 && skipBrackets == 0) {
+
+ if (!(result.nonEmpty && result.last == ' ' && text(i) == ' ')) {
+ result.append(text(i))
+ }
+ }
+ if (skipBrackets > 0) {
+ skipBrackets -= 1
+ }
+ }
+ result.toString().replaceAll("\\s*\\(\\s*\\)", "")
+ }
+}
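Review note on AbstractUtils.removeBrokenBracketsInAbstracts: the three counters in the method track whether the scan is currently inside a group that opened with "(;" or "(,". The same idea can be sketched with a single nesting-depth counter. This is an illustrative alternative, not the code in this change. The names BrokenBracketSketch and strip are hypothetical. Like the original, it drops everything up to the end of the text if a broken group is never closed.

```scala
object BrokenBracketSketch {

  /** Drops every parenthesised group that opens with "(;" or "(,",
    * including groups nested inside it, then removes empty "()" pairs.
    */
  def strip(text: String): String = {
    val out = new StringBuilder
    var depth = 0 // nesting depth inside a broken group; 0 means characters are kept
    var i = 0
    while (i < text.length) {
      val c = text(i)
      if (depth > 0) {
        // Inside a broken group: only track nesting, emit nothing.
        if (c == '(') depth += 1
        else if (c == ')') depth -= 1
      } else if (c == '(' && i + 1 < text.length && (text(i + 1) == ';' || text(i + 1) == ',')) {
        depth = 1 // a broken group starts here
      } else if (!(c == ' ' && out.nonEmpty && out.last == ' ')) {
        out.append(c) // keep the character, collapsing repeated spaces
      }
      i += 1
    }
    // Same final clean-up as in the diff: remove empty "()" with leading whitespace.
    out.toString.replaceAll("\\s*\\(\\s*\\)", "")
  }
}
```

The inputs from AbstractUtilsTest are a convenient way to compare such a variant against the committed implementation.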
diff --git a/core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala b/core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala
index 0f7b0a184e..0c0fb8fce8 100644
--- a/core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala
+++ b/core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala
@@ -57,7 +57,7 @@ class NifExtractorTest extends FunSuite {
private def getHtml(title:WikiTitle): String={
mwConnector.retrievePage(title, context.configFile.nifParameters.nifQuery) match{
- case Some(pc) => AbstractExtractor.postProcessExtractedHtml(title, pc)
+ case Some(pc) => PlainAbstractExtractor.postProcessExtractedHtml(title, pc)
case None => ""
}
}
diff --git a/core/src/test/scala/org/dbpedia/extraction/mappings/AbstractExtractorTest.scala b/core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala
similarity index 96%
rename from core/src/test/scala/org/dbpedia/extraction/mappings/AbstractExtractorTest.scala
rename to core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala
index 02460210ea..f2fff94a48 100644
--- a/core/src/test/scala/org/dbpedia/extraction/mappings/AbstractExtractorTest.scala
+++ b/core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala
@@ -12,7 +12,7 @@ import scala.io.Source
import scala.language.reflectiveCalls
@Ignore // unignore to test; MediaWiki server has to be in place
-class AbstractExtractorTest
+class PlainAbstractExtractorTest
{
private val testDataRootDir = new File("core/src/test/resources/org/dbpedia/extraction/mappings")
private val configFilePath = "extraction-framework/dump/extraction.nif.abstracts.properties"
@@ -47,7 +47,7 @@ class AbstractExtractorTest
def language = Language.English
def configFile : Config = new Config(configFilePath)
}
- private val extractor = new AbstractExtractor(context)
+ private val extractor = new PlainAbstractExtractor(context)
private val parser = WikiParser.getInstance()
diff --git a/core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala b/core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala
new file mode 100644
index 0000000000..16019d2374
--- /dev/null
+++ b/core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala
@@ -0,0 +1,30 @@
+package org.dbpedia.extraction.util
+import org.dbpedia.extraction.util.abstracts.AbstractUtils
+import org.scalatest.FunSuite
+
+class AbstractUtilsTest extends FunSuite {
+ test("test removing broken brackets function") {
+ val text = "Berlin (; German: [bɛʁˈliːn] ()) is the capital and largest city of Germany by both area and population."
+ val expectedText = "Berlin is the capital and largest city of Germany by both area and population."
+ val resultText = AbstractUtils.removeBrokenBracketsInAbstracts(text)
+ assert(resultText == expectedText)
+ }
+ test("test removing broken brackets function with only empty brackets") {
+ val text = "Berlin () is the capital and largest () city of Germany by both area and population."
+ val expectedText = "Berlin is the capital and largest city of Germany by both area and population."
+ val resultText = AbstractUtils.removeBrokenBracketsInAbstracts(text)
+ assert(resultText == expectedText)
+ }
+ test("test removing broken brackets function with bracket and semicolon") {
+ val text = "Berlin (; German: [bɛʁˈliːn]) is the capital and largest city of Germany by both area and population."
+ val expectedText = "Berlin is the capital and largest city of Germany by both area and population."
+ val resultText = AbstractUtils.removeBrokenBracketsInAbstracts(text)
+ assert(resultText == expectedText)
+ }
+ test("test removing broken brackets function with bracket and comma") {
+ val text = "Berlin (, German: [bɛʁˈliːn]) is the capital and largest city of Germany by both area and population."
+ val expectedText = "Berlin is the capital and largest city of Germany by both area and population."
+ val resultText = AbstractUtils.removeBrokenBracketsInAbstracts(text)
+ assert(resultText == expectedText)
+ }
+}
diff --git a/dump/extraction.abstracts.properties b/dump/extraction.abstracts.properties
index c5089966d0..2e264104c0 100644
--- a/dump/extraction.abstracts.properties
+++ b/dump/extraction.abstracts.properties
@@ -29,7 +29,7 @@ namespaces=Main
# extractor class names starting with "." are prefixed by "org.dbpedia.extraction.mappings"
-extractors=.AbstractExtractor
+extractors=.PlainAbstractExtractor
# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
# ontology=../ontology.xml see universal.properties
diff --git a/dump/pom.xml b/dump/pom.xml
index 95e1cc534a..df6767e297 100644
--- a/dump/pom.xml
+++ b/dump/pom.xml
@@ -15,7 +15,8 @@
DBpedia Dump Extraction
- PRODUCTIVE
+ PRODUCTIVE
+ PRODUCTIVE
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala
new file mode 100644
index 0000000000..0f37f63a3b
--- /dev/null
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala
@@ -0,0 +1,3 @@
+package org.dbpedia.validation.construct.model
+
+case class Construct(self: String, left: Option[String] = None, right: Option[String] = None)
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala
index a1a6131a2a..74488b24a4 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala
@@ -20,6 +20,11 @@ package object model {
val PATTERN_BASED, VOCAB_BASED, PART_BASED, GENERIC, TYPED_LITERAL: Value = Value
}
+ object ValidatorGroup extends Enumeration {
+
+ val RIGHT,LEFT, DEFAULT: Value = Value
+ }
+
object TestCaseType extends Enumeration {
val GENERIC, CUSTOM: Value = Value
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala
index e75c11cab0..e3b27538fd 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala
@@ -1,15 +1,26 @@
package org.dbpedia.validation.construct.model.validators
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorGroup, ValidatorID, ValidatorIRI, ValidatorType}
-case class NotContainsValidator(ID: ValidatorID, iri: ValidatorIRI, sequence: String) extends Validator {
+case class NotContainsValidator(ID: ValidatorID, iri: ValidatorIRI, sequence: String, validatorGroup: ValidatorGroup.Value = ValidatorGroup.DEFAULT) extends Validator {
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.PART_BASED
+ override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
- override def run(nTriplePart: String): Boolean = {
-
- ! nTriplePart.contains(sequence)
-}
+ override def run(nTriplePart: Construct): Boolean = {
+ VALIDATOR_GROUP match {
+ case ValidatorGroup.RIGHT => nTriplePart.right match {
+ // TODO: maybe we need to rename "value"
+ case Some(value) => !value.contains(sequence)
+ case None => false
+ }
+ case ValidatorGroup.LEFT => nTriplePart.left match {
+ case Some(value) => !value.contains(sequence)
+ case None => false
+ }
+ case _ => !nTriplePart.self.contains(sequence)
+ }
+ }
override def info(): String = s"does not contain $sequence"
}
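Review note on the validator changes: NotContainsValidator, PatternValidator and VocabValidator all repeat the same RIGHT / LEFT / DEFAULT match on a Construct. A minimal sketch of how that selection could be factored out follows. Construct and ValidatorGroup are re-declared here only so the snippet is self-contained. The helper names select and notContains are hypothetical and not part of this change.

```scala
object ValidatorGroupSketch {

  // Mirrors the Construct case class added in this change.
  case class Construct(self: String, left: Option[String] = None, right: Option[String] = None)

  // Mirrors the ValidatorGroup enumeration added in this change.
  object ValidatorGroup extends Enumeration {
    val RIGHT, LEFT, DEFAULT: Value = Value
  }

  // Hypothetical helper: picks the part of the construct a validator should inspect.
  def select(c: Construct, group: ValidatorGroup.Value): Option[String] = group match {
    case ValidatorGroup.RIGHT => c.right
    case ValidatorGroup.LEFT  => c.left
    case _                    => Some(c.self)
  }

  // Same outcome as NotContainsValidator.run in the diff:
  // a missing left or right value yields false.
  def notContains(c: Construct, sequence: String,
                  group: ValidatorGroup.Value = ValidatorGroup.DEFAULT): Boolean =
    select(c, group).exists(v => !v.contains(sequence))
}
```

Using Option.exists keeps the "missing side means false" behaviour that the TODO comments in the diff describe as the current choice.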
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala
index 1d153fded4..3c30df831d 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala
@@ -1,18 +1,28 @@
package org.dbpedia.validation.construct.model.validators
import java.util.regex.Pattern
+import org.dbpedia.validation.construct.model.{Construct, ValidatorGroup, ValidatorID, ValidatorIRI, ValidatorType}
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
-
-case class PatternValidator(ID: ValidatorID, iri: ValidatorIRI, patternString: String) extends Validator {
+case class PatternValidator(ID: ValidatorID, iri: ValidatorIRI, patternString: String, validatorGroup: ValidatorGroup.Value = ValidatorGroup.DEFAULT) extends Validator {
val pattern: Pattern = patternString.r.pattern
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.PART_BASED
-
- override def run(nTriplePart: String): Boolean = {
-
- pattern.matcher(nTriplePart).matches()
+ override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
+
+ override def run(nTriplePart: Construct): Boolean = {
+ VALIDATOR_GROUP match {
+ case ValidatorGroup.RIGHT => nTriplePart.right match {
+ // TODO: maybe we need to rename "value"
+ case Some(value) => pattern.matcher(value).matches()
+ case None => false
+ }
+ case ValidatorGroup.LEFT => nTriplePart.left match {
+ case Some(value) => pattern.matcher(value).matches()
+ case None => false
+ }
+ case _ => pattern.matcher(nTriplePart.self).matches()
+ }
}
override def info(): String = s"matches pattern $patternString"
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala
index da00966e7f..25428a968a 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala
@@ -1,18 +1,30 @@
package org.dbpedia.validation.construct.model.validators
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorGroup, ValidatorID, ValidatorIRI, ValidatorType}
-case class TypedLiteralValidator(ID: ValidatorID, iri: ValidatorIRI, patternString: String) extends Validator {
+case class TypedLiteralValidator(ID: ValidatorID, iri: ValidatorIRI, patternString: String, validatorGroup: ValidatorGroup.Value = ValidatorGroup.DEFAULT) extends Validator {
private val pattern = patternString.r.pattern
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.TYPED_LITERAL
-
- override def run(nTriplePart: String): Boolean = {
-
- val lexicalForm = nTriplePart.trim.split("\"").dropRight(1).drop(1).mkString("")
-
+ override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
+
+ override def run(nTriplePart: Construct): Boolean = {
+ val lexicalForm = VALIDATOR_GROUP match {
+ case ValidatorGroup.RIGHT => nTriplePart.right match {
+ // TODO: 1) maybe we need to rename "value"
+ // 2) discuss what to do if we want to check the value that doesn't exist on
+ // the left or right side, at the moment we only return false in these cases
+ case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
+ case None => return false
+ }
+ case ValidatorGroup.LEFT => nTriplePart.left match {
+ case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
+ case None => return false
+ }
+ case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
+ }
pattern.matcher(lexicalForm).matches()
}
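Review note on TypedLiteralValidator: the lexical form is obtained by splitting the N-Triples part on double quotes, dropping the first and last segments, and joining the rest. The sketch below isolates that expression so its effect on a typed and a language-tagged literal can be checked. The object and method names are hypothetical, and the sample literals are illustrative assumptions.

```scala
object LexicalFormSketch {

  // Same expression as in TypedLiteralValidator.run:
  // split on '"', drop the segment before the opening quote and the
  // segment after the closing quote, then join what remains.
  def lexicalForm(part: String): String =
    part.trim.split("\"").dropRight(1).drop(1).mkString("")
}
```

For a literal such as "42"^^<http://www.w3.org/2001/XMLSchema#integer> the split yields an empty segment, the value, and the datatype suffix, so only the value remains after the two drops.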
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala
index c8374b85d2..bcd4a66b72 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala
@@ -1,6 +1,6 @@
package org.dbpedia.validation.construct.model.validators
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorGroup, ValidatorID, ValidatorIRI, ValidatorType}
trait Validator {
@@ -8,6 +8,8 @@ trait Validator {
val METHOD_TYPE: ValidatorType.Value
+ val VALIDATOR_GROUP: ValidatorGroup.Value = ValidatorGroup.DEFAULT
+
val iri: ValidatorIRI
/**
@@ -15,7 +17,7 @@ trait Validator {
* @param nTriplePart part of an NTripleRow { row.trim.split(" ",3) }
* @return true if test successful
*/
- def run(nTriplePart: String): Boolean
+ def run(nTriplePart: Construct): Boolean
def info(): String
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala
index 69bb1d4c8d..7a5b0acdd5 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala
@@ -1,7 +1,7 @@
package org.dbpedia.validation.construct.model.validators
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorGroup, ValidatorID, ValidatorIRI, ValidatorType}
import scala.collection.immutable.HashSet
@@ -13,15 +13,24 @@ import scala.collection.immutable.HashSet
* @param vocabUrl
* @param vocab
*/
-case class VocabValidator(ID: ValidatorID, iri: ValidatorIRI, vocabUrl: String, vocab: HashSet[String]) extends Validator {
+case class VocabValidator(ID: ValidatorID, iri: ValidatorIRI, vocabUrl: String, vocab: HashSet[String],validatorGroup: ValidatorGroup.Value = ValidatorGroup.DEFAULT) extends Validator {
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.VOCAB_BASED
-
- override def run(nTriplePart: String): Boolean = {
-
- val bool = vocab.contains(nTriplePart)
-// if (! bool ) println(vocabUrl,nTriplePart)
- bool
+ override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
+
+ override def run(nTriplePart: Construct): Boolean = {
+ VALIDATOR_GROUP match {
+ case ValidatorGroup.RIGHT => nTriplePart.right match {
+ // TODO: maybe we need to rename "value"
+ case Some(value) => vocab.contains(value)
+ case None => false
+ }
+ case ValidatorGroup.LEFT => nTriplePart.left match {
+ case Some(value) => vocab.contains(value)
+ case None => false
+ }
+ case _ => vocab.contains(nTriplePart.self)
+ }
}
override def info(): String = s"one of vocab $vocabUrl"
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala
index 0807c9a897..08d80f7ae4 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala
@@ -2,7 +2,7 @@ package org.dbpedia.validation.construct.model.validators.generic
import org.apache.jena.riot.system.IRIResolver
import org.dbpedia.validation.construct.model.validators.Validator
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorID, ValidatorIRI, ValidatorType}
case class GenericIRIValidator(ID: ValidatorID) extends Validator {
@@ -10,9 +10,9 @@ case class GenericIRIValidator(ID: ValidatorID) extends Validator {
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.GENERIC
- override def run(nTriplePart: String): Boolean = {
+ override def run(nTriplePart: Construct): Boolean = {
- ! IRIResolver.checkIRI(nTriplePart)
+ !IRIResolver.checkIRI(nTriplePart.self)
}
override def info(): String = "IRI Validation with Apache Jena IRI parser (prevalence:= all IRIs)"
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala
index dccbcfacec..6d8931b00c 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala
@@ -1,7 +1,7 @@
package org.dbpedia.validation.construct.model.validators.generic
import org.dbpedia.validation.construct.model
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorID, ValidatorIRI, ValidatorType}
import org.dbpedia.validation.construct.model.validators.Validator
/**
@@ -15,9 +15,9 @@ case class GenericLiteralLangTagValidator(ID: ValidatorID) extends Validator {
private val pattern = ".*@[a-zA-Z]+(-[a-zA-Z0-9]+)*$".r.pattern
- override def run(nTriplePart: String): Boolean = {
+ override def run(nTriplePart: Construct): Boolean = {
- pattern.matcher(nTriplePart).matches()
+ pattern.matcher(nTriplePart.self).matches()
}
override def info(): String = "Literal language tag conformity to BCP47 (prevalence:= literals with lang tags)"
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala
index 19820fa7bf..c316371323 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala
@@ -2,13 +2,12 @@ package org.dbpedia.validation.construct.model.validators.generic
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
-
import org.apache.jena.riot.{RIOT, RiotException}
import org.apache.jena.riot.lang.LangNTriples
import org.apache.jena.riot.system.{ErrorHandlerFactory, IRIResolver, ParserProfileStd, PrefixMapFactory, RiotLib}
import org.apache.jena.riot.tokens.TokenizerFactory
import org.dbpedia.validation.construct.model.validators.Validator
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorID, ValidatorIRI, ValidatorType}
/**
* TODO
@@ -21,9 +20,9 @@ case class GenericLiteralValidator(ID: ValidatorID) extends Validator {
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.GENERIC
- override def run(nTriplePart: String): Boolean = {
+ override def run(nTriplePart: Construct): Boolean = {
- val triple = "<> <> "+nTriplePart+" ."
+ val triple = "<> <> "+nTriplePart.self+" ."
val profile = {
new ParserProfileStd(RiotLib.factoryRDF, ErrorHandlerFactory.errorHandlerStrict,
IRIResolver.create, PrefixMapFactory.createForInput,
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala
index fb9b705852..87ba36478e 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala
@@ -1,7 +1,7 @@
package org.dbpedia.validation.construct.model.validators.generic
import org.dbpedia.validation.construct.model
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorID, ValidatorIRI}
import org.dbpedia.validation.construct.model.validators.Validator
case class GenericRdfLangStringValidator(ID: ValidatorID) extends Validator {
@@ -9,8 +9,8 @@ case class GenericRdfLangStringValidator(ID: ValidatorID) extends Validator {
override val METHOD_TYPE: model.ValidatorType.Value = model.ValidatorType.TYPED_LITERAL
override val iri: ValidatorIRI = "#GENERIC_RDF_LANG_STRING_VALIDATOR"
- override def run(nTriplePart: String): Boolean = {
- ! nTriplePart.endsWith("")
+ override def run(nTriplePart: Construct): Boolean = {
+ !nTriplePart.self.endsWith("")
}
override def info(): String = "rdf:langString is an implicit type and must never be serialized. (prevalence := all typed literals)"
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala
index 4df8d0917e..c6af912989 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala
@@ -2,7 +2,7 @@ package org.dbpedia.validation.construct.model.validators.generic
import org.dbpedia.validation.construct.model
import org.dbpedia.validation.construct.model.validators.Validator
-import org.dbpedia.validation.construct.model.{ValidatorID, ValidatorIRI, ValidatorType}
+import org.dbpedia.validation.construct.model.{Construct, ValidatorID, ValidatorIRI, ValidatorType}
case class GenericValidator(ID: ValidatorID) extends Validator {
@@ -10,7 +10,7 @@ case class GenericValidator(ID: ValidatorID) extends Validator {
override val METHOD_TYPE: model.ValidatorType.Value = ValidatorType.GENERIC
- override def run(nTriplePart: String): Boolean = true
+ override def run(nTriplePart: Construct): Boolean = true
override def info(): String = "Missing validator: requires one ore more validators (always true)"
}
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala b/dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala
index 95a2bfdd7f..244957393c 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala
@@ -25,4 +25,4 @@ object TestSuiteFactory {
new NTripleTestSuite(triggerCollection, validatorCollection, testCaseCollection)
}
}
-}
+}
\ No newline at end of file
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala b/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala
index 625c6f57ad..909d46eeef 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala
@@ -2,13 +2,12 @@ package org.dbpedia.validation.construct.tests.generators
import java.io.InputStreamReader
import java.net.URL
-
-import org.apache.jena.query.{QueryExecutionFactory, QueryFactory}
+import org.apache.jena.query.{QueryExecutionFactory, QueryFactory, QuerySolution}
import org.apache.jena.rdf.model.{Model, ModelFactory}
import org.apache.jena.riot.{RDFDataMgr, RDFLanguages}
import org.dbpedia.validation.construct.model.triggers._
import org.dbpedia.validation.construct.model.triggers.generic.{GenericIRITrigger, GenericLangLiteralTrigger, GenericLiteralTrigger, GenericPlainLiteralTrigger, GenericTypedLiteralTrigger}
-import org.dbpedia.validation.construct.model.{TestCase, TestCaseType, TriggerIRI, ValidatorID, ValidatorIRI}
+import org.dbpedia.validation.construct.model.{TestCase, TestCaseType, TriggerIRI, ValidatorGroup, ValidatorID, ValidatorIRI}
import org.dbpedia.validation.construct.model.validators._
import org.dbpedia.validation.construct.model.validators.generic.{GenericIRIValidator, GenericLiteralLangTagValidator, GenericLiteralValidator, GenericRdfLangStringValidator, GenericValidator}
@@ -20,6 +19,8 @@ import scala.collection.mutable
object NTripleTestGenerator extends TestGenerator {
private val delim = "\t"
+ private val rightValidator = "rightValidator"
+ private val leftValidator = "leftValidator"
def loadTestGenerator(testModel: Model): HashMap[TriggerIRI, Array[ValidatorIRI]] = {
@@ -211,7 +212,7 @@ object NTripleTestGenerator extends TestGenerator {
currentValidatorID += 1
// generic rdf lang string
- val genericRdfLangStringValidator =GenericRdfLangStringValidator(currentValidatorID)
+ val genericRdfLangStringValidator = GenericRdfLangStringValidator(currentValidatorID)
validatorCollection.append(genericRdfLangStringValidator)
validatorMap.put(genericRdfLangStringValidator.iri, Array[Int](genericRdfLangStringValidator.ID))
currentValidatorID += 1
@@ -230,7 +231,6 @@ object NTripleTestGenerator extends TestGenerator {
QueryExecutionFactory.create(validatorQuery, testModel).execSelect().foreach(
validatorQuerySolution => {
-
val groupedValidators = ArrayBuffer[Int]()
/*
@@ -256,8 +256,8 @@ object NTripleTestGenerator extends TestGenerator {
if (validatorQuerySolution.contains("patterns")) {
validatorQuerySolution.getLiteral("patterns").getLexicalForm.split(delim).foreach(patternString => {
-
- validatorCollection.append(PatternValidator(currentValidatorID, validatorIRI, patternString))
+ val validatorGroup = getValidatorGroup(validatorQuerySolution)
+ validatorCollection.append(PatternValidator(currentValidatorID, validatorIRI, patternString, validatorGroup))
groupedValidators.append(currentValidatorID)
currentValidatorID += 1
})
@@ -269,8 +269,8 @@ object NTripleTestGenerator extends TestGenerator {
if (validatorQuerySolution.contains("oneOfVocabs")) {
validatorQuerySolution.getLiteral("oneOfVocabs").getLexicalForm.split(delim).foreach(vocabUrl => {
-
- validatorCollection.append(VocabValidator(currentValidatorID, validatorIRI, vocabUrl, getVocab(vocabUrl)))
+ val validatorGroup = getValidatorGroup(validatorQuerySolution)
+ validatorCollection.append(VocabValidator(currentValidatorID, validatorIRI, vocabUrl, getVocab(vocabUrl), validatorGroup))
groupedValidators.append(currentValidatorID)
currentValidatorID += 1
})
@@ -324,12 +324,22 @@ object NTripleTestGenerator extends TestGenerator {
if (validatorQuerySolution.contains("pattern")) {
val patternString = validatorQuerySolution.getLiteral("pattern").getLexicalForm
-
- validatorCollection.append(TypedLiteralValidator(currentValidatorID, validatorIRI, patternString))
+ val validatorGroup = getValidatorGroup(validatorQuerySolution)
+ validatorCollection.append(TypedLiteralValidator(currentValidatorID, validatorIRI, patternString, validatorGroup))
grouepdTestApproachIDs.append(currentValidatorID)
currentValidatorID += 1
}
-
+ /*
+ v:doesNotContain
+ */
+ if (validatorQuerySolution.contains("doesNotContains")) {
+ validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
+ val validatorGroup = getValidatorGroup(validatorQuerySolution)
+ validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq, validatorGroup))
+ grouepdTestApproachIDs.append(currentValidatorID)
+ currentValidatorID += 1
+ })
+ }
validatorMap.put(validatorIRI, grouepdTestApproachIDs.toArray)
}
)
@@ -337,6 +347,21 @@ object NTripleTestGenerator extends TestGenerator {
(validatorCollection.toArray, HashMap[ValidatorIRI, Array[ValidatorID]]() ++ validatorMap)
}
+ def getValidatorGroup(validatorQuerySolution: QuerySolution): ValidatorGroup.Value = {
+ if (validatorQuerySolution.contains("validatorGroup")) {
+ val validatorGroup = validatorQuerySolution.getResource("validatorGroup").getLocalName
+ if (validatorGroup == rightValidator) {
+ ValidatorGroup.RIGHT
+ } else if (validatorGroup == leftValidator) {
+ ValidatorGroup.LEFT
+ } else {
+ ValidatorGroup.DEFAULT
+ }
+ } else {
+ ValidatorGroup.DEFAULT
+ }
+ }
+
def getVocab(uri: String): HashSet[String] = {
val model = ModelFactory.createDefaultModel()
@@ -345,7 +370,6 @@ object NTripleTestGenerator extends TestGenerator {
val query = QueryFactory.create(Queries.oneOfVocabQueryStr)
val resultSet = QueryExecutionFactory.create(query, model).execSelect
val properties = ArrayBuffer[String]()
- 12
while (resultSet.hasNext) {
properties.append(resultSet.next().getResource("property").getURI)
}
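Review note: the `getValidatorGroup` helper added above maps the local name of the optional `?validatorGroup` binding to an enum value and falls back to `DEFAULT` in every other case. The same decision table can be sketched without Jena, which may make it easier to unit-test. In this sketch the `ValidatorGroup` enumeration is a hypothetical stand-in for the real one in `org.dbpedia.validation.construct.model`, and `fromLocalName` is an illustrative name that does not appear in the patch.

```scala
// Hypothetical stand-in for org.dbpedia.validation.construct.model.ValidatorGroup;
// only the three values referenced in the diff are modelled.
object ValidatorGroup extends Enumeration {
  val LEFT, RIGHT, DEFAULT = Value
}

object ValidatorGroupSketch {
  // Same decision table as getValidatorGroup, but taking the local name directly.
  // None models a query solution without a ?validatorGroup binding.
  def fromLocalName(localName: Option[String]): ValidatorGroup.Value =
    localName match {
      case Some("rightValidator") => ValidatorGroup.RIGHT
      case Some("leftValidator")  => ValidatorGroup.LEFT
      case _                      => ValidatorGroup.DEFAULT
    }
}
```

A pattern match keeps the fallback in one place, so an unknown local name and a missing binding cannot drift apart.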
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala b/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala
index 88e6ff6a7e..21ceeca3cf 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala
@@ -65,11 +65,13 @@ object Queries {
def literalValidatorQueryStr(): String =
s"""$prefixDefinition
|
- |SELECT Distinct ?validator ?comment ?pattern
+ |SELECT Distinct ?validator ?comment ?pattern ?validatorGroup ?doesNotContains
|{
| ?validator
| a v:Datatype_Literal_Validator .
+ | Optional{ ?validator v:validatorGroup ?validatorGroup }
| Optional{ ?validator rdfs:comment ?comment }
+ | Optional{ ?validator v:doesNotContain ?doesNotContains . }
| Optional{ ?validator v:pattern ?pattern }
|}
""".stripMargin
@@ -92,8 +94,10 @@ object Queries {
|
|SELECT ?generator ?trigger ?validator
|{
- | ?generator v:trigger ?trigger ;
- | v:validator ?validator .
+ | ?generator
+ | a v:TestGenerator ;
+ | v:trigger ?trigger ;
+ | v:validator ?validator .
|
|}
""".stripMargin
diff --git a/dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala b/dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala
index 3c4630b491..2ed15b4a59 100644
--- a/dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala
+++ b/dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala
@@ -3,7 +3,7 @@ package org.dbpedia.validation.construct.tests.suites
import org.apache.jena.rdf.model.Model
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SQLContext
-import org.dbpedia.validation.construct.model.{TestCase, TestCaseType, TestScore, TriggerType}
+import org.dbpedia.validation.construct.model.{Construct, TestCase, TestCaseType, TestScore, TriggerType}
import org.dbpedia.validation.construct.model.triggers.{IRITrigger, Trigger}
import org.dbpedia.validation.construct.model.validators.Validator
import org.dbpedia.validation.construct.tests.generators.NTripleTestGenerator
@@ -64,11 +64,13 @@ class NTripleTestSuite(override val triggerCollection: Array[Trigger],
testReports
}
+
+
/**
* Assumption: The whitespace following subject, predicate, and object must be a single space, (U+0020).
* All other locations that allow whitespace must be empty. (https://www.w3.org/TR/n-triples/#canonical-ntriples)
*/
- def prepareFlatTerseLine(line: String): Array[String] = {
+ def prepareFlatTerseLine(line: String): Array[Construct] = {
val spo = line.split(">", 3)
@@ -95,11 +97,12 @@ class NTripleTestSuite(override val triggerCollection: Array[Trigger],
case ae: ArrayIndexOutOfBoundsException => println(line); ae.printStackTrace()
}
- Array(s, p, o)
+ Array(Construct(s), Construct(p, Some(s), Some(o)), Construct(o))
+ //Array(s, p, o)
}
def validateNTriplePart(
- nTriplePart: String,
+ nTriplePart: Construct,
testCaseCount: Int,
triggerCollection: Array[Trigger],
validatorCollection: Array[Validator],
@@ -114,7 +117,7 @@ class NTripleTestSuite(override val triggerCollection: Array[Trigger],
val errorsOfTestCase = Array.fill[Long](testCaseCount)(0)
val nTriplePartType = {
- if (nTriplePart.startsWith("\"")) TriggerType.LITERAL
+ if (nTriplePart.self.startsWith("\"")) TriggerType.LITERAL
else TriggerType.IRI
}
@@ -122,7 +125,7 @@ class NTripleTestSuite(override val triggerCollection: Array[Trigger],
trigger => {
- if (trigger.isTriggered(nTriplePart)) {
+ if (trigger.isTriggered(nTriplePart.self)) {
// if (trigger.iri != "#GENERIC_IRI_TRIGGER") covered = true
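Review note: with this change each part of a triple is wrapped in a `Construct`, and the predicate additionally carries its subject and object, so a validator triggered by the predicate can inspect the neighbouring term. The sketch below illustrates that idea under stated assumptions: the diff only shows `self` and two optional positional arguments, so the field names `left` and `right`, the simplified line splitting, and the helper `rightDoesNotContain` are all hypothetical and not taken from the patch.

```scala
// Assumed shape of Construct: the diff shows `self` plus two optional
// positional arguments; the names `left` and `right` are hypothetical.
final case class Construct(self: String,
                           left: Option[String] = None,
                           right: Option[String] = None)

object ConstructSketch {
  // Simplified split of a canonical N-Triples line with an IRI subject.
  // This is not the prepareFlatTerseLine implementation from the diff.
  def toConstructs(line: String): Array[Construct] = {
    val spo = line.split(">", 3)
    val s = spo(0).drop(1)                    // strip the leading '<'
    val p = spo(1).trim.drop(1)               // strip the leading " <"
    val o = spo(2).trim.stripSuffix(".").trim // object, still quoted or bracketed
    Array(Construct(s), Construct(p, Some(s), Some(o)), Construct(o))
  }

  // A "right" check in the spirit of the #abstracts test: the term to the
  // right of the triggered predicate must not contain the character sequence.
  def rightDoesNotContain(c: Construct, charSeq: String): Boolean =
    c.right.forall(o => !o.contains(charSeq))
}
```

Under these assumptions, a triple whose predicate is `http://dbpedia.org/ontology/abstract` and whose literal contains `(;` would fail the check, while constructs without a right neighbour pass trivially.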
diff --git a/dump/src/test/bash/minidump-overview.md b/dump/src/test/bash/minidump-overview.md
index 05dd67f1a9..e50d4d4593 100644
--- a/dump/src/test/bash/minidump-overview.md
+++ b/dump/src/test/bash/minidump-overview.md
@@ -45,6 +45,7 @@
* https://en.wikipedia.org/wiki/IKEA
* https://en.wikipedia.org/wiki/Jim_Pewter
* https://en.wikipedia.org/wiki/Kerala_Agricultural_University
+* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
* https://en.wikipedia.org/wiki/Mini_(Mark_I)
* https://en.wikipedia.org/wiki/N.EX.T
* https://en.wikipedia.org/wiki/Ranma_½
diff --git a/dump/src/test/bash/uris.lst b/dump/src/test/bash/uris.lst
index f7040b13ce..c9fa9a0be9 100644
--- a/dump/src/test/bash/uris.lst
+++ b/dump/src/test/bash/uris.lst
@@ -41,6 +41,7 @@ https://en.wikipedia.org/wiki/IBM
https://en.wikipedia.org/wiki/IKEA
https://en.wikipedia.org/wiki/Jim_Pewter
https://en.wikipedia.org/wiki/Kerala_Agricultural_University
+https://en.wikipedia.org/wiki/Marian_Breland_Bailey
https://en.wikipedia.org/wiki/Mini_(Mark_I)
https://en.wikipedia.org/wiki/N.EX.T
https://en.wikipedia.org/wiki/Ranma_½
diff --git a/dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl b/dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl
index 0e6be35f05..e30671f74f 100644
--- a/dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl
+++ b/dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl
@@ -1,3 +1,4 @@
+@base .
@prefix v: .
@prefix trigger: .
@prefix validator: .
@@ -19,8 +20,8 @@ trigger:wikipedia
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://en.wikipedia.org/wiki/.*" ;
rdfs:label "wikipedia" .
-
-<#wikipedia_IRIs>
+
+<#wikipedia_IRIs>
a v:TestGenerator ;
v:trigger trigger:wikipedia ;
# same as dbpedia
@@ -28,40 +29,46 @@ trigger:wikipedia
v:validator validator:dbpedia_resource_delims ;
v:validator [
a v:IRI_Validator ;
- v:doesNotContain "<" , ">", "\"" , " ", "{", "}", "|", "\\", "^" , "`"
+ v:doesNotContain "<" , ">", "\"" , " ", "{", "}", "|", "\\", "^" , "`"
] .
-trigger:generic_wikipedia_dbpedia_extraction
+trigger:generic_wikipedia_dbpedia_extraction
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://(ga\\.|af\\.|als\\.|am\\.|an\\.|ar\\.|arz\\.|ast\\.|az\\.|azb\\.|ba\\.|bar\\.|batsmg\\.|be\\.|bg\\.|bn\\.|bpy\\.|br\\.|bs\\.|bug\\.|ca\\.|cdo\\.|ce\\.|ceb\\.|ckb\\.|cs\\.|cv\\.|cy\\.|da\\.|de\\.|el\\.|eml\\.|en\\.|eo\\.|es\\.|et\\.|eu\\.|fa\\.|fi\\.|fo\\.|fr\\.|fy\\.|gd\\.|gl\\.|gu\\.|he\\.|hi\\.|hr\\.|hsb\\.|ht\\.|hu\\.|hy\\.|ia\\.|id\\.|ilo\\.|io\\.|is\\.|it\\.|ja\\.|jv\\.|ka\\.|kk\\.|kn\\.|ko\\.|ku\\.|ky\\.|la\\.|lb\\.|li\\.|lmo\\.|lt\\.|lv\\.|mai\\.|mg\\.|mhr\\.|min\\.|mk\\.|ml\\.|mn\\.|mr\\.|mrj\\.|ms\\.|my\\.|mzn\\.|nan\\.|nap\\.|nds\\.|ne\\.|new\\.|nl\\.|nn\\.|no\\.|oc\\.|or\\.|os\\.|pa\\.|pl\\.|pms\\.|pnb\\.|pt\\.|qu\\.|ro\\.|ru\\.|sa\\.|sah\\.|scn\\.|sco\\.|sd\\.|sh\\.|si\\.|simple\\.|sk\\.|sl\\.|sq\\.|sr\\.|su\\.|sv\\.|sw\\.|ta\\.|te\\.|tg\\.|th\\.|tl\\.|tr\\.|tt\\.|uk\\.|ur\\.|uz\\.|vec\\.|vi\\.|vo\\.|wa\\.|war\\.|wuu\\.|xmf\\.|yi\\.|yo\\.|yue\\.|zh\\.)?dbpedia.org/resource/((?!(\\?.*(nif=|dbpv=).*(dbpv=|nif=))).)*$" ;
rdfs:label "DBpedia IRIs from Wikipedia used in Generic Extraction" ;
rdfs:comment "Starting with http://dbpedia.org or for 140 languages with http://xx.dbpedia.org" .
-
-trigger:mappings_wikipedia_dbpedia_extraction
+
+trigger:mappings_wikipedia_dbpedia_extraction
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://(ar\\.|az\\.|be\\.|bg\\.|bn\\.|ca\\.|cs\\.|cy\\.|da\\.|de\\.|el\\.|en\\.|eo\\.|es\\.|et\\.|eu\\.|fa\\.|fi\\.|fr\\.|ga\\.|gl\\.|hi\\.|hr\\.|hu\\.|hy\\.|id\\.|it\\.|ja\\.|ko\\.|lt\\.|lv\\.|mk\\.|nl\\.|pl\\.|pt\\.|ro\\.|ru\\.|sk\\.|sl\\.|sr\\.|sv\\.|tr\\.|uk\\.|ur\\.|vi\\.|war\\.|zh\\.|commons\\.)?dbpedia.org/resource/((?!(\\?.*(nif=|dbpv=).*(dbpv=|nif=))).)*$" ;
rdfs:label "DBpedia IRIs from Wikipedia used in Mappings Extraction" ;
rdfs:comment "Starting with http://dbpedia.org or for 40 mapped languages with http://xx.dbpedia.org" .
-trigger:wikidata_dbpedia_extraction
+trigger:wikidata_dbpedia_extraction
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://wikidata.dbpedia.org/resource/Q((?!(\\?.*(nif=|dbpv=).*(dbpv=|nif=))).)*$" ;
rdfs:label "DBpedia IRIs from Wikidata extraction" ;
rdfs:comment "Starting with http://wikidata.dbpedia.org/resource/Q" .
-
-trigger:wikidata
+
+trigger:wikidata
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://www.wikidata.org/entity/Q.*" ;
rdfs:label "Wikidata IRIs" ;
rdfs:comment "Starting with http://www.wikidata.org/entity/Q" .
+trigger:abstract_property
+ a v:RDF_IRI_Trigger ;
+ trigger:pattern "http://dbpedia.org/ontology/abstract" ;
+ rdfs:label "abstract IRIs" ;
+ rdfs:comment "Match abstracts" .
+
trigger:dbpedia_nif
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://(\\w*\\.)?dbpedia.org/resource/.*\\?.*(nif=|dbpv=).*(dbpv=|nif=).*" ;
rdfs:label "DBpedia NIF IRIs" ;
rdfs:comment "Containing NIF query part" .
-trigger:dbpedia_ontology
+trigger:dbpedia_ontology
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://dbpedia.org/ontology/.*" ;
rdfs:label "DBpedia Ontology IRIs" ;
@@ -72,7 +79,7 @@ trigger:dbpedia_property
trigger:pattern "^http://(\\w*\\.)?dbpedia.org/property/[a-z].*" ;
rdfs:label "DBpedia Property IRIs" ;
rdfs:comment "http://dbpedia.org/property/*" .
-
+
############### TODO Vocab triggers, can be automated
trigger:wgs84
@@ -92,13 +99,13 @@ trigger:w3_rdfs
trigger:pattern "^http://www.w3.org/2000/01/rdf-schema#.*" ;
rdfs:label "rdfs trigger" ;
rdfs:comment "http://www.w3.org/2000/01/rdf-schema#" .
-
+
trigger:w3_rdf
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://www.w3.org/1999/02/22-rdf-syntax-ns#.*" ;
rdfs:label "rdf trigger" ;
rdfs:comment "http://www.w3.org/1999/02/22-rdf-syntax-ns#" .
-
+
trigger:foaf
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://xmlns.com/foaf/0.1/.*" ;
@@ -123,8 +130,14 @@ trigger:nif
rdfs:label "nif vocab trigger" ;
rdfs:comment "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#" .
+trigger:generic_iri
+ a v:RDF_IRI_Trigger ;
+ trigger:pattern "^https?://.*" ;
+ rdfs:label "Generic IRI" ;
+ rdfs:comment "Match IRIs" .
+
#########################
-# Reusable Validators, several Validators per TestCase
+# Reusable Validators, several Validators per TestCase
#########################
# todo check https://sourceforge.net/p/dbpedia/mailman/message/28982391/
@@ -134,19 +147,19 @@ validator:dissallowed_chars
rdfs:comment """Dissallowed in URIs, cf. https://www.ietf.org/rfc/rfc3987.txt: Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters are found but are not converted, then the conversion SHOULD fail. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted. """ ;
v:doesNotContain "<" , ">", "\"" , " ", "{", "}", "|", "\\", "^" , "`" .
-validator:reserved_gen_delims
- a v:IRI_Validator ;
+validator:reserved_gen_delims
+ a v:IRI_Validator ;
rdfs:comment """reserved gen-delims from https://www.ietf.org/rfc/rfc3987.txt ":", "?", "#", "[", "]", "/", "@" """ ;
v:doesNotContain ":", "?", "#", "[", "]", "@" .
-
+
validator:reserved_sub_delims
- a v:IRI_Validator ;
+ a v:IRI_Validator ;
rdfs:comment """reserved sub-delims from https://www.ietf.org/rfc/rfc3987.txt "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=" """ ;
v:doesNotContain "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=" .
-validator:dbpedia_resource_delims
+validator:dbpedia_resource_delims
a v:IRI_Validator ;
- rdfs:comment """
+ rdfs:comment """
1. gen-delims are not allowed, except ":" and "@" per rfc3987
"ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@" "
2. sub-delims are allowed:
@@ -156,19 +169,18 @@ validator:dbpedia_resource_delims
reserved gen-delims from above """ ;
v:doesNotContain "?", "#", "[", "]" ;
v:doesNotContain "%21", "%24", "%26", "%27", "%28", "%29", "%2A", "%2B", "%2C", "%3B", "%3D" .
-
-
-validator:dbpedia_ontology
+
+validator:dbpedia_ontology
a v:IRI_Validator ;
# todo get download URL of the ontology from the bus
- # v:oneOfVocab
+ # v:oneOfVocab
# todo use this for now
v:oneOfVocab .
-# no priority to implement this
+# no priority to implement this
validator:foaf
a v:IRI_Validator ;
- v:oneOfVocab .
+ v:oneOfVocab .
validator:w3_rdf
a v:IRI_Validator ;
@@ -180,8 +192,8 @@ validator:w3_rdfs
validator:wgs84
a v:IRI_Validator ;
- v:oneOfVocab .
-
+ v:oneOfVocab .
+
validator:georss
a v:IRI_Validator ;
v:oneOfVocab .
@@ -198,25 +210,25 @@ validator:itsrdf
# Specific instantiations below
#########################
-<#genericDBpediaWikipediaIRIs>
+<#genericDBpediaWikipediaIRIs>
a v:TestGenerator ;
v:trigger trigger:generic_wikipedia_dbpedia_extraction ;
v:validator validator:dissallowed_chars ;
v:validator validator:dbpedia_resource_delims .
-
-<#mappingsDBpediaWikipediaIRIs>
+
+<#mappingsDBpediaWikipediaIRIs>
a v:TestGenerator ;
v:trigger trigger:mappings_wikipedia_dbpedia_extraction ;
v:validator validator:dissallowed_chars ;
v:validator validator:dbpedia_resource_delims .
-
-<#dbpediaOntology>
+
+<#dbpediaOntology>
a v:TestGenerator ;
v:trigger trigger:dbpedia_ontology ;
v:validator validator:dissallowed_chars ;
v:validator validator:dbpedia_ontology .
- <#dbpediaGenericProperty>
+<#dbpediaGenericProperty>
a v:TestGenerator ;
v:trigger [
a v:RDF_IRI_Trigger ;
@@ -225,7 +237,7 @@ validator:itsrdf
] .
# TODO validator
-# no priority to implement this
+# no priority to implement this
<#foaf>
a v:TestGenerator ;
v:trigger trigger:foaf ;
@@ -260,11 +272,45 @@ validator:itsrdf
a v:TestGenerator ;
v:trigger trigger:itsrdf ;
v:validator validator:itsrdf .
-
-<#wikidata_IRIs>
+
+<#wikidata_IRIs>
a v:TestGenerator ;
v:trigger trigger:wikidata ;
v:validator [
a v:IRI_Validator ;
v:pattern "^http://www.wikidata.org/entity/Q[0-9]+$"
] .
+
+<#abstracts>
+ a v:TestGenerator ;
+ v:trigger trigger:abstract_property ;
+ v:validator [
+ a v:Datatype_Literal_Validator ;
+ v:validatorGroup v:rightValidator ;
+ v:doesNotContain "(;"
+ ] .
+
+<#forward_slash_in_resource_names>
+ a v:TestGenerator ;
+ v:trigger trigger:generic_wikipedia_dbpedia_extraction ;
+ v:validator [
+ a v:IRI_Validator ;
+ v:pattern "^((?!(http://(ga\\.|af\\.|als\\.|am\\.|an\\.|ar\\.|arz\\.|ast\\.|az\\.|azb\\.|ba\\.|bar\\.|batsmg\\.|be\\.|bg\\.|bn\\.|bpy\\.|br\\.|bs\\.|bug\\.|ca\\.|cdo\\.|ce\\.|ceb\\.|ckb\\.|cs\\.|cv\\.|cy\\.|da\\.|de\\.|el\\.|eml\\.|en\\.|eo\\.|es\\.|et\\.|eu\\.|fa\\.|fi\\.|fo\\.|fr\\.|fy\\.|gd\\.|gl\\.|gu\\.|he\\.|hi\\.|hr\\.|hsb\\.|ht\\.|hu\\.|hy\\.|ia\\.|id\\.|ilo\\.|io\\.|is\\.|it\\.|ja\\.|jv\\.|ka\\.|kk\\.|kn\\.|ko\\.|ku\\.|ky\\.|la\\.|lb\\.|li\\.|lmo\\.|lt\\.|lv\\.|mai\\.|mg\\.|mhr\\.|min\\.|mk\\.|ml\\.|mn\\.|mr\\.|mrj\\.|ms\\.|my\\.|mzn\\.|nan\\.|nap\\.|nds\\.|ne\\.|new\\.|nl\\.|nn\\.|no\\.|oc\\.|or\\.|os\\.|pa\\.|pl\\.|pms\\.|pnb\\.|pt\\.|qu\\.|ro\\.|ru\\.|sa\\.|sah\\.|scn\\.|sco\\.|sd\\.|sh\\.|si\\.|simple\\.|sk\\.|sl\\.|sq\\.|sr\\.|su\\.|sv\\.|sw\\.|ta\\.|te\\.|tg\\.|th\\.|tl\\.|tr\\.|tt\\.|uk\\.|ur\\.|uz\\.|vec\\.|vi\\.|vo\\.|wa\\.|war\\.|wuu\\.|xmf\\.|yi\\.|yo\\.|yue\\.|zh\\.)?dbpedia.org/resource/.*/.)).)*$"
+ ] .
+
+<#iri_slash_n>
+ a v:TestGenerator ;
+ v:trigger trigger:generic_iri ;
+ v:validator [
+ a v:IRI_Validator ;
+ v:doesNotContain "\\n"
+ ] .
+
+<#multiple_dbpedia_resources_in_IRI>
+ a v:TestGenerator ;
+ v:trigger trigger:generic_wikipedia_dbpedia_extraction ;
+ v:validator [
+ a v:IRI_Validator ;
+ v:pattern "^((?!(http://(.*\\.)?dbpedia.org/resource/http://(.*\\.)?dbpedia.org/resource/.*)).)*$" ;
+ ] .
+
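Review note: the new `v:pattern` validators use a negative lookahead applied at every position (`^((?!X).)*$`), which accepts an IRI only if the forbidden shape `X` never starts anywhere inside it. The sketch below exercises the pattern from `<#multiple_dbpedia_resources_in_IRI>` with plain `java.util.regex`. The Turtle escapes (`\\.`) are rewritten for a Scala triple-quoted string (`\.`). That a pattern validator passes when the whole IRI matches is an assumption, since `PatternValidator` is not shown in this part of the patch, and `IriPatternSketch` and `passes` are illustrative names.

```scala
import java.util.regex.Pattern

object IriPatternSketch {
  // Pattern copied from <#multiple_dbpedia_resources_in_IRI>; Turtle's "\\."
  // becomes "\." inside a Scala triple-quoted string.
  private val noNestedResource: Pattern = Pattern.compile(
    """^((?!(http://(.*\.)?dbpedia.org/resource/http://(.*\.)?dbpedia.org/resource/.*)).)*$""")

  // Assumption: a pattern validator passes when the whole IRI matches.
  def passes(iri: String): Boolean = noNestedResource.matcher(iri).matches()
}
```

For example, `http://dbpedia.org/resource/Berlin` passes, whereas an IRI that nests a second `.../resource/` prefix directly after the first one is rejected.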
diff --git a/dump/src/test/resources/ci-tests/xsd_ci-tests.ttl b/dump/src/test/resources/ci-tests/xsd_ci-tests.ttl
index fdf33addab..10420392f1 100644
--- a/dump/src/test/resources/ci-tests/xsd_ci-tests.ttl
+++ b/dump/src/test/resources/ci-tests/xsd_ci-tests.ttl
@@ -52,28 +52,28 @@
v:trigger [
a v:RDF_Literal_Trigger ;
rdfs:label "Datatype xsd:integer" ;
- trigger:datatype
+ trigger:datatype
] ;
v:validator [
a v:Datatype_Literal_Validator ;
rdfs:comment "taken from https://www.w3.org/TR/xmlschema11-2/#integer" ;
- v:pattern "^[\\-+]?[0-9]+$"
+ v:pattern "^[\\-+]?[0-9]+$"
] .
-
+
<#xsd_nonNegativeInteger>
a v:TestGenerator ;
v:trigger [
a v:RDF_Literal_Trigger ;
rdfs:label "Datatype xsd:nonNegativeInteger" ;
- trigger:datatype
+ trigger:datatype
] ;
v:validator [
a v:Datatype_Literal_Validator ;
rdfs:comment """
taken from https://www.w3.org/TR/xmlschema11-2/#nonNegativeInteger
- NOTE: removed '-' from official regex
+ NOTE: removed '-' from official regex
""" ;
- v:pattern "^[+]?[0-9]+$"
+ v:pattern "^[+]?[0-9]+$"
] .
<#xsd_float>
@@ -81,12 +81,12 @@
v:trigger [
a v:RDF_Literal_Trigger ;
rdfs:label "Datatype xsd:float" ;
- trigger:datatype
+ trigger:datatype
] ;
v:validator [
a v:Datatype_Literal_Validator ;
rdfs:comment """taken from https://www.w3.org/TR/xmlschema11-2/#float""" ;
# todo not sure about this one
- v:pattern "^(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)?|(\\+|-)?INF|NaN$"
+ v:pattern "^(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)?|(\\+|-)?INF|NaN$"
] .
diff --git a/dump/src/test/resources/cv-test-groups.csv b/dump/src/test/resources/cv-test-groups.csv
new file mode 100644
index 0000000000..93a10d1935
--- /dev/null
+++ b/dump/src/test/resources/cv-test-groups.csv
@@ -0,0 +1,18 @@
+TEST_NAME,ALL,PRODUCTIVE
+#wikipedia_IRIs,yes,yes
+#genericDBpediaWikipediaIRIs,yes,yes
+#mappingsDBpediaWikipediaIRIs,yes,yes
+#dbpediaOntology,yes,yes
+#dbpediaGenericProperty,yes,yes
+#foaf,yes,yes
+#w3_rdf,yes,yes
+#w3_rdfs,yes,yes
+#wgs84,yes,yes
+#georss,yes,yes
+#skos,yes,yes
+#itsrdf,yes,yes
+#wikidata_IRIs,yes,yes
+#abstracts,yes,yes
+#forward_slash_in_resource_names,yes,yes
+#iri_slash_n,yes,yes
+#multiple_dbpedia_resources_in_IRI,yes,yes
diff --git a/dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties b/dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties
index d1aa854466..1083e6db10 100644
--- a/dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties
+++ b/dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties
@@ -37,8 +37,8 @@ namespaces=Main
# extractor class names starting with "." are prefixed by "org.dbpedia.extraction.mappings"
-extractors=.NifExtractor
-
+extractors=.HtmlAbstractExtractor
+remove-broken-brackets-html-abstracts=true
# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
# ontology=see universal.properties
# mappings=see universal.properties
diff --git a/dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties b/dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties
new file mode 100644
index 0000000000..d865cb4c45
--- /dev/null
+++ b/dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties
@@ -0,0 +1,96 @@
+# make sure to fill out the ../core/src/main/resources/universal.properties first and reinstall
+
+# Replace with your Wikipedia dump download directory (should not change over the course of a release)
+base-dir=./target/minidumptest/base
+log-dir=./target/minidumptest/log
+spark-local-dir=./target/minidumptest/spark-local
+spark-master=local[32]
+
+# The log file directory - used to store all log files created in the course of all extractions
+#
+#log-dir= see: ../core/src/main/resources/universal.properties
+
+# WikiPages that failed to extract on the first try can be retried with this option (especially useful when extracting from the MediaWiki API)
+retry-failed-pages=false
+
+# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
+# Must exist in the directory xxwiki/yyyymmdd and have the prefix xxwiki-yyyymmdd-
+# where xx is the wiki code and yyyymmdd is the dump date.
+
+# default:
+# source=pages-articles.xml.bz2
+
+# alternatives:
+# source=pages-articles.xml.gz
+# source=pages-articles.xml
+
+# use only directories that contain a 'download-complete' file? Default is false.
+require-download-complete=false
+
+# List of languages or article count ranges, e.g. 'en,de,fr' or '10000-20000' or '10000-', or '@mappings'
+# NOTE sync with minidumps
+#languages=af,als,am,an,arz,ast,azb,ba,bar,bat-smg,bpy,br,bs,bug,cdo,ce,ceb,ckb,cv,fo,fy,gd,he,hsb,ht,ia,ilo,io,is,jv,ka,kn,ku,ky,la,lb,li,lmo,mai,mg,min,ml,mn,mr,mrj,ms,mt,my,mzn,nah,nap,nds,ne,new,nn,no,oc,or,os,pa,pms,pnb,qu,sa,sah,scn,sco,sh,si,simple,sq,su,sw,ta,te,tg,th,tl,tt,uz,vec,wa,xmf,yo,zh-min-nan,zh-yue
+languages=en
+# default namespaces: Main, File, Category, Template
+# we only want abstracts for articles -> only main namespace
+namespaces=Main
+
+# extractor class names starting with "." are prefixed by "org.dbpedia.extraction.mappings"
+
+extractors=.PlainAbstractExtractor
+remove-broken-brackets-plain-abstracts=true
+# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
+# ontology=see universal.properties
+# mappings=see universal.properties
+
+# Serialization URI policies and file formats. Quick guide:
+# uri-policy keys: uri, generic, xml-safe, reject-long
+# uri-policy position modifiers: -subjects, -predicates, -objects, -datatypes, -contexts
+# uri-policy values: comma-separated languages or '*' for all languages
+# format values: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
+# See http://git.io/DBpedia-serialization-format-properties for details.
+
+# For backwards compatibility, en uses generic URIs. All others use local IRIs.
+# uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
+uri-policy.iri=generic:en; xml-safe-predicates:*
+
+# NT is unreadable anyway - might as well use URIs for en
+# format.nt.gz=n-triples;uri-policy.uri
+# format.nq.gz=n-quads;uri-policy.uri
+
+# Turtle is much more readable - use nice IRIs for all languages
+format.ttl.bz2=turtle-triples;uri-policy.iri
+#format.tql.bz2=turtle-quads;uri-policy.iri
+
+
+#the following parameters are for the mediawiki api connection used in nif and abstract extraction
+mwc-apiUrl=https://{{LANG}}.wikipedia.org/w/api.php
+mwc-maxRetries=5
+mwc-connectMs=4000
+mwc-readMs=30000
+mwc-sleepFactor=2000
+
+#parameters specific for the abstract extraction
+abstract-query=&format=xml&action=query&prop=extracts&exintro=&explaintext=&titles=%s
+# the tag path of the XML tags under which the result is expected
+abstract-tags=api,query,pages,page,extract
+# the properties used to specify long- and short abstracts (should not change)
+short-abstracts-property=rdfs:comment
+long-abstracts-property=abstract
+# the short abstract is at least this long
+short-abstract-min-length=200
+
+#parameters specific to the nif extraction
+
+#only extract abstract (not the whole page)
+nif-extract-abstract-only=false
+#the request query string
+nif-query=&format=xml&action=parse&prop=text&page=%s&pageid=%d
+#the xml path of the response
+nif-tags=api,parse,text
+# will leave out the long and short abstract datasets
+nif-isTestRun=false
+# will write all anchor texts for each nif instance
+nif-write-anchor=true
+# write only the anchor text for link instances
+nif-write-link-anchor=true
diff --git a/dump/src/test/resources/minidumps/af/wiki.xml.bz2 b/dump/src/test/resources/minidumps/af/wiki.xml.bz2
index f8be9d1f73..c2210f6fbb 100644
Binary files a/dump/src/test/resources/minidumps/af/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/af/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/als/wiki.xml.bz2 b/dump/src/test/resources/minidumps/als/wiki.xml.bz2
index fc17063d52..af28b1c72b 100644
Binary files a/dump/src/test/resources/minidumps/als/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/als/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/an/wiki.xml.bz2 b/dump/src/test/resources/minidumps/an/wiki.xml.bz2
index a464d30f82..412f25d294 100644
Binary files a/dump/src/test/resources/minidumps/an/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/an/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ar/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ar/wiki.xml.bz2
index 1018804782..0f36ad0b7a 100644
Binary files a/dump/src/test/resources/minidumps/ar/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ar/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/arz/wiki.xml.bz2 b/dump/src/test/resources/minidumps/arz/wiki.xml.bz2
index 8a2797bc54..00d7784288 100644
Binary files a/dump/src/test/resources/minidumps/arz/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/arz/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ast/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ast/wiki.xml.bz2
index fd5c170975..41439a2165 100644
Binary files a/dump/src/test/resources/minidumps/ast/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ast/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/az/wiki.xml.bz2 b/dump/src/test/resources/minidumps/az/wiki.xml.bz2
index f2d4cdbfbf..4a80e40698 100644
Binary files a/dump/src/test/resources/minidumps/az/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/az/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/azb/wiki.xml.bz2 b/dump/src/test/resources/minidumps/azb/wiki.xml.bz2
index ec2072e11f..d36addcaf9 100644
Binary files a/dump/src/test/resources/minidumps/azb/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/azb/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ba/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ba/wiki.xml.bz2
index 097a129145..c6514c1860 100644
Binary files a/dump/src/test/resources/minidumps/ba/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ba/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/bar/wiki.xml.bz2 b/dump/src/test/resources/minidumps/bar/wiki.xml.bz2
index 245e7e8dfa..c263237aaf 100644
Binary files a/dump/src/test/resources/minidumps/bar/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/bar/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/be/wiki.xml.bz2 b/dump/src/test/resources/minidumps/be/wiki.xml.bz2
index d3f7ac1d49..9cf15b2c51 100644
Binary files a/dump/src/test/resources/minidumps/be/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/be/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/bg/wiki.xml.bz2 b/dump/src/test/resources/minidumps/bg/wiki.xml.bz2
index a8c01de2ad..31ca9f9230 100644
Binary files a/dump/src/test/resources/minidumps/bg/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/bg/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/bn/wiki.xml.bz2 b/dump/src/test/resources/minidumps/bn/wiki.xml.bz2
index aaadcfbd3f..1569affa13 100644
Binary files a/dump/src/test/resources/minidumps/bn/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/bn/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/br/wiki.xml.bz2 b/dump/src/test/resources/minidumps/br/wiki.xml.bz2
index 82ed35b55c..5864439082 100644
Binary files a/dump/src/test/resources/minidumps/br/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/br/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/bs/wiki.xml.bz2 b/dump/src/test/resources/minidumps/bs/wiki.xml.bz2
index d9686bca64..71a04eceb4 100644
Binary files a/dump/src/test/resources/minidumps/bs/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/bs/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ca/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ca/wiki.xml.bz2
index a83a1641f2..74386d63d6 100644
Binary files a/dump/src/test/resources/minidumps/ca/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ca/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ceb/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ceb/wiki.xml.bz2
index c8ceab0620..24b79348b7 100644
Binary files a/dump/src/test/resources/minidumps/ceb/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ceb/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ckb/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ckb/wiki.xml.bz2
index b914d2e50c..ee2a5c4e8f 100644
Binary files a/dump/src/test/resources/minidumps/ckb/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ckb/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/commons/wiki.xml.bz2 b/dump/src/test/resources/minidumps/commons/wiki.xml.bz2
index 7ff407ba12..655b2c06f5 100644
Binary files a/dump/src/test/resources/minidumps/commons/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/commons/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/cs/wiki.xml.bz2 b/dump/src/test/resources/minidumps/cs/wiki.xml.bz2
index c9896e78a6..099cc9c17f 100644
Binary files a/dump/src/test/resources/minidumps/cs/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/cs/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/cy/wiki.xml.bz2 b/dump/src/test/resources/minidumps/cy/wiki.xml.bz2
index 997df121f8..291cea4bc0 100644
Binary files a/dump/src/test/resources/minidumps/cy/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/cy/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/da/wiki.xml.bz2 b/dump/src/test/resources/minidumps/da/wiki.xml.bz2
index b7ca438616..f2b342828e 100644
Binary files a/dump/src/test/resources/minidumps/da/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/da/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/de/wiki.xml.bz2 b/dump/src/test/resources/minidumps/de/wiki.xml.bz2
index 45038a502b..0fb01cc173 100644
Binary files a/dump/src/test/resources/minidumps/de/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/de/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/el/wiki.xml.bz2 b/dump/src/test/resources/minidumps/el/wiki.xml.bz2
index 81d1d0e76d..d81493b63c 100644
Binary files a/dump/src/test/resources/minidumps/el/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/el/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/en/wiki.xml.bz2 b/dump/src/test/resources/minidumps/en/wiki.xml.bz2
index 23fb578028..2df64a52ec 100644
Binary files a/dump/src/test/resources/minidumps/en/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/en/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/eo/wiki.xml.bz2 b/dump/src/test/resources/minidumps/eo/wiki.xml.bz2
index 6f05e8c2ce..9e52ac671f 100644
Binary files a/dump/src/test/resources/minidumps/eo/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/eo/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/es/wiki.xml.bz2 b/dump/src/test/resources/minidumps/es/wiki.xml.bz2
index 5779b405a3..dba1b51032 100644
Binary files a/dump/src/test/resources/minidumps/es/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/es/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/et/wiki.xml.bz2 b/dump/src/test/resources/minidumps/et/wiki.xml.bz2
index c98619579f..dd34dc3c38 100644
Binary files a/dump/src/test/resources/minidumps/et/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/et/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/eu/wiki.xml.bz2 b/dump/src/test/resources/minidumps/eu/wiki.xml.bz2
index 68c1aa0792..8443b9e8ed 100644
Binary files a/dump/src/test/resources/minidumps/eu/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/eu/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/fa/wiki.xml.bz2 b/dump/src/test/resources/minidumps/fa/wiki.xml.bz2
index 776775470f..baf58d65a1 100644
Binary files a/dump/src/test/resources/minidumps/fa/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/fa/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/fi/wiki.xml.bz2 b/dump/src/test/resources/minidumps/fi/wiki.xml.bz2
index 043dc9f963..e499706a79 100644
Binary files a/dump/src/test/resources/minidumps/fi/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/fi/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/fr/wiki.xml.bz2 b/dump/src/test/resources/minidumps/fr/wiki.xml.bz2
index 7d000a5475..ee4a9a9f92 100644
Binary files a/dump/src/test/resources/minidumps/fr/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/fr/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/fy/wiki.xml.bz2 b/dump/src/test/resources/minidumps/fy/wiki.xml.bz2
index 31d904c87b..75ee600c17 100644
Binary files a/dump/src/test/resources/minidumps/fy/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/fy/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ga/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ga/wiki.xml.bz2
index d33fd375c3..1e01683ec6 100644
Binary files a/dump/src/test/resources/minidumps/ga/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ga/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/gd/wiki.xml.bz2 b/dump/src/test/resources/minidumps/gd/wiki.xml.bz2
index c492587034..c2b2d8f6d3 100644
Binary files a/dump/src/test/resources/minidumps/gd/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/gd/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/gl/wiki.xml.bz2 b/dump/src/test/resources/minidumps/gl/wiki.xml.bz2
index 8b667cca31..80b9650bec 100644
Binary files a/dump/src/test/resources/minidumps/gl/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/gl/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/he/wiki.xml.bz2 b/dump/src/test/resources/minidumps/he/wiki.xml.bz2
index 9a55d6848b..d6ec6756cf 100644
Binary files a/dump/src/test/resources/minidumps/he/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/he/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/hr/wiki.xml.bz2 b/dump/src/test/resources/minidumps/hr/wiki.xml.bz2
index bac0e7d4bc..8dc24ede98 100644
Binary files a/dump/src/test/resources/minidumps/hr/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/hr/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/hu/wiki.xml.bz2 b/dump/src/test/resources/minidumps/hu/wiki.xml.bz2
index 3bd2f31ab6..27b35ff926 100644
Binary files a/dump/src/test/resources/minidumps/hu/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/hu/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/hy/wiki.xml.bz2 b/dump/src/test/resources/minidumps/hy/wiki.xml.bz2
index 954e142690..5d7bad3c51 100644
Binary files a/dump/src/test/resources/minidumps/hy/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/hy/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/id/wiki.xml.bz2 b/dump/src/test/resources/minidumps/id/wiki.xml.bz2
index 582ea227c3..383be771c7 100644
Binary files a/dump/src/test/resources/minidumps/id/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/id/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/is/wiki.xml.bz2 b/dump/src/test/resources/minidumps/is/wiki.xml.bz2
index 8b685ff944..d2561f009d 100644
Binary files a/dump/src/test/resources/minidumps/is/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/is/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/it/wiki.xml.bz2 b/dump/src/test/resources/minidumps/it/wiki.xml.bz2
index 10a57cdf57..97407badd2 100644
Binary files a/dump/src/test/resources/minidumps/it/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/it/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ja/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ja/wiki.xml.bz2
index 3970ff04c4..f89e432ea2 100644
Binary files a/dump/src/test/resources/minidumps/ja/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ja/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ka/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ka/wiki.xml.bz2
index 8a46c644f9..b067f74474 100644
Binary files a/dump/src/test/resources/minidumps/ka/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ka/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/kn/wiki.xml.bz2 b/dump/src/test/resources/minidumps/kn/wiki.xml.bz2
index d31cc11758..a9c772130b 100644
Binary files a/dump/src/test/resources/minidumps/kn/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/kn/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ko/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ko/wiki.xml.bz2
index f062da377c..74171ba89b 100644
Binary files a/dump/src/test/resources/minidumps/ko/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ko/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ku/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ku/wiki.xml.bz2
index f0cb5d8b6d..3d386a9566 100644
Binary files a/dump/src/test/resources/minidumps/ku/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ku/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ky/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ky/wiki.xml.bz2
index 5bd5d7f425..ae8f20805e 100644
Binary files a/dump/src/test/resources/minidumps/ky/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ky/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/la/wiki.xml.bz2 b/dump/src/test/resources/minidumps/la/wiki.xml.bz2
index 6112b83cc5..28a9fb0576 100644
Binary files a/dump/src/test/resources/minidumps/la/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/la/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/lb/wiki.xml.bz2 b/dump/src/test/resources/minidumps/lb/wiki.xml.bz2
index 9a32f8e3be..b0fe0e6381 100644
Binary files a/dump/src/test/resources/minidumps/lb/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/lb/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/lt/wiki.xml.bz2 b/dump/src/test/resources/minidumps/lt/wiki.xml.bz2
index 4200f706bc..c27b73a800 100644
Binary files a/dump/src/test/resources/minidumps/lt/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/lt/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/lv/wiki.xml.bz2 b/dump/src/test/resources/minidumps/lv/wiki.xml.bz2
index 9a700f67c5..d0d8cb9cf6 100644
Binary files a/dump/src/test/resources/minidumps/lv/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/lv/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/mk/wiki.xml.bz2 b/dump/src/test/resources/minidumps/mk/wiki.xml.bz2
index cca43ecbc1..520578f785 100644
Binary files a/dump/src/test/resources/minidumps/mk/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/mk/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ml/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ml/wiki.xml.bz2
index 02b98f116b..cb3500bc75 100644
Binary files a/dump/src/test/resources/minidumps/ml/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ml/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/mn/wiki.xml.bz2 b/dump/src/test/resources/minidumps/mn/wiki.xml.bz2
index 3fcd00538e..49593df6c0 100644
Binary files a/dump/src/test/resources/minidumps/mn/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/mn/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ms/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ms/wiki.xml.bz2
index ffe7dee6a1..279525b096 100644
Binary files a/dump/src/test/resources/minidumps/ms/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ms/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/nds/wiki.xml.bz2 b/dump/src/test/resources/minidumps/nds/wiki.xml.bz2
index 031e8f2efe..6aa3050151 100644
Binary files a/dump/src/test/resources/minidumps/nds/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/nds/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/nl/wiki.xml.bz2 b/dump/src/test/resources/minidumps/nl/wiki.xml.bz2
index 73d2c97d47..f4faf3a23d 100644
Binary files a/dump/src/test/resources/minidumps/nl/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/nl/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/nn/wiki.xml.bz2 b/dump/src/test/resources/minidumps/nn/wiki.xml.bz2
index c428058796..b3e6fb48e8 100644
Binary files a/dump/src/test/resources/minidumps/nn/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/nn/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/no/wiki.xml.bz2 b/dump/src/test/resources/minidumps/no/wiki.xml.bz2
index 82b4385ff4..a29c0d308f 100644
Binary files a/dump/src/test/resources/minidumps/no/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/no/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/os/wiki.xml.bz2 b/dump/src/test/resources/minidumps/os/wiki.xml.bz2
index a7423cfc94..68733214ff 100644
Binary files a/dump/src/test/resources/minidumps/os/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/os/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/pl/wiki.xml.bz2 b/dump/src/test/resources/minidumps/pl/wiki.xml.bz2
index 5ed2d08f5a..ad9aade1dc 100644
Binary files a/dump/src/test/resources/minidumps/pl/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/pl/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/pt/wiki.xml.bz2 b/dump/src/test/resources/minidumps/pt/wiki.xml.bz2
index 436684e631..40589c3209 100644
Binary files a/dump/src/test/resources/minidumps/pt/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/pt/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ro/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ro/wiki.xml.bz2
index ca8c04af60..c5358556f4 100644
Binary files a/dump/src/test/resources/minidumps/ro/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ro/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ru/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ru/wiki.xml.bz2
index 797ab5784d..69b34a8d49 100644
Binary files a/dump/src/test/resources/minidumps/ru/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ru/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sco/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sco/wiki.xml.bz2
index 7609abf55e..441462ade8 100644
Binary files a/dump/src/test/resources/minidumps/sco/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sco/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sh/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sh/wiki.xml.bz2
index 36cd3bdef3..c2565bf550 100644
Binary files a/dump/src/test/resources/minidumps/sh/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sh/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/si/wiki.xml.bz2 b/dump/src/test/resources/minidumps/si/wiki.xml.bz2
index e083b92d08..ddf29f01a8 100644
Binary files a/dump/src/test/resources/minidumps/si/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/si/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/simple/wiki.xml.bz2 b/dump/src/test/resources/minidumps/simple/wiki.xml.bz2
index 28afda3daf..0671d7e7dd 100644
Binary files a/dump/src/test/resources/minidumps/simple/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/simple/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sl/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sl/wiki.xml.bz2
index e00748fd6f..ca48e488b3 100644
Binary files a/dump/src/test/resources/minidumps/sl/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sl/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sq/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sq/wiki.xml.bz2
index e48b043753..086125cdb7 100644
Binary files a/dump/src/test/resources/minidumps/sq/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sq/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sr/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sr/wiki.xml.bz2
index 5069813f61..052049e1a6 100644
Binary files a/dump/src/test/resources/minidumps/sr/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sr/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/sv/wiki.xml.bz2 b/dump/src/test/resources/minidumps/sv/wiki.xml.bz2
index 30ac9d8ca0..9bd1739ad0 100644
Binary files a/dump/src/test/resources/minidumps/sv/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/sv/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ta/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ta/wiki.xml.bz2
index 1d6a181755..3514f8aedc 100644
Binary files a/dump/src/test/resources/minidumps/ta/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ta/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/te/wiki.xml.bz2 b/dump/src/test/resources/minidumps/te/wiki.xml.bz2
index fc5376373f..f4ff0017f1 100644
Binary files a/dump/src/test/resources/minidumps/te/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/te/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/tg/wiki.xml.bz2 b/dump/src/test/resources/minidumps/tg/wiki.xml.bz2
index 0c196c42b9..f5041ede81 100644
Binary files a/dump/src/test/resources/minidumps/tg/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/tg/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/th/wiki.xml.bz2 b/dump/src/test/resources/minidumps/th/wiki.xml.bz2
index ed8d64786d..d9e23db165 100644
Binary files a/dump/src/test/resources/minidumps/th/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/th/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/tl/wiki.xml.bz2 b/dump/src/test/resources/minidumps/tl/wiki.xml.bz2
index 746a77fa3b..642c33a821 100644
Binary files a/dump/src/test/resources/minidumps/tl/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/tl/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/tr/wiki.xml.bz2 b/dump/src/test/resources/minidumps/tr/wiki.xml.bz2
index a1c4428b9f..a50ef76b30 100644
Binary files a/dump/src/test/resources/minidumps/tr/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/tr/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/tt/wiki.xml.bz2 b/dump/src/test/resources/minidumps/tt/wiki.xml.bz2
index fb9fe61a6d..394dcd5f0b 100644
Binary files a/dump/src/test/resources/minidumps/tt/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/tt/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/uk/wiki.xml.bz2 b/dump/src/test/resources/minidumps/uk/wiki.xml.bz2
index e7026dffe2..eb1e460b19 100644
Binary files a/dump/src/test/resources/minidumps/uk/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/uk/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/ur/wiki.xml.bz2 b/dump/src/test/resources/minidumps/ur/wiki.xml.bz2
index 7edc954634..43b1967934 100644
Binary files a/dump/src/test/resources/minidumps/ur/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/ur/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/vec/wiki.xml.bz2 b/dump/src/test/resources/minidumps/vec/wiki.xml.bz2
index 0de979bf9a..3194333a6a 100644
Binary files a/dump/src/test/resources/minidumps/vec/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/vec/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/vi/wiki.xml.bz2 b/dump/src/test/resources/minidumps/vi/wiki.xml.bz2
index 9c4d86bac9..ecfc264fe4 100644
Binary files a/dump/src/test/resources/minidumps/vi/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/vi/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/war/wiki.xml.bz2 b/dump/src/test/resources/minidumps/war/wiki.xml.bz2
index 36e83f9907..0563ebbf1f 100644
Binary files a/dump/src/test/resources/minidumps/war/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/war/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/minidumps/zh/wiki.xml.bz2 b/dump/src/test/resources/minidumps/zh/wiki.xml.bz2
index 02d52d2cd2..9c55bd2b25 100644
Binary files a/dump/src/test/resources/minidumps/zh/wiki.xml.bz2 and b/dump/src/test/resources/minidumps/zh/wiki.xml.bz2 differ
diff --git a/dump/src/test/resources/testGroups.csv b/dump/src/test/resources/shacl-test-groups.csv
similarity index 84%
rename from dump/src/test/resources/testGroups.csv
rename to dump/src/test/resources/shacl-test-groups.csv
index 6e9a261669..47f649dd89 100644
--- a/dump/src/test/resources/testGroups.csv
+++ b/dump/src/test/resources/shacl-test-groups.csv
@@ -14,5 +14,7 @@ TEST_NAME,ALL,PRODUCTIVE
#Citation_english_language_title_datatype_validation,yes,yes
#Citation_english_language_work_datatype_validation,yes,yes
#Citation_english_languagа_year_datatype_validation,yes,yes
-#en_property_isbn_citation,yes,yes
-#wgs84_lat_long,yes,yes
\ No newline at end of file
+#en_property_isbn_citation,yes,no
+#wgs84_lat_long,yes,yes
+#en_abstract_validation,yes,yes
+#Marian_Breland_Bailey,yes,yes
diff --git a/dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl b/dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl
new file mode 100644
index 0000000000..09c6c36b64
--- /dev/null
+++ b/dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl
@@ -0,0 +1,21 @@
+@base .
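+@prefix sh: <http://www.w3.org/ns/shacl#> .
+@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+@prefix dbr: <http://dbpedia.org/resource/> .
+@prefix dbp: <http://dbpedia.org/property/> .
+@prefix dbo: <http://dbpedia.org/ontology/> .
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .
+@prefix foaf: <http://xmlns.com/foaf/0.1/> .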
+
+<#Marian_Breland_Bailey>
+ a sh:NodeShape ;
+ sh:targetNode dbr:Marian_Breland_Bailey__Keller_Breland__1 ;
+
+ sh:property [
+ sh:path foaf:name ;
+ sh:hasValue "Keller Breland"@en ;
+ ] .
+
diff --git a/dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl b/dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl
new file mode 100644
index 0000000000..30c1963b1d
--- /dev/null
+++ b/dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl
@@ -0,0 +1,19 @@
+@base .
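+@prefix sh: <http://www.w3.org/ns/shacl#> .
+@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+@prefix dbr: <http://dbpedia.org/resource/> .
+@prefix dbp: <http://dbpedia.org/property/> .
+@prefix dbo: <http://dbpedia.org/ontology/> .
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .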
+
+<#en_abstract_validation>
+ a sh:NodeShape ;
+ sh:targetSubjectsOf <http://dbpedia.org/ontology/abstract> ;
+ sh:message "Error, found (; in dbo:abstract. "@en ;
+ sh:property [
+ sh:path <http://dbpedia.org/ontology/abstract> ;
+ sh:pattern "^((?!\\(\\;).)*$" ;
+ ] .
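
Review note on `#en_abstract_validation`: the `sh:pattern` above uses a negative lookahead to reject any abstract containing the character sequence `(;`. A minimal sketch of that check follows, using Python's `re` module as a stand-in for the SHACL engine's regex implementation. The helper name `abstract_conforms` and the sample strings are hypothetical, and engine-specific regex differences are not modeled.

```python
import re

# The pattern as the regex engine sees it once the Turtle string escapes
# are resolved ("\\(" in the .ttl file becomes "\(" in the regex).
ABSTRACT_PATTERN = re.compile(r"^((?!\(\;).)*$")


def abstract_conforms(text: str) -> bool:
    """Hypothetical helper: True when `text` contains no "(;" sequence."""
    # Each character is consumed only if "(;" does not start at that position,
    # so the anchored pattern matches only strings free of that sequence.
    return ABSTRACT_PATTERN.search(text) is not None


if __name__ == "__main__":
    # Illustrative strings only, not real extraction output.
    print(abstract_conforms("Berlin is the capital of Germany."))
    print(abstract_conforms("Berlin (; German pronunciation) is the capital."))
```

Parentheses and semicolons that are not adjacent, as in `a (b; c)`, still pass, because the lookahead only fires on the exact two-character sequence.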
diff --git a/dump/src/test/resources/shaclTestsCoverageTable.md b/dump/src/test/resources/shaclTestsCoverageTable.md
index 3d974b2730..32f0730cfe 100644
--- a/dump/src/test/resources/shaclTestsCoverageTable.md
+++ b/dump/src/test/resources/shaclTestsCoverageTable.md
@@ -26,9 +26,10 @@ wikipage-uri|shacl-test|issue|comment
[http://cv.dbpedia.org/resource/Берлин](http://dief.tools.dbpedia.org/server/extraction/cv/extract?title=Берлин&revid=&format=trix&extractors=custom) |
[http://cy.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/cy/extract?title=Berlin&revid=&format=trix&extractors=custom) |
[http://da.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/da/extract?title=Berlin&revid=&format=trix&extractors=custom) |
-[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) |
-[http://de.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Berlin&revid=&format=trix&extractors=custom) |
+[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://de.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://el.dbpedia.org/resource/Βερολίνο](http://dief.tools.dbpedia.org/server/extraction/el/extract?title=Βερολίνο&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -39,6 +40,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/%60Abdu%27l-Bah%C3%A1](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%60Abdu%27l-Bah%C3%A1&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -49,6 +51,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/Angela_Merkel](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Angela_Merkel&revid=&format=trix&extractors=custom) | [http://dbpedia.org/resource/Angela_Merkel](http://dbpedia.org/resource/Angela_Merkel) #Angela_Merkel |
+[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -58,6 +61,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
+[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/last1](http://dbpedia.org/property/last1) #Citation_english_language_last1_datatype_validation |
@@ -65,6 +69,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Asda](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Asda&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
+[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -75,6 +80,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/Atlantic_Ocean](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Atlantic_Ocean&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
+[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -85,7 +91,8 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
-[http://en.dbpedia.org/resource/Dahlak_SC](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Dahlak_SC&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/Dahlak_SC](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Dahlak_SC&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -94,6 +101,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Ferdinand_Piëch](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ferdinand_Piëch&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Food_(disambiguation)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Food_(disambiguation)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/resource/Food_(disambiguation)](http://dbpedia.org/resource/Food_(disambiguation)) #Food_(disambiguation)_en |
+[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -103,6 +111,7 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/IBM](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IBM&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
+[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -113,16 +122,21 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://dbpedia.org/resource/IKEA](http://dbpedia.org/resource/IKEA) #IKEA | [https://github.com/dbpedia/extraction-framework/issues/630](https://github.com/dbpedia/extraction-framework/issues/630) | no company type for some specific entities (e.g. IKEA; Samsung) |
[http://en.dbpedia.org/resource/IKEA](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=IKEA&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
+[http://en.dbpedia.org/resource/Jim_Pewter](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Jim_Pewter&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Jim_Pewter](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Jim_Pewter&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
+[http://en.dbpedia.org/resource/Kerala_Agricultural_University](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Kerala_Agricultural_University&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Kerala_Agricultural_University](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Kerala_Agricultural_University&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Kerala_Agricultural_University](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Kerala_Agricultural_University&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Kerala_Agricultural_University](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Kerala_Agricultural_University&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
-[http://en.dbpedia.org/resource/Mini_(Mark_I)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Mini_(Mark_I)&revid=&format=trix&extractors=custom) |
-[http://en.dbpedia.org/resource/N.EX.T](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=N.EX.T&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/Marian_Breland_Bailey](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Marian_Breland_Bailey&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://en.dbpedia.org/resource/Mini_(Mark_I)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Mini_(Mark_I)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://en.dbpedia.org/resource/N.EX.T](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=N.EX.T&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://en.dbpedia.org/resource/Ranma_½](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ranma_½&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Ranma_½](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ranma_½&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Ranma_½](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ranma_½&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Ranma_½](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ranma_½&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/last](http://dbpedia.org/property/last) #Citation_english_language_last_datatype_validation |
[http://en.dbpedia.org/resource/Ranma_½](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ranma_½&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
+[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
@@ -131,17 +145,20 @@ wikipage-uri|shacl-test|issue|comment
[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Redd_Kross](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Redd_Kross&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/year](http://dbpedia.org/property/year) #Citation_english_languagа_year_datatype_validation |
[http://en.dbpedia.org/resource/Ren_%26_Stimpy_%22Adult_Party_Cartoon%22](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Ren_%26_Stimpy_%22Adult_Party_Cartoon%22&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
+[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/isbn](http://dbpedia.org/property/isbn) #en_property_isbn_citation |
+[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/last1](http://dbpedia.org/property/last1) #Citation_english_language_last1_datatype_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/last](http://dbpedia.org/property/last) #Citation_english_language_last_datatype_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/work](http://dbpedia.org/property/work) #Citation_english_language_work_datatype_validation |
[http://en.dbpedia.org/resource/Samsung](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Samsung&revid=&format=trix&extractors=custom) | [http://dbpedia.org/resource/Samsung](http://dbpedia.org/resource/Samsung) #Samsung | [https://github.com/dbpedia/extraction-framework/issues/630](https://github.com/dbpedia/extraction-framework/issues/630) | no company type for some specific entities (e.g. IKEA; Samsung) |
[http://en.dbpedia.org/resource/The_Amazing_Spider-Man_(2012_film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=The_Amazing_Spider-Man_(2012_film)&revid=&format=trix&extractors=custom) |
[http://en.dbpedia.org/resource/The_Ren_%26_Stimpy_Show](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=The_Ren_%26_Stimpy_Show&revid=&format=trix&extractors=custom) |
+[http://en.dbpedia.org/resource/Vehicle_registration_plates_of_China](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Vehicle_registration_plates_of_China&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/Vehicle_registration_plates_of_China](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Vehicle_registration_plates_of_China&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
[http://en.dbpedia.org/resource/Vehicle_registration_plates_of_China](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Vehicle_registration_plates_of_China&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/date](http://dbpedia.org/property/date) #Citation_english_language_date_datatype_validation |
-[http://en.dbpedia.org/resource/Vehicle_registration_plates_of_China](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Vehicle_registration_plates_of_China&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/title](http://dbpedia.org/property/title) #Citation_english_language_title_datatype_validation |
[http://eo.dbpedia.org/resource/Berlino](http://dief.tools.dbpedia.org/server/extraction/eo/extract?title=Berlino&revid=&format=trix&extractors=custom) |
[http://es.dbpedia.org/resource/Berlín](http://dief.tools.dbpedia.org/server/extraction/es/extract?title=Berlín&revid=&format=trix&extractors=custom) |
[http://et.dbpedia.org/resource/Berliin](http://dief.tools.dbpedia.org/server/extraction/et/extract?title=Berliin&revid=&format=trix&extractors=custom) |
@@ -149,9 +166,10 @@ wikipage-uri|shacl-test|issue|comment
[http://fa.dbpedia.org/resource/برلین](http://dief.tools.dbpedia.org/server/extraction/fa/extract?title=برلین&revid=&format=trix&extractors=custom) |
[http://fi.dbpedia.org/resource/Berliini](http://dief.tools.dbpedia.org/server/extraction/fi/extract?title=Berliini&revid=&format=trix&extractors=custom) |
[http://fo.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/fo/extract?title=Berlin&revid=&format=trix&extractors=custom) |
-[http://fr.dbpedia.org/resource/Antoine_Meillet](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Antoine_Meillet&revid=&format=trix&extractors=custom) |
+[http://fr.dbpedia.org/resource/Antoine_Meillet](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Antoine_Meillet&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://fr.dbpedia.org/resource/Autriche](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Autriche&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://fr.dbpedia.org/resource/Autriche](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Autriche&revid=&format=trix&extractors=custom) | [http://www.w3.org/2003/01/geo/wgs84_pos#long](http://www.w3.org/2003/01/geo/wgs84_pos#long) #wgs84_lat_long | | generic test for range of wgs84 lat/long |
-[http://fr.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Berlin&revid=&format=trix&extractors=custom) |
+[http://fr.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/fr/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://fy.dbpedia.org/resource/Berlyn](http://dief.tools.dbpedia.org/server/extraction/fy/extract?title=Berlyn&revid=&format=trix&extractors=custom) |
[http://ga.dbpedia.org/resource/Beirlín](http://dief.tools.dbpedia.org/server/extraction/ga/extract?title=Beirlín&revid=&format=trix&extractors=custom) |
[http://gd.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/gd/extract?title=Berlin&revid=&format=trix&extractors=custom) |
@@ -195,8 +213,8 @@ wikipage-uri|shacl-test|issue|comment
[http://nds.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/nds/extract?title=Berlin&revid=&format=trix&extractors=custom) |
[http://ne.dbpedia.org/resource/बर्लिन](http://dief.tools.dbpedia.org/server/extraction/ne/extract?title=बर्लिन&revid=&format=trix&extractors=custom) |
[http://new.dbpedia.org/resource/बर्लिन](http://dief.tools.dbpedia.org/server/extraction/new/extract?title=बर्लिन&revid=&format=trix&extractors=custom) |
-[http://nl.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/nl/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) |
-[http://nl.dbpedia.org/resource/Berlijn](http://dief.tools.dbpedia.org/server/extraction/nl/extract?title=Berlijn&revid=&format=trix&extractors=custom) |
+[http://nl.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/nl/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
+[http://nl.dbpedia.org/resource/Berlijn](http://dief.tools.dbpedia.org/server/extraction/nl/extract?title=Berlijn&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://nn.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/nn/extract?title=Berlin&revid=&format=trix&extractors=custom) |
[http://no.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/no/extract?title=Berlin&revid=&format=trix&extractors=custom) |
[http://oc.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/oc/extract?title=Berlin&revid=&format=trix&extractors=custom) |
@@ -208,7 +226,7 @@ wikipage-uri|shacl-test|issue|comment
[http://pnb.dbpedia.org/resource/برلن](http://dief.tools.dbpedia.org/server/extraction/pnb/extract?title=برلن&revid=&format=trix&extractors=custom) |
[http://pt.dbpedia.org/resource/Berlim](http://dief.tools.dbpedia.org/server/extraction/pt/extract?title=Berlim&revid=&format=trix&extractors=custom) |
[http://qu.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/qu/extract?title=Berlin&revid=&format=trix&extractors=custom) |
-[http://ro.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/ro/extract?title=Berlin&revid=&format=trix&extractors=custom) |
+[http://ro.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/ro/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://ru.dbpedia.org/resource/Берлин](http://dief.tools.dbpedia.org/server/extraction/ru/extract?title=Берлин&revid=&format=trix&extractors=custom) |
[http://sa.dbpedia.org/resource/बर्लिन](http://dief.tools.dbpedia.org/server/extraction/sa/extract?title=बर्लिन&revid=&format=trix&extractors=custom) |
[http://sah.dbpedia.org/resource/Берлин](http://dief.tools.dbpedia.org/server/extraction/sah/extract?title=Берлин&revid=&format=trix&extractors=custom) |
diff --git a/dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala b/dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala
index a637a4f255..0fa9b5e3f2 100644
--- a/dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala
+++ b/dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala
@@ -1,17 +1,24 @@
package org.dbpedia.extraction.dump
-import java.io.{File, FileInputStream, FileOutputStream}
+import org.apache.jena.graph
+import org.apache.jena.rdf.model
+import org.apache.jena.rdf.model.impl.StatementImpl
+import org.apache.jena.rdf.model.StmtIterator
-import org.apache.jena.rdf.model.ModelFactory
+import java.io.{File, FileInputStream, FileOutputStream}
+import org.apache.jena.rdf.model.{Alt, Bag, Literal, Model, ModelFactory, Property, RDFNode, RSIterator, ReifiedStatement, Resource, ResourceF, Statement, StmtIterator}
import org.apache.jena.riot.{RDFDataMgr, RDFLanguages}
import org.apache.spark.sql.SQLContext
-import org.dbpedia.extraction.dump.TestConfig.{XSDCITestFile, ciTestFile, ciTestModel, date, mappingsConfig, sparkSession}
+import org.dbpedia.extraction.dump.TestConfig.{XSDCITestFile, ciTestFile, ciTestModel, classLoader, date, mappingsConfig, sparkSession}
import org.dbpedia.extraction.dump.tags.ConstructValidationTestTag
import org.dbpedia.validation.construct.report.ReportWriter
import org.dbpedia.validation.construct.report.formats.ReportFormat
import org.dbpedia.validation.construct.tests.TestSuiteFactory
import org.dbpedia.validation.construct.tests.suites.NTripleTestSuite
import org.scalatest.{BeforeAndAfterAll, DoNotDiscover, FunSuite}
+import org.apache.jena.rdf.model.RDFNode
+
+import java.util.function.Consumer
@DoNotDiscover
class ConstructValidationTest extends FunSuite with BeforeAndAfterAll {
@@ -22,19 +29,30 @@ class ConstructValidationTest extends FunSuite with BeforeAndAfterAll {
new File("./target/testreports/").mkdirs()
}
- test("IRI Coverage Tests", ConstructValidationTestTag) {
-
+ test("IRI Coverage Tests. Productive group tests", ConstructValidationTestTag) {
val SQLContext: SQLContext = sparkSession.sqlContext
-
val testFiles = Array(ciTestFile, XSDCITestFile)
-
val testModel = ModelFactory.createDefaultModel()
testFiles.foreach(testFile => testModel.read(testFile))
+ val groupKeys = Utils.loadTestGroupsKeys(Utils.getGroup("cvTestGroup"), "cv-test-groups.csv", "no")
+    val selectValues = groupKeys.map(x => s"<$x>").toSet
+ .mkString("\n")
+
+ val iterator = testModel.listStatements()
+ val testGeneratorURI = "http://dev.vocab.org/TestGenerator"
+
+ while (iterator.hasNext) {
+ val statement = iterator.nextStatement
+ val rdfSubject = statement.getSubject.asResource()
+ val rdfObject = statement.getObject
+ if (rdfSubject.isURIResource && selectValues.contains(rdfSubject.getURI) && rdfObject.toString.equals(testGeneratorURI)) {
+ iterator.remove()
+ }
+ }
val testSuite = TestSuiteFactory.create(testModel, TestSuiteFactory.TestSuiteType.NTriples).asInstanceOf[NTripleTestSuite]
val testScores = testSuite.test(s"${mappingsConfig.dumpDir.getAbsolutePath}/*/$date/*.ttl.bz2")(SQLContext)
-
new File("target/testreports/").mkdirs()
val htmlOS = new FileOutputStream(s"./target/testreports/minidump.html", false)
ReportWriter.write("DIEF Minidump NTriple Test Cases", testScores(0), testSuite, ReportFormat.HTML, htmlOS)
diff --git a/dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala b/dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala
index 440dcd3681..18c975d409 100644
--- a/dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala
+++ b/dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala
@@ -2,10 +2,9 @@ package org.dbpedia.extraction.dump
import java.io.File
import java.util.concurrent.ConcurrentLinkedQueue
-
import org.apache.commons.io.FileUtils
import org.dbpedia.extraction.config.Config
-import org.dbpedia.extraction.dump.TestConfig.{date, genericConfig, mappingsConfig, minidumpDir, nifAbstractConfig, sparkSession, wikidataConfig}
+import org.dbpedia.extraction.dump.TestConfig.{classLoader, date, genericConfig, mappingsConfig, minidumpDir, nifAbstractConfig, plainAbstractConfig, sparkSession, wikidataConfig}
import org.dbpedia.extraction.dump.extract.ConfigLoader
import org.dbpedia.extraction.dump.tags.ExtractionTestTag
import org.scalatest.{BeforeAndAfterAll, DoNotDiscover, FunSuite}
@@ -45,9 +44,13 @@ class ExtractionTest extends FunSuite with BeforeAndAfterAll {
extract(wikidataConfig, jobsRunning)
}
- test("extract nifAbstract datasets", ExtractionTestTag) {
- val jobsRunning = new ConcurrentLinkedQueue[Future[Unit]]()
- extract(nifAbstractConfig, jobsRunning)
+ test("extract abstract datasets", ExtractionTestTag) {
+ val jobsRunning1 = new ConcurrentLinkedQueue[Future[Unit]]()
+ extract(nifAbstractConfig, jobsRunning1)
+ Utils.renameAbstractsDatasetFiles("html")
+ val jobsRunning2 = new ConcurrentLinkedQueue[Future[Unit]]()
+ extract(plainAbstractConfig, jobsRunning2)
+ Utils.renameAbstractsDatasetFiles("plain")
}
def extractSpark(config: Config, jobsRunning: ConcurrentLinkedQueue[Future[Unit]]): Unit = {
diff --git a/dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala b/dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala
index d10f7b76b0..957852dffc 100644
--- a/dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala
+++ b/dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala
@@ -29,22 +29,8 @@ class ShaclTest extends FunSuite with BeforeAndAfterAll {
new File("./target/testreports/").mkdirs()
}
- def getGroup: String = {
- val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
- val properties = new Properties()
- resourceInputStream match {
- case Some(inputStream) => properties.load(inputStream)
- case None => return TestConfig.defaultTestGroup
- }
- val groupOption = Option(properties.getProperty("testGroup"))
- groupOption match {
- case Some(group) => group
- case None => TestConfig.defaultTestGroup
- }
- }
-
test("RDFUnit with SHACL", ShaclTestTag) {
- val (schema: SchemaSource, testSuite: TestSuite) = generateShaclTestSuiteFromMultipleFiles(getGroup)
+ val (schema: SchemaSource, testSuite: TestSuite) = generateShaclTestSuiteFromMultipleFiles(Utils.getGroup("shaclTestGroup"))
val shaclTestCaseResults =
validateMinidumpWithTestSuite(schema, testSuite, TestCaseExecutionType.shaclTestCaseResult, "./target/testreports/shacl-tests.html")
@@ -128,7 +114,7 @@ class ShaclTest extends FunSuite with BeforeAndAfterAll {
}
assert(custom_SHACL_tests.size() > 0, "size not 0")
- val groupKeys = loadTestGroupsKeys(testGroup, "testGroups.csv")
+ val groupKeys = Utils.loadTestGroupsKeys(testGroup, "shacl-test-groups.csv", "yes")
assert(groupKeys.nonEmpty)
val selectValues = groupKeys.map(x => s" <$x> ")
.mkString("\n")
@@ -163,30 +149,6 @@ class ShaclTest extends FunSuite with BeforeAndAfterAll {
(schema, testSuite)
}
- def loadTestGroupsKeys(group: String, path: String): Array[String] = {
- println(
- s"""##############
- | GROUP $group
- |##############""".stripMargin)
- val flag = "yes"
- val filePath = classLoader.getResource(path).getFile
- val file = scala.io.Source.fromFile(filePath)
-
- val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
- val columnsNames: Array[String] = table.head
-
- if (!columnsNames.contains(group)) {
- Array[String]()
- }
- else {
- val indexOfGroup = columnsNames.indexOf(group)
- val groupsKeys: Array[String] = table.tail.flatMap(row =>
- if (row(indexOfGroup) == flag) Array[String](row(0))
- else Array[String]())
-
- groupsKeys
- }
- }
def recursiveListFiles(f: File): Array[File] = {
val these = f.listFiles
diff --git a/dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala b/dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala
index c95860b2fe..dc3f09e725 100644
--- a/dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala
+++ b/dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala
@@ -17,6 +17,7 @@ object TestConfig {
val mappingsConfig = new Config(classLoader.getResource("extraction-configs/mappings.extraction.minidump.properties").getFile)
val genericConfig = new Config(classLoader.getResource("extraction-configs/generic-spark.extraction.minidump.properties").getFile)
val nifAbstractConfig = new Config(classLoader.getResource("extraction-configs/extraction.nif.abstracts.properties").getFile)
+ val plainAbstractConfig = new Config(classLoader.getResource("extraction-configs/extraction.plain.abstracts.properties").getFile)
val wikidataConfig = new Config(classLoader.getResource("extraction-configs/wikidata.extraction.properties").getFile)
val minidumpDir = new File(classLoader.getResource("minidumps").getFile)
diff --git a/dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala b/dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala
new file mode 100644
index 0000000000..acd74bae5b
--- /dev/null
+++ b/dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala
@@ -0,0 +1,60 @@
+package org.dbpedia.extraction.dump
+
+import org.dbpedia.extraction.dump.TestConfig.{classLoader, date}
+
+import java.io.File
+import java.util.Properties
+
+object Utils {
+ def loadTestGroupsKeys(group: String, path: String, option: String = "yes"): Array[String] = {
+ println(
+ s"""##############
+ | GROUP $group
+ |##############""".stripMargin)
+
+ val filePath = classLoader.getResource(path).getFile
+ val file = scala.io.Source.fromFile(filePath)
+
+ val table: Array[Array[String]] = try file.getLines().map(_.split(",")).toArray finally file.close()
+ val columnsNames: Array[String] = table.head
+
+ if (!columnsNames.contains(group)) {
+ Array[String]()
+ }
+ else {
+ val indexOfGroup = columnsNames.indexOf(group)
+ val groupsKeys: Array[String] = table.tail.flatMap(row =>
+ if (row(indexOfGroup) == option) Array[String](row(0))
+ else Array[String]())
+ groupsKeys
+ }
+ }
+
+ def getGroup(testName: String): String = {
+ val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
+ val properties = new Properties()
+ resourceInputStream match {
+ case Some(inputStream) => properties.load(inputStream)
+ case None => return TestConfig.defaultTestGroup
+ }
+ val groupOption = Option(properties.getProperty(testName))
+ groupOption match {
+ case Some(group) => group
+ case None => TestConfig.defaultTestGroup
+ }
+ }
+
+ def renameAbstractsDatasetFiles(datasetName: String): Unit = {
+ val minidumpDir = new File("./target/minidumptest/base")
+ Option(minidumpDir.listFiles()).getOrElse(Array.empty[File]).foreach(f => {
+ val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
+ if (longAbstractsFile.exists()) {
+ longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+ }
+ val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
+ if (shortAbstractsFile.exists()) {
+ shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+ }
+ })
+ }
+}
diff --git a/live/live.default.xml b/live/live.default.xml
index 8f03513758..76cc8c79f6 100644
--- a/live/live.default.xml
+++ b/live/live.default.xml
@@ -294,7 +294,7 @@
-
+
diff --git a/server/server.default.properties b/server/server.default.properties
index ca9dbacb5e..7fd0741db4 100644
--- a/server/server.default.properties
+++ b/server/server.default.properties
@@ -56,7 +56,7 @@ mappingsTestExtractors=.LabelExtractor,.MappingExtractor
# Default extractors for all languages
extractors=.LabelExtractor,.PageIdExtractor,.RevisionIdExtractor,.WikiPageOutDegreeExtractor,.WikiPageLengthExtractor,\
-.MappingExtractor,.GeoExtractor,.AbstractExtractorWikipedia,.ArticlePageExtractor,\
+.MappingExtractor,.GeoExtractor,.HtmlAbstractExtractor,.ArticlePageExtractor,\
.ArticleCategoriesExtractor,.CategoryLabelExtractor,.SkosCategoriesExtractor,.ArticleTemplatesExtractor,\
.ExternalLinksExtractor,.InterLanguageLinksExtractor,.ProvenanceExtractor,\
.InfoboxExtractor
diff --git a/server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala b/server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala
index 7f547e000a..eef709eb28 100644
--- a/server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala
+++ b/server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala
@@ -5,121 +5,130 @@ import scala.collection.mutable
import org.dbpedia.extraction.mappings._
import org.dbpedia.extraction.util.StringUtils.prettyMillis
import org.dbpedia.extraction.wikiparser.{Namespace,TemplateNode}
+import org.dbpedia.extraction.wikiparser.impl.wikipedia.Namespaces
import MappingStats.InvalidTarget
object MappingStatsHolder {
-
private val logger = Logger.getLogger(getClass.getName)
def apply(wikiStats: WikipediaStats, mappings: Mappings, ignoreList: IgnoreList): MappingStatsHolder = {
-
- val language = wikiStats.language
-
- val millis = System.currentTimeMillis
- logger.info("Updating "+language.wikiCode+" mapped statistics")
-
- val templateMappings = mappings.templateMappings
-
- var statistics = new mutable.ArrayBuffer[MappingStats]()
-
- val templateNamespace = Namespace.Template.name(language) + ":"
-
- for ((rawTemplate, templateStats) <- wikiStats.templates)
- {
- if (rawTemplate startsWith templateNamespace) {
-
- val templateName = rawTemplate.substring(templateNamespace.length)
- val isMapped = templateMappings.contains(templateName)
- val mappedProps =
- if (isMapped) new PropertyCollector(templateMappings(templateName)).properties
- else Set.empty[String]
-
- var properties = new mutable.HashMap[String, (Int, Boolean)]
-
- for ((name, count) <- templateStats.properties) {
- properties(name) = (count, mappedProps.contains(name))
- }
-
- for (name <- mappedProps) {
- if (! properties.contains(name)) properties(name) = (InvalidTarget, true)
- }
-
- statistics += new MappingStats(templateStats, templateName, isMapped, properties.toMap, ignoreList)
-
- } else {
- logger.warning(language.wikiCode+" template '"+rawTemplate+"' does not start with '"+templateNamespace+"'")
+ val language = wikiStats.language
+
+ val millis = System.currentTimeMillis
+ logger.info("Updating " + language.wikiCode + " mapped statistics")
+
+ val templateMappings = mappings.templateMappings
+
+ var statistics = new mutable.ArrayBuffer[MappingStats]()
+
+ // Default template namespace name for the language
+ val templateNamespace = Namespace.Template.name(language) + ":"
+
+ // Build all valid namespace prefixes for Template namespace (code 10)
+ // Handles languages like Macedonian that expose multiple valid prefixes
+ val validTemplatePrefixes = Namespaces.names(language)
+ .filter(_._2 == 10) // Template namespace code is 10
+ .keys
+ .map(_ + ":")
+ .toSet + templateNamespace
+
+ for ((rawTemplate, templateStats) <- wikiStats.templates) {
+ val matchedPrefix = validTemplatePrefixes.find(rawTemplate.startsWith)
+
+ if (matchedPrefix.isDefined) {
+ val templateName = rawTemplate.substring(matchedPrefix.get.length)
+ val isMapped = templateMappings.contains(templateName)
+ val mappedProps =
+ if (isMapped) new PropertyCollector(templateMappings(templateName)).properties
+ else Set.empty[String]
+
+ var properties = new mutable.HashMap[String, (Int, Boolean)]
+
+ for ((name, count) <- templateStats.properties) {
+ properties(name) = (count, mappedProps.contains(name))
+ }
+
+ for (name <- mappedProps) {
+ if (!properties.contains(name)) properties(name) = (InvalidTarget, true)
}
+
+ statistics += new MappingStats(templateStats, templateName, isMapped, properties.toMap, ignoreList)
+ } else {
+ logger.warning(language.wikiCode + " template '" + rawTemplate + "' does not start with any valid template namespace prefix")
}
-
- val redirects = wikiStats.redirects.filterKeys(title => templateMappings.contains(title.substring(templateNamespace.length))).map(_.swap)
-
- val holder = new MappingStatsHolder(mappings, statistics.toList, redirects, ignoreList)
-
- logger.info("Updated "+language.wikiCode+" mapped statistics in "+prettyMillis(System.currentTimeMillis - millis))
-
- holder
+ }
+
+ val redirects = wikiStats.redirects
+ .filterKeys { title =>
+ val matchedPrefix = validTemplatePrefixes.find(title.startsWith)
+ matchedPrefix.isDefined && templateMappings.contains(title.substring(matchedPrefix.get.length))
+ }
+ .map(_.swap)
+
+ val holder = new MappingStatsHolder(mappings, statistics.toList, redirects, ignoreList)
+
+ logger.info("Updated " + language.wikiCode + " mapped statistics in " + prettyMillis(System.currentTimeMillis - millis))
+
+ holder
}
-
}
/**
* Contains statistics data computed from Wikipedia statistics numbers and template mappings.
- * Also holds on to the mappings to make synchronization in MappingStatsManager easier.
+ * Also holds on to the mappings to make synchronization in MappingStatsManager easier.
* TODO: better solution for mappings?
*/
class MappingStatsHolder(val mappings: Mappings, val mappedStatistics: List[MappingStats], val reversedRedirects: Map[String, String], ignoreList: IgnoreList) {
-
- private def countTemplates(all: Boolean, count: MappingStats => Int): Int = {
- var sum = 0
- for (ms <- mappedStatistics) {
- if (all || ms.isMapped) {
- if (! ignoreList.isTemplateIgnored(ms.templateName)) {
- sum += count(ms)
- }
+ private def countTemplates(all: Boolean, count: MappingStats => Int): Int = {
+ var sum = 0
+ for (ms <- mappedStatistics) {
+ if (all || ms.isMapped) {
+ if (!ignoreList.isTemplateIgnored(ms.templateName)) {
+ sum += count(ms)
}
}
- sum
}
+ sum
+ }
+
+ private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
+ private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)
+
+ val templateCount = countAllTemplates(_ => 1)
+ val mappedTemplateCount = countMappedTemplates(_ => 1)
+
+ val templateUseCount = countAllTemplates(_.templateCount)
+ val mappedTemplateUseCount = countMappedTemplates(_.templateCount)
+
+ val propertyCount = countAllTemplates(_.propertyCount)
+ val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)
- private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
- private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)
-
- val templateCount = countAllTemplates(_ => 1)
- val mappedTemplateCount = countMappedTemplates(_ => 1)
-
- val templateUseCount = countAllTemplates(_.templateCount)
- val mappedTemplateUseCount = countMappedTemplates(_.templateCount)
-
- val propertyCount = countAllTemplates(_.propertyCount)
- val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)
-
- val propertyUseCount = countAllTemplates(_.propertyUseCount)
- val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)
-
- val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
- val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble
-
- val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
- val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
+ val propertyUseCount = countAllTemplates(_.propertyUseCount)
+ val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)
+
+ val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
+ val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble
+
+ val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
+ val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
}
class PropertyCollector(mapping: Extractor[TemplateNode]) {
-
val properties = new mutable.HashSet[String]
-
+
classMapping(mapping) // go get'em!
-
- private def classMapping(mapping: Extractor[TemplateNode]) : Unit = mapping match {
+
+ private def classMapping(mapping: Extractor[TemplateNode]): Unit = mapping match {
case tm: TemplateMapping => tm.mappings.foreach(propertyMapping)
case cm: ConditionalMapping =>
cm.cases.foreach(conditionMapping)
cm.defaultMappings.foreach(propertyMapping)
}
-
- private def conditionMapping(mapping: ConditionMapping) : Unit =
+
+ private def conditionMapping(mapping: ConditionMapping): Unit =
classMapping(mapping.mapping)
-
- private def propertyMapping(mapping: PropertyMapping) : Unit = mapping match {
+
+ private def propertyMapping(mapping: PropertyMapping): Unit = mapping match {
case m: SimplePropertyMapping => this + m.templateProperty
case m: GeoCoordinatesMapping => this + m.coordinates + m.latitude + m.longitude + m.longitudeDegrees + m.longitudeMinutes + m.longitudeSeconds + m.longitudeDirection + m.latitudeDegrees + m.latitudeMinutes + m.latitudeSeconds + m.latitudeDirection
case m: CalculateMapping => this + m.templateProperty1 + m.templateProperty2
@@ -128,8 +137,8 @@ class PropertyCollector(mapping: Extractor[TemplateNode]) {
case m: IntermediateNodeMapping => m.mappings.foreach(propertyMapping)
case m: ConstantMapping => // ignore
}
-
- private def +(name: String) : PropertyCollector = {
+
+ private def +(name: String): PropertyCollector = {
if (name != null) properties.add(name)
this
}