Is this a data problem or a problem of the module?
I have the following problem when converting TCF (from Weblicht):
If I take a TCF file whose source text was typed into Weblicht manually, snp processes the file. If I use a TCF file whose source text was uploaded to Weblicht, I get the following error:
+----------------------------------- step 1 -----------------------------------+
|importer: TCFImporter |
|path: file:/C:/Users/Volker/saltnpep_Arbeit/snp%20arbeitsumgebung/ordner_in|
|corpus index: 0 |
|properties: |
| shrinkTokenAnnotations: true |
| useCommonAnnotatedElement:false |
| |
+----------------------------------- step 2 -----------------------------------+
|exporter: PAULAExporter |
|path: file:/C:/Users/Volker/saltnpep_Arbeit/snp%20arbeitsumgebung/salt_strafr|
|properties: |
| humanReadable: false |
| |
+------------------------------------------------------------------------------+
--------------------------- pepper job status ---------------------------
id: '95sdf2fn
active documents: 0 of 4
status: IN_PROGRESS
total progress: 0%
processing time: 0:00:00.328
salt:/0/ordner_in/result-1436304121111.tcf(NOT_STARTED/sleep) 0%
Bad tokenization: Full text not matching token text! Error around token t2084
Cannot map 'de.hu_berlin.german.korpling.saltnpepper.salt.saltCore.impl.SElementIdImpl@52cd0ba (namespace: graph, name: id, value: salt:/ordner_in/result-1436304121111.tcf)' with module 'TCFImporter', because of a mapping result was 'FAILED'.
An exception was thrown by the mapper threads 'Thread[TCFImporter_mapper(salt:/ordner_in/result-1436304121111.tcf),5,TCFImporter_mapperGroup]'.
de.hu_berlin.german.korpling.saltnpepper.pepper.modules.exceptions.PepperModuleDataException: A data error occured for a Pepper mapper 'de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport'. Bad tokenization: Full text not matching token text!
at de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport$TCFReader.endElement(TCFMapperImport.java:734)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperImpl.readXMLResource(PepperMapperImpl.java:295)
at de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport.mapSDocument(TCFMapperImport.java:94)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:240)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:185)
conversion ended successfully, required time: 0:00:01.078 s
The TCF file itself seems to be fine; at least it can be opened in Webanno.
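To help narrow down whether the data is at fault, a check along the following lines might show where the token layer and the primary text drift apart. This is only a rough sketch and not the importer's actual logic: it assumes the file holds one `text` element and a sequence of `token` elements with an `ID` attribute, and that tokens are separated in the text by nothing but whitespace. The function name `first_mismatch` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a namespace prefix of the form '{uri}name' from a tag."""
    return tag.rsplit("}", 1)[-1]


def first_mismatch(tcf_xml):
    """Return (token_id, token_text, text_context) for the first token that
    does not occur at the current position of the primary text, else None.

    Assumption: tokens appear in text order, separated only by whitespace.
    """
    root = ET.fromstring(tcf_xml)
    text = None
    tokens = []
    for el in root.iter():
        name = local(el.tag)
        if name == "text" and text is None:
            text = el.text or ""
        elif name == "token":
            tokens.append((el.get("ID"), el.text or ""))
    if text is None:
        raise ValueError("no <text> element found")

    pos = 0
    for tok_id, tok in tokens:
        # Skip whitespace between tokens in the primary text.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if not text.startswith(tok, pos):
            # Report the token and the next 20 characters of the text.
            return tok_id, tok, text[pos:pos + 20]
        pos += len(tok)
    return None
```

Calling it with the raw bytes of the failing file, for example `first_mismatch(open(path, "rb").read())` with `path` pointing at the TCF file, should return the ID of the first token that does not line up, which could then be compared with the `t2084` reported in the log.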