Problem with WebLicht-generated TCF file #26

@FlorianZipser

Is this a data problem or a problem with the module?

I have the following problem when converting TCF (from WebLicht):

If I take a TCF file whose source text was entered manually in WebLicht, snp (SaltNPepper) processes the file without problems. If I use a TCF file whose source text was uploaded to WebLicht, I get the following error:

+----------------------------------- step 1 -----------------------------------+
|importer: TCFImporter |
|path: file:/C:/Users/Volker/saltnpep_Arbeit/snp%20arbeitsumgebung/ordner_in|
|corpus index: 0 |
|properties: |
| shrinkTokenAnnotations: true |
| useCommonAnnotatedElement:false |
| |
+----------------------------------- step 2 -----------------------------------+
|exporter: PAULAExporter |
|path: file:/C:/Users/Volker/saltnpep_Arbeit/snp%20arbeitsumgebung/salt_strafr|
|properties: |
| humanReadable: false |
| |
+------------------------------------------------------------------------------+

--------------------------- pepper job status ---------------------------
id: '95sdf2fn
active documents: 0 of 4
status: IN_PROGRESS
total progress: 0%
processing time: 0:00:00.328

salt:/0/ordner_in/result-1436304121111.tcf(NOT_STARTED/sleep) 0%

Bad tokenization: Full text not matching token text! Error around token t2084
Cannot map 'de.hu_berlin.german.korpling.saltnpepper.salt.saltCore.impl.SElementIdImpl@52cd0ba (namespace: graph, name: id, value: salt:/ordner_in/result-1436304121111.tcf)' with module 'TCFImporter', because of a mapping result was 'FAILED'.

An exception was thrown by the mapper threads 'Thread[TCFImporter_mapper(salt:/ordner_in/result-1436304121111.tcf),5,TCFImporter_mapperGroup]'.
de.hu_berlin.german.korpling.saltnpepper.pepper.modules.exceptions.PepperModuleDataException: A data error occured for a Pepper mapper 'de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport'. Bad tokenization: Full text not matching token text!
at de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport$TCFReader.endElement(TCFMapperImport.java:734)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperImpl.readXMLResource(PepperMapperImpl.java:295)
at de.hu_berlin.german.korpling.saltnpepper.pepperModules.tcfModules.TCFMapperImport.mapSDocument(TCFMapperImport.java:94)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:240)
at de.hu_berlin.german.korpling.saltnpepper.pepper.modules.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:185)
conversion ended successfully, required time: 0:00:01.078 s


The TCF file itself actually seems to be fine; at least it opens in WebAnno.
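One way to narrow down a "Bad tokenization: Full text not matching token text!" error outside of Pepper is to check whether the token strings in the TCF file can be found, in order, in the primary text. The following is a minimal, hypothetical Java sketch. The class `TcfTokenCheck` is purely illustrative: it is not part of the TCFImporter module and does not reproduce the importer's actual check. It rests on two assumptions. First, the primary text is the first element with local name `text`. Second, the tokens are the elements with local name `token`, each of which may carry an `ID` attribute.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper, NOT part of the TCFImporter module.
final class TcfTokenCheck {

    /**
     * Returns the index of the first token that cannot be found, in order,
     * in the full text (skipping whitespace between tokens), or -1 if all match.
     */
    static int firstMismatch(String text, List<String> tokens) {
        int pos = 0;
        for (int i = 0; i < tokens.size(); i++) {
            // Skip characters Java treats as whitespace. Note that
            // Character.isWhitespace is false for U+00A0 (no-break space).
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            String tok = tokens.get(i);
            if (!text.startsWith(tok, pos)) {
                return i; // token i does not continue the text at this position
            }
            pos += tok.length();
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File(args[0]));

        // Assumption: the first element with local name "text" holds the primary text.
        Node textNode = doc.getElementsByTagNameNS("*", "text").item(0);
        if (textNode == null) {
            System.err.println("No text element found.");
            return;
        }
        String text = textNode.getTextContent();

        // Assumption: every element with local name "token" is one token.
        NodeList tokenNodes = doc.getElementsByTagNameNS("*", "token");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < tokenNodes.getLength(); i++) {
            tokens.add(tokenNodes.item(i).getTextContent());
        }

        int bad = firstMismatch(text, tokens);
        if (bad < 0) {
            System.out.println("All " + tokens.size() + " tokens align with the text.");
        } else {
            Element el = (Element) tokenNodes.item(bad);
            System.out.println("First mismatch at token index " + bad
                    + " (ID=" + el.getAttribute("ID") + "): '" + tokens.get(bad) + "'");
        }
    }
}
```

To use it, compile the sketch with `javac` and then run `java TcfTokenCheck result-1436304121111.tcf`. The sketch skips only characters for which `Character.isWhitespace` returns true. As a result, a no-break space (U+00A0) between two tokens is reported as a mismatch.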
