Description
When the test set predictions are written to the predictions scores file (test.trg-predictions.detok.txt..scores.tsv), the prediction string and the reference string(s) are not properly formatted. If any of these strings contains an unmatched quotation mark, the TSV file will not load properly in tools (e.g., Google Sheets) that parse it. The unmatched quotation mark causes these tools to search for a matching quotation mark, skipping past the tab separators, so that multiple fields and/or lines of the TSV file are treated as part of the same string.
For instance, the Prediction string in this line of a prediction file has an unterminated double-quote:
76 14.54 42.86 16.67 10.00 6.25 1.000 39.15 41.80 39.55 25.35 0.55227655 "इमिगु तुति हिया स्यायेत याकनं च्वनी। मनूतय्त स्यायेत इमिगु तुति न्ह्यज्याः।
Importing this predictions file into Google Sheets results in the Prediction string, the Reference string, and the following two lines of additional predictions being combined into the value of the Prediction string. The unmatched quotation mark needs to be escaped when it is written to the predictions scores file.
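One possible approach is sketched below. It assumes the scores file is written from Python, which this issue does not state, and that the consuming tool accepts CSV-style quoting (the field wrapped in double quotes, with embedded quotes doubled). The function names `write_rows` and `read_rows` and the sample rows are hypothetical and are not taken from the project's code.

```python
import csv
import io


def write_rows(rows, out):
    """Write rows as TSV, quoting only the fields that need it.

    With QUOTE_MINIMAL, a field containing a quote character, a tab,
    or a newline is wrapped in double quotes, and any embedded double
    quote is doubled.
    """
    writer = csv.writer(
        out,
        delimiter="\t",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)


def read_rows(text):
    """Parse TSV text using the same quoting rules."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))


# Hypothetical rows: the first prediction starts with an unmatched quote.
rows = [
    ["76", "14.54", '"unterminated prediction', "reference one"],
    ["77", "20.00", "next prediction", "reference two"],
]

buf = io.StringIO()
write_rows(rows, buf)
tsv_text = buf.getvalue()

# Reading the file back yields the original fields, one row per line.
parsed = read_rows(tsv_text)
```

In this sketch the problematic field is written as `"""unterminated prediction"`, so a parser that follows the same quoting convention keeps it within a single cell and still sees the tab separators that follow it.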