Feat/flex line dirs #142
Needs more testing, converting to draft for now.
By the time the section was generated, the {gt,ocr}_words generators had already been drained. Fix by using a list. Fixes gh-124.
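The failure mode described in this commit message can be reproduced in isolation. The following is only a minimal sketch: `words` is a hypothetical stand-in for the real word generators, not dinglehopper's actual code. It shows that a generator yields nothing on a second pass, and that materialising it into a list avoids the problem.

```python
def words(text):
    """Yield whitespace-separated tokens lazily (hypothetical stand-in)."""
    yield from text.split()


gt_words = words("ein kleiner Test")

first_pass = list(gt_words)   # consumes the generator
second_pass = list(gt_words)  # generator is already exhausted: empty

# Materialising once into a list lets several consumers iterate
# over the same words without draining anything.
gt_words_list = list(words("ein kleiner Test"))
word_count = len(gt_words_list)
report_section = " ".join(gt_words_list)
```

The trade-off is memory: a list holds all words at once, which is presumably acceptable for page-sized texts.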
9405df7 to a70260c
I've added a checklist above to go through the various CLIs and test them. Because this also adds support for specifying a plain text encoding, I've added that to the checklist as well.
Ha,
Fixed in 14a4bc5.
In a manual test, ocrd-dinglehopper also correctly warns about autodetecting the plain text encoding and offers the option to give an explicit encoding.
I don't see how to put the information about the plain text encoding into the METS file; that could be an improvement over this. Maybe @bertsky has an idea? (I see comparing to plain-text GT as useful in some cases, e.g. when working with corpora where only the text is available but no PAGE/ALTO.)
The help text of
I've added a test for plain text files with BOM.
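To illustrate what BOM handling with an explicit-encoding override can look like, here is a minimal sketch using only the Python standard library. The function name `decode_plain_text` and its `encoding` parameter are hypothetical, and this is not necessarily how dinglehopper detects the encoding.

```python
import codecs
import warnings
from typing import Optional

# Longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_plain_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode plain text bytes (hypothetical helper for illustration)."""
    if encoding is not None:
        # An explicit encoding always wins over autodetection.
        return data.decode(encoding)
    warnings.warn("No encoding given, autodetecting the plain text encoding")
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            # These codecs consume the BOM, so it does not end up in the text.
            return data.decode(codec_name)
    # Assumption of this sketch: fall back to UTF-8 when there is no BOM.
    return data.decode("utf-8")
```

Checking the longer UTF-32 BOMs before the UTF-16 ones matters, because otherwise a UTF-32 LE file would be misread as UTF-16 LE.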
This adds more flexibility w.r.t. evaluating directories of line texts.
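As a rough illustration of evaluating directories of line texts, the sketch below pairs ground truth and OCR line files that share a name stem. The function `pair_line_files`, its parameters, and the default suffixes are assumptions made for this example; they do not describe the actual dinglehopper-line-dirs interface.

```python
from pathlib import Path


def pair_line_files(gt_dir, ocr_dir, gt_suffix=".gt.txt", ocr_suffix=".ocr.txt"):
    """Return sorted (gt_path, ocr_path) pairs that share a relative name stem.

    Hypothetical helper: names and default suffixes are illustrative only.
    """
    gt_dir, ocr_dir = Path(gt_dir), Path(ocr_dir)
    pairs = []
    for gt_path in sorted(gt_dir.rglob("*" + gt_suffix)):
        # Strip the GT suffix from the relative path to get the shared stem.
        rel = gt_path.relative_to(gt_dir).as_posix()
        stem = rel[: -len(gt_suffix)]
        ocr_path = ocr_dir / (stem + ocr_suffix)
        if ocr_path.is_file():
            pairs.append((gt_path, ocr_path))
    return pairs
```

In this sketch, GT files without an OCR counterpart are silently skipped; a real tool might prefer to report them.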
Test dinglehopper
Test dinglehopper-line-dirs
Test dinglehopper-extract
Test dinglehopper-summarize
Test ocrd-dinglehopper
Update docs w.r.t. this feature
dinglehopper-line-dirs --help
README.md
Review Unexpected UTF-8 problems #123