Wrapper for docTR end-to-end text detection and recognition.
The wrapper takes a VideoDocument with TimeFrame annotations that carry a label property (for example, from the SWT app, which classifies scenes).
See input section of the app metadata for more details.
From the docTR documentation:
The docTR model returns a Document object.
Here is the typical Document layout:
Document(
  (pages): [Page(
    dimensions=(340, 600)
    (blocks): [Block(
      (lines): [Line(
        (words): [
          Word(value='No.', confidence=0.91),
          Word(value='RECEIPT', confidence=0.99),
          Word(value='DATE', confidence=0.96),
        ]
      )]
      (artefacts): []
    )]
  )]
)
The docTR wrapper preserves this structured information in the output MMIF by creating
LAPPS Paragraph, Sentence, and Token annotations, which correspond to the Block, Line, and Word objects in the docTR output.
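The Block/Line/Word to Paragraph/Sentence/Token correspondence can be illustrated with a minimal sketch. This is not the app's actual code and does not produce real MMIF: it assumes a nested dict shaped like the layout above (roughly what docTR's `Document.export()` is expected to return), and the `flatten_to_annotations` helper and its flat record format (`id`, `type`, `text`, `parent`) are hypothetical names chosen for illustration only.

```python
from typing import Any, Dict, List

# Hypothetical sample mirroring the Document layout shown above.
# The dict shape is an assumption, not a guaranteed docTR export format.
sample_export: Dict[str, Any] = {
    "pages": [{
        "dimensions": (340, 600),
        "blocks": [{
            "lines": [{
                "words": [
                    {"value": "No.", "confidence": 0.91},
                    {"value": "RECEIPT", "confidence": 0.99},
                    {"value": "DATE", "confidence": 0.96},
                ],
            }],
            "artefacts": [],
        }],
    }],
}


def flatten_to_annotations(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map Block -> Paragraph, Line -> Sentence, Word -> Token.

    Returns plain dict records (a stand-in for MMIF annotations),
    each pointing at its parent so the hierarchy is preserved.
    """
    annotations: List[Dict[str, Any]] = []
    counters = {"Paragraph": 0, "Sentence": 0, "Token": 0}

    def add(kind: str, text: str, parent: Any) -> str:
        counters[kind] += 1
        ann_id = f"{kind[0].lower()}{counters[kind]}"  # p1, s1, t1, ...
        annotations.append(
            {"id": ann_id, "type": kind, "text": text, "parent": parent}
        )
        return ann_id

    for page in doc["pages"]:
        for block in page["blocks"]:
            # A line's text is its words joined by spaces;
            # a block's text is its lines joined by newlines.
            line_texts = [
                " ".join(word["value"] for word in line["words"])
                for line in block["lines"]
            ]
            p_id = add("Paragraph", "\n".join(line_texts), None)
            for line, line_text in zip(block["lines"], line_texts):
                s_id = add("Sentence", line_text, p_id)
                for word in line["words"]:
                    add("Token", word["value"], s_id)
    return annotations


annotations = flatten_to_annotations(sample_export)
for ann in annotations:
    print(ann["id"], ann["type"], repr(ann["text"]), "parent:", ann["parent"])
```

For the sample above, the sketch yields one Paragraph, one Sentence, and three Token records, with each Token pointing at its Sentence and the Sentence pointing at its Paragraph.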
General user instructions for CLAMS apps are available in the CLAMS Apps documentation.
Below is a list of additional information specific to this app.
- Requires mmif-python[cv] for the VideoDocument helper functions
- Requires GPU to run at a reasonable speed
For the full list of parameters, please refer to the app metadata from the CLAMS App Directory or the metadata.py file in this repository.