Decrease reliance on non-Python APIs #2

The OCR step could be streamlined somewhat by calling Tesseract through Python bindings such as [tesserocr](https://github.com/sirfz/tesserocr) or [pyocr](https://gitlab.gnome.org/World/OpenPaperwork/pyocr) instead of going through shell scripts.
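As a rough sketch of what that could look like with tesserocr: the helper below runs OCR on a list of page images directly from Python. The name `ocr_pages` and the injectable `ocr` parameter are hypothetical and not part of this repo; `tesserocr.file_to_text` is the library's own helper, and it needs a working Tesseract install with language data.

```python
from typing import Callable, Iterable, List, Optional


def ocr_pages(image_paths: Iterable[str],
              ocr: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return the recognized text for each page image, in input order.

    `ocr` is a hypothetical hook so the OCR backend can be swapped
    (for example, for pyocr or a stub in tests).
    """
    if ocr is None:
        # Imported lazily so this module still loads where tesserocr
        # is not installed.
        import tesserocr
        ocr = tesserocr.file_to_text
    return [ocr(path).strip() for path in image_paths]
```

Something like `ocr_pages(["page-1.png", "page-2.png"])` would then stand in for the per-page `tesseract` calls, assuming the page images already exist.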

Additionally, it would be great if there were a way to extract entities from a PDF without needing to run `preprocess.sh` to convert each page to an image and run tesseract on it.
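One possible direction, sketched under assumptions: many PDFs already carry an embedded text layer, so the text can be read directly and OCR kept only for pages that turn out to be scans. This uses pypdf, which is not something the repo currently depends on; the function names and the `min_chars` threshold are made up for illustration.

```python
from typing import List


def extract_text_layer(pdf_path: str) -> List[str]:
    """Return the embedded text of each page (empty string if none)."""
    # Third-party dependency, imported lazily: pip install pypdf
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages with too little embedded text.

    `min_chars` is an arbitrary illustrative cutoff, not a tuned value.
    """
    return [i for i, text in enumerate(page_texts)
            if len(text.strip()) < min_chars]
```

Pages flagged by `pages_needing_ocr` would still need the image and Tesseract route; everything else could go straight to entity extraction.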

Ghostscript may be an option: https://stackoverflow.com/a/36113000/1956065
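A minimal sketch of one way Ghostscript could be driven from Python, which may or may not match what the linked answer describes: Ghostscript's `txtwrite` device dumps a PDF's text to stdout, so a single `subprocess` call can replace a shell script. It still relies on the external `gs` binary being on the PATH, and the function names here are hypothetical.

```python
import subprocess
from typing import List


def gs_text_command(pdf_path: str, gs_binary: str = "gs") -> List[str]:
    """Build a Ghostscript command that writes the PDF's text to stdout."""
    return [
        gs_binary,
        "-q",                 # suppress startup banner
        "-dNOPAUSE",          # do not prompt between pages
        "-dBATCH",            # exit after the last file
        "-sDEVICE=txtwrite",  # text extraction output device
        "-sOutputFile=-",     # "-" sends output to stdout
        pdf_path,
    ]


def gs_extract_text(pdf_path: str, gs_binary: str = "gs") -> str:
    """Run Ghostscript and return the extracted text.

    Raises CalledProcessError if gs exits with a non-zero status.
    """
    result = subprocess.run(
        gs_text_command(pdf_path, gs_binary),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Like the text-layer approach, this only helps for PDFs that contain real text; scanned pages would still need OCR.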
