
ImageMagick and Tesseract failures #4

@Penquincoder

Just set this up and ran into some bugs while trying to import PDF files:

System: CentOS 7 3.10.0-1062.9.1.el7.x86_64
Method: Download release goEDMS_0.1.8_Linux_x86_64.tar.gz
SELinux status (sestatus): permissive

Modified serverConfig.toml to point to the correct paths for convert and tesseract:

$ which tesseract
/bin/tesseract

$ which convert
/bin/convert

[ingress]
    IngressPath = 'staging'

[ocr]
    TesseractBin = "/bin/tesseract"
    MagickBin = "/bin/convert"   

Copied existing PDFs to /opt/goEDMS/staging and received the following errors in goedms.log for every PDF to be ingested:

{"level":"info","time":"2020-02-02T23:02:09-06:00","message":"Converting PDF To image for OCR/opt/goEDMS/staging/bill.pdf"}
{"level":"info","time":"2020-02-02T23:02:09-06:00","message":"Creating temp image for OCR at: /opt/goEDMS/temp/bill.png"}
{"level":"error","time":"2020-02-02T23:02:09-06:00","message":"Unable to convert PDF Using Magick: /opt/goEDMS/staging/bill.pdfexit status 1"}
{"level":"error","time":"2020-02-02T23:02:09-06:00","message":"OCR Processing failed on file: /opt/goEDMS/staging/bill.pdf: exit status 1"}  

No documents appear in the web-gui.
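
The log only records `exit status 1`, so the actual ImageMagick error message is missing. One way to recover it might be to rerun the conversion by hand and capture stderr. The helper below is a minimal sketch: `run_and_report` is a hypothetical name, and the exact arguments goEDMS passes to `convert` are not shown in the log, so a plain input/output invocation is assumed.

```shell
# run_and_report (hypothetical helper): run the given command, discard its
# stdout, and print the exit status together with whatever it wrote to stderr.
run_and_report() {
    # 2>&1 first points stderr at the capture, then stdout goes to /dev/null,
    # so only stderr ends up in rr_err.
    if rr_err=$("$@" 2>&1 >/dev/null); then
        rr_status=0
    else
        rr_status=$?
    fi
    printf 'exit status: %d\n' "$rr_status"
    if [ -n "$rr_err" ]; then
        printf 'stderr: %s\n' "$rr_err"
    fi
    return "$rr_status"
}
```

Assuming the same paths as in the log, it could be called as `run_and_report /bin/convert /opt/goEDMS/staging/bill.pdf /opt/goEDMS/temp/bill.png`, ideally as the same user that runs goEDMS so that permissions match.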

Labels

bug: Something isn't working
