2 files changed, +7 -2 lines changed
+ 1.1.2
+ -----
+ * Fix issue with logging while forcing OCR on PDF documents
+
  1.1.1
  -----

  * Update to tika 1.23
  * Add dockerhub image and update documentation on its use: https://hub.docker.com/r/gradiant/faro
  * Fix #32: logging duplicates
- * Fix #37 : fixing metadata when a list is extracted in some fields (dates and pages)
+ * Fix #37 : fixing metadata when a list is extracted in some fields (dates and pages)

  1.1.0
  -----
@@ -72,6 +72,7 @@ def parse_file(file_path):
          force_ocr = True
      else:
          filesize_chars_ratio = filesize / chars
+         logger.debug("PDF filesize_chars_ratio: {:.2f}".format(filesize_chars_ratio))
          if filesize_chars_ratio > pdf_ocr_ratio:
              force_ocr = True
              logger.debug('size: {}, chars: {}, ratio: {}'.format(
@@ -80,8 +81,8 @@ def parse_file(file_path):
                  filesize_chars_ratio))

      if force_ocr:
+         logger.info("performing OCR on PDF file: {}".format(file_path))
          parsed['metadata']['ocr_parsing'] = True
-         logger.info("PDF filesize_chars_ratio: {:.2f}...performing OCR".format(filesize_chars_ratio))
          parsed_ocr_text = parser.from_file(
              file_path,
              service='text',
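
Reviewer note: the OCR decision touched by this change can be sketched in isolation as below. This is a minimal sketch under stated assumptions, not the project's code. `should_force_ocr` is a hypothetical helper; the zero-character branch is an assumption, since the diff does not show the condition that precedes `else`; and the threshold used in the usage lines is illustrative only.

```python
import logging

logger = logging.getLogger(__name__)


def should_force_ocr(filesize, chars, pdf_ocr_ratio):
    """Decide whether a PDF should be re-parsed with OCR.

    Hypothetical helper: a large file that yields few extracted
    characters is treated as likely scanned or image-based.
    """
    if chars == 0:
        # Assumption: no extractable text means OCR is needed.
        # This also avoids dividing by zero below.
        return True
    filesize_chars_ratio = filesize / chars
    # Log the ratio at debug level whether or not OCR is triggered,
    # so the value never has to be formatted in the OCR branch.
    logger.debug("PDF filesize_chars_ratio: {:.2f}".format(filesize_chars_ratio))
    return filesize_chars_ratio > pdf_ocr_ratio


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Illustrative threshold; the real value comes from configuration.
    print(should_force_ocr(filesize=1_000_000, chars=100, pdf_ocr_ratio=150))
    print(should_force_ocr(filesize=1_000, chars=100, pdf_ocr_ratio=150))
```

Logging the ratio where it is computed means the message in the `if force_ocr:` branch only needs `file_path`, which is always defined, whereas the ratio is only assigned in the `else` branch.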