
Commit caa3eba

Merge pull request #43 from Gradiant/release/v1.1.2_hotfix
fix logging issues while ocring raster pdf files
2 parents: 78c84c6 + 598facc


2 files changed (+7, -2 lines)


CHANGELOG

Lines changed: 5 additions & 1 deletion
@@ -1,10 +1,14 @@
+1.1.2
+-----
+* Fix issue with logging while forcing OCR on PDF documents
+
 1.1.1
 -----
 
 * Update to tika 1.23
 * Add dockerhub image and update documentation on its use: https://hub.docker.com/r/gradiant/faro
 * Fix #32: logging duplicates
-* Fix #37 : fixing metadata when a list is extracted in some fields (dates and pages)
+* Fix #37 : fixing metadata when a list is extracted in some fields (dates and pages)
 
 1.1.0
 -----

faro/io_parser.py

Lines changed: 2 additions & 1 deletion
@@ -72,6 +72,7 @@ def parse_file(file_path):
         force_ocr = True
     else:
         filesize_chars_ratio = filesize / chars
+        logger.debug("PDF filesize_chars_ratio: {:.2f}".format(filesize_chars_ratio))
         if filesize_chars_ratio > pdf_ocr_ratio:
             force_ocr = True
         logger.debug('size: {}, chars: {}, ratio: {}'.format(
@@ -80,8 +81,8 @@ def parse_file(file_path):
             filesize_chars_ratio))
 
     if force_ocr:
+        logger.info("performing OCR on PDF file: {}".format(file_path))
         parsed['metadata']['ocr_parsing'] = True
-        logger.info("PDF filesize_chars_ratio: {:.2f}...performing OCR".format(filesize_chars_ratio))
         parsed_ocr_text = parser.from_file(
             file_path,
             service='text',
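A hedged reading of why this hotfix is needed: the first hunk opens on a branch that sets `force_ocr = True` without ever computing `filesize_chars_ratio` (its condition is not visible in the diff; the commit title points at raster PDFs with no extractable text, so `chars == 0` is assumed here). The removed `logger.info` line read `filesize_chars_ratio` on every forced-OCR path, so on that branch it would reference an unbound local. The sketch below reproduces the pattern with hypothetical `decide_ocr_old` / `decide_ocr_fixed` helpers; only `filesize`, `chars`, `pdf_ocr_ratio`, `file_path` and the log strings come from the diff, and the threshold value 15.0 and the sizes are made up for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def decide_ocr_old(filesize, chars, pdf_ocr_ratio):
    """Pre-hotfix pattern (hypothetical reconstruction): the INFO log
    reads filesize_chars_ratio, which is never bound when chars == 0."""
    force_ocr = False
    if chars == 0:
        # Assumed condition: no extractable text, e.g. a raster-only PDF.
        force_ocr = True
    else:
        filesize_chars_ratio = filesize / chars
        if filesize_chars_ratio > pdf_ocr_ratio:
            force_ocr = True
    if force_ocr:
        # On the chars == 0 path this raises UnboundLocalError.
        logger.info("PDF filesize_chars_ratio: {:.2f}...performing OCR".format(
            filesize_chars_ratio))
    return force_ocr


def decide_ocr_fixed(filesize, chars, pdf_ocr_ratio, file_path="doc.pdf"):
    """Post-hotfix pattern: the ratio is logged only inside the branch that
    computes it, and the OCR message uses values bound on every path."""
    force_ocr = False
    if chars == 0:
        force_ocr = True
    else:
        filesize_chars_ratio = filesize / chars
        logger.debug("PDF filesize_chars_ratio: {:.2f}".format(filesize_chars_ratio))
        if filesize_chars_ratio > pdf_ocr_ratio:
            force_ocr = True
    if force_ocr:
        logger.info("performing OCR on PDF file: {}".format(file_path))
    return force_ocr


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    try:
        decide_ocr_old(filesize=120_000, chars=0, pdf_ocr_ratio=15.0)
    except UnboundLocalError as exc:
        print("old pattern fails:", exc)
    # Raster-like input: forced OCR, no crash.
    print(decide_ocr_fixed(filesize=120_000, chars=0, pdf_ocr_ratio=15.0))
    # Ratio 3.0 is below the assumed threshold, so no OCR.
    print(decide_ocr_fixed(filesize=120_000, chars=40_000, pdf_ocr_ratio=15.0))
```

Note that the `.format(...)` argument is evaluated before `logger.info` checks the log level, so the old line would fail even with INFO disabled. Switching to lazy `%`-style logging arguments would not have avoided it either, because the argument expression is still evaluated at the call site; the fix has to keep the variable out of the shared branch, as the diff does.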
