OCR issue: skipping text? #18

@hzadeh17

I was looking further into Bcc's and found that quite a few emails in the drive (particularly in deqs 1, 2, and 4) include Bcc: in the email header. In many instances, though, the Bcc: line is not reflected in the OCR text. Perhaps more importantly, in the email output format where Bcc's do show up, OCR skips over entire email bodies.

For example: this text file has the headers but not the bodies of the emails included in deq01_Part316 in the drive. The same goes for others like it, such as deq01_Part385 and this file.

(Note that as far as Bcc: goes, it does sometimes show up, as in this text file.)

Maybe this is not something we can fix, but it is still probably good to know. I wonder if there is a way to check how often this is happening? The body text that is being skipped is light blue, so maybe that is why, but that doesn't explain why some of the header text that was black (i.e. Bcc:) is also being skipped.
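One rough way to estimate how often this happens is to scan the OCR text files for email header blocks that have little or no text after them. The following is only a sketch under assumptions: it assumes each email in the OCR output starts with a line beginning with `From:`, that header lines look like `To:`, `Bcc:`, `Subject:` and so on, and that a body with fewer than five words counts as "missing". The function names and the threshold are hypothetical and would need tuning against the real files.

```python
import re

# Assumed header field names; adjust to whatever the OCR output actually contains.
HEADER_RE = re.compile(r"^(From|To|Cc|Bcc|Sent|Date|Subject):", re.IGNORECASE)


def split_emails(text):
    """Split OCR text into per-email chunks, starting a new chunk at each 'From:' line."""
    chunks, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.lower().startswith("from:") and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


def body_word_count(chunk):
    """Count words on lines that are not recognized header lines."""
    return sum(len(line.split()) for line in chunk if not HEADER_RE.match(line))


def summarize(text, min_body_words=5):
    """Count emails, emails with (almost) no body text, and emails with a Bcc: line."""
    # Ignore chunks with no header lines at all (e.g. text before the first email).
    chunks = [c for c in split_emails(text) if any(HEADER_RE.match(l) for l in c)]
    missing_body = sum(1 for c in chunks if body_word_count(c) < min_body_words)
    with_bcc = sum(1 for c in chunks if any(l.lower().startswith("bcc:") for l in c))
    return {"emails": len(chunks), "missing_body": missing_body, "with_bcc": with_bcc}
```

Running `summarize` on the contents of each OCR text file and comparing `missing_body` to `emails` would give a first estimate per file. The heuristic is crude: wrapped header lines (for example a long recipient list) are counted as body words, and quoted replies that repeat `From:` are treated as separate emails, so the numbers are a starting point for manual spot checks rather than an exact count.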
