Extremely slow parsing for a specific email (2 minutes slow, I can share EML privately) #389

@valeriansaliou

Hello Andris,

We've been happy users of mailparser for about 8 years now (time flies!). We run a large-scale production email ingestion system on it.

Two days ago, we started experiencing recurring event loop blockages lasting from 2 to 6 minutes on each microservice running mailparser. We've managed to extract the email that causes this issue, in EML format.

The email has an enormous text body, about 2 MB, encoded in this part:

Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8;
 format=flowed

The email does eventually get parsed by the library, but it takes a tremendous amount of time.

Could I suggest adding some configurable, opt-in safety limits on the size of the text/HTML parts that get parsed?

Unfortunately, I cannot enforce those limits on the total size of the raw mail Buffer before passing it to the library, since most emails would be megabytes in size due to attachments alone. I believe those limits will have to be enforced within the mailparser library itself.
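
A minimal sketch of what such an opt-in limit could look like, written in TypeScript. The names `applyTextLimit`, `maxTextLength`, and `onOverflow` are hypothetical and are not part of mailparser's API; the sketch only illustrates capping a decoded text or HTML part before any expensive processing, while leaving attachments untouched. The limit is counted in string length (UTF-16 code units), not in bytes.

```typescript
// Hypothetical sketch: none of these names exist in mailparser.

type OverflowPolicy = "truncate" | "skip";

interface TextLimitOptions {
  // Maximum length (in UTF-16 code units) of a text or HTML part to process.
  // Leaving it undefined means "no limit", which keeps the guard opt-in.
  maxTextLength?: number;
  // What to do with a part that exceeds the limit (defaults to "truncate").
  onOverflow?: OverflowPolicy;
}

interface LimitedText {
  text: string; // content to hand over to the expensive processing step
  truncated: boolean; // true when content was cut or dropped
  originalLength: number; // length of the part before the limit was applied
}

function applyTextLimit(
  text: string,
  options: TextLimitOptions = {}
): LimitedText {
  const { maxTextLength, onOverflow = "truncate" } = options;
  const originalLength = text.length;

  // No limit configured, or the part is small enough: pass it through as-is.
  if (maxTextLength === undefined || originalLength <= maxTextLength) {
    return { text, truncated: false, originalLength };
  }

  // Drop the oversized part entirely.
  if (onOverflow === "skip") {
    return { text: "", truncated: true, originalLength };
  }

  // Truncate, taking care not to cut a surrogate pair in half.
  let end = maxTextLength;
  const last = text.charCodeAt(end - 1);
  if (last >= 0xd800 && last <= 0xdbff) {
    end -= 1;
  }
  return { text: text.slice(0, end), truncated: true, originalLength };
}
```

With something along these lines, a caller could pass for example `{ maxTextLength: 512 * 1024 }` and check the `truncated` flag to log or flag oversized emails, while emails below the limit would be handled exactly as before.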

Let me know if I should send you the original EML causing this hang over email for reproduction (I cannot share it here since it comes from one of our users, and is thus a private email).

For more context, parsing this EML has been measured to take:

  • 15 seconds on one Apple M1 Pro core
  • 2 minutes on one loaded Intel Xeon core (circa 2019)

Valerian.
