@ryangsteele ryangsteele commented Aug 11, 2025

There are two changes in this MR:

1. Fix improper normalization for addresses with multi-segment TLDs whose domains use Fastmail MXs

The library currently assumes that if the LOCAL_PART_AS_HOSTNAME flag is set for a provider (as is the case with Fastmail) and the domain of the email address being normalized has more than two segments, then the last two segments are the domain. This logic works when the TLD is a single segment, like .com or .net, but fails with multi-segment TLDs like .com.au and .co.uk.

For example, if the email address being normalized is foo+bar@baz.co.uk, and baz.co.uk's MX records indicate it's using Fastmail as a mail provider (e.g., in1-smtp.messagingengine.com.), then the email address is incorrectly normalized as baz@co.uk instead of foo@baz.co.uk.

To facilitate this change, I've made use of the tldextract library to correctly determine the TLD and avoid making assumptions based strictly upon the number of segments. Given that psl was recently archived, tldextract appeared to be the best available choice for this purpose.
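
To make the failure mode concrete, here is a minimal, self-contained sketch of the two splitting strategies. It is not the library's code: the function names are hypothetical, the tiny hard-coded suffix set is only a stand-in for the Public Suffix List data that tldextract provides, and the rule that a subdomain replaces the local part is a simplified model of the LOCAL_PART_AS_HOSTNAME behavior described above.

```python
# Hypothetical stand-in for the Public Suffix List; the MR itself relies on
# tldextract rather than a hard-coded set like this one.
MULTI_SEGMENT_SUFFIXES = {"co.uk", "com.au"}


def split_naive(domain):
    """Segment-count logic: treat the last two labels as the domain."""
    parts = domain.split(".")
    if len(parts) > 2:
        return ".".join(parts[:-2]), ".".join(parts[-2:])
    return "", domain


def split_suffix_aware(domain):
    """Work out the suffix length first, then keep one label in front of it."""
    parts = domain.split(".")
    suffix_len = 2 if ".".join(parts[-2:]) in MULTI_SEGMENT_SUFFIXES else 1
    subdomain = parts[:-(suffix_len + 1)]
    registered = parts[-(suffix_len + 1):]
    return ".".join(subdomain), ".".join(registered)


def normalize(address, split):
    """Simplified model: strip the +tag, then let a subdomain replace the local part."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0]
    subdomain, registered = split(domain)
    if subdomain:
        local = subdomain
    return f"{local}@{registered}"


print(normalize("foo+bar@baz.co.uk", split_naive))         # baz@co.uk
print(normalize("foo+bar@baz.co.uk", split_suffix_aware))  # foo@baz.co.uk
```

Under the same assumptions, an address such as alias@user.fastmail.com still normalizes to user@fastmail.com with the suffix-aware split, so the existing single-segment TLD behavior is preserved.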

2. Fix cache entries expiring immediately when aiodns returns a TTL of -1 for an MX record lookup

aiodns was returning an invalid (as per RFC 1035) TTL value of -1 for Gmail's MX records, causing cache entries to be considered immediately expired. This meant that on the second call to mx_records('gmail.com'), the cache entry was deleted and recreated instead of being reused, resetting the hit counter to 1. In such situations the tests would fail:

======================================================================
FAIL: test_cache (test_normalizer.NormalizerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ryans/git/email-normalize/tests/test_normalizer.py", line 23, in wrapper
    args[0].loop.run_until_complete(func[0](*args, **kwargs))
  File "/Users/ryans/.pyenv/versions/3.9.16/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/ryans/git/email-normalize/tests/test_normalizer.py", line 80, in test_cache
    self.assertEqual(email_normalize.cache['gmail.com'].hits, 2)
AssertionError: 1 != 2

The change I implemented here is to filter out invalid TTL values (< 0) and fall back to failure_ttl when no valid TTL values are available.
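
A minimal sketch of that filtering, for illustration only: the helper name cache_ttl is hypothetical, the default of 300 seconds for failure_ttl is an assumed value, and choosing the smallest valid TTL is an assumption about how the per-record TTLs are combined rather than something taken from the library.

```python
def cache_ttl(record_ttls, failure_ttl=300):
    """Pick a cache TTL from MX record TTLs, ignoring invalid (negative) values.

    failure_ttl=300 is an assumed default used here only for illustration.
    """
    # Drop invalid values such as the -1 reported for Gmail's MX records.
    valid = [ttl for ttl in record_ttls if ttl >= 0]
    # Assumption: use the smallest valid TTL so the entry never outlives a record.
    return min(valid) if valid else failure_ttl


print(cache_ttl([-1, -1]))       # 300, falls back to failure_ttl
print(cache_ttl([-1, 60, 120]))  # 60
```

With a fallback in place, the entry created by the first lookup is still live on the second call, so the hit counter can reach 2 as test_cache expects.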

@gmr
Owner

gmr commented Aug 11, 2025

Few thoughts:

  1. I'm not keen on adding another 3rd party dependency. aiodns is there because it's needed to do asyncio DNS resolution.
  2. I'm not sure what this is supposed to do or why it's a problem with this library.

@ryangsteele
Author

Apologies, this MR absolutely needed some more context to describe the problem and proposed solution, as well as docs and a version bump.

I've since added all of these things; mind taking another gander and letting me know if the proposed change seems well-reasoned and appropriate, @gmr?

@ryangsteele ryangsteele force-pushed the fix-multisegment-tld-normalization-fastmail branch 2 times, most recently from 34ef4a4 to adff0cf Compare August 14, 2025 16:20
aiodns was returning TTL values of -1 for Gmail's MX records,
causing cache entries to be considered immediately expired. This
meant that on the second call to mx_records('gmail.com'), the
cache entry was deleted and recreated instead of being reused,
resetting the hit counter to 1. The fix filters out invalid TTL
values (<= 0) and falls back to failure_ttl when no valid TTL
values are available, ensuring cache entries have reasonable
expiration times.
@ryangsteele ryangsteele force-pushed the fix-multisegment-tld-normalization-fastmail branch from adff0cf to e276b12 Compare August 14, 2025 17:31
