Simple IMAP-based tool to find duplicate emails between two Gmail accounts and label them.
Note: Works with Advanced Protection accounts! Uses app-specific passwords and IMAP.
✅ Works with Advanced Protection accounts
✅ Fast - only fetches email headers (Message-IDs)
✅ Safe - dry-run mode by default
✅ Simple - single Python file, no external dependencies
✅ Direct labeling via IMAP
- Connects to Account A via IMAP
- Fetches all Message-IDs (unique email identifiers) - headers only, not full emails
- Connects to Account B via IMAP
- Searches for emails with matching Message-IDs
- Applies "duplicate" label to matches in Account B
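The header-only step can be sketched with the standard library alone. This is a minimal illustration, not the code in dedupe.py; the helper names extract_message_id and collect_message_ids are hypothetical, and it assumes each fetched item is the raw header block as bytes.

```python
from email import message_from_bytes


def extract_message_id(header_bytes):
    """Return the Message-ID from a raw header block, or None if absent."""
    msg = message_from_bytes(header_bytes)
    value = msg.get("Message-ID")
    return value.strip() if value else None


def collect_message_ids(header_blocks):
    """Build a set of Message-IDs, skipping messages that have none."""
    ids = set()
    for block in header_blocks:
        message_id = extract_message_id(block)
        if message_id:
            ids.add(message_id)
    return ids


if __name__ == "__main__":
    raw = b"Message-ID: <abc123@mail.example.com>\r\nSubject: Hi\r\n\r\n"
    print(collect_message_ids([raw]))
```

Collecting into a set removes repeats within Account A, which is why the tool can report a count of unique Message-IDs.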
For both Gmail accounts:
- Go to https://myaccount.google.com/security
- Enable 2-Step Verification (if not already enabled)
- Scroll to "App passwords" section
- Generate password for "Mail" app
- Copy the 16-character password (ignore spaces)
Note: App passwords work even with Advanced Protection enabled!
cd gmailDedupe
cp .env.example .env
Edit .env and add your credentials:
ACCOUNT_A_EMAIL=accounta@gmail.com
ACCOUNT_A_PASSWORD=abcdefghijklmnop
ACCOUNT_B_EMAIL=accountb@gmail.com
ACCOUNT_B_PASSWORD=qrstuvwxyzabcdef
Important: Never commit .env to git (it's in .gitignore)
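How dedupe.py reads .env is not shown here. As a rough sketch, a file of simple KEY=VALUE lines can be parsed without any third-party package. The names parse_env and normalize_app_password are hypothetical, and the sketch assumes no multi-line values or variable expansion.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def normalize_app_password(password):
    """Google displays app passwords in groups of four; drop the spaces."""
    return password.replace(" ", "")


if __name__ == "__main__":
    sample = "ACCOUNT_A_EMAIL=accounta@gmail.com\nACCOUNT_A_PASSWORD=abcd efgh ijkl mnop\n"
    env = parse_env(sample)
    print(env["ACCOUNT_A_EMAIL"], normalize_app_password(env["ACCOUNT_A_PASSWORD"]))
```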
In Gmail settings for both accounts:
- Settings → Forwarding and POP/IMAP
- Enable IMAP access
First, run in dry-run mode to see what would be labeled:
python dedupe.py
Output:
============================================================
Gmail Deduplication via IMAP
============================================================
Mode: 🔍 DRY RUN
Max emails: All
Label: 'duplicate'
Step 1: Connect to Account A
Connecting to accounta@gmail.com...
✅ Connected successfully
Step 2: Fetch Message-IDs from Account A
Found 2,847 emails
Fetching Message-IDs (headers only)...
✅ Collected 2,847 unique Message-IDs
Step 3: Connect to Account B
Connecting to accountb@gmail.com...
✅ Connected successfully
Step 4: Find and label duplicates in Account B
Searching for 2,847 Message-IDs in Account B...
✅ Search complete: 234 duplicates found
📄 Report saved to: duplicates_report.txt
============================================================
Summary
============================================================
Account A emails: 2,847
Duplicates found in Account B: 234
⚠️ DRY RUN MODE - No labels were applied
📄 Detailed report saved to: duplicates_report.txt
To apply labels:
1. Review duplicates_report.txt (if generated)
2. Edit dedupe.py
3. Change: DRY_RUN = False
4. Run: python dedupe.py
Dry Run Report File
The dry run generates duplicates_report.txt with details of each duplicate:
Duplicate #1
Message-ID: <abc123@mail.example.com>
Subject: Your receipt from Example Store
From: noreply@example.com
Date: Mon, 15 Jan 2024 10:23:45 -0800
UID: 12345
Duplicate #2
Message-ID: <xyz789@newsletter.example.com>
Subject: Weekly Newsletter - Jan 2024
From: newsletter@example.com
Date: Tue, 16 Jan 2024 08:00:00 -0800
UID: 12346
...
After verifying the dry run results:
- Edit dedupe.py
- Change DRY_RUN = True to DRY_RUN = False
- Run again:
python dedupe.py
Edit these variables at the top of dedupe.py:
DRY_RUN = True # Set to False to apply labels
MAX_EMAILS = None # Limit emails processed (None = all)
DUPLICATE_LABEL = 'duplicate' # Label name to apply
Modify the get_message_ids() function to add a date filter:
# In get_message_ids() function, change search criteria:
_, message_numbers = imap.search(None, 'SINCE', '01-Jan-2024')
Set MAX_EMAILS to process only a subset:
MAX_EMAILS = 1000 # Process first 1000 emails
Set DUPLICATE_LABEL to use a custom label name:
DUPLICATE_LABEL = 'my-custom-label'
- Make sure you're using app-specific passwords, not your regular Gmail password
- Verify 2-Step Verification is enabled
- Check that IMAP is enabled in Gmail settings
- Check your internet connection
- Try again (Gmail IMAP can be temporarily unavailable)
- Message-IDs must match exactly
- If emails were forwarded, they may have new Message-IDs
- Check that you're searching the right accounts
- Processing is linear (one Message-ID at a time for Account B)
- For 10k+ emails, expect 30-60 minutes
- Consider using MAX_EMAILS to process in batches
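For context on the cost of the linear search: an alternative, which is not what dedupe.py does, would be to fetch Account B's Message-IDs in bulk and compare them locally instead of issuing one search per ID. The sketch below shows only the local comparison. The function name match_locally and the UID-to-Message-ID mapping are assumptions made for illustration.

```python
def match_locally(ids_a, uid_to_message_id_b):
    """Return Account B UIDs whose Message-ID also appears in Account A.

    ids_a: iterable of Message-ID strings from Account A.
    uid_to_message_id_b: dict mapping an Account B UID to its Message-ID.
    """
    wanted = set(ids_a)  # set membership is O(1) on average
    return sorted(
        uid for uid, message_id in uid_to_message_id_b.items()
        if message_id in wanted
    )


if __name__ == "__main__":
    account_a = {"<abc123@mail.example.com>", "<xyz789@newsletter.example.com>"}
    account_b = {
        12345: "<abc123@mail.example.com>",
        12346: "<xyz789@newsletter.example.com>",
        12347: "<other@example.com>",
    }
    print(match_locally(account_a, account_b))
```

This trades many small server-side searches for one larger header download from Account B, so whether it is faster depends on mailbox size.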
- Uses Python's built-in imaplib and email libraries
- Fetches only headers (RFC822.HEADER), not full email bodies
- Searches using IMAP's SEARCH command with the HEADER Message-ID criterion
- Labels via Gmail's IMAP extension (X-GM-LABELS)
- No external dependencies required
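The search and label steps listed above could look roughly like this with imaplib. It is a sketch under assumptions, not the code in dedupe.py: the function names are hypothetical, imap is assumed to be an imaplib.IMAP4_SSL connection that is already logged in with a mailbox selected, and the Message-ID is assumed to contain no double quotes or backslashes.

```python
def find_uids_by_message_id(imap, message_id):
    """Return UIDs (as strings) of messages whose Message-ID header matches."""
    status, data = imap.uid(
        "SEARCH", None, "HEADER", "Message-ID", f'"{message_id}"'
    )
    if status != "OK" or not data or not data[0]:
        return []
    return data[0].decode().split()


def add_label(imap, uid, label, dry_run=True):
    """Add a Gmail label to one message; do nothing in dry-run mode.

    Returns True only if the STORE command was sent and succeeded.
    """
    if dry_run:
        return False
    # +X-GM-LABELS adds to the existing labels rather than replacing them.
    status, _ = imap.uid("STORE", uid, "+X-GM-LABELS", f'("{label}")')
    return status == "OK"
```

Using UID commands rather than message sequence numbers keeps the identifiers stable within the mailbox session, which matches the UID field shown in the dry-run report.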
- Only compares by Message-ID (standard email unique identifier)
- No fuzzy matching or content comparison
- Linear search (not optimized for 100k+ emails)
- Account B must support IMAP labeling (Gmail-specific)
- Credentials stored locally in .env (not committed to git)
- App passwords have limited scope (mail only)
- IMAP uses SSL/TLS encryption
- Read-only access to Account A
- Only adds labels to Account B (no deletion)
This project is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0).
You are free to:
- Share and adapt the code for personal or non-commercial use
- Give appropriate credit
You may NOT:
- Use this code for commercial purposes
See LICENSE file for full terms.