
kusl/MyImapDownloader


MyImapDownloader


A high-performance, cross-platform command-line tool for archiving emails from IMAP servers. Built with .NET 10, featuring SQLite-backed indexing, intelligent delta syncing, and robust resilience patterns.


Notice: This project contains code generated by Large Language Models such as Claude and Gemini. All code is experimental whether explicitly stated or not.



Key Features

Feature | Description
Delta Sync | Uses IMAP UIDs and SQLite indexing to fetch only messages added since the last run
Read-Only Operations | Opens IMAP folders in FolderAccess.ReadOnly mode; never modifies or deletes server data
Robust Deduplication | Message-ID based deduplication via an indexed SQLite primary-key lookup before any network fetch
Self-Healing Index | Automatically detects database corruption and rebuilds from .meta.json sidecar files
Resilience Patterns | Exponential backoff (up to 5 minutes) and circuit breaker via Polly
OpenTelemetry Native | Distributed tracing, metrics, and structured logging exported to JSONL files
Cross-Platform | Runs natively on Windows, Linux, and macOS
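Polly handles retry timing itself, so the following is only a plain C# illustration (no Polly types) of what "exponential backoff up to 5 minutes" means: each attempt doubles the wait until the cap is reached. The 1-second base delay and the `BackoffSchedule` name are assumptions, not values taken from the project.

```csharp
using System;

// Hypothetical helper illustrating capped exponential backoff.
// Only the 5-minute cap comes from the README; the 1-second base delay is assumed.
public static class BackoffSchedule
{
    public static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(1);
    public static readonly TimeSpan MaxDelay = TimeSpan.FromMinutes(5);

    // attempt is 1-based: attempt 1 waits BaseDelay, attempt 2 waits 2x, attempt 3 waits 4x, ...
    public static TimeSpan DelayFor(int attempt)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Work in double seconds so very large attempt numbers cannot overflow TimeSpan.
        double seconds = BaseDelay.TotalSeconds * Math.Pow(2, attempt - 1);
        return seconds >= MaxDelay.TotalSeconds ? MaxDelay : TimeSpan.FromSeconds(seconds);
    }
}
```

In Polly v8 the comparable knobs are `Delay`, `BackoffType`, and `MaxDelay` on `RetryStrategyOptions`; whether the project also enables jitter is not stated here.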

Safety Guarantees

This application never deletes emails. The codebase is designed purely for archival and backup:

  • IMAP folders are opened in read-only mode (FolderAccess.ReadOnly)
  • No delete, move, or flag-modification commands exist in the codebase
  • Local archives are append-only—existing .eml files are never overwritten or removed
  • No server response can trigger a deletion, because the tool has no code path that issues one

The only file deletion that occurs is cleanup of failed temporary writes during the atomic write pattern (write to tmp/, move to cur/).
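The atomic write pattern can be sketched roughly as follows. This is a minimal illustration, assuming the per-folder `tmp/` and `cur/` layout shown under Output Structure; `AtomicWriter.WriteAsync` and its parameters are hypothetical names, not the project's API.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch of the write-to-tmp/, move-to-cur/ pattern.
// Class, method, and parameter names are illustrative only.
public static class AtomicWriter
{
    public static async Task<string> WriteAsync(string folderDir, string fileName, Stream content)
    {
        string tmpDir = Path.Combine(folderDir, "tmp");
        string curDir = Path.Combine(folderDir, "cur");
        Directory.CreateDirectory(tmpDir);
        Directory.CreateDirectory(curDir);

        string tmpPath = Path.Combine(tmpDir, fileName);
        string finalPath = Path.Combine(curDir, fileName);

        // Append-only: never replace an existing archived message.
        if (File.Exists(finalPath))
            throw new IOException($"Refusing to overwrite {finalPath}");

        bool staged = false;
        try
        {
            // Stream the body into tmp/ so cur/ never holds a partially written file.
            await using (var fs = new FileStream(tmpPath, FileMode.CreateNew, FileAccess.Write))
            {
                staged = true;
                await content.CopyToAsync(fs);
                await fs.FlushAsync();
            }

            // tmp/ and cur/ are siblings, so this is a rename within one volume;
            // overwrite: false fails instead of clobbering a file that appeared meanwhile.
            File.Move(tmpPath, finalPath, overwrite: false);
            return finalPath;
        }
        catch
        {
            // The only deletion: remove our own failed temporary write, then rethrow.
            if (staged && File.Exists(tmpPath))
                File.Delete(tmpPath);
            throw;
        }
    }
}
```

Checking for an existing target before writing, and again via `overwrite: false` at move time, keeps the append-only guarantee even if two writers race on the same name.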

Installation

Prerequisites

  • .NET 10 SDK

Build from Source

git clone https://github.com/collabskus/MyImapDownloader.git
cd MyImapDownloader
dotnet build -c Release

Run

# Linux/macOS
./MyImapDownloader/bin/Release/net10.0/MyImapDownloader \
  -s imap.gmail.com -u user@gmail.com -p "app-password" -o ~/EmailArchive

# Windows
.\MyImapDownloader\bin\Release\net10.0\MyImapDownloader.exe `
  -s imap.gmail.com -u user@gmail.com -p "app-password" -o C:\EmailArchive

Usage

Command-Line Options

Option | Short | Default | Description
--server | -s | required | IMAP server address
--username | -u | required | Email account username
--password | -p | required | Account password or App Password
--port | -r | 993 | IMAP port
--output | -o | EmailArchive | Output directory for archived emails
--all-folders | -a | false | Sync all folders, not just INBOX
--start-date | (none) | (none) | Filter: download emails after this date (yyyy-MM-dd)
--end-date | (none) | (none) | Filter: download emails before this date (yyyy-MM-dd)
--verbose | -v | false | Enable verbose/debug logging

Examples

# Download INBOX only
dotnet run --project MyImapDownloader -- \
  -s imap.gmail.com -u you@gmail.com -p "app-password" -o ~/EmailArchive

# Download all folders with date range
dotnet run --project MyImapDownloader -- \
  -s imap.gmail.com -u you@gmail.com -p "app-password" \
  -o ~/EmailArchive --all-folders --start-date 2020-01-01

# Custom output directory with verbose logging
dotnet run --project MyImapDownloader -- \
  -s imap.gmail.com -u you@gmail.com -p "app-password" \
  -o ~/Documents/hikingfan_at_gmail_dot_com -v

Custom Output Directories

You can store emails anywhere on your filesystem:

# Absolute path (Linux/macOS)
-o /home/kushal/Documents/email_backups/personal_gmail

# Absolute path (Windows)
-o C:\Users\Kushal\Documents\EmailBackups\WorkOutlook

# Relative path (from current directory)
-o ./archives/hikingfan_at_gmail_dot_com

# Home directory (the shell expands an unquoted ~)
-o ~/EmailArchive/account1

Configuration

Gmail Setup

  1. Enable 2-Step Verification
  2. Generate an App Password
  3. Use the 16-character app password with -p

IMAP Provider Reference

Provider | Server | Port | Notes
Gmail | imap.gmail.com | 993 | Requires App Password
Outlook/Office 365 | outlook.office365.com | 993 | May require App Password
Yahoo Mail | imap.mail.yahoo.com | 993 | Requires App Password
Fastmail | imap.fastmail.com | 993 | Supports regular password
ProtonMail | 127.0.0.1 | 1143 | Via ProtonMail Bridge
iCloud | imap.mail.me.com | 993 | Requires App-Specific Password
Zoho Mail | imap.zoho.com | 993 | Supports regular password

Application Settings (appsettings.json)

{
  "Telemetry": {
    "ServiceName": "MyImapDownloader",
    "ServiceVersion": "1.0.0",
    "OutputDirectory": "telemetry",
    "MaxFileSizeMB": 25,
    "EnableTracing": true,
    "EnableMetrics": true,
    "EnableLogging": true,
    "FlushIntervalSeconds": 5,
    "MetricsExportIntervalSeconds": 15
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "System": "Warning"
    }
  }
}

Environment Variables

All command-line options can also be set via environment variables:

# Linux/macOS
export IMAP_SERVER="imap.gmail.com"
export IMAP_USERNAME="you@gmail.com"
export IMAP_PASSWORD="your-app-password"
export EMAIL_OUTPUT_DIR="$HOME/EmailArchive"

# Windows PowerShell
$env:IMAP_SERVER = "imap.gmail.com"
$env:IMAP_USERNAME = "you@gmail.com"
$env:IMAP_PASSWORD = "your-app-password"
$env:EMAIL_OUTPUT_DIR = "$env:USERPROFILE\EmailArchive"
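How the two sources are merged is not spelled out here. A common approach, sketched below as an assumption, is to prefer an explicit command-line value and otherwise read the matching variable. `OptionResolver.Resolve` is a hypothetical helper; the environment lookup is injectable so the logic can be exercised without changing the real environment.

```csharp
using System;

// Hypothetical helper: explicit CLI value first, then the environment variable.
// The precedence order is an assumption, not confirmed by the README.
public static class OptionResolver
{
    public static string? Resolve(string? cliValue, string envVar, Func<string, string?>? getEnv = null)
    {
        // Default to the real process environment when no lookup is supplied.
        getEnv ??= Environment.GetEnvironmentVariable;

        if (!string.IsNullOrWhiteSpace(cliValue))
            return cliValue;

        string? fromEnv = getEnv(envVar);
        return string.IsNullOrWhiteSpace(fromEnv) ? null : fromEnv;
    }
}
```

For example, `OptionResolver.Resolve(parsedServer, "IMAP_SERVER")` would return the `-s` value if given and fall back to `IMAP_SERVER` otherwise.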

Architecture & Storage

Output Structure

EmailArchive/
├── index.v1.db                    # SQLite index (deduplication + sync state)
├── INBOX/
│   ├── cur/                       # Downloaded messages
│   │   ├── 1702900000.abc123.mypc:2,S.eml
│   │   ├── 1702900000.abc123.mypc:2,S.eml.meta.json
│   │   └── ...
│   ├── new/                       # (Reserved for future use)
│   └── tmp/                       # Atomic write staging area
├── Sent/
│   └── cur/
│       └── ...
└── Archive/
    └── cur/
        └── ...
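The message file names follow the Maildir convention `<time>.<unique>.<host>:2,<flags>`, where `:2,` introduces the info section and `S` is the Seen flag; the archive appends an `.eml` extension. The hypothetical `MaildirName` helper below builds and parses names of that shape. How the application picks the unique part is not documented, so it is passed in by the caller.

```csharp
using System;

// Hypothetical helper for names shaped like "1702900000.abc123.mypc:2,S.eml":
//   <unix-seconds>.<unique>.<host>:2,<flags>.eml
public static class MaildirName
{
    public static string Build(DateTimeOffset timestamp, string unique, string host, string flags)
        => $"{timestamp.ToUnixTimeSeconds()}.{unique}.{host}:2,{flags}.eml";

    public static (long UnixSeconds, string Unique, string Host, string Flags) Parse(string name)
    {
        if (!name.EndsWith(".eml", StringComparison.Ordinal))
            throw new FormatException("Expected an .eml suffix");

        string core = name[..^4];                              // drop ".eml"
        int info = core.LastIndexOf(":2,", StringComparison.Ordinal);
        if (info < 0) throw new FormatException("Missing ':2,' info section");

        string flags = core[(info + 3)..];
        // Split into at most 3 parts so a dotted host name stays in one piece.
        string[] parts = core[..info].Split('.', 3);
        if (parts.Length != 3) throw new FormatException("Expected <time>.<unique>.<host>");

        return (long.Parse(parts[0]), parts[1], parts[2], flags);
    }
}
```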

SQLite Database Schema

The index.v1.db file contains two tables:

-- Tracks all archived messages for deduplication
CREATE TABLE Messages (
    MessageId TEXT PRIMARY KEY,
    Folder TEXT NOT NULL,
    ImportedAt TEXT NOT NULL
);

-- Tracks sync state for delta downloads
CREATE TABLE SyncState (
    Folder TEXT PRIMARY KEY,
    LastUid INTEGER NOT NULL,
    UidValidity INTEGER NOT NULL
);

The database uses WAL mode for better concurrency and crash resilience:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000;  -- negative value is in KiB: about 64 MB cache

Sidecar Metadata Files

Each .eml file has a companion .meta.json file:

{
  "MessageId": "abc123def456@mail.gmail.com",
  "Subject": "Project Update - Q4 Review",
  "From": "alice@example.com",
  "To": "bob@example.com",
  "Date": "2025-12-24T10:30:00Z",
  "Folder": "INBOX",
  "ArchivedAt": "2025-12-24T15:45:32Z",
  "HasAttachments": true
}
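A typed model makes the sidecar easy to read back during recovery. The `EmailMetadata` record and `Sidecar` helper are hypothetical, with property names copied from the JSON above; System.Text.Json uses C# property names as-is by default, so no naming policy is needed.

```csharp
using System;
using System.Text.Json;

// Hypothetical record mirroring the sidecar fields shown above.
public sealed record EmailMetadata(
    string MessageId,
    string? Subject,
    string? From,
    string? To,
    DateTime Date,
    string Folder,
    DateTime ArchivedAt,
    bool HasAttachments);

public static class Sidecar
{
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // Path convention from Output Structure: "<name>.eml" -> "<name>.eml.meta.json".
    public static string PathFor(string emlPath) => emlPath + ".meta.json";

    public static string Serialize(EmailMetadata meta) => JsonSerializer.Serialize(meta, Options);

    // Records with a primary constructor deserialize through that constructor.
    public static EmailMetadata? Deserialize(string json) =>
        JsonSerializer.Deserialize<EmailMetadata>(json, Options);
}
```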

Delta Sync Algorithm

The application implements a six-step delta synchronization strategy:

  1. Checkpoint Loading: On startup, retrieves LastUid and UidValidity for each folder from SQLite
  2. UIDVALIDITY Check: Compares server's UIDVALIDITY with stored value; if changed, resets sync state for that folder
  3. UID Search: Queries server for UID > LastUid only—skips already-archived messages
  4. Header-First Verification: Fetches envelope metadata before downloading body; checks Message-ID against index
  5. Streaming Download: Streams email body directly to disk via atomic write pattern (minimal RAM usage)
  6. Checkpoint Update: Updates LastUid in database after each successful batch
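
using System.Linq;
using MailKit;
using MailKit.Search;

// Simplified delta sync logic
var (lastUid, storedValidity) = await _storage.GetSyncStateAsync(folder.FullName, ct);

if (storedValidity != folder.UidValidity)
{
    // UIDVALIDITY changed: the server renumbered the folder, so rescan from the start
    lastUid = 0;
    await _storage.ResetSyncStateAsync(folder.FullName, folder.UidValidity, ct);
}

// Only fetch UIDs greater than our last checkpoint
var query = lastUid > 0
    ? SearchQuery.Uids(new UniqueIdRange(new UniqueId((uint)lastUid + 1), UniqueId.MaxValue))
    : SearchQuery.All;

var newUids = await folder.SearchAsync(query, ct);

// When sent as "n:*", RFC 3501 says the range always matches the highest existing UID,
// even if that UID is below n, so drop anything at or below the checkpoint
newUids = newUids.Where(uid => uid.Id > lastUid).ToList();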
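Steps 4 and 6 are not part of the snippet above. The in-memory model below is a hypothetical sketch of them: a `HashSet<string>` stands in for the `Messages` table, the `download` delegate stands in for the streaming write, and the checkpoint only advances after a whole batch has succeeded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of steps 4 (header-first dedup) and 6 (checkpoint update).
// Not the project's code: a HashSet replaces SQLite and download is a delegate.
public sealed class BatchProcessor
{
    private readonly HashSet<string> _knownMessageIds;
    public uint LastUid { get; private set; }

    public BatchProcessor(IEnumerable<string> knownIds, uint lastUid)
    {
        _knownMessageIds = new HashSet<string>(knownIds, StringComparer.Ordinal);
        LastUid = lastUid;
    }

    // headers: (UID, Message-ID) pairs fetched before any body is downloaded.
    // Returns how many messages were actually downloaded.
    public int ProcessBatch(IReadOnlyList<(uint Uid, string MessageId)> headers, Action<uint> download)
    {
        int downloaded = 0;
        foreach (var (uid, messageId) in headers.OrderBy(h => h.Uid))
        {
            if (uid <= LastUid) continue;                        // already checkpointed
            if (_knownMessageIds.Contains(messageId)) continue;  // duplicate: skip the body fetch

            download(uid);                   // if this throws, the checkpoint stays put
            _knownMessageIds.Add(messageId); // record only after a successful download
            downloaded++;
        }

        // Step 6: advance the checkpoint once the whole batch has been handled.
        if (headers.Count > 0)
            LastUid = Math.Max(LastUid, headers.Max(h => h.Uid));
        return downloaded;
    }
}
```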

Self-Healing Recovery

If the SQLite database is corrupted or missing:

  1. A corrupt database, if present, is moved to index.v1.db.corrupt.<timestamp>
  2. Fresh database is created with schema
  3. All existing .meta.json files are scanned
  4. Index is rebuilt from sidecar metadata
  5. Sync continues without re-downloading existing emails
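Steps 3 and 4 amount to a directory walk. In this hypothetical sketch a `Dictionary<string, string>` (Message-ID to folder) stands in for the SQLite `Messages` table, and unreadable sidecars are skipped rather than aborting the rebuild; whether the project handles damaged sidecars the same way is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical rebuild: scan every *.meta.json under the archive root.
public static class IndexRebuilder
{
    public static Dictionary<string, string> Rebuild(string archiveRoot)
    {
        var index = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string path in Directory.EnumerateFiles(archiveRoot, "*.meta.json", SearchOption.AllDirectories))
        {
            try
            {
                using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(path));
                JsonElement root = doc.RootElement;
                if (root.TryGetProperty("MessageId", out JsonElement id) &&
                    root.TryGetProperty("Folder", out JsonElement folder) &&
                    id.GetString() is { Length: > 0 } messageId)
                {
                    // One entry per Message-ID; later duplicates are ignored.
                    index.TryAdd(messageId, folder.GetString() ?? "");
                }
            }
            catch (Exception ex) when (ex is JsonException or InvalidOperationException)
            {
                // Malformed JSON or a non-string field: skip this sidecar and keep going.
            }
        }
        return index;
    }
}
```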

Telemetry & Observability

Telemetry is written to XDG-compliant directories in JSONL format:

~/.local/share/MyImapDownloader/telemetry/
├── traces/
│   └── traces_2025-12-24_0001.jsonl
├── metrics/
│   └── metrics_2025-12-24_0001.jsonl
└── logs/
    └── logs_2025-12-24_0001.jsonl

XDG Directory Resolution

The telemetry system follows the XDG Base Directory Specification with graceful fallbacks:

  1. $XDG_DATA_HOME/<app>/telemetry (typically ~/.local/share)
  2. $HOME/.local/state/<app>/telemetry
  3. Executable directory fallback
  4. Current working directory (last resort)
  5. If no writable location is found, telemetry is disabled but the application continues normally
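The fallback chain can be expressed as an ordered candidate list. In this hypothetical sketch, the `telemetry` subfolder under the executable and working directories and the injected `isWritable` probe are assumptions; a real probe might try to create the directory and a file inside it. When `XDG_DATA_HOME` is unset, the XDG specification's default of `$HOME/.local/share` is used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical resolver following the fallback order listed above.
public static class TelemetryDirectory
{
    public static string? Resolve(
        string appName,
        Func<string, string?> getEnv,
        string executableDir,
        string workingDir,
        Func<string, bool> isWritable)
    {
        var candidates = new List<string>();
        string? home = getEnv("HOME");

        // 1. $XDG_DATA_HOME, defaulting to ~/.local/share per the XDG spec.
        string? dataHome = getEnv("XDG_DATA_HOME");
        if (string.IsNullOrEmpty(dataHome) && !string.IsNullOrEmpty(home))
            dataHome = Path.Combine(home, ".local", "share");
        if (!string.IsNullOrEmpty(dataHome))
            candidates.Add(Path.Combine(dataHome, appName, "telemetry"));

        // 2. ~/.local/state
        if (!string.IsNullOrEmpty(home))
            candidates.Add(Path.Combine(home, ".local", "state", appName, "telemetry"));

        // 3 and 4. Executable directory, then working directory (subfolder name assumed).
        candidates.Add(Path.Combine(executableDir, "telemetry"));
        candidates.Add(Path.Combine(workingDir, "telemetry"));

        foreach (string dir in candidates)
            if (isWritable(dir)) return dir;

        return null;   // 5. Nothing writable: caller disables telemetry and continues.
    }
}
```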

Instrumented Spans

Span Name | Description
EmailArchiveSession | Root span for entire application run
DownloadEmails | IMAP connection and folder enumeration
ProcessFolder | Per-folder delta sync processing
ProcessEmail | Individual email download
SaveStream | Disk write and metadata extraction
RebuildIndex | Database recovery operation

Metrics

Metric | Type | Unit | Description
emails.downloaded | Counter | emails | Successfully downloaded emails
emails.skipped | Counter | emails | Duplicates skipped
emails.errors | Counter | errors | Download failures
storage.files.written | Counter | files | Files written to disk
storage.bytes.written | Counter | bytes | Total bytes written
storage.write.latency | Histogram | ms | Write operation duration
connections.active | Gauge | connections | Active IMAP connections
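Instruments like these are typically declared with the BCL `System.Diagnostics.Metrics` API, which the OpenTelemetry .NET SDK can export. The meter name `MyImapDownloader` and the static `ArchiveMetrics` class are assumptions; only the instrument names, kinds, and units come from the table. The gauge is modeled as an observable gauge that is sampled at collection time.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical instrument declarations matching the metrics table.
public static class ArchiveMetrics
{
    public static readonly Meter Meter = new("MyImapDownloader");

    public static readonly Counter<long> EmailsDownloaded =
        Meter.CreateCounter<long>("emails.downloaded", unit: "emails");
    public static readonly Counter<long> EmailsSkipped =
        Meter.CreateCounter<long>("emails.skipped", unit: "emails");
    public static readonly Counter<long> EmailErrors =
        Meter.CreateCounter<long>("emails.errors", unit: "errors");
    public static readonly Counter<long> FilesWritten =
        Meter.CreateCounter<long>("storage.files.written", unit: "files");
    public static readonly Counter<long> BytesWritten =
        Meter.CreateCounter<long>("storage.bytes.written", unit: "bytes");
    public static readonly Histogram<double> WriteLatency =
        Meter.CreateHistogram<double>("storage.write.latency", unit: "ms");

    // Observable gauge: the callback is read whenever metrics are collected.
    private static int s_activeConnections;
    public static readonly ObservableGauge<int> ActiveConnections =
        Meter.CreateObservableGauge("connections.active",
            () => Volatile.Read(ref s_activeConnections), unit: "connections");

    public static void ConnectionOpened() => Interlocked.Increment(ref s_activeConnections);
    public static void ConnectionClosed() => Interlocked.Decrement(ref s_activeConnections);
}
```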

JSONL Output Format

Each line is a complete, valid JSON object:

{"type":"trace","timestamp":"2025-12-24T12:00:00Z","traceId":"abc123","spanId":"def456","operationName":"ProcessFolder","durationMs":1234.5}
{"type":"metric","timestamp":"2025-12-24T12:00:00Z","metricName":"storage.files.written","value":42}
{"type":"log","timestamp":"2025-12-24T12:00:00Z","logLevel":"Information","formattedMessage":"Downloaded: Re: Hello World"}
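Producing this format comes down to serializing each record compactly and ending it with a newline. The `MetricRecord` type and `JsonlWriter` helper below are hypothetical; the camelCase naming policy is chosen to match the keys in the sample lines.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical record for the "metric" line shown above.
public sealed record MetricRecord(string Type, DateTime Timestamp, string MetricName, double Value);

// Hypothetical JSONL writer: one compact JSON object per line.
public static class JsonlWriter
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false   // indented output would split one record across lines
    };

    public static void Append(TextWriter writer, object record)
    {
        writer.Write(JsonSerializer.Serialize(record, record.GetType(), Options));
        writer.Write('\n');
    }
}
```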

File Rotation

  • Daily rotation: New file each day
  • Size-based rotation: New file when exceeding MaxFileSizeMB (default: 25 MB)
  • Naming pattern: {type}_{date}_{sequence}.jsonl
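The two rotation rules combine into a small state machine. In this hypothetical `RotatingFileName` sketch, treating MB as 1024 × 1024 bytes and zero-padding the sequence to four digits (to match the sample names) are assumptions.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for the {type}_{date}_{sequence}.jsonl naming scheme.
public sealed class RotatingFileName
{
    private readonly string _type;
    private readonly long _maxBytes;
    private DateOnly _date;
    private int _sequence = 1;
    private long _bytesInFile;

    public RotatingFileName(string type, int maxFileSizeMB, DateOnly today)
    {
        _type = type;
        _maxBytes = maxFileSizeMB * 1024L * 1024L;
        _date = today;
    }

    public string Current =>
        $"{_type}_{_date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)}_{_sequence:D4}.jsonl";

    // Picks the file for the next record of 'size' bytes, rotating first when needed.
    public string NextFor(DateOnly today, long size)
    {
        if (today != _date)
        {
            // Daily rotation: new date, sequence restarts at 1.
            _date = today;
            _sequence = 1;
            _bytesInFile = 0;
        }
        else if (_bytesInFile > 0 && _bytesInFile + size > _maxBytes)
        {
            // Size-based rotation; the non-empty check avoids endless rotation on one huge record.
            _sequence++;
            _bytesInFile = 0;
        }
        _bytesInFile += size;
        return Current;
    }
}
```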

MyEmailSearch (Coming Soon)

A companion tool for searching the email archive is under development. It will provide:

  • Fast structured searches: by sender, recipient, subject, date ranges
  • Full-text search: across email bodies using SQLite FTS5
  • Multiple output formats: table, JSON, CSV
  • Sub-second query times: for archives up to 100GB

Preview of the CLI:

# Search by sender
myemailsearch search "from:alice@example.com"

# Full-text search
myemailsearch search "kafka dotnet ecosystem"

# Combined query with date range
myemailsearch search "from:bob subject:report" --after 2025-01-01 --before 2025-06-01

# Output as JSON
myemailsearch search "quarterly review" --format json --limit 50

Development

Repository Structure

MyImapDownloader/
├── Directory.Build.props          # Shared build properties
├── Directory.Packages.props       # Centralized package versions
├── MyImapDownloader.slnx          # Solution file
├── MyImapDownloader/              # Main application
│   ├── Program.cs                 # Entry point
│   ├── EmailDownloadService.cs    # IMAP sync logic
│   ├── EmailStorageService.cs     # SQLite + file storage
│   └── Telemetry/                 # OpenTelemetry exporters
├── MyImapDownloader.Tests/        # TUnit test suite
├── MyEmailSearch/                 # Search tool (coming soon)
│   ├── Commands/                  # CLI command handlers
│   └── appsettings.json
└── MyEmailSearch.Tests/           # Search tool tests

Build & Test

# Build all projects
dotnet build

# Run all tests
dotnet test

# Run MyImapDownloader with verbose output
dotnet run --project MyImapDownloader -- \
  -s imap.example.com -u user -p pass -v

# Run MyEmailSearch (when available)
dotnet run --project MyEmailSearch -- search "from:alice"

Key Dependencies

Package | Purpose
MailKit | IMAP client
Microsoft.Data.Sqlite | SQLite database
Polly | Resilience patterns (retry, circuit breaker)
OpenTelemetry | Observability framework
CommandLineParser | CLI argument parsing (MyImapDownloader)
System.CommandLine | CLI framework (MyEmailSearch)
TUnit | Testing framework

Testing Framework

The project uses TUnit with Microsoft.Testing.Platform for modern, high-performance testing:

[Test]
public async Task Application_ShouldCompileAndRun()
{
    bool result = true;
    await Assert.That(result).IsTrue();
}

Central Package Management

All NuGet package versions are managed in Directory.Packages.props:

<PackageVersion Include="MailKit" Version="4.12.1" />
<PackageVersion Include="Polly" Version="8.6.0" />
<PackageVersion Include="TUnit" Version="0.19.56" />

This ensures all projects in the solution use consistent package versions.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

This means:

  • You can use, modify, and distribute this software
  • If you modify it and let users interact with it over a network, you must offer those users the corresponding source code
  • Derivative works you distribute must also be licensed under AGPL-3.0

See the LICENSE file for the complete license text.


Built with ❤️ using .NET 10
