Metadata mismatch: subject/snippet from different message than body/from #147

@dvdsgl

Summary

Some messages in the archive show metadata (subject, snippet) from one email while the stored raw MIME and parsed body/from are from a different email. The mismatch is visible in msgvault show-message <id> --json and in the UI: the subject line and snippet describe message A, but the body and sender are from message B.

Observed behavior

  • msgvault show-message <id> --json returns:
    • subject and snippet: text that clearly belongs to a different message (e.g. an automated usage report)
    • from and body_text: correct, matching the actual message
  • msgvault export-eml <id> shows the correct message: headers (From, Subject) and body all match each other and match from / body_text in the JSON.

So the bug is in the denormalized metadata in the messages table (subject, snippet). The raw MIME in message_raw and the parsed body/participants are correct.

Likely cause

From the source:

  • Subject is set from the MIME parser (parsed.Subject in parseToModel).
  • Snippet is set from the Gmail API response (raw.Snippet).

If the Gmail API ever returns a snippet that doesn’t match the message’s actual content, or if the parser’s subject is ever wrong, the stored row ends up with this mixed metadata. Once written, incremental sync doesn’t re-fetch full content, so the wrong subject and snippet persist.

Environment

  • msgvault v0.7.0 (commit 909fff4, 2026-02-09)
  • Gmail account(s); synced via both full and incremental sync

Suggestions

  1. Repair path: A way to re-fetch or re-parse a single message (e.g. re-run MIME parse from message_raw and update messages.subject / messages.snippet) so affected rows can be fixed without a full re-sync.
  2. Validation: Optionally compare stored subject/snippet to values derived from the raw MIME (or to Gmail’s envelope) and flag or auto-correct mismatches.
  3. Fallback: When serving a message, if subject/snippet are inconsistent with body/from, consider deriving a preview from the parsed body instead of trusting the stored snippet.
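
To make suggestions 1 to 3 concrete, here is a minimal sketch in Go using only the standard library. It is not msgvault's code: subjectFromRaw, subjectMatches, normalize and previewFromBody are hypothetical names, and the sketch assumes the caller already has the raw MIME bytes (as stored in message_raw) and the stored subject in hand. It re-derives the Subject from the raw message, compares it to the stored value after whitespace normalization, and builds a fallback preview from the parsed body text.

```go
package main

import (
	"bytes"
	"mime"
	"net/mail"
	"strings"
)

// subjectFromRaw parses the header block of a raw RFC 5322 message and
// returns its Subject, decoding RFC 2047 encoded-words where possible.
// Hypothetical helper; not part of msgvault.
func subjectFromRaw(raw []byte) (string, error) {
	msg, err := mail.ReadMessage(bytes.NewReader(raw))
	if err != nil {
		return "", err
	}
	subject := msg.Header.Get("Subject")
	decoded, err := new(mime.WordDecoder).DecodeHeader(subject)
	if err != nil {
		// Unknown charset or malformed encoded-word: keep the undecoded value.
		return subject, nil
	}
	return decoded, nil
}

// normalize collapses runs of whitespace so that header folding and
// spacing differences do not count as mismatches.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// subjectMatches reports whether the stored subject agrees with the
// subject derived from the raw MIME. A false result marks a row that a
// repair or validation pass could flag or rewrite.
func subjectMatches(stored string, raw []byte) (bool, error) {
	derived, err := subjectFromRaw(raw)
	if err != nil {
		return false, err
	}
	return normalize(stored) == normalize(derived), nil
}

// previewFromBody derives a fallback snippet from parsed body text:
// whitespace is collapsed and the result is cut to at most maxRunes runes.
func previewFromBody(body string, maxRunes int) string {
	if maxRunes < 0 {
		maxRunes = 0
	}
	flat := normalize(body)
	runes := []rune(flat)
	if len(runes) <= maxRunes {
		return flat
	}
	return string(runes[:maxRunes])
}
```

A repair command could call subjectMatches for each affected row and, on a mismatch, write the derived subject and a previewFromBody value back to the messages table. Whether an exact match after whitespace normalization is the right comparison is a judgment call; Gmail's own snippet may be truncated or entity-escaped differently, so flagging may be safer than auto-correcting the snippet.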

Thanks for msgvault — the ability to verify with export-eml made it clear the raw data was correct and the issue was limited to the stored metadata.
