Feature Request: Graph RAG over email archive #154

@ayushin

Overview

Add an intelligent semantic search and knowledge layer over the email archive that
understands the meaning of messages and the relationships between senders,
conversations, and content — going far beyond keyword search.

Problems it solves

Repeated content bloats search and retrieval. Most reply emails contain the full
quoted chain of all previous messages. Searching or analysing the archive today means
sifting through massive duplication: the same paragraph can appear hundreds of times.

Keyword search misses context. You have to know exactly what you're looking for.
You can't ask "what did we agree about the contract with Acme?" or "who are the key
people involved in the infrastructure migration?" and get a meaningful answer.

No map of who talks to whom about what. The archive contains a rich social and
topical graph — but today there's no way to explore it. You can't see which
conversations are semantically related, or how a topic evolved across threads.

What it does

  1. Deduplicate quoted content

When you reply to an email, your message body usually carries the full quoted history of
the thread. This feature strips and deduplicates that quoted content so each unique piece
of text is stored once, with pointers back to the messages that contain it. The result is
a clean, de-duplicated corpus of what was actually written, not what was re-quoted.
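
One possible shape for this step, as a minimal sketch rather than a proposed implementation: split each body into paragraphs, drop `>` quote markers, and key every paragraph by a hash of its normalised text. The names `fragments` and `build_fragment_store` are hypothetical, and the sketch assumes plain-text bodies with `>`-style quoting only; attribution lines such as "On ... wrote:" and HTML quoting would need extra handling.

```python
import hashlib
import re
from collections import defaultdict


def fragments(body: str) -> list[str]:
    """Split a body into normalised paragraphs, dropping '>' quote markers."""
    # Strip leading quote markers such as '> ' or '>> ' from every line.
    unquoted = [re.sub(r"^\s*(>\s?)+", "", line) for line in body.splitlines()]
    paragraphs = re.split(r"\n\s*\n", "\n".join(unquoted))
    # Collapse whitespace so a re-wrapped quote hashes like the original text.
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return [p for p in cleaned if p]


def build_fragment_store(
    messages: dict[str, str],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Store each unique fragment once and record which messages contain it."""
    texts: dict[str, str] = {}
    occurrences: dict[str, list[str]] = defaultdict(list)
    for message_id, body in messages.items():
        for text in fragments(body):
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            texts.setdefault(key, text)
            if message_id not in occurrences[key]:
                occurrences[key].append(message_id)
    return texts, dict(occurrences)


# Illustrative input: the second message quotes the first one.
messages = {
    "msg-1": "Can we ship on Friday?",
    "msg-2": "Friday works for me.\n\n> Can we ship on Friday?",
}
texts, occurrences = build_fragment_store(messages)
```

Here the quoted question is stored once and points back to both `msg-1` and `msg-2`. Exact hashing only catches verbatim repeats; lightly edited quotes would need fuzzy matching, which this sketch does not attempt.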

  2. Understand meaning, not just words

Each unique piece of content is run through an embedding model that converts it into a
vector, a numeric fingerprint of its meaning. This allows the system to find content by
meaning rather than wording, so a search for "project timeline slipped" can surface a
message that says "we're running three weeks behind" without sharing a single keyword.
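
To illustrate the retrieval side only, here is a small sketch of ranking fragments by cosine similarity. The issue does not name an embedding model, so the three-dimensional vectors below are hand-written stand-ins for model output, and `search` and the fragment IDs are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=3):
    """Return the top_k (fragment_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), fragment_id)
        for fragment_id, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [(fragment_id, score) for score, fragment_id in scored[:top_k]]


# ASSUMPTION: toy vectors invented for illustration, not real model output.
index = {
    "frag-delay": [0.9, 0.1, 0.0],  # "we're running three weeks behind"
    "frag-lunch": [0.0, 0.2, 0.9],  # an unrelated message
}
query = [0.8, 0.2, 0.1]  # "project timeline slipped"
results = search(query, index, top_k=1)
```

With these toy vectors the delay fragment ranks first even though it shares no words with the query. A linear scan like this is fine for a sketch; a real archive would likely need an approximate nearest-neighbour index.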

  3. Build a knowledge graph of people and content

The unique content fragments and their authors are organised into a graph — a map of who
wrote what, what quotes what, and which content is semantically related. This enables
retrieval that can follow relationships: given a fragment, find the sender, find other
things they wrote, find other people who discussed the same topic.
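
The traversal described above can be sketched with a small in-memory graph of typed edges. This assumes three relation names (`WROTE`, `QUOTES`, `RELATED_TO`) chosen for illustration; the issue does not specify a schema or a graph store, and `Graph` and `related_people` are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Directed graph with typed edges, indexed in both directions."""

    def __init__(self):
        self._out = defaultdict(set)
        self._in = defaultdict(set)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self._out[(src, relation)].add(dst)
        self._in[(dst, relation)].add(src)

    def targets(self, src: str, relation: str) -> set:
        return set(self._out.get((src, relation), ()))

    def sources(self, dst: str, relation: str) -> set:
        return set(self._in.get((dst, relation), ()))


def related_people(graph: Graph, fragment: str) -> set:
    """People who wrote content semantically related to this fragment."""
    authors = graph.sources(fragment, "WROTE")
    people = set()
    for other in graph.targets(fragment, "RELATED_TO"):
        people |= graph.sources(other, "WROTE")
    return people - authors


g = Graph()
g.add_edge("alice", "WROTE", "frag-1")
g.add_edge("alice", "WROTE", "frag-2")
g.add_edge("bob", "WROTE", "frag-3")
g.add_edge("msg-9", "QUOTES", "frag-1")
# Semantic relatedness is symmetric, so it is stored in both directions.
g.add_edge("frag-1", "RELATED_TO", "frag-3")
g.add_edge("frag-3", "RELATED_TO", "frag-1")

sender = g.sources("frag-1", "WROTE")        # who wrote this fragment
also_wrote = g.targets("alice", "WROTE")     # other things they wrote
same_topic = related_people(g, "frag-1")     # others on the same topic
```

Indexing edges in both directions keeps each hop a dictionary lookup, which is what makes "fragment to sender to their other fragments" cheap to follow.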

  4. Build a map of conversations

At a higher level, each email thread gets a short AI-generated summary. These
conversation summaries are then organised into their own graph — linking threads that
share participants, share topics, or explicitly reference each other. This makes it
possible to navigate the archive at the conversation level, grouping related threads
that you'd never connect through keyword search alone.
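
A rough sketch of the linking rule, assuming each thread already has a summary, a participant set, a set of topic labels, and a set of referenced thread IDs. How topics are derived (for example from the summary embeddings) is left open here, and `Thread` and `link_threads` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Thread:
    thread_id: str
    summary: str  # placeholder for the AI-generated summary
    participants: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    references: set = field(default_factory=set)  # IDs of threads it mentions


def link_threads(threads: list) -> dict:
    """Map each linked pair of thread IDs to the reasons they are linked."""
    edges = {}
    for a, b in combinations(threads, 2):
        reasons = []
        if a.participants & b.participants:
            reasons.append("shared_participants")
        if a.topics & b.topics:
            reasons.append("shared_topics")
        if b.thread_id in a.references or a.thread_id in b.references:
            reasons.append("explicit_reference")
        if reasons:
            edges[(a.thread_id, b.thread_id)] = reasons
    return edges


# Illustrative threads; all values are invented for the example.
threads = [
    Thread("t1", "Planning the migration.", {"alice", "bob"}, {"migration"}),
    Thread("t2", "Migration follow-up.", {"carol"}, {"migration"}, {"t1"}),
    Thread("t3", "Hiring update.", {"dave"}, {"hiring"}),
]
edges = link_threads(threads)
```

Keeping the reasons on each edge lets a UI explain why two threads were grouped. Comparing every pair is quadratic in the number of threads, so a larger archive would need candidate filtering first.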

User-facing capabilities this enables

  • Semantic Q&A: Ask a natural-language question and get an answer grounded in your
    actual emails, with references to the specific messages it drew from.
  • Topic exploration: Browse clusters of related conversations without knowing in
    advance what to search for.
  • People graph: See who the key correspondents are for any topic, and how they
    connect to each other.
  • Cross-thread discovery: Find conversations that are topically related even when
    they have completely different subject lines.
  • Attachment understanding: Surface documents and files by what they contain, not
    just their filename.
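
For the Semantic Q&A capability, one way to keep answers grounded is to carry message references alongside each retrieved fragment. The sketch below assumes a ranked list of fragment IDs plus the fragment text and occurrence maps from the deduplication step; `build_grounded_context` is a hypothetical helper, and the answer-generation call itself is out of scope.

```python
def build_grounded_context(ranked_fragments, texts, occurrences, max_fragments=3):
    """Return numbered context lines and the message IDs they cite."""
    lines = []
    cited = []
    for fragment_id in ranked_fragments[:max_fragments]:
        sources = occurrences.get(fragment_id, [])
        if not sources:
            continue  # skip fragments that cannot be attributed to a message
        # ASSUMPTION: the first recorded message is where the text originated.
        origin = sources[0]
        if origin not in cited:
            cited.append(origin)
        reference = cited.index(origin) + 1
        lines.append(f"[{reference}] {texts[fragment_id]}")
    return "\n".join(lines), cited


# Illustrative data only.
texts = {
    "f1": "We agreed to renew the contract for one year.",
    "f2": "Legal signed off on Tuesday.",
}
occurrences = {"f1": ["msg-7", "msg-9"], "f2": ["msg-9"]}
context, cited = build_grounded_context(["f1", "f2"], texts, occurrences)
```

The numbered `context` string can be handed to whatever model produces the answer, and `cited` maps each `[n]` marker back to a specific message.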
