Deduplicate near-identical content at retain time #826

@mvessair-hive

Problem

When an LLM agent recalls memories and then retains a new interaction in the same turn, the recalled context often gets re-retained as new facts. Over multiple turns this creates exponential duplication: we observed a single fact ("X was out with Y") duplicated 31 times across world and observation types.

This affects any agent that follows a recall → respond → retain pattern, which is the standard usage loop for Hindsight-backed agents.
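To illustrate why the growth compounds, here is a toy model of the loop. It is only a sketch under simplifying assumptions: the bank is a plain list, recall returns up to `recall_limit` stored copies of the fact, and everything recalled is re-retained verbatim. `simulate` and `recall_limit` are hypothetical names, not part of Hindsight.

```python
def simulate(turns, recall_limit):
    """Toy model: count copies of one fact after each recall -> retain turn."""
    bank = ["X was out with Y"]  # a single original fact
    counts = [len(bank)]
    for _ in range(turns):
        # Toy recall: return up to recall_limit stored copies of the fact.
        recalled = bank[:recall_limit]
        # The recalled context is re-retained as if it were new facts.
        bank.extend(recalled)
        counts.append(len(bank))
    return counts


print(simulate(5, recall_limit=100))  # [1, 2, 4, 8, 16, 32]
print(simulate(5, recall_limit=4))    # [1, 2, 4, 8, 12, 16]
```

In this model the copy count doubles every turn until the recall limit is reached, after which it still grows by `recall_limit` per turn.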

Proposed Solution

At retain time, before creating new memory units, check the embedding similarity of the incoming content against existing memories in the same bank. If the similarity exceeds a threshold (e.g. cosine similarity > 0.9), skip the incoming content or merge it into the existing memory instead of creating a new fact.

This could be:

  • A configurable bank-level setting (dedup_threshold: 0.9)
  • Opt-in via a retain parameter (deduplicate: true)
  • Always-on with a high threshold to avoid false positives
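A minimal sketch of what the check could look like, combining the first two options above. This is not Hindsight's actual API: `MemoryBank`, `retain`, and `cosine_similarity` are hypothetical names, embeddings are assumed to be precomputed lists of floats, and the linear scan stands in for whatever vector index the bank really uses. It only implements "skip"; merging is left out.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class MemoryBank:
    # Hypothetical bank-level setting, mirroring `dedup_threshold: 0.9`.
    dedup_threshold: float = 0.9
    # Stored memory units as (text, embedding) pairs.
    units: list = field(default_factory=list)

    def retain(self, text, embedding, deduplicate=True):
        """Store a memory unit; return False if it was skipped as a near-duplicate."""
        if deduplicate:
            for _, existing_embedding in self.units:
                if cosine_similarity(embedding, existing_embedding) > self.dedup_threshold:
                    return False  # skip: a near-identical memory already exists
        self.units.append((text, embedding))
        return True


bank = MemoryBank(dedup_threshold=0.9)
bank.retain("X was out with Y", [1.0, 0.0])     # stored
bank.retain("X went out with Y", [0.99, 0.05])  # skipped as a near-duplicate
bank.retain("Z likes tea", [0.0, 1.0])          # stored
print(len(bank.units))  # 2
```

Passing `deduplicate=False` keeps the current behavior, so the check can be opt-in per call while the threshold stays a bank-level setting.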

Impact

Without dedup, banks accumulate redundant copies that degrade recall quality (redundant results crowd out diverse facts) and waste storage/embedding compute.
