Memory strategy: reinforce durable memories, decay stale ones, and surface contradictions #1

@coe0718

Description

Phantom already has strong building blocks for memory: session-end capture, semantic facts, contradiction handling, procedural memory, and hybrid retrieval. What feels missing is a stronger long-term memory strategy around reinforcement, forgetting, and startup retrieval.

I would like to propose an improvement to the existing memory system so it behaves like durable, adaptive memory rather than a growing archive.

The goal is not to add a second parallel memory system. The goal is to evolve the current one so it better answers three questions:

  1. What should become durable memory?
  2. What should fade away if it stops mattering?
  3. What should be surfaced when new information conflicts with old beliefs?

Use Case

Right now, Phantom can remember and retrieve useful information, but over time the memory model may drift toward accumulation without enough reinforcement or forgetting.

For a persistent co-worker, the important behavior is not just "can it search memory?" but "does it feel like it knows me, my preferences, and my working patterns without becoming noisy over time?"

A stronger memory strategy would improve workflows like:

  • remembering repeated user preferences across many sessions
  • surfacing stable patterns instead of many near-duplicate episodes
  • adapting when a user's habits change
  • keeping startup context focused instead of bloated
  • making long-running Phantom instances feel more cumulative and personally tuned

Proposed Solution

I think this fits best as an evolution of the existing episodic + semantic + procedural memory system, not a brand-new 5-layer subsystem.

Proposed direction:

  • Keep session-end extraction as the primary capture point.

    • Avoid per-turn model calls for salience unless there is a strong reason.
    • Let significance emerge from the full session, not from isolated turns.
  • Add reinforcement to durable memories.

    • When similar facts or patterns recur across sessions, increase their weight or confidence instead of storing near-duplicates forever.
    • Repeated signals should strengthen memory.
  • Add explicit decay / forgetting.

    • Episodic memories that are never accessed or reinforced should decay over time.
    • Stale or low-value memories should stop competing with durable ones during retrieval.
    • This should keep the system sharp rather than just larger.
  • Improve contradiction surfacing.

    • Existing contradiction handling is a great start.
    • In addition to superseding old facts, it would help to surface important contradictions during retrieval or startup so the agent can resolve or adapt to changed user behavior.
  • Improve startup retrieval.

    • Instead of loading "all memory", inject a focused top-N set of durable, high-signal memories at session start.
    • Favor reinforced memories, recent high-importance memories, and active contradictions.
  • Preserve the Cardinal Rule.

    • TypeScript should do plumbing: persistence, ranking, decay, retrieval, scheduling.
    • LLM judges should do reasoning: salience, extraction quality, contradiction interpretation, consolidation decisions.
    • Heuristics should stay fallback-only.
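To make the reinforcement and decay points concrete, here is a minimal sketch of what the metadata and plumbing could look like. All names here (`MemoryRecord`, `reinforce`, `decayedWeight`, the 30-day half-life) are illustrative assumptions for this issue, not Phantom's actual API:

```typescript
// Illustrative sketch only: reinforcement/decay metadata on a memory record.
interface MemoryRecord {
  id: string;
  content: string;
  weight: number;        // reinforcement weight; grows when similar signals recur
  lastAccessed: number;  // epoch ms of last retrieval or reinforcement
}

// When a similar fact recurs across sessions, strengthen the existing record
// instead of storing a near-duplicate.
function reinforce(m: MemoryRecord, now: number): MemoryRecord {
  return { ...m, weight: m.weight + 1, lastAccessed: now };
}

// Exponential decay: memories that are never accessed or reinforced lose
// retrieval weight over time, so stale entries stop competing with durable ones.
function decayedWeight(
  m: MemoryRecord,
  now: number,
  halfLifeDays: number = 30
): number {
  const ageDays = (now - m.lastAccessed) / 86_400_000;
  return m.weight * Math.pow(0.5, ageDays / halfLifeDays);
}
```

The key property is that reinforcement and decay are pure TypeScript plumbing over metadata; the LLM judge only decides *whether* two memories are similar enough to reinforce rather than store separately.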

A possible implementation path could be split into small PRs:

  1. Reinforcement and decay metadata for episodic / semantic memory
  2. Ranking updates for retrieval and startup context
  3. Better contradiction surfacing
  4. Tests and docs for the new memory behavior
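For step 2, the startup-retrieval ranking could be sketched as a single score that favors reinforced memories, recent high-importance ones, and active contradictions, then takes the top N. Again, every name and weighting constant below is a hypothetical placeholder, not a proposed final design:

```typescript
// Illustrative sketch only: ranking durable memories for startup context.
interface RankedMemory {
  id: string;
  weight: number;            // reinforcement weight accumulated across sessions
  importance: number;        // 0..1, assigned by the LLM judge at extraction
  ageDays: number;           // days since last access or reinforcement
  hasContradiction: boolean; // an unresolved conflict with a newer signal
}

function startupScore(m: RankedMemory): number {
  const recency = Math.pow(0.5, m.ageDays / 30); // 30-day half-life, as above
  const base = m.weight * m.importance * recency;
  // Active contradictions always surface so the agent can resolve them.
  return m.hasContradiction ? base + 10 : base;
}

// Inject a focused top-N set instead of loading "all memory" at session start.
function selectStartupContext(
  memories: RankedMemory[],
  topN: number
): RankedMemory[] {
  return [...memories]
    .sort((a, b) => startupScore(b) - startupScore(a))
    .slice(0, topN);
}
```

The additive contradiction bonus is a deliberate simplification: it guarantees conflicts outrank ordinary memories without needing a separate retrieval pass.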

Alternatives Considered

Full 5-layer memory pipeline

I considered a more elaborate layered design with capture, observation, reflection, search, and startup loading as separate stages.

I do not think that should be added to Phantom as-is because:

  • it risks duplicating memory and evolution pipelines that already exist
  • per-turn LLM capture adds cost quickly
  • a separate observations file could create a second source of truth
  • loading all memory at startup would likely hurt context quality

Keep the current system unchanged

The current system is already better than most agent projects, but I think it is missing a stronger notion of reinforcement and forgetting, which is what makes long-term memory feel personal instead of archival.

Additional Context

This seems aligned with the contribution guide's "memory strategies" and "evolution pipeline improvements" categories.

I would be happy to help shape this into a focused PR series if the maintainer thinks this direction fits the project.
