
Knowledge entry staleness: aging, pruning, and retrieval weighting #88

@AndreRobitaille

Context

The auto-knowledge-extraction feature (PR forthcoming) adds a stated_at date to each extracted KB entry and includes that date in the trust labels injected into prompts. The downstream model is instructed to qualify older facts with "as of [date]."
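As a rough illustration of the v1 behavior, the sketch below builds a trust label that carries the stated_at date, so the model has what it needs to say "as of [date]". Only stated_at comes from this issue; the KBEntry shape, the format_trust_label name, and the label wording are hypothetical, and the real format defined by the extraction feature may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class KBEntry:
    # Hypothetical shape of an extracted knowledge entry; only stated_at is from the issue.
    content: str
    source: str
    stated_at: date


def format_trust_label(entry: KBEntry) -> str:
    """Build a prompt line that tags a fact with its origin and the date it was stated."""
    # ISO dates keep the label unambiguous; the downstream model is told to
    # phrase dated facts as "as of <date>".
    return f"[{entry.source}, stated {entry.stated_at.isoformat()}] {entry.content}"


entry = KBEntry("The city has 6 hotels.", "auto-extracted", date(2024, 3, 12))
print(format_trust_label(entry))
# [auto-extracted, stated 2024-03-12] The city has 6 hotels.
```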

However, there is no active staleness management yet: old facts get the same retrieval priority as new ones, and nothing prunes or flags entries that are likely outdated.

Ideas to explore

  • Retrieval time decay: Multiply the cosine similarity score by a recency factor so that older entries rank lower. The factor could be a simple linear decay or an exponential decay with a configurable half-life.
  • Staleness flagging: A scheduled job marks entries older than N months as needs_refresh or reduces their retrieval weight. Thresholds should depend on fact type, since different facts age at different rates (counts and dollar figures vs. relationships and ownership).
  • Pruning: Auto-block or deactivate entries past an age threshold, especially numeric and count facts. This could be part of the weekly ExtractKnowledgePatternsJob: if the pattern job sees updated numbers, it could supersede the old entry.
  • Supersession: When a new entry contradicts an old one (e.g., "6 hotels" vs. an older "5 hotels"), auto-block the old entry. The extraction prompt could also ask the model to flag superseded facts.
  • Confidence decay: Reduce an entry's confidence over time so that triage naturally filters out old entries when they are re-evaluated.
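A minimal sketch of the time-decay and per-type staleness ideas, assuming each retrieved entry exposes its cosine similarity, a fact_type, and its stated_at date. The fact-type names, the half-life values, and the 0.25 refresh threshold are placeholders rather than values from the codebase. It shows both decay shapes from the first bullet; decayed_score uses the half-life variant.

```python
from datetime import date, timedelta

# Hypothetical per-fact-type half-lives in days. Numeric facts (counts, dollar
# figures) are assumed to go stale faster than relationships or ownership.
HALF_LIFE_DAYS = {
    "count": 180.0,
    "money": 180.0,
    "relationship": 730.0,
    "ownership": 730.0,
}
DEFAULT_HALF_LIFE_DAYS = 365.0


def age_days(stated_at: date, today: date) -> int:
    # Clamp at zero so a future-dated entry never gets a boost.
    return max((today - stated_at).days, 0)


def exponential_recency(age: int, half_life: float) -> float:
    """Weight that halves every `half_life` days: 0.5 ** (age / half_life)."""
    return 0.5 ** (age / half_life)


def linear_recency(age: int, max_age: float) -> float:
    """Weight that falls linearly from 1.0 at age 0 to 0.0 at `max_age` days."""
    return max(0.0, 1.0 - age / max_age)


def decayed_score(cosine: float, fact_type: str, stated_at: date, today: date) -> float:
    """Multiply the cosine similarity by the recency factor for this fact type."""
    half_life = HALF_LIFE_DAYS.get(fact_type, DEFAULT_HALF_LIFE_DAYS)
    return cosine * exponential_recency(age_days(stated_at, today), half_life)


def needs_refresh(fact_type: str, stated_at: date, today: date,
                  threshold: float = 0.25) -> bool:
    """Flag an entry once its recency weight drops below `threshold`."""
    half_life = HALF_LIFE_DAYS.get(fact_type, DEFAULT_HALF_LIFE_DAYS)
    return exponential_recency(age_days(stated_at, today), half_life) < threshold


today = date(2025, 1, 1)
six_months_ago = today - timedelta(days=180)
print(decayed_score(0.8, "count", six_months_ago, today))  # 0.4
print(decayed_score(0.8, "relationship", six_months_ago, today))  # decays more slowly
```

Exponential decay never reaches zero, so an old but highly relevant fact can still surface; linear decay hits zero at max_age, which would double as a pruning threshold. The same recency factor could also multiply an entry's stored confidence to implement confidence decay.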
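Supersession is simplest when extracted facts carry a normalized key. The sketch below assumes a hypothetical Fact record with subject and attribute fields; if entries are stored only as free text, detecting a contradiction would instead need something like the extraction prompt flagging it. For each key it keeps the newest fact and blocks older ones whose value disagrees, so "5 hotels" is blocked once "6 hotels" arrives.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Fact:
    # Hypothetical normalized fact; the extraction step would need to emit
    # a subject/attribute key for this approach to work.
    subject: str
    attribute: str
    value: str
    stated_at: date
    blocked: bool = False


def apply_supersession(facts: list[Fact]) -> list[Fact]:
    """Block every active fact contradicted by a newer fact with the same key.

    Returns the facts that were newly blocked.
    """
    newest: dict[tuple[str, str], Fact] = {}
    for fact in facts:
        if fact.blocked:
            continue
        key = (fact.subject, fact.attribute)
        current = newest.get(key)
        # Strict comparison: on a date tie, the first fact seen wins.
        if current is None or fact.stated_at > current.stated_at:
            newest[key] = fact

    newly_blocked = []
    for fact in facts:
        if fact.blocked:
            continue
        winner = newest[(fact.subject, fact.attribute)]
        # Only block on disagreement; restating the same value is
        # corroboration, not supersession.
        if fact is not winner and fact.value != winner.value:
            fact.blocked = True
            newly_blocked.append(fact)
    return newly_blocked


old = Fact("city", "hotel_count", "5", date(2023, 6, 1))
new = Fact("city", "hotel_count", "6", date(2024, 6, 1))
print([f.value for f in apply_supersession([old, new])])  # ['5']
```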

Not urgent

The current v1 approach (dates in labels plus soft prompt instructions) is a reasonable starting point. This issue tracks improvements to make once we see how the system performs in practice.


🤖 Generated with Claude Code
