
Add BM25/keyword search as a retrieval strategy #10

@lightcone0


Context

Research on AI memory systems (March 2026) shows a near-linear correlation between retrieval strategy count and benchmark performance:

  • Hindsight (4 strategies: semantic, BM25, graph, temporal) → 91.4% LongMemEval
  • Zep (3 strategies: cosine, BM25, BFS) → 71.2%
  • Mem0 (1-2 strategies: vector, optional graph) → 49%

db0 currently has similarity + recency + popularity + graph expansion, but no BM25/keyword search.

The gap

Embedding search finds semantically similar content but misses exact mentions. A memory containing the literal string "API key" may not rank highly for that query in embedding space. BM25 catches these exact matches that cosine similarity misses.
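To illustrate why keyword scoring catches what embeddings miss, here is a minimal in-memory Okapi BM25 scorer. It is a sketch, not db0 code: the names `bm25Rank` and `Doc`, the simple lowercase/alphanumeric tokenizer, the Lucene-style IDF variant, and the defaults k1 = 1.2 and b = 0.75 are all assumptions chosen for illustration.

```typescript
// Illustrative Okapi BM25 scorer (hypothetical names; not db0's API).
type Doc = { id: string; text: string };

function tokenize(text: string): string[] {
  // Lowercase, then split on anything that is not an ASCII letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function bm25Rank(
  docs: Doc[],
  query: string,
  k1 = 1.2, // term-frequency saturation
  b = 0.75, // document-length normalization
): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const n = docs.length;
  const avgdl = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(n, 1);

  // Document frequency: how many documents contain each term at least once.
  const df = new Map<string, number>();
  for (const terms of tokenized) {
    for (const term of Array.from(new Set(terms))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const results = docs.map((doc, i) => {
    const terms = tokenized[i];
    const tf = new Map<string, number>();
    for (const t of terms) tf.set(t, (tf.get(t) ?? 0) + 1);

    let score = 0;
    for (const q of queryTerms) {
      const f = tf.get(q) ?? 0;
      if (f === 0) continue; // no exact term match, no credit
      const nq = df.get(q) ?? 0;
      // Lucene-style IDF, which stays positive even for very common terms.
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + (b * terms.length) / avgdl));
    }
    return { id: doc.id, score };
  });

  // Keep only documents with at least one matching term, best first.
  return results.filter((r) => r.score > 0).sort((x, y) => y.score - x.score);
}

const docs: Doc[] = [
  { id: "a", text: "Rotate the API key for the billing service" },
  { id: "b", text: "The billing service has an authentication credential" },
  { id: "c", text: "Notes on deployment" },
];
const ranked = bm25Rank(docs, "API key");
console.log(ranked.map((r) => r.id));
```

Document "b" is semantically close to the query but shares no query terms, so BM25 gives it no credit; only the literal mention in "a" scores. That is exactly the complementary signal an embedding ranking lacks.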

This is the single biggest retrieval improvement available without changing the storage model.

Proposed approach

  • Add BM25 scoring as an optional retrieval strategy alongside the existing hybrid scoring
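  • SQLite: use the FTS5 full-text search extension (available in sql.js), whose built-in bm25() ranking function scores matches
  • PostgreSQL: use built-in tsvector + tsquery full-text search; note that the built-in ranking functions (ts_rank, ts_rank_cd) are not BM25, so true BM25 scores would require an extension or application-side scoring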
  • Merge BM25 results with existing strategies via RRF (reciprocal rank fusion), which db0 already has in util/rrf.ts
  • Profile-configurable: some workloads (coding-assistant) would benefit more than others (curated-memory)
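The two backends could be queried roughly as follows. This is a sketch under stated assumptions: the table and column names (`memories`, `memories_fts`, `tsv`) and the helper names `toFts5Match` and `buildKeywordQuery` are hypothetical, and `tsv` is assumed to be a precomputed tsvector column. In SQLite, `bm25()` returns numerically smaller values for better matches, hence the ascending sort; in PostgreSQL, `websearch_to_tsquery` (PostgreSQL 11+) accepts raw user input without raising syntax errors.

```typescript
// Hypothetical query builders for the keyword strategy (not db0's API).
type Backend = "sqlite" | "postgres";
interface KeywordQuery {
  sql: string;
  params: (string | number)[];
}

// Quote each whitespace-separated token as an FTS5 string so characters such
// as '-', ':' or '*' are matched literally instead of parsed as query syntax.
// Inside an FTS5 string, a double quote is escaped by doubling it.
// Adjacent quoted strings are combined with FTS5's implicit AND.
// Callers should skip the keyword strategy when the result is empty.
function toFts5Match(query: string): string {
  return query
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => `"${t.replace(/"/g, '""')}"`)
    .join(" ");
}

function buildKeywordQuery(backend: Backend, query: string, limit: number): KeywordQuery {
  if (backend === "sqlite") {
    return {
      // Lower bm25() means a better match, so sort ascending.
      sql: `SELECT rowid, bm25(memories_fts) AS score
            FROM memories_fts
            WHERE memories_fts MATCH ?
            ORDER BY score
            LIMIT ?`,
      params: [toFts5Match(query), limit],
    };
  }
  return {
    // ts_rank_cd is PostgreSQL's cover-density ranking, not true BM25.
    sql: `SELECT id, ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) AS score
          FROM memories
          WHERE tsv @@ websearch_to_tsquery('english', $1)
          ORDER BY score DESC
          LIMIT $2`,
    params: [query, limit],
  };
}

console.log(buildKeywordQuery("sqlite", "API key", 20).params);
```

Either query yields a best-first list of ids, which is the only input a rank-fusion step needs.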
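Merging a BM25 ranking into the existing strategies could look like the weighted reciprocal rank fusion below, where each strategy contributes w / (k + rank) per document. This is a standalone sketch and does not reproduce the API of db0's util/rrf.ts. The per-strategy weights are a common RRF variant (plain RRF uses a weight of 1 everywhere), and the names `rrfFuse` and `PROFILE_WEIGHTS` as well as the weight values are hypothetical; k = 60 is the constant conventionally used with RRF.

```typescript
// Standalone weighted RRF sketch (hypothetical names; not util/rrf.ts).
type StrategyRanking = { name: string; ids: string[] }; // ids ordered best-first

function rrfFuse(
  rankings: StrategyRanking[],
  weights: Record<string, number> = {}, // missing strategies default to weight 1
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { name, ids } of rankings) {
    const w = weights[name] ?? 1;
    if (w === 0) continue; // strategy disabled for this profile
    ids.forEach((id, index) => {
      // Ranks are 1-based: the top result contributes w / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + w / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((x, y) => y.score - x.score);
}

// Hypothetical profile weights: BM25 counts more for coding-assistant
// workloads than for curated-memory ones.
const PROFILE_WEIGHTS: Record<string, Record<string, number>> = {
  "coding-assistant": { similarity: 1, bm25: 1.5 },
  "curated-memory": { similarity: 1, bm25: 0.5 },
};

const rankings: StrategyRanking[] = [
  { name: "similarity", ids: ["m2", "m1", "m3"] },
  { name: "bm25", ids: ["m1"] }, // only m1 contains the exact query terms
];
console.log(rrfFuse(rankings, PROFILE_WEIGHTS["coding-assistant"]).map((r) => r.id));
```

Because RRF uses only ranks, it sidesteps the fact that FTS5's bm25(), PostgreSQL's ts_rank_cd, and cosine similarity live on incomparable scales.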

Impact

Based on the research data, adding BM25 as a parallel strategy could close a significant portion of the gap on temporal and exact-recall queries.
