
Memory System

Enreign edited this page Mar 13, 2026 · 2 revisions


Sparks maintains persistent cross-session memory using a local embedding pipeline, an HNSW approximate nearest-neighbor index, and SQLite full-text search. No external embedding API is required — all vector generation happens locally via an ONNX model.


Overview

Task / conversation
  → Generate embedding (ONNX, local)
    → Deduplicate check (cosine threshold)
      → Store in SQLite + update HNSW index
        → At next task: retrieve top-K by similarity + recency weight
          → Inject into system prompt context

Embedding Model

  • Model: all-MiniLM-L6-v2
  • Format: ONNX
  • Vector dimensions: 384
  • Location: ~/.sparks/models/all-MiniLM-L6-v2/
  • External API required: No

Verify the model is present:

cargo run --quiet -- doctor --skip-llm | grep -i embed

If the directory is missing, memory and semantic search fail silently at runtime; running the doctor command is the way to surface the missing model.


Search: HNSW + FTS5

Sparks uses two complementary retrieval methods:

HNSW (Hierarchical Navigable Small World)

  • Approximate nearest-neighbor search over the 384-dim embedding space
  • Scales well as memory grows
  • Falls back to exact cosine search for small/early datasets (below a threshold)
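The exact-search fallback for small datasets amounts to a brute-force cosine scan over every stored vector. A minimal sketch of that idea (function names here are illustrative, not the actual `src/memory.rs` API):

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k search: score every stored vector, sort descending, truncate.
/// Fine at small scale; HNSW takes over once the store passes a threshold.
fn exact_top_k(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let store = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = exact_top_k(&[1.0, 0.0], &store, 2);
    assert_eq!(hits[0].0, 0); // the exact match ranks first
}
```

Brute force is O(n) per query, which is why it only makes sense below the dataset-size threshold.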

FTS5 (SQLite Full-Text Search)

  • Fast keyword-based retrieval
  • Useful for exact terms, identifiers, and structured content that embeddings may handle poorly

At retrieval time, results from both methods are merged and re-ranked.
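The fusion step can be pictured as a score merge keyed by memory id. This is a sketch under the assumption that both retrievers emit comparable scores and that the scores simply add; the actual weighting in `src/memory.rs` may differ:

```rust
use std::collections::HashMap;

/// Merge two ranked (id, score) lists by summing scores per id,
/// then re-rank descending. Additive fusion is an illustrative choice.
fn merge_rerank(vector_hits: &[(u64, f32)], keyword_hits: &[(u64, f32)]) -> Vec<(u64, f32)> {
    let mut combined: HashMap<u64, f32> = HashMap::new();
    for &(id, s) in vector_hits.iter().chain(keyword_hits.iter()) {
        *combined.entry(id).or_insert(0.0) += s;
    }
    let mut ranked: Vec<(u64, f32)> = combined.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    let vector_hits = [(1, 0.9), (2, 0.5)];
    let keyword_hits = [(2, 0.8), (3, 0.4)];
    let merged = merge_rerank(&vector_hits, &keyword_hits);
    assert_eq!(merged[0].0, 2); // appears in both lists, so it wins
}
```

A memory that matches both semantically and by keyword naturally rises to the top under any additive scheme.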


Recency Decay

Older memories lose relevance weight over time. The decay follows an exponential half-life:

weight = cosine_similarity × 2^(-age_days / half_life_days)

Default half_life_days = 30. This means a memory from 30 days ago has half the effective weight of an otherwise identical memory from today.
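The formula translates directly into code. A minimal sketch:

```rust
/// Effective retrieval weight: cosine similarity decayed by an
/// exponential half-life over the memory's age in days.
fn recency_weight(cosine_similarity: f64, age_days: f64, half_life_days: f64) -> f64 {
    cosine_similarity * 2f64.powf(-age_days / half_life_days)
}

fn main() {
    // A 30-day-old memory with the default half-life keeps half its weight.
    let w = recency_weight(0.8, 30.0, 30.0);
    assert!((w - 0.4).abs() < 1e-9);
}
```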

Configure in config.toml:

[memory]
half_life_days = 30

Deduplication

Before storing a new memory, Sparks checks cosine similarity against recent memories. If similarity exceeds dedup_threshold, the new memory is discarded as a duplicate.

[memory]
dedup_threshold = 0.92   # 0.0–1.0; higher = less aggressive deduplication
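The check itself reduces to a cosine comparison against recent embeddings. A sketch assuming recent memories are available as an in-memory slice (names are illustrative):

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// True if the candidate embedding is too close to any recent memory,
/// in which case the new memory is discarded as a duplicate.
fn is_duplicate(candidate: &[f32], recent: &[Vec<f32>], dedup_threshold: f32) -> bool {
    recent.iter().any(|m| cosine(candidate, m) > dedup_threshold)
}

fn main() {
    let recent = vec![vec![1.0, 0.0]];
    assert!(is_duplicate(&[0.999, 0.01], &recent, 0.92)); // near-identical: dropped
    assert!(!is_duplicate(&[0.0, 1.0], &recent, 0.92));   // orthogonal: stored
}
```

Because `any` short-circuits, the scan stops at the first recent memory that crosses the threshold.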

Injection into Context

At task start, Sparks retrieves the top-K most relevant memories and injects them into the ghost's system prompt. The number is configurable:

[memory]
max_results = 10

Memory injection counts against the task's context budget (default 128K tokens). If the context is tight, reduce max_results or trim soul file size.
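One way to picture the interaction between max_results and the context budget is greedy selection in weight order, stopping before the budget overflows. This is a hedged sketch; the token estimates, budget handling, and function names are assumptions, not the actual implementation:

```rust
/// Pick up to `max_results` memories in descending weight order,
/// stopping early if the estimated token budget would be exceeded.
fn select_for_context(
    mut scored: Vec<(String, f32, usize)>, // (text, weight, estimated tokens)
    max_results: usize,
    token_budget: usize,
) -> Vec<String> {
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut used = 0usize;
    let mut selected = Vec::new();
    for (text, _weight, tokens) in scored.into_iter().take(max_results) {
        if used + tokens > token_budget {
            break; // injecting this memory would blow the budget
        }
        used += tokens;
        selected.push(text);
    }
    selected
}

fn main() {
    let memories = vec![
        ("a".to_string(), 0.9, 60),
        ("b".to_string(), 0.8, 50),
        ("c".to_string(), 0.7, 40),
    ];
    let chosen = select_for_context(memories, 10, 100);
    assert_eq!(chosen, vec!["a".to_string()]); // "b" would overflow the budget
}
```

Under any scheme like this, lowering max_results or shrinking the soul file frees budget for task content.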


Configuration

[memory]
enabled         = true
half_life_days  = 30
dedup_threshold = 0.92
max_results     = 10

[embedding]
enabled   = true
model_dir = "~/.sparks/models/all-MiniLM-L6-v2"

Storage

All memories are persisted in SQLite (~/.sparks/sparks.db):

  • memories table: text content, embedding vector (BLOB), timestamp, source
  • HNSW index: rebuilt/updated incrementally on writes; exact search fallback below threshold
  • FTS5 virtual table: mirrors memory text for keyword search

Memory survives restarts. There is currently no manual memory management CLI — memories accumulate indefinitely, filtered by recency weight.


Diagnostics

# Check embedding model and memory pipeline health
cargo run --quiet -- doctor --skip-llm

# Check memory count (via observer)
cargo run --quiet -- observe

The doctor command checks:

  • Embedding model directory exists and is readable
  • SQLite DB is writable
  • HNSW index initializes without error

Relevant Source Files

  • src/memory.rs — MemoryStore, HNSW index, FTS5, deduplication
  • src/embeddings.rs — ONNX runtime, vector generation
  • docs/memory-hot-path-lru.md — hot path design and benchmark contract
