
Memory System

Enreign edited this page Mar 13, 2026 · 2 revisions


Sparks maintains persistent cross-session memory using a local embedding pipeline, an HNSW approximate nearest-neighbor index, and SQLite full-text search. No external embedding API is required — all vector generation happens locally via an ONNX model.


Overview

Task / conversation
  → Generate embedding (ONNX, local)
    → Deduplicate check (cosine threshold)
      → Store in SQLite + update HNSW index
        → At next task: retrieve top-K by similarity + recency weight
          → Inject into system prompt context

Embedding Model

  • Model: all-MiniLM-L6-v2
  • Format: ONNX
  • Vector dimensions: 384
  • Location: ~/.sparks/models/all-MiniLM-L6-v2/
  • External API required: No

Verify the model is present:

cargo run --quiet -- doctor --skip-llm | grep -i embed

If the directory is missing, memory and semantic search fail silently at runtime; running the doctor command is the way to surface the missing model.


Search: HNSW + FTS5

Sparks uses two complementary retrieval methods:

HNSW (Hierarchical Navigable Small World)

  • Approximate nearest-neighbor search over the 384-dim embedding space
  • Scales well as memory grows
  • Falls back to exact cosine search for small/early datasets (below a threshold)
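The exact-search fallback for small datasets amounts to a brute-force cosine scan over every stored vector. A minimal sketch of that idea (function names here are illustrative, not the actual `src/memory.rs` API):

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k search: score every stored vector, sort descending, truncate.
/// Fine at small scale; HNSW takes over once the store passes a threshold.
fn exact_top_k(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let store = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = exact_top_k(&[1.0, 0.0], &store, 2);
    assert_eq!(hits[0].0, 0); // the exact match ranks first
}
```

Brute force is O(n) per query, which is why it only makes sense below the dataset-size threshold.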

FTS5 (SQLite Full-Text Search)

  • Fast keyword-based retrieval
  • Useful for exact terms, identifiers, and structured content that embeddings may handle poorly

At retrieval time, results from both methods are merged and re-ranked.
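The fusion step can be pictured as a score merge keyed by memory id. This is a sketch under the assumption that both retrievers emit comparable scores and that the scores simply add; the actual weighting in `src/memory.rs` may differ:

```rust
use std::collections::HashMap;

/// Merge two ranked (id, score) lists by summing scores per id,
/// then re-rank descending. Additive fusion is an illustrative choice.
fn merge_rerank(vector_hits: &[(u64, f32)], keyword_hits: &[(u64, f32)]) -> Vec<(u64, f32)> {
    let mut combined: HashMap<u64, f32> = HashMap::new();
    for &(id, s) in vector_hits.iter().chain(keyword_hits.iter()) {
        *combined.entry(id).or_insert(0.0) += s;
    }
    let mut ranked: Vec<(u64, f32)> = combined.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    let vector_hits = [(1, 0.9), (2, 0.5)];
    let keyword_hits = [(2, 0.8), (3, 0.4)];
    let merged = merge_rerank(&vector_hits, &keyword_hits);
    assert_eq!(merged[0].0, 2); // appears in both lists, so it wins
}
```

A memory that matches both semantically and by keyword naturally rises to the top under any additive scheme.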


Recency Decay

Older memories lose relevance weight over time. The decay follows an exponential half-life:

weight = cosine_similarity × 2^(-age_days / half_life_days)

Default half_life_days = 30. This means a memory from 30 days ago has half the effective weight of an otherwise identical memory from today.
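The formula translates directly into code. A minimal sketch:

```rust
/// Effective retrieval weight: cosine similarity decayed by an
/// exponential half-life over the memory's age in days.
fn recency_weight(cosine_similarity: f64, age_days: f64, half_life_days: f64) -> f64 {
    cosine_similarity * 2f64.powf(-age_days / half_life_days)
}

fn main() {
    // A 30-day-old memory with the default half-life keeps half its weight.
    let w = recency_weight(0.8, 30.0, 30.0);
    assert!((w - 0.4).abs() < 1e-9);
}
```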

Configure in config.toml:

[memory]
half_life_days = 30

Deduplication

Before storing a new memory, Sparks checks cosine similarity against recent memories. If similarity exceeds dedup_threshold, the new memory is discarded as a duplicate.

[memory]
dedup_threshold = 0.92   # 0.0–1.0; higher = less aggressive deduplication
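The check itself reduces to a cosine comparison against recent embeddings. A sketch assuming recent memories are available as an in-memory slice (names are illustrative):

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// True if the candidate embedding is too close to any recent memory,
/// in which case the new memory is discarded as a duplicate.
fn is_duplicate(candidate: &[f32], recent: &[Vec<f32>], dedup_threshold: f32) -> bool {
    recent.iter().any(|m| cosine(candidate, m) > dedup_threshold)
}

fn main() {
    let recent = vec![vec![1.0, 0.0]];
    assert!(is_duplicate(&[0.999, 0.01], &recent, 0.92)); // near-identical: dropped
    assert!(!is_duplicate(&[0.0, 1.0], &recent, 0.92));   // orthogonal: stored
}
```

Because `any` short-circuits, the scan stops at the first recent memory that crosses the threshold.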

Injection into Context

At task start, Sparks retrieves the top-K most relevant memories and injects them into the ghost's system prompt. The number is configurable:

[memory]
max_results = 10

Memory injection counts against the task's context budget (default 128K tokens). If the context is tight, reduce max_results or trim soul file size.
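One way to picture the interaction between max_results and the context budget is greedy selection in weight order, stopping before the budget overflows. This is a hedged sketch; the token estimates, budget handling, and function names are assumptions, not the actual implementation:

```rust
/// Pick up to `max_results` memories in descending weight order,
/// stopping early if the estimated token budget would be exceeded.
fn select_for_context(
    mut scored: Vec<(String, f32, usize)>, // (text, weight, estimated tokens)
    max_results: usize,
    token_budget: usize,
) -> Vec<String> {
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut used = 0usize;
    let mut selected = Vec::new();
    for (text, _weight, tokens) in scored.into_iter().take(max_results) {
        if used + tokens > token_budget {
            break; // injecting this memory would blow the budget
        }
        used += tokens;
        selected.push(text);
    }
    selected
}

fn main() {
    let memories = vec![
        ("a".to_string(), 0.9, 60),
        ("b".to_string(), 0.8, 50),
        ("c".to_string(), 0.7, 40),
    ];
    let chosen = select_for_context(memories, 10, 100);
    assert_eq!(chosen, vec!["a".to_string()]); // "b" would overflow the budget
}
```

Under any scheme like this, lowering max_results or shrinking the soul file frees budget for task content.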


Configuration

[memory]
enabled         = true
half_life_days  = 30
dedup_threshold = 0.92
max_results     = 10

[embedding]
enabled   = true
model_dir = "~/.sparks/models/all-MiniLM-L6-v2"

Storage

All memories are persisted in SQLite (~/.sparks/sparks.db):

  • memories table: text content, embedding vector (BLOB), timestamp, source
  • HNSW index: rebuilt/updated incrementally on writes; exact search fallback below threshold
  • FTS5 virtual table: mirrors memory text for keyword search

Memory survives restarts. There is currently no manual memory management CLI — memories accumulate indefinitely, filtered by recency weight.


Diagnostics

# Check embedding model and memory pipeline health
cargo run --quiet -- doctor --skip-llm

# Check memory count (via observer)
cargo run --quiet -- observe

The doctor command checks:

  • Embedding model directory exists and is readable
  • SQLite DB is writable
  • HNSW index initializes without error

Relevant Source Files

  • src/memory.rs — MemoryStore, HNSW index, FTS5, deduplication
  • src/embeddings.rs — ONNX runtime, vector generation
  • docs/memory-hot-path-lru.md — hot path design and benchmark contract
