[BUG] EMC brute-force cosine search will block at scale #15

@OppaAI

Description

emc.py search() loads ALL episodes from SQLite and computes cosine similarity in a pure-Python loop on every conversation turn, so the cost of each search grows linearly with the number of stored episodes. This is fine at the current episode count but will cause significant response delays as episodes accumulate over months of use.

Steps to Reproduce

  1. Use GRACE daily for 6–12 months
  2. EMC accumulates ~10,000–50,000 episodes
  3. Every conversation turn spends roughly 0.5–2.5 s on EMC search alone
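
The slowdown can probably be reproduced without waiting months by filling a throwaway database with synthetic episodes. The sketch below is not the actual emc.py code: the `episodes` table layout, the float32 BLOB encoding, and `DIM = 64` are assumptions made for illustration, and absolute timings will differ on the Jetson.

```python
import math
import random
import sqlite3
import struct
import time

DIM = 64  # assumed embedding size; the real model's dimension may differ


def pack(vec):
    """Serialize a sequence of floats into a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): float32 BLOB -> tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity in pure Python (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_db(n, seed=0):
    """Create an in-memory database holding n random episodes (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE episodes (id INTEGER PRIMARY KEY, embedding BLOB)")
    rows = ((pack([rng.gauss(0, 1) for _ in range(DIM)]),) for _ in range(n))
    conn.executemany("INSERT INTO episodes (embedding) VALUES (?)", rows)
    return conn


def brute_force_search(conn, query, k=5):
    """Load every row and score it in a Python loop, as the issue describes."""
    scored = []
    for ep_id, blob in conn.execute("SELECT id, embedding FROM episodes"):
        scored.append((cosine(query, unpack(blob)), ep_id))
    scored.sort(reverse=True)
    return scored[:k]


def time_search(n):
    """Return the wall-clock seconds for one brute-force search over n episodes."""
    conn = build_db(n)
    query = [random.gauss(0, 1) for _ in range(DIM)]
    start = time.perf_counter()
    brute_force_search(conn, query)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(f"{n:>7} episodes: {time_search(n) * 1000:.1f} ms")
```

Running the script at a few values of n should show whether search time scales linearly with episode count on the target hardware.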

Expected Behavior

EMC search stays fast regardless of how many episodes are stored.

Actual Behavior

At scale:

  • 1,000 episodes: ~50ms (fine)
  • 10,000 episodes: ~500ms (noticeable)
  • 50,000 episodes: ~2.5s (unacceptable)
  • 100,000 episodes: ~5s+ (breaks conversation flow)
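
The figures above are linear in episode count, which works out to about 50 µs per episode. A small sketch of that arithmetic follows; the per-episode cost is derived from the 1,000-episode figure, and the 250 ms latency budget in the usage note is an assumed value, not a project requirement.

```python
PER_EPISODE_US = 50  # ~50 ms / 1,000 episodes, taken from the list above


def estimated_latency_ms(n_episodes):
    """Linear extrapolation of brute-force search time in milliseconds."""
    return n_episodes * PER_EPISODE_US / 1000


def max_episodes_within(budget_ms):
    """Largest episode count whose estimated search time fits the budget."""
    return int(budget_ms * 1000) // PER_EPISODE_US
```

For example, `max_episodes_within(250)` gives 5000, which lines up with the ~5,000-episode comfort limit mentioned under Additional Context.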

Error Logs

N/A — performance degradation, no crash

Environment

  • Hardware: Jetson Orin Nano Super 8GB
  • OS: Ubuntu 22.04 / JetPack 6.2.2
  • ROS2: Humble
  • Python: 3.10
  • Package: scs
  • Node: emc

Affected Files

emc.py search() method — loads all rows, pure Python cosine loop

Possible Fix

Option A — sqlite-vec extension (recommended for Jetson):

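# Lightweight SQLite vector search extension
# No separate process, stays inside SQLite
import sqlite3
import sqlite_vec

conn = sqlite3.connect("emc.db")  # placeholder path; use the existing EMC connection
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)  # re-disable extension loading once loaded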

Option B — FAISS index:

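import faiss
# EMBEDDING_DIM: dimension of the stored embeddings (not defined in this snippet)
index = faiss.IndexFlatIP(EMBEDDING_DIM)
# Inner product equals cosine similarity only for L2-normalized vectors,
# so normalize embeddings (e.g. faiss.normalize_L2) before add() and search()
# Rebuild index periodically from episodes table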

Option C — Date-scoped search (quick win):

# Only search recent episodes first, fall back to full search
# Most relevant memories are usually recent
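
One way Option C could look, as a minimal stdlib-only sketch. The `created_at` column (Unix seconds), the float32 BLOB encoding, and the `window_days` and `min_score` defaults are all assumptions for illustration; the real episodes schema in emc.py may differ.

```python
import math
import sqlite3
import struct
import time

SECONDS_PER_DAY = 86_400


def unpack(blob):
    """Decode a float32 BLOB into a tuple of floats (assumed storage format)."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity in pure Python (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def _rank(rows, query, k):
    """Score (id, blob) rows against the query and keep the k best."""
    scored = [(cosine(query, unpack(blob)), ep_id) for ep_id, blob in rows]
    scored.sort(reverse=True)
    return scored[:k]


def scoped_search(conn, query, k=5, window_days=30, min_score=0.5, now=None):
    """Search recent episodes first; fall back to a full scan if they match poorly.

    Assumes a hypothetical table episodes(id, created_at REAL, embedding BLOB).
    An index on created_at keeps the windowed query cheap.
    """
    now = time.time() if now is None else now
    cutoff = now - window_days * SECONDS_PER_DAY
    recent = conn.execute(
        "SELECT id, embedding FROM episodes WHERE created_at >= ?", (cutoff,)
    )
    hits = _rank(recent, query, k)
    if len(hits) == k and hits[0][0] >= min_score:
        return hits
    # Too few recent episodes or only weak matches: scan the whole table.
    everything = conn.execute("SELECT id, embedding FROM episodes")
    return _rank(everything, query, k)
```

The fallback keeps the worst case identical to today's full scan, so this only helps when recent episodes actually contain a good match; it is a stopgap rather than a replacement for Options A or B.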

Additional Context

Not urgent for M1 — brute force is fine up to ~5,000 episodes (~3 months of daily use). Plan FAISS or sqlite-vec for M2. The 1TB NVMe means storage is not a concern — only search speed matters.

Metadata

Labels: bug
Status: Todo