zer0dex

A token-efficient memory architecture for persistent AI agents.

zer0dex combines a compressed human-readable index with a local vector store to give AI agents long-term memory that outperforms both flat files and traditional RAG.

Results (n=97)

Architecture	Recall	≥75% Pass Rate	Cross-Reference
Flat File Only	52.2%	41%	70.0%
Full RAG	80.3%	77%	37.5%
zer0dex	91.2%	87%	80.0%

Zero losses in head-to-head comparison. 70ms latency. $0/month (fully local).

The Problem

Persistent AI agents need memory across sessions. Current approaches all have gaps:

Flat files (MEMORY.md, etc.) — cheap on tokens but poor retrieval. The agent has to "know what it knows."
Full RAG (vector store only) — good retrieval but poor cross-referencing. Facts are "floating" with no structure.
MemGPT/Letta — powerful but complex. The LLM manages its own memory paging, adding latency and cost.

The Architecture

zer0dex uses two layers:

Layer 1: Compressed Index (MEMORY.md)

A human-readable markdown file (~3KB) that acts as a semantic table of contents. It tells the agent what categories of knowledge exist without storing full details.

## Products
- **Little Canary** — prompt injection detection, 99% on TensorTrust
- mem0 topics: Little Canary, prompt injection, HERM

This index is always in context (~782 tokens). It provides navigational scaffolding — the agent knows where knowledge lives.

Layer 2: Vector Store (mem0 + chromadb)

A semantic fact store containing extracted details from daily logs, conversations, and curated notes. Queried via embedding similarity on every inbound message.

The Hook: Automatic Pre-Message Injection

A lightweight HTTP server keeps the vector store warm in memory. Every inbound message triggers a semantic query (70ms), and the top 5 relevant memories are injected into the agent's context automatically. No judgment call. No forgetting.

User: "How's the Suy Sideguy deployment going?"
    ↓
Hook queries mem0 → finds: "Suy: watching PID 76977, qwen3:4b judge"
    ↓
Agent sees: [message] + [Suy context from mem0] + [MEMORY.md index]
    ↓
Agent responds with specific details it otherwise wouldn't have

Why It Works

The compressed index solves the cross-reference problem. When a user asks "How does X relate to Y?", vector similarity finds X or Y, rarely both. But the compressed index contains cross-domain pointers — it's like a hyperlinked table of contents. The LLM uses the index to bridge domains that vector similarity alone can't connect.

Query Type	Full RAG	zer0dex	Gap
Direct recall	84.3%	92.1%	+7.8pp
Cross-reference	37.5%	80.0%	+42.5pp
Negative (rejection)	100%	100%	0pp

Quick Start

Requirements

Python 3.11+
Ollama with nomic-embed-text and mistral:7b
8GB+ RAM

Install

pip install mem0ai chromadb

# Pull embedding model
ollama pull nomic-embed-text

# Pull extraction model
ollama pull mistral:7b

1. Create Your Compressed Index

Create a MEMORY.md file — a compressed summary of what your agent knows:

# Agent Memory Index

## User
- Name: Alice
- Role: ML Engineer
- Topics in mem0: Alice, preferences, projects

## Projects
- **ProjectX** — NLP pipeline, deadline March 15
- Topics in mem0: ProjectX, NLP, deadlines

2. Seed the Vector Store

python zer0dex/seed.py --source MEMORY.md --source memory/

3. Start the Memory Server

python zer0dex/server.py --port 18420

4. Integrate with Your Agent

The server exposes a simple HTTP API:

# Query
curl -X POST http://localhost:18420/query \
  -H "Content-Type: application/json" \
  -d '{"text": "What is ProjectX?", "limit": 5}'

# Add memory
curl -X POST http://localhost:18420/add \
  -H "Content-Type: application/json" \
  -d '{"text": "ProjectX deadline moved to March 20"}'

# Health check
curl http://localhost:18420/health

For automatic injection, add a pre-message hook in your agent framework that queries the server before every LLM call.

Evaluation

Run the evaluation suite yourself:

python eval/evaluate.py --memories 86 --test-cases 97

See eval/README.md for methodology and detailed results.

Architecture Diagram

┌─────────────────────────────────────────────────┐
│                  Agent Context                   │
│                                                  │
│  ┌──────────────┐    ┌───────────────────────┐   │
│  │  MEMORY.md   │    │   mem0 Results (auto)  │  │
│  │  (compressed  │    │   [injected by hook]   │  │
│  │   index)      │    │                        │  │
│  │  ~782 tokens  │    │   ~104 tokens          │  │
│  └──────────────┘    └───────────────────────┘   │
│         ↑                       ↑                │
│    Always loaded          Pre-message hook        │
│    at session start       queries on every msg    │
└─────────────────────────────────────────────────┘
                              ↑
                    ┌─────────────────┐
                    │  mem0 HTTP Server │
                    │  (port 18420)     │
                    │  70ms avg query   │
                    └─────────────────┘
                              ↑
                    ┌─────────────────┐
                    │    chromadb      │
                    │  (local vector   │
                    │   store)         │
                    └─────────────────┘

How It Compares

Feature	Flat Files	Full RAG	MemGPT/Letta	zer0dex
Recall	52%	80%	~80%*	91%
Cross-reference	70%	38%	Unknown	80%
Latency	0ms	~15ms	~500ms+	70ms
Token overhead	782	104	Variable	886
Cost/month	$0	$0-50	$0-50	$0
Complexity	Trivial	Moderate	High	Low
Auto-retrieval	No	On-demand	LLM-managed	Every message

*MemGPT recall estimated; no published head-to-head comparison available.

Citation

@misc{bosch2026zer0dex,
  title={zer0dex: Dual-Layer Memory Architecture for Persistent AI Agents},
  author={Bosch, Rolando},
  year={2026},
  url={https://github.com/roli-lpci/zer0dex}
}

License

Apache 2.0

Credits

Built by Hermes Labs as part of LPCI (Linguistically Persistent Cognitive Interface) research.

Uses mem0 for vector storage, chromadb for the vector database, and Ollama for local embeddings/extraction.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
eval		eval
src/zer0dex		src/zer0dex
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
llms.txt		llms.txt
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

zer0dex

Results (n=97)

The Problem

The Architecture

Layer 1: Compressed Index (MEMORY.md)

Layer 2: Vector Store (mem0 + chromadb)

The Hook: Automatic Pre-Message Injection

Why It Works

Quick Start

Requirements

Install

1. Create Your Compressed Index

2. Seed the Vector Store

3. Start the Memory Server

4. Integrate with Your Agent

Evaluation

Architecture Diagram

How It Compares

Citation

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

zer0dex

Results (n=97)

The Problem

The Architecture

Layer 1: Compressed Index (MEMORY.md)

Layer 2: Vector Store (mem0 + chromadb)

The Hook: Automatic Pre-Message Injection

Why It Works

Quick Start

Requirements

Install

1. Create Your Compressed Index

2. Seed the Vector Store

3. Start the Memory Server

4. Integrate with Your Agent

Evaluation

Architecture Diagram

How It Compares

Citation

License

Credits

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages