
RLM - Infinite Memory for Claude Code

Your Claude Code sessions forget everything after /compact. RLM fixes that.


Français | English | 日本語


The Problem

Claude Code has a context window limit. When it fills up:

  • /compact wipes your conversation history
  • Previous decisions, insights, and context are lost
  • You repeat yourself. Claude makes the same mistakes. Productivity drops.

The Solution

RLM is an MCP server that gives Claude Code persistent memory across sessions:

You: "Remember that the client prefers 500ml bottles"
     → Saved. Forever. Across all sessions.

You: "What did we decide about the API architecture?"
     → Claude searches its memory and finds the answer.

3 lines to install. 14 tools. Zero configuration.


Quick Install

Requirements: Python 3.10+ (download), Claude Code CLI

Via PyPI (recommended)

pip install mcp-rlm-server[all]

Via uv (fast, no global pollution)

uv tool install mcp-rlm-server[all] --python 3.12

Via Git (full install with hooks)

git clone https://github.com/EncrEor/rlm-claude.git
cd rlm-claude
./install.sh

Via Docker

docker build -t rlm-server .
# Or pull from registry (when published):
# docker pull ghcr.io/encreor/rlm-claude

Then configure Claude Code to use the Docker container (see Docker setup below).

Restart Claude Code. Done.

Upgrading from v0.9.0 or earlier

v0.9.1 moved the source code from mcp_server/ to src/mcp_server/ (PyPA best practice). A compatibility symlink is included so existing installations keep working, but we recommend re-running the installer:

cd rlm-claude
git pull
./install.sh          # reconfigures the MCP server path

Your data (~/.claude/rlm/) is untouched. Only the server path is updated.


How It Works

                    ┌─────────────────────────┐
                    │     Claude Code CLI     │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     RLM MCP Server      │
                    │       (14 tools)        │
                    └────────────┬────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
    ┌─────────▼────────┐  ┌──────▼──────┐ ┌─────────▼──────────┐
    │     Insights     │  │   Chunks    │ │     Retention      │
    │ (key decisions,  │  │ (full conv  │ │ (auto-archive,     │
    │  facts, prefs)   │  │  history)   │ │  restore, purge)   │
    └──────────────────┘  └─────────────┘ └────────────────────┘

Auto-Save Before Context Loss

RLM hooks into Claude Code's /compact event. Before your context is wiped, RLM automatically saves a snapshot. No action needed.
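Conceptually, a PreCompact hook is a small script that receives the event payload as JSON on stdin and writes a snapshot before the wipe. A minimal sketch of that flow (field names and paths are illustrative, not the shipped pre_compact_chunk.py):

```python
"""Minimal sketch of a PreCompact hook body. Claude Code passes the hook
event as JSON on stdin; a real hook would call
save_snapshot(json.load(sys.stdin), ...). Field names are illustrative."""
import json
from datetime import date
from pathlib import Path

def save_snapshot(payload: dict, out_dir: Path) -> Path:
    """Persist the event payload as a snapshot file before compaction."""
    out_dir.mkdir(parents=True, exist_ok=True)
    trigger = payload.get("trigger", "manual")  # "manual" or "auto" compact
    path = out_dir / f"{date.today().isoformat()}_precompact_{trigger}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```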

Two Memory Systems

System     What it stores                       How to use
Insights   Key decisions, facts, preferences    rlm_remember() / rlm_recall()
Chunks     Full conversation segments           rlm_chunk() / rlm_peek() / rlm_grep()

Features

Memory & Insights

  • rlm_remember - Save decisions, facts, preferences with categories and importance levels
  • rlm_recall - Search insights by keyword (multi-word tokenized), category, or importance
  • rlm_forget - Remove an insight
  • rlm_status - System overview (insight count, chunk stats, access metrics)

Conversation History

  • rlm_chunk - Save conversation segments with typed categorization (snapshot, session, debug; insight redirects to rlm_remember)
  • rlm_peek - Read a chunk (full or partial by line range)
  • rlm_grep - Regex search across all chunks (+ fuzzy matching for typo tolerance)
  • rlm_search - Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized, chunks + insights)
  • rlm_list_chunks - List all chunks with metadata
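As a rough illustration of the fuzzy mode: typo tolerance can be implemented by falling back to character-level similarity when the regex misses. A sketch of the idea (not RLM's actual implementation, and the threshold is an assumption):

```python
import re
from difflib import SequenceMatcher

def fuzzy_grep(pattern: str, text: str, threshold: float = 0.8) -> list[str]:
    """Return lines matching the regex, or containing a word close enough
    to the pattern (typo tolerance). Illustrative sketch only."""
    matches = []
    rx = re.compile(pattern, re.IGNORECASE)
    for line in text.splitlines():
        if rx.search(line):
            matches.append(line)
            continue
        # Fuzzy fallback: compare the pattern against each word in the line.
        if any(SequenceMatcher(None, pattern.lower(), w.lower()).ratio() >= threshold
               for w in line.split()):
            matches.append(line)
    return matches
```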

Multi-Project Organization

  • rlm_sessions - Browse sessions by project or domain
  • rlm_domains - List available domains for categorization
  • Auto-detection of project from git or working directory
  • Cross-project filtering on all search tools

Smart Retention

  • rlm_retention_preview - Preview what would be archived (dry-run)
  • rlm_retention_run - Archive old unused chunks, purge ancient ones
  • rlm_restore - Bring back archived chunks
  • 3-zone lifecycle: Active → Archive (.gz) → Purge
  • Immunity system: critical tags, frequent access, and keywords protect chunks
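The lifecycle decision can be pictured as a simple classifier over chunk metadata. The thresholds and immunity rules below are illustrative placeholders, not RLM's actual defaults:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ChunkMeta:
    chunk_id: str
    last_access: date
    access_count: int
    tags: list[str]

# Illustrative thresholds — not RLM's shipped configuration.
ARCHIVE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=180)
IMMUNE_TAGS = {"critical"}
FREQUENT_ACCESS = 5

def retention_decision(meta: ChunkMeta, today: date) -> str:
    """Classify a chunk into the 3-zone lifecycle: active → archive → purge."""
    if set(meta.tags) & IMMUNE_TAGS or meta.access_count >= FREQUENT_ACCESS:
        return "active"  # immunity: critical tags or frequent access
    age = today - meta.last_access
    if age >= PURGE_AFTER:
        return "purge"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "active"
```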

Auto-Chunking & Memory Routing (Hooks)

  • PreCompact hook: Automatic snapshot before /compact or auto-compact
  • PostToolUse hook (rlm_chunk): Stats tracking after chunk operations
  • PostToolUse hook (Write/Edit): Detects writes to Claude Code's auto-memory and nudges toward RLM for decisions, insights, and session logs
  • User-driven philosophy: you decide when to chunk, the system saves before loss

Semantic Search (optional)

  • Hybrid BM25 + cosine - Combines keyword matching with vector similarity for better relevance
  • Auto-embedding - New chunks are automatically embedded at creation time
  • Two providers - Model2Vec (fast, 256d) or FastEmbed (accurate, 384d)
  • Graceful degradation - Falls back to pure BM25 when semantic deps are not installed
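One common way to fuse the two score sets is min-max normalization followed by a weighted sum. This sketch shows that scheme; RLM's exact weighting and normalization may differ:

```python
def hybrid_scores(bm25: dict[str, float], cosine: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    """Fuse BM25 and cosine-similarity scores. Illustrative fusion scheme,
    not necessarily RLM's exact formula."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {k: (v - lo) / span for k, v in scores.items()}
    b, c = normalize(bm25), normalize(cosine)
    # Documents missing from one ranking contribute 0 from that side.
    return {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * c.get(doc, 0.0)
            for doc in set(b) | set(c)}
```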

Provider comparison (benchmark on 108 chunks)

                   Model2Vec (default)         FastEmbed
Model              potion-multilingual-128M    paraphrase-multilingual-MiniLM-L12-v2
Dimensions         256                         384
Embed 108 chunks   0.06s                       1.30s
Search latency     0.1ms/query                 1.5ms/query
Memory             0.1 MB                      0.3 MB
Disk (model)       ~35 MB                      ~230 MB
Semantic quality   Good (keyword-biased)       Better (true semantic)
Speed              21x faster                  Baseline

Top-5 result overlap between providers: ~1.6/5 (different results in 7/8 queries). FastEmbed captures more semantic meaning while Model2Vec leans toward keyword similarity. The hybrid BM25 + cosine fusion compensates for both weaknesses.

Recommendation: Start with Model2Vec (default). Switch to FastEmbed only if you need better semantic accuracy and can afford the slower startup.

# Model2Vec (default) — fast, ~35 MB
pip install mcp-rlm-server[semantic]

# FastEmbed — more accurate, ~230 MB, slower
pip install mcp-rlm-server[semantic-fastembed]
export RLM_EMBEDDING_PROVIDER=fastembed

# Compare both providers on your data
python3 scripts/benchmark_providers.py

# Backfill existing chunks (run once after install)
python3 scripts/backfill_embeddings.py

Sub-Agent Skills

  • /rlm-analyze - Analyze a single chunk with an isolated sub-agent
  • /rlm-parallel - Analyze multiple chunks in parallel (Map-Reduce pattern from MIT RLM paper)

Comparison

Feature                           Raw Context   Letta/MemGPT       RLM
Persistent memory                 No            Yes                Yes
Works with Claude Code            N/A           No (own runtime)   Native MCP
Auto-save before compact          No            N/A                Yes (hooks)
Search (regex + BM25 + semantic)  No            Basic              Yes
Fuzzy search (typo-tolerant)      No            No                 Yes
Multi-project support             No            No                 Yes
Smart retention (archive/purge)   No            Basic              Yes
Sub-agent analysis                No            No                 Yes
Zero config install               N/A           Complex            3 lines
FR/EN/JA support                  N/A           EN only            3 languages
Cost                              Free          Self-hosted        Free

Usage Examples

Session startup (recommended)

# Load universal rules (apply regardless of topic)
rlm_recall(importance="critical")

# Load context for current topic
rlm_recall(query="deployment")

# Check memory status
rlm_status()

Save and recall insights

# Save a universal rule (loaded every session)
rlm_remember("Always deploy LOCAL → VPS, never direct",
             category="decision", importance="critical",
             tags="deploy,workflow")

# Save a topic-specific insight
rlm_remember("WeasyPrint requires inline CSS for PDF rendering",
             category="finding", importance="high",
             tags="weasyprint,pdf")

# Find insights later
rlm_recall(query="weasyprint")
rlm_recall(category="decision")
rlm_recall(importance="critical")    # all universal rules

Importance levels

Level      When to use                                    Loaded
critical   Universal rules (apply regardless of topic)    Every session
high       Topic-specific rules                           When working on that topic
medium     Useful info, not blocking                      On explicit search

Test: "Does this rule apply even when working on a completely different topic?" If yes → critical.

Manage conversation history

# Save important discussion (typed)
rlm_chunk("Discussion about API redesign... [long content]",
          summary="API v2 architecture decisions",
          tags="api,architecture",
          chunk_type="session")        # or "snapshot", "debug"

# Search across all history
rlm_search("API architecture decisions")      # BM25 ranked
rlm_grep("authentication", fuzzy=True)         # Typo-tolerant

# Read a specific chunk
rlm_peek("2026-01-18_MyProject_001")

Multi-project organization

# Filter by project
rlm_search("deployment issues", project="MyApp")
rlm_grep("database", project="MyApp", domain="infra")

# Browse sessions
rlm_sessions(project="MyApp")

Project Structure

rlm-claude/
├── src/mcp_server/
│   ├── server.py              # MCP server (14 tools)
│   └── tools/
│       ├── memory.py          # Insights (remember/recall/forget)
│       ├── navigation.py      # Chunks (chunk/peek/grep/list)
│       ├── search.py          # BM25 search engine
│       ├── tokenizer_fr.py    # FR/EN tokenization
│       ├── sessions.py        # Multi-session management
│       ├── retention.py       # Archive/restore/purge lifecycle
│       ├── embeddings.py      # Embedding providers (Model2Vec, FastEmbed)
│       ├── vecstore.py        # Vector store (.npz) for semantic search
│       └── fileutil.py        # Safe I/O (atomic writes, path validation, locking)
│
├── hooks/                     # Claude Code hooks
│   ├── i18n.py                # Translations (EN/FR/JA) for hook messages
│   ├── pre_compact_chunk.py   # Auto-save before /compact (PreCompact hook)
│   ├── memory_write_redirect.py # Redirect auto-memory writes to RLM (PostToolUse hook)
│   └── reset_chunk_counter.py # Stats reset after chunk (PostToolUse hook)
│
├── templates/
│   ├── hooks_settings.json    # Hook config template
│   ├── CLAUDE_RLM_SNIPPET.md  # CLAUDE.md instructions
│   └── skills/                # Sub-agent skills
│
├── context/                   # Storage (created at install, git-ignored)
│   ├── session_memory.json    # Insights
│   ├── index.json             # Chunk index
│   ├── chunks/                # Conversation history
│   ├── archive/               # Compressed archives (.gz)
│   ├── embeddings.npz         # Semantic vectors (Phase 8)
│   └── sessions.json          # Session index
│
├── install.sh                 # One-command installer
└── README.md

Configuration

Hook Configuration

The installer automatically configures hooks in ~/.claude/settings.json:

{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "manual",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      },
      {
        "matcher": "auto",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      }
    ],
    "PostToolUse": [{
      "matcher": "mcp__rlm-server__rlm_chunk",
      "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/reset_chunk_counter.py" }]
    }]
  }
}

Language

Hook messages default to English. Set RLM_LANG=fr for French or RLM_LANG=ja for Japanese:

# Option 1: Set globally in your shell profile (~/.zshrc, ~/.bashrc)
export RLM_LANG=fr   # or ja

# Option 2: Set per-hook in ~/.claude/settings.json
# Replace the command with:
"command": "RLM_LANG=fr python3 ~/.claude/rlm/hooks/pre_compact_chunk.py"

Supported languages: en (default), fr, ja.
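Internally, message selection is just an environment-variable lookup with an English fallback. A sketch of the idea (the message table here is hypothetical; the real strings live in hooks/i18n.py):

```python
import os

# Hypothetical message table — the actual strings live in hooks/i18n.py.
MESSAGES = {
    "snapshot_saved": {
        "en": "Snapshot saved before compact",
        "fr": "Snapshot sauvegardé avant compactage",
        "ja": "コンパクト前にスナップショットを保存しました",
    },
}

def t(key: str) -> str:
    """Pick a hook message based on RLM_LANG, defaulting to English."""
    lang = os.environ.get("RLM_LANG", "en")
    entry = MESSAGES[key]
    return entry.get(lang, entry["en"])  # unknown languages fall back to en
```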

Storage Directory

RLM stores data in ~/.claude/rlm/context/ by default. Override with RLM_CONTEXT_DIR:

export RLM_CONTEXT_DIR=/path/to/custom/storage

This is particularly useful for Docker deployments (see below).
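Resolution order is simple: the environment variable wins, otherwise the default path is used. A sketch of that override logic:

```python
import os
from pathlib import Path

def context_dir() -> Path:
    """Resolve the storage directory: RLM_CONTEXT_DIR wins if set, otherwise
    the default under ~/.claude/rlm. Sketch of the override behavior."""
    override = os.environ.get("RLM_CONTEXT_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".claude" / "rlm" / "context"
```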

Custom Domains

Organize chunks by topic with custom domains:

{
  "domains": {
    "my_project": {
      "description": "Domains for my project",
      "list": ["feature", "bugfix", "infra", "docs"]
    }
  }
}

Edit context/domains.json after installation.


Manual Installation

Via pip

pip install -e ".[all]"
claude mcp add rlm-server -- python3 -m mcp_server

Via uv

uv tool install mcp-rlm-server[all] --python 3.12
claude mcp add rlm-server -- ~/.local/bin/mcp-rlm-server

Hook Setup (required for pip and uv installs)

The ./install.sh script handles this automatically. For manual installs:

# Get hook scripts from the repo
git clone https://github.com/EncrEor/rlm-claude.git /tmp/rlm-setup

# Install hooks and i18n
mkdir -p ~/.claude/rlm/hooks
cp /tmp/rlm-setup/hooks/pre_compact_chunk.py ~/.claude/rlm/hooks/
cp /tmp/rlm-setup/hooks/reset_chunk_counter.py ~/.claude/rlm/hooks/
cp /tmp/rlm-setup/hooks/memory_write_redirect.py ~/.claude/rlm/hooks/
cp /tmp/rlm-setup/hooks/i18n.py ~/.claude/rlm/hooks/
chmod +x ~/.claude/rlm/hooks/*.py

# Install skills (optional)
mkdir -p ~/.claude/skills/rlm-analyze ~/.claude/skills/rlm-parallel
cp /tmp/rlm-setup/templates/skills/rlm-analyze/skill.md ~/.claude/skills/rlm-analyze/
cp /tmp/rlm-setup/templates/skills/rlm-parallel/skill.md ~/.claude/skills/rlm-parallel/

# Cleanup
rm -rf /tmp/rlm-setup

Then configure hooks in ~/.claude/settings.json (see Hook Configuration above).

Docker Setup

Build the image:

git clone https://github.com/EncrEor/rlm-claude.git
cd rlm-claude
docker build -t rlm-server .

Configure Claude Code MCP to use Docker:

claude mcp add rlm-server -- docker run -i --rm -v ~/.claude/rlm/context:/data rlm-server

Or manually in ~/.claude/settings.json:

{
  "mcpServers": {
    "rlm-server": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "~/.claude/rlm/context:/data", "rlm-server"]
    }
  }
}

The Docker image uses RLM_CONTEXT_DIR=/data internally, and the volume mount maps it to your local storage.

Uninstall

./uninstall.sh              # Interactive (choose to keep or delete data)
./uninstall.sh --keep-data  # Remove RLM config, keep your chunks/insights
./uninstall.sh --all        # Remove everything
./uninstall.sh --dry-run    # Preview what would be removed

Security

RLM includes built-in protections for safe operation:

  • Path traversal prevention - Chunk IDs are validated against a strict allowlist ([a-zA-Z0-9_.-]), and resolved paths are verified to stay within the storage directory
  • Atomic writes - All JSON and chunk files are written using write-to-temp-then-rename, preventing corruption from interrupted writes or crashes
  • File locking - Concurrent read-modify-write operations on shared indexes use fcntl.flock exclusive locks
  • Content size limits - Chunks are limited to 2 MB, and gzip decompression (archive restore) is capped at 10 MB to prevent resource exhaustion
  • SHA-256 hashing - Content deduplication uses SHA-256 (not MD5)
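The path-traversal check combines both defenses from the list above: reject IDs outside the allowlist, then confirm the resolved path never escapes the storage root. A sketch (the regex and file extension are assumptions, not RLM's exact code):

```python
import re
from pathlib import Path

CHUNK_ID_RE = re.compile(r"^[a-zA-Z0-9_.-]+$")  # assumed allowlist

def safe_chunk_path(chunk_id: str, storage: Path) -> Path:
    """Validate a chunk ID and confirm the resolved path stays inside the
    storage directory. Illustrative sketch of the checks described above."""
    if not CHUNK_ID_RE.match(chunk_id):
        raise ValueError(f"invalid chunk id: {chunk_id!r}")
    path = (storage / f"{chunk_id}.md").resolve()
    if not path.is_relative_to(storage.resolve()):
        raise ValueError("path escapes storage directory")
    return path
```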

All I/O safety primitives are centralized in src/mcp_server/tools/fileutil.py.
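The two write-safety primitives look roughly like this (POSIX-only because of fcntl; a sketch of the patterns, not fileutil.py verbatim):

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON via write-to-temp-then-rename, so an interrupted write
    can never leave a half-written file behind."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise

def locked_update(path: Path, update) -> dict:
    """Read-modify-write a shared JSON index under an exclusive flock."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            data = update(json.load(f))
            f.seek(0)
            f.truncate()
            json.dump(data, f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return data
```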


Troubleshooting

"MCP server not found"

claude mcp list                    # Check servers
claude mcp remove rlm-server       # Remove if exists
claude mcp add rlm-server -- python3 -m mcp_server

"Hooks not working"

grep -A 10 "PreCompact" ~/.claude/settings.json   # Verify hooks config
ls ~/.claude/rlm/hooks/                           # Check installed hooks

Roadmap

  • Phase 1: Memory tools (remember/recall/forget/status)
  • Phase 2: Navigation tools (chunk/peek/grep/list)
  • Phase 3: Auto-chunking + sub-agent skills
  • Phase 4: Production (auto-summary, dedup, access tracking)
  • Phase 5: Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
  • Phase 6: Production-ready (tests, CI/CD, PyPI)
  • Phase 7: MAGMA-inspired (temporal filtering, entity extraction)
  • Phase 8: Hybrid semantic search (BM25 + cosine, Model2Vec)
  • Phase 9: Typed chunking — chunk_type parameter (snapshot/session/debug/insight redirect)
  • Phase 10: Auto-memory/RLM cohabitation — Write/Edit hook redirects auto-memory to RLM + Japanese i18n

Inspired By

Research Papers

  • RLM Paper (MIT CSAIL) - Zhang et al., Dec 2025 - "Recursive Language Models" — foundational architecture (chunk/peek/grep, sub-agent analysis)
  • MAGMA (arXiv:2601.03236) - Jan 2026 - "Memory-Augmented Generation with Memory Agents" — temporal filtering, entity extraction (Phase 7)

Libraries & Tools

  • Model2Vec - Static word embeddings for fast semantic search (Phase 8)
  • BM25S - Fast BM25 implementation in pure Python (Phase 5)
  • FastEmbed - ONNX-based embeddings, optional provider (Phase 8)
  • Letta/MemGPT - AI agent memory framework — early inspiration

Contributing

Translations

The repository is maintained in English. User-facing files can be translated to your language:

File                             Purpose                  Translations welcome
README.md                        Main documentation       README.xx.md (e.g., README.fr.md, README.ja.md)
templates/CLAUDE_RLM_SNIPPET.md  CLAUDE.md instructions   CLAUDE_RLM_SNIPPET.xx.md

Code, comments, and commit messages stay in English.


Authors

  • Ahmed MAKNI (@EncrEor)
  • Claude Opus 4.6 (joint R&D)

License

MIT License - see LICENSE
