Engram

Self-correcting memory for LLM agents. A Claude Code plugin that learns from sessions, surfaces relevant memories, measures whether they're actually followed, and fixes the ones that aren't.

The problem

Claude Code has several instruction sources — CLAUDE.md, rules, skills — but no way to know if they're working. An instruction that's always loaded but never followed wastes context budget. A great instruction that only matches narrow keywords goes unseen. Without measurement, instruction sets decay: duplicates accumulate, stale guidance persists, and useful patterns stay buried.

How engram solves it

Engram hooks into every phase of a Claude Code session to create a closed feedback loop:

  Learn ──→ Surface ──→ Maintain
    ↑                      │
    └──────────────────────┘
  1. Learn — Extracts structured memories from user corrections, instructions, and contextual facts. Each memory is a TOML file with tier-based metadata: title, content, principle, anti-pattern, keywords, concepts, and confidence tier (A/B/C).

  2. Surface — At every prompt and tool use, retrieves relevant memories via BM25 keyword scoring and injects them as context. A per-hook token budget caps total injection to avoid overwhelming the model.

  3. Maintain — Periodic diagnosis generates proposals for each memory based on its effectiveness quadrant. Proposals are applied with user confirmation: rewrites for stale content, keyword broadening for hidden gems, escalation for persistent violations, deletion for noise.

Instruction sources

Engram manages one instruction source — memories — and cross-references against other Claude Code sources for deduplication:

| Source type | Description | Surfacing behavior |
| ----------- | ----------- | ------------------ |
| memory | TOML files in ~/.claude/engram/data/memories/ | Keyword-matched per prompt via BM25 |

Cross-reference sources (used for suppression, not managed by engram): CLAUDE.md, .claude/rules/, plugin skills.

Each memory TOML embeds its own registry data: content hash, surfaced count, last surfaced timestamp, evaluation counters (followed/contradicted/ignored), and enforcement level.

Measurements

Every time a memory is surfaced, the registry increments its surfaced_count and updates last_surfaced. At session end, the evaluator classifies each surfaced memory's outcome:

  • Followed — The model's behavior aligned with the instruction
  • Contradicted — The model did the opposite of what the instruction said
  • Ignored — The instruction was surfaced but had no observable effect

From these counters, engram computes effectiveness (followed / total evaluations) and frecency (recency-weighted frequency with configurable half-life decay).

Effectiveness quadrants

The registry classifies every instruction into one of four quadrants:

|                 | High effectiveness            | Low effectiveness           |
| --------------- | ----------------------------- | --------------------------- |
| Often surfaced  | Working — Keep as-is          | Leech — Rewrite or escalate |
| Rarely surfaced | Hidden Gem — Broaden keywords | Noise — Remove              |

All memories are classified by the same thresholds. A fifth bucket, Insufficient, applies when a memory has too few evaluations to classify reliably.
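A minimal sketch of the classification, with invented threshold values (engram's real thresholds may differ):

```go
package main

import "fmt"

type Quadrant string

const (
	Working      Quadrant = "working"
	Leech        Quadrant = "leech"
	HiddenGem    Quadrant = "hidden-gem"
	Noise        Quadrant = "noise"
	Insufficient Quadrant = "insufficient"
)

// classify places a memory into a quadrant from its effectiveness ratio,
// surfaced count, and evaluation count. Thresholds are illustrative.
func classify(effectiveness float64, surfacedCount, evaluations int) Quadrant {
	const (
		minEvaluations = 3   // below this: not enough data
		effThreshold   = 0.5 // followed at least half the time
		surfacedOften  = 10  // surfaced this many times counts as "often"
	)
	if evaluations < minEvaluations {
		return Insufficient
	}
	effective := effectiveness >= effThreshold
	often := surfacedCount >= surfacedOften
	switch {
	case effective && often:
		return Working
	case !effective && often:
		return Leech
	case effective && !often:
		return HiddenGem
	default:
		return Noise
	}
}

func main() {
	fmt.Println(classify(0.9, 50, 20)) // working
	fmt.Println(classify(0.1, 50, 20)) // leech
	fmt.Println(classify(0.9, 4, 4))   // hidden-gem
	fmt.Println(classify(0.1, 4, 4))   // noise
	fmt.Println(classify(0.0, 1, 1))   // insufficient
}
```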

Maintenance actions

Each quadrant has a prescribed action:

  • Working — Check for staleness; rewrite if content is outdated
  • Leech — Diagnose root cause (content quality, wrong keywords, enforcement gap) and either rewrite content or escalate enforcement
  • Hidden Gem — LLM suggests additional keywords/concepts to increase surfacing coverage
  • Noise — Delete the memory TOML file

Maintenance proposals are generated by engram maintain and applied interactively via engram maintain --apply --proposals <file>.

Enforcement escalation

For persistent leech memories (instructions that keep getting violated), engram applies a graduated escalation ladder:

  1. Advisory — Standard surfacing (default)
  2. Emphasized advisory — Surfaced with emphasis markers
  3. Reminder — Reminder injected after tool use via PostToolUse hook

Escalation level is tracked per-memory.
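The ladder can be modeled as a small per-memory state machine. This is an illustrative sketch; the violation threshold and names are assumptions, not engram's code:

```go
package main

import "fmt"

type Enforcement int

const (
	Advisory Enforcement = iota
	EmphasizedAdvisory
	Reminder
)

func (e Enforcement) String() string {
	return [...]string{"advisory", "emphasized-advisory", "reminder"}[e]
}

// escalate raises enforcement one rung when a memory keeps being
// contradicted. The violation threshold is illustrative.
func escalate(level Enforcement, recentViolations int) Enforcement {
	const threshold = 3
	if recentViolations >= threshold && level < Reminder {
		return level + 1
	}
	return level
}

func main() {
	level := Advisory
	for round := 1; round <= 3; round++ {
		level = escalate(level, 3)
		fmt.Printf("round %d: %s\n", round, level)
	}
	// The ladder tops out at "reminder"; further violations do not escalate.
}
```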

Memory graph

Engram builds a directed graph of relationships between memories. When a memory is surfaced, spreading activation propagates to related memories, allowing conceptually linked instructions to surface together even when keyword overlap is low.

Links are created automatically via merge-on-write: when new memories are stored, engram detects duplicates and near-duplicates, merging them and preserving relationship edges. Links are re-computed after each merge to keep the graph consistent.
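A minimal sketch of spreading activation over the memory graph, assuming a per-hop decay factor and an activation cutoff (both invented for this example; engram's actual propagation rules may differ):

```go
package main

import "fmt"

// spread propagates activation from a seed memory through the
// relationship graph, decaying by a factor per hop and dropping
// propagation once activation falls below the cutoff.
func spread(graph map[string][]string, seed string, decay, cutoff float64) map[string]float64 {
	activation := map[string]float64{seed: 1.0}
	frontier := []string{seed}
	for len(frontier) > 0 {
		var next []string
		for _, node := range frontier {
			a := activation[node] * decay
			if a < cutoff {
				continue // too weak to propagate further
			}
			for _, nb := range graph[node] {
				if a > activation[nb] {
					activation[nb] = a
					next = append(next, nb)
				}
			}
		}
		frontier = next
	}
	return activation
}

func main() {
	// Hypothetical memory IDs and edges, for illustration only.
	graph := map[string][]string{
		"use-targ":      {"build-config", "test-commands"},
		"build-config":  {"ci-settings"},
		"test-commands": {},
		"ci-settings":   {},
	}
	for id, a := range spread(graph, "use-targ", 0.5, 0.2) {
		fmt.Printf("%s: %.2f\n", id, a)
	}
}
```

With this scheme, surfacing "use-targ" also activates its neighbors at half strength, so related memories can surface together even with little keyword overlap.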

Session lifecycle

Engram hooks into 7 Claude Code hook points:

| Phase | Hook | What happens |
| ----- | ---- | ------------ |
| Start | SessionStart | Build binary if stale. Run maintain/triage. Notify user that /recall is available. |
| Prompt | UserPromptSubmit | Surface prompt-relevant memories (BM25). Detect inline corrections (UC-3). |
| Prompt (async) | UserPromptSubmit | Incremental learning extraction from transcript delta. |
| Tool use | PreToolUse | Surface tool-specific memories (e.g., file-path-relevant instructions). |
| After tool | PostToolUse | Surface memories relevant to the tool call and its output. |
| Tool failure | PostToolUseFailure | Diagnose errors and surface relevant memories. |
| Compact | PreCompact | Flush: learning extraction. |
| End | Stop | Flush: learning extraction. |

Previous session context is loaded on demand via the /recall skill, which summarizes recent transcripts from the same project using Haiku. This keeps session-start lightweight and avoids injecting stale context automatically.

Data files

All data lives in ~/.claude/engram/data/:

| File | Purpose |
| ---- | ------- |
| memories/*.toml | Structured memory files with embedded registry data (surfaced count, evaluation counters, enforcement level) |
| surfacing-log.jsonl | Running log of which memories were surfaced and when |
| learn-offset.json | Offset tracking for incremental transcript learning |

Memory TOML structure

Each memory is a TOML file with structured fields:

title = "Use targ for builds"
content = "Always use targ build system instead of raw go commands"
observation_type = "workflow_preference"
concepts = ["build-system", "tooling"]
keywords = ["targ", "build", "go test", "go vet"]
principle = "Use targ test, targ check, targ build for all operations"
anti_pattern = "Running go test or go vet directly"
rationale = "targ encodes hard-won lessons about build configuration"
confidence = "A"

Confidence tiers: A (explicit instruction — "always/never/remember"), B (teachable correction), C (contextual fact). Anti-patterns are required for tier A, optional for B, empty for C.
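The tier rules can be sketched as a validation step on the parsed memory. The struct and method names below are invented for illustration, and TOML parsing itself is omitted to keep the sketch dependency-free:

```go
package main

import (
	"errors"
	"fmt"
)

// Memory mirrors the TOML fields above. Parsing would use a TOML
// library; this sketch only shows the tier validation.
type Memory struct {
	Title       string
	Content     string
	Keywords    []string
	AntiPattern string
	Confidence  string // "A", "B", or "C"
}

// validate enforces the tier rules: anti-patterns are required for
// tier A, optional for B, and must be empty for C.
func (m Memory) validate() error {
	switch m.Confidence {
	case "A":
		if m.AntiPattern == "" {
			return errors.New("tier A memories require an anti_pattern")
		}
	case "B":
		// Anti-pattern is optional for teachable corrections.
	case "C":
		if m.AntiPattern != "" {
			return errors.New("tier C memories must not set anti_pattern")
		}
	default:
		return fmt.Errorf("unknown confidence tier %q", m.Confidence)
	}
	return nil
}

func main() {
	m := Memory{Title: "Use targ for builds", Confidence: "A"}
	fmt.Println(m.validate()) // tier A without anti_pattern fails validation
}
```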

Installation

Requires Go 1.25+.

# Clone and install as a Claude Code plugin
git clone https://github.com/toejough/engram.git

# Add to Claude Code — the plugin auto-builds on first session
claude plugin add /path/to/engram

The binary auto-builds on first hook invocation and rebuilds when Go source files change.

Project structure

cmd/engram/          CLI entry point (thin wiring layer)
internal/            Business logic (33 packages, all DI boundaries)
hooks/               Shell scripts for Claude Code hook integration
skills/              Plugin skills (memory triage)
.claude-plugin/      Plugin manifest (plugin.json)
docs/specs/          Specification artifacts (UC, REQ, DES, ARCH, TEST)

Design principles

  • DI everywhere — No function in internal/ calls os.*, http.*, or any I/O directly. All I/O through injected interfaces, wired at cmd/ and cli.go edges.
  • Pure Go, no CGO — TF-IDF/BM25 for retrieval instead of ONNX. External embedding API if vector similarity needed.
  • Fire and forget — Registry writes never block the critical path. Write failures are logged but don't fail the operation.
  • Measure impact, not frequency — A memory surfaced 1000 times but never followed is a leech, not a success.

Specification

23 use cases across 5 specification layers (UC → REQ/DES/ARCH → TEST → IMPL).

| UC | Name |
| -- | ---- |
| UC-1 | Session Learning |
| UC-2 | Hook-Time Surfacing & Enforcement |
| UC-3 | Remember & Correct |
| UC-6 | Memory Effectiveness Review |
| UC-14 | Structured Session Continuity |
| UC-15 | Automatic Outcome Signal |
| UC-16 | Unified Memory Maintenance |
| UC-17 | Context Budget Management |
| UC-18 | PostToolUse Proactive Reminders |
| UC-21 | Enforcement Escalation Ladder |
| UC-23 | Unified Instruction Registry |
| UC-24 | Proposal Application |
| UC-27 | Global Binary Installation |
| UC-28 | Automatic Maintenance and Promotion Triggers |
| UC-32 | Memory Graph with Spreading Activation |
| UC-33 | Merge-on-Write |
| UC-34 | Pre-Classification Duplicate Consolidation |
| UC-P1-1 | Cross-Source Contradiction Detection |
| UC-P5f-1 | Re-compute Links After Merge |
License

See LICENSE for details.
