We Gave AI a Mirror. Now It Measures What It Believes.
Empirica is an epistemic measurement system that makes AI agents measurably more reliable by tracking what they know, preventing action before understanding, and compounding learning across sessions.
Training & Guides | CLI Reference | Architecture
An AI coding agent today has no self-awareness about what it knows:
- Forgets between sessions: same questions, same dead ends, every time
- Acts before understanding: edits your code without knowing the architecture
- Can't tell you when it's guessing: no distinction between knowledge and confabulation
- No audit trail: reasoning evaporates with the context window
| Capability | What You Experience |
|---|---|
| Measures before acting | AI investigates your codebase before touching it. The Sentinel gate blocks edits until understanding is demonstrated |
| Remembers across sessions | Findings, dead-ends, and learnings persist in a 4-layer memory system. Session 3 starts where Session 2 left off |
| Prevents confident mistakes | The CHECK gate uses thresholds computed dynamically from calibration data before allowing action |
| Shows confidence in real-time | Live statusline in your terminal: [empirica] ⚡94% ↕70% │ 🎯3 │ POST 🔍92% │ K:95% C:92% |
| Calibrates against reality | Dual-track verification compares AI self-assessment against objective evidence: tests, git metrics, goal completion |
| Tracks your codebase | Temporal entity model auto-extracts functions, classes, and imports from every file edit, so the AI knows what's alive and what's stale |
| Works through natural language | You describe tasks normally. The AI operates the measurement system automatically |
You talk to your AI normally. Empirica works in the background:
You: "Fix the authentication bug in the login flow"
Empirica: [AI investigates → logs findings → passes Sentinel gate → implements fix → measures learning]
You see: ⚡87% ↕70% │ 🎯1 │ POST 🔍85% │ K:88% C:82% │ Δ +K
You direct. The AI measures.
Empirica's CLI has 150+ commands spanning investigation, measurement, calibration, and memory, like a cockpit instrument panel. You don't need to learn any of them. The AI reads the instruments, operates the controls, and reports back in natural language. The statusline gives you the flight data at a glance.
For power users, direct CLI access is always available: empirica goals-list, empirica calibration-report, empirica project-search --task "...", and more.
Learn the full workflow: getempirica.com has interactive training, guides, and deep explanations of every concept.
pip install empirica
empirica setup-claude-code
Then just start working. The hooks, Sentinel, system prompt, statusline, and MCP server are all configured automatically. See Claude Code Setup for details.
Already have Claude Code configured? Use --force to replace your default Claude Code settings with Empirica's epistemic hooks. Without --force, setup only writes files that don't already exist, so if you've already used Claude Code, the default internals stay in place and Empirica's hooks won't activate.
empirica setup-claude-code --force
--force replaces hooks in settings.json but only removes Empirica's own hooks; hooks from other plugins (Railway, Superpowers, etc.) are preserved.
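The selective behaviour of --force described above can be pictured as a filter over the hooks section of settings.json. The following is a minimal sketch under stated assumptions, not Empirica's actual implementation: the `EMPIRICA_MARKER` path fragment, the function name, and the sample commands are hypothetical, and the nested layout only mirrors the general shape of Claude Code hook settings.

```python
import copy

# Hypothetical marker: assumes Empirica's hook commands contain this path fragment.
EMPIRICA_MARKER = "plugins/empirica/"


def strip_empirica_hooks(settings: dict, marker: str = EMPIRICA_MARKER) -> dict:
    """Return a copy of settings with only marker-matching hook commands removed."""
    result = copy.deepcopy(settings)
    cleaned = {}
    for event, groups in result.get("hooks", {}).items():
        kept_groups = []
        for group in groups:
            # Keep every hook whose command does not point at the Empirica plugin.
            kept = [h for h in group.get("hooks", []) if marker not in h.get("command", "")]
            if kept:
                kept_groups.append({**group, "hooks": kept})
        if kept_groups:
            cleaned[event] = kept_groups
    result["hooks"] = cleaned
    return result


# Sample settings; both commands are made-up placeholders for illustration.
settings = {
    "hooks": {
        "PreToolUse": [
            {"matcher": "Edit", "hooks": [
                {"type": "command", "command": "python3 ~/.claude/plugins/empirica/sentinel.py"},
                {"type": "command", "command": "railway-hook check"},
            ]},
        ],
    }
}
remaining = strip_empirica_hooks(settings)
print(remaining["hooks"]["PreToolUse"][0]["hooks"])
```

Working on a deep copy leaves the original settings untouched, so a caller could diff the two before writing anything back to disk.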
Homebrew (macOS)
brew tap nubaeon/tap
brew install empirica
empirica setup-claude-code
Docker
# Security-hardened Alpine image (~276MB, recommended)
docker pull nubaeon/empirica:1.7.1-alpine
# Standard image (Debian slim, ~414MB)
docker pull nubaeon/empirica:1.7.1
# Run
docker run -it -v $(pwd)/.empirica:/data/.empirica nubaeon/empirica:1.7.1 /bin/bash
Manual / Other AI Platforms
pip install empirica
pip install empirica-mcp # MCP Server (for Cursor, Cline, etc.)
cd your-project && empirica project-init
The CLI works standalone on any platform. The full epistemic workflow (epistemic transactions, Sentinel, calibration) requires loading the system prompt into your AI. See System Prompts for Claude, Copilot, Gemini, Qwen, and Roo Code.
empirica onboard # Interactive walkthrough of the full workflow
Or just start working: with Claude Code hooks active, the AI manages the epistemic workflow automatically.
Empirica works through nested abstraction layers:
Plan
├── Transaction 1 (Goal A)
│   ├── NOETIC: investigate, search, read → findings, unknowns, dead-ends
│   ├── CHECK: Sentinel gate → proceed / investigate more
│   ├── PRAXIC: implement, write, commit → goals completed
│   └── POSTFLIGHT: measure learning delta → persists to memory
├── Transaction 2 (Goal B, informed by T1's findings)
└── ...
Plans decompose into transactions, one per goal or Claude Code task. Each transaction is a noetic-praxic loop: investigate first (noetic), then act (praxic), with the Sentinel gating the transition. Along the way, the AI collects and reads artifacts (findings, unknowns, assumptions, dead-ends, decisions) while using semantic search to surface relevant epistemic patterns and anti-patterns from the project's history. Top artifacts are ranked by confidence and fed into each project's MEMORY.md as a hot cache.
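As a rough illustration of the "ranked by confidence" step, the sketch below sorts a handful of artifacts and renders the top few as markdown bullets. The `Artifact` class, the 0.0 to 1.0 confidence scale, the `limit` parameter, and the output format are all assumptions made for the example; the README does not specify how Empirica stores or formats its hot cache.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    kind: str          # e.g. "finding", "dead-end", "decision"
    text: str
    confidence: float  # hypothetical 0.0 to 1.0 scale


def render_hot_cache(artifacts: list[Artifact], limit: int = 3) -> str:
    """Rank artifacts by confidence (highest first) and render the top ones as bullets."""
    top = sorted(artifacts, key=lambda a: a.confidence, reverse=True)[:limit]
    return "\n".join(f"- [{a.kind}] {a.text} ({a.confidence:.0%})" for a in top)


# Illustrative artifacts only; none of these come from a real project.
artifacts = [
    Artifact("finding", "Login flow validates tokens in middleware", 0.92),
    Artifact("dead-end", "Session cookie approach breaks SSO", 0.81),
    Artifact("unknown", "Rate limiter ownership is unclear", 0.40),
    Artifact("decision", "Keep JWT expiry at 15 minutes", 0.88),
]
print(render_hot_cache(artifacts))
```

Capping the list keeps the cache small enough to load on every session, which is the point of a hot cache; lower-ranked artifacts would still live in the slower memory layers.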
PREFLIGHT ─────────► CHECK ─────────► POSTFLIGHT
    │                  │                  │
 Baseline          Sentinel           Learning
Assessment           Gate               Delta
    │                  │                  │
"What do I        "Am I ready        "What did I
 know now?"         to act?"           learn?"
- PREFLIGHT: AI assesses its knowledge state before starting work.
- CHECK: Sentinel gate validates readiness before allowing code edits.
- POSTFLIGHT: AI measures what it learned, creating a delta that persists.
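The three phases can be sketched as two small functions: a gate that compares the know vector with a threshold, and a delta between the baseline and final assessments. This is an illustrative sketch only. The function names, the 0.0 to 1.0 scale, and the choice to gate on know alone are assumptions; the README says real thresholds are computed dynamically from calibration data, which is not modelled here.

```python
def check_gate(vectors: dict[str, float], threshold: float) -> str:
    """Return 'proceed' when the know vector meets the threshold, else 'investigate'."""
    return "proceed" if vectors["know"] >= threshold else "investigate"


def learning_delta(preflight: dict[str, float], postflight: dict[str, float]) -> dict[str, float]:
    """Per-vector change between the PREFLIGHT baseline and the POSTFLIGHT assessment."""
    return {name: round(postflight[name] - preflight[name], 2) for name in preflight}


# Hypothetical self-assessments before and after the investigation phase.
preflight = {"know": 0.55, "context": 0.60}
after_investigation = {"know": 0.88, "context": 0.82}

print(check_gate(preflight, threshold=0.70))            # gate stays closed at baseline
print(check_gate(after_investigation, threshold=0.70))  # gate opens after investigating
print(learning_delta(preflight, after_investigation))
```

A positive delta on a vector corresponds to the `+K` or `+C` markers shown in the statusline after POSTFLIGHT.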
With Claude Code hooks enabled, you see the AI's epistemic state in real-time:
[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C
| Signal | Meaning |
|---|---|
| ⚡94% | Overall epistemic confidence |
| ↕70% | Sentinel threshold (know gate), user-facing only |
| 🎯3 ❓12/5 | Open goals (3), unknowns (12 total, 5 blocking) |
| POST 🔍92% | Transaction phase + work state (🔍 investigating / 🔨 acting) with composite score |
| K:95% C:92% | Knowledge and Context vectors (color-coded by gap to threshold) |
| Δ +K +C | Learning delta (POSTFLIGHT only): which vectors improved |
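If you want to consume the statusline from a script, the vector readings are the easiest part to extract because they follow a plain `LETTER:NN%` shape. The sketch below is a hypothetical helper, not part of Empirica's CLI; it assumes only that shape, and the sample line omits the status glyphs so it stays plain ASCII.

```python
import re

# Matches single-letter vector readings such as K:95% or C:92%.
VECTOR_PATTERN = re.compile(r"\b([A-Z]):(\d{1,3})%")


def parse_vectors(statusline: str) -> dict[str, int]:
    """Extract single-letter vector readings into a dict of integer percentages."""
    return {name: int(value) for name, value in VECTOR_PATTERN.findall(statusline)}


def below_threshold(vectors: dict[str, int], threshold: int) -> list[str]:
    """List the vectors whose reading is under the given threshold."""
    return [name for name, value in vectors.items() if value < threshold]


# Plain-text approximation of a statusline (glyphs omitted for the example).
line = "[empirica] 94% 70% | 3 12/5 | POST 92% | K:95% C:92% | +K +C"
vectors = parse_vectors(line)
print(vectors)
print(below_threshold(vectors, threshold=70))
```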
These vectors emerged from 600+ real working sessions across multiple AI systems. They measure the dimensions that consistently predict success or failure in complex tasks.
| Tier | Vector | What It Measures |
|---|---|---|
| Gate | engagement | Is the AI actively processing or disengaged? |
| Foundation | know | Domain knowledge depth |
| Foundation | do | Execution capability |
| Foundation | context | Access to relevant information |
| Comprehension | clarity | How clear is the understanding? |
| Comprehension | coherence | Do the pieces fit together? |
| Comprehension | signal | Signal-to-noise in available information |
| Comprehension | density | Information richness |
| Execution | state | Current working state |
| Execution | change | Rate of progress/change |
| Execution | completion | Task completion level |
| Execution | impact | Significance of the work |
| Meta | uncertainty | Explicit doubt tracking |
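The tier and vector names in the table map naturally onto a small lookup structure. The sketch below uses it to validate a self-assessment; the 0.0 to 1.0 score range and the `validate_assessment` helper are assumptions for illustration, since the README does not describe how Empirica represents an assessment internally.

```python
# Tier and vector names taken from the table above.
VECTOR_TIERS = {
    "Gate": ["engagement"],
    "Foundation": ["know", "do", "context"],
    "Comprehension": ["clarity", "coherence", "signal", "density"],
    "Execution": ["state", "change", "completion", "impact"],
    "Meta": ["uncertainty"],
}


def validate_assessment(assessment: dict[str, float]) -> list[str]:
    """Report missing vectors, unknown names, and scores outside the assumed 0.0-1.0 range."""
    expected = {v for vectors in VECTOR_TIERS.values() for v in vectors}
    given = set(assessment)
    problems = [f"missing: {v}" for v in sorted(expected - given)]
    problems += [f"unknown: {v}" for v in sorted(given - expected)]
    problems += [
        f"out of range: {v}"
        for v, score in assessment.items()
        if v in expected and not 0.0 <= score <= 1.0
    ]
    return problems


# A hypothetical assessment that omits most vectors and misspells nothing.
sample = {"engagement": 0.9, "know": 0.7, "uncertainty": 0.2}
print(validate_assessment(sample))
```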
Deep dive: Epistemic Vectors Explained
Empirica doesn't replace or reinvent anything Claude Code already does. Claude Code owns tasks, plans, memory, and projects. Empirica adds the measurement layer on top:
| Claude Code Does | Empirica Adds |
|---|---|
| Task management | Epistemic goals with measurable completion |
| Plan mode | Investigation phase with Sentinel gating: no edits until understanding is verified |
| MEMORY.md | Auto-curated hot cache ranked by epistemic confidence |
| Context window | 4-layer memory that survives compaction and persists across sessions |
| Code editing | Grounded calibration: was the AI's confidence justified by test results? |
| Subagent spawning | Bounded autonomy with delegated work counting and budget tracking |
The result: Claude Code's native capabilities, enhanced with measurement, gating, and calibration feedback that compounds over time.
| Platform | Integration Level | What You Get |
|---|---|---|
| Claude Code | Full (production) | Hooks, Sentinel gate, skills, agents, statusline, MCP |
| Cursor, Cline | MCP server | Epistemic transaction workflow, memory, calibration via MCP tools |
| Gemini CLI, Copilot | Experimental | System prompt + CLI |
| Any AI | CLI + prompt | Full measurement via CLI commands and system prompt |
| Resource | What It Covers |
|---|---|
| getempirica.com | Training course, interactive guides, deep explanations |
| Natural Language Guide | How to collaborate with AI using Empirica |
| Getting Started | First-time setup and concepts |
| CLI Reference | All 150+ commands documented |
| Architecture | Technical reference for contributors |
| System Prompts | AI prompts for Claude, Copilot, Gemini, Qwen, Roo |
| Project | Description | Status |
|---|---|---|
| Empirica | Core measurement system: epistemic transactions, Sentinel, calibration, 13 vectors | Open source |
| Empirica Iris | Epistemic browser automation with SVG spatial indexing and Sentinel gating for visual interactions | Open source |
| Docpistemic | Epistemic documentation coverage assessment: know what your docs know | Open source |
| Breadcrumbs | Survive context compacts with git notes: dead simple session continuity | Open source |
| Empirica Workspace | Entity Knowledge Graph, Epistemic Prompt Engine, CRM, portfolio dashboard | Proprietary |
Building something with Empirica? Open an issue to get listed.
- `setup-claude-code --force` no longer nukes other plugins' hooks: Previously cleared ALL hooks in settings.json. Now filters by Empirica plugin path, preserving Railway, Superpowers, and custom hooks
- Python version detection: `_find_python()` now prefers `python3` over versioned `python3.X` binaries, preventing hooks from using `python3.13`, which may not exist on all systems
- `/empirica` command trigger matching: Description now includes common phrases ("sentinel paused", "turn off empirica", "off-record statusline") so Claude can associate user intent with the command
- Sentinel pipe targets: Added `base64` to `SAFE_PIPE_TARGETS` so `gh api ... | base64 -d` isn't blocked as praxic
- README What's New sync: Release script now auto-syncs What's New section from CHANGELOG via `sync_readme_whats_new()`
- Cross-project search dedup: Deduplicate results by content across project collections
- Empirica Constitution: 12-section governance framework routing situations to mechanisms
- Epistemic Persistence Protocol (EPP): Calibrated position-holding under pushback, replacing AAP
- Lean Core Prompt: 81% reduction in always-loaded context. `setup-claude-code --lean`
- Cross-Project Search: `--global` searches ALL projects' Qdrant collections
- Cross-Project Artifact Writing: `finding-log --project-id <name>` writes to another project
- Plugin Renamed: `empirica-integration` → `empirica`. Run `setup-claude-code --force`
- Brier Score Calibration: Proper scoring rule with dynamic thresholds
- Profile Management: `profile-sync`, `profile-prune`, `profile-status`
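For readers unfamiliar with the Brier score mentioned in the list above: it is the mean squared difference between predicted probabilities and binary outcomes, so 0 is perfect and lower is better. The sketch below shows the standard formula only. How Empirica derives its dynamic thresholds from the score is not described here and is not modelled; the sample numbers are invented.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and binary outcomes (0 is best)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Invented data: self-assessed confidence per task, and whether verification later passed.
confidence = [0.9, 0.8, 0.7, 0.95]
passed = [1, 1, 0, 1]
print(round(brier_score(confidence, passed), 4))
```

In this example the single confident miss (0.7 predicted, task failed) contributes most of the score, which is the property that makes the Brier score useful for penalising overconfidence.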
Your data stays local:
- `.empirica/`: Local SQLite database (gitignored by default)
- `.git/refs/notes/empirica/*`: Epistemic checkpoints (local unless you push)
- Qdrant runs locally if enabled
No cloud dependencies. No telemetry. Your epistemic data is yours.
- Website: getempirica.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
MIT License. See LICENSE for details.
Author: David S. L. Van Assche
Version: 1.7.1
Turtles all the way down: built with its own epistemic framework, measuring what it knows at every step.