# dev-loop

A safety harness for AI coding agents. It watches every action your AI takes, blocks dangerous ones in real-time, and runs quality checks before code ships.

100% open-source tool stack. Zero paid services (beyond Anthropic API).

## Why This Exists

AI coding agents are fast — but they can also delete your `.env`, leak API keys in a commit, run `rm -rf /`, or push broken code. The faster they work, the more damage they can do before you notice.

dev-loop sits between the AI agent and your codebase. It intercepts every action, checks it against safety rules, and blocks anything dangerous — all in under 5 milliseconds. For bigger checks (security scans, running tests), it runs a full gate at commit time.

## How It Works

*System Overview — 7-layer closed loop*

Seven layers form a closed loop — the output of every stage feeds back as input. There is no "end", only cycles that get tighter as the harness learns.

| # | Layer | What it does |
|---|-------|--------------|
| 1 | Intake | Pulls issues from the tracker |
| 2 | Orchestration | Creates an isolated branch for the agent to work in |
| 3 | Agent Runtime | Runs the AI agent with scoped permissions |
| 4 | Quality Gates | Scans for secrets, vulnerabilities, and test failures |
| 5 | Observability | Records everything for debugging and dashboards |
| 6 | Feedback Loop | Retries failures, escalates to a human if stuck |
| 7 | LLMOps | Optimizes prompts via DSPy/GEPA using real session data |

Three tiers of protection, from instant to thorough:

- **Real-time (< 5ms)** — Every file write, edit, or shell command is checked against deny lists, dangerous patterns, and secret detectors before it executes.
- **Commit-time (~30s)** — Tests, security scanning, secret detection, and spec enforcement run before each commit goes through.
- **Full pipeline (on demand)** — Take a bug report → assign it to an agent → agent writes code → quality gates → retry on failure → PR created.

## Three-Tier Safety Model

*Three-tier ambient architecture*

| Tier | When it runs | Latency | What it checks |
|------|--------------|---------|----------------|
| Tier 1 | Every tool call | < 5ms | Blocked files, dangerous commands, leaked secrets |
| Tier 2 | On `git commit` | ~30s | Security scanner, secret scanner, test runner, spec enforcement |
| Tier 3 | On demand | ~minutes | Full 7-layer pipeline end-to-end |

### Tier 1: Check Engine

*Check engine — deny list, dangerous ops, secrets*

Three check modules run on every `Write`, `Edit`, or `Bash` call the agent makes:

- **Deny List** — 15 patterns block writes to sensitive files (`.env`, `.ssh/*`, `*.key`, etc.)
- **Dangerous Ops** — 25 patterns warn or block risky commands (`rm -rf`, `curl | sh`, `git push --force`, etc.)
- **Secret Scanner** — 15 patterns catch API keys, private keys, and database connection strings in file content.

All patterns are configurable per-project.
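
The three modules can be pictured as a minimal Python sketch. The pattern lists below are small illustrative subsets, not dev-loop's actual rule set, and the function names are hypothetical:

```python
import fnmatch
import re

# Illustrative subsets of the three configurable pattern lists.
DENY_FILES = [".env", ".ssh/*", "*.key"]
DANGEROUS_CMDS = [r"rm\s+-rf\s+/", r"curl\s+.*\|\s*sh", r"git\s+push\s+--force"]
SECRET_PATTERNS = [r"AKIA[0-9A-Z]{16}", r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"]

def check_write(path: str) -> bool:
    """Return True if a write to `path` should be blocked (deny list)."""
    return any(fnmatch.fnmatch(path, pat) for pat in DENY_FILES)

def check_bash(cmd: str) -> bool:
    """Return True if a shell command matches a dangerous pattern."""
    return any(re.search(pat, cmd) for pat in DANGEROUS_CMDS)

def scan_secrets(content: str) -> list[str]:
    """Return the secret patterns that matched the file content."""
    return [pat for pat in SECRET_PATTERNS if re.search(pat, content)]
```

Because all three checks are simple pattern matches against in-memory lists, staying under the 5ms budget is plausible even on every tool call.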

### Hook Integration

*Hook integration — Claude Code to daemon*

Five hooks connect Claude Code to the safety daemon:

| Hook | When it fires | What it does |
|------|---------------|--------------|
| `PreToolUse` | Before `Write`, `Edit`, `Bash` | Checks for blocked files and dangerous commands |
| `PostToolUse` | After `Write`, `Edit` | Scans written content for secrets |
| `SessionStart` | Session begins | Registers the session, injects context from prior sessions |
| `SessionEnd` | Session ends | Saves session state, exports telemetry |
| `Stop` | After each turn | Warns if the agent is using too much context (85% threshold) |

Hooks are fail-open — if the daemon is unavailable, all tool calls proceed normally.
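
The fail-open behavior can be sketched like this. The socket path, message shape, and timeout are assumptions for illustration; the real daemon defines its own protocol:

```python
import json
import socket

SOCKET_PATH = "/tmp/dev-loop.sock"  # hypothetical path, not the daemon's actual socket

def pre_tool_use(tool: str, args: dict, timeout: float = 0.005) -> dict:
    """Ask the daemon for a verdict; fail open if it is unreachable or slow."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(SOCKET_PATH)
            request = {"hook": "PreToolUse", "tool": tool, "args": args}
            sock.sendall(json.dumps(request).encode())
            return json.loads(sock.recv(4096))
    except OSError:
        # Daemon down or too slow: let the tool call proceed rather than
        # stall the agent -- safety checks degrade, the workflow does not.
        return {"verdict": "allow", "reason": "daemon unavailable (fail-open)"}
```

Failing open is a deliberate trade-off: a crashed safety daemon should never brick every Claude Code session that depends on it.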

### Tier 2: Checkpoint Gates

*Checkpoint gates — 5 sequential gates on commit*

Five gates run in sequence on `git commit`. The first failure blocks the commit:

1. **Sanity** — Auto-detects the test runner and runs it
2. **Semgrep** — Security scanning for known vulnerability patterns
3. **Secrets** — Scans the staged diff for leaked credentials
4. **ATDD** — Checks that code matches acceptance specs (if configured)
5. **Review** — Placeholder for human or LLM code review

On pass, a `Dev-Loop-Gate: <sha256>` trailer is added to the commit message.
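
The first-failure-blocks sequencing plus the pass trailer might look like the sketch below. What the SHA-256 digest actually covers is not specified here, so hashing the commit message is an assumption made for illustration:

```python
import hashlib

def run_gates(gates, commit_msg: str):
    """Run gates in order; the first failure blocks the commit.

    `gates` is a list of (name, callable) pairs where each callable
    returns (ok, detail). On pass, append a Dev-Loop-Gate trailer.
    """
    for name, gate in gates:
        ok, detail = gate()
        if not ok:
            return False, f"blocked by {name}: {detail}"
    # Assumption: digest of the commit message; the real input may differ.
    digest = hashlib.sha256(commit_msg.encode()).hexdigest()
    return True, commit_msg + f"\n\nDev-Loop-Gate: {digest}"
```

Running gates strictly in order means cheap checks (sanity tests) can short-circuit before expensive ones, and the trailer gives later tooling a verifiable marker that the gates ran.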

## Config System

*3-layer config merge*

Three layers merge to produce the final configuration:

1. **Built-in defaults** — Hardcoded in the Rust daemon
2. **Global config** — `~/.config/dev-loop/ambient.yaml` (applies to all projects)
3. **Per-project config** — `.devloop.yaml` in the project root

Pattern lists merge across layers: each level can add patterns or remove ones inherited from the level below. Both the global and the project config must set `enabled: true` for checks to run.
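
One way to picture the merge, modeled on the description above. The key names are illustrative and the list handling is a simplification (the real config also supports removals):

```python
def merge_configs(defaults: dict, global_cfg: dict, project_cfg: dict) -> dict:
    """Merge three config layers: later layers override scalars, lists combine."""
    merged = dict(defaults)
    for layer in (global_cfg, project_cfg):
        for key, value in layer.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                # Pattern lists are additive across layers.
                merged[key] = merged[key] + [v for v in value if v not in merged[key]]
            else:
                merged[key] = value
    # Checks run only when BOTH global and project layers opt in.
    merged["enabled"] = bool(global_cfg.get("enabled")) and bool(project_cfg.get("enabled"))
    return merged
```

The two-sided `enabled` requirement means a project cannot silently re-enable checks a user turned off globally, and vice versa.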

## Session Lifecycle

*Session lifecycle state diagram*

Sessions are tracked from start to end:

- **Handoff file** — Session state is saved between sessions so the next session picks up where you left off
- **Telemetry export** — Spans are sent to OpenObserve for dashboards and alerting
- **Context guard** — Warns at 85% context usage and saves state before context is lost
- **Temporary overrides** — `dl allow-once ".env" --ttl 600` bypasses a block for 10 minutes
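
A TTL-based override store of the kind `dl allow-once` implies could be sketched like this. The class and method names are hypothetical, not dev-loop's implementation:

```python
import time

class OverrideStore:
    """Temporary allow-once overrides that expire after a time-to-live."""

    def __init__(self) -> None:
        self._overrides = {}  # pattern -> monotonic expiry timestamp

    def allow_once(self, pattern: str, ttl: float) -> None:
        """Register a temporary override for `pattern`, lasting `ttl` seconds."""
        self._overrides[pattern] = time.monotonic() + ttl

    def is_overridden(self, pattern: str) -> bool:
        """Check (and lazily expire) an override for `pattern`."""
        expiry = self._overrides.get(pattern)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._overrides[pattern]  # expired: drop it
            return False
        return True
```

Expiring lazily on lookup keeps the hot path allocation-free; a background sweep is unnecessary when the store is consulted on every tool call anyway.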

## Observability

*Observability data flow*

- **Event log** — Append-only JSONL log with automatic rotation
- **Live stream** — Real-time event feed via `dl stream`
- **Telemetry** — OpenTelemetry spans exported to OpenObserve
- **Dashboards** — Loop health, agent performance, and calibration panels
- **Alerts** — Gate failure spikes, stuck agents, cost anomalies
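
An append-only JSONL log with rotation can be as simple as the sketch below. The single-generation rotation policy and size threshold are hypothetical simplifications:

```python
import json
import os

class EventLog:
    """Append-only JSONL event log that rotates past a size threshold."""

    def __init__(self, path: str, max_bytes: int = 10_000_000) -> None:
        self.path = path
        self.max_bytes = max_bytes

    def append(self, event: dict) -> None:
        """Append one event as a JSON line, rotating the file first if full."""
        if os.path.exists(self.path) and os.path.getsize(self.path) >= self.max_bytes:
            # Hypothetical single-generation rotation; real schemes keep more history.
            os.replace(self.path, self.path + ".1")
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
```

One JSON object per line means the log can be tailed, grepped, and replayed with standard tools, and a crash mid-write corrupts at most the final line.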

## What Gets Blocked

| The AI tries to... | What happens |
|--------------------|--------------|
| Write to `.env` or `.ssh/config` | Blocked before the write executes (< 5ms) |
| Run `rm -rf /` or `curl ... \| sh` | Blocked before the command executes (< 5ms) |
| Force-push to `main` | Blocked before the push executes (< 5ms) |
| Commit code containing an API key | Blocked at commit time (~30s) |
| Commit code with a known vulnerability | Blocked at commit time (Semgrep scan) |
| Commit code that fails tests | Blocked at commit time (auto-detected test runner) |

## Calibration

*5-stage calibration pipeline*

`just calibrate` runs a 5-stage regression-detection pipeline to make sure safety checks haven't degraded. It produces a dated report at `docs/calibration/YYYY-MM-DD.md` and exits with an error if it detects a regression.

Seven end-to-end test paths ("tracer bullets") validate every critical workflow — from the happy path through security catches, retries, cross-repo cascades, session replay, and LLMOps A/B comparison. See Tracer Bullets for details.

## Quick Start

```sh
# Build and install the daemon
cd daemon && cargo build --release
cp target/release/dl ~/.local/bin/dl

# Hook into Claude Code
dl install

# Start the daemon
dl start

# Check status
dl status
```

Once running, dev-loop silently protects every Claude Code session. No changes to your workflow needed.

## CLI Reference

### Daemon Management

| Command | Purpose |
|---------|---------|
| `dl start` | Start daemon (background, Unix socket) |
| `dl stop` | Graceful shutdown |
| `dl status` | Active sessions, uptime, event counts |
| `dl stream` | Tail live event stream |
| `dl reload` | Hot-reload config |

### Hook & Check

| Command | Purpose |
|---------|---------|
| `dl install` / `dl uninstall` | Manage Claude Code hooks |
| `dl enable` / `dl disable` | Toggle the ambient safety layer |
| `dl check` | Offline check engine test |
| `dl checkpoint [--dir] [--json]` | Run Tier 2 gates offline |
| `dl allow-once <pattern>` | Temporary block override (5min TTL) |
| `dl kill <gate>` / `dl unkill [gate]` | Temporarily disable/re-enable a checkpoint gate |

### Observability

| Command | Purpose |
|---------|---------|
| `dl traces --last N` | Tail the event log |
| `dl shadow-report` | Analyze shadow-mode verdicts |
| `dl feedback <id> correct\|false-positive\|missed` | Label events for scoring |
| `dl feedback --stats` | Precision/recall/F1 per check type |
| `dl outcome <session-id> success\|partial\|fail` | Record session outcome |
| `dl dashboard-validate` | Validate dashboard SQL queries |

### Configuration

| Command | Purpose |
|---------|---------|
| `dl config [dir]` | Show merged config |
| `dl config-lint [--dir]` | Validate configuration |
| `dl rules` | Print active rules |

## Stats

| Metric | Value |
|--------|-------|
| Tier 1 latency | < 5ms |
| Hook latency | ~6ms (incl. process startup) |
| Binary size | 6.5 MB |
| Total tests | 997 |
| Python tests | 708 |
| Rust tests | 289 |
| Conformance tests | 106 |
| Tracer bullets | 7/7 passing |

## Agents

Claude Code agents that ship with dev-loop (`.claude/agents/`):

| Agent | What it does |
|-------|--------------|
| `@dashboard-mirror` | Grounds OpenObserve dashboard state — Playwright capture → 3-analyst pipeline (structure, data, UX) → synthesis into a canonical grounding doc. Source at `tools/dashboard-mirror/`. |

## Documentation

| Doc | What it covers |
|-----|----------------|
| Architecture | System diagram, data flow, multi-project model |
| Tracer Bullets | All 7 end-to-end test paths with entry/exit criteria |
| Edge Cases — Pass 1 | 25 failure modes: races, crashes, security |
| Edge Cases — Pass 2 | 16 design gaps: context scaling, backpressure |
| Scoring Rubric | 7-dimension tool evaluation matrix |
| Test Repos | Validation targets and pass criteria |
| Network Requirements | External APIs, ports, degradation behavior |
| Ambient Layer Plan | Full daemon design spec (~1,200 lines) |
| Layer Docs | Per-layer design intent (7 docs, ~1,000 lines) |
| ADRs | Architecture decision records |

## License

MIT


Diagrams rendered with beautiful-mermaid (github-dark theme).
