A safety harness for AI coding agents. It watches every action your AI takes, blocks dangerous ones in real-time, and runs quality checks before code ships.
100% open-source tool stack, with no paid services beyond the Anthropic API.
AI coding agents are fast, but they can also delete your `.env`, leak API keys in a commit, run `rm -rf /`, or push broken code. The faster they work, the more damage they can do before you notice.
dev-loop sits between the AI agent and your codebase. It intercepts every action, checks it against safety rules, and blocks anything dangerous — all in under 5 milliseconds. For bigger checks (security scans, running tests), it runs a full gate at commit time.
Seven layers form a closed loop — the output of every stage feeds back as input. There is no "end", only cycles that get tighter as the harness learns.
| # | Layer | What it does |
|---|---|---|
| 1 | Intake | Pulls issues from the tracker |
| 2 | Orchestration | Creates an isolated branch for the agent to work in |
| 3 | Agent Runtime | Runs the AI agent with scoped permissions |
| 4 | Quality Gates | Scans for secrets, vulnerabilities, and test failures |
| 5 | Observability | Records everything for debugging and dashboards |
| 6 | Feedback Loop | Retries failures, escalates to a human if stuck |
| 7 | LLMOps | Optimizes prompts via DSPy/GEPA using real session data |
Three tiers of protection, from instant to thorough:
- Real-time (< 5ms) — Every file write, edit, or shell command is checked against deny lists, dangerous patterns, and secret detectors before it executes.
- Commit-time (~30s) — Tests, security scanning, secret detection, and spec enforcement run before each commit goes through.
- Full pipeline (on demand) — Take a bug report → assign it to an agent → agent writes code → quality gates → retry on failure → PR created.
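
The Tier 3 retry loop can be sketched as below. This is a minimal illustration, not dev-loop's orchestrator. `run_pipeline`, the `agent` and `gates` callables, and the 3-attempt budget are all hypothetical. The sketch shows the control flow the list describes: each retry feeds gate failures back to the agent, and once the budget is spent the loop escalates to a human instead of opening a PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical retry budget; the real limit is not stated in this README.
MAX_ATTEMPTS = 3


@dataclass
class PipelineResult:
    status: str  # "pr_created" or "escalated"
    attempts: int
    failures: list[str] = field(default_factory=list)


def run_pipeline(
    issue: str,
    agent: Callable[[str, list[str]], str],  # (issue, prior failures) -> patch
    gates: Callable[[str], Optional[str]],  # patch -> None on pass, else a reason
    max_attempts: int = MAX_ATTEMPTS,
) -> PipelineResult:
    failures: list[str] = []
    for attempt in range(1, max_attempts + 1):
        # Earlier gate failures go back to the agent so each retry is informed.
        patch = agent(issue, failures)
        reason = gates(patch)
        if reason is None:
            return PipelineResult("pr_created", attempt, failures)
        failures.append(reason)
    # Budget exhausted: hand off to a human rather than looping forever.
    return PipelineResult("escalated", max_attempts, failures)
```

Passing the failure history into `agent` is what makes retries converge rather than repeat the same mistake.
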
| Tier | When it runs | Latency | What it checks |
|---|---|---|---|
| Tier 1 | Every tool call | < 5ms | Blocked files, dangerous commands, leaked secrets |
| Tier 2 | On git commit | ~30s | Security scanner, secret scanner, test runner, spec enforcement |
| Tier 3 | On demand | ~minutes | Full 7-layer pipeline end-to-end |
Three check modules run on every Write, Edit, or Bash call the agent makes:
- Deny List — 15 patterns block writes to sensitive files (`.env`, `.ssh/*`, `*.key`, etc.)
- Dangerous Ops — 25 patterns warn or block risky commands (`rm -rf`, `curl | sh`, `git push --force`, etc.)
- Secret Scanner — 15 patterns catch API keys, private keys, and database strings in file content.
All patterns are configurable per-project.
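
Here is a rough sketch of what the three Tier 1 checks do. It assumes glob matching for the deny list and regular expressions for commands and secrets. The pattern subsets are illustrative stand-ins, not the daemon's real lists (which live in the Rust daemon), and the function names are hypothetical.

```python
from __future__ import annotations

import fnmatch
import re
from pathlib import PurePosixPath

# Illustrative subsets only; the daemon ships 15 deny, 25 dangerous-op,
# and 15 secret patterns, which are not reproduced here.
DENY_GLOBS = [".env", ".env.*", ".ssh/*", "*.key"]

DANGEROUS_OPS = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "block"),  # rm -rf /
    (re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"), "block"),  # curl ... | sh
    (re.compile(r"\bgit\s+push\b.*(--force(?!-with-lease)|\s-f\b)"), "block"),
    (re.compile(r"\brm\s+-rf\b"), "warn"),  # any other recursive delete
]

SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "postgres_url": re.compile(r"postgres(ql)?://[^:\s]+:[^@\s]+@"),
}


def check_write(path: str) -> str:
    """Block writes whose path, or any trailing part of it, matches a deny glob."""
    parts = PurePosixPath(path).parts
    for glob in DENY_GLOBS:
        for i in range(len(parts)):
            if fnmatch.fnmatchcase("/".join(parts[i:]), glob):
                return "block"
    return "allow"


def check_bash(command: str) -> str:
    """Return the most severe verdict among matching dangerous-op patterns."""
    verdict = "allow"
    for pattern, severity in DANGEROUS_OPS:
        if pattern.search(command):
            if severity == "block":
                return "block"
            verdict = "warn"
    return verdict


def scan_secrets(content: str) -> list[str]:
    """Names of the secret patterns found in a file's content."""
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(content)]
```

These checks are pure string matching with no I/O, which is what makes a sub-5ms budget plausible.
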
Five hooks connect Claude Code to the safety daemon:
| Hook | When it fires | What it does |
|---|---|---|
| `PreToolUse` | Before Write, Edit, Bash | Checks for blocked files and dangerous commands |
| `PostToolUse` | After Write, Edit | Scans written content for secrets |
| `SessionStart` | Session begins | Registers the session, injects context from prior sessions |
| `SessionEnd` | Session ends | Saves session state, exports telemetry |
| `Stop` | After each turn | Warns if the agent is using too much context (85% threshold) |

Hooks are fail-open — if the daemon is unavailable, all tool calls proceed normally.
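
The fail-open behavior can be sketched as a hook client that asks the daemon for a verdict. It treats any connection error, timeout, or malformed reply as "allow". The socket path, timeout, JSON wire format, and function names are assumptions. The sketch relies on only two Claude Code behaviors: hooks receive a JSON payload on stdin, and exit code 2 from a `PreToolUse` hook blocks the tool call.

```python
from __future__ import annotations

import json
import socket
import sys

# Hypothetical socket path, timeout, and wire format (one JSON object per line).
SOCKET_PATH = "/tmp/dev-loop.sock"
TIMEOUT_S = 0.05
FAIL_OPEN = {"verdict": "allow", "reason": "daemon unavailable (fail-open)"}


def ask_daemon(event: dict, socket_path: str = SOCKET_PATH,
               timeout: float = TIMEOUT_S) -> dict:
    """Send one hook event to the daemon; any error yields an "allow" verdict."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(socket_path)
            sock.sendall(json.dumps(event).encode() + b"\n")
            with sock.makefile("rb") as reply:
                verdict = json.loads(reply.readline())
    except (OSError, ValueError):  # missing socket, refused, timeout, bad JSON
        return dict(FAIL_OPEN)
    return verdict if isinstance(verdict, dict) else dict(FAIL_OPEN)


def main() -> int:
    """Hook entry point: read the payload from stdin and map the verdict to an exit code."""
    event = json.load(sys.stdin)
    verdict = ask_daemon(event)
    if verdict.get("verdict") == "block":
        # Exit code 2 blocks a PreToolUse call; stderr is shown to the agent.
        print(verdict.get("reason", "blocked by dev-loop"), file=sys.stderr)
        return 2
    return 0
```

A hook command would call `sys.exit(main())`. The short timeout keeps a hung daemon from stalling the agent.
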
Five gates run in sequence on git commit. The first failure blocks the commit:
- Sanity — Auto-detects the test runner and runs it
- Semgrep — Security scanning for known vulnerability patterns
- Secrets — Scans the staged diff for leaked credentials
- ATDD — Checks that code matches acceptance specs (if configured)
- Review — Placeholder for human or LLM code review
On pass, a `Dev-Loop-Gate: <sha256>` trailer is added to the commit message.
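
The first-failure-wins sequencing and the trailer can be sketched as follows. The gate callables are hypothetical. Hashing the staged diff to produce the `<sha256>` value is also an assumption, since the text does not say what the digest covers.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional

# A gate inspects the staged diff and returns None on pass or a failure message.
Gate = Callable[[str], Optional[str]]


def run_gates(diff: str, gates: list[tuple[str, Gate]]) -> tuple[bool, str]:
    """Run gates in order; stop at the first failure, else return a trailer."""
    for name, gate in gates:
        failure = gate(diff)
        if failure is not None:
            return False, f"{name}: {failure}"  # later gates never run
    # Assumption: the digest covers the staged diff text.
    digest = hashlib.sha256(diff.encode("utf-8")).hexdigest()
    return True, f"Dev-Loop-Gate: {digest}"
```

Callers would pass the gates in the README's order (sanity, semgrep, secrets, ATDD, review), so cheap checks can short-circuit expensive ones.
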
Three layers merge to produce the final configuration:
- Built-in defaults — Hardcoded in the Rust daemon
- Global config — `~/.config/dev-loop/ambient.yaml` (applies to all projects)
- Per-project config — `.devloop.yaml` in the project root
Pattern lists merge across levels: each level can add new patterns or remove inherited ones. Both the global and the project config must set `enabled: true` for checks to run.
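
One way to implement the three-layer merge is sketched below. It assumes each config layer expresses list changes as `add`/`remove` entries. That schema and the `deny_list` key are hypothetical, not the documented `.devloop.yaml` format.

```python
from __future__ import annotations


def merge_patterns(base: list[str], layer: dict) -> list[str]:
    """Apply one layer's removals, then additions, to an inherited list."""
    removed = set(layer.get("remove", []))
    merged = [p for p in base if p not in removed]
    for p in layer.get("add", []):
        if p not in merged:
            merged.append(p)
    return merged


def resolve(defaults: dict[str, list[str]], global_cfg: dict, project_cfg: dict) -> dict:
    """Merge built-in defaults <- global config <- project config."""
    # Checks run only when BOTH files opt in.
    enabled = bool(global_cfg.get("enabled")) and bool(project_cfg.get("enabled"))
    lists = {}
    for key, base in defaults.items():
        merged = list(base)
        for layer in (global_cfg, project_cfg):
            merged = merge_patterns(merged, layer.get(key, {}))
        lists[key] = merged
    return {"enabled": enabled, **lists}
```

Applying removals before additions lets a project drop an inherited pattern and re-add a narrower one in the same layer.
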
Sessions are tracked from start to end:
- Handoff file — Session state is saved between sessions so the next session picks up where you left off
- Telemetry export — Spans are sent to OpenObserve for dashboards and alerting
- Context guard — Warns at 85% context usage and saves state before context is lost
- Temporary overrides — `dl allow-once ".env" --ttl 600` bypasses a block for 10 minutes
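
The `dl allow-once` override amounts to a pattern with an expiry time. Here is a minimal sketch, assuming glob matching on either the full path or the file name. The class and method names are hypothetical, and the clock is injectable so expiry can be tested without sleeping.

```python
from __future__ import annotations

import fnmatch
import time
from pathlib import PurePosixPath
from typing import Callable

DEFAULT_TTL_S = 300  # `dl allow-once` defaults to a 5-minute TTL


class AllowOnce:
    """Temporary overrides: glob pattern -> expiry time on the given clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: dict[str, float] = {}

    def allow(self, pattern: str, ttl: float = DEFAULT_TTL_S) -> None:
        self._expiry[pattern] = self._clock() + ttl

    def is_allowed(self, path: str) -> bool:
        now = self._clock()
        # Prune expired overrides so they can never match again.
        self._expiry = {p: t for p, t in self._expiry.items() if t > now}
        name = PurePosixPath(path).name
        return any(
            fnmatch.fnmatchcase(path, p) or fnmatch.fnmatchcase(name, p)
            for p in self._expiry
        )
```

A monotonic clock avoids overrides expiring early or late when the wall clock is adjusted.
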

Everything the daemon sees is also recorded for debugging and dashboards:

- Event log — Append-only JSONL log with automatic rotation
- Live stream — Real-time event feed via `dl stream`
- Telemetry — OpenTelemetry spans exported to OpenObserve
- Dashboards — Loop health, agent performance, and calibration panels
- Alerts — Gate failure spikes, stuck agents, cost anomalies
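
An append-only JSONL log with size-based rotation might look like this. The size threshold, the `.1`/`.2` backup naming, and the `EventLog` class are assumptions, not dev-loop's actual rotation policy.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


class EventLog:
    """Append-only JSONL log; rotates when the next line would exceed max_bytes."""

    def __init__(self, path: Path, max_bytes: int = 10 * 1024 * 1024, backups: int = 3) -> None:
        self.path = Path(path)
        self.max_bytes = max_bytes
        self.backups = backups  # number of rotated files to keep (>= 1)

    def _backup(self, i: int) -> Path:
        return self.path.with_name(f"{self.path.name}.{i}")

    def _rotate(self) -> None:
        # Shift .1 -> .2 -> ... ; the oldest backup is overwritten and lost.
        for i in range(self.backups - 1, 0, -1):
            if self._backup(i).exists():
                os.replace(self._backup(i), self._backup(i + 1))
        os.replace(self.path, self._backup(1))

    def append(self, event: dict) -> None:
        line = json.dumps(event, separators=(",", ":")) + "\n"
        if self.path.exists() and self.path.stat().st_size + len(line.encode()) > self.max_bytes:
            self._rotate()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(line)  # one complete JSON object per line
```

Writing one object per line means a reader such as a live tail can parse each line independently, even mid-rotation.
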
| The AI tries to... | What happens |
|---|---|
| Write to `.env` or `.ssh/config` | Blocked before the write executes (< 5ms) |
| Run `rm -rf /` or `curl ... \| sh` | Blocked before the command executes (< 5ms) |
| Force-push to main | Blocked before the push executes (< 5ms) |
| Commit code containing an API key | Blocked at commit time (~30s) |
| Commit code with a known vulnerability | Blocked at commit time (Semgrep scan) |
| Commit code that fails tests | Blocked at commit time (auto-detected test runner) |

`just calibrate` runs a 5-stage regression detection pipeline to make sure safety checks haven't degraded. It produces a dated report at `docs/calibration/YYYY-MM-DD.md` and exits with a non-zero status if it detects a regression.
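
The final step of a calibration run compares current detection metrics against a baseline and produces a dated report plus an exit code. It can be sketched as below. The 0.02 tolerance, the metric names, and the report layout are hypothetical, and the five upstream stages are not modeled.

```python
from __future__ import annotations

from datetime import date

TOLERANCE = 0.02  # hypothetical: how far a metric may drop before it counts


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tol: float = TOLERANCE) -> list[str]:
    """List metrics that dropped by more than `tol` or disappeared entirely."""
    regressions = []
    for metric, old in sorted(baseline.items()):
        new = current.get(metric)
        if new is None or new < old - tol:
            shown = "missing" if new is None else f"{new:.3f}"
            regressions.append(f"{metric}: {old:.3f} -> {shown}")
    return regressions


def calibration_report(baseline: dict[str, float], current: dict[str, float],
                       today: date | None = None) -> tuple[str, str, int]:
    """Return (report path, markdown body, exit code); non-zero on regression."""
    day = (today or date.today()).isoformat()
    regressions = find_regressions(baseline, current)
    body = [f"# Calibration {day}", ""]
    body += [f"- REGRESSION {r}" for r in regressions] or ["- No regressions detected"]
    return f"docs/calibration/{day}.md", "\n".join(body) + "\n", 1 if regressions else 0
```

A driver script would write the body to the returned path and pass the code to `sys.exit`, so CI fails on any regression.
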
Seven end-to-end test paths ("tracer bullets") validate every critical workflow — from the happy path through security catches, retries, cross-repo cascades, session replay, and LLMOps A/B comparison. See Tracer Bullets for details.
```bash
# Build and install the daemon
cd daemon && cargo build --release
cp target/release/dl ~/.local/bin/dl
# Hook into Claude Code
dl install
# Start the daemon
dl start
# Check status
dl status
```

Once running, dev-loop silently protects every Claude Code session. No changes to your workflow needed.

| Command | Purpose |
|---|---|
| `dl start` | Start daemon (background, Unix socket) |
| `dl stop` | Graceful shutdown |
| `dl status` | Active sessions, uptime, event counts |
| `dl stream` | Tail live event stream |
| `dl reload` | Hot-reload config |

| Command | Purpose |
|---|---|
| `dl install` / `dl uninstall` | Manage Claude Code hooks |
| `dl enable` / `dl disable` | Toggle the ambient safety layer |
| `dl check` | Offline check engine test |
| `dl checkpoint [--dir] [--json]` | Run Tier 2 gates offline |
| `dl allow-once <pattern>` | Temporary block override (default 5-minute TTL) |
| `dl kill <gate>` / `dl unkill [gate]` | Temporarily disable/re-enable a checkpoint gate |

| Command | Purpose |
|---|---|
| `dl traces --last N` | Tail the event log |
| `dl shadow-report` | Analyze shadow-mode verdicts |
| `dl feedback <id> correct\|false-positive\|missed` | Label events for scoring |
| `dl feedback --stats` | Precision/recall/F1 per check type |
| `dl outcome <session-id> success\|partial\|fail` | Record session outcome |
| `dl dashboard-validate` | Validate dashboard SQL queries |

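
`dl feedback --stats` reports precision, recall, and F1 per check type. Here is a sketch of that arithmetic. It assumes `correct` counts as a true positive, `false-positive` as a false positive, and `missed` as a false negative; the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def feedback_stats(labels: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """labels: (check_type, label) pairs, label in {"correct", "false-positive", "missed"}."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for check_type, label in labels:
        counts[check_type][label] += 1
    stats = {}
    for check_type, c in counts.items():
        tp, fp, fn = c["correct"], c["false-positive"], c["missed"]
        precision = tp / (tp + fp) if tp + fp else 0.0  # of blocks, how many were right
        recall = tp / (tp + fn) if tp + fn else 0.0  # of real threats, how many were caught
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats[check_type] = {"precision": precision, "recall": recall, "f1": f1}
    return stats
```

With 8 `correct`, 2 `false-positive`, and 2 `missed` labels for one check, precision and recall are both 8/10 = 0.8, so F1 is also 0.8.
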
| Command | Purpose |
|---|---|
| `dl config [dir]` | Show merged config |
| `dl config-lint [--dir]` | Validate configuration |
| `dl rules` | Print active rules |

| Metric | Value |
|---|---|
| Tier 1 latency | < 5ms |
| Hook latency | ~6ms (incl. process startup) |
| Binary size | 6.5 MB |
| Total tests | 997 |
| Python tests | 708 |
| Rust tests | 289 |
| Conformance tests | 106 |
| Tracer bullets | 7/7 passing |

Claude Code agents that ship with dev-loop (`.claude/agents/`):

| Agent | What it does |
|---|---|
| `@dashboard-mirror` | Grounds OpenObserve dashboard state — Playwright capture → 3-analyst pipeline (structure, data, UX) → synthesis into a canonical grounding doc. Source at `tools/dashboard-mirror/`. |

| Doc | What it covers |
|---|---|
| Architecture | System diagram, data flow, multi-project model |
| Tracer Bullets | All 7 end-to-end test paths with entry/exit criteria |
| Edge Cases — Pass 1 | 25 failure modes: races, crashes, security |
| Edge Cases — Pass 2 | 16 design gaps: context scaling, backpressure |
| Scoring Rubric | 7-dimension tool evaluation matrix |
| Test Repos | Validation targets and pass criteria |
| Network Requirements | External APIs, ports, degradation behavior |
| Ambient Layer Plan | Full daemon design spec (~1,200 lines) |
| Layer Docs | Per-layer design intent (7 docs, ~1,000 lines) |
| ADRs | Architecture decision records |

License: MIT
Diagrams rendered with beautiful-mermaid (github-dark theme).