# dev-loop

A safety harness for AI coding agents. It watches every action your AI takes, blocks dangerous ones in real-time, and runs quality checks before code ships.

100% open-source tool stack. Zero paid services (beyond Anthropic API).

## Why This Exists

AI coding agents are fast — but they can also delete your `.env`, leak API keys in a commit, run `rm -rf /`, or push broken code. The faster they work, the more damage they can do before you notice.

dev-loop sits between the AI agent and your codebase. It intercepts every action, checks it against safety rules, and blocks anything dangerous — all in under 5 milliseconds. For bigger checks (security scans, running tests), it runs a full gate at commit time.

## How It Works

*System Overview — 7-layer closed loop*

Seven layers form a closed loop — the output of every stage feeds back as input. There is no "end", only cycles that get tighter as the harness learns.

| # | Layer | What it does |
|---|-------|--------------|
| 1 | Intake | Pulls issues from the tracker |
| 2 | Orchestration | Creates an isolated branch for the agent to work in |
| 3 | Agent Runtime | Runs the AI agent with scoped permissions |
| 4 | Quality Gates | Scans for secrets, vulnerabilities, and test failures |
| 5 | Observability | Records everything for debugging and dashboards |
| 6 | Feedback Loop | Retries failures, escalates to a human if stuck |
| 7 | LLMOps | Optimizes prompts via DSPy/GEPA using real session data |

Three tiers of protection, from instant to thorough:

- **Real-time (< 5ms)** — Every file write, edit, or shell command is checked against deny lists, dangerous patterns, and secret detectors before it executes.
- **Commit-time (~30s)** — Tests, security scanning, secret detection, and spec enforcement run before each commit goes through.
- **Full pipeline (on demand)** — Take a bug report → assign it to an agent → agent writes code → quality gates → retry on failure → PR created.

## Three-Tier Safety Model

*Three-tier ambient architecture*

| Tier | When it runs | Latency | What it checks |
|------|--------------|---------|----------------|
| Tier 1 | Every tool call | < 5ms | Blocked files, dangerous commands, leaked secrets |
| Tier 2 | On `git commit` | ~30s | Security scanner, secret scanner, test runner, spec enforcement |
| Tier 3 | On demand | ~minutes | Full 7-layer pipeline end-to-end |

### Tier 1: Check Engine

*Check engine — deny list, dangerous ops, secrets*

Three check modules run on every `Write`, `Edit`, or `Bash` call the agent makes:

- **Deny List** — 15 patterns block writes to sensitive files (`.env`, `.ssh/*`, `*.key`, etc.)
- **Dangerous Ops** — 25 patterns warn or block risky commands (`rm -rf`, `curl | sh`, `git push --force`, etc.)
- **Secret Scanner** — 15 patterns catch API keys, private keys, and database connection strings in file content.

All patterns are configurable per-project.
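
The three modules can be pictured as a minimal Python sketch. The pattern lists below are small illustrative subsets, not dev-loop's actual rule set, and the function names are hypothetical:

```python
import fnmatch
import re

# Illustrative subsets of the three configurable pattern lists.
DENY_FILES = [".env", ".ssh/*", "*.key"]
DANGEROUS_CMDS = [r"rm\s+-rf\s+/", r"curl\s+.*\|\s*sh", r"git\s+push\s+--force"]
SECRET_PATTERNS = [r"AKIA[0-9A-Z]{16}", r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"]

def check_write(path: str) -> bool:
    """Return True if a write to `path` should be blocked (deny list)."""
    return any(fnmatch.fnmatch(path, pat) for pat in DENY_FILES)

def check_bash(cmd: str) -> bool:
    """Return True if a shell command matches a dangerous pattern."""
    return any(re.search(pat, cmd) for pat in DANGEROUS_CMDS)

def scan_secrets(content: str) -> list[str]:
    """Return the secret patterns that matched the file content."""
    return [pat for pat in SECRET_PATTERNS if re.search(pat, content)]
```

Because all three checks are simple pattern matches against in-memory lists, staying under the 5ms budget is plausible even on every tool call.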

### Hook Integration

*Hook integration — Claude Code to daemon*

Five hooks connect Claude Code to the safety daemon:

| Hook | When it fires | What it does |
|------|---------------|--------------|
| `PreToolUse` | Before `Write`, `Edit`, `Bash` | Checks for blocked files and dangerous commands |
| `PostToolUse` | After `Write`, `Edit` | Scans written content for secrets |
| `SessionStart` | Session begins | Registers the session, injects context from prior sessions |
| `SessionEnd` | Session ends | Saves session state, exports telemetry |
| `Stop` | After each turn | Warns if the agent is using too much context (85% threshold) |

Hooks are fail-open — if the daemon is unavailable, all tool calls proceed normally.
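
The fail-open behavior can be sketched like this. The socket path, message shape, and timeout are assumptions for illustration; the real daemon defines its own protocol:

```python
import json
import socket

SOCKET_PATH = "/tmp/dev-loop.sock"  # hypothetical path, not the daemon's actual socket

def pre_tool_use(tool: str, args: dict, timeout: float = 0.005) -> dict:
    """Ask the daemon for a verdict; fail open if it is unreachable or slow."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(SOCKET_PATH)
            request = {"hook": "PreToolUse", "tool": tool, "args": args}
            sock.sendall(json.dumps(request).encode())
            return json.loads(sock.recv(4096))
    except OSError:
        # Daemon down or too slow: let the tool call proceed rather than
        # stall the agent -- safety checks degrade, the workflow does not.
        return {"verdict": "allow", "reason": "daemon unavailable (fail-open)"}
```

Failing open is a deliberate trade-off: a crashed safety daemon should never brick every Claude Code session that depends on it.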

### Tier 2: Checkpoint Gates

*Checkpoint gates — 5 sequential gates on commit*

Five gates run in sequence on `git commit`. The first failure blocks the commit:

1. **Sanity** — Auto-detects the test runner and runs it
2. **Semgrep** — Security scanning for known vulnerability patterns
3. **Secrets** — Scans the staged diff for leaked credentials
4. **ATDD** — Checks that code matches acceptance specs (if configured)
5. **Review** — Placeholder for human or LLM code review

On pass, a `Dev-Loop-Gate: <sha256>` trailer is added to the commit message.
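
The first-failure-blocks sequencing plus the pass trailer might look like the sketch below. What the SHA-256 digest actually covers is not specified here, so hashing the commit message is an assumption made for illustration:

```python
import hashlib

def run_gates(gates, commit_msg: str):
    """Run gates in order; the first failure blocks the commit.

    `gates` is a list of (name, callable) pairs where each callable
    returns (ok, detail). On pass, append a Dev-Loop-Gate trailer.
    """
    for name, gate in gates:
        ok, detail = gate()
        if not ok:
            return False, f"blocked by {name}: {detail}"
    # Assumption: digest of the commit message; the real input may differ.
    digest = hashlib.sha256(commit_msg.encode()).hexdigest()
    return True, commit_msg + f"\n\nDev-Loop-Gate: {digest}"
```

Running gates strictly in order means cheap checks (sanity tests) can short-circuit before expensive ones, and the trailer gives later tooling a verifiable marker that the gates ran.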

## Config System

*3-layer config merge*

Three layers merge to produce the final configuration:

1. **Built-in defaults** — Hardcoded in the Rust daemon
2. **Global config** — `~/.config/dev-loop/ambient.yaml` (applies to all projects)
3. **Per-project config** — `.devloop.yaml` in the project root

Pattern lists merge across layers: each level can add patterns or remove ones inherited from the level below. Both the global and the project config must set `enabled: true` for checks to run.
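
One way to picture the merge, modeled on the description above. The key names are illustrative and the list handling is a simplification (the real config also supports removals):

```python
def merge_configs(defaults: dict, global_cfg: dict, project_cfg: dict) -> dict:
    """Merge three config layers: later layers override scalars, lists combine."""
    merged = dict(defaults)
    for layer in (global_cfg, project_cfg):
        for key, value in layer.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                # Pattern lists are additive across layers.
                merged[key] = merged[key] + [v for v in value if v not in merged[key]]
            else:
                merged[key] = value
    # Checks run only when BOTH global and project layers opt in.
    merged["enabled"] = bool(global_cfg.get("enabled")) and bool(project_cfg.get("enabled"))
    return merged
```

The two-sided `enabled` requirement means a project cannot silently re-enable checks a user turned off globally, and vice versa.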

## Session Lifecycle

*Session lifecycle state diagram*

Sessions are tracked from start to end:

- **Handoff file** — Session state is saved between sessions so the next session picks up where you left off
- **Telemetry export** — Spans are sent to OpenObserve for dashboards and alerting
- **Context guard** — Warns at 85% context usage and saves state before context is lost
- **Temporary overrides** — `dl allow-once ".env" --ttl 600` bypasses a block for 10 minutes
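
A TTL-based override store of the kind `dl allow-once` implies could be sketched like this. The class and method names are hypothetical, not dev-loop's implementation:

```python
import time

class OverrideStore:
    """Temporary allow-once overrides that expire after a time-to-live."""

    def __init__(self) -> None:
        self._overrides = {}  # pattern -> monotonic expiry timestamp

    def allow_once(self, pattern: str, ttl: float) -> None:
        """Register a temporary override for `pattern`, lasting `ttl` seconds."""
        self._overrides[pattern] = time.monotonic() + ttl

    def is_overridden(self, pattern: str) -> bool:
        """Check (and lazily expire) an override for `pattern`."""
        expiry = self._overrides.get(pattern)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._overrides[pattern]  # expired: drop it
            return False
        return True
```

Expiring lazily on lookup keeps the hot path allocation-free; a background sweep is unnecessary when the store is consulted on every tool call anyway.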

## Observability

*Observability data flow*

- **Event log** — Append-only JSONL log with automatic rotation
- **Live stream** — Real-time event feed via `dl stream`
- **Telemetry** — OpenTelemetry spans exported to OpenObserve
- **Dashboards** — Loop health, agent performance, and calibration panels
- **Alerts** — Gate failure spikes, stuck agents, cost anomalies
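
An append-only JSONL log with rotation can be as simple as the sketch below. The single-generation rotation policy and size threshold are hypothetical simplifications:

```python
import json
import os

class EventLog:
    """Append-only JSONL event log that rotates past a size threshold."""

    def __init__(self, path: str, max_bytes: int = 10_000_000) -> None:
        self.path = path
        self.max_bytes = max_bytes

    def append(self, event: dict) -> None:
        """Append one event as a JSON line, rotating the file first if full."""
        if os.path.exists(self.path) and os.path.getsize(self.path) >= self.max_bytes:
            # Hypothetical single-generation rotation; real schemes keep more history.
            os.replace(self.path, self.path + ".1")
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
```

One JSON object per line means the log can be tailed, grepped, and replayed with standard tools, and a crash mid-write corrupts at most the final line.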

## What Gets Blocked

| The AI tries to... | What happens |
|--------------------|--------------|
| Write to `.env` or `.ssh/config` | Blocked before the write executes (< 5ms) |
| Run `rm -rf /` or `curl ... \| sh` | Blocked before the command executes (< 5ms) |
| Force-push to `main` | Blocked before the push executes (< 5ms) |
| Commit code containing an API key | Blocked at commit time (~30s) |
| Commit code with a known vulnerability | Blocked at commit time (Semgrep scan) |
| Commit code that fails tests | Blocked at commit time (auto-detected test runner) |

## Calibration

*5-stage calibration pipeline*

`just calibrate` runs a 5-stage regression-detection pipeline to make sure safety checks haven't degraded. It produces a dated report at `docs/calibration/YYYY-MM-DD.md` and exits with an error if it detects a regression.

Seven end-to-end test paths ("tracer bullets") validate every critical workflow — from the happy path through security catches, retries, cross-repo cascades, session replay, and LLMOps A/B comparison. See Tracer Bullets for details.

## Quick Start

```sh
# Build and install the daemon
cd daemon && cargo build --release
cp target/release/dl ~/.local/bin/dl

# Hook into Claude Code
dl install

# Start the daemon
dl start

# Check status
dl status
```

Once running, dev-loop silently protects every Claude Code session. No changes to your workflow needed.

## CLI Reference

### Daemon Management

| Command | Purpose |
|---------|---------|
| `dl start` | Start daemon (background, Unix socket) |
| `dl stop` | Graceful shutdown |
| `dl status` | Active sessions, uptime, event counts |
| `dl stream` | Tail live event stream |
| `dl reload` | Hot-reload config |

### Hook & Check

| Command | Purpose |
|---------|---------|
| `dl install` / `dl uninstall` | Manage Claude Code hooks |
| `dl enable` / `dl disable` | Toggle the ambient safety layer |
| `dl check` | Offline check engine test |
| `dl checkpoint [--dir] [--json]` | Run Tier 2 gates offline |
| `dl allow-once <pattern>` | Temporary block override (5min TTL) |
| `dl kill <gate>` / `dl unkill [gate]` | Temporarily disable/re-enable a checkpoint gate |

### Observability

| Command | Purpose |
|---------|---------|
| `dl traces --last N` | Tail the event log |
| `dl shadow-report` | Analyze shadow-mode verdicts |
| `dl feedback <id> correct\|false-positive\|missed` | Label events for scoring |
| `dl feedback --stats` | Precision/recall/F1 per check type |
| `dl outcome <session-id> success\|partial\|fail` | Record session outcome |
| `dl dashboard-validate` | Validate dashboard SQL queries |

### Configuration

| Command | Purpose |
|---------|---------|
| `dl config [dir]` | Show merged config |
| `dl config-lint [--dir]` | Validate configuration |
| `dl rules` | Print active rules |

## Stats

| Metric | Value |
|--------|-------|
| Tier 1 latency | < 5ms |
| Hook latency | ~6ms (incl. process startup) |
| Binary size | 6.5 MB |
| Total tests | 997 |
| Python tests | 708 |
| Rust tests | 289 |
| Conformance tests | 106 |
| Tracer bullets | 7/7 passing |

## Agents

Claude Code agents that ship with dev-loop (`.claude/agents/`):

| Agent | What it does |
|-------|--------------|
| `@dashboard-mirror` | Grounds OpenObserve dashboard state — Playwright capture → 3-analyst pipeline (structure, data, UX) → synthesis into a canonical grounding doc. Source at `tools/dashboard-mirror/`. |

## Documentation

| Doc | What it covers |
|-----|----------------|
| Architecture | System diagram, data flow, multi-project model |
| Tracer Bullets | All 7 end-to-end test paths with entry/exit criteria |
| Edge Cases — Pass 1 | 25 failure modes: races, crashes, security |
| Edge Cases — Pass 2 | 16 design gaps: context scaling, backpressure |
| Scoring Rubric | 7-dimension tool evaluation matrix |
| Test Repos | Validation targets and pass criteria |
| Network Requirements | External APIs, ports, degradation behavior |
| Ambient Layer Plan | Full daemon design spec (~1,200 lines) |
| Layer Docs | Per-layer design intent (7 docs, ~1,000 lines) |
| ADRs | Architecture decision records |

## License

MIT


Diagrams rendered with beautiful-mermaid (github-dark theme).
