Production-quality gates for Claude Code. Install once, never ship sloppy AI-written code again.
prod-gates is a Stop-hook-based quality system for Claude Code. It blocks a turn from completing until the diff has passed five gates — shape, simplicity, readability, security, and coverage — enforced by a mix of a review subagent and mechanical checks (typecheck, 100% test coverage, optional mutation testing).
The goal is simple: AI-written code ships at the same bar as human-written production code, or it doesn't ship.
| # | Gate | Enforced by | What it catches |
|---|---|---|---|
| 0 | Shape | `prod-reviewer` subagent | Exploratory-coding scars: scattered edits, unused helpers, naming drift, layered conditionals, dead code, narrating comments. Forces a clean-slate rewrite when the first pass is visibly discovery-shaped. |
| 1 | Simplicity | `prod-reviewer` subagent | Speculative abstraction, dead code, premature helpers, backwards-compat shims for hypothetical callers. |
| 2 | Readability | `prod-reviewer` subagent | Names that don't carry meaning, narration comments, multi-paragraph docstrings. |
| 3 | Security | `prod-reviewer` subagent | Injection vectors, secret leaks, unsafe deserialization, missing boundary validation, unbounded resources. |
| 4 | Coverage | Stop hook + subagent | <100% line coverage (including trivial accessors); weak-assertion anti-patterns (return-shape-only, unverified mocks, expect(true).toBe(true)); invented mock payloads with no recorded fixture. |
Two mechanical gates run alongside:
- Typecheck / build: catches fabricated packages, missing exports, and wrong call signatures. Auto-detects `tsconfig.json`, `go.mod`, `Cargo.toml`, and `pyproject.toml`.
- Mutation testing (opt-in): kills the weak-assertion problem that line coverage hides. Enables automatically when a mutation tool is configured in the repo (`stryker.conf.*`, `[tool.mutmut]`, `cargo-mutants`, `go-mutesting`).
When any gate fails, the hook emits a structured block decision back to Claude with every failing gate listed. Claude cannot declare the task done until every gate is green.
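The README does not show the shape of the block decision. As a minimal sketch, assuming the hook prints a JSON object with `decision` and `reason` fields on stdout (the shape Claude Code documents for Stop hooks, as far as I know), it could be assembled like this. The function name `emit_block_decision` and the wording of the reason are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the real hook may format its message differently.

# emit_block_decision GATE...
# Prints one JSON object that lists every failing gate in the reason text.
emit_block_decision() {
  local reason="Failing gates:" gate
  for gate in "$@"; do
    reason="$reason $gate;"
  done
  # Escape backslashes and double quotes so the reason stays valid JSON.
  reason="$(printf '%s' "$reason" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"decision":"block","reason":"%s"}\n' "$reason"
}
```

Calling `emit_block_decision typecheck coverage` prints a single line that names both gates, which is the property the paragraph above describes: every failure is reported at once rather than one per turn.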

    git clone https://github.com/YOUR_ORG/prod-gates.git
    cd prod-gates
    ./install.sh

That's it. `install.sh`:
- Copies the subagent to `~/.claude/agents/prod-reviewer.md`.
- Copies the hook scripts to `~/.claude/hooks/`.
- Registers the Stop hook in `~/.claude/settings.json` (idempotently, with a timestamped backup of any prior file).
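To make "idempotently, with a timestamped backup" concrete, here is a rough sketch of one way a registration step could behave. It is not taken from `install.sh`. The function name `register_stop_hook`, the `.bak.<timestamp>` suffix, the use of `jq` for merging, and the exact JSON layout of the Stop hook entry are all assumptions, and the hook path is assumed to contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: the real install.sh may be structured differently.

# register_stop_hook SETTINGS_FILE HOOK_COMMAND
register_stop_hook() {
  local settings="$1" hook="$2" stamp

  # Idempotency guard: a second run finds the command and changes nothing.
  if [ -f "$settings" ] && grep -Fq "$hook" "$settings"; then
    return 0
  fi

  if [ ! -f "$settings" ]; then
    # No prior file, so there is nothing to back up or merge.
    mkdir -p "$(dirname "$settings")"
    printf '{"hooks":{"Stop":[{"hooks":[{"type":"command","command":"%s"}]}]}}\n' \
      "$hook" > "$settings"
    return 0
  fi

  # Prior file exists: keep a timestamped copy, then append to any Stop hooks already there.
  stamp="$(date +%Y%m%d-%H%M%S)"
  cp "$settings" "$settings.bak.$stamp"
  jq --arg cmd "$hook" \
    '.hooks.Stop = ((.hooks.Stop // []) + [{hooks: [{type: "command", command: $cmd}]}])' \
    "$settings" > "$settings.tmp" && mv "$settings.tmp" "$settings"
}
```

Running it twice with the same arguments leaves the settings file byte-for-byte identical after the second call and creates no extra backup.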
Works everywhere Claude Code runs — the CLI, the VS Code extension, and the Claude desktop app Code tab all read the same ~/.claude/settings.json, so one install covers all three. Restart any open sessions to pick up the hook.
Uninstall with ./uninstall.sh, which removes the files and cleanly deregisters the hook while preserving the rest of your settings.
The prod-reviewer subagent reviews the current git diff HEAD and — only if gates 0 through 4 all pass — writes a SHA-1 of the diff to /tmp/claude-prod-blessing. The Stop hook recomputes that hash and compares:
- Hash matches → judgment gates are considered green; proceed to mechanical gates.
- Hash missing or stale → block with "invoke the `prod-reviewer` subagent."
Both sides compute the hash with the same helper script (`hooks/diff-hash.sh`), so the blessing is bound to the exact diff that was reviewed. Any edit made after approval changes the hash and invalidates the blessing.
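The comparison described above can be sketched in a few lines. This is an illustration, not the contents of `hooks/diff-hash.sh`: the function names `hash_diff` and `blessing_is_fresh` are hypothetical, and `sha1sum` is assumed to be available (on macOS, `shasum` would be the closer equivalent).

```shell
#!/usr/bin/env bash
# Sketch only: the real helper script may normalize or scope the diff differently.

# Hash whatever arrives on stdin (the diff text) and print only the hex digest.
hash_diff() {
  sha1sum | cut -d' ' -f1
}

# blessing_is_fresh BLESSING_FILE   (current diff text on stdin)
# Succeeds only when the file exists and holds the hash of the current diff.
blessing_is_fresh() {
  local blessing_file="$1" current
  current="$(hash_diff)"
  [ -f "$blessing_file" ] && [ "$(cat "$blessing_file")" = "$current" ]
}
```

Under these assumptions the reviewer side would run `git diff HEAD | hash_diff > /tmp/claude-prod-blessing`, and the hook side would run `git diff HEAD | blessing_is_fresh /tmp/claude-prod-blessing`, blocking when that check fails.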
Drop a .prod-gates.sh file in a repo's root to override the hook's auto-detection. It is sourced by the hook, so it can set shell variables:

    # .prod-gates.sh
    TYPECHECK_CMD="pnpm typecheck"
    COVERAGE_CMD="pnpm test -- --coverage --coverageThreshold='{\"global\":{\"lines\":100,\"branches\":100,\"functions\":100,\"statements\":100}}'"
    MUTATION_CMD="pnpm stryker run"
    # Or, to opt out entirely for this repo:
    # DISABLE_PROD_GATES=1

Use cases:
- Monorepos whose root has no `package.json` / `pyproject.toml`: point each command at the workspace's tool.
- Projects with custom test runners (vitest, pytest + custom plugins, mocha + nyc, etc.) where the auto-detected command isn't right.
- Escape hatch: set `DISABLE_PROD_GATES=1` for experiments, prototypes, or docs-only repos where the gates don't fit.
If no .prod-gates.sh is present and the repo doesn't have a recognized manifest, the hook blocks with a message asking you to add one or set COVERAGE_CMD.
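The precedence described in this section (per-repo file first, opt-out next, block when no coverage command is known) might look roughly like the following. The function name `resolve_gate_action` and its output strings are hypothetical, and manifest-based auto-detection is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: the real hook's control flow and messages may differ.

# resolve_gate_action REPO_DIR
# Prints "skip", "run: <command>", or "block: <message>".
resolve_gate_action() {
  local repo="$1"

  # The override file is sourced, so plain shell assignments take effect here.
  if [ -f "$repo/.prod-gates.sh" ]; then
    . "$repo/.prod-gates.sh"
  fi

  if [ "${DISABLE_PROD_GATES:-}" = "1" ]; then
    echo "skip"
    return 0
  fi

  # Manifest-based auto-detection would fill COVERAGE_CMD at this point (omitted).
  if [ -z "${COVERAGE_CMD:-}" ]; then
    echo "block: add a .prod-gates.sh file or set COVERAGE_CMD"
    return 0
  fi

  echo "run: $COVERAGE_CMD"
}
```

Because the file is sourced rather than parsed, anything valid in shell works inside it, which is also why it should only ever come from a repository you trust.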
Line coverage is a weak signal for AI-generated tests — published benchmarks show ~30–40% mutation scores at 90%+ line coverage because the tests assert return shape but not behavior. Mutation testing closes that gap.
JavaScript / TypeScript (StrykerJS):

    npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner
    npx stryker init

Edit the generated `stryker.conf.json` to set a `thresholds.break` of 60 or higher.

Python (mutmut):

    pip install mutmut

Add to `pyproject.toml`:

    [tool.mutmut]
    paths_to_mutate = ["src/"]

Rust (cargo-mutants):

    cargo install cargo-mutants

Go (go-mutesting):

    go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest

Once any of these is configured, the hook picks it up automatically on the next run.
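How the hook recognizes a configured tool is not spelled out here. A plausible sketch for the two cases that leave an obvious marker in the repo (a `stryker.conf.*` file and a `[tool.mutmut]` table) follows. The function name `detect_mutation_cmd` is hypothetical, the commands it prints are the tools' usual entry points rather than anything confirmed by the hook, and cargo-mutants and go-mutesting are omitted because their detection rule is not described.

```shell
#!/usr/bin/env bash
# Sketch only: covers two of the four tools listed above.

# detect_mutation_cmd REPO_DIR
# Prints a mutation command when a known marker is found; fails otherwise.
detect_mutation_cmd() {
  local repo="$1" f

  # Any file matching stryker.conf.* counts as StrykerJS being configured.
  for f in "$repo"/stryker.conf.*; do
    if [ -e "$f" ]; then
      echo "npx stryker run"
      return 0
    fi
  done

  # mutmut is considered configured when pyproject.toml has a [tool.mutmut] table.
  if [ -f "$repo/pyproject.toml" ] && grep -q '^\[tool\.mutmut\]' "$repo/pyproject.toml"; then
    echo "mutmut run"
    return 0
  fi

  return 1
}
```

A non-zero return here would mean "no mutation gate for this repo", which matches the opt-in behavior: nothing runs until one of the markers appears.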
- Fabricated packages. Claude imports a plausible-sounding library that doesn't exist. Typecheck/build fails immediately.
- Fabricated API methods. Claude calls a method on a real library that doesn't exist. Typecheck catches it; failing integration tests catch the untyped cases.
- Invented API responses. Test file contains a mock response Claude guessed at. Shape/coverage gate rejects unless there's a recorded real fixture.
- Weak tests at 100% coverage. Tests assert `toBeDefined()` without checking values, or never verify mocks were called. Mutation testing kills survivors; the subagent rejects the patterns directly.
- Exploratory-shaped diffs. A working-but-ugly first pass gets sent back with "rewrite from scratch" before anyone wastes time polishing the wrong shape.
- "Done" prematurely. Claude cannot end a turn while a gate is red. The Stop hook blocks with the specific list of failures.

    ./tests/run-tests.sh

The test suite builds hermetic temp git repos, stubs the typecheck / coverage / mutation commands via `.prod-gates.sh`, and verifies every branch of the hook logic: blessing flow, per-repo config, parser edge cases, install idempotency, and uninstall cleanup. No external tools (bun, pytest, cargo-tarpaulin, …) are required to run the tests themselves.
If you're modifying `prod-gate.sh` or `diff-hash.sh`, run the suite before and after your change. Any test failure is a real regression; there are no flaky cases.
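For anyone adding a case to the suite, the fixture pattern described above could look something like this. It is a guess at the approach, not code from `tests/`: the helper name `make_fixture_repo` is hypothetical, and the stubs simply use `true` and `false` to simulate a passing typecheck and a failing coverage run.

```shell
#!/usr/bin/env bash
# Sketch only: the real suite's helpers may be named and organized differently.

# make_fixture_repo
# Creates a throwaway git repo with stubbed gate commands and prints its path.
make_fixture_repo() {
  local dir
  dir="$(mktemp -d)"
  git -C "$dir" init -q

  # Stub every gate command so no real toolchain is needed.
  cat > "$dir/.prod-gates.sh" <<'EOF'
TYPECHECK_CMD="true"
COVERAGE_CMD="false"
MUTATION_CMD=""
EOF

  echo "$dir"
}
```

A test would then run the hook inside the printed directory and assert that it blocks on coverage, since the stubbed coverage command always exits non-zero.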
- Bash coverage of the hook itself is not measured (measuring bash coverage requires the `bashcov` Ruby gem, a heavy dep). Instead, every branch is covered by an explicit smoke test in `tests/`. A future rewrite in Python would let the hook be held to its own 100% line-coverage rule.
- Monorepos without a root manifest require a `.prod-gates.sh` file to tell the hook which commands to run. The hook does not recurse looking for manifests.
- Network-dependent tools: if your integration tests hit real services, the hook runs them on every Stop. On a repo with a slow suite, that drags. Consider splitting tests into a fast "gate" subset and a slow "CI only" subset via `COVERAGE_CMD`.
- Self-review limitation: the same model writes and reviews the code. Gate 0 (shape) is where same-model review is weakest. A future option: route shape review to a second model.
MIT. See LICENSE.