toddegray/prod-gates

prod-gates

Production-quality gates for Claude Code. Install once, never ship sloppy AI-written code again.

prod-gates is a Stop-hook-based quality system for Claude Code. It blocks a turn from completing until the diff has passed five gates — shape, simplicity, readability, security, and coverage — enforced by a mix of a review subagent and mechanical checks (typecheck, 100% test coverage, optional mutation testing).

The goal is simple: AI-written code ships at the same bar as human-written production code, or it doesn't ship.


The five gates

| # | Gate | Enforced by | What it catches |
|---|------|-------------|-----------------|
| 0 | Shape | prod-reviewer subagent | Exploratory-coding scars: scattered edits, unused helpers, naming drift, layered conditionals, dead code, narrating comments. Forces a clean-slate rewrite when the first pass is visibly discovery-shaped. |
| 1 | Simplicity | prod-reviewer subagent | Speculative abstraction, dead code, premature helpers, backwards-compat shims for hypothetical callers. |
| 2 | Readability | prod-reviewer subagent | Names that don't carry meaning, narration comments, multi-paragraph docstrings. |
| 3 | Security | prod-reviewer subagent | Injection vectors, secret leaks, unsafe deserialization, missing boundary validation, unbounded resources. |
| 4 | Coverage | Stop hook + subagent | <100% line coverage (including trivial accessors); weak-assertion anti-patterns (return-shape-only, unverified mocks, expect(true).toBe(true)); invented mock payloads with no recorded fixture. |

Two mechanical gates run alongside:

  • Typecheck / build — catches fabricated packages, missing exports, and wrong call signatures. Auto-detects tsconfig.json, go.mod, Cargo.toml, and pyproject.toml.
  • Mutation testing (opt-in) — kills the weak-assertion problem that line coverage hides. Enables automatically when a mutation tool is configured in the repo (stryker.conf.*, [tool.mutmut], cargo-mutants, go-mutesting).

When any gate fails, the hook emits a structured block decision back to Claude with every failing gate listed. Claude cannot declare the task done until every gate is green.
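
Claude Code's hook documentation describes a Stop hook blocking a turn by printing a JSON object with "decision": "block" and a "reason" string. The exact text prod-gates emits is not shown here, so the following is only a minimal sketch of that shape. The function names emit_block and json_escape and the gate labels are hypothetical, and gate names are assumed to be single-line strings.

```shell
# Hypothetical helper, not taken from prod-gates: build a Stop-hook "block"
# decision that lists every failing gate in the reason string.

# Escape backslashes and double quotes so the text is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: emit_block "coverage" "typecheck"
emit_block() {
  block_reason="Failing gates:"
  for block_gate in "$@"; do
    block_reason="$block_reason $block_gate;"
  done
  printf '{"decision": "block", "reason": "%s"}\n' "$(json_escape "$block_reason")"
}
```

Listing all failing gates in a single reason lets the model address them in one pass instead of discovering them one Stop at a time.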


Install

git clone https://github.com/YOUR_ORG/prod-gates.git
cd prod-gates
./install.sh

That's it. install.sh:

  • Copies the subagent to ~/.claude/agents/prod-reviewer.md.
  • Copies the hook scripts to ~/.claude/hooks/.
  • Registers the Stop hook in ~/.claude/settings.json (idempotently, with a timestamped backup of any prior file).

Works everywhere Claude Code runs — the CLI, the VS Code extension, and the Claude desktop app Code tab all read the same ~/.claude/settings.json, so one install covers all three. Restart any open sessions to pick up the hook.

Uninstall with ./uninstall.sh, which removes the files and cleanly deregisters the hook while preserving the rest of your settings.


How the blessing works

The prod-reviewer subagent reviews the current git diff HEAD and — only if gates 0 through 4 all pass — writes a SHA-1 of the diff to /tmp/claude-prod-blessing. The Stop hook recomputes that hash and compares:

  • Hash matches → judgment gates are considered green; proceed to mechanical gates.
  • Hash missing or stale → block with "invoke the prod-reviewer subagent."

Both sides compute the hash with the same helper script (hooks/diff-hash.sh), so the blessing is bound to the exact diff: any edit made after approval changes the hash and invalidates the blessing.
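
The comparison can be sketched as follows. This is an illustration under assumptions, not the contents of hooks/diff-hash.sh: the function names are hypothetical, and the real script may hash a different view of the working tree (plain git diff HEAD, for instance, does not include untracked files).

```shell
# Hypothetical sketch of the blessing check.

# Print the SHA-1 of stdin as 40 hex characters, using whichever tool exists.
sha1_of_stdin() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum | cut -d ' ' -f 1
  else
    shasum -a 1 | cut -d ' ' -f 1
  fi
}

# Hash the uncommitted changes relative to HEAD.
current_diff_hash() {
  git diff HEAD | sha1_of_stdin
}

# Succeed only if the blessing file exists and holds the given hash.
# Usage: blessing_matches /tmp/claude-prod-blessing "$(current_diff_hash)"
blessing_matches() {
  [ -f "$1" ] && [ "$(cat "$1")" = "$2" ]
}
```

A missing file and a stale hash both make blessing_matches fail, which corresponds to the "hash missing or stale" branch above.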


Per-repo configuration

Drop a .prod-gates.sh file in a repo's root to override the hook's auto-detection. It is sourced by the hook, so it can set shell variables:

# .prod-gates.sh
TYPECHECK_CMD="pnpm typecheck"
COVERAGE_CMD="pnpm test -- --coverage --coverageThreshold='{\"global\":{\"lines\":100,\"branches\":100,\"functions\":100,\"statements\":100}}'"
MUTATION_CMD="pnpm stryker run"

# Or, to opt out entirely for this repo:
# DISABLE_PROD_GATES=1

Use cases:

  • Monorepos whose root has no package.json / pyproject.toml — point each command at the workspace's tool.
  • Projects with custom test runners (vitest, pytest + custom plugins, mocha + nyc, etc.) where the auto-detected command isn't right.
  • Escape hatch — set DISABLE_PROD_GATES=1 for experiments, prototypes, or docs-only repos where the gates don't fit.

If no .prod-gates.sh is present and the repo doesn't have a recognized manifest, the hook blocks with a message asking you to add one or set COVERAGE_CMD.
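
The override-then-detect order described in this section might look roughly like the sketch below. The function name resolve_typecheck_cmd is hypothetical, and the fallback commands are illustrative guesses rather than the defaults prod-gates actually uses.

```shell
# Hypothetical sketch: choose a typecheck command for a repo directory.
resolve_typecheck_cmd() {
  repo_dir=$1
  TYPECHECK_CMD=""
  # A per-repo override file wins over auto-detection.
  if [ -f "$repo_dir/.prod-gates.sh" ]; then
    . "$repo_dir/.prod-gates.sh"
  fi
  if [ -z "$TYPECHECK_CMD" ]; then
    # Fallbacks keyed on the manifests the hook is said to recognize.
    if [ -f "$repo_dir/tsconfig.json" ]; then
      TYPECHECK_CMD="npx tsc --noEmit"
    elif [ -f "$repo_dir/go.mod" ]; then
      TYPECHECK_CMD="go build ./..."
    elif [ -f "$repo_dir/Cargo.toml" ]; then
      TYPECHECK_CMD="cargo check"
    elif [ -f "$repo_dir/pyproject.toml" ]; then
      TYPECHECK_CMD="python -m mypy ."
    fi
  fi
  # An empty result means nothing was recognized; the caller should block.
  [ -n "$TYPECHECK_CMD" ] && printf '%s\n' "$TYPECHECK_CMD"
}
```

Because the file is sourced rather than parsed, anything in .prod-gates.sh runs with the hook's privileges, so it deserves the same review as any other script in the repo.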


Configuring mutation testing (recommended for any JS/TS or Python project)

Line coverage is a weak signal for AI-generated tests — published benchmarks show ~30–40% mutation scores at 90%+ line coverage because the tests assert return shape but not behavior. Mutation testing closes that gap.

JavaScript / TypeScript (StrykerJS):

npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner
npx stryker init

Edit the generated stryker.conf.json to set a thresholds.break of 60 or higher.

Python (mutmut):

pip install mutmut

Add to pyproject.toml:

[tool.mutmut]
paths_to_mutate = ["src/"]

Rust (cargo-mutants):

cargo install cargo-mutants

Go (go-mutesting):

go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest

Once any of these is configured, the hook picks it up automatically on the next run.
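
A minimal sketch of how such detection could work for the two file-based markers mentioned earlier (stryker.conf.* and [tool.mutmut]). The function name detect_mutation_tool is hypothetical, and the sketch leaves out cargo-mutants and go-mutesting because this README does not say how they are detected.

```shell
# Hypothetical sketch: report which mutation tool a repo appears to configure.
detect_mutation_tool() {
  repo_dir=$1
  # Any stryker.conf.* file (json, js, mjs, ...) counts as a Stryker setup.
  for conf in "$repo_dir"/stryker.conf.*; do
    if [ -e "$conf" ]; then
      echo "stryker"
      return 0
    fi
  done
  # mutmut is configured through a [tool.mutmut] table in pyproject.toml.
  if [ -f "$repo_dir/pyproject.toml" ] &&
     grep -q '^\[tool\.mutmut\]' "$repo_dir/pyproject.toml"; then
    echo "mutmut"
    return 0
  fi
  echo "none"
}
```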


What it catches in practice

  • Fabricated packages. Claude imports a plausible-sounding library that doesn't exist. Typecheck/build fails immediately.
  • Fabricated API methods. Claude calls a method on a real library that doesn't exist. Typecheck catches it; failing integration tests catch the untyped cases.
  • Invented API responses. Test file contains a mock response Claude guessed at. Shape/coverage gate rejects unless there's a recorded real fixture.
  • Weak tests at 100% coverage. Tests assert toBeDefined() without checking values, or never verify mocks were called. Mutation testing kills survivors; the subagent rejects the patterns directly.
  • Exploratory-shaped diffs. A working-but-ugly first pass gets sent back with "rewrite from scratch" before anyone wastes time polishing the wrong shape.
  • "Done" prematurely. Claude cannot end a turn while a gate is red. The Stop hook blocks with the specific list of failures.

Development and testing

./tests/run-tests.sh

The test suite builds hermetic temp git repos, stubs the typecheck / coverage / mutation commands via .prod-gates.sh, and verifies every branch of the hook logic — blessing flow, per-repo config, parser edge cases, install idempotency, and uninstall cleanup. No external tools (bun, pytest, cargo-tarpaulin, …) are required to run the tests themselves.

If you're modifying prod-gate.sh or diff-hash.sh, run the suite before and after your change. Any test failure is a real regression; there are no flaky cases.
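
To illustrate the hermetic-repo approach described above, here is a hedged sketch of a fixture builder. The name make_fixture_repo is hypothetical and the actual helpers in tests/ may look different; the idea is only that stub commands such as true and false stand in for real toolchains.

```shell
# Hypothetical fixture builder: create a throwaway git repo whose gate
# commands are stubs supplied by the caller.
# Usage: repo=$(make_fixture_repo true false)   # typecheck passes, coverage fails
make_fixture_repo() {
  fixture_dir=$(mktemp -d) || return 1
  git init -q "$fixture_dir" || return 1
  {
    printf 'TYPECHECK_CMD="%s"\n' "$1"
    printf 'COVERAGE_CMD="%s"\n' "$2"
  } > "$fixture_dir/.prod-gates.sh"
  # Print the path so the caller can run the hook inside it and clean up.
  printf '%s\n' "$fixture_dir"
}
```

Stubbing through .prod-gates.sh exercises the same override path real users rely on, which is why no external tools are needed.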


Known limits

  • Bash coverage of the hook itself is not measured (measuring bash coverage requires the bashcov Ruby gem, a heavy dependency). Instead, every branch is covered by an explicit smoke test in tests/. A future rewrite in Python would let the hook be held to its own 100% line-coverage rule.
  • Monorepos without a root manifest require a .prod-gates.sh file to tell the hook which commands to run. The hook does not recurse looking for manifests.
  • Network-dependent tools — if your integration tests hit real services, the hook runs them on every Stop. On a repo with a slow suite, that drags. Consider splitting tests into a fast "gate" subset and a slow "CI only" subset via COVERAGE_CMD.
  • Self-review limitation — the same model writes and reviews the code. Gate 0 (shape) is where same-model review is weakest. A future option: route shape review to a second model.
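
For the network-dependent case above, a per-repo override could restrict the gate to a fast subset. The sketch below assumes a Python project using pytest with the pytest-cov plugin, a package under src/, and slow or network-bound tests tagged with a hypothetical "slow" marker; none of these come from prod-gates itself.

```shell
# .prod-gates.sh
# Run only tests not marked "slow" and fail if total coverage is under 100%.
# The marker name and the src/ path are assumptions for illustration.
COVERAGE_CMD='pytest -m "not slow" --cov=src --cov-fail-under=100'
```

Excluding tests lowers measured coverage unless the fast subset exercises every line on its own, so this only works when the slow tests are not the sole cover for any code path.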

License

MIT. See LICENSE.
