Hard policy enforcement for AI coding agents. Prevent destructive commands from ever being executed - regardless of what the agent proposes or what a human approves in the moment.
Built for Claude Code, with patterns applicable to any AI agent that executes shell commands.
AI coding agents are powerful but dangerous when given access to infrastructure tooling. A single terraform destroy or DROP DATABASE can wipe production in seconds. Human-in-the-loop approval is not sufficient - people approve destructive commands when they don't fully understand the scope.
This repo provides ready-to-use guardrails that hard-block dangerous commands at the tool level, before they ever reach a shell.
- Claude Code installed
- jq installed (used by guard scripts to parse tool input)
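Before installing, it can help to confirm both prerequisites are on your PATH. The following is a minimal sketch: require_cmd is a hypothetical helper, not part of this repo, and it assumes the Claude Code CLI is installed under the name claude.

```shell
# Hypothetical helper: succeed if the named command is on PATH,
# otherwise report it on stderr and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing prerequisite: $1" >&2
  return 1
}

# Assumption: the Claude Code CLI is installed as `claude`.
# Usage: require_cmd claude && require_cmd jq
```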
Clone the repo:
git clone https://github.com/roboticforce/agent-guardrails.git
cd agent-guardrails
Then open Claude Code and paste the following prompt:
Install agent-guardrails globally from ./agent-guardrails (update the path to
wherever you cloned it). Read the settings.json, hooks.json, and all scripts
from the claude-code/ directory in the repo.
For settings.json: merge the permissions.deny array into my existing
~/.claude/settings.json without removing or overwriting any of my other
settings. If I don't have a settings.json yet, create one with just the
permissions block.
For hooks.json: copy to ~/.claude/hooks.json. Update all hook paths to use
absolute paths with $HOME/.claude/scripts/ so they work globally across all
projects. If I already have a hooks.json, merge the PreToolUse hooks into
my existing array.
For scripts: copy the scripts/ directory to ~/.claude/scripts/ and make
all .sh files executable.
After installing, verify by dry-running one guard script, e.g.:
echo '{"tool_input":{"command":"terraform destroy"}}' | ~/.claude/scripts/terraform-guard.sh
To install into a single project instead of globally, paste this prompt:
Install agent-guardrails into this project from ~/dev/agent-guardrails (update
the path to wherever you cloned it). Read the settings.json, hooks.json, and
all scripts from the claude-code/ directory in the repo.
Copy settings.json to .claude/settings.json in this project.
Copy hooks.json to .claude/hooks.json in this project.
Copy the scripts/ directory to .claude/scripts/ in this project.
Make all .sh files executable.
Add .claude/scripts/ to .gitignore if it's not already there.
After installing, verify by dry-running one guard script, e.g.:
echo '{"tool_input":{"command":"terraform destroy"}}' | .claude/scripts/terraform-guard.sh
To install manually instead of using a prompt:
Per-project:
cp -r claude-code/.claude /path/to/your/project/
chmod +x /path/to/your/project/.claude/scripts/*.sh
Global:
Warning: The cp commands below will overwrite existing files. If you already have a ~/.claude/settings.json or ~/.claude/hooks.json, back them up first and manually merge the JSON after copying.
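As an alternative to merging by hand, the deny list can be combined with jq, which is already a prerequisite. This is a sketch under stated assumptions: merge_deny is a hypothetical helper, not part of this repo, and both files are assumed to be JSON objects that keep their patterns in a permissions.deny array.

```shell
# Hypothetical helper: print EXISTING settings with NEW's deny patterns merged in.
# All other keys in EXISTING (including other permissions keys) are preserved.
# Note: `unique` de-duplicates and also sorts the resulting deny array.
merge_deny() {
  jq -s '
    .[0] * {
      permissions: {
        deny: (((.[0].permissions.deny // []) + (.[1].permissions.deny // [])) | unique)
      }
    }
  ' "$1" "$2"
}

# Example (write to a temporary file first, then review it before replacing anything):
#   merge_deny ~/.claude/settings.json claude-code/settings.json > /tmp/settings.merged.json
```

Writing to a temporary file keeps the existing settings intact until you have checked the merged result.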
# Back up existing config
cp ~/.claude/settings.json ~/.claude/settings.json.bak 2>/dev/null
cp ~/.claude/hooks.json ~/.claude/hooks.json.bak 2>/dev/null
# Copy files
cp claude-code/settings.json ~/.claude/settings.json
cp claude-code/hooks.json ~/.claude/hooks.json
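# Copy the contents of scripts/ so a re-install does not nest scripts/scripts
mkdir -p ~/.claude/scripts
cp -r claude-code/scripts/. ~/.claude/scripts/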
chmod +x ~/.claude/scripts/*.sh
# IMPORTANT: For global install, update hook paths in ~/.claude/hooks.json
# to use absolute paths. Replace all occurrences of:
# .claude/scripts/
# with:
# ~/.claude/scripts/
Deny rules (settings.json): a pattern-based blocklist that prevents Claude Code from executing matching commands. These are hard blocks - the agent cannot execute them.
Hook scripts (hooks.json): Bash scripts that run before every command execution. They inspect the command and block anything matching dangerous patterns. Exit code 2 = blocked, with a message shown to the user.
| Script | What it blocks |
|---|---|
| terraform-guard.sh | destroy, apply -auto-approve, state manipulation |
| database-guard.sh | DROP DATABASE, DROP TABLE, TRUNCATE, destructive migrations |
| kubernetes-guard.sh | delete namespace, delete -f, drain --force |
| cloud-guard.sh | Instance termination, resource deletion across AWS/GCP/Azure/DO |
| git-guard.sh | Force push to main/master/production, reset --hard, branch deletion |
Documentation explaining what each guardrail protects against and why. Link these in onboarding docs so your team understands the reasoning.
When Claude Code wants to run a shell command, it doesn't go straight to your terminal. The command passes through multiple enforcement layers before it can execute. If any layer rejects it, the command is killed and never reaches a shell.
flowchart TD
    A["Claude Code proposes a command<br/>e.g. terraform destroy"] --> B{"Layer 1: Deny Rules<br/>settings.json"}
    B -->|"Pattern matches<br/>deny list"| BLOCKED1["BLOCKED<br/>Command never executes"]
    B -->|"No match"| C{"Layer 2: PreToolUse Hooks<br/>hooks.json"}
    C -->|"Guard script<br/>exits with code 2"| BLOCKED2["BLOCKED<br/>Command never executes"]
    C -->|"Guard script<br/>exits with code 0"| D{"Layer 3: User Approval<br/>(if permissions require it)"}
    D -->|"User denies"| BLOCKED3["BLOCKED<br/>Command never executes"]
    D -->|"User approves"| E["Command executes in shell"]
style BLOCKED1 fill:#d32f2f,color:#fff
style BLOCKED2 fill:#d32f2f,color:#fff
style BLOCKED3 fill:#d32f2f,color:#fff
style E fill:#388e3c,color:#fff
style A fill:#1565c0,color:#fff
style B fill:#f57f17,color:#000
style C fill:#f57f17,color:#000
style D fill:#f57f17,color:#000
Layer 1, deny rules (settings.json): a pattern-based blocklist built into Claude Code's permission system. When a command matches a deny pattern, Claude Code refuses to call the tool at all. The agent cannot override this - it's enforced by the runtime, not the LLM. For example:
"Bash(command:terraform destroy*)"
"Bash(command:*DROP DATABASE*)"
"Bash(command:git push --force origin main*)"
Layer 2, PreToolUse hooks (hooks.json): Bash scripts that run before every Bash tool call. Claude Code pipes the full tool input (as JSON) into the script via stdin. The script inspects the command and decides:
- Exit 0 - allow the command to proceed
- Exit 2 - block the command and show the user why
This is more flexible than deny rules - scripts can use regex, check multiple patterns, and provide detailed error messages. Claude Code enforces the exit code. The agent cannot override a hook that returns exit code 2.
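The hook contract described above (JSON on stdin, exit 0 to allow, exit 2 to block) can be sketched as follows. This is an illustration, not the repo's actual terraform-guard.sh: the function names and the pattern list are hypothetical, and writing the message to stderr is an assumption about where Claude Code picks it up.

```shell
# Hypothetical pattern check: succeed (0) if the command string looks destructive.
# The regex is illustrative and much smaller than a real guard would need.
is_blocked() {
  printf '%s\n' "$1" |
    grep -Eiq 'terraform[[:space:]]+destroy|terraform[[:space:]]+apply.*-auto-approve'
}

# Read the tool-input JSON from stdin and decide:
# return 0 = allow, return 2 = block with a message.
guard() {
  # .tool_input.command is the field used in this README's dry-run examples.
  cmd=$(jq -r '.tool_input.command // empty')
  if is_blocked "$cmd"; then
    # Assumption: the block message is reported on stderr.
    echo "BLOCKED: destructive terraform command: $cmd" >&2
    return 2
  fi
  return 0
}

# Saved as a standalone hook script, the file would end with:
#   guard; exit $?
```

Keeping the pattern check in its own function makes it possible to test the regex against sample commands without constructing JSON input.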
CLAUDE.md instructions: natural language instructions that tell the agent not to run destructive commands. This is the weakest layer - the agent follows instructions but could theoretically be prompted to ignore them. Never rely on this alone.
| Layer | Mechanism | Bypassable from agent? |
|---|---|---|
| Deny rules | Pattern match on tool calls | No |
| Hooks (exit code 2) | Arbitrary script logic | No |
| CLAUDE.md instructions | LLM instruction following | Theoretically yes |
Defense in depth. Deny rules catch exact patterns. Hooks catch variations with regex. Instructions handle everything else. A destructive command has to get past all three layers to execute.
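To see why exact patterns and regex hooks complement each other, the sketch below compares the two styles of matching. It uses a shell glob as a stand-in for a prefix-style deny pattern; Claude Code's real matcher may behave differently, and both function names are hypothetical.

```shell
# Stand-in for a prefix-style deny pattern such as "terraform destroy*".
matches_deny_glob() {
  case "$1" in
    "terraform destroy"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Stand-in for a hook regex that tolerates extra whitespace and
# commands chained in front of the destructive one.
matches_hook_regex() {
  printf '%s\n' "$1" |
    grep -Eq '(^|[;&|[:space:]])terraform[[:space:]]+destroy'
}

# "terraform destroy -auto-approve" is caught by both checks.
# "cd infra && terraform  destroy" slips past the glob but is caught by the regex.
```

A variation as small as a leading cd or a doubled space is enough to miss a prefix pattern, which is the gap the hook layer is meant to close.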
Test that the guardrails are working:
# Should print "BLOCKED" and exit with code 2
echo '{"tool_input":{"command":"terraform destroy"}}' | ~/.claude/scripts/terraform-guard.sh
# Should print "BLOCKED" and exit with code 2
echo '{"tool_input":{"command":"DROP DATABASE production"}}' | ~/.claude/scripts/database-guard.sh
# Should exit silently with code 0 (allowed)
echo '{"tool_input":{"command":"terraform plan"}}' | ~/.claude/scripts/terraform-guard.sh
To uninstall a global install:
rm ~/.claude/hooks.json
rm -rf ~/.claude/scripts/
# Edit ~/.claude/settings.json and remove the "permissions" block
To uninstall a per-project install:
rm .claude/hooks.json
rm -rf .claude/scripts/
rm .claude/settings.json
If you need to run a blocked command, run it manually in your terminal. These guardrails exist specifically to ensure a human directly executes and confirms destructive operations. The agent can help you prepare the command, review the plan, and explain what it will do - but the final execution should be yours.
There is no bypass mode by design. If an agent can be told to bypass a guardrail, the guardrail doesn't exist.
Every team has different infrastructure. Fork this repo and:
- Edit settings.json to add/remove deny patterns for your stack
- Modify hook scripts to match your specific commands and naming conventions
- Add project-specific rules to your CLAUDE.md
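As one example of adapting a hook to your own naming conventions, the sketch below makes the list of protected branches configurable. It is not taken from the repo's git-guard.sh: PROTECTED_BRANCHES and is_force_push_to_protected are hypothetical names, and the regex is deliberately simple (for instance, it does not treat --force-with-lease as a force push).

```shell
# Hypothetical setting: a regex alternation of branch names your team protects.
PROTECTED_BRANCHES="${PROTECTED_BRANCHES:-main|master|production}"

# Succeed (0) if the command is a force push that names a protected branch.
is_force_push_to_protected() {
  # Step 1: is this a git push with --force or -f as a standalone flag?
  printf '%s\n' "$1" |
    grep -Eq "git[[:space:]]+push[[:space:]].*(--force|-f)([[:space:]]|\$)" || return 1
  # Step 2: does it name a protected branch as a whole word
  # (after whitespace, or after ':' or '+' in a refspec)?
  printf '%s\n' "$1" |
    grep -Eq "[[:space:]:+](${PROTECTED_BRANCHES})([[:space:]]|\$)"
}

# Example: a team that protects "release" as well could set
#   PROTECTED_BRANCHES="main|release"
```

Matching the branch as a whole word avoids blocking pushes to branches such as feature/main-fix that merely contain a protected name.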
See examples/incidents/ for real-world stories of AI agents destroying production infrastructure. Each incident includes what happened, what failed, and what guardrails would have prevented it.
Practice responding to production incidents in a safe environment at YouBrokeProd.com. Break things, fix things, get better - without the 3am pages.
MIT