Skip to content

roboticforce/agent-guardrails

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Agent Guardrails

Hard policy enforcement for AI coding agents. Prevent destructive commands from ever being executed - regardless of what the agent proposes or what a human approves in the moment.

Built for Claude Code, with patterns applicable to any AI agent that executes shell commands.

Why this exists

AI coding agents are powerful but dangerous when given access to infrastructure tooling. A single terraform destroy or DROP DATABASE can wipe production in seconds. Human-in-the-loop approval is not sufficient - people approve destructive commands when they don't fully understand the scope.

This repo provides ready-to-use guardrails that hard-block dangerous commands at the tool level, before they ever reach a shell.

Prerequisites

  • Claude Code installed
  • jq installed (used by guard scripts to parse tool input)

Install

Clone the repo:

git clone https://github.com/roboticforce/agent-guardrails.git
cd agent-guardrails

Then open Claude Code and paste the following prompt:

Global install (all projects)

Install agent-guardrails globally from ./agent-guardrails (update the path to
wherever you cloned it). Read the settings.json, hooks.json, and all scripts
from the claude-code/ directory in the repo.

For settings.json: merge the permissions.deny array into my existing
~/.claude/settings.json without removing or overwriting any of my other
settings. If I don't have a settings.json yet, create one with just the
permissions block.

For hooks.json: copy to ~/.claude/hooks.json. Update all hook paths to use
absolute paths with $HOME/.claude/scripts/ so they work globally across all
projects. If I already have a hooks.json, merge the PreToolUse hooks into
my existing array.

For scripts: copy the scripts/ directory to ~/.claude/scripts/ and make
all .sh files executable.

After installing, verify by dry-running one guard script, e.g.:
echo '{"tool_input":{"command":"terraform destroy"}}' | ~/.claude/scripts/terraform-guard.sh

Per-project install

Install agent-guardrails into this project from ~/dev/agent-guardrails (update
the path to wherever you cloned it). Read the settings.json, hooks.json, and
all scripts from the claude-code/ directory in the repo.

Copy settings.json to .claude/settings.json in this project.
Copy hooks.json to .claude/hooks.json in this project.
Copy the scripts/ directory to .claude/scripts/ in this project.
Make all .sh files executable.
Add .claude/scripts/ to .gitignore if it's not already there.

After installing, verify by dry-running one guard script, e.g.:
echo '{"tool_input":{"command":"terraform destroy"}}' | .claude/scripts/terraform-guard.sh

Manual install (without Claude Code)

Click to expand

Per-project:

cp -r claude-code/.claude /path/to/your/project/
chmod +x /path/to/your/project/.claude/scripts/*.sh

Global:

Warning: The cp commands below will overwrite existing files. If you already have a ~/.claude/settings.json or ~/.claude/hooks.json, back them up first and manually merge the JSON after copying.

# Back up existing config
cp ~/.claude/settings.json ~/.claude/settings.json.bak 2>/dev/null
cp ~/.claude/hooks.json ~/.claude/hooks.json.bak 2>/dev/null

# Copy files
cp claude-code/settings.json ~/.claude/settings.json
cp claude-code/hooks.json ~/.claude/hooks.json
cp -r claude-code/scripts ~/.claude/scripts
chmod +x ~/.claude/scripts/*.sh

# IMPORTANT: For global install, update hook paths in ~/.claude/hooks.json
# to use absolute paths. Replace all occurrences of:
#   .claude/scripts/
# with:
#   ~/.claude/scripts/

What's included

Deny rules (settings.json)

Pattern-based blocklist that prevents Claude Code from executing matching commands. These are hard blocks - the agent cannot execute them.

Hook scripts

Bash scripts that run before every command execution. They inspect the command and block anything matching dangerous patterns. Exit code 2 = blocked with a message shown to the user.

Script What it blocks
terraform-guard.sh destroy, apply -auto-approve, state manipulation
database-guard.sh DROP DATABASE, DROP TABLE, TRUNCATE, destructive migrations
kubernetes-guard.sh delete namespace, delete -f, drain --force
cloud-guard.sh Instance termination, resource deletion across AWS/GCP/Azure/DO
git-guard.sh Force push to main/master/production, reset --hard, branch deletion

Policies

Documentation explaining what each guardrail protects against and why. Link these in onboarding docs so your team understands the reasoning.

How it works

When Claude Code wants to run a shell command, it doesn't go straight to your terminal. The command passes through multiple enforcement layers before it can execute. If any layer rejects it, the command is killed and never reaches a shell.

flowchart TD
    A["Claude Code proposes a command
    e.g. terraform destroy"] --> B{"Layer 1: Deny Rules
    settings.json"}
    B -->|"Pattern matches
    deny list"| BLOCKED1["BLOCKED
    Command never executes"]
    B -->|"No match"| C{"Layer 2: PreToolUse Hooks
    hooks.json"}
    C -->|"Guard script
    exits with code 2"| BLOCKED2["BLOCKED
    Command never executes"]
    C -->|"Guard script
    exits with code 0"| D{"Layer 3: User Approval
    (if permissions require it)"}
    D -->|"User denies"| BLOCKED3["BLOCKED
    Command never executes"]
    D -->|"User approves"| E["Command executes in shell"]

    style BLOCKED1 fill:#d32f2f,color:#fff
    style BLOCKED2 fill:#d32f2f,color:#fff
    style BLOCKED3 fill:#d32f2f,color:#fff
    style E fill:#388e3c,color:#fff
    style A fill:#1565c0,color:#fff
    style B fill:#f57f17,color:#000
    style C fill:#f57f17,color:#000
    style D fill:#f57f17,color:#000
Loading

Layer 1: Deny rules (settings.json)

Pattern-based blocklist built into Claude Code's permission system. When a command matches a deny pattern, Claude Code refuses to call the tool at all. The agent cannot override this - it's enforced by the runtime, not the LLM.

"Bash(command:terraform destroy*)"
"Bash(command:*DROP DATABASE*)"
"Bash(command:git push --force origin main*)"

Layer 2: PreToolUse hooks (hooks.json)

Bash scripts that run before every Bash tool call. Claude Code pipes the full tool input (as JSON) into the script via stdin. The script inspects the command and decides:

  • Exit 0 - allow the command to proceed
  • Exit 2 - block the command and show the user why

This is more flexible than deny rules - scripts can use regex, check multiple patterns, and provide detailed error messages. Claude Code enforces the exit code. The agent cannot override a hook that returns exit code 2.

Layer 3: CLAUDE.md instructions

Natural language instructions that tell the agent not to run destructive commands. This is the weakest layer - the agent follows instructions but could theoretically be prompted to ignore them. Never rely on this alone.

Why all three?

Layer Mechanism Bypassable from agent?
Deny rules Pattern match on tool calls No
Hooks (exit code 2) Arbitrary script logic No
CLAUDE.md instructions LLM instruction following Theoretically yes

Defense in depth. Deny rules catch exact patterns. Hooks catch variations with regex. Instructions handle everything else. A destructive command has to get past all three layers to execute.

Verifying your install

Test that the guardrails are working:

# Should print "BLOCKED" and exit with code 2
echo '{"tool_input":{"command":"terraform destroy"}}' | ~/.claude/scripts/terraform-guard.sh

# Should print "BLOCKED" and exit with code 2
echo '{"tool_input":{"command":"DROP DATABASE production"}}' | ~/.claude/scripts/database-guard.sh

# Should exit silently with code 0 (allowed)
echo '{"tool_input":{"command":"terraform plan"}}' | ~/.claude/scripts/terraform-guard.sh

Uninstall

Global

rm ~/.claude/hooks.json
rm -rf ~/.claude/scripts/
# Edit ~/.claude/settings.json and remove the "permissions" block

Per-project

rm .claude/hooks.json
rm -rf .claude/scripts/
rm .claude/settings.json

"I need to run a blocked command"

Run it manually in your terminal. These guardrails exist specifically to ensure a human directly executes and confirms destructive operations. The agent can help you prepare the command, review the plan, and explain what it will do - but the final execution should be yours.

There is no bypass mode by design. If an agent can be told to bypass a guardrail, the guardrail doesn't exist.

Customizing

Every team has different infrastructure. Fork this repo and:

  1. Edit settings.json to add/remove deny patterns for your stack
  2. Modify hook scripts to match your specific commands and naming conventions
  3. Add project-specific rules to your CLAUDE.md

Real incidents

See examples/incidents/ for real-world stories of AI agents destroying production infrastructure. Each incident includes what happened, what failed, and what guardrails would have prevented it.

Learn from real failures

Practice responding to production incidents in a safe environment at YouBrokeProd.com. Break things, fix things, get better - without the 3am pages.

License

MIT

About

Hard policy enforcement for AI coding agents. Prevent destructive terraform, database, k8s, and cloud commands.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages