feat(runner): add git safety guardrails to system prompt #1360

Merged
jeremyeder merged 7 commits into ambient-code:main from jeremyeder:fix/git-safety-guardrails
Apr 20, 2026

Conversation

@jeremyeder jeremyeder commented Apr 20, 2026

Summary

Closes #1111
Supersedes #1225

Test plan

  • 21 runner tests pass (uv run pytest tests/test_auto_push.py)
  • Guardrails included when repos present
  • Guardrails excluded when no repos

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Workspace prompts now include Git Safety instructions when repository configuration is present, warning against embedding tokens, escalating access, forced pushes, deleting remote refs, and pushing to protected branches.
  • Tests

    • Added unit tests confirming Git Safety instructions appear only when repositories are configured.

Inject concise git safety rules into the session system prompt when repos
are configured. Covers force push, ref deletion, main branch protection,
destructive operations, and token exposure. Replaces the over-engineered
approach from ambient-code#1225 (307-line regex module + 344-line test suite) with
15 lines of prompt text and 2 tests.

Closes ambient-code#1111

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

netlify Bot commented Apr 20, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5522831 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/69e664ae6389ab0007f44228 |

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0875e209-cdc8-4875-abef-9c8f7864d060

📥 Commits

Reviewing files that changed from the base of the PR and between f7ee8b4 and 3305132.

📒 Files selected for processing (2)
  • components/runners/ambient-runner/ambient_runner/platform/prompts.py
  • components/runners/ambient-runner/tests/test_auto_push.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/runners/ambient-runner/tests/test_auto_push.py
  • components/runners/ambient-runner/ambient_runner/platform/prompts.py

📝 Walkthrough

Added a module-level GIT_SAFETY_INSTRUCTIONS prompt fragment and updated build_workspace_context_prompt(...) to append it when repos_cfg is non-empty, inserting Git safety guardrails into the generated workspace context prompt.
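The behaviour described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the rule text is abbreviated from phrases quoted elsewhere in this thread, and `build_prompt_sketch` is a hypothetical stand-in for `build_workspace_context_prompt`, whose real body is not shown here.

```python
# Hypothetical, abbreviated stand-in for the real GIT_SAFETY_INSTRUCTIONS text.
GIT_SAFETY_INSTRUCTIONS = (
    "## Git Safety\n\n"
    "- NEVER embed tokens in commands; use environment variables.\n"
    "- Do NOT autonomously escalate access; stop and ask the user.\n\n"
)


def build_prompt_sketch(repos_cfg, workspace_path="/workspace"):
    """Build a toy workspace prompt, adding git safety rules only when repos exist."""
    prompt = f"# Workspace\n\nWorking directory: {workspace_path}\n\n"
    if repos_cfg:  # an empty list (or None) means no repos are configured
        names = ", ".join(repo["name"] for repo in repos_cfg)
        prompt += f"Repositories: {names}\n\n"
        prompt += GIT_SAFETY_INSTRUCTIONS
    return prompt
```

Keeping the guardrails in a module-level constant means the conditional append stays a one-line change in the builder, and tests can check for the section heading without depending on the rest of the prompt.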

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Git Safety Guardrails: `components/runners/ambient-runner/ambient_runner/platform/prompts.py` | Added `GIT_SAFETY_INSTRUCTIONS` constant and updated `build_workspace_context_prompt(...)` (no signature change) so the safety text is appended whenever `repos_cfg` is present. |
| Safety Guardrails Tests: `components/runners/ambient-runner/tests/test_auto_push.py` | Added tests verifying that when `repos_cfg` contains repos the generated prompt includes a "Git Safety" section (asserts presence of phrases like "NEVER embed tokens" and "Do NOT autonomously escalate"), and that the section is absent when `repos_cfg` is empty. |
🚥 Pre-merge checks | ✅ 7 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | PR addresses core requirements from #1111: prevents token exposure and autonomous escalation via prompt guardrails. However, some requirements lack full implementation: no consent gates, no backup branch creation, no explicit blocking of API ref deletion. | Consider adding consent gates at escalation boundaries and explicit checks to prevent DELETE ref operations and direct API manipulation to fully address #1111 requirements. |

✅ Passed checks (7 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (`feat(runner): ...`) and clearly describes the main change of adding git safety guardrails to the system prompt. |
| Out of Scope Changes check | ✅ Passed | Changes are focused and in-scope: adds `GIT_SAFETY_INSTRUCTIONS` constant and updates `build_workspace_context_prompt` to include safety rules when repos are configured, directly supporting #1111 mitigation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Performance And Algorithmic Complexity | ✅ Passed | PR introduces only O(1) conditional string append to existing prompt builder; no algorithmic complexity regressions or performance overhead detected. |
| Security And Secret Handling | ✅ Passed | PR introduces security-positive changes with no secrets, tokens, or sensitive data exposed. `GIT_SAFETY_INSTRUCTIONS` constant addresses violations in issue #1111. |
| Kubernetes Resource Safety | ✅ Passed | Custom check for Kubernetes Resource Safety does not apply; PR modifies only Python source code files, not Kubernetes manifests. |

@jeremyeder jeremyeder enabled auto-merge (squash) April 20, 2026 16:27

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/runners/ambient-runner/ambient_runner/platform/prompts.py`:
- Around line 82-87: Update the prompt text that currently contains the guidance
lines starting with "5. **NEVER run destructive operations without a backup**"
and "6. **NEVER embed tokens in commands**" to require explicit user
confirmation before performing destructive local git commands (`git reset
--hard`, `git clean -fd`, `git checkout -- .`): add a clear step requiring the
operator to request and receive a literal, explicit confirmation from the user
(e.g., user must type "I CONFIRM" or similar) before the runner will output or
execute any destructive command, and include the requirement that a named backup
branch be created first; modify the prompt string in
components/runners/ambient-runner/ambient_runner/platform/prompts.py so the
guidance enforces an explicit confirmation handshake for the identified
commands.

In `@components/runners/ambient-runner/tests/test_auto_push.py`:
- Around line 320-350: The test only asserts the presence of a single safety
phrase ("NEVER force push") in test_prompt_includes_git_safety_with_repos, which
lets other guardrails regress unnoticed; update the tests that call
build_workspace_context_prompt (e.g., test_prompt_includes_git_safety_with_repos
and test_prompt_excludes_git_safety_without_repos) to assert all expected
safety-critical phrases are present when repos_cfg is non-empty (examples:
"NEVER force push", "DO NOT delete refs", "DO NOT expose API keys", "RESTRICT
tokens", "PROTECT main branch" or whatever exact phrases
build_workspace_context_prompt emits) and assert none of those phrases appear
when repos_cfg is empty; locate the checks by referencing
build_workspace_context_prompt and the two test functions and add multiple
explicit assertions covering each guardrail phrase rather than relying on a
single substring.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 43d59d5b-69a2-49dd-b2a8-3f142359f2bc

📥 Commits

Reviewing files that changed from the base of the PR and between bea2c03 and f7ee8b4.

📒 Files selected for processing (2)
  • components/runners/ambient-runner/ambient_runner/platform/prompts.py
  • components/runners/ambient-runner/tests/test_auto_push.py

Comment on lines +82 to +87
    "5. **NEVER run destructive operations without a backup** — before "
    "`git reset --hard`, `git clean -fd`, or `git checkout -- .`, "
    "create a backup branch first.\n"
    "6. **NEVER embed tokens in commands** — use environment variables.\n\n"
    "When a git operation fails: stop, diagnose, report, wait for the user.\n\n"
)

⚠️ Potential issue | 🟠 Major

Add explicit user-confirmation requirement for destructive local git operations

Lines 82-84 require a backup branch but still permit destructive commands without explicit consent. Given the incident goals, the prompt should require confirmation before `git reset --hard`, `git clean -fd`, or `git checkout -- .`.

Suggested prompt update
 GIT_SAFETY_INSTRUCTIONS = (
@@
-    "5. **NEVER run destructive operations without a backup** — before "
-    "`git reset --hard`, `git clean -fd`, or `git checkout -- .`, "
-    "create a backup branch first.\n"
+    "5. **NEVER run destructive operations without explicit user approval** — "
+    "before `git reset --hard`, `git clean -fd`, or `git checkout -- .`, "
+    "ask the user for confirmation and create a backup branch first.\n"
@@
-    "When a git operation fails: stop, diagnose, report, wait for the user.\n\n"
+    "When a git operation fails: stop, diagnose, report, wait for the user.\n\n"
 )
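For comparison, the confirmation-plus-backup rule proposed above could also be expressed as a check in code. The sketch below is purely illustrative and is not part of this PR, which relies on prompt text alone; `DESTRUCTIVE_PREFIXES`, `is_destructive`, and `may_run` are hypothetical names, and the prefix matching is deliberately naive (it would not catch variants such as `git clean -fdx`).

```python
import shlex

# Hypothetical list of destructive git invocations, taken from the review comment.
DESTRUCTIVE_PREFIXES = (
    ("git", "reset", "--hard"),
    ("git", "clean", "-fd"),
    ("git", "checkout", "--", "."),
)


def is_destructive(command):
    """Return True if the command starts with one of the destructive prefixes."""
    tokens = tuple(shlex.split(command))
    return any(tokens[: len(prefix)] == prefix for prefix in DESTRUCTIVE_PREFIXES)


def may_run(command, user_confirmed, backup_branch):
    """Allow destructive commands only with explicit consent and a named backup branch."""
    if not is_destructive(command):
        return True
    return bool(user_confirmed and backup_branch)
```

Under these assumptions, `may_run("git clean -fd", True, None)` is rejected because no backup branch was named, while `may_run("git status", False, None)` passes since the command is not on the list.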

Comment on lines +320 to +350
    def test_prompt_includes_git_safety_with_repos(self):
        """Git safety guardrails are included when repos are present."""
        repos_cfg = [
            {
                "name": "my-repo",
                "url": "https://github.com/owner/my-repo.git",
                "branch": "main",
                "autoPush": False,
            }
        ]
        prompt = build_workspace_context_prompt(
            repos_cfg=repos_cfg,
            workflow_name=None,
            artifacts_path="artifacts",
            ambient_config={},
            workspace_path="/workspace",
        )
        assert "Git Safety Guardrails" in prompt
        assert "NEVER force push" in prompt

    def test_prompt_excludes_git_safety_without_repos(self):
        """Git safety guardrails are excluded when no repos are present."""
        prompt = build_workspace_context_prompt(
            repos_cfg=[],
            workflow_name=None,
            artifacts_path="artifacts",
            ambient_config={},
            workspace_path="/workspace",
        )
        assert "Git Safety Guardrails" not in prompt

⚠️ Potential issue | 🟠 Major

Strengthen guardrail assertions to cover all safety-critical rules

Lines 337 and 338 lock in only one rule (`NEVER force push`), so regressions in the ref-deletion, API-ref, token, or main-branch protections could go unnoticed while the tests still pass.

Suggested test hardening
     def test_prompt_includes_git_safety_with_repos(self):
         """Git safety guardrails are included when repos are present."""
@@
         assert "Git Safety Guardrails" in prompt
         assert "NEVER force push" in prompt
+        assert "NEVER delete remote branches or refs" in prompt
+        assert "NEVER manipulate git refs via the GitHub/GitLab REST API" in prompt
+        assert "NEVER push to main/master" in prompt
+        assert "NEVER run destructive operations without a backup" in prompt
+        assert "NEVER embed tokens in commands" in prompt
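One way to keep such assertions from drifting is to hold the expected phrases in a single tuple and report every missing one at once. The helper below is a sketch under that assumption; `EXPECTED_GUARDRAILS` and `missing_guardrails` are hypothetical names, and the phrases are copied from the suggested assertions above, so they would need to match whatever text the prompt builder actually emits.

```python
# Phrases copied from the suggested assertions above; adjust to the emitted text.
EXPECTED_GUARDRAILS = (
    "NEVER force push",
    "NEVER delete remote branches or refs",
    "NEVER manipulate git refs via the GitHub/GitLab REST API",
    "NEVER push to main/master",
    "NEVER run destructive operations without a backup",
    "NEVER embed tokens in commands",
)


def missing_guardrails(prompt, expected=EXPECTED_GUARDRAILS):
    """Return the expected phrases that do not appear in the prompt."""
    return [phrase for phrase in expected if phrase not in prompt]
```

A test for the repos-present case could then assert `missing_guardrails(prompt) == []`, and a failure message would list exactly which rules were dropped.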

mergify Bot and others added 5 commits April 20, 2026 17:00
Keep only token redaction and escalation protocol — these are
universally correct. Remove opinionated rules (force push policy,
backup branches, ref deletion) that should be opt-in per-project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
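
The thread does not show how token redaction is worded in the final prompt. As a rough illustration of the underlying idea only, the sketch below strips credentials from a remote URL before it is logged or echoed; `redact_url_credentials` is a hypothetical helper, not something this PR adds.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url_credentials(url):
    """Replace any userinfo (user:token@) in a URL with a fixed placeholder."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no embedded credentials, nothing to redact
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"***@{host}"))
```

For example, a remote written as `https://x-access-token:<token>@github.com/owner/my-repo.git` would come back with the userinfo replaced by `***`, while a URL without credentials is returned unchanged.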
@jeremyeder jeremyeder merged commit e89ed9d into ambient-code:main Apr 20, 2026
0 of 2 checks passed

Development

Successfully merging this pull request may close these issues.

[Amber Refactor] Fix Agents destroying workspaces
