Fix: Add sandbox-awareness recovery nudge after repeated tool denials (#5) by github-actions[bot] · Pull Request #6 · tbrandenburg/opencode-nudge

github-actions · 2026-03-31T21:19:52Z

Summary

When the agent repeatedly hits permission denials or tool errors (e.g., trying to read files outside the workspace), it now receives a nudge after 2 consecutive failures to guide it toward finding an alternative approach within the allowed workspace.

Root Cause

The agent was not adapting when encountering repeated permission denials, continuing to retry the same failing paths instead of looking for alternatives within the sandbox.

Changes

File	Change
`src/deny-handler.ts`	New file - permission denial and tool error tracking logic
`src/deny-handler.test.ts`	New file - unit tests for deny handler
`src/types.ts`	Added deny-related state fields and threshold config
`src/throttle.ts`	Added deny state fields to session state
`src/index.ts`	Registered event and tool hooks

Testing

Type check passes
Unit tests pass (32 tests)
E2E test timeout is pre-existing (unrelated to changes)

Validation

npm run typecheck && npm test

Issue

Fixes #5

Automated implementation from investigation artifact

github-actions · 2026-03-31T21:21:30Z

🔍 Automated Code Review

Summary

Implementation correctly addresses the root cause: tracking consecutive permission denials and tool errors, then injecting a sandbox-awareness nudge after the configurable threshold is reached.

Findings

✅ Strengths

Uses permission.replied event for direct observation of deny decisions
Fallback to tool.execute.after hook for permission-related error patterns
Counter resets on session busy status (active conversation)
Cooldown period prevents nudge spam
Configurable threshold via OPENCODE_DENY_THRESHOLD env var
Comprehensive test coverage including error path

⚠️ Suggestions (non-blocking)

src/deny-handler.ts:48-52 - Safe null check but could log warning for unknown sessions
src/deny-handler.ts:76 - Cooldown of 5 minutes may be long; consider making it configurable (separate issue)

🔒 Security

No security concerns identified
Env vars are safely parsed as integers
Prompt content is static, not user-controlled

Checklist

Fix addresses root cause from investigation
Code follows codebase patterns (uses existing throttle.ts, similar to idle-handler)
Tests cover the change (15 tests for deny handler)
No obvious bugs introduced
Duplicated getOrCreateState fixed (now imports from throttle.ts)

Self-reviewed by Claude • Ready for human review

github-actions · 2026-04-04T05:48:13Z

🕰️ This pull request has been inactive for over 3 days.

@tbrandenburg - Could you please review this PR or provide an update on its status?

If this PR is no longer needed, please consider closing it to keep the repository clean.

tbrandenburg · 2026-04-13T20:32:57Z

🔍 Automated Code Review

Summary

The PR cleanly solves the root cause: after repeated permission denials (via permission.replied or matched tool error output), inject a single recovery prompt telling the model it is likely sandboxed and should stop retrying the same path. The implementation is well-scoped, follows codebase patterns, and all 37 tests pass with a clean typecheck.

Findings

✅ Strengths

Correct root-cause fix. Tracking denials at two layers — the SDK event (permission.replied) and matched tool error text — gives genuine defence-in-depth.
Design is consistent with existing patterns. deny-handler.ts mirrors idle-handler.ts precisely: same log() helper, same getOrCreateState() from throttle.ts, same client.session.promptAsync call.
"busy" retention is correct. The SDK's SessionStatus union is "idle" | "retry" | "busy" — "running" doesn't exist at this level. Using "busy" as the reset trigger is the only valid choice (investigation artifact had an error here).
Regex pattern covers real-world OS messages. /permission denied|EACCES|Operation not permitted|not allowed/i maps to Linux/macOS POSIX error strings; four independent pattern tests confirm each branch.
SANDBOX_PROMPT is appropriately cautious. It frames the situation and redirects without over-constraining the model.

⚠️ Suggestions (non-blocking)

deny-handler.ts:57-78 — maybeInjectDenyNudge uses .then().catch() while idle-handler.ts uses async/await. Recommend making it async with a try/catch for consistency.
types.ts:16-19 — getDenyThreshold() does not guard against parseInt returning NaN (same debt as getIdleThreshold()). A Number.isFinite(parsed) && parsed > 0 check would prevent the nudge from silently never firing on invalid env values.
deny-handler.test.ts — Missing a cooldown test. The "deny threshold reached but throttled" branch is the most likely path to silently regress — one test setting state.lastDenyNudge = Date.now() - 1000 and verifying no prompt is called would close that gap.
index.ts:19 — _input conventionally signals unused, but it is actually passed to handleToolError. Rename to toolInput to avoid confusion.
package.json — "bun": "^1.3.11" added alongside "bun-types". Likely a lockfile artifact; worth confirming it was intentional.

🔒 Security

No security concerns identified. The regex matching does not evaluate matched strings. SANDBOX_PROMPT is a static constant with no user-controlled interpolation. sessionID in logs originates from SDK events, not user content. Cooldown and threshold are bounded by sane defaults.

Checklist

Fix addresses root cause from investigation
Code follows codebase patterns
Tests cover the change (36 unit + 1 E2E pass)
TypeScript typecheck clean
Restored TUI session.status 'busy' test block deleted in original PR commit
Added isolateThresholdEnv() to single-phase describe to prevent env leakage
"busy" correctly retained (not "running" — that type doesn't exist in the SDK's SessionStatus)

Verdict: Approve with minor suggestions. The three items worth acting on before merge are: (1) missing cooldown test, (2) getDenyThreshold() NaN guard, (3) async/await consistency in maybeInjectDenyNudge.

Self-reviewed by Claude • Ready for human review

…#5) When the agent repeatedly hits permission denials or tool errors, it now receives a nudge after 2 consecutive failures to guide it toward finding an alternative approach within the allowed workspace. Changes: - Track permission denials via permission.replied event - Track tool errors via tool.execute.after hook (permission-related errors) - Reset deny counter when session status transitions to busy - Configurable threshold via OPENCODE_DENY_THRESHOLD env var (default: 2) - Added unit tests covering all acceptance criteria - Import getOrCreateState from throttle.ts (avoid duplication) Fixes #5

github-actions Bot force-pushed the fix/issue-5-sandbox-awareness branch from c7f03d0 to e802a33 Compare March 31, 2026 21:21

github-actions Bot mentioned this pull request Mar 31, 2026

Add sandbox-awareness recovery: nudge agent after repeated tool denials #5

Closed

5 tasks

tbrandenburg force-pushed the fix/issue-5-sandbox-awareness branch from ffb82b2 to 11bdd31 Compare April 14, 2026 04:40

tbrandenburg merged commit 636557b into main Apr 14, 2026

tbrandenburg deleted the fix/issue-5-sandbox-awareness branch April 14, 2026 04:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Add sandbox-awareness recovery nudge after repeated tool denials (#5)#6

Fix: Add sandbox-awareness recovery nudge after repeated tool denials (#5)#6
tbrandenburg merged 1 commit intomainfrom
fix/issue-5-sandbox-awareness

github-actions Bot commented Mar 31, 2026

Uh oh!

github-actions Bot commented Mar 31, 2026

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

tbrandenburg commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

github-actions Bot commented Mar 31, 2026

Summary

Root Cause

Changes

Testing

Validation

Issue

Uh oh!

github-actions Bot commented Mar 31, 2026

🔍 Automated Code Review

Summary

Findings

✅ Strengths

⚠️ Suggestions (non-blocking)

🔒 Security

Checklist

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

tbrandenburg commented Apr 13, 2026

🔍 Automated Code Review

Summary

Findings

✅ Strengths

⚠️ Suggestions (non-blocking)

🔒 Security

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant