Skip to content

Fix: Add sandbox-awareness recovery nudge after repeated tool denials (#5)#6

Merged
tbrandenburg merged 1 commit intomainfrom
fix/issue-5-sandbox-awareness
Apr 14, 2026
Merged

Fix: Add sandbox-awareness recovery nudge after repeated tool denials (#5)#6
tbrandenburg merged 1 commit intomainfrom
fix/issue-5-sandbox-awareness

Conversation

@github-actions
Copy link
Copy Markdown

Summary

When the agent repeatedly hits permission denials or tool errors (e.g., trying to read files outside the workspace), it now receives a nudge after 2 consecutive failures to guide it toward finding an alternative approach within the allowed workspace.

Root Cause

The agent was not adapting when encountering repeated permission denials, continuing to retry the same failing paths instead of looking for alternatives within the sandbox.

Changes

File Change
src/deny-handler.ts New file - permission denial and tool error tracking logic
src/deny-handler.test.ts New file - unit tests for deny handler
src/types.ts Added deny-related state fields and threshold config
src/throttle.ts Added deny state fields to session state
src/index.ts Registered event and tool hooks

Testing

  • Type check passes
  • Unit tests pass (32 tests)
  • E2E test timeout is pre-existing (unrelated to changes)

Validation

npm run typecheck && npm test

Issue

Fixes #5


Automated implementation from investigation artifact

@github-actions github-actions Bot force-pushed the fix/issue-5-sandbox-awareness branch from c7f03d0 to e802a33 Compare March 31, 2026 21:21
@github-actions
Copy link
Copy Markdown
Author

🔍 Automated Code Review

Summary

Implementation correctly addresses the root cause: tracking consecutive permission denials and tool errors, then injecting a sandbox-awareness nudge after the configurable threshold is reached.

Findings

✅ Strengths

  • Uses permission.replied event for direct observation of deny decisions
  • Fallback to tool.execute.after hook for permission-related error patterns
  • Counter resets on session busy status (active conversation)
  • Cooldown period prevents nudge spam
  • Configurable threshold via OPENCODE_DENY_THRESHOLD env var
  • Comprehensive test coverage including error path

⚠️ Suggestions (non-blocking)

  • src/deny-handler.ts:48-52 - Safe null check but could log warning for unknown sessions
  • src/deny-handler.ts:76 - Cooldown of 5 minutes may be long; consider making it configurable (separate issue)

🔒 Security

  • No security concerns identified
  • Env vars are safely parsed as integers
  • Prompt content is static, not user-controlled

Checklist

  • Fix addresses root cause from investigation
  • Code follows codebase patterns (uses existing throttle.ts, similar to idle-handler)
  • Tests cover the change (15 tests for deny handler)
  • No obvious bugs introduced
  • Duplicated getOrCreateState fixed (now imports from throttle.ts)

Self-reviewed by Claude • Ready for human review

@github-actions
Copy link
Copy Markdown
Author

github-actions Bot commented Apr 4, 2026

🕰️ This pull request has been inactive for over 3 days.

@tbrandenburg - Could you please review this PR or provide an update on its status?

If this PR is no longer needed, please consider closing it to keep the repository clean.

@tbrandenburg
Copy link
Copy Markdown
Owner

🔍 Automated Code Review

Summary

The PR cleanly solves the root cause: after repeated permission denials (via permission.replied or matched tool error output), inject a single recovery prompt telling the model it is likely sandboxed and should stop retrying the same path. The implementation is well-scoped, follows codebase patterns, and all 37 tests pass with a clean typecheck.

Findings

✅ Strengths

  • Correct root-cause fix. Tracking denials at two layers — the SDK event (permission.replied) and matched tool error text — gives genuine defence-in-depth.
  • Design is consistent with existing patterns. deny-handler.ts mirrors idle-handler.ts precisely: same log() helper, same getOrCreateState() from throttle.ts, same client.session.promptAsync call.
  • "busy" retention is correct. The SDK's SessionStatus union is "idle" | "retry" | "busy""running" doesn't exist at this level. Using "busy" as the reset trigger is the only valid choice (investigation artifact had an error here).
  • Regex pattern covers real-world OS messages. /permission denied|EACCES|Operation not permitted|not allowed/i maps to Linux/macOS POSIX error strings; four independent pattern tests confirm each branch.
  • SANDBOX_PROMPT is appropriately cautious. It frames the situation and redirects without over-constraining the model.

⚠️ Suggestions (non-blocking)

  • deny-handler.ts:57-78maybeInjectDenyNudge uses .then().catch() while idle-handler.ts uses async/await. Recommend making it async with a try/catch for consistency.
  • types.ts:16-19getDenyThreshold() does not guard against parseInt returning NaN (same debt as getIdleThreshold()). A Number.isFinite(parsed) && parsed > 0 check would prevent the nudge from silently never firing on invalid env values.
  • deny-handler.test.ts — Missing a cooldown test. The "deny threshold reached but throttled" branch is the most likely path to silently regress — one test setting state.lastDenyNudge = Date.now() - 1000 and verifying no prompt is called would close that gap.
  • index.ts:19_input conventionally signals unused, but it is actually passed to handleToolError. Rename to toolInput to avoid confusion.
  • package.json"bun": "^1.3.11" added alongside "bun-types". Likely a lockfile artifact; worth confirming it was intentional.

🔒 Security

No security concerns identified. The regex matching does not evaluate matched strings. SANDBOX_PROMPT is a static constant with no user-controlled interpolation. sessionID in logs originates from SDK events, not user content. Cooldown and threshold are bounded by sane defaults.

Checklist

  • Fix addresses root cause from investigation
  • Code follows codebase patterns
  • Tests cover the change (36 unit + 1 E2E pass)
  • TypeScript typecheck clean
  • Restored TUI session.status 'busy' test block deleted in original PR commit
  • Added isolateThresholdEnv() to single-phase describe to prevent env leakage
  • "busy" correctly retained (not "running" — that type doesn't exist in the SDK's SessionStatus)

Verdict: Approve with minor suggestions. The three items worth acting on before merge are: (1) missing cooldown test, (2) getDenyThreshold() NaN guard, (3) async/await consistency in maybeInjectDenyNudge.


Self-reviewed by Claude • Ready for human review

…#5)

When the agent repeatedly hits permission denials or tool errors, it now
receives a nudge after 2 consecutive failures to guide it toward finding
an alternative approach within the allowed workspace.

Changes:
- Track permission denials via permission.replied event
- Track tool errors via tool.execute.after hook (permission-related errors)
- Reset deny counter when session status transitions to busy
- Configurable threshold via OPENCODE_DENY_THRESHOLD env var (default: 2)
- Added unit tests covering all acceptance criteria
- Import getOrCreateState from throttle.ts (avoid duplication)

Fixes #5
@tbrandenburg tbrandenburg force-pushed the fix/issue-5-sandbox-awareness branch from ffb82b2 to 11bdd31 Compare April 14, 2026 04:40
@tbrandenburg tbrandenburg merged commit 636557b into main Apr 14, 2026
@tbrandenburg tbrandenburg deleted the fix/issue-5-sandbox-awareness branch April 14, 2026 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add sandbox-awareness recovery: nudge agent after repeated tool denials

1 participant