AI agents fail silently. This skill makes sure they fail loudly, recover smartly, and learn from every mistake.
AI agents are notoriously bad at handling errors:
| Anti-pattern | What happens | Consequence |
|---|---|---|
| 🤫 Silent failure | Error occurs, agent says "Done!" | Broken state, user doesn't know |
| 🔄 Blind retry | Same call, same params, same error | Wasted tokens, rate limits hit |
| 🔥 Error escalation | Wrong fix makes it worse | Data loss, cascading failures |
| 📉 Fake success | "Should work" without verification | Delivery of broken results |
Sound familiar? You ask an AI agent to fix a file, it says "Done!" — but nothing changed.
R1: RECOGNIZE → Did something go wrong? (Stop ignoring errors)
R2: RESCUE → Can I fix it? (Smart recovery, max 3 attempts)
R3: REPORT → Does the human need to know? (Structured error reports)
R4: REMEMBER → What did I learn? (Error journal, pattern detection)
Without Error Recovery:
User: "Push to GitHub"
Agent: ✅ Pushed successfully!
Reality: fatal: could not read Username, nothing was pushed
With Error Recovery:
User: "Push to GitHub"
⚠️ Error: Git push failed after 2 recovery attempts
What happened: Pushing to origin/main
Error: fatal: could not read Username for 'https://github.com'
Attempted:
1. Retried push → same auth error
2. Tried SSH → not configured
State: Commit saved locally, not on remote
Need: Git credentials or token auth setup
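# Claude Code
mkdir -p .claude/skills/error-recovery
cp SKILL.md .claude/skills/error-recovery/
# OpenClaw
mkdir -p ~/.openclaw/workspace/skills/error-recovery
cp SKILL.md ~/.openclaw/workspace/skills/error-recovery/
# Cursor
mkdir -p .cursor/rules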
cp SKILL.md .cursor/rules/error-recovery.mdc

# Diagnose any error instantly
node scripts/error-diagnose.mjs --error "ENOENT"
node scripts/error-diagnose.mjs --error "429" --context "GitHub API"
node scripts/error-diagnose.mjs --error "timeout" --context "npm install"

Supports: ENOENT, EACCES, EISDIR, 401, 403, 404, 429, 500, timeout, SIGKILL, SIGTERM, and any custom error.
Never silently continue after a failure. The skill checks:
- Command exit codes
- Exception messages
- Empty or unexpected output
- Timeout detection
- Behavioral anomalies
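The checks above can be sketched as a single function. This is only an illustration, not the skill's actual implementation: the result shape (`exitCode`, `stdout`, `stderr`, `timedOut`), the `expectOutput` option, and the stderr keyword heuristic are all assumptions. Behavioral anomalies are left out because they depend on the task.

```javascript
// Hypothetical result shape: { exitCode, stdout, stderr, timedOut }.
// None of these names come from the skill itself.
function recognizeFailure(result, { expectOutput = false } = {}) {
  const problems = [];

  // Timeout detection
  if (result.timedOut) problems.push("timed out");

  // Command exit codes: anything other than 0 counts as a failure
  if (result.exitCode !== 0) problems.push(`exit code ${result.exitCode}`);

  // Exception messages: an assumed keyword heuristic, easy to fool
  if (/\b(error|fatal|exception)\b/i.test(result.stderr || "")) {
    problems.push("error message in stderr");
  }

  // Empty output, only when the caller expected some
  if (expectOutput && (result.stdout || "").trim() === "") {
    problems.push("empty output");
  }

  return { failed: problems.length > 0, problems };
}
```

With this sketch, the `git push` case from the example is caught even if the exit code were swallowed, because `fatal:` appears in stderr.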
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Attempt 1  │────▶│  Attempt 2  │────▶│  Attempt 3  │
│  Immediate  │     │   Wait 5s   │     │  Wait 15s   │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                       All failed? → 📢 REPORT │
What gets retried: Network timeouts, rate limits, transient API errors

What gets reported immediately: Auth failures, permission errors, data corruption
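A minimal sketch of that schedule, assuming the caller supplies a `classify` function that maps an error to a category label. The labels (`network_timeout`, `rate_limit`, `transient_api`), the function names, and the injectable `sleep` are hypothetical. Only the delays (immediate, 5 s, 15 s) and the retry/report split come from the text.

```javascript
// Delays per attempt: immediate, then 5 s, then 15 s.
const RETRY_DELAYS_MS = [0, 5000, 15000];

// Hypothetical category labels for the errors that are worth retrying.
const RETRYABLE = new Set(["network_timeout", "rate_limit", "transient_api"]);

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function rescue(operation, classify, sleep = defaultSleep) {
  const attempts = []; // recovery log, reusable in the error report
  for (const delayMs of RETRY_DELAYS_MS) {
    if (delayMs > 0) await sleep(delayMs);
    try {
      const value = await operation();
      return { ok: true, value, attempts };
    } catch (err) {
      const kind = classify(err);
      attempts.push({ delayMs, kind, message: err.message });
      // Auth, permission, and corruption errors skip the remaining attempts.
      if (!RETRYABLE.has(kind)) break;
    }
  }
  return { ok: false, attempts }; // caller moves on to REPORT
}
```

Passing `sleep` as a parameter keeps the sketch testable without real waits. The returned `attempts` array doubles as the "what was tried" log.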
Every error gets reported with:
- What happened (context)
- The full error (details)
- What was tried (recovery log)
- Current state (what works, what's broken)
- What's needed (human action required?)
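One way to assemble such a report is sketched below. The field names (`summary`, `context`, `error`, `attempted`, `state`, `need`) are assumptions chosen to mirror the five items above. The output layout follows the `git push` example earlier in this README.

```javascript
// Field names are hypothetical; they map one-to-one onto the list above.
function formatErrorReport({ summary, context, error, attempted, state, need }) {
  const lines = [
    `⚠️ Error: ${summary}`,
    `What happened: ${context}`,
    `Error: ${error}`,
    "Attempted:",
    // Number the recovery steps in the order they were tried
    ...attempted.map((step, i) => `${i + 1}. ${step}`),
    `State: ${state}`,
    `Need: ${need}`,
  ];
  return lines.join("\n");
}

const report = formatErrorReport({
  summary: "Git push failed after 2 recovery attempts",
  context: "Pushing to origin/main",
  error: "fatal: could not read Username for 'https://github.com'",
  attempted: ["Retried push → same auth error", "Tried SSH → not configured"],
  state: "Commit saved locally, not on remote",
  need: "Git credentials or token auth setup",
});
console.log(report);
```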
Errors are logged to memory/errors/ with:
- Root cause analysis
- Fix documentation
- Prevention measures
- Pattern detection across sessions
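The README lists what a journal entry contains but not its file format, so the following is a sketch under assumptions. The JSON shape, the `signature` field, and the `YYYY-MM-DD-<signature>.json` naming scheme are all hypothetical. It builds the entry and counts recurring signatures, leaving the actual file write to the caller.

```javascript
// Assumed entry shape; only the memory/errors/ directory comes from the README.
function buildJournalEntry({ signature, rootCause, fix, prevention }, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return {
    path: `memory/errors/${day}-${signature}.json`, // hypothetical naming scheme
    entry: { signature, rootCause, fix, prevention, loggedAt: now.toISOString() },
  };
}

// Pattern detection sketch: signatures seen at least `threshold` times.
function detectPatterns(entries, threshold = 2) {
  const counts = new Map();
  for (const e of entries) {
    counts.set(e.signature, (counts.get(e.signature) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= threshold)
    .map(([signature, count]) => ({ signature, count }));
}
```

Entries loaded from earlier sessions can be passed to `detectPatterns` to surface errors that keep coming back.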
| Error Type | Strategy | Max Retries |
|---|---|---|
| Network timeout | Exponential backoff | 3 |
| Rate limit (429) | Wait + respect Retry-After | 3 |
| File not found (ENOENT) | Create / suggest correct path | 1 |
| Permission denied (EACCES) | Suggest fix | 0 (report) |
| Auth failure (401) | Re-auth | 1 |
| Server error (500) | Retry after delay | 3 |
| Process killed (SIGKILL) | Reduce load, retry | 2 |
| Data validation | Request correct input | 0 (report) |
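The table can be read as a lookup, sketched here. The keys are hypothetical labels for the rows. The 1-second backoff base and the choice to report unknown error types without retrying are assumptions. `Retry-After` is parsed in both forms HTTP allows, a number of seconds or an HTTP date.

```javascript
// Retry limits copied from the table; the keys are hypothetical labels.
const MAX_RETRIES = {
  network_timeout: 3,
  rate_limit: 3,
  file_not_found: 1,
  permission_denied: 0,
  auth_failure: 1,
  server_error: 3,
  process_killed: 2,
  data_validation: 0,
};

function shouldRetry(errorType, attemptsSoFar) {
  const max = MAX_RETRIES[errorType];
  if (max === undefined) return false; // assumption: unknown types are reported
  return attemptsSoFar < max;
}

// Exponential backoff: 1 s, 2 s, 4 s, ... (base delay is an assumption).
function backoffMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Retry-After is either delay-seconds or an HTTP date; returns null if unparseable.
function retryAfterMs(headerValue, now = Date.now()) {
  const value = headerValue.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}
```

For a 429 response, a caller might wait `retryAfterMs(header)` when the header is present and fall back to `backoffMs(attempt)` otherwise.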
- OpenClaw
- Claude Code
- Cursor
- Windsurf
- Codex
- Any agent framework
| Skill | Purpose |
|---|---|
| Cognitive Debt Guard | Prevent AI code quality issues |
| Prompt Guard | Block prompt injection attacks |
| EVR Framework | Verify completions are real |
| Systematic Debugging | Root cause analysis |
MIT — Use freely, fail safely.