An agent skill that stress-tests technical plans before you build them.
Models are lazy about verification. They'll write a plan that says "use SQLite for concurrent writes" or "Y.js supports persistence out of the box" and move on without checking. These unchecked assumptions become mid-build surprises that force architectural pivots, messy workarounds, and wasted context.
This skill forces the model to actually verify its claims by searching real docs, running proof-of-concept code, and fixing the plan before implementation starts. Each verification runs in a fresh sub-agent context, so no confirmation bias carries over from the planning conversation. The result: plans that work on the first try, which means cleaner code with fewer mid-course corrections.
A plan claimed bash + sqlite3 would be fast enough for git hooks. The skill spun up parallel agents to research alternatives and run an actual latency POC.
The POC disproved the assumption (bash was 4-5x slower than estimated) and surfaced the real tradeoffs across runtimes.
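To make that example concrete, here is a minimal sketch of what a latency POC for such a claim could look like. It is not the POC the skill actually generated: the `run_latency_poc` name, the 50-iteration count, and the 50 ms per-call budget are hypothetical assumptions, and it assumes the `sqlite3` command-line tool is on `PATH`. It spawns one `sqlite3` process per simulated hook call, which is the cost a bash hook pays, and prints a confirmed, disproved, or inconclusive verdict.

```bash
#!/usr/bin/env bash
# Hypothetical latency POC for the claim "bash + sqlite3 is fast enough for git hooks".
# ITERATIONS and BUDGET_MS are illustrative assumptions, not values from the skill.
set -euo pipefail
export LC_ALL=C   # keep "." as the decimal separator in `time` and awk output

ITERATIONS=50   # number of simulated hook calls, one sqlite3 process each
BUDGET_MS=50    # assumed acceptable latency per hook call, in milliseconds

# Prints a single line starting with "confirmed:", "disproved:", or "inconclusive:".
run_latency_poc() {
  if ! command -v sqlite3 >/dev/null 2>&1; then
    echo "inconclusive: sqlite3 not found on PATH"
    return 0
  fi

  local workdir db elapsed_s rows per_call_ms i
  workdir="$(mktemp -d)"   # isolated scratch space, removed below
  db="$workdir/poc.db"
  if ! sqlite3 "$db" 'CREATE TABLE events (id INTEGER PRIMARY KEY, msg TEXT);' >/dev/null 2>&1; then
    rm -rf "$workdir"
    echo "inconclusive: could not create a test database"
    return 0
  fi

  # Time ITERATIONS cold sqlite3 invocations, each inserting one row like a hook would.
  TIMEFORMAT='%R'   # make the `time` keyword print only elapsed wall-clock seconds
  elapsed_s="$( { time for ((i = 0; i < ITERATIONS; i++)); do
    sqlite3 "$db" "INSERT INTO events (msg) VALUES ('hook $i');" >/dev/null 2>&1 || true
  done; } 2>&1 )"

  rows="$(sqlite3 "$db" 'SELECT COUNT(*) FROM events;' 2>/dev/null || echo 0)"
  rm -rf "$workdir"

  # If any insert failed, the timing does not measure what we meant to measure.
  if [[ "$rows" != "$ITERATIONS" ]]; then
    echo "inconclusive: expected $ITERATIONS rows, found $rows"
    return 0
  fi

  per_call_ms="$(awk -v s="$elapsed_s" -v n="$ITERATIONS" 'BEGIN { printf "%.1f", s * 1000 / n }')"
  if awk -v ms="$per_call_ms" -v b="$BUDGET_MS" 'BEGIN { exit !(ms + 0 <= b + 0) }'; then
    echo "confirmed: ${per_call_ms} ms per call (budget ${BUDGET_MS} ms)"
  else
    echo "disproved: ${per_call_ms} ms per call (budget ${BUDGET_MS} ms)"
  fi
}

run_latency_poc
```

A failed insert or a missing tool yields "inconclusive" rather than a misleading number, which keeps the verdict honest. A real comparison would run an equivalent script for each candidate runtime; this sketch only measures the bash side.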
Installing in Claude Code:

```
/plugin marketplace add gbasin/stress-test-skill
/plugin install stress-test
```

Or manually:

```bash
mkdir -p ~/.claude/commands
curl -fsSL -o ~/.claude/commands/stress-test.md \
  https://raw.githubusercontent.com/gbasin/stress-test-skill/main/skills/stress-test/SKILL.md
```

Installing in Codex:

```
$skill-installer install https://github.com/gbasin/stress-test-skill/tree/main/skills/stress-test
```

Or manually:

```bash
mkdir -p ~/.codex/skills/stress-test
curl -fsSL -o ~/.codex/skills/stress-test/SKILL.md \
  https://raw.githubusercontent.com/gbasin/stress-test-skill/main/skills/stress-test/SKILL.md
```

Other agents: copy `skills/stress-test/SKILL.md` into wherever your framework reads agent instructions from, or include its contents in your agent's system prompt.
Six phases, each building on the last:
- Decompose: Extracts every decision, assumption, dependency, and interface from your plan.
- Verify: Launches parallel sub-agents to search docs, repos, and the web for evidence. For each claim, it asks: "How do we know this works?"
- Triage: Separates what's confirmed from what needs hands-on testing, and drafts minimal POC specs for unresolved items.
- Approve: Presents the proposed POCs and lets you choose which to run, skip, or modify. Nothing runs without your say-so.
- Test: Runs approved POCs in parallel in an isolated `.poc-stress-test/` directory. Each POC reports confirmed, disproved, or inconclusive, along with its raw output.
- Update: Walks through each finding individually, recommends plan changes, and applies approved updates inline. Cleans up after itself.
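The Test phase's contract (an isolated directory, parallel runs, a three-way verdict with raw output, then cleanup) can be illustrated with a small bash harness. This is a hypothetical sketch, not how the skill itself is implemented: the `run_pocs` and `verdict_for` names, the `NAME=COMMAND` argument format, and the exit-code convention (0 means confirmed, 1 means disproved, anything else means inconclusive) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical harness illustrating the Test phase: run POCs in parallel inside
# .poc-stress-test/, classify each one, show its raw output, then clean up.
# The NAME=COMMAND format and the exit-code convention are assumptions.
set -euo pipefail

POC_DIR=".poc-stress-test"

# Assumed convention: exit 0 = confirmed, 1 = disproved, anything else = inconclusive.
verdict_for() {
  case "$1" in
    0) echo "confirmed" ;;
    1) echo "disproved" ;;
    *) echo "inconclusive" ;;
  esac
}

# run_pocs NAME=COMMAND ...: runs each command in its own subdirectory, in parallel.
run_pocs() {
  mkdir -p "$POC_DIR"
  local spec name cmd pids=() names=()
  for spec in "$@"; do
    name="${spec%%=*}"   # text before the first "="
    cmd="${spec#*=}"     # text after the first "="
    mkdir -p "$POC_DIR/$name"
    # Background each POC; stdout and stderr together form its raw output.
    ( cd "$POC_DIR/$name" && bash -c "$cmd" ) >"$POC_DIR/$name/output.log" 2>&1 &
    pids+=("$!")
    names+=("$name")
  done

  # Collect results in submission order so the report is deterministic.
  local i status
  for i in "${!pids[@]}"; do
    status=0
    wait "${pids[$i]}" || status=$?
    echo "${names[$i]}: $(verdict_for "$status")"
    sed 's/^/    /' "$POC_DIR/${names[$i]}/output.log"   # raw output, indented
  done

  rm -rf "$POC_DIR"   # clean up once every finding has been reported
}

run_pocs \
  "math-works=test \$((2 + 2)) -eq 4 && echo 'arithmetic ok'" \
  "claim-fails=echo 'expected 4, got 5'; exit 1" \
  "env-missing=exit 3"
```

With the three sample POCs, the harness prints `math-works: confirmed`, `claim-fails: disproved`, and `env-missing: inconclusive`, each followed by that POC's indented raw output, and leaves no `.poc-stress-test/` directory behind. Mapping "couldn't run" to inconclusive instead of disproved mirrors the distinction the phase draws between a failed claim and a failed test.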
When to use it:
- After writing a technical plan or architecture doc, before you start building
- When evaluating a new library, framework, or integration approach
- Before committing to decisions that are expensive to reverse
- Anytime a plan has claims you haven't personally verified

