
Shakedown

One-command shakedown for any codebase. Architecture, security, performance, resilience, docs - what's broken, what to fix first, and whether you or your agents can handle it.

Works on anything you can point it at: agent systems, pipelines, web apps, CLIs, microservices, infra.

It's a Claude Code Skill - six phases, one report, fixes ranked by impact.

I made this because I kept writing the same audit prompt from scratch every time I wanted to properly review a project. Started with a 12-agent pipeline on Paperclip, turned it into something that works on any project.


Why this exists

I was running a 12-agent intelligence pipeline and every time I wanted to review the whole thing, I'd write a long audit prompt from scratch. Every time, I'd forget something. Sometimes security. Sometimes backup validation. Sometimes I just wouldn't look at how the agents coordinate with each other.

It's never the obvious stuff. It's always the thing you assumed was fine until it wasn't.

So I made it into a skill. Same checklist, same order, every time. Now I run one command and know I'm not skipping anything. It also finds things I wasn't looking for, which surprised me. After using it on a few projects I figured other people might find it useful too.

What it does

One command. Full analysis. Actionable output.

/shakedown

The audit discovers your project structure dynamically, reads everything, queries databases, tests backup integrity, and produces a structured report covering:

  • Architecture and code quality - design patterns, MECE analysis, contradictions, algorithm efficiency, dependency graph, test coverage
  • Error handling and resilience - crash scenarios, timeout coverage, silent failures, data integrity, edge cases, retry patterns, graceful degradation
  • Performance and bottleneck analysis - timing, parallelism, scaling limits, resource waste, cost analysis
  • Code and storage efficiency - empty files, duplicates, dead dependencies, build artifacts, storage bloat
  • Security and data exposure - secrets, injection vulnerabilities, PII, supply chain, workflow security, licensing compliance
  • Logging and observability - structured logs, traceability, alerting, monitoring
  • Documentation quality - accuracy vs codebase, completeness, onboarding readiness
  • Value assessment - problem clarity, target audience, maturity vs claims, differentiation, adoption readiness
  • Agent skill compliance - agentskills.io spec validation (conditional, for skill projects)
  • Production readiness - 10-gate PASS/PARTIAL/FAIL checklist (you call the ship/no-ship yourself)
  • Ranked recommendations - top 10 actions with impact, effort, and who implements

Installation

Claude Code (CLI or Desktop)

Clone the full skill (recommended — includes reference checklists for deeper analysis):

```shell
git clone https://github.com/belousov-petr/shakedown.git
mkdir -p ~/.claude/skills/shakedown
cp -r shakedown/SKILL.md shakedown/references ~/.claude/skills/shakedown/
```

Or grab just the core skill file (works but loses 11 reference checklists — shallower audit):

```shell
mkdir -p ~/.claude/skills/shakedown
curl -o ~/.claude/skills/shakedown/SKILL.md \
  https://raw.githubusercontent.com/belousov-petr/shakedown/master/SKILL.md
```

Restart Claude Code. It shows up as /shakedown.

Any skills-compatible agent (cross-client)

Use the .agents/skills/ path for compatibility with Cursor, Gemini CLI, Copilot, and 30+ other clients:

```shell
git clone https://github.com/belousov-petr/shakedown.git
mkdir -p ~/.agents/skills/shakedown
cp -r shakedown/SKILL.md shakedown/references ~/.agents/skills/shakedown/
```

Or install at project level (travels with the repo):

```shell
mkdir -p .agents/skills/shakedown
git clone https://github.com/belousov-petr/shakedown.git /tmp/shk
cp -r /tmp/shk/SKILL.md /tmp/shk/references .agents/skills/shakedown/
rm -rf /tmp/shk
```

How it works

Phase 1: Discover

Looks at the project before assuming anything. Maps the structure, checks git history, reads the README, detects if it's an agent skill (triggers extra checks in Phase 4), then asks you to confirm scope before burning tokens.

Phase 2: Read (parallel)

Sends 4 agents to read everything at once. One covers config and architecture, one covers execution logic, one reads outputs and docs, one counts files and checks data stores.

Phase 3: Diagnose

If there's a database, it connects and queries it. Checks table sizes, failure rates, data freshness. If there's no database, it skips this.
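As an illustration of what a Phase 3 freshness probe amounts to (the skill's actual queries live in references/db-diagnostics.md; the table and column names below are hypothetical, and the sketch uses SQLite as a stand-in for whatever database the project has):

```python
# Sketch of a Phase 3-style freshness check: flag tables whose newest
# row is older than a threshold. Assumes each table has a created_at
# column holding ISO-8601 timestamps -- an assumption, not the skill's rule.
import sqlite3
from datetime import datetime, timedelta, timezone

def stale_tables(con: sqlite3.Connection, tables: list[str],
                 max_age: timedelta) -> list[str]:
    """Return the tables whose most recent created_at is older than max_age."""
    now = datetime.now(timezone.utc)
    stale = []
    for t in tables:
        row = con.execute(f"SELECT MAX(created_at) FROM {t}").fetchone()
        newest = row[0] and datetime.fromisoformat(row[0])
        if not newest or now - newest > max_age:
            stale.append(t)  # empty table counts as stale too
    return stale
```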

Phase 4: Analyze

This is where the actual opinions come in. Architecture review, error handling audit, performance analysis, storage efficiency, resource waste. If the project is an agent skill, it also gets evaluated against the agentskills.io specification. Everything quantified where possible.

Phase 5: Assess

Security scan, PII check, documentation accuracy, whether the project actually does what it claims to do. Includes a value assessment — does this project solve a real problem, for a clear audience, with measurable value? Ends with the ranked recommendations and the uncomfortable question.

Phase 6: Test resilience

Checks whether backups actually restore (not just whether they exist). Traces what happens when components fail.
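In code terms, the difference between "backups exist" and "backups restore" looks roughly like this (a minimal sketch against SQLite; the skill adapts the check to whatever backup mechanism it actually finds):

```python
# Sketch: don't just stat() the backup file -- open a restored copy
# and run a real query against it.
import os
import shutil
import sqlite3
import tempfile

def backup_restores(backup_path: str) -> bool:
    """Return True only if the backup opens as a database with readable tables."""
    with tempfile.TemporaryDirectory() as tmp:
        restored = os.path.join(tmp, "restored.db")
        shutil.copy(backup_path, restored)  # simulate a restore
        con = sqlite3.connect(restored)
        try:
            tables = con.execute(
                "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
            return len(tables) > 0  # it must hold actual data
        except sqlite3.DatabaseError:
            return False            # file exists but isn't a valid database
        finally:
            con.close()
```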

Usage

Go to your project directory and run:

/shakedown

It also activates from natural language. These phrases trigger the full audit:

audit this project
find the weak spots
how solid is this project
how mature is this project
what would break first
where does this need tightening
assess technical debt
do a project health check

Claude Code matches your request against the skill's description — any phrasing around auditing, health checking, or stress-testing a whole project should trigger it. It won't activate for simple code reviews or single-file analysis.

The report covers

  1. What the project is (derived from reading, not assumed)
  2. What's genuinely good, with evidence
  3. What will break soon, with evidence
  4. Architecture problems, MECE gaps, contradictions, algorithm efficiency
  5. Error handling: crash paths, silent failures, edge cases, retry patterns
  6. Performance: bottlenecks, waste, cost
  7. Code and storage efficiency: empty files, duplicates, dead deps, bloat
  8. Agent skill compliance (if applicable): spec, description, instructions, evals
  9. Security: secrets, injection risks, PII, licensing
  10. Logging and monitoring gaps
  11. Documentation: accuracy, completeness, onboarding
  12. Goal fulfillment: stated vs actual behavior
  13. Blind spots nobody is watching
  14. Ratings (8 dimensions, scored 1-10)
  15. Overall score with justification
  16. Production readiness (10 gates; you call the ship/no-ship yourself)
  17. Top 10 ranked fixes with effort estimates
  18. Value assessment: problem clarity, audience, maturity, differentiation
  19. The uncomfortable question

Focused checks (without running the full audit)

The reference files in references/ are self-contained checklists. You can use them individually for targeted reviews without running the full 6-phase audit. Just point your agent at the specific file:

| If you want to check... | Use this file |
| --- | --- |
| Architecture, MECE gaps, algorithms, test coverage | references/architecture-quality.md |
| Error handling, crash paths, resilience | references/error-resilience.md |
| Performance, bottlenecks, scaling, cost | references/performance-analysis.md |
| File waste, duplicates, dead dependencies | references/storage-efficiency.md |
| Agent skill spec compliance | references/skill-standards.md |
| Security, secrets, injection, PII, licensing | references/security-checklist.md |
| Logging, docs quality, blind spots | references/operational-health.md |
| Problem clarity, audience, maturity, value | references/value-assessment.md |
| Database health, schema, freshness | references/db-diagnostics.md |
| Backup validation, disaster recovery | references/resilience-testing.md |

Example: "Review this project's security using the checklist in references/security-checklist.md"

How it's different from /ultrareview

Anthropic shipped /ultrareview in April 2026 alongside Opus 4.7. It's a PR-time bug hunter that runs in Anthropic's cloud, spawns multiple reviewer agents, and reproduces every finding before reporting. Worth using. But it's a different job from this skill.

| | /ultrareview | /shakedown |
| --- | --- | --- |
| Scope | a branch / PR diff | the whole project |
| Trigger | before merge | any time, not tied to a PR |
| Runtime | Anthropic's cloud sandbox | your local Claude Code |
| Output | reproduced bug list (logic, edge cases, security, perf) | 14-section report: architecture, security (OWASP depth), performance, resilience, docs, value assessment, a readiness checklist, the uncomfortable question |
| Cost | 3 free on Pro/Max, then $5-20 per run | free |
| Works on | git repos with diffs | any project, including non-git, side projects, skills, pipelines |

Use /ultrareview before merging a PR. Use /shakedown when you want to know what's weak across the whole project.

What to do with the report

The report is built so you can act on it immediately. Here's what you can say after the audit finishes:

| Say this | What happens |
| --- | --- |
| Fix them all | Starts implementing all recommendations in priority order |
| Fix the critical ones | Only tackles items rated Critical or FAIL |
| Explain recommendation #3 in detail | Deep-dive into a specific finding with implementation steps |
| Re-run Phase 5 only | Re-checks just one phase after you've made changes |
| Create GitHub issues for each recommendation | Turns findings into trackable issues |
| Prioritize for a solo developer | Filters by what a human needs to do vs what agents can handle |
| Compare with the last audit | If you've run it before, diffs the reports to show progress |

How it thinks

A few rules I keep coming back to when reviewing my own projects:

  1. Look at the project before making assumptions about it. Map first, read second.
  2. Check with the user before going deep. "This is what I found, this is what I'll audit - sound right?" saves everyone's time.
  3. Don't critique what you haven't read.
  4. Put numbers on things. "23% duplicate rate" is useful. "Some duplicates" is not.
  5. Compare what the docs say against what the code does. The gap between those two is where most problems hide.
  6. Every recommendation answers four things: what, why, how much work, who does it. Anything less is just complaining.
  7. The audit should be useful now, not next sprint. If you can say "fix them all" and start working immediately, it did its job.
  8. Test the safety nets. Backups exist? Restore one. Retry logic? Trace what happens when it fires. Don't report that something exists - report whether it works.
  9. Find what's wrong, not just what's right. The point is to make the project better, not to feel good about it.
  10. Surface the constraints nobody talks about: rate limits, daily budgets, peak hour pricing. Those shape what's actually possible more than architecture does.
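Rule 4 in practice: the kind of number the report cites can be computed mechanically rather than eyeballed. A sketch of one way to derive a duplicate rate (the skill's actual method isn't pinned to this):

```python
# Sketch: turn "some duplicates" into a percentage by hashing file contents.
# Counts every file beyond the first copy in each identical-content group.
import hashlib
from collections import Counter

def duplicate_rate(files: dict[str, bytes]) -> float:
    """Percentage of files whose content is an exact copy of another file."""
    hashes = Counter(hashlib.sha256(data).hexdigest()
                     for data in files.values())
    dupes = sum(n - 1 for n in hashes.values() if n > 1)
    return 100.0 * dupes / len(files) if files else 0.0
```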

Works on

| Project type | What gets checked |
| --- | --- |
| Agent systems (Paperclip, CrewAI, AutoGen) | Instructions, heartbeats, coordination, pipeline flow, signal quality |
| Web apps | Routes, API design, auth, DB schema, frontend/backend split |
| Data pipelines | Stage flow, data integrity, scheduling, error handling, throughput |
| CLI tools | Argument handling, error messages, edge cases, docs |
| Monorepos | Package boundaries, dependencies, build system, cross-package consistency |
| Microservices | Service boundaries, API contracts, resilience, observability |

Benchmark results

Tested on 3 real projects: a personal media content pipeline (Python, ~56 files, yt-dlp + Whisper + static HTML gallery), a 17-agent Claude-based intelligence pipeline (live Postgres, 3,344 LOC of agent instructions + ~1,500 LOC Python), and a browser-automation tool (~1,950 LOC JavaScript, 4 files). Each project was audited twice — once with the skill active, once with a bare "audit this project" prompt. Both runs used Claude Opus 4.7 for a fair comparison. Output was graded against scripts/validate-output.py (21 structural checks, plus 1 conditional Skill Standards check for audits that cover agent-skill projects).

| Project | With skill | Without skill | Delta |
| --- | --- | --- | --- |
| Personal media content pipeline (medium Python) | 22/22 (100%) | 0/21 (0%) | +100% |
| 17-agent intelligence pipeline (large) | 22/22 (100%) | 2/21 (10%) | +90% |
| Browser automation tool (small JS) | 22/22 (100%) | 1/21 (5%) | +95% |
| Average | 100% | 5% | +95% |

The bare prompts find real bugs. They just don't organize them. Without the skill — even on Opus 4.7 — no readiness table, no ratings, no ranked fix list, no uncomfortable question. The findings are in there somewhere, but you'd have to re-read everything to act on them. Full reports in examples/.
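For a sense of what a "structural check" means here, a minimal sketch of one (the section names below are illustrative, not the validator's actual list; the real 22 checks are in scripts/validate-output.py):

```python
# Sketch of a structural completeness check over an audit report:
# count how many required section markers actually appear.
# REQUIRED_SECTIONS is a hypothetical subset for illustration.
REQUIRED_SECTIONS = [
    "Ratings", "Production readiness", "Top 10", "Value assessment",
]

def structural_score(report: str) -> tuple[int, int]:
    """Return (sections found, sections required) for a report string."""
    text = report.lower()
    found = sum(1 for s in REQUIRED_SECTIONS if s.lower() in text)
    return found, len(REQUIRED_SECTIONS)
```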

Token usage

The skill itself costs ~3,500 tokens to load (the SKILL.md file). Measured token usage across the 3 reference audits above (Opus 4.7, 1M context):

| Project type | Files read | Tokens | Duration | Example |
| --- | --- | --- | --- | --- |
| Small CLI/browser tool (~2K LOC) | 15 | ~95K | ~15 min | benchmark-audit-c.md |
| Medium personal pipeline (~56 files + JSONL manifests) | 32 | ~165K | ~45 min | benchmark-audit-a.md |
| Large agent system (17 agents + live Postgres + 4K LOC) | 34 | ~125K | ~32 min | benchmark-audit-b.md |

Token usage scales roughly with source-code size + database query count, not file count — the large agent system had fewer source files than the medium pipeline but hit a live Postgres with 68 tables during Phase 3.

Project structure

shakedown/
├── SKILL.md                              # The skill - orchestrator (479 lines)
├── references/
│   ├── architecture-quality.md           # Section 4.4: structure, MECE, algorithms, tests
│   ├── error-resilience.md               # Section 4.5: crashes, timeouts, edge cases
│   ├── performance-analysis.md           # Section 4.6: timing, scaling, cost
│   ├── storage-efficiency.md             # Section 4.7: empty files, duplicates, bloat
│   ├── skill-standards.md                # Section 4.8: agentskills.io compliance
│   ├── db-diagnostics.md                 # Phase 3: database-specific queries
│   ├── security-checklist.md             # Section 5.1: OWASP-derived — secrets, injection, PII, licensing, LLM/Agentic/MCP risks
│   ├── operational-health.md             # Sections 5.2/5.3/5.5: logging, docs, blind spots
│   ├── value-assessment.md               # Section 5.10: problem, audience, maturity
│   ├── resilience-testing.md             # Phase 6: backup and resilience tests
│   └── gotchas.md                        # 10 common agent audit mistakes
├── examples/
│   ├── benchmark-audit-a.md              # With-skill audit: content pipeline
│   ├── benchmark-audit-b.md              # With-skill audit: agent pipeline
│   ├── benchmark-audit-c.md              # With-skill audit: browser tool
│   ├── baseline-audit-a.md               # Without-skill baseline: content pipeline
│   ├── baseline-audit-b.md               # Without-skill baseline: agent pipeline
│   └── baseline-audit-c.md               # Without-skill baseline: browser tool
├── evals/
│   ├── evals.json                        # 7 test cases with 51 assertions
│   └── README.md                         # How to run and grade evals
├── scripts/
│   └── validate-output.py                # 22-check output completeness validator
├── README.md
└── LICENSE                               # MIT

The main SKILL.md is a clean orchestrator — phases, flow, output templates. All detailed checklists live in references/ (11 files, 3,202 lines) and are loaded on demand. This keeps activation cost low while preserving depth. Each reference file includes scope boundary notes to prevent overlap.

A few honest things

Before you use this on anything serious.

  • LLM audits spot patterns, not truth. What comes back is a pile of "this looks suspicious" — still on you to read the code, check whether it's actually a problem, and figure out the fix. Especially for security, or anything users actually touch. If a finding looks important, don't take my word for it. Read the code.

  • Don't mistake this for due diligence. Compliance, audit trails, threat models, multiple humans signing off — none of that is here. What you get is a first pass from one LLM. Useful, not definitive.

  • The fit is small stuff you care about but haven't cleaned up. Solo tools, prototypes, side experiments, things that have been sitting untouched for months. The idea is to un-slop them before the mess starts dictating decisions, and to walk away with a shortlist you can actually work through. For anything with real stakes, pay a human to look.

Contributing

If you've run this and found gaps, I'd like to hear about it. Open an issue or PR with:

  1. What kind of project you audited
  2. What the skill should have checked but didn't
  3. What you'd add to fill that gap

License

MIT. Use it, fork it, ship it — credit appreciated but not required.

Acknowledgments

Claude Code wrote this. I designed the audit flow and check structure, decided what gets audited and why, and directed the work. Claude did the typing - skill, references, tests, docs. For security and skill-compliance specifically, I reused two established sources instead of inventing my own:

  • Agent Skills specification. The skill is built to conform to it. Run npx skills-ref validate ./shakedown to check your install (frontmatter, naming, directory structure).
  • OWASP GenAI Security Project. The security checks, questions, and red-team procedures in references/security-checklist.md come straight from OWASP's GenAI publications - LLM Top 10, Agentic Top 10, Secure MCP, GenAI Data Security, Governance Checklist, Red Teaming Guide. If your project touches LLMs or agents, those documents are worth reading directly.

Author

Petr Belousov
