Static analysis for Claude Code / Agent SDK skills. Audits SKILL.md against
10 QA checks to verify a skill will trigger and execute correctly across LLMs
(Claude, GPT, Gemini).
Built from an audit of 119 production skills (score went from 5.9 → 10.0/10).
Skills written for Claude pre-4.5 often work only in the author's context — they have thin descriptions, vague workflows, no negative boundaries, no evals. When you try to run them in:
- Claude 4.6+ (tighter trigger heuristics)
- GPT-5 / GPT-4o (different trigger semantics)
- Gemini Pro (stricter about imperative language)
- External agent frameworks
...the skill silently fails to trigger, or triggers in wrong contexts.
skill-audit diagnoses these issues before you ship.
Requires Python 3.8+. No dependencies (stdlib only).
```
git clone https://github.com/okjpg/skill-audit
cd skill-audit
```

Or install as a Claude Code skill:

```
cp -r skill-audit ~/.claude/skills/
```

Or via `gh skill` (when your CLI supports it):

```
gh skill install okjpg/skill-audit
```

Audit a single skill, a whole skills directory, or emit JSON:

```
python3 scripts/audit.py ./my-skill/SKILL.md
python3 scripts/audit.py ~/.claude/skills/
python3 scripts/audit.py ~/.claude/skills/ --json -o audit.json
```

Example output:

```
skill-audit: 12 skills analyzed
Average score: 7.1/10
Passing (≥7): 4/12 (33%)
Report: audit-results.md
```
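The `--json` output can feed a CI gate. The sketch below assumes a hypothetical schema (a list of entries with `skill` and `score` fields) — inspect your actual `audit.json` before relying on these field names:

```python
import json

# Hypothetical schema assumed here: [{"skill": "...", "score": 7.1}, ...]
def failing_skills(report, threshold=7.0):
    """Return names of skills scoring below the passing threshold."""
    return [entry["skill"] for entry in report if entry["score"] < threshold]

raw = '[{"skill": "my-skill", "score": 6.2}, {"skill": "other", "score": 9.1}]'
print(failing_skills(json.loads(raw)))  # ['my-skill']
```

A CI job could fail the build when this list is non-empty.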
The markdown report includes:
- Score distribution
- Issues ranked by frequency
- Skills ranked worst → best
- Detailed per-skill breakdown
- Name in kebab-case matching folder name
- Description with 50+ words, third person, 5+ trigger phrases, negative boundaries
- Workflow steps are single, imperative, unambiguous
- 2+ examples with real input → real output
- Edge cases covered (3+ conditions with specific action)
- Output format explicitly defined
- Zero vague language (`handle appropriately`, `as needed`, etc. are banned)
- Negative boundaries in body (`## When NOT to Use`)
- Zero hardcoded secrets (API keys, tokens)
- `evals/evals.json` with 2+ cases
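The vague-language check can be approximated with a simple line scan. This is an illustrative sketch, not the tool's actual implementation; the phrase list here is an assumption (the real banned set lives in the tool):

```python
import re

# Assumed phrase list for illustration -- not the tool's real banned set.
VAGUE_PHRASES = [
    "handle appropriately",
    "as needed",
    "if necessary",
    "use your judgment",
]

def find_vague_language(skill_md):
    """Return (line_number, phrase) hits for banned vague wording."""
    hits = []
    for lineno, line in enumerate(skill_md.splitlines(), start=1):
        for phrase in VAGUE_PHRASES:
            if re.search(re.escape(phrase), line, re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits

print(find_vague_language("Step 3: handle appropriately and retry as needed."))
# [(1, 'handle appropriately'), (1, 'as needed')]
```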
Full spec: `references/qa-checklist.md`.
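For orientation, a description that would satisfy the checks above might look like this frontmatter sketch (illustrative only, not taken from the spec; the exact fields your runtime expects may differ):

```yaml
---
name: skill-audit
description: >
  Audits SKILL.md files against 10 structural QA checks. Use when the user
  asks to "audit a skill", "check skill quality", "review SKILL.md",
  "score my skills", or "validate a skill before shipping". Covers naming,
  description quality, workflow clarity, examples, edge cases, and evals.
  Do NOT use for creating new skills or for security audits.
---
```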
See `examples/before.md` (score 3/10) vs `examples/after.md` (score 10/10): same skill, same behavior, but only one triggers reliably in any LLM.
`templates/SKILL-TEMPLATE.md`: a copy-paste skeleton for new V3-compliant skills.
- Creating a skill from scratch: use `criar-skill` instead
- Executing the skill being audited: run it directly
- Security audit (OWASP, production secrets) — this tool is for structural quality, not security posture
- Does not audit semantic quality of the workflow (whether the steps make sense for your domain)
- Static analysis only — does not execute the skill
- Trigger phrase heuristics can false-positive on dense technical descriptions. Review borderline scores (6-7) manually.
Bruno Okamoto — part of the Pixel AI Hub curriculum.
Related:
- `second-brain-amora`: persistent memory system for Claude Code
- `skill-creator`: generate new skills from workflows
MIT.