Skills are advertised as cross-agent compatible, but today they are only manually tested in Claude Code.
Proposed:
- A small tests/ directory with one canonical prompt per skill
- A runner that captures outputs from each supported agent
- A diff/scoring step (could lean on the AI-Eval-Skills toolkit)
This becomes the regression net when skills are edited.