Thank you for your interest in contributing! This document provides guidelines for contributing to this plugin evaluation framework.
- Code of Conduct
- Types of Contributions
- Project Structure
- Development Setup
- Code Style
- Linting
- Testing Requirements
- Pull Request Process
- Commit Messages
- Architecture Notes
- Questions?
- Version Release Procedure
This project adheres to our Code of Conduct. By participating, you agree to uphold this code.
| Type | Description |
|---|---|
| Stage Improvements | Enhance analysis, generation, execution, or evaluation stages |
| New Component Support | Add evaluation for new plugin component types |
| Detection Methods | Improve programmatic detection or LLM judgment |
| Output Formats | Add new report formats (e.g., HTML, Markdown) |
| Bug Fixes | Fix issues in parsing, execution, or metrics calculation |
| Documentation | Improve README, add examples, clarify usage |
src/
├── index.ts # CLI entry point (env.ts MUST be first import)
├── env.ts # Environment setup (dotenv loading)
├── config/ # Configuration loading with Zod validation
├── stages/
│ ├── 1-analysis/ # Plugin parsing, trigger extraction
│ ├── 2-generation/ # Scenario generation (LLM + deterministic)
│ ├── 3-execution/ # Agent SDK integration, tool capture
│ └── 4-evaluation/ # Programmatic detection, LLM judge, metrics
├── state/ # Resume capability, checkpointing
├── types/ # TypeScript interfaces
└── utils/ # Retry, concurrency, logging utilities
tests/
├── unit/ # Unit tests (mirror src/ structure)
│ └── stages/ # Per-stage test files
├── integration/ # Integration tests for full stages
├── e2e/ # End-to-end tests (real SDK calls)
├── mocks/ # Mock implementations for testing
└── fixtures/ # Test data and mock plugins
See CLAUDE.md for detailed architecture documentation.
- Node.js >= 20.0.0
- An Anthropic API key
# Clone and install
git clone https://github.com/sjnims/cc-plugin-eval.git
cd cc-plugin-eval
npm install
# Create environment file
echo "ANTHROPIC_API_KEY=sk-ant-your-key" > .env
# Build
npm run buildnpm test # Run all tests
npm run test:watch # Watch mode
npm run test:coverage # With coverage reportThis project uses strict TypeScript with ESLint. Key conventions:
- Strict TypeScript: All strict flags enabled,
noUncheckedIndexedAccess - Explicit return types: Required on all functions
- Import order: builtin → external → internal → parent → sibling
- Type imports: Always use
import typefor type-only imports - Unused parameters: Prefix with underscore (
_param)
Run all linters before submitting:
# TypeScript/ESLint (required)
npm run lint
npm run typecheck
# Code formatting
npx prettier --check "src/**/*.ts" "*.json" "*.md"
# Markdown
markdownlint "*.md"
# YAML
uvx yamllint -c .yamllint.yml config.yaml .yamllint.yml
# GitHub Actions
actionlint .github/workflows/*.yml- Coverage thresholds: 78% lines/statements, 75% functions, 65% branches
- Unit tests for all new functionality
- Integration tests for stage-level changes
- Mock external API calls (Anthropic SDK)
# Run a single test file
npx vitest run tests/unit/stages/1-analysis/skill-analyzer.test.ts
# Run tests matching a pattern
npx vitest run -t "SkillAnalyzer"-
Fork and create a feature branch
feat/descriptionfor new featuresfix/descriptionfor bug fixesdocs/descriptionfor documentationrefactor/descriptionfor refactoring
-
Make changes following the code style above
-
Ensure quality gates pass
- All linters pass (
npm run lint,npm run typecheck) - All tests pass (
npm test) - Coverage threshold met (
npm run test:coverage)
- All linters pass (
-
Submit PR using the provided template
- Link to related issue if applicable
- Describe what changed and why
- Include test plan
-
Address review feedback
Follow Conventional Commits:
feat: add JUnit XML output format
fix: correct token estimation for Claude 3.5 Sonnet
docs: update CLI usage examples
refactor: extract scenario validation to separate module
test: add integration tests for stage 4
chore: update dependencies
Programmatic detection is primary (100% confidence):
- Tool capture hooks capture invocations during execution:
PreToolUse/PostToolUseforSkill,Task,SlashCommandtool callsSubagentStart/SubagentStopfor agent detectionSDKHookResponseMessagefor hook detection- MCP tools via pattern:
mcp__<server>__<tool>
- LLM judge is secondary, used only for quality assessment
Two SDK integration points exist:
| SDK | Used In | Purpose |
|---|---|---|
@anthropic-ai/sdk |
Stages 2, 4 | LLM calls for generation and judgment |
@anthropic-ai/claude-agent-sdk |
Stage 3 | Plugin loading and execution |
Pipeline state is saved after each stage to results/{plugin}/{run-id}/state.json,
enabling resume on interruption. When modifying stages, ensure state serialization
remains compatible.
Open a Discussion for questions or ideas before starting significant work.
This project follows semantic versioning. The version is defined in package.json (single source of truth).
Run all validation before starting:
npm run typecheck
npm run lint
npm test
npm run build
# Additional linters
npx prettier --check "src/**/*.ts" "*.json" "*.md"
markdownlint "*.md"
uvx yamllint -c .yamllint.yml config.yaml .yamllint.yml
actionlint .github/workflows/*.ymlReview commits since last release:
git log v0.x.x..HEAD --oneline- patch (0.1.x): Bug fixes, dependency updates
- minor (0.x.0): New features, backwards-compatible
- major (x.0.0): Breaking changes
git checkout main
git pull origin main
git checkout -b release/v0.x.xnpm version <major|minor|patch> --no-git-tag-versionVerify the change:
rg '"version"' package.json- Move items from
[Unreleased]to new version section - Add release date in format
YYYY-MM-DD - Organize changes into categories: Added, Changed, Deprecated, Removed, Fixed, Security
- Update comparison links at bottom of file
Example:
## [Unreleased]
## [0.2.0] - 2025-01-15
### Added
- New feature description (#PR)
### Fixed
- Bug fix description (#PR)
[Unreleased]: https://github.com/sjnims/cc-plugin-eval/compare/v0.2.0...HEAD
[0.2.0]: https://github.com/sjnims/cc-plugin-eval/compare/v0.1.0...v0.2.0git add package.json package-lock.json CHANGELOG.md
git commit -m "chore: prepare release v0.x.x"
git push origin release/v0.x.x
gh pr create --title "chore: prepare release v0.x.x" --body "$(cat <<'EOF'
## Release v0.x.x
### Changes
- [List major changes]
### Checklist
- [ ] Version updated in package.json
- [ ] CHANGELOG.md updated with release notes
- [ ] All CI checks pass
- [ ] Tested locally
EOF
)"After PR approval and merge:
git checkout main
git pull origin main
gh release create v0.x.x \
--target main \
--title "v0.x.x" \
--notes-file - <<'EOF'
## Summary
[Brief description of release]
## What's Changed
[Paste relevant sections from CHANGELOG.md]
**Full Changelog**: https://github.com/sjnims/cc-plugin-eval/compare/v0.x-1.x...v0.x.x
EOFAfter release:
# Check tag exists
git tag -l | grep v0.x.x
# Check GitHub release
gh release view v0.x.x
# Verify CLI version
npm run build && node dist/index.js --version