diff --git a/bin/run-eval.py b/bin/run-eval.py
index 1284061..16f5d6b 100755
--- a/bin/run-eval.py
+++ b/bin/run-eval.py
@@ -2,6 +2,10 @@
 """
 pmstack evaluation executor.
 
+Not to be confused with `evals/runner.py` — that one runs the pmstack
+self-eval suite (grading pmstack's own skills via the `claude` CLI).
+This script grades a third-party AI system described by an eval YAML.
+
 Reads an evaluation YAML produced by /eval, validates it has a real
 target, invokes the target once per test case, scores the output, and
 writes deterministic artifacts to outputs/eval-runs/-/.
diff --git a/evals/runner.py b/evals/runner.py
index 3974b10..9541e89 100755
--- a/evals/runner.py
+++ b/evals/runner.py
@@ -2,6 +2,11 @@
 """
 pmstack self-eval runner.
 
+Not to be confused with `bin/run-eval.py` — that one is the user-facing
+executor for /run-eval (grades a third-party AI target described by an
+eval YAML). This script grades pmstack's own skills against the
+canonical suite at evals/pmstack-self.yaml.
+
 Runs each skill in evals/pmstack-self.yaml against a clean install of
 pmstack, collects the artifacts written to outputs/, runs structural
 checks, and asks a judge model to score quality. Writes a JSON report to evals/results/.
diff --git a/examples/README.md b/examples/README.md
index 1926dfc..5a3d291 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -35,6 +35,27 @@ If an artifact looks like work you'd want to do less of by hand, that's the valu
 
 ---
 
+## /prd — translate a customer signal into a PRD draft
+
+**You'd type:**
+```
+/prd "Support agents spend 30% of their time manually triaging tickets. We're missing our 2-hour SLA on 40% of tickets and ticket volume is growing 15% MoM."
+```
+
+**You'd get:** **[examples/sample-prd-output.md](./sample-prd-output.md)**
+
+**What's good about it:**
+- Problem statement grounded in numbers (30% of agent time, 40% SLA miss, 15% MoM growth) — not vibes
+- MoSCoW scoping makes the V1 cut explicit and defensible (auto-categorize ✓, auto-reply ✗)
+- Success metrics include a North Star (>85% first-try routing), a supporting metric (<2hr response), and a counter-metric (<10% re-routing) — the trio you'd want before any launch
+- Open questions section names the real trade-off (Haiku vs Sonnet for cost/latency/accuracy) instead of hand-waving "TBD"
+- Calls out concrete risks at both layers — technical (LLM latency) and business (mis-routing enterprise tickets)
+
+**Time to produce by hand:** 1–2 hours of staring at a blank page, then more hours arguing about scope.
+**Time with `/prd`:** ~60 seconds + a few minutes to add company-specific context.
+
+---
+
 ## Strategy / planning artifacts
 
 Not from a single skill — these came from Claude+pmstack working through real strategic questions:
@@ -49,7 +70,6 @@
 | Skill | What we'd put here |
 |---|---|
-| `/prd` | A worked PRD from a real customer signal |
 | `/competitive` | A worked landscape analysis with white-space ID |
 | `/compare` | A side-by-side product comparison with an evaluable test plan |
 | `/metrics` | A real measurement framework with a North Star, supporting metrics, counter-metrics |
 
diff --git a/setup b/setup
index 6766534..4a699c7 100755
--- a/setup
+++ b/setup
@@ -17,7 +17,8 @@ for arg in "$@"; do
     --global) GLOBAL=1 ;;
     --dry-run) DRY_RUN=1 ;;
     -h|--help)
-      sed -n '2,8p' "$0" | sed 's/^# //'
+      # Print the leading comment block (lines 2..first non-# line) so help never drifts from the comment.
+      awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' "$0"
       exit 0
       ;;
     *)
diff --git a/skills/prd-from-signal.md b/skills/prd-from-signal.md
index 2a16d78..912cf9a 100644
--- a/skills/prd-from-signal.md
+++ b/skills/prd-from-signal.md
@@ -8,16 +8,19 @@ Transform a raw user signal (e.g., a customer quote, a support ticket, or a feat
 ## Instructions
 
 When the user invokes this skill with a signal, you should:
-1. Analyze the signal to extract the underlying user problem (the "Why").
-2. Propose a solution or feature that addresses the problem.
-3. Use the `templates/prd-template.md` structure to draft the PRD.
-4. Include:
-   - Problem Statement
-   - Target Audience
-   - Proposed Solution
-   - Key Features (MoSCoW prioritization)
-   - Success Metrics
-5. Save the output to `outputs/prd-[topic]-[date].md`.
+1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
+2. Name the target audience precisely — persona, segment, JTBD.
+3. Propose a solution that addresses the root problem.
+4. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
+   1. Problem Statement (what / who / why now)
+   2. Proposed Solution (description + value proposition)
+   3. Key Features (MoSCoW — specific features, not adjectives)
+   4. User Experience / Flow (step-by-step + empty / loading / error / success states)
+   5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
+   6. Open Questions / Risks (technical, business, unresolved decisions)
+5. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
+
+If `templates/prd-template.md` ever conflicts with the list above, the template wins — update this skill to match.
 
 ## Tone
 Clear, concise, and focused on user value. Use active voice.
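
The `awk` one-liner that replaces the hardcoded `sed -n '2,8p'` in `setup` can be sanity-checked in isolation. A minimal sketch, assuming a throwaway stand-in script (the `/tmp/demo-setup` file and its contents are hypothetical, not the real `setup`):

```shell
# Hypothetical stand-in script: a shebang, a leading comment block,
# then the first line of real code.
cat > /tmp/demo-setup <<'EOF'
#!/bin/sh
# pmstack setup
# Usage: ./setup [--global] [--dry-run]
set -e
EOF

# NR==1{next}  : skip the shebang
# /^#/{...}    : strip the '# ' prefix and print each comment line
# {exit}       : stop at the first non-comment line
awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' /tmp/demo-setup
# prints:
#   pmstack setup
#   Usage: ./setup [--global] [--dry-run]
```

Unlike the old `sed -n '2,8p'`, nothing here hardcodes where the comment block ends, so the help text stays correct if the header comment grows or shrinks.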