From 434efe53246a82e3450ce2acf3ee1be5b172d5e9 Mon Sep 17 00:00:00 2001
From: Ryan Alberts
Date: Mon, 27 Apr 2026 23:40:57 -0500
Subject: [PATCH 1/4] docs: feature existing PRD sample in examples gallery

The repo already shipped examples/sample-prd-output.md but it wasn't
linked from anywhere, while the gallery still listed /prd under
"Coming soon". /prd is the headline skill in the main README
walkthrough, so the empty slot was the most damaging gap.

Add a /prd section that links the existing sample and remove the
now-filled row.
---
 examples/README.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/examples/README.md b/examples/README.md
index 1926dfc..5a3d291 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -35,6 +35,27 @@ If an artifact looks like work you'd want to do less of by hand, that's the valu
 
 ---
 
+## /prd — translate a customer signal into a PRD draft
+
+**You'd type:**
+```
+/prd "Support agents spend 30% of their time manually triaging tickets. We're missing our 2-hour SLA on 40% of tickets and ticket volume is growing 15% MoM."
+```
+
+**You'd get:** **[examples/sample-prd-output.md](./sample-prd-output.md)**
+
+**What's good about it:**
+- Problem statement grounded in numbers (30% of agent time, 40% SLA miss, 15% MoM growth) — not vibes
+- MoSCoW scoping makes the V1 cut explicit and defensible (auto-categorize ✓, auto-reply ✗)
+- Success metrics include a North Star (>85% first-try routing), a supporting metric (<2hr response), and a counter-metric (<10% re-routing) — the trio you'd want before any launch
+- Open questions section names the real trade-off (Haiku vs Sonnet for cost/latency/accuracy) instead of hand-waving "TBD"
+- Calls out concrete risks at both layers — technical (LLM latency) and business (mis-routing enterprise tickets)
+
+**Time to produce by hand:** 1–2 hours of staring at a blank page, then more hours arguing about scope.
+**Time with `/prd`:** ~60 seconds + a few minutes to add company-specific context.
+
+---
+
 ## Strategy / planning artifacts
 
 Not from a single skill — these came from Claude+pmstack working through real strategic questions:
@@ -49,7 +70,6 @@ Not from a single skill — these came from Claude+pmstack working through real
 
 | Skill | What we'd put here |
 |---|---|
-| `/prd` | A worked PRD from a real customer signal |
 | `/competitive` | A worked landscape analysis with white-space ID |
 | `/compare` | A side-by-side product comparison with an evaluable test plan |
 | `/metrics` | A real measurement framework with a North Star, supporting metrics, counter-metrics |

From 6ce4da8b8c23bc3543339fef757b238942879e26 Mon Sep 17 00:00:00 2001
From: Ryan Alberts
Date: Tue, 28 Apr 2026 00:05:28 -0500
Subject: [PATCH 2/4] docs(prd): align skill with its own template (all 6 sections)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

skills/prd-from-signal.md listed only 5 PRD sections inline — missing
"User Experience / Flow" and "Open Questions / Risks" — even though
templates/prd-template.md (referenced from step 3) defines 6, and the
cross-platform claude-skills/pmstack-prd/SKILL.md already lists all 6.

Make the template the canonical source, list all 6 sections in the
skill, and add a note that the template wins on future drift.
---
 skills/prd-from-signal.md | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/skills/prd-from-signal.md b/skills/prd-from-signal.md
index 2a16d78..912cf9a 100644
--- a/skills/prd-from-signal.md
+++ b/skills/prd-from-signal.md
@@ -8,16 +8,19 @@ Transform a raw user signal (e.g., a customer quote, a support ticket, or a feat
 ## Instructions
 
 When the user invokes this skill with a signal, you should:
-1. Analyze the signal to extract the underlying user problem (the "Why").
-2. Propose a solution or feature that addresses the problem.
-3. Use the `templates/prd-template.md` structure to draft the PRD.
-4. Include:
-   - Problem Statement
-   - Target Audience
-   - Proposed Solution
-   - Key Features (MoSCoW prioritization)
-   - Success Metrics
-5. Save the output to `outputs/prd-[topic]-[date].md`.
+1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
+2. Name the target audience precisely — persona, segment, JTBD.
+3. Propose a solution that addresses the root problem.
+4. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
+   1. Problem Statement (what / who / why now)
+   2. Proposed Solution (description + value proposition)
+   3. Key Features (MoSCoW — specific features, not adjectives)
+   4. User Experience / Flow (step-by-step + empty / loading / error / success states)
+   5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
+   6. Open Questions / Risks (technical, business, unresolved decisions)
+5. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
+
+If `templates/prd-template.md` ever conflicts with the list above, the template wins — update this skill to match.
 
 ## Tone
 Clear, concise, and focused on user value. Use active voice.

From d13f7a4446235b62e8ff8ad9ac84b5e17486f41b Mon Sep 17 00:00:00 2001
From: Ryan Alberts
Date: Tue, 28 Apr 2026 00:11:17 -0500
Subject: [PATCH 3/4] fix(setup): help no longer leaks 'set -euo pipefail' line

./setup --help used a hard-coded sed range '2,8p' that included the
first non-comment line. Replace it with awk that prints the leading
comment block and stops at the first non-# line, so help can't drift
again when the comment block grows or shrinks.

Before: the last line of help was 'set -euo pipefail' (a bash directive).
After: a clean usage block, no shell internals.
---
 setup | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/setup b/setup
index 6766534..4a699c7 100755
--- a/setup
+++ b/setup
@@ -17,7 +17,8 @@ for arg in "$@"; do
     --global) GLOBAL=1 ;;
     --dry-run) DRY_RUN=1 ;;
     -h|--help)
-      sed -n '2,8p' "$0" | sed 's/^# //'
+      # Print the leading comment block (line 2 through the last leading '#' line) so help never drifts from the comment.
+      awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' "$0"
       exit 0
       ;;
     *)

From 5b73d3c6d716a7016596da505570ed37a276bcd7 Mon Sep 17 00:00:00 2001
From: Ryan Alberts
Date: Wed, 29 Apr 2026 16:45:44 -0500
Subject: [PATCH 4/4] docs(evals): cross-reference the two runners in their docstrings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The repo has two scripts whose names look like cousins — bin/run-eval.py
(user-facing /run-eval; grades a third-party AI target) and evals/runner.py
(internal /eval-self; grades pmstack's own skills). They are not duplicates:
different inputs, different invocation paths, different output formats.
But the naming risks reader confusion when someone greps for "eval
runner" and grabs the wrong file. Add a one-paragraph "not to be
confused with X" pointer to each docstring so the distinction is
visible at a glance.
---
 bin/run-eval.py | 4 ++++
 evals/runner.py | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/bin/run-eval.py b/bin/run-eval.py
index 1284061..16f5d6b 100755
--- a/bin/run-eval.py
+++ b/bin/run-eval.py
@@ -2,6 +2,10 @@
 """
 pmstack evaluation executor.
 
+Not to be confused with `evals/runner.py` — that one runs the pmstack
+self-eval suite (grading pmstack's own skills via the `claude` CLI).
+This script grades a third-party AI system described by an eval YAML.
+
 Reads an evaluation YAML produced by /eval, validates it has a real target,
 invokes the target once per test case, scores the output, and writes
 deterministic artifacts to outputs/eval-runs/<slug>-<date>/.
diff --git a/evals/runner.py b/evals/runner.py
index 3974b10..9541e89 100755
--- a/evals/runner.py
+++ b/evals/runner.py
@@ -2,6 +2,11 @@
 """
 pmstack self-eval runner.
 
+Not to be confused with `bin/run-eval.py` — that one is the user-facing
+executor for /run-eval (grades a third-party AI target described by an
+eval YAML). This script grades pmstack's own skills against the
+canonical suite at evals/pmstack-self.yaml.
+
 Runs each skill in evals/pmstack-self.yaml against a clean install of
 pmstack, collects the artifacts written to outputs/, runs structural checks,
 and asks a judge model to score quality. Writes a JSON report to evals/results/.
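
A closing note for reviewers of this series: the sketch below shows the two runners side by side. It is a minimal sketch, not a documented CLI; the positional YAML argument and the example path are assumptions inferred from the docstrings patched above.

```
# Illustrative invocations; arguments are inferred from the docstrings,
# not from a documented CLI.

# /run-eval (user-facing, bin/run-eval.py): grades a third-party AI target
# described by an eval YAML, writing artifacts under outputs/eval-runs/.
./bin/run-eval.py outputs/eval-support-triage.yaml   # hypothetical YAML path

# /eval-self (internal, evals/runner.py): grades pmstack's own skills against
# evals/pmstack-self.yaml, writing a JSON report under evals/results/.
./evals/runner.py
```

Rule of thumb: grading someone else's system, reach for bin/run-eval.py; grading pmstack itself, reach for evals/runner.py.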
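
Similarly, PATCH 3's awk one-liner is easiest to judge in a working context. Here is a minimal, self-contained sketch of the same help pattern; the script name, flag, and echo body are invented for illustration:

```
#!/usr/bin/env bash
# Usage: ./demo [--dry-run]
#
# Minimal demo of the PATCH 3 pattern: the leading comment block
# doubles as the --help text.
set -euo pipefail

case "${1:-}" in
  -h|--help)
    # Skip the shebang (NR==1), print each leading '#' line with its
    # '# ' prefix stripped, and stop at the first non-comment line
    # (here, 'set -euo pipefail').
    awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' "$0"
    exit 0
    ;;
esac

echo "real work happens here"
```

The payoff is that growing or shrinking the comment block never requires touching the help logic, which is exactly the drift the patch eliminates.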