Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions bin/run-eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@
"""
pmstack evaluation executor.

Not to be confused with `evals/runner.py` — that one runs the pmstack
self-eval suite (grading pmstack's own skills via the `claude` CLI).
This script grades a third-party AI system described by an eval YAML.

Reads an evaluation YAML produced by /eval, validates it has a real target,
invokes the target once per test case, scores the output, and writes
deterministic artifacts to outputs/eval-runs/<feature>-<run-date>/.
Expand Down
5 changes: 5 additions & 0 deletions evals/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,11 @@
"""
pmstack self-eval runner.

Not to be confused with `bin/run-eval.py` — that one is the user-facing
executor for /run-eval (grades a third-party AI target described by an
eval YAML). This script grades pmstack's own skills against the
canonical suite at evals/pmstack-self.yaml.

Runs each skill in evals/pmstack-self.yaml against a clean install of pmstack,
collects the artifacts written to outputs/, runs structural checks, and asks a
judge model to score quality. Writes a JSON report to evals/results/.
Expand Down
22 changes: 21 additions & 1 deletion examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,27 @@ If an artifact looks like work you'd want to do less of by hand, that's the valu

---

## /prd — translate a customer signal into a PRD draft

**You'd type:**
```
/prd "Support agents spend 30% of their time manually triaging tickets. We're missing our 2-hour SLA on 40% of tickets and ticket volume is growing 15% MoM."
```

**You'd get:** **[examples/sample-prd-output.md](./sample-prd-output.md)**

**What's good about it:**
- Problem statement grounded in numbers (30% of agent time, 40% SLA miss, 15% MoM growth) — not vibes
- MoSCoW scoping makes the V1 cut explicit and defensible (auto-categorize ✓, auto-reply ✗)
- Success metrics include a North Star (>85% first-try routing), a supporting metric (<2hr response), and a counter-metric (<10% re-routing) — the trio you'd want before any launch
- Open questions section names the real trade-off (Haiku vs Sonnet for cost/latency/accuracy) instead of hand-waving "TBD"
- Calls out concrete risks at both layers — technical (LLM latency) and business (mis-routing enterprise tickets)

**Time to produce by hand:** 1–2 hours of staring at a blank page, then more hours arguing about scope.
**Time with `/prd`:** ~60 seconds + a few minutes to add company-specific context.

---

## Strategy / planning artifacts

Not from a single skill — these came from Claude+pmstack working through real strategic questions:
Expand All @@ -49,7 +70,6 @@ Not from a single skill — these came from Claude+pmstack working through real

| Skill | What we'd put here |
|---|---|
| `/prd` | A worked PRD from a real customer signal |
| `/competitive` | A worked landscape analysis with white-space ID |
| `/compare` | A side-by-side product comparison with an evaluable test plan |
| `/metrics` | A real measurement framework with a North Star, supporting metrics, counter-metrics |
Expand Down
3 changes: 2 additions & 1 deletion setup
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@ for arg in "$@"; do
--global) GLOBAL=1 ;;
--dry-run) DRY_RUN=1 ;;
-h|--help)
sed -n '2,8p' "$0" | sed 's/^# //'
# Print the leading comment block (lines 2..first non-# line) so help never drifts from the comment.
awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' "$0"
exit 0
;;
*)
Expand Down
23 changes: 13 additions & 10 deletions skills/prd-from-signal.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,19 @@ Transform a raw user signal (e.g., a customer quote, a support ticket, or a feat

## Instructions
When the user invokes this skill with a signal, you should:
1. Analyze the signal to extract the underlying user problem (the "Why").
2. Propose a solution or feature that addresses the problem.
3. Use the `templates/prd-template.md` structure to draft the PRD.
4. Include:
- Problem Statement
- Target Audience
- Proposed Solution
- Key Features (MoSCoW prioritization)
- Success Metrics
5. Save the output to `outputs/prd-[topic]-[date].md`.
1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
2. Name the target audience precisely — persona, segment, JTBD.
3. Propose a solution that addresses the root problem.
4. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
1. Problem Statement (what / who / why now)
2. Proposed Solution (description + value proposition)
3. Key Features (MoSCoW — specific features, not adjectives)
4. User Experience / Flow (step-by-step + empty / loading / error / success states)
5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
6. Open Questions / Risks (technical, business, unresolved decisions)
5. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
Comment on lines +11 to +21
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The instructions have a slight redundancy. Item 2, 'Name the target audience precisely', is also covered within item 4.1, 'Problem Statement (what / who / why now)'. This could be confusing for the LLM processing these instructions.

To improve clarity, I suggest removing the separate top-level item for the target audience and incorporating its detail into the 'Problem Statement' sub-item.

Suggested change
1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
2. Name the target audience precisely — persona, segment, JTBD.
3. Propose a solution that addresses the root problem.
4. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
1. Problem Statement (what / who / why now)
2. Proposed Solution (description + value proposition)
3. Key Features (MoSCoW — specific features, not adjectives)
4. User Experience / Flow (step-by-step + empty / loading / error / success states)
5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
6. Open Questions / Risks (technical, business, unresolved decisions)
5. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
2. Propose a solution that addresses the root problem.
3. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
1. Problem Statement (what / who / why now). The "who" should be a precise target audience (persona, segment, JTBD).
2. Proposed Solution (description + value proposition)
3. Key Features (MoSCoW — specific features, not adjectives)
4. User Experience / Flow (step-by-step + empty / loading / error / success states)
5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
6. Open Questions / Risks (technical, business, unresolved decisions)
4. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.


If `templates/prd-template.md` ever conflicts with the list above, the template wins — update this skill to match.

## Tone
Clear, concise, and focused on user value. Use active voice.