36 skills for PMs who build AI agents as products — not just use AI as a tool
⭐ If you're a PM building AI agents, star this repo — it's the only skillset designed for the full agent product lifecycle.
🆕 New to Claude Code? → forge/claude-md scans your project, auto-generates CLAUDE.md, and recommends the right AI_PM_Skills plugins. The fastest way to onboard.
In 2026, PMs are being asked to "build an agent" — but existing PM skills don't prepare you for that.
General PM skills teach you to use AI as a tool — write PRDs faster, generate OKRs, analyze competitors. But when you're building agents as products, the questions are fundamentally different:
- "What would it cost to run this agent at 1,000 users/day?"
- "How does an agent recover from hallucination?"
- "How do I orchestrate multiple agents together?"
- "How do I encode 3 months of operational judgment into the agent's instructions?"
This project turns those questions into 36 production-grade skills across the full agent lifecycle.
# 1. Install the plugin
/plugin marketplace add kimsanguine/AI_PM_Skills
/plugin install oracle@kimsanguine-AI_PM_Skills
# 2. Just describe your task — the right skill loads automatically
"We handle 500 support tickets/day. Which parts should an agent handle?"
# → opp-tree skill auto-loads → opportunity mapping starts

This isn't a random collection of skills. It's a complete lifecycle: the same path every agent PM walks.
Discover   →  Architect  →  Ship       →  Operate    →  Learn
oracle        atlas         forge         argus         muse
6 skills      7 skills      12 skills     8 skills      3 skills
   ↑                                                      │
   └──────── Accumulated TK feeds back into next agent ───┘
| Stage | Plugin | The Question | Key Skills |
|---|---|---|---|
| Discover | oracle | "What agent should we build?" | opp-tree · assumptions · build-or-buy · cost-sim · hitl · agent-gtm |
| Architect | atlas | "How should we structure it?" | 3-tier · orchestration · router · memory-arch · moat · growth-loop · biz-model |
| Ship | forge | "How to spec and ship it?" | claude-md · prd · instruction · prompt · ctx-budget · okr · stakeholder-map · agent-plan-review + 4 comms tools |
| Operate | argus | "How to measure and improve?" | kpi · reliability · premortem · burn-rate · north-star · agent-ab-test · cohort · incident |
| Learn | muse ⭐ | "How to make agents smarter over time?" | pm-framework · pm-decision · pm-engine |
Each skill auto-loads from natural language — describe your task and the right skill fires. Skills also route across plugins: burn-rate (argus) detects a cost spike → suggests router (atlas) for model change → triggers cost-sim (oracle) for re-simulation.
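The cross-plugin hand-off described above can be pictured as a small lookup that is followed until no further route applies. This is only an illustrative sketch: in the actual project the routing is driven by each skill's Route field and by Claude's own matching, not by a Python table. `ROUTES` and `follow_routes` are hypothetical names, and each hand-off is assumed to be unconditional for simplicity.

```python
# Hypothetical route table: source skill -> skill it hands off to.
# Entries mirror the hand-offs named in this README; real routes are conditional.
ROUTES = {
    "opp-tree": "assumptions",
    "burn-rate": "router",
    "router": "cost-sim",
    "prd": "instruction",
    "pm-framework": "pm-engine",
}


def follow_routes(start, routes=ROUTES, max_hops=5):
    """Return the chain of skills visited, starting from `start`."""
    path = [start]
    while len(path) <= max_hops and path[-1] in routes:
        target = routes[path[-1]]
        if target in path:  # stop instead of looping forever on a cycle
            break
        path.append(target)
    return path


print(" → ".join(follow_routes("burn-rate")))
```

The `max_hops` cap and the cycle check are design choices of this sketch: they keep a misconfigured route table from chaining skills indefinitely.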
36 skills map to 5 stages of agent product development. This isn't "AI tools for PMs" — it's a structured methodology for building agents as products, from discovery to self-improving agents.
We separate how Claude finds skills (Platform Layer — Skills 2.0 spec) from what goes inside each skill (Content Layer). The Content Layer defines the Trigger Gate (Use/Route/Boundary) pattern that prevents skill collisions, plus domain-specific context in each skill's context/domain.md. Result: 97.9% trigger accuracy across 96 test queries.
┌─ Platform Layer ──── Skills 2.0 Spec ──────────────────────┐
│ frontmatter · auto-invocation · subagent · hooks · evals │
├─ Content Layer ──── AI_PM_Skills Pattern ──────────────────┤
│ Core Goal → Trigger Gate → Failure Handling │
│ → Quality Gate → Examples · context/domain.md │
└─────────────────────────────────────────────────────────────┘
muse is the moat. It structures your operational judgment into TK (Tacit Knowledge) units, then injects them into agent instructions. The more you use it, the smarter your agents get — and that knowledge stays yours.
PM judgment log → /extract → structured as TK-NNN → accumulated in PM-ENGINE-MEMORY.md
→ /tk-to-instruction → agent system prompt updated → repeat
This creates switching cost: competitors can copy the framework, but they can't copy your accumulated TK.
Every skill is measured. 10 quality tests with 54 assertions prove what skills add vs base Claude. Result:
| | With Skill | Without Skill | Delta |
|---|---|---|---|
| Pass Rate | 100% | 88% | +12% |
pm-framework without skill drops to 40%. cost-sim with skill adds +46.6% output. This is data-driven proof that the skills work.
Every skill includes examples/good-01.md and examples/bad-01.md — concrete right/wrong output pairs. Plus references/test-cases.md with edge case tables. These aren't decorative; they're training signals that make skill quality measurable and continuously improvable.
Built on Claude Code's latest platform spec: auto-invocation, context: fork, allowed-tools, model field, dynamic !command injection, marketplace, and eval system. New users start with the PM-ENGINE-MEMORY Starter Kit — 5 seed TK entries so the value is immediate, not "someday when I accumulate enough data."
1. oracle — What agent to build? (6 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| opp-tree | Build an opportunity tree scored by repeat frequency, automation fit, and judgment dependency | "We have 10 automation candidates: which one first?" |
| assumptions | Extract riskiest assumptions across 4 axes (Value/Feasibility/Reliability/Ethics) and design 2-day validation experiments | "What's the biggest risk before we start building?" |
| build-or-buy | Score Build vs Buy vs No-code across 6 axes (differentiation, speed, cost, customization, maintenance, domain) | "Should we use Intercom's bot or build our own agent?" |
| hitl | Set automation levels (1-5) and escalation triggers via reversibility × error-impact matrix | "Can the agent decide refunds, or must a human approve?" |
| cost-sim | Simulate monthly costs at 1→10→100→1,000 users by model pricing × call patterns | "Sonnet at 500 calls/day: what's the monthly bill?" |
| agent-gtm | Score beachhead segments (5 criteria) + design Shadow→Co-pilot→Auto→Delegation trust sequence | "How do we roll this agent out to B2B customers?" |
Commands: /discover · /validate
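To make the cost-sim row concrete, here is a minimal sketch of a "model pricing × call patterns" estimate at 1, 10, 100, and 1,000 users. It is not the skill's actual implementation. The prices, token counts, and calls per user are placeholder assumptions, and `ModelPrice`, `monthly_cost`, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Placeholder USD prices per million tokens (assumed, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(users, calls_per_user_per_day, input_tokens, output_tokens,
                 price, days=30):
    """Estimate monthly spend in USD for one usage level."""
    calls = users * calls_per_user_per_day * days
    cost_per_call = (input_tokens * price.input_per_mtok
                     + output_tokens * price.output_per_mtok) / 1_000_000
    return calls * cost_per_call


def simulate(price, calls_per_user_per_day=5, input_tokens=2_000,
             output_tokens=500, levels=(1, 10, 100, 1_000)):
    """Return {user_count: estimated monthly USD} for each usage level."""
    return {
        users: monthly_cost(users, calls_per_user_per_day,
                            input_tokens, output_tokens, price)
        for users in levels
    }


if __name__ == "__main__":
    placeholder = ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0)
    for users, usd in simulate(placeholder).items():
        print(f"{users:>5} users: ${usd:,.2f}/month")
```

Swapping in current list prices and measured token counts per call is what turns this from a toy into a usable estimate. The linear scaling also ignores caching, retries, and volume discounts.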
2. atlas — How to architect it? (7 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| 3-tier | Design Prometheus (strategy) → Atlas (coordination) → Worker (execution) roles, comms, and delegation | "I need 5 agents: who controls whom?" |
| orchestration | Compare Sequential/Parallel/Router/Hierarchical patterns by latency, error rate, and cost | "Should my doc pipeline run serial or parallel?" |
| biz-model | Design per-use / subscription / outcome-based pricing + variable cost analysis targeting >70% margin | "Per-API-call billing or monthly flat fee?" |
| router | Auto-route tasks to T1-T4 models by complexity + fallback chains for 40-80% cost reduction | "Simple FAQ → Haiku, complex analysis → Opus, automatically?" |
| memory-arch | Design Working/Episodic/Semantic/Procedural memory layers + token-budget-aware retrieval | "How does today's session recall yesterday's context?" |
| moat | Diagnose 6 moat types: data flywheel, workflow lock-in, network effects, switching costs, specialization, brand | "A competitor ships a GPT clone: what's our defense?" |
| growth-loop | Design usage→data→improvement→re-use loops + cold-start solutions + anti-loop identification | "How do we make recommendations improve with every use?" |
Commands: /architecture · /strategy-review
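The router row combines two ideas: picking a model tier from task complexity, and falling back along a chain when a call fails. A rough sketch of both follows. The thresholds, the 0 to 1 complexity score, and the assumption that T1 is the cheapest tier and T4 the most capable are all illustrative guesses, not values taken from the skill.

```python
TIERS = ["T1", "T2", "T3", "T4"]  # assumed order: cheapest -> most capable

# Hypothetical upper bounds on a 0..1 complexity score for each tier.
THRESHOLDS = [(0.25, "T1"), (0.50, "T2"), (0.75, "T3")]


def pick_tier(complexity):
    """Map a complexity score to the cheapest tier expected to handle it."""
    for upper_bound, tier in THRESHOLDS:
        if complexity < upper_bound:
            return tier
    return "T4"


def fallback_chain(tier):
    """Tiers to try in order: the chosen tier, then each more capable one."""
    return TIERS[TIERS.index(tier):]


def route(complexity, call_model):
    """Call `call_model(tier)` down the fallback chain until one succeeds."""
    for tier in fallback_chain(pick_tier(complexity)):
        try:
            return tier, call_model(tier)
        except RuntimeError:
            continue  # this tier failed; escalate to the next one
    raise RuntimeError("all tiers in the fallback chain failed")
```

Here `call_model` stands in for whatever client function actually invokes a model. It is passed in so the routing logic can be exercised without network access.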
3. forge — How to spec and ship it? (12 skills, 3 commands)
- Onboarding (1): claude-md
- Core Spec (7): instruction · prd · prompt · ctx-budget · okr · stakeholder-map · agent-plan-review
- Communication (4): gemini-image-flow · infographic-gif-creator · pptx-ai-slide · agent-demo-video
| Skill | What it does | When to use |
|---|---|---|
| claude-md ⭐ | Scan project structure → auto-generate CLAUDE.md → recommend matching AI_PM_Skills plugins | "New project: set up Claude Code context and find the right skills" |
| instruction | Define Role/Context/Goal/Tools/Memory/Output/Failure with least-privilege tool access | "What goes in (and out of) the system prompt?" |
| prd | 7-section agent spec: Instruction/Tools/Memory/Triggers/Output/Failure with dual narrative (tech + biz) | "I need a PRD that covers hallucination recovery and tool permissions" |
| prompt | CRISP framework (Context/Role/Instruction/Scope/Parameters) + Why-First principle + 7 failure pattern avoidance | "Longer prompts make my agent behave worse" |
| ctx-budget | Estimate per-file token usage → classify Essential/Conditional/Excluded → 70% threshold alerts | "How do I fit 5 RAG docs + chat history into 128K?" |
| okr | Dual-axis OKRs: Business Impact + Operational Health with mandatory cost KR | "Is 95% accuracy enough, or do I need cost metrics too?" |
| stakeholder-map | Power-Interest matrix + blocker response strategies + internal champion cultivation | "Legal is blocking the agent rollout: how do I get buy-in?" |
| agent-plan-review | 4-axis review + failure mode matrix (5+ types) + Mermaid output | "Find the holes in this design before we start coding" |
| gemini-image-flow | End-to-end Gemini API image pipeline with model tier selection | "Build a sketch→code pipeline" |
| infographic-gif-creator | Convert architecture / workflow into HTML/CSS → GIF/MP4 animations | "Show the multi-agent flow to execs" |
| pptx-ai-slide | Story-driven slide decks (pitch / review / investor variants) | "Board presentation, 10 slides max" |
| agent-demo-video | Screen recordings + animations + narration via Remotion | "Show non-technical stakeholders what the agent does" |
Commands: /write-prd · /set-okr · /sprint
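The ctx-budget flow (estimate tokens, classify each file, alert at 70%) can be sketched in a few lines. This is an approximation, not the skill's implementation: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the categories are supplied by the caller, and `estimate_tokens` and `budget_report` are hypothetical names.

```python
def estimate_tokens(text):
    """Very rough estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def budget_report(files, window=128_000, alert_ratio=0.70):
    """Summarize context usage.

    `files` maps a file name to (category, text), where category is one of
    "essential", "conditional", or "excluded". Excluded files are counted
    for reporting but do not consume the context window.
    """
    used = {"essential": 0, "conditional": 0, "excluded": 0}
    for category, text in files.values():
        used[category] += estimate_tokens(text)
    loaded = used["essential"] + used["conditional"]
    ratio = loaded / window
    return {"used": used, "ratio": ratio, "alert": ratio >= alert_ratio}
```

Treating Conditional files as always loaded gives a worst-case figure. A stricter budget would count them only when their load condition is met.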
4. argus — How to measure and improve? (8 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| kpi | Define 5-7 operational + business metrics with leading/lagging split | "What goes on the agent dashboard?" |
| reliability | Quantify P95/P99 worst cases + design safeguards + set SLA tiers | "3 out of 100 responses hallucinate: acceptable?" |
| premortem | Score 10-15 failure modes by Severity × Likelihood × Detection Difficulty | "Give me a 'this must not break' list" |
| burn-rate | Visualize token costs by model/task + spike detection + budget caps | "Token costs jumped 40%: what caused it?" |
| north-star | Select one metric via 5 criteria + set anti-metrics | "Team doesn't know which KPI matters most" |
| agent-ab-test | Calculate MDE + concurrent experiments + control for LLM nondeterminism | "Prompt A vs B: real difference or noise?" |
| cohort | Track performance by deployment cohort (4-week minimum, n≥100) | "Did v2.1 actually improve over v2.0?" |
| incident | Detect silent failures + triage + contain blast radius + 5 Whys | "Agent silent for 30 min, no alerts fired" |
Commands: /health-check · /cost-review
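The premortem scoring rule (Severity × Likelihood × Detection Difficulty) is simple enough to show directly. The sketch below assumes each factor is rated on a 1 to 5 scale, which the README does not specify. `FailureMode` and `rank` are hypothetical names, and the example failure modes are invented for illustration.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int              # assumed 1-5 scale
    likelihood: int            # assumed 1-5 scale
    detection_difficulty: int  # assumed 1-5 scale; 5 = hardest to notice

    @property
    def risk(self):
        """Product of the three factors; higher means handle it first."""
        return self.severity * self.likelihood * self.detection_difficulty


def rank(modes):
    """Sort failure modes from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


if __name__ == "__main__":
    modes = [
        FailureMode("tool call times out", 3, 4, 1),
        FailureMode("silent hallucinated refund amount", 5, 2, 5),
        FailureMode("prompt exceeds context window", 2, 3, 2),
    ]
    for mode in rank(modes):
        print(f"{mode.risk:>3}  {mode.name}")
```

Multiplying rather than adding the factors means a failure that is both severe and hard to detect rises to the top even when it is unlikely.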
5. muse ⭐ — Turn PM tacit knowledge into agent assets (3 skills, 3 commands)
| Skill | What it does | When to use |
|---|---|---|
| pm-framework | Convert implicit judgment into TK-NNN units with activation/deactivation conditions + knowledge graph linking | "3 years of agent ops experience is stuck in my head" |
| pm-decision | Build a pattern library of recurring PM decisions with context, criteria, and known failures | "I've seen this situation before. Why did I decide that way?" |
| pm-engine | Agents dynamically query TK knowledge graph at runtime + auto-extract 1 TK/day + auto-update instructions | "I want my agents to leverage my operational know-how automatically" |
Commands: /extract · /decide · /tk-to-instruction
Start with the PM-ENGINE-MEMORY Starter Kit — 5 seed TK entries to get going immediately.
The framework is open-source; your data (PM-ENGINE-MEMORY.md) is your own asset.
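As a way to picture what a TK unit might carry, the sketch below models one unit with activation and deactivation conditions and renders it as a line for an agent's instructions. The field names, the three-digit `TK-NNN` numbering, and the output format are assumptions made for illustration. The actual layout of PM-ENGINE-MEMORY.md is defined by the muse plugin, not by this code.

```python
from dataclasses import dataclass


@dataclass
class TKUnit:
    """Hypothetical shape of one Tacit Knowledge unit."""
    number: int
    judgment: str         # what the PM would do
    activate_when: str    # condition under which the judgment applies
    deactivate_when: str  # condition under which it should be skipped

    @property
    def tk_id(self):
        return f"TK-{self.number:03d}"

    def to_instruction(self):
        """Render the unit as one instruction line."""
        return (f"[{self.tk_id}] When {self.activate_when}: {self.judgment} "
                f"(skip when {self.deactivate_when}).")


def build_instruction_block(units):
    """Combine TK units into a section that could be added to a system prompt."""
    lines = ["## Operational judgment (from TK units)"]
    lines += [f"- {unit.to_instruction()}" for unit in units]
    return "\n".join(lines)


if __name__ == "__main__":
    unit = TKUnit(7, "escalate to a human reviewer",
                  "a refund exceeds the approved limit",
                  "the requester is an internal test account")
    print(build_instruction_block([unit]))
```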
/plugin marketplace add kimsanguine/AI_PM_Skills
/plugin install oracle@kimsanguine-AI_PM_Skills # or atlas, forge, argus, muse

git clone https://github.com/kimsanguine/AI_PM_Skills.git
claude --plugin-dir ./AI_PM_Skills/oracle # pick what you need

First time with Claude Code? → Start with forge/claude-md, which scans your project and recommends the right skills.
Not sure which agent to build yet? → Start with oracle.
Already know what to build? → Start with forge.
| Tool | Skills | Commands | How to use |
|---|---|---|---|
| Gemini CLI | ✅ | ❌ | Copy to .gemini/skills/ |
| Cursor | ✅ | ❌ | Copy to .cursor/skills/ |
| Codex CLI | ✅ | ❌ | Copy to .codex/skills/ |
| Kiro | ✅ | ❌ | Copy to .kiro/skills/ |
📐 Architecture Deep-Dive — Two Layers, Skills 2.0, Trigger Gate, Commands
You don't call skills by name. Describe your task in natural language, and Claude matches it against each SKILL.md's description field to auto-load the best fit. Trigger accuracy: 97.9% across 96 test queries.
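For readers who want to reproduce a trigger-accuracy figure on their own queries, the calculation is just the share of test queries where the loaded skill matched the expected one. The harness below is a sketch with hypothetical names, not the project's eval code. The README reports only the percentage; 94 correct out of 96 is the one whole-number count that rounds to 97.9%, so the example uses it.

```python
def trigger_accuracy(results):
    """Fraction of (expected_skill, loaded_skill) pairs that match."""
    if not results:
        raise ValueError("no test results supplied")
    hits = sum(1 for expected, loaded in results if expected == loaded)
    return hits / len(results)


if __name__ == "__main__":
    # Illustrative data only: 94 matching pairs and 2 mismatches.
    results = [("opp-tree", "opp-tree")] * 94 + [("router", "cost-sim")] * 2
    print(f"{trigger_accuracy(results):.1%}")
```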
The Trigger Gate's "Route" field enables routing between plugins:
| From | Trigger Condition | Route To |
|---|---|---|
| opp-tree | "Validate assumptions for top opportunity" | assumptions |
| burn-rate | "Need model routing change" | router |
| prd | "Need instruction design" | instruction |
| pm-framework | "Convert TK to agent instruction" | pm-engine |
| Command | Chained Skills | Plugin |
|---|---|---|
| /discover | opp-tree → assumptions → build-or-buy | oracle |
| /architecture | orchestration → 3-tier → memory-arch | atlas |
| /write-prd | prd → instruction → ctx-budget | forge |
| /health-check | kpi → reliability → burn-rate | argus |
| /tk-to-instruction | pm-engine → instruction | muse+forge |
| Feature | 1.0 (2025) | 2.0 (2026) | AI_PM_Skills |
|---|---|---|---|
| Auto-invocation | ❌ | ✅ | ✅ 97.9% |
| Subagent (context: fork) | ❌ | ✅ | ✅ 5 skills |
| Tool restriction | ❌ | ✅ | ✅ 3-tier |
| Marketplace + Evals | ❌ | ✅ | ✅ Full |
| Dynamic injection | ❌ | ✅ | ✅ 5 skills |
| Hooks | ❌ | ✅ | ⚠️ hooks have a known issue (#17688). Fallback validate_*.sh scripts available in references/. |
AI_PM_Skills/
├── oracle/ # Discovery (6 skills, 2 commands)
├── atlas/ # Architecture (7 skills, 2 commands)
├── forge/ # Execution (12 skills, 3 commands)
├── argus/ # Monitoring (8 skills, 2 commands)
├── muse/ # Knowledge (3 skills, 3 commands)
├── evals/ # Quality + trigger evals
├── docs/images/ # Diagrams
├── validate_plugins.py
└── CONTRIBUTING.md
Every skill follows a consistent internal structure. This isn't just Skills 2.0 spec compliance — it's a content architecture designed for measurable quality and continuous improvement:
oracle/skills/opp-tree/          ← example skill
├── SKILL.md                     ← Core: frontmatter (name, description,
│                                  argument-hint, allowed-tools) +
│                                  Trigger Gate (Use/Route/Boundary) +
│                                  Failure Handling + Quality Gate
├── context/
│   └── domain.md                ← Domain knowledge injected at runtime
│                                  (e.g., agent economics, industry benchmarks)
├── examples/
│   ├── good-01.md               ← ✅ Reference output: "this is what great looks like"
│   └── bad-01.md                ← ❌ Anti-pattern: "this is what to avoid and why"
└── references/
    ├── test-cases.md            ← Edge cases, boundary conditions, eval criteria
    └── troubleshooting.md       ← Common failures + recovery patterns
Why this matters:
| Component | Purpose | Impact |
|---|---|---|
| SKILL.md Trigger Gate | Use/Route/Boundary → prevents wrong skill from firing | 97.9% trigger accuracy |
| context/domain.md | Domain expertise Claude doesn't have natively | +12~46% output quality |
| examples/good-01.md | Concrete "gold standard" output | Anchors Claude's generation |
| examples/bad-01.md | Explicit anti-patterns with explanations | Prevents common failures |
| references/test-cases.md | Edge cases + assertions | Powers eval system (54 assertions) |
This pattern repeats across all 36 skills — 130+ supporting files that make each skill measurable, testable, and improvable.
See CONTRIBUTING.md for guidelines. New skills, improvements, and translations (EN↔KO) are all welcome.
Sanguine Kim — 20-year PM veteran, AI Agent Builder & Educator
Built and scaled AI Dubbing and AI Avatar products, then led Agentic AI product development. Currently exploring the path of AI Agent PM educator — helping PMs navigate the shift from "using AI" to "building agents as products."
📬 For training, consulting, or workshop inquiries: kimsanguine@gmail.com
If you're using this project for corporate training or educational content, I'd appreciate a quick note. Customized consulting and co-teaching are welcome.
- References: Teresa Torres (Continuous Discovery Habits), Anthropic ("Building Effective Agents"), Steve Yegge (Gas Town parallel agent design), Byeonghyeok Kwak (MCP-Skills hierarchy), Michael Polanyi (The Tacit Dimension)
| Repo | What | Link |
|---|---|---|
| AI_PM | Claude Code guide for PMs — learn the why and how | github.com/kimsanguine/AI_PM |
| AI_PM_Skills | Ready-to-use agent skillset — the tools (this repo) | github.com/kimsanguine/AI_PM_Skills |
AI_PM teaches the thinking. AI_PM_Skills gives you the tools.
MIT — LICENSE