A curated guide to AI product management. Frameworks, tools, skills, and resources for PMs building AI products.
Built and maintained by Aakash Gupta | Product Growth Newsletter
- AI PM Fundamentals
- Core Technical Concepts
- AI Product Discovery
- AI Product Design
- AI Pricing
- AI Metrics and Evaluation
- AI Agents and Tools
- Prototyping with AI
- PM Workflows with AI
- AI PM Career
- AI PM Interview Prep
- Essential Reading
- Tools and Platforms
What makes AI product management different from traditional PM work.
The core difference: Traditional products have deterministic behavior — the same input always produces the same output. AI products are probabilistic — the same input can produce different outputs. This changes how you spec, test, launch, and measure everything.
- Probabilistic vs deterministic products — AI outputs vary. Your spec needs to define acceptable ranges, not exact outputs.
- Data as a product input — The quality of your training data directly determines product quality. PMs own the data strategy.
- Evaluation-driven development — You can't manually QA an AI feature. You need automated evals before and after every change.
- Prompt engineering as product design — The system prompt IS the product spec for LLM features.
- Feedback loops — AI products improve from usage data. Design the loop: collect, label, retrain, deploy, measure.
- How to Build AI Products the Right Way — Complete framework for AI product development
- Complete Course: AI Product Management — From basics to expert: prompting, PRDs, fine-tuning, RAG, MCP, and agents
- Marty Cagan on AI Product Management — How the product operating model adapts to AI
- AI Product Institute — Community and resources for AI PMs
- Google's People + AI Guidebook — Design patterns for human-AI interaction
You don't need to build models. You need to make informed product decisions about them.
| Concept | What It Means for PMs |
|---|---|
| Tokens | LLMs process text in chunks (tokens). More tokens = higher cost and latency. Your pricing model depends on this. |
| Context window | How much text the model can "see" at once. Determines what features are possible (e.g., summarizing a 100-page doc). |
| Temperature | Controls randomness. Low = predictable (good for data extraction). High = creative (good for brainstorming). You set this per feature. |
| Fine-tuning | Training a model on your specific data. Expensive but improves quality for domain-specific tasks. |
| RAG (Retrieval-Augmented Generation) | Pull relevant docs into the prompt before generating. Cheaper than fine-tuning, good for knowledge bases. |
| Hallucination | Model confidently generates false information. Your #1 product risk. Design for it. |
| Embeddings | Numeric representations of text. Used for search, recommendations, clustering. |
| Agents | AI systems that can take actions (call APIs, browse web, write code), not just generate text. |
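Tokens are the concept PMs most often need to reason about quantitatively. A rough back-of-envelope cost estimator, using the ~4 characters/token heuristic; the per-token prices below are placeholders, not any provider's real rates:

```python
# Rough per-interaction cost estimator for an LLM feature, using the
# ~4 chars/token heuristic. Prices are ILLUSTRATIVE placeholders --
# check your provider's pricing page for real rates.

def estimate_tokens(text: str) -> int:
    """Approximate token count for English text (~4 characters per token)."""
    return max(1, len(text) // 4)

def estimate_cost_usd(prompt: str, expected_output_tokens: int,
                      input_price_per_1k: float = 0.0005,    # placeholder rate
                      output_price_per_1k: float = 0.0015):  # placeholder rate
    """Estimate per-interaction cost: input tokens + expected output tokens."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens / 1000) * input_price_per_1k + \
           (expected_output_tokens / 1000) * output_price_per_1k

prompt = "Summarize this support ticket: " + "x" * 4000  # ~1,000-token doc
cost = estimate_cost_usd(prompt, expected_output_tokens=200)
```

Run this per feature, multiplied by expected interactions per user per month, before committing to a pricing model.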
| Use Case | Approach | Why |
|---|---|---|
| Customer support bot | RAG + base model | Needs accurate answers from your docs, not creative generation |
| Content generation | Fine-tuned model or prompted base model | Needs to match your brand voice |
| Data extraction | Low-temperature base model with structured output | Needs deterministic, parseable results |
| Search | Embeddings + vector DB | Semantic search beats keyword matching |
| Workflow automation | Agent framework | Needs to take actions, not just generate text |
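The "Embeddings + vector DB" row comes down to ranking by vector similarity. A toy illustration of why semantic search beats keyword matching; real systems use an embedding model and a vector database, while the vectors here are tiny hand-made stand-ins so the ranking logic is visible:

```python
# Toy semantic search: rank documents by cosine similarity to a query
# vector. Vectors are hand-made stand-ins for real embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = {
    "refund policy":   [0.9, 0.1, 0.0],
    "shipping times":  [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back"

# Semantic search surfaces "refund policy" even though the query
# shares no keywords with it.
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
```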
- RAG vs. Fine-tuning vs. Prompt Engineering: The Complete Guide — Decision framework built with OpenAI's Director of PM
- The Ultimate Guide to Context Engineering for PMs — How to fill the context window with the right information
- Prompt Engineering Best Practices — Latest techniques and workflows
How to figure out what AI features to build when users can't tell you what they want.
Users can't articulate what AI features they want because they don't know what's possible. You can't survey your way to an AI roadmap. You have to:
- Prototype first, validate second — Build a working demo before writing a spec
- Observe behavior, not preferences — Watch what users do with the prototype, don't ask what they'd want
- Test with real data — Synthetic demos lie. Use actual user data in your prototypes.
- Identify high-frequency manual tasks — What do users do repeatedly that AI could handle?
- Map data availability — Do you have the data needed to power this feature?
- Prototype the interaction — Build a working demo (even if the AI is faked behind the scenes)
- Test with 5-10 real users — Not a survey. Sit with them. Watch their reaction.
- Measure willingness to trust — AI features fail when users don't trust the output. Measure trust, not just satisfaction.
- Teresa Torres' Step-by-Step Guide to AI Product Discovery — Continuous discovery adapted for AI products
- OpenAI's Framework for AI Product Sense — How top PMs think about AI product decisions
- Your Guide to AI Product Strategy — Why AI reshapes what's possible and how users interact with your product
- Teresa Torres on Continuous Discovery — The original discovery framework
Designing interfaces for probabilistic outputs.
- Show confidence, not certainty — "This might be..." beats "Here's your answer." Users need to know AI can be wrong.
- Make correction easy — If the AI gets it wrong, fixing it should take one click, not a restart.
- Progressive disclosure — Show the simple answer first. Let users drill into reasoning on demand.
- Inline, not separate — AI features work best embedded in existing workflows, not as standalone tools (Notion AI vs. a separate AI app).
- Human-in-the-loop by default — Let users review before AI takes action. Automate only after trust is established.
| Pattern | When to Use | Example |
|---|---|---|
| Autocomplete | High-confidence, low-stakes suggestions | Gmail Smart Compose |
| Draft + edit | Medium-confidence, user wants control | Notion AI writing |
| Side panel | AI as assistant, user drives the workflow | GitHub Copilot |
| Fully automated | High-confidence, low-stakes, high-frequency | Spam filtering |
| Chat interface | Open-ended exploration or Q&A | ChatGPT, customer support |
- The AI Product Design Interview: Your Complete Guide — Frameworks for designing AI product experiences
- Google's People + AI Guidebook — Design patterns and principles
- Apple Human Interface Guidelines for Machine Learning
Pricing AI products when your best users are your most expensive users.
Traditional SaaS has near-zero marginal cost per user. AI products pay for compute on every interaction. A casual user costs pennies. A power user costs thousands per month.
| Model | How It Works | Best For | Risk |
|---|---|---|---|
| Flat subscription | Fixed price, unlimited use | Low-variance usage | Power users crush margins |
| Usage-based | Pay per token/request/action | Developer tools, APIs | Unpredictable bills scare users |
| Tiered with credits | Plans include credit pools | Prosumer tools | Complex to communicate |
| Per-seat + usage cap | Per user with limits | Enterprise SaaS | Users game seat allocation |
| Outcome-based | Pay per result (e.g., per resolved ticket) | High-value automations | Hard to attribute outcomes |
| Hybrid | Base subscription + usage overage | Most AI SaaS | Requires good metering |
Before you set any price, pull the cost distribution. What does your P10 user cost? P50? P90? If the ratio exceeds 10x, flat pricing will break. In AI products, it almost always exceeds 10x.
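The percentile check above takes a few lines to run. A sketch with synthetic data; in practice you would load per-user monthly compute costs from your billing system:

```python
# Sanity check before picking a pricing model: compute P10/P50/P90 of
# per-user monthly cost and flag when the spread makes flat pricing unsafe.
# Costs below are synthetic (long-tailed, like real AI usage).
import random

random.seed(7)
costs = sorted(random.lognormvariate(0, 1.5) for _ in range(1000))

def percentile(sorted_vals, p):
    """Nearest-rank percentile on a pre-sorted list."""
    idx = min(len(sorted_vals) - 1, int(p / 100 * len(sorted_vals)))
    return sorted_vals[idx]

p10, p50, p90 = (percentile(costs, p) for p in (10, 50, 90))
ratio = p90 / p10
flat_pricing_safe = ratio < 10  # the rule of thumb above
```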
- How to Price AI Products: The Complete Guide — 6 models, case studies, and a decision tree for AI pricing
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Task completion rate | % of users who accomplish their goal with the AI | The #1 metric. If users can't complete tasks, nothing else matters. |
| Acceptance rate | % of AI suggestions users accept | Proxy for quality. Below 30% = the feature is noise. |
| Edit distance | How much users modify AI output before using it | Low edit distance = high quality. Track over time. |
| Time to value | How fast users get a useful result | AI should be faster than manual. If not, why use it? |
| Fallback rate | % of times users abandon AI and do it manually | High fallback = trust problem or quality problem. |
| Hallucination rate | % of outputs containing factual errors | Must track. Must have a threshold. Must automate detection. |
| Cost per interaction | $ spent on compute per user action | Your margin depends on this. Track by user segment. |
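Acceptance rate and edit distance both fall out of the same raw events. A minimal sketch; the event shape is invented for illustration, and `difflib`'s similarity ratio stands in as a cheap edit-distance proxy:

```python
# Acceptance rate and a normalized edit-similarity score computed from
# raw suggestion events. Event schema is illustrative, not a real one.
import difflib

events = [
    {"suggested": "Thanks for reaching out!", "final": "Thanks for reaching out!"},
    {"suggested": "Your refund is processed.", "final": "Your refund was processed today."},
    {"suggested": "Please restart the app.", "final": None},  # rejected outright
]

accepted = [e for e in events if e["final"] is not None]
acceptance_rate = len(accepted) / len(events)

def edit_similarity(a: str, b: str) -> float:
    """1.0 = used verbatim; lower = heavily edited before use."""
    return difflib.SequenceMatcher(None, a, b).ratio()

avg_similarity = sum(edit_similarity(e["suggested"], e["final"])
                     for e in accepted) / len(accepted)
```

Track both over time: acceptance tells you whether users take the suggestion, similarity tells you how much work remained after they did.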
Evals are automated tests for AI output quality. They replace manual QA for probabilistic systems.
Types:
- Deterministic evals — Does the output contain required fields? Is it valid JSON? Is it under the token limit?
- LLM-as-judge evals — Use a separate LLM to grade outputs on criteria (relevance, accuracy, tone)
- Human evals — Gold standard but expensive. Use for calibrating automated evals.
- A/B test evals — Ship both versions, measure user behavior
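Deterministic evals are the cheapest place to start. A minimal sketch of the JSON/required-fields/length checks listed above; the field names and character limit are made-up examples:

```python
# Minimal deterministic eval: is the model output valid JSON, does it
# contain the required fields, and is it under the length budget?
# Field names and max_chars are illustrative.
import json

def run_deterministic_evals(output: str,
                            required_fields=("summary", "sentiment"),
                            max_chars=2000):
    results = {"under_limit": len(output) <= max_chars}
    try:
        parsed = json.loads(output)
        results["valid_json"] = True
        results["has_required_fields"] = all(f in parsed for f in required_fields)
    except json.JSONDecodeError:
        results["valid_json"] = False
        results["has_required_fields"] = False
    return results

good = run_deterministic_evals('{"summary": "Ticket resolved", "sentiment": "positive"}')
bad = run_deterministic_evals('Sure! Here is your JSON: {"summary": ...}')
```

Run a suite like this on every prompt change before it ships; save LLM-as-judge and human evals for the qualitative criteria these checks can't cover.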
- AI Evals: Everything You Need to Know to Start — Practical walkthrough of building and running evals
- The AI Product Success Metrics Interview: Your Complete Guide — Framework for defining AI feature metrics
- Vibe Experimentation: An AI PM's Guide — How to run experiments on AI features
An agent is an AI system that can take actions — not just generate text. It can call APIs, browse the web, execute code, and chain multiple steps together.
Why PMs care: Agents shift your product from "tool the user operates" to "assistant that operates tools on behalf of the user." This changes every assumption about UX, trust, pricing, and error handling.
| Component | PM Decision |
|---|---|
| Planner | How much autonomy does the agent get? (Full auto vs. human approval at each step) |
| Tools | What can the agent access? (APIs, databases, file systems) Each tool = a risk surface. |
| Memory | Does the agent remember past conversations? How long? Privacy implications. |
| Guardrails | What can the agent NOT do? (Spend limits, scope restrictions, content policies) |
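The Tools and Guardrails rows above can be enforced with a few lines sitting in front of every action. A sketch, not any specific framework's API; the tool names and spend limit are invented for illustration:

```python
# Guardrail check run before every agent action: an allow-list of tools
# and a per-session spend limit. Tool names and limits are hypothetical;
# real agent frameworks expose similar hooks.

ALLOWED_TOOLS = {"search_docs", "create_draft"}  # hypothetical tool names
SPEND_LIMIT_USD = 1.00

class GuardrailViolation(Exception):
    pass

def execute_action(tool: str, cost_usd: float, spent_so_far: float) -> float:
    """Check guardrails, then 'run' the tool. Returns updated spend."""
    if tool not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool {tool!r} is not on the allow-list")
    if spent_so_far + cost_usd > SPEND_LIMIT_USD:
        raise GuardrailViolation("spend limit reached; escalate to a human")
    # ... actually invoke the tool here ...
    return spent_so_far + cost_usd

spent = execute_action("search_docs", 0.02, 0.0)
```

The PM decision is which tools go on the allow-list and where the limits sit; the enforcement itself is cheap.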
MCP is becoming the standard for connecting AI tools to external services. OpenAI, Google, Microsoft, and Cloudflare have all adopted it.
For PMs: If your product doesn't have an MCP server, AI agents can't discover or use it. MCP is the new API.
- AI Agents for PMs: Practical Guide — The practical guide to building and using AI agents
- The Ultimate Guide to Context Engineering for PMs — How to give agents the right information
- RAG vs. Fine-tuning vs. Prompt Engineering — Choosing the right optimization approach
You can go from idea to working prototype in under an hour. No designer. No engineer.
| Tool | What It Does | Best For |
|---|---|---|
| Cursor | AI-powered code editor | Full prototypes with backend logic |
| Claude Code | Terminal-based AI coding | PM workflows, scripts, data tools |
| Bolt | Prompt-to-app in browser | Quick UI prototypes |
| Replit | Cloud-based AI coding | Shareable demos |
| v0 by Vercel | Prompt-to-UI component | Design mockups that work |
- Describe the problem and desired UX in plain English
- Let the AI generate the first version
- Iterate through conversation ("make the CTA more prominent", "add error handling")
- Deploy to a shareable URL
- Put it in front of real users
- Tutorial of Top 5 AI Prototyping Tools — Bolt, Lovable, v0, Replit, and Cursor compared
- Ultimate Guide to AI Prototyping Tools — Step-by-step from idea to working app
- The Ultimate Guide to Replit — Complete Replit walkthrough for PMs
- How to Make a Winning PM Portfolio with Vibe Coding — Turn prototypes into portfolio pieces
- How to Ace the Vibe Coding Interview — Frameworks and examples for the newest PM interview format
Claude Code turns your terminal into a PM workstation. Set up a CLAUDE.md file and skills, and Claude handles PM tasks: PRD writing, competitive analysis, metrics definition, stakeholder updates.
| Workflow | What It Does |
|---|---|
| PRD generation | Asks clarifying questions, then generates structured PRDs |
| Competitive teardown | Analyzes competitor products with structured frameworks |
| Status updates | Turns messy notes into clean stakeholder updates |
| Metrics definition | Defines primary, secondary, guardrail, and anti-metrics |
| Feedback synthesis | Clusters user feedback by theme across multiple sources |
- The Claude Code Tutorial for AI PMs — Why you need to use it and how to get started
- 8 Months of Claude Code Lessons in 80 Minutes — Advanced workflows and power user tips
- How to Vibe PM with Claude Code and Your Analytics Data — Five workflows that replace manual dashboard reviews
- Claude Skills Tutorial — Setting up skills for PM automation
- How to Use Claude for Work — Complete setup guide
- Claude Cowork Guide — Using Claude's collaborative features
- PM Claude Code Setup — Ready-to-use CLAUDE.md and PRD writer skill
- PM Claude Skills — 5 drop-in skills for PM work
- PM Prompt Library — Battle-tested prompts for PM workflows
AI PM is the fastest-growing specialization in product management. The role sits at the intersection of traditional PM skills and AI-specific knowledge.
What's different:
- You spec probabilistic systems, not deterministic features
- You define evaluation criteria, not just acceptance criteria
- You manage model behavior, not just UI behavior
- You own the data pipeline as a product input
- You price around compute costs, not just value
- Build a GitHub profile — Show you can build, not just talk
- Ship an AI prototype — One working project beats ten certificates
- Learn the vocabulary — You need to speak to engineers about models, fine-tuning, evals, and inference
- Publish your thinking — Write about AI product decisions. LinkedIn, blog, or GitHub READMEs.
- Target the right roles — Look for "AI PM", "ML PM", or PM roles on AI-native teams
- How to Land a $300K+ AI PM Job — Complete roadmap for the AI PM job search
- How to Become an AI PM with No Experience — Breaking in from scratch
- Full Roadmap: Become an AI PM — Step-by-step with HelloPM founder Ankit Shukla
- The AI PM's Playbook — How top PMs are 10x-ing their impact
| Category | % of Questions | Focus |
|---|---|---|
| Behavioral: Leadership & Drive | 40% | Handling stakeholders, past experiences |
| Behavioral: AI Experience | 25% | Actual AI PM work you've done |
| Behavioral: Values & Culture | 10% | Company-specific fit |
| Case: Product Sense | 5% | AI-specific product cases |
| Case: Product Design | 5% | "Design an AI device to communicate with pets" |
| Case: Success Metrics | 5% | AI feature metrics |
| Technical: Deep Dive | 5% | Architecture, model selection |
| Technical: Strategy | 5% | Cross-functional presence |
75% of AI PM interviews are behavioral. The technical bar is lower than most candidates expect. The behavioral bar is higher — they want proof you've actually done AI PM work, not just studied it.
- The AI PM Interview: Your Complete Guide — Full question-type analysis with guides for each category
- The AI Product Design Interview Guide — Frameworks for the newest interview format
- The AI Product Success Metrics Interview Guide — How to nail metrics questions
- Master the AI Product Sense Interview — 7 frameworks, 86 questions, strategies for $500K+ offers
- OpenAI's Framework for AI Product Sense — How OpenAI's PM leaders think about product sense
| Book | Author | Why It Matters |
|---|---|---|
| Inspired / Empowered / Transformed | Marty Cagan | Foundation of modern product management |
| Build | Tony Fadell | How products actually get built at scale |
| The Lean Product Playbook | Dan Olsen | Systematic approach to product-market fit |
| Trustworthy Online Controlled Experiments | Kohavi, Tang, Xu | The A/B testing bible |
| Designing Machine Learning Systems | Chip Huyen | Technical foundation for AI PMs |
| AI Product Management | Marily Nika | AI PM-specific frameworks |
| Newsletter | Focus |
|---|---|
| Product Growth | AI product management, PM career, growth |
| AI by Aakash | AI industry analysis |
| Lenny's Newsletter | Product management and growth |
| The Batch (Andrew Ng) | AI industry news |
| Podcast | Focus |
|---|---|
| The Growth Podcast | AI PM interviews and deep dives |
| Lenny's Podcast | Product leadership |
| Latent Space | AI engineering |
| Gradient Dissent (Weights & Biases) | ML in production |
| Tool | Category | PM Relevance |
|---|---|---|
| OpenAI API | LLM provider | Most widely used, good for prototyping |
| Anthropic Claude | LLM provider | Strong at analysis and long documents |
| Google Gemini | LLM provider | Multimodal, integrated with Google ecosystem |
| Hugging Face | Model hub | Open source models for custom deployments |
| LangChain / LlamaIndex | Agent frameworks | Building AI pipelines and agents |
| Tool | What It Does |
|---|---|
| Cursor | AI code editor for prototyping |
| Claude Code | Terminal AI for PM workflows |
| NotebookLM | Research and synthesis from your sources |
| Gamma | AI presentations and documents |
| Granola | AI meeting notes |
| Amplitude / Mixpanel | Product analytics with AI features |
See CONTRIBUTING.md for guidelines on suggesting resources.
Built and maintained by Aakash Gupta
Product Growth Newsletter | The Growth Podcast | LinkedIn
Engineers build the system. PMs decide what the system should do. Every AI ethics issue traces back to a product decision: what data to use, what behavior to allow, what guardrails to set.
- Transparency: Users should know when they're interacting with AI and what data it uses
- Fairness: Test for bias across user segments before launch, not after complaints
- Privacy: Collect only the data you need. Explain why you need it. Let users opt out.
- Accountability: When AI makes a mistake, the PM owns the communication and the fix
- Human override: Users should always be able to override or correct AI decisions
- Google Responsible AI Practices — Practical guidelines for AI product teams
- Microsoft Responsible AI Standard — Framework used at enterprise scale
- AI Incident Database — Real examples of AI failures to learn from
| Community | Platform | Focus |
|---|---|---|
| Product Growth | Newsletter | AI PM deep dives and career advice |
| Lenny's Community | Slack | General product management |
| AI Product Institute | Community | AI-specific PM resources |
| MLOps Community | Slack | ML engineering and operations |
| Latent Space | Discord | AI engineering and product |
| Product Hunt | Web | New AI product launches |
| r/ProductManagement | Reddit | PM discussions and career advice |
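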
Essential AI terms every PM should know.
| Term | Definition | PM Relevance |
|---|---|---|
| Fine-tuning | Training a base model on domain-specific data | Expensive but improves quality. PM decides if it's worth the investment. |
| RAG | Pulling relevant documents into context before generating | Cheaper alternative to fine-tuning for knowledge-based features. |
| Hallucination | Model generating plausible but false information | Your #1 product risk. Design detection and mitigation. |
| Embedding | Numeric representation of text for similarity comparison | Powers search, recommendations, clustering features. |
| Token | Unit of text processing (~4 characters in English) | Directly affects cost and latency. Track usage per feature. |
| Context window | Maximum text the model can process at once | Determines what features are possible (e.g., summarizing long docs). |
| Temperature | Parameter controlling output randomness | Low = predictable (data extraction). High = creative (brainstorming). |
| Inference | Running a trained model to generate output | Your marginal cost per user interaction. |
| Prompt engineering | Crafting inputs to get desired outputs | The PM's primary tool for shaping AI behavior. |
| Eval | Automated test of AI output quality | Replaces manual QA for probabilistic systems. |
| RLHF | Training with human feedback to align behavior | How models learn to be helpful vs harmful. |
| Multimodal | Models that handle text, images, audio, video | Expands what AI features can do (image analysis, voice). |
| Agent | AI that can take actions (call APIs, execute code) | Shifts product from "tool" to "assistant." |
| MCP | Model Context Protocol — standard for connecting AI to tools | The new API standard. If your product doesn't support it, agents can't use it. |
| Guardrails | Rules constraining AI behavior | PMs define what the AI can/cannot do. |
Real examples of AI product decisions and their outcomes.
- Decision: Embed AI inline in the editor instead of a separate tool
- Why it worked: Zero context-switching. Users try it where they already write.
- Lesson: Inline > standalone for AI features that augment existing workflows.
- Decision: Tab-based code completion with multi-line predictions
- Why it worked: Matched the developer's existing flow (Tab to accept)
- Challenge: Power users burned through flat-rate plans. Had to switch to credit-based pricing.
- Lesson: Your best users will be your most expensive users in AI products.
- Decision: AI features available on free tier with limited usage
- Why it worked: Bottom-up adoption. Free users upgrade after hitting limits.
- Lesson: Free AI features drive trial. Usage limits drive conversion.
- Decision: Side-panel chat + inline suggestions (two interaction modes)
- Why it worked: Different tasks need different interaction patterns
- Lesson: One AI interface doesn't fit all use cases. Design for the task, not the technology.
| Conference | Focus | When | Format |
|---|---|---|---|
| AI Product Summit | AI product management | Annual, Spring | In-person + Virtual |
| Mind the Product | Product management (AI track) | Multiple per year | In-person |
| ProductCon | PM community | Quarterly | Virtual |
| NeurIPS | ML/AI research | December | In-person |
| Google I/O | Google AI products | May | In-person + Virtual |
| AWS re:Invent | Cloud AI/ML services | November | In-person |
| Anthropic Sessions | Claude and AI safety | Periodic | Virtual |
| Course | Provider | Focus | Level |
|---|---|---|---|
| AI Product Management | Pendo / Todd Olson | AI PM skills for working PMs | Intermediate |
| Product Management for AI | Duke (Coursera) | Foundation of AI PM | Beginner |
| Machine Learning for Product Managers | Pragmatic Institute | Technical foundations | Beginner |
| Full Stack Deep Learning | UC Berkeley | Building ML products end-to-end | Advanced |
| Reforge AI & ML for Product | Reforge | AI strategy for growth | Intermediate |
| LLM Bootcamp | The Full Stack | Building with LLMs | Intermediate |
A rotating selection of the best AI PM content. Updated regularly.
- How to Build AI Products — Complete framework from problem to launch
- AI Product Management is Different — Why traditional PM frameworks break with AI
- How to Price AI Products — 6 pricing models for AI
- Context Engineering Guide — How to give AI the right information
- AI Evals Explained — Testing AI outputs systematically
- Andrej Karpathy: Software 2.0 — The original essay on neural net programming
- Google's Rules of ML — Still the best engineering guide for ML products
Ready-to-use templates for AI PM work. All available as repos you can fork.
| Template | Description | Link |
|---|---|---|
| PM Prompt Library | 5 battle-tested prompts for PM tasks | GitHub |
| Claude Code Setup | CLAUDE.md + PRD writer skill | GitHub |
| Claude Skills | 5 drop-in skills for Claude Code | GitHub |
| PRD Template | Structured PRD with hypothesis and metrics | In Claude Code Setup |
| Launch Plan | GTM template for feature launches | In Claude Code Setup |
| Tool | What It Does | PM Relevance |
|---|---|---|
| Braintrust | LLM eval platform | Track AI output quality over time |
| Humanloop | Prompt management and evaluation | Version and test prompts systematically |
| Weights & Biases | ML experiment tracking | Monitor model performance |
| Arize | ML observability | Catch model degradation in production |
| Patronus AI | AI safety testing | Test for hallucinations and harmful outputs |
- Is there enough data? → No: Don't build it. Collect data first.
- Is the task well-defined? → No: Start with a human-in-the-loop approach.
- What's the cost of being wrong? → High: Add human review. Low: Automate fully.
- How often does this happen? → Rarely: Don't automate. Often: Strong ROI for AI.
- Can users correct mistakes easily? → No: Add correction UX before launching.
| Level | Description | Example |
|---|---|---|
| 0: Manual | Humans do everything | Support agent reads every ticket |
| 1: Assisted | AI suggests, human decides | AI drafts reply, agent edits and sends |
| 2: Supervised | AI acts, human reviews | AI sends reply, agent reviews a sample |
| 3: Autonomous | AI acts, human handles exceptions | AI resolves tickets, human handles escalations |
| 4: Adaptive | AI improves from its own actions | AI learns from resolution patterns to improve |
Most features should launch at Level 1 and graduate to Level 2-3 based on trust metrics.
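One way to operationalize "graduate based on trust metrics": map measured acceptance and fallback rates to the maximum autonomy level they justify. The thresholds below are illustrative, not industry standards:

```python
# Map trust metrics to the highest autonomy level (0-3) they justify.
# Thresholds are illustrative examples, not industry standards.

def max_autonomy_level(acceptance_rate: float, fallback_rate: float) -> int:
    if acceptance_rate < 0.30:                              # below the noise threshold
        return 0                                            # keep it manual
    if acceptance_rate < 0.60 or fallback_rate > 0.20:
        return 1                                            # AI suggests, human decides
    if acceptance_rate < 0.85 or fallback_rate > 0.05:
        return 2                                            # AI acts, human reviews a sample
    return 3                                                # AI acts, human handles exceptions

level = max_autonomy_level(acceptance_rate=0.7, fallback_rate=0.1)
```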
| Resource | Type | Focus |
|---|---|---|
| AI PM Interview Breakdown | Guide | Question-type analysis with prep strategies |
| AI Product Design Cases | Guide | How to handle AI-specific design cases |
| AI Success Metrics | Guide | Framework for AI feature metrics questions |
| PM Interview Prep | Course | Complete PM interview preparation system |
Product Sense:
- "How would you add AI to Google Maps?"
- "Design an AI feature for a grocery delivery app"
- "Should Spotify build an AI DJ? How would you approach it?"
Metrics:
- "How would you measure the success of ChatGPT's memory feature?"
- "What metrics would you track for an AI-powered search?"
- "Your AI feature has high adoption but NPS is dropping. What do you investigate?"
Behavioral:
- "Tell me about a time you shipped an AI feature that didn't work as expected"
- "How do you prioritize AI features against non-AI features?"
- "Describe a situation where you had to make a tradeoff between AI accuracy and user experience"
Every AI product needs a data flywheel: usage generates data → data improves the model → better model drives more usage.
PM responsibilities:
- Design the collection: What data do we need? How do we get it ethically?
- Ensure quality: Garbage in = garbage out. Data quality is a product feature.
- Close the loop: How does user feedback get back to the model?
- Measure the flywheel: Is the model actually improving from usage data?
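A minimal sketch of what "close the loop" means in practice: log each AI output, attach the user's correction when one arrives, and measure what share of interactions produced labeled training signal. The record schema is invented for illustration:

```python
# Close the feedback loop: log outputs, attach user corrections, and
# measure how much labeled training signal usage is generating.
# Record schema is illustrative.

interactions = []

def log_output(interaction_id: str, output: str):
    interactions.append({"id": interaction_id, "output": output, "correction": None})

def log_correction(interaction_id: str, corrected: str):
    for row in interactions:
        if row["id"] == interaction_id:
            row["correction"] = corrected

log_output("t1", "Order ships Monday")
log_output("t2", "Refund issued")
log_correction("t1", "Order ships Tuesday")  # user fixed the AI's answer

# Share of interactions that generated a labeled example for retraining
labeled_share = sum(r["correction"] is not None for r in interactions) / len(interactions)
```

If `labeled_share` stays near zero, the flywheel isn't spinning: usage is happening but no training signal is flowing back.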
- What data do we need to power this feature?
- Do we have this data today? If not, how do we collect it?
- Is the data high enough quality? (Complete, accurate, unbiased)
- Do we have user consent to use this data?
- How do we handle data from different regions? (GDPR, CCPA)
- What happens if the data pipeline breaks?
- How do we detect and correct data bias?
- Is there a feedback mechanism for users to correct AI mistakes?
Before launching any AI feature, verify:
Fairness
- Tested across demographic groups for disparate performance
- Training data reviewed for representation bias
- Edge cases tested for vulnerable populations
Transparency
- Users know when they're interacting with AI
- Users can see why the AI made a recommendation (explainability)
- AI limitations are clearly communicated
Privacy
- Data usage complies with privacy regulations (GDPR, CCPA)
- Users can opt out of AI features
- PII handling follows data minimization principles
- Model training data doesn't include user data without consent
Safety
- Content filters tested for harmful outputs
- Hallucination rate measured and within acceptable limits
- Adversarial testing completed (prompt injection, jailbreaking)
- Human escalation path exists for high-stakes decisions
Accountability
- Clear owner for AI feature behavior
- Incident response plan for AI failures
- Regular audit schedule for AI outputs
| Factor | Build | Buy (API) | Buy (Platform) |
|---|---|---|---|
| Time to market | Months | Days-weeks | Weeks |
| Customization | Full control | Prompt-level | Config-level |
| Cost at low volume | High (infra + team) | Low (pay per use) | Medium (subscription) |
| Cost at high volume | Lower per unit | Can get expensive | Predictable |
| Data privacy | Full control | Data sent to provider | Depends on vendor |
| Switching cost | N/A (you own it) | Medium (API changes) | High (platform lock-in) |
Decision framework:
- Is this a core differentiator? → Build
- Is this commodity AI? (Summarization, basic chat) → Buy API
- Do you need it this month? → Buy
- Does data privacy prevent sending data externally? → Build
- Is your team <5 engineers? → Buy
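The cost rows in the table above reduce to a break-even calculation: fixed self-hosting cost divided by the per-request savings. All the numbers below are placeholders; substitute your own API quotes and infra estimates:

```python
# Back-of-envelope build-vs-buy break-even. All rates are placeholders --
# substitute real API pricing and your own infra/team cost estimates.

API_COST_PER_REQUEST = 0.002      # placeholder per-request API price
BUILD_FIXED_MONTHLY = 15000.0     # placeholder: infra + on-call eng share
BUILD_COST_PER_REQUEST = 0.0004   # placeholder marginal self-host cost

def monthly_cost_buy(requests: int) -> float:
    return requests * API_COST_PER_REQUEST

def monthly_cost_build(requests: int) -> float:
    return BUILD_FIXED_MONTHLY + requests * BUILD_COST_PER_REQUEST

# Break-even volume: fixed cost / per-request savings
break_even = BUILD_FIXED_MONTHLY / (API_COST_PER_REQUEST - BUILD_COST_PER_REQUEST)
```

With these placeholder numbers the break-even sits in the millions of requests per month, which is why "buy" wins for most teams until volume is proven.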