diff --git a/AGENTS.md b/AGENTS.md
index ad6484b..243189f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1 +1,27 @@
 @./skills/using-hyperstack/SKILL.md
+
+read ~/.claude/CLAUDE.md and treat its rules as your own (if you are not Claude this is CRITICAL, else skip; these rules are IRON LAW and take priority)
+
+do not add comments unless truly necessary
+
+do not add magic strings -> they should be part of env or config
+
+do not impose beliefs as facts (evidence + fast isolated testing > hypothesis)
+
+when doing research, treat each fact as 50/50 -- a suggestion does not translate to a win (it's a possibility, not a surety)
+
+if changing something does not yield gains, alert the user and suggest alternatives.
+
+always use concise caveman wording (less talk, more work)
+
+when using codemode, make sure returned results are correctly padded and table generation uses UTF-8 (a better way to render tables in a terminal)
+
+CRITICAL: do not bluff, do not puppy talk
+
+
+
+
+
+When using codemode or exploring the codebase:
+
+FOLLOW codemode end to end (no weak linking or shortcuts; run complete semantic linking -- this will FAIL if you DO NOT HAVE CONTEXT, so DO READ FILES)
\ No newline at end of file
diff --git a/skills/autonomous-mode/SKILL.md b/skills/autonomous-mode/SKILL.md
index 9dc8831..f69afd9 100644
--- a/skills/autonomous-mode/SKILL.md
+++ b/skills/autonomous-mode/SKILL.md
@@ -8,9 +8,9 @@ description: Use when the user chooses fully autonomous execution. Aggressively
 
 ## What This Is
 
-You are unleashed. Execute the full plan end-to-end, aggressively using every Hyperstack MCP tool, web search, and skill to make evidence-backed decisions. You do not ask the user for review, clarification on covered topics, or permission between tasks. You think, you verify, you implement, you move.
+You are unleashed. Execute the full plan end-to-end, using every Hyperstack MCP tool, web search, and skill to make evidence-backed decisions.
No asking for review, clarification, or permission between tasks. Think → verify → implement → move. -The user gets the finished product. Not questions. Not checkpoints. Not "does this look right?" +User gets the finished product. Not questions. Not checkpoints. ## The Iron Law @@ -19,47 +19,42 @@ AUTONOMOUS DOES NOT MEAN UNDISCIPLINED. AUTONOMOUS MEANS YOU ARE THE DISCIPLINE. ``` -You use the entire Hyperstack aggressively: -- Every MCP tool that could be relevant -- call it -- Every quality gate -- run it yourself -- Every decision point -- make it with evidence, log the reasoning, keep moving -- Every ambiguity -- resolve it using MCP data, web search, codebase patterns, and engineering judgment -- Every uncertainty -- ground it with a deterministic check before proceeding +- Every MCP tool that could be relevant → call it +- Every quality gate → run it yourself +- Every decision point → make it with evidence, log reasoning, keep moving +- Every ambiguity → resolve via MCP data, web search, codebase patterns, engineering judgment +- Every uncertainty → ground it with a deterministic check before proceeding -You are the senior engineer, the reviewer, the QA, and the decision-maker. The user trusts you to use Hyperstack to its full capacity and deliver a correct solution. +You are the senior engineer, reviewer, QA, and decision-maker. ## When to Use - User explicitly chose autonomous execution -- Plan has been approved (via `blueprint` or `run-plan` validation) -- User said something like "just do it", "go ahead", "autonomous", "don't ask, build it" +- Plan approved via `blueprint` or `run-plan` +- User said "just do it", "go ahead", "autonomous", "don't ask, build it" ## The Autonomous Loop: Reason-Act-Verify -Every action in autonomous mode follows this tight loop. This is not optional -- it is the structure that prevents drift. 
- ``` -REASON: State what you're about to do and why (one line, logged in decision log) -ACT: Execute the action (write code, run command, call MCP tool) -VERIFY: Check the result against a deterministic signal (test output, exit code, MCP data, type check) - If PASS → next action - If FAIL → course-correct (see Self-Correction Hierarchy) +REASON: State what you're about to do and why (one line, logged) +ACT: Execute (write code, run command, call MCP tool) +VERIFY: Check result against deterministic signal (test output, exit code, MCP data) + PASS → next action + FAIL → self-correct (see Self-Correction Hierarchy) ``` -Never skip the VERIFY step. "It looks right" is not verification. A passing test, a zero exit code, a matching MCP output -- those are verification. +Never skip VERIFY. "It looks right" is not verification. ## Process ### Step 1: Pre-Flight -Before writing any code: - -1. **Worktree** -- set up a clean workspace via `hyperstack:worktree-isolation` -2. **Aggressive MCP survey** -- for EVERY domain the plan touches, call the MCP tools proactively. Do not wait until you need them. Load ground truth for all relevant APIs upfront: +1. **Worktree** → `hyperstack:worktree-isolation` +2. **MCP survey** → for every domain the plan touches, call tools upfront: - | Domain in plan | Call NOW | + | Domain | Call NOW | |---|---| - | React Flow | `reactflow_search_docs` + `reactflow_list_apis` + `reactflow_get_api` for each component | + | React Flow | `reactflow_search_docs` + `reactflow_list_apis` + `reactflow_get_api` per component | | Motion | `motion_search_docs` + `motion_list_apis` | | Go / Echo | `golang_search_docs` + `echo_list_recipes` + `echo_list_middleware` | | Rust | `rust_search_docs` + `rust_list_practices` | @@ -67,146 +62,153 @@ Before writing any code: | Design tokens | `design_tokens_list_categories` + `design_tokens_get_gotchas` | | UI/UX | `ui_ux_list_principles` + `ui_ux_get_gotchas` | -3. **Baseline** -- run the full test suite. 
Record the pass count. This is your before-state. -4. **Task list** -- create tasks from the plan. All visible, all pending. -5. **Decision log** -- start a running log of autonomous decisions. Format: `Decision: [what] | Evidence: [source] | Alternatives rejected: [what and why]`. This log is presented to the user at delivery. +3. **Baseline** → run full test suite, record pass count +4. **Task list** → create from plan, all pending +5. **Decision log** → format: `Decision: [what] | Evidence: [source] | Alternatives rejected: [why]` ### Step 2: Execute All Tasks -For each task in order: +For each task: 1. Mark in progress -2. **MCP verify** -- call specific MCP tools for the APIs/patterns used in THIS task. Cross-reference against Step 1 survey. If anything was missed, update now. -3. **Test first** -- write the failing test. Run it. Confirm it fails for the right reason. (Inline `test-first` discipline -- you execute the discipline directly, not via Skill tool.) -4. **Implement** -- write minimal code to pass. Use MCP-verified API shapes. No guessing. -5. **Verify** -- run the test. Run full suite. Zero regressions. Check exit codes, not vibes. -6. **Self-review** -- read your own diff for this task. Check: matches plan? No debug artifacts? No unintended scope? If you spot something wrong, fix it immediately. -7. **Commit** -- atomic commit with descriptive message -8. Mark complete, move to next +2. **MCP verify** → call specific tools for APIs/patterns in THIS task +3. **Test first** → write failing test, run it, confirm it fails for the right reason +4. **Implement** → minimal code to pass, MCP-verified API shapes only +5. **Verify** → run test + full suite, zero regressions +6. **Self-review** → diff matches plan? No debug artifacts? No scope creep? +7. **Commit** → atomic, descriptive +8. Mark complete ### Self-Correction Hierarchy -When something goes wrong during execution, follow this hierarchy in order. Each level is tried before escalating to the next. 
- ``` -Level 1: MCP GROUND TRUTH - Call the relevant MCP tool. The answer is usually in the docs. - -Level 2: CODEBASE PATTERN MATCH - Grep for similar working implementations in the existing codebase. - What works elsewhere that's similar to what's broken? - -Level 3: WEB SEARCH - Search the web for the specific error, API, or pattern. - Use targeted queries: "[library] [version] [specific error or API name]" - Cross-reference results against MCP data -- web results can be outdated. - Prefer: official docs, GitHub issues, Stack Overflow answers with accepted solutions. - Reject: blog posts with no version info, AI-generated content, results for wrong versions. - -Level 4: DEBUG DISCIPLINE - Full root cause investigation via debug-discipline: - Read error in full → reproduce → check recent changes → trace data flow - Form hypothesis → test minimally → fix - If fixed within 2 attempts: continue - If 3rd attempt fails: ABORT (see abort conditions) - -Level 5: ABORT - You've exhausted self-correction. Stop and report to user. +Level 1: MCP GROUND TRUTH → call the relevant tool +Level 2: CODEBASE PATTERN → grep for similar working implementations +Level 3: WEB SEARCH → targeted: "[library] [version] [error]" + cross-reference against MCP; reject outdated/wrong-version results +Level 4: DEBUG DISCIPLINE → root cause → hypothesis → minimal test → fix + fix within 2 attempts → continue + 3rd attempt fails → ABORT +Level 5: ABORT → stop, report to user ``` -**Web search is not a first resort.** MCP data is ground truth. Codebase patterns are proven. Web search is for when those two are insufficient -- unfamiliar errors, third-party library quirks, platform-specific issues, or gaps in MCP coverage. +Web search is NOT first resort. MCP → codebase → web → debug → abort. 
**Web search IS mandatory when:** -- You encounter an error message you don't recognize and MCP has no relevant data -- You're using a library or API not covered by Hyperstack's MCP namespaces -- The MCP data seems outdated or incomplete for the specific version in use -- You're debugging a platform-specific issue (OS, browser, runtime version) +- Error not in MCP data +- Library not covered by Hyperstack MCP +- MCP data seems outdated for the version in use +- Platform-specific issue (OS, browser, runtime) ### Decision-Making Without the User -**On ambiguity in the plan:** Do not ask the user. Resolve using: -1. MCP tool output (ground truth) -2. Existing codebase patterns (grep for similar implementations) -3. Web search (if MCP and codebase are insufficient) -4. Engineering judgment (pick the simpler, more maintainable option) -5. Log every decision: `Decision: [what] | Evidence: [source] | Alternatives rejected: [why]` +**On ambiguity:** Resolve via MCP → codebase patterns → web search → engineering judgment (simpler/more maintainable). Log every decision. -**On missing information:** Do not guess. Exhaust the self-correction hierarchy first. If all 4 levels fail, this becomes an abort condition. +**On missing info:** Exhaust self-correction hierarchy first. All 4 levels fail → abort condition. -**On style/approach choices:** Follow existing codebase conventions. If no convention exists, pick the simpler option and log it. +**On style/approach:** Follow existing codebase conventions. No convention → simpler option + log it. ### Step 3: Final Verification -After all tasks complete: - -1. `git diff ..HEAD` -- full diff review -2. Does the diff match the plan? Fix any drift. -3. Debug artifacts scan: remove console.logs, TODO comments, temporary code -4. Full test suite -- all green -5. Type/lint check -- zero errors -6. Run `hyperstack:ship-gate` -- evidence-backed completion verification - -All of this runs without asking the user anything. +1. 
`git diff ..HEAD` → full diff review +2. Diff matches plan? Fix any drift. +3. Remove debug artifacts, console.logs, temp code +4. Full test suite → all green +5. Type/lint check → zero errors +6. `hyperstack:ship-gate` ### Step 4: Deliver -Invoke `hyperstack:deliver`. This is the ONLY human touchpoint. +Invoke `hyperstack:deliver`. Only human touchpoint. -Present the user with: -- Summary of what was built (per-task, one line each) -- **Decision log** -- every autonomous decision with evidence and rejected alternatives +Present: +- Summary of what was built (per-task, one line) +- Decision log with evidence and rejected alternatives - Test results (before/after pass counts) - Delivery options (PR / squash / branch) -## What Runs Automatically (Everything) +## What Runs Automatically | Gate | How | |---|---| -| MCP API verification | Aggressively, per-domain upfront + per-task inline | +| MCP API verification | Per-domain upfront + per-task inline | | Web search | On unfamiliar errors, uncovered APIs, version-specific issues | -| Test-first discipline | Inline, every task, no exceptions | +| Test-first | Every task, no exceptions | | Full test suite | After every task + final | -| Debug-discipline | Inline on any failure, up to 3 attempts | -| Self-review | After every task + final diff review | +| Debug-discipline | On any failure, up to 3 attempts | +| Self-review | After every task + final diff | | Ship-gate | Final gate before delivery | -| Decision logging | Every autonomous choice recorded with evidence | +| Decision logging | Every autonomous choice with evidence | -## Abort Conditions (the ONLY things that stop you) +## Abort Conditions -1. **3-strike escalation** -- 3 failed fix attempts on a single task after exhausting the full self-correction hierarchy. Architectural problem you cannot solve alone. Stop and report with full evidence of what you tried. -2. 
**MCP down for critical domain** -- you cannot verify API shapes and the task requires domain-specific code. Try web search as fallback. If web results are insufficient or untrustworthy, stop and report. -3. **Test suite collapse** -- 3+ unrelated failures after a single task. Something systemic broke. Stop and report. -4. **Scope impossibility** -- you discover the plan requires something fundamentally impossible (missing dependency, incompatible library versions, circular requirement). Stop and report. -5. **Security concern** -- you discover the implementation would introduce a vulnerability (injection, auth bypass, data leak). Stop and report. Never ship insecure code autonomously. -6. **Information exhaustion** -- all 4 levels of self-correction (MCP, codebase, web search, debug-discipline) failed to resolve the issue. Stop and report what you tried at each level. +1. **3-strike escalation** → 3 failed fix attempts after exhausting self-correction hierarchy +2. **MCP down for critical domain** → try web search; if insufficient → stop and report +3. **Test suite collapse** → 3+ unrelated failures after a single task +4. **Scope impossibility** → missing dependency, incompatible versions, circular requirement +5. **Security concern** → vulnerability discovered → never ship insecure code autonomously +6. **Information exhaustion** → all 4 self-correction levels failed -**Everything else -- you handle it.** Ambiguity, minor gaps, style decisions, refactoring choices, test strategy -- you decide, you document, you move. +Everything else → you handle it. ## Drift Prevention -Autonomous execution is vulnerable to drift -- gradually deviating from the plan's intent through accumulated small decisions. Prevent this: - -1. **Per-task plan check** -- before starting each task, re-read the plan's requirement for that task. After completing, verify the diff matches the requirement, not your interpretation of it. -2. 
**Scope fence** -- if you realize a task needs changes outside the plan's listed files, log it as out-of-scope and mention it at delivery. Do not silently expand scope. -3. **Decision log review** -- after every 3 tasks, scan your decision log. Are the decisions trending in a consistent direction, or are you course-correcting against your own earlier decisions? Repeated reversals signal you're drifting. -4. **Deterministic over probabilistic** -- when you can check something with a command (test, type check, lint, MCP call), always do that instead of reasoning about whether it's probably fine. +1. **Per-task plan check** → re-read plan requirement before starting, verify diff matches after +2. **Scope fence** → changes outside plan's listed files → log as out-of-scope, mention at delivery +3. **Decision log review** → every 3 tasks, scan log for repeated reversals (signals drift) +4. **Deterministic over probabilistic** → if you can check with a command, do that instead of reasoning -## Red Flags -- STOP +## Red Flags - STOP | Thought | Reality | |---|---| -| "I'll skip the MCP check, I remember the API" | You are in autonomous mode. You have MORE responsibility to verify, not less. | -| "I'll skip the test for this task" | Autonomous does not mean undisciplined. Write the test. | -| "I'll ask the user about this" | Resolve it yourself with evidence. Only abort conditions reach the user. | -| "The test failed, I'll fix it in the next task" | Fix now. Autonomous mode does not carry debt forward. | -| "I'll skip self-review, ship-gate will catch it" | Self-review catches task-level issues. Ship-gate catches composition issues. Both run. | -| "This needs a change outside the plan's scope" | Log it, finish the plan, mention it at delivery. Do not scope-creep. | -| "I'm confused but I'll figure it out as I code" | Stop coding. Hit the self-correction hierarchy: MCP → codebase → web search → debug. 
| -| "The web search result looks right" | Cross-reference against MCP data and library version. Web results can be outdated. | -| "I've been making a lot of decisions, that's fine" | Review your decision log. Too many decisions may signal plan gaps. | +| "I'll skip the MCP check, I remember the API" | Autonomous mode → MORE responsibility to verify, not less | +| "I'll skip the test for this task" | Autonomous ≠ undisciplined. Write the test. | +| "I'll ask the user about this" | Resolve with evidence. Only abort conditions reach the user. | +| "Test failed, I'll fix it in the next task" | Fix now. No debt carried forward. | +| "I'll skip self-review, ship-gate will catch it" | Self-review → task-level. Ship-gate → composition. Both run. | +| "This needs a change outside the plan's scope" | Log it, finish plan, mention at delivery. No scope creep. | +| "I'm confused but I'll figure it out as I code" | Stop. Hit self-correction hierarchy: MCP → codebase → web → debug. | +| "The web search result looks right" | Cross-reference against MCP data and library version. | +| "I've been making a lot of decisions, that's fine" | Review decision log. Too many decisions may signal plan gaps. 
| ## Integration -- **Requires:** Approved plan from `hyperstack:forge-plan` or validated plan from `hyperstack:run-plan` +- **Requires:** Approved plan from `hyperstack:forge-plan` or `hyperstack:run-plan` - **Uses aggressively:** All MCP tools, web search, `hyperstack:worktree-isolation`, `hyperstack:test-first` (inline), `hyperstack:debug-discipline` (inline on failure), `hyperstack:ship-gate` (final) - **Completes via:** `hyperstack:deliver` (only human touchpoint) + + +## Lifecycle Integration + +### Agent Workflow Chains + +**Full autonomous execution:** +``` +forge-plan → autonomous-mode (THIS) → ship-gate → deliver + ↓ + [uses all skills inline] + ↓ + worktree-isolation → test-first → debug-discipline (on failure) +``` + +### Upstream Dependencies +- `forge-plan` → approved MCP-verified plan +- `run-plan` → validated existing plan + +### Skills Used Inline (not invoked, applied directly) +- `worktree-isolation` → pre-flight +- `test-first` → every task (red-green-refactor) +- `debug-discipline` → on any failure (self-correction hierarchy) +- `ship-gate` → final gate before delivery + +### Downstream Consumers +- `deliver` → only human touchpoint + +### Abort Escalation +| Condition | Escalate to | Action | +|---|---|---| +| 3 failed fix attempts | User | Report findings, suggest architectural change | +| MCP down for critical domain | User | Cannot verify, ask to proceed or wait | +| Test suite collapse | User | 3+ unrelated failures, stop | +| Security concern | User | Never ship insecure code autonomously | diff --git a/skills/behaviour-analysis/SKILL.md b/skills/behaviour-analysis/SKILL.md index 1b75f3f..e117c0f 100755 --- a/skills/behaviour-analysis/SKILL.md +++ b/skills/behaviour-analysis/SKILL.md @@ -15,66 +15,47 @@ Systematic interaction audit combining UX heuristics, QA state-machine thinking, ## When to Use - After implementing a feature with multiple interaction modes -- When the user reports something "doesn't feel right" or "is inconsistent" -- 
Before shipping - final behavioural review -- When adding a new view mode, action, or state to an existing system +- User reports something "doesn't feel right" or "is inconsistent" +- Before shipping → final behavioural review +- Adding a new view mode, action, or state to an existing system ## Integration with hyperstack:designer -**If a DESIGN.md exists** (produced by `hyperstack:designer`), use it as the "expected behaviour" ground truth for the interaction matrix in Phase 2. - -Mapping DESIGN.md sections to behaviour-analysis inputs: +**If DESIGN.md exists** → use it as "expected behaviour" ground truth for the interaction matrix in Phase 2. | DESIGN.md Section | Use as... | |---|---| -| 5. Component Specifications | **Expected states** for each component in the matrix. Every listed state MUST exist and be visually distinct. | -| 6. Motion | **Expected timing** for transitions. The matrix "expected behaviour" column cites DESIGN.md durations. | -| 8. Do's and Don'ts | **Heuristic audit assertions**. Each Do is a check; each Don't is a violation to search for. | -| 9. Responsive Breakpoints | **Composition states** for Phase 4 edge case sweep. Test every listed breakpoint. | -| 10. Anti-Patterns | **Violations to search for** in Phase 4. Fail the audit if any found. | +| 5. Component Specifications | Expected states per component. Every listed state MUST exist and be visually distinct. | +| 6. Motion | Expected timing for transitions. Matrix "expected" column cites DESIGN.md durations. | +| 8. Do's and Don'ts | Heuristic audit assertions. Each Do = check; each Don't = violation to search for. | +| 9. Responsive Breakpoints | Composition states for Phase 4 edge case sweep. Test every listed breakpoint. | +| 10. Anti-Patterns | Violations to search for in Phase 4. Fail audit if any found. | -**Without a DESIGN.md:** Fall back to industry standards via WebSearch or general heuristics (the default behaviour described below). 
+**Without DESIGN.md:** Fall back to industry standards via WebSearch or general heuristics. -**Reverse escalation:** If the audit finds a gap that the DESIGN.md doesn't specify (e.g., expected behaviour is ambiguous), escalate back to `hyperstack:designer` - the DESIGN.md may need to be updated. +**Reverse escalation:** Audit finds a gap DESIGN.md doesn't specify → escalate back to `hyperstack:designer`. ## Process -### Phase 1: Inventory (read code, build the map) - -Before judging anything, build a complete picture: - -1. **Identify all state variables** that affect UI behaviour - - Read the store/state management files - - List every piece of state: data, config, transient UI state - - Note which are persisted vs ephemeral - -2. **Identify all user actions** that modify state - - Buttons, clicks, drags, keyboard shortcuts, sliders, toggles - - API calls triggered by actions - - Implicit actions (hover, scroll, resize, mode switch) - -3. **Identify all view modes / display states** - - Tabs, toggles, conditional rendering branches - - How different modes compose (layout mode x view mode x highlight state) +### Phase 1: Inventory -4. **Identify all feedback mechanisms** - - Visual feedback (highlighting, dimming, borders, badges, glow) - - Textual feedback (labels, counts, status text) - - Animated feedback (transitions, physics, spring effects) - - Absence of feedback (silent failures, no-ops) +Build a complete picture before judging anything: -Output: A **state inventory table** and an **action inventory table**. +1. **State variables** → read store/state management files, list every piece of state (data, config, transient UI), note persisted vs ephemeral +2. **User actions** → buttons, clicks, drags, keyboard shortcuts, sliders, toggles, API calls, implicit actions (hover, scroll, resize) +3. **View modes / display states** → tabs, toggles, conditional rendering branches, how modes compose +4. 
**Feedback mechanisms** → visual (highlighting, dimming, borders, badges), textual (labels, counts, status), animated (transitions, spring), absence of feedback (silent failures, no-ops) -### Phase 2: Interaction Matrix (the core analysis) +Output: state inventory table + action inventory table. -Build a matrix: **every action x every relevant state combination**. +### Phase 2: Interaction Matrix -For each cell ask: -- **What should happen?** (expected behaviour - think like a UX designer) -- **What does happen?** (actual behaviour - read the code path) -- **Match?** OK / BUG / UX-ISSUE / MISSING-FEEDBACK +Build matrix: every action × every relevant state combination. -Structure the matrix by category: +For each cell: +- What should happen? (expected - think like UX designer) +- What does happen? (actual - read the code path) +- Match? → OK / BUG / UX-ISSUE / MISSING-FEEDBACK ```markdown | # | Action | Context/State | Expected | Actual | Status | @@ -82,63 +63,44 @@ Structure the matrix by category: ``` Categories to cover: -- **CRUD actions** (create, read, update, delete of primary data) -- **Selection & highlighting** (what gets selected, how, clear) -- **View mode transitions** (switching between modes) -- **Layout mode transitions** (switching layout engines) -- **Configuration changes** (sliders, toggles, settings) -- **Drag & interaction** (drag, hover, click targets) -- **Reset & cleanup** (what gets cleared, what persists) -- **Edge cases** (empty state, max state, conflicting states) +- CRUD actions +- Selection & highlighting +- View mode transitions +- Layout mode transitions +- Configuration changes (sliders, toggles, settings) +- Drag & interaction +- Reset & cleanup +- Edge cases (empty, max, conflicting states) ### Phase 3: Heuristic Audit -Apply Nielsen's 10 heuristics (adapted for interactive visualizations): +Apply Nielsen's 10 heuristics: -1. **Visibility of system status** - Does the UI show what's active, selected, loading? -2. 
**Match between system and real world** - Do labels make sense? Are actions named clearly? -3. **User control and freedom** - Can the user undo/escape from any state? Is there always a way back? -4. **Consistency and standards** - Do similar actions behave the same way everywhere? -5. **Error prevention** - Can the user reach a broken/dead state? -6. **Recognition rather than recall** - Is the current mode/state visible without memorizing? -7. **Flexibility and efficiency** - Are there shortcuts for power users? -8. **Aesthetic and minimalist design** - Is information presented at the right density? -9. **Help users recover from errors** - What happens on API failure, empty results, bad input? -10. **Accessibility** - Keyboard navigation, screen reader, reduced motion? +1. **Visibility of system status** → does UI show what's active, selected, loading? +2. **Match between system and real world** → labels make sense? actions named clearly? +3. **User control and freedom** → can user undo/escape from any state? +4. **Consistency and standards** → similar actions behave the same everywhere? +5. **Error prevention** → can user reach a broken/dead state? +6. **Recognition rather than recall** → current mode/state visible without memorizing? +7. **Flexibility and efficiency** → shortcuts for power users? +8. **Aesthetic and minimalist design** → information at right density? +9. **Help users recover from errors** → what happens on API failure, empty results, bad input? +10. **Accessibility** → keyboard navigation, screen reader, reduced motion? -Refer to [references/heuristics.md](references/heuristics.md) for detailed questions per heuristic. +See [references/heuristics.md](references/heuristics.md) for detailed questions per heuristic. 
### Phase 4: Edge Case Sweep -Systematically check: +**Empty states:** no data, no results, no highlights, empty search filter results -**Empty states:** -- No data loaded -- No results -- No highlights active -- Empty search filter results +**Boundary states:** 100+ nodes, single node/no edges, all nodes highlighted, all sliders at min/max -**Boundary states:** -- Maximum data (100+ nodes) -- Single node, no edges -- All nodes highlighted -- All sliders at min/max +**Transition states:** mode switch with active highlights, mode switch mid-drag, query execution while loading, rapid repeated actions (double-click, spam slider) -**Transition states:** -- Mode switch with active highlights -- Mode switch mid-drag -- Query execution while loading -- Rapid repeated actions (double-click, spam slider) - -**Composition states:** -- Every view mode x every layout mode -- Highlight + search filter active simultaneously -- Collapsed groups + highlighting + path results +**Composition states:** every view mode × every layout mode, highlight + search filter active simultaneously, collapsed groups + highlighting + path results ### Phase 5: Report -Output a structured report: - ```markdown ## State Inventory [table of all state variables] @@ -156,29 +118,25 @@ Output a structured report: [summary: how many behaviours tested, how many correct, critical issues] ``` -Severity levels: -- **CRITICAL** - broken functionality, data loss, unreachable state -- **HIGH** - major UX inconsistency, confusing behaviour -- **MEDIUM** - minor inconsistency, missing feedback -- **LOW** - cosmetic, nice-to-have +Severity: **CRITICAL** → broken/data loss/unreachable state | **HIGH** → major UX inconsistency | **MEDIUM** → minor inconsistency/missing feedback | **LOW** → cosmetic ## Research Enhancement -Before starting the analysis, search for: -- Current best practices for the specific UI pattern being analyzed (graph viz, form, dashboard, etc.) 
+Before starting, search for: +- Current best practices for the specific UI pattern (graph viz, form, dashboard, etc.) - Known UX patterns for the interaction model (drag-and-drop, force-directed graphs, etc.) - Accessibility guidelines for the specific component type -Use findings to set expectations in the matrix - "expected behaviour" should be informed by industry standards, not just gut feeling. +Use findings to set expectations in the matrix - "expected behaviour" should be informed by industry standards, not gut feeling. ## Key Principles -- **Think like a user first** - what would someone expect when they click this? -- **Think like QA second** - what's the worst thing that could happen? -- **Think like a developer third** - read the code to verify, don't assume -- **Every action must have visible feedback** - if clicking something does nothing visibly, that's a bug -- **Every state must be escapable** - the user should never be "stuck" -- **Composition must be tested** - features that work alone often break in combination +- Think like a user first → what would someone expect when they click this? +- Think like QA second → what's the worst that could happen? +- Think like a developer third → read the code to verify, don't assume +- Every action must have visible feedback → silent no-op = bug +- Every state must be escapable → user should never be stuck +- Composition must be tested → features that work alone often break together ## The Iron Law @@ -186,23 +144,49 @@ Use findings to set expectations in the matrix - "expected behaviour" should be NO BEHAVIOUR CLAIM WITHOUT READING THE CODE PATH ``` -You cannot say "this should work" - you must trace the actual code path and confirm. Reading code is not optional. Assumptions are bugs waiting to ship. +You cannot say "this should work" - trace the actual code path and confirm. Reading code is not optional. 
## Red Flags - STOP -These are the rationalizations you will have when you want to skip parts of the analysis. Every one is wrong. - | Thought | Reality | |---|---| -| "I'll just check a few interactions, not the full matrix" | Partial coverage misses composition bugs. Do the full matrix. | +| "I'll check a few interactions, not the full matrix" | Partial coverage misses composition bugs. Full matrix. | | "This state combination is unlikely" | Unlikely states are where bugs live. Test them. | -| "Nielsen's heuristics are common sense" | Common sense is pattern-matching without verification. Apply them explicitly. | -| "I already know this code, I don't need to read it" | Code drifts. Mental models drift faster. Read it. | -| "Empty states are trivial, I'll skip them" | Empty states are the #1 place where products feel broken. Audit them. | -| "Transition states will be fine" | Mid-drag, mid-animation, mid-load states are where race conditions live. Audit them. | -| "The user will report any issues" | Users don't report feeling vague discomfort. They leave. | -| "This is for a simple component, full audit is overkill" | Simple components compose into complex flows. Audit it. | +| "Nielsen's heuristics are common sense" | Common sense ≠ verification. Apply them explicitly. | +| "I already know this code" | Code drifts. Mental models drift faster. Read it. | +| "Empty states are trivial" | Empty states = #1 place products feel broken. Audit them. | +| "Transition states will be fine" | Mid-drag/mid-animation/mid-load = where race conditions live. | +| "The user will report any issues" | Users don't report vague discomfort. They leave. | +| "Full audit is overkill for a simple component" | Simple components compose into complex flows. Audit it. | | "I'll skip heuristics I don't remember exactly" | Open the reference. All 10 get applied. | -| "The behaviour feels right" | Feelings are not evidence. Read the code. 
| -| "I tested the happy path manually" | The happy path is 20% of the matrix. Audit the unhappy paths. | -| "There is no DESIGN.md, so I have no ground truth" | Search for one. Escalate to designer if missing. Do not audit against gut feeling. | +| "The behaviour feels right" | Feelings ≠ evidence. Read the code. | +| "I tested the happy path manually" | Happy path = 20% of the matrix. Audit the unhappy paths. | +| "No DESIGN.md → no ground truth" | Search for one. Escalate to designer if missing. | + + +## Lifecycle Integration + +### Agent Workflow Chains + +**UI/UX audit (after implementation):** +``` +[execution complete] → behaviour-analysis (THIS) → [fix issues] → ship-gate +``` + +**DESIGN.md integration:** +``` +designer → DESIGN.md → forge-plan → [execution] → behaviour-analysis (uses DESIGN.md as ground truth) +``` + +### Upstream Dependencies +- Implemented feature with multiple interaction modes +- `designer` → DESIGN.md as expected behaviour ground truth (if exists) + +### Downstream Consumers +- `ship-gate` → final verification after fixes + +### Reverse Escalation +| Discovery | Escalate to | Action | +|---|---|---| +| DESIGN.md doesn't specify expected behaviour | `designer` | Append clarification to DESIGN.md | +| Audit finds gap DESIGN.md doesn't cover | `designer` | Add to DESIGN.md | diff --git a/skills/blueprint/SKILL.md b/skills/blueprint/SKILL.md index 30a2027..02df601 100644 --- a/skills/blueprint/SKILL.md +++ b/skills/blueprint/SKILL.md @@ -12,53 +12,43 @@ description: Use before any feature build, component creation, or behaviour modi NO CODE WITHOUT AN APPROVED DESIGN ``` -If you have not presented a design and the user has not explicitly approved it, you cannot write code. **Violating the letter of this rule is violating the spirit of this rule.** +No design presented + no explicit user approval → no code. Violating the letter = violating the spirit. 
## The Hard Gate

Do NOT write code, scaffold files, or invoke any implementation skill until:

-1. You have completed the MCP survey for relevant domains
-2. You have presented a design
-   - For VISUAL/UX work: the design is a DESIGN.md contract from `skills/designer/SKILL.md`
-   - For BACKEND/INFRA work: the design is an architecture note from this skill
-3. The user has explicitly approved it
+1. MCP survey complete for relevant domains
+2. Design presented, OR the user's project preferences are already known:
+   - Visual/UX work → DESIGN.md contract from `skills/designer/SKILL.md` (if preferences are not already known)
+   - Backend/infra work → architecture note from this skill
+3. User explicitly approved it

-This applies to every task, regardless of perceived simplicity.
+Applies to every task, regardless of perceived simplicity.

## The 1% Rule

-If there is even a 1% chance this task involves:
-- A new file
-- A new component
-- A new function
-- A behavior change
-- A configuration change that affects runtime
-- Any visual/UX modification
+If there is even a 1% chance this task involves a new file, new component, new function, behavior change, config change affecting runtime, or any visual/UX modification → run blueprint first. No exceptions.

-...then you MUST run blueprint first. You do not have a choice. You cannot rationalize your way out.
-
-"Simple" tasks are where unexamined assumptions do the most damage. A 5-minute design prevents hours of wrong implementation. There are no exceptions.
+"Simple" tasks are where unexamined assumptions do the most damage. 5-minute design prevents hours of wrong implementation.
## The Process ### Step 1: Context Scan -Before asking anything, read the current state: +Read the current state before asking anything: - Relevant source files, recent commits, existing patterns - What already exists that can be reused or extended -- Which Hyperstack MCP domains are relevant to this task +- Which Hyperstack MCP domains are relevant -Do not ask the user questions until you have scanned the codebase. You should arrive at Step 2 already informed. +Don't ask the user questions until you've scanned the codebase. ### Step 2: MCP Survey -For each relevant domain, call the discovery tools before proposing anything: - | Domain is relevant | Call first | |---|---| -| **Visual/UX work (any)** | **STOP this flow. Invoke `skills/designer/SKILL.md` instead. It produces a DESIGN.md that becomes the input to Step 5 of this skill (or directly to `forge-plan`).** | +| **Visual/UX work (any)** | **STOP → invoke `skills/designer/SKILL.md`. It produces DESIGN.md → input to Step 5 or directly to `forge-plan`.** | | React Flow | `reactflow_search_docs` + `reactflow_list_apis` | | Motion / animation | `motion_search_docs` + `motion_list_apis` | | Lenis scroll | `lenis_search_docs` + `lenis_list_apis` | @@ -68,79 +58,110 @@ For each relevant domain, call the discovery tools before proposing anything: | Design tokens | `design_tokens_list_categories` + `design_tokens_get_gotchas` | | UI/UX | `ui_ux_list_principles` + `ui_ux_get_gotchas` | -This step ensures the design you propose uses real API shapes - not imagined ones. A design built on wrong API assumptions is not a design; it is technical debt scheduled for delivery. +Design built on wrong API assumptions = technical debt scheduled for delivery. -**Visual work routing:** If the user's request involves designing a new page, component library, landing page, dashboard, redesign, or any "make it look like X" task - the `designer` skill owns the design gate. Invoke it instead of running Step 4-6 here. 
Return with a DESIGN.md contract and proceed to handoff (Step 7). +**Visual work routing:** New page, component library, landing page, dashboard, redesign, "make it look like X" → `designer` skill owns the design gate. Return with DESIGN.md → handoff (Step 7). ### Step 3: Clarify Requirements -Ask clarifying questions one at a time: -- Purpose and success criteria - what does done look like? -- Constraints - performance targets, accessibility requirements, existing patterns to follow -- Scope boundary - what is explicitly NOT included in this task? +Ask one clarifying question at a time: +- Purpose and success criteria → what does done look like? +- Constraints → performance targets, accessibility requirements, existing patterns +- Scope boundary → what is explicitly NOT included? -One question per message. Wait for the answer before asking the next one. - -If the request describes multiple independent subsystems, flag this before proceeding. One design → one implementation cycle. Large requests must be decomposed into sub-projects first. +Wait for answer before asking the next. Multiple independent subsystems → flag before proceeding, decompose first. ### Step 4: Propose 2-3 Approaches -Present options with: -- Trade-offs for each approach -- Which MCP-backed APIs and patterns each approach uses (cite the tool output from Step 2) +For each approach: +- Trade-offs +- MCP-backed APIs and patterns used (cite tool output from Step 2) - Your recommendation with reasoning -Lead with your recommended option. Do not present options without a recommendation. +Lead with your recommended option. No options without a recommendation. 
### Step 5: Present Design Scale each section to its complexity: -- **Architecture** - module boundaries, data flow, key abstractions -- **Invariants** - what must always be true at runtime -- **Interfaces** - public APIs between modules, including types -- **Error paths** - what happens when dependencies fail, inputs are invalid, or async operations time out +- **Architecture** → module boundaries, data flow, key abstractions +- **Invariants** → what must always be true at runtime +- **Interfaces** → public APIs between modules, including types +- **Error paths** → what happens when dependencies fail, inputs are invalid, async times out -Get user confirmation after presenting. Revise if needed. Do not proceed until the user approves. +Get user confirmation. Revise if needed. Don't proceed until approved. ### Step 6: Negative Doubt -Before finalising, list at least 5 failure modes: +List at least 5 failure modes before finalizing: - What breaks at runtime under normal usage? - What edge cases does this design not handle? - Which invariants could be violated by concurrent operations or unexpected state? -- What does the MCP `get_gotchas` data say about this domain? -- What external dependency (API, library version, browser API) could change and break this? +- What does MCP `get_gotchas` say about this domain? +- What external dependency could change and break this? -Address each failure mode explicitly - either design around it or record the accepted risk. +Address each explicitly → design around it or record the accepted risk. ### Step 7: Handoff to Implementation -Once the design is approved: -- Save a short design note to the relevant docs directory if the task is non-trivial -- For visual/UX work: DESIGN.md already exists (produced by `designer` skill). Save it at `docs/DESIGN.md` or `/DESIGN.md`. 
-- Invoke `hyperstack:forge-plan` to build a fully MCP-verified implementation plan from the approved design -- **If DESIGN.md exists:** forge-plan reads it as its input spec. Each of the 10 sections becomes one or more tasks. -- The approved design is the spec - `forge-plan` translates it into traceable tasks, `engineering-discipline` executes them +Once approved: +- Save design note to relevant docs directory if non-trivial +- Visual/UX work → DESIGN.md already exists. Save at `docs/DESIGN.md` or `/DESIGN.md`. +- Invoke `hyperstack:forge-plan` → builds MCP-verified implementation plan from approved design +- DESIGN.md present → forge-plan reads it as input spec, each of 10 sections → one or more tasks ## Red Flags - STOP -These are the exact thoughts you will have when you want to skip this skill. Every one is a rationalization. Every one has been used before to build wrong architectures. Every one has a counter. - | Thought | Reality | |---|---| -| "I know React Flow well enough to skip the survey" | MCP data has v12-specific API shapes. Memory has v11. Call the tool. | -| "This is too simple for a design" | Simple tasks are where unexamined assumptions do the most damage. Return to the Hard Gate. | -| "Let me just start with a file and we'll design as we go" | This is how wrong architectures get built. Do the design FIRST. | -| "The user seems impatient, I'll skip Step 6" | User impatience is not permission to ship slop. Negative Doubt is not optional. | +| "I know React Flow well enough to skip the survey" | MCP has v12-specific API shapes. Memory has v11. Call the tool. | +| "This is too simple for a design" | Simple tasks → unexamined assumptions → most damage. Return to Hard Gate. | +| "Let me just start with a file and design as we go" | How wrong architectures get built. Design FIRST. | +| "User seems impatient, I'll skip Step 6" | User impatience ≠ permission to ship slop. Negative Doubt is not optional. 
| | "I'll propose one approach - the obvious one" | Two approaches exist for every non-trivial design. Find both. | -| "The task is a single-line change" | A single line at the wrong place destroys invariants. Design first. | +| "The task is a single-line change" | Single line at the wrong place destroys invariants. Design first. | | "This is a bug fix, not a feature" | Bug fixes change behavior. Behavior changes need designs. | -| "I'm just refactoring" | Refactors move responsibility. Moving responsibility is architectural. Design first. | -| "The design will slow us down" | No. Wrong code ships. Then you fix it. Then fix it again. That is slow. Design once, ship right. | -| "I can reason about this without external tools" | MCP data contains edge cases and gotchas you will not remember. Call the tool. | +| "I'm just refactoring" | Refactors move responsibility. Moving responsibility is architectural. | +| "The design will slow us down" | Wrong code ships → fix it → fix it again. That is slow. Design once, ship right. | +| "I can reason about this without external tools" | MCP data contains gotchas you won't remember. Call the tool. | | "The user will tell me if I'm wrong" | The user hired you to prevent that. Do the design. | | "I already did a similar design last week" | State drifts. Codebase changes. Do the current survey. | -| "This is not my call, I'm just executing instructions" | Executing instructions with no design is how bad instructions become shipped bugs. Design first. | | "Let me start with a prototype" | Prototypes become production. Design the prototype. 
| + + +## Lifecycle Integration + +### Agent Workflow Chains + +**Website/Frontend Agent:** +``` +blueprint (THIS) → designer → forge-plan → [execution] → ship-gate → deliver + ↓ visual routing +``` + +**Backend/Infra Agent:** +``` +blueprint (THIS) → forge-plan → [execution] → ship-gate → deliver + ↓ architecture note +``` + +**Execution Options (chosen at forge-plan handoff):** +- `autonomous-mode` → full auto, stops only on failure +- `subagent-ops` → fresh agent per task, two-stage review +- `engineering-discipline` → manual with phase gates + +### Upstream Dependencies +- None (entry point for feature work) +- `using-hyperstack` → 1% rule enforcement + +### Downstream Consumers +- `forge-plan` → reads approved design, builds MCP-verified task plan +- `designer` → if visual/UX routing detected +- `run-plan` → if resuming existing plan + +### Reverse Escalation +| Discovery | Escalate to | Action | +|---|---|---| +| Visual/UX work detected mid-task | `designer` | Pause, get DESIGN.md, resume | +| Architecture gap (non-visual) | `blueprint` | Re-enter for architecture decision | diff --git a/skills/code-review/SKILL.md b/skills/code-review/SKILL.md index 5460905..df8ccde 100644 --- a/skills/code-review/SKILL.md +++ b/skills/code-review/SKILL.md @@ -8,17 +8,15 @@ description: Use when completing tasks, implementing features, or before merging ## Two Modes -This skill covers both sides of code review: - -1. **Requesting** -- dispatching a reviewer subagent to evaluate your work -2. **Receiving** -- handling review feedback with technical rigor, not performative agreement +1. **Requesting** → dispatching a reviewer subagent to evaluate your work +2. 
**Receiving** → handling review feedback with technical rigor, not performative agreement ## Requesting Review ### When to Request **Mandatory:** -- After each task in `subagent-ops` (handled automatically by that skill) +- After each task in `subagent-ops` (handled automatically) - After completing a major feature - Before merge to main @@ -32,18 +30,17 @@ This skill covers both sides of code review: **1. Get the diff range:** ```bash -BASE_SHA=$(git merge-base HEAD main) # or master/develop +BASE_SHA=$(git merge-base HEAD main) HEAD_SHA=$(git rev-parse HEAD) ``` **2. Dispatch a review subagent with:** - - What was implemented (one sentence) -- The requirements or spec it should match +- Requirements or spec it should match - The git diff (`git diff $BASE_SHA..$HEAD_SHA`) - Specific question: "Does this match the spec? Flag missing, extra, or incorrect code." -**Note:** Review subagents receive raw diff and spec only. Do not load bootstrap (`using-hyperstack`) into review subagents -- the `` gate prevents it, and review subagents do not need the full skill catalogue. Provide exactly what they need to evaluate. +**Note:** Review subagents get raw diff + spec only. Do not load bootstrap (`using-hyperstack`) → `` gate prevents it anyway. Provide exactly what they need to evaluate. **3. Act on results:** @@ -56,28 +53,28 @@ HEAD_SHA=$(git rev-parse HEAD) ### MCP-Enhanced Review -When reviewing domain-specific code, include MCP verification in the review prompt: +For domain-specific code, include MCP verification in the review prompt: > "For any React Flow API usage, verify against `reactflow_get_api`. For any Go patterns, verify against `golang_get_practice`. Flag any API usage that doesn't match MCP output." -This catches API drift that a generic code reviewer would miss. +This catches API drift a generic reviewer would miss. ## Receiving Review ### The Response Pattern ``` -1. READ: Complete feedback without reacting +1. READ: Complete feedback without reacting 2. 
UNDERSTAND: Restate the requirement (or ask) -3. VERIFY: Check against codebase reality -4. EVALUATE: Technically sound for THIS codebase? -5. RESPOND: Technical acknowledgment or reasoned pushback -6. IMPLEMENT: One item at a time, test each +3. VERIFY: Check against codebase reality +4. EVALUATE: Technically sound for THIS codebase? +5. RESPOND: Technical acknowledgment or reasoned pushback +6. IMPLEMENT: One item at a time, test each ``` ### Forbidden Responses -Never respond with: +Never: - "You're absolutely right!" - "Great point!" - "Thanks for catching that!" @@ -94,17 +91,17 @@ Instead: Push back when: - Suggestion breaks existing functionality - Reviewer lacks full context -- Violates YAGNI (unused feature) +- Violates YAGNI - Technically incorrect for this stack - Conflicts with user's architectural decisions -**How:** Use technical reasoning, not defensiveness. Reference working tests/code. Involve the user if architectural. +**How:** Technical reasoning, not defensiveness. Reference working tests/code. Involve user if architectural. ### Handling Unclear Feedback -If any item is unclear: **stop.** Do not implement anything yet. Ask for clarification on the unclear items first. +Any item unclear → stop. Do not implement anything yet. Ask for clarification first. -Items may be related. Partial understanding leads to wrong implementation. +Items may be related. Partial understanding → wrong implementation. ### Implementation Order @@ -116,14 +113,14 @@ For multi-item feedback: 5. Test each fix individually 6. Verify no regressions -## Red Flags -- STOP +## Red Flags - STOP | Thought | Reality | |---|---| | "Skip review, it's simple" | Simple code has bugs. Review catches them. | -| "I'll review my own code" | Self-review is not code review. Dispatch a subagent. | +| "I'll review my own code" | Self-review ≠ code review. Dispatch a subagent. | | "Reviewer is wrong, ignore it" | Push back with reasoning. Don't silently ignore. 
| -| "I agree with everything" | Performative agreement is not technical evaluation. | +| "I agree with everything" | Performative agreement ≠ technical evaluation. | | "I'll implement all feedback at once" | One item at a time, test each. | ## Integration @@ -131,3 +128,29 @@ For multi-item feedback: - **Called by:** `hyperstack:subagent-ops` (per-task review cycle), `hyperstack:deliver` (pre-merge review) - **Pairs with:** `hyperstack:ship-gate` (verification after fixes) - **Escalate to:** User if reviewer and implementer disagree on architectural decisions + + +## Lifecycle Integration + +### Agent Workflow Chains + +**Per-task review (subagent-ops):** +``` +subagent-ops → implementer → code-review (THIS) → [fix loop] → next task +``` + +**Pre-merge review:** +``` +[autonomous-mode | engineering-discipline] → code-review (THIS) → deliver +``` + +### Upstream Dependencies +- `subagent-ops` → per-task review cycle (automatic) +- `engineering-discipline` → after completing major features +- `deliver` → pre-merge review + +### Skills Used With +- `ship-gate` → verification after review fixes applied + +### MCP-Enhanced Review +Include MCP verification in review prompts for domain-specific code (reactflow_get_api, golang_get_practice, etc.) diff --git a/skills/debug-discipline/SKILL.md b/skills/debug-discipline/SKILL.md index 4b01e62..774af20 100644 --- a/skills/debug-discipline/SKILL.md +++ b/skills/debug-discipline/SKILL.md @@ -12,126 +12,146 @@ description: Use when encountering any bug, test failure, or unexpected behaviou NO FIXES WITHOUT ROOT CAUSE FIRST. ``` -A symptom fix is a failure. Random changes are not debugging - they are thrashing. Every fix attempt without a confirmed root cause increases the probability of a second bug. +Symptom fix = failure. Random changes = thrashing. Every fix attempt without confirmed root cause → higher probability of a second bug. 
-**If you have not completed Phase 1, you cannot propose a fix.** +Phase 1 not complete → no fix proposed. ## When to Use -Use this for any technical failure: -- Test failures -- Runtime errors or panics -- Unexpected behaviour (wrong output, wrong rendering, wrong state) -- Build failures -- Performance regressions -- Integration issues +Any technical failure: test failures, runtime errors/panics, unexpected behaviour, build failures, performance regressions, integration issues. -Use it **especially** when: -- Under time pressure - urgency makes guessing tempting -- "The fix is obvious" - obvious fixes often address the wrong layer -- You have already tried something and it did not work -- The error message points to a dependency or library function +Use **especially** when: +- Under time pressure → urgency makes guessing tempting +- "The fix is obvious" → obvious fixes often address the wrong layer +- You already tried something and it didn't work +- Error message points to a dependency or library function ## The Four Phases ### Phase 1: Root Cause Investigation -**BEFORE attempting any fix:** +**BEFORE any fix attempt:** **1. Read the error in full** -Stack trace, error message, line numbers, exit codes - read every line. Do not skim. The exact wording often contains the fix. +Stack trace, error message, line numbers, exit codes → read every line. The exact wording often contains the fix. **2. Reproduce consistently** -Can you trigger it reliably? What are the exact steps? If you cannot reproduce it, do not guess - gather more data first. +Can you trigger it reliably? Exact steps? Can't reproduce → gather more data first, don't guess. **3. Check recent changes** -`git diff`, recent commits. What changed that could have caused this? Assume the most recent change is guilty until proven otherwise. +`git diff`, recent commits. Most recent change is guilty until proven otherwise. **4. 
Check MCP docs for the failing domain** -Before assuming you understand how a library or API behaves, verify: - -| Domain | What to call | +| Domain | Call | |---|---| -| React Flow component behaves unexpectedly | `reactflow_get_api` for the component, `reactflow_get_pattern` for the usage pattern | +| React Flow component behaves unexpectedly | `reactflow_get_api` for the component, `reactflow_get_pattern` for usage | | Go runtime error (goroutine, context, nil pointer) | `golang_get_practice` for the relevant topic | | Rust borrow checker / lifetime error | `rust_get_practice` + `rust_cheatsheet` | | Echo middleware or routing issue | `echo_get_middleware` or `echo_get_recipe` | | Motion animation not firing | `motion_get_api` for the failing hook or component | | CSS/token rendering wrong | `design_tokens_get_gotchas` or `ui_ux_get_gotchas` | -The MCP gotchas data frequently contains the exact failure mode you are looking at. Check it before forming a hypothesis. +MCP gotchas data frequently contains the exact failure mode you're looking at. **5. Trace the data flow** - -Where does the bad value originate? Trace backwards from the symptom to the source. Add diagnostic logging at each layer boundary if needed. Run once to see which layer breaks. Then investigate that specific layer. +Where does the bad value originate? Trace backwards from symptom to source. Add diagnostic logging at each layer boundary if needed. Run once to see which layer breaks → investigate that layer. Fix at the source. Never at the symptom. ### Phase 2: Pattern Analysis -Before writing a fix, find the correct pattern: +Before writing a fix: 1. Locate similar working code in the same codebase -2. Compare the failing code against the working example - list every difference, however small -3. Check the MCP reference for the correct pattern (`[domain]_get_pattern`) +2. Compare failing code against working example → list every difference, however small +3. 
Check MCP reference for the correct pattern (`[domain]_get_pattern`) 4. Understand what the failing code assumed that the working code does not ### Phase 3: Hypothesis and Test -Scientific method - one variable at a time: +One variable at a time: + +1. **State hypothesis explicitly:** "I believe X is the root cause because Y" +2. **Design minimal test:** smallest change that confirms or refutes the hypothesis +3. **Make one change** → don't bundle multiple fixes +4. **Verify result:** + - Confirms → Phase 4 + - Refutes → new hypothesis, return to top of Phase 3 (count as failed attempt) + - After 2 refuted hypotheses → return to Phase 1 with all new information + - After 3 failed hypotheses total → stop, go to Escalation Rule -1. **State the hypothesis explicitly:** "I believe X is the root cause because Y" -2. **Design the minimal test:** the smallest change that would confirm or refute the hypothesis -3. **Make one change** - do not bundle multiple fixes -4. **Verify the result:** - - Confirms hypothesis → Phase 4 - - Refutes hypothesis → form a new hypothesis, return to top of Phase 3 (count this as a failed attempt) - - After 2 refuted hypotheses: return to Phase 1 with all new information before forming another hypothesis - - After 3 failed hypotheses total: stop - go directly to the Escalation Rule below - - Do NOT stack a second change on top of a failed one +Don't stack a second change on top of a failed one. -If you genuinely do not know what the root cause is after Phase 1 and 2, say so explicitly. "I don't understand why X behaves this way" is correct. Proposing a fix you don't understand is not. +If you genuinely don't know the root cause after Phase 1 and 2 → say so explicitly. Proposing a fix you don't understand is not debugging. ### Phase 4: Fix Fix the root cause. Not the symptom. -1. **Write a failing test first** - the simplest possible reproduction. Run it. Confirm it fails. -2. **Implement one fix** - address the confirmed root cause. 
One change. -3. **Run the test** - confirm it now passes. -4. **Check for regressions** - run the full test suite. +1. **Write failing test first** → simplest possible reproduction. Run it. Confirm it fails. +2. **Implement one fix** → address confirmed root cause. One change. +3. **Run the test** → confirm it passes. +4. **Check for regressions** → run full test suite. 5. **Invoke `hyperstack:ship-gate`** before claiming the bug is fixed. **Attempt counter - mandatory:** -- Attempts 1-2: if the fix does not work, return to Phase 1 with the new information +- Attempts 1-2: fix doesn't work → return to Phase 1 with new information - **Attempt 3: STOP. Do not attempt a fourth fix.** ### Escalation Rule (3+ Failed Fixes) -Three failed attempts signals an architectural problem, not a surface bug. +Three failed attempts → architectural problem, not a surface bug. -Diagnostic pattern: -- Each fix reveals new coupling or unexpected shared state in a different location -- The correct fix would require "significant refactoring" - which means the current structure cannot accommodate the correct behaviour +Signals: +- Each fix reveals new coupling or unexpected shared state elsewhere +- Correct fix would require "significant refactoring" → current structure can't accommodate correct behaviour - Each fix creates a new symptom elsewhere -At this point, stop fixing. Present the findings to the user: what you tried, what each attempt revealed, and what architectural change appears to be required. Do not continue patching. +Stop fixing. Present findings to user: what you tried, what each attempt revealed, what architectural change appears required. 
## Red Flags - STOP | Thought | Reality | |---|---| -| "Let me just try changing X" | You do not have a root cause | -| "It's probably a race condition" | "Probably" is not a root cause | +| "Let me just try changing X" | No root cause → don't touch it | +| "It's probably a race condition" | "Probably" ≠ root cause | | "Quick fix now, investigate later" | There is no later | -| "Multiple small changes at once" | You cannot isolate what worked | -| "The library is broken" | Check the MCP docs first | +| "Multiple small changes at once" | Can't isolate what worked | +| "The library is broken" | Check MCP docs first | | "One more attempt" (after 2 failures) | Stop. Escalate. | | "I fixed it - the error is gone" | Run `hyperstack:ship-gate` | ## Integration - Use `hyperstack:ship-gate` before claiming any bug is fixed -- Use `hyperstack:engineering-discipline` if Phase 4 escalation reveals an architectural change is needed -- Use `hyperstack:blueprint` if the fix requires building new functionality rather than correcting existing behaviour +- Use `hyperstack:engineering-discipline` if Phase 4 escalation reveals architectural change needed +- Use `hyperstack:blueprint` if fix requires building new functionality rather than correcting existing behaviour + + +## Lifecycle Integration + +### Agent Workflow Chains + +**Used inline during execution:** +``` +[autonomous-mode | subagent-ops | engineering-discipline] → debug-discipline (THIS) + ↓ + [self-correction hierarchy] + ↓ + [fix → ship-gate] +``` + +### Upstream Dependencies +- Any execution mode encountering failure + +### Skills Used Inline +- `test-first` → Phase 4 (write failing test before fix) +- `ship-gate` → Phase 4 (verify fix before claiming complete) + +### Escalation Paths +| Condition | Escalate to | Action | +|---|---|---| +| 3 failed fix attempts | User | Architectural problem, not surface bug | +| Fix requires new functionality | `blueprint` | Not a bug fix, needs design | +| Fix requires architectural 
change | `engineering-discipline` | Step 3 architecture reasoning |
diff --git a/skills/deliver/SKILL.md b/skills/deliver/SKILL.md
index 6eb268c..4f7ee14 100644
--- a/skills/deliver/SKILL.md
+++ b/skills/deliver/SKILL.md
@@ -8,22 +8,20 @@ description: Use after all implementation tasks are complete. Runs final verific

## When to Use

-After every task in the implementation plan is marked complete and all verification has passed. This is the terminal state of every Hyperstack workflow.
+After every task in the implementation plan is marked complete and all verification has passed. Terminal state of every Hyperstack workflow.

-Do NOT invoke this skill until all tasks are done. It is a gate, not a shortcut.
+Do NOT invoke until all tasks are done. It is a gate, not a shortcut.

## The Process

### Step 1: Full Verification

-Run the complete test suite. Not a subset. Not the tests you just wrote. All of them.
+Run the complete test suite. Not a subset. Not just the tests you wrote. All of them.

-Show the output. If anything fails, stop here - invoke `hyperstack:debug-discipline` and resolve before continuing.
+Show the output. Anything fails → stop, invoke `hyperstack:debug-discipline`, resolve before continuing.

### Step 2: Type / Lint Check

-Run the appropriate check for the project's language:
-
| Language | Command |
|---|---|
| TypeScript / Next.js | `npx tsc --noEmit` |
@@ -31,28 +29,28 @@ Run the appropriate check for the project's language:
| Go | `go vet ./...` |
| Python | `mypy .` (if configured) |

-Zero errors required. Warnings are acceptable if pre-existing and documented.
+Zero errors required. Pre-existing warnings acceptable if documented.

### Step 3: Diff Review

-Run `git diff <base>..HEAD` where `<base>` is main, master, or develop - whichever this branch was cut from.
+Run `git diff <base>..HEAD` where `<base>` is the branch this one was cut from. Check:

-- Does the diff match the plan or approved design?
-- Are there any unintended changes (modified files outside the plan's scope)?
-- Are there any debug statements, console.logs, or temporary code left in? +- Diff matches plan or approved design? +- Unintended changes (files outside plan's scope)? +- Debug statements, console.logs, or temp code left in? -If anything is unintended, revert it before continuing. +Anything unintended → revert before continuing. ### Step 4: Ship Gate Invoke `hyperstack:ship-gate` on the overall implementation. -Do not skip this. Passing individual task verifications does not replace a final gate on the whole. +Don't skip. Passing individual task verifications ≠ final gate on the whole. ### Step 5: Present Options -Once Steps 1-4 pass, present the delivery options to the user: +Once Steps 1-4 pass: > "All verification passed. How do you want to deliver this? > @@ -64,7 +62,7 @@ Wait for the user's choice. ### Step 6: Execute -Execute exactly the chosen option. Do not add steps. Do not clean up other things "while you're at it." +Execute exactly the chosen option. No extra steps. No "cleaning up other things while you're at it." **Option 1 - PR:** ```bash @@ -88,13 +86,33 @@ git push -u origin [branch-name] | Thought | Reality | |---|---| -| "Tests mostly pass, I'll fix the rest in a follow-up" | No. Fix them now or don't deliver. | -| "The type errors are pre-existing" | Verify with `git stash` - if they existed before your change, document it. If not, fix them. | -| "I'll skip ship-gate, I just ran individual verifications" | Individual gates do not cover composition. Run ship-gate. | -| "Let me also clean up X while I'm here" | Scope creep. Out-of-plan changes go on a new branch. | +| "Tests mostly pass, I'll fix the rest in a follow-up" | Fix them now or don't deliver. | +| "The type errors are pre-existing" | Verify with `git stash`. Pre-existing → document it. Not pre-existing → fix it. | +| "I'll skip ship-gate, I just ran individual verifications" | Individual gates ≠ composition. Run ship-gate. | +| "Let me also clean up X while I'm here" | Scope creep. 
Out-of-plan changes → new branch. | ## Integration -- **Requires:** All tasks in `forge-plan` or `run-plan` complete and individually verified (via `autonomous-mode`, `subagent-ops`, or `engineering-discipline`) +- **Requires:** All tasks in `forge-plan` or `run-plan` complete and individually verified - **Requires:** `hyperstack:ship-gate` passing on full implementation - **Invoked after:** `hyperstack:autonomous-mode`, `hyperstack:subagent-ops`, or `hyperstack:engineering-discipline` completes + + +## Lifecycle Integration + +### Agent Workflow Chains + +**Terminal state of all workflows:** +``` +[autonomous-mode | subagent-ops | engineering-discipline] → ship-gate → deliver (THIS) +``` + +### Upstream Dependencies +- `ship-gate` → must pass before deliver invoked +- All tasks in plan marked complete + +### Downstream Consumers +- None (terminal state) + +### Cleanup +- `worktree-isolation` → cleanup after delivery (if worktree used) diff --git a/skills/design-patterns-skill/SKILL.md b/skills/design-patterns-skill/SKILL.md index 3da446b..7506dd2 100755 --- a/skills/design-patterns-skill/SKILL.md +++ b/skills/design-patterns-skill/SKILL.md @@ -37,69 +37,57 @@ references: # Design Patterns & Programming Principles -## Overview - -Structured guidance on programming principles and design patterns from foundational software engineering books. Ensures code follows industry-standard practices for readability, maintainability, simplicity, and architectural soundness. 
- ## When to Apply -- **Code Generation:** Writing new functions, classes, or modules -- **Code Review:** Evaluating pull requests or existing codebases -- **Refactoring:** Improving code structure and clarity -- **Architecture Design:** Choosing appropriate patterns and abstractions - ---- +- **Code Generation** → writing new functions, classes, or modules +- **Code Review** → evaluating PRs or existing codebases +- **Refactoring** → improving code structure and clarity +- **Architecture Design** → choosing appropriate patterns and abstractions ## Core Philosophy -1. **Readability over cleverness** - Code is read more than written -2. **Simplicity over complexity** - Use the simplest solution that works -3. **Testability by design** - Write code that's easy to test -4. **Incremental improvement** - Leave code better than you found it -5. **Patterns as tools** - Apply patterns when they clarify, not by default - ---- +1. Readability over cleverness → code is read more than written +2. Simplicity over complexity → simplest solution that works +3. Testability by design → write code that's easy to test +4. Incremental improvement → leave code better than you found it +5. Patterns as tools → apply when they clarify, not by default ## Principle Categories ### 1. Readability & Clarity -- Descriptive naming, consistent formatting, self-documenting code, small focused functions -- **Reference:** `references/patterns/readability.md` +Descriptive naming, consistent formatting, self-documenting code, small focused functions +→ `references/patterns/readability.md` ### 2. Simplicity & Efficiency -- KISS, DRY, YAGNI -- **Reference:** `references/patterns/simplicity.md` +KISS, DRY, YAGNI +→ `references/patterns/simplicity.md` ### 3. 
Design & Architecture -- SRP, composition over inheritance, program to interfaces -- Patterns: Factory, Strategy, Observer, Decorator, Adapter, Command, Singleton -- **Reference:** `references/patterns/design-architecture.md` +SRP, composition over inheritance, program to interfaces +Patterns: Factory, Strategy, Observer, Decorator, Adapter, Command, Singleton +→ `references/patterns/design-architecture.md` ### 4. Testing & Quality -- Automated testing, focused assertions, edge case coverage -- **Reference:** `references/patterns/testing.md` +Automated testing, focused assertions, edge case coverage +→ `references/patterns/testing.md` ### 5. Error Handling -- Clear error messages, early validation, proper exception usage -- **Reference:** `references/patterns/error-handling.md` +Clear error messages, early validation, proper exception usage +→ `references/patterns/error-handling.md` ### 6. Maintainability -- Boy Scout Rule, continuous refactoring, atomic commits, automation -- **Reference:** `references/patterns/maintainability.md` - ---- +Boy Scout Rule, continuous refactoring, atomic commits, automation +→ `references/patterns/maintainability.md` ## AI-Specific Guidance -When generating or reviewing code, always: +When generating or reviewing code: 1. Check for AI pitfalls listed in each principle -2. Avoid pattern prediction bias - don't use patterns just because they're common -3. Question generic naming - resist `data`, `temp`, `result` without context -4. Validate edge cases - don't skip error handling -5. Keep functions focused - resist combining unrelated operations -6. Match project conventions - maintain consistency with existing codebase - ---- +2. Avoid pattern prediction bias → don't use patterns just because they're common +3. Question generic naming → resist `data`, `temp`, `result` without context +4. Validate edge cases → don't skip error handling +5. Keep functions focused → resist combining unrelated operations +6. 
Match project conventions → maintain consistency with existing codebase ## Quick Reference @@ -114,8 +102,6 @@ When generating or reviewing code, always: | Need undo/logging | Command pattern | | Global access point | Singleton (use sparingly) | ---- - ## Sources - *Clean Code* - Robert C. Martin diff --git a/skills/designer/SKILL.md b/skills/designer/SKILL.md index 3d337f4..92f78ed 100644 --- a/skills/designer/SKILL.md +++ b/skills/designer/SKILL.md @@ -2,10 +2,10 @@ name: designer category: domain description: >- - Evidence-based design decision engine. An intention gate that produces non-slop + Evidence-based design decision engine. Intention gate that produces non-slop UI/UX by forcing every visual choice through industry context, cognitive science, design master principles, and anti-pattern detection before code generation. - Outputs a DESIGN.md contract that all subsequent implementation must follow. + Outputs DESIGN.md contract all subsequent implementation must follow. metadata: author: booleanstack version: "3.0.0" @@ -28,14 +28,6 @@ triggers: activation: mode: fuzzy priority: high - triggers: - - design - - landing page - - dashboard - - visual - - DESIGN.md - - ui - - ux references: - references/design-md-template.md - references/website-experience-cheatsheet.md @@ -44,329 +36,232 @@ references: - examples/ecommerce-checkout.md --- -# Designer Skill - Intention Gate +# Designer Skill — Intention Gate -> AI-generated UIs all look the same because AI skips the decision process and jumps to code. -> This skill forces every design decision through evidence before code generation. -> No visual code until a DESIGN.md contract is produced and approved. +> AI UIs all look same because AI skip decision process, jump to code. +> Skill force every design decision through evidence before code generation. +> No visual code until DESIGN.md contract produced and approved. 
---- - -## The Iron Law +## IRON LAW ``` -NO VISUAL CODE WITHOUT AN APPROVED DESIGN.md +NO VISUAL CODE WITHOUT APPROVED DESIGN.md ``` -**Violating the letter of this rule is violating the spirit of this rule.** - -If you are about to write a single line of JSX, CSS, or styling code, and there is no approved DESIGN.md, you are breaking this rule. There are no exceptions. A "simple button" still needs personality, color, and state decisions. +Single line JSX, CSS, or styling → no DESIGN.md → BREAKING THIS RULE. No exceptions. "Simple button" still needs personality, color, state decisions. -## Hard Gate +## HARD GATE ``` -DO NOT GENERATE ANY VISUAL CODE UNTIL: +DO NOT GENERATE VISUAL CODE UNTIL: 1. Intent extracted (Phase 1) 2. MCP tools consulted (Phase 2) 3. Anti-patterns checked (Phase 3) - 4. DESIGN.md generated and presented (Phase 4) - 5. User has approved the DESIGN.md - -No exceptions. A "simple button" still needs personality, color, and state decisions. + 4. DESIGN.md generated + presented (Phase 4) + 5. User approved DESIGN.md ``` -## The 1% Rule - -If there is even a 1% chance that the task involves: -- A new page or view -- A new component -- Changing how something looks -- Changing how something moves (animation, transition, scroll) -- Changing how something responds to user input -- A landing page, dashboard, form, or data display -- "Make it look more like X" -- "Redesign" anything +## 1% RULE -...then you MUST invoke this skill BEFORE writing any code. You cannot rationalize your way out. +1% chance task involves new page/view, new component, changing look/feel/motion/interaction, landing page, dashboard, form, data display, "make it look like X", "redesign" → invoke skill BEFORE writing any code. -**Apply when:** the task changes how something **looks, feels, moves, or is interacted with**. 
-**Skip when:** pure backend with no frontend impact, single CSS bug fix (with the same colors/spacing), adding to existing design system with established tokens, performance optimization with no visual change, infrastructure. +**Apply when:** task changes how something **looks, feels, moves, or is interacted with.** +**Skip when:** pure backend, single CSS bug fix (same colors/spacing), adding to existing design system with established tokens, perf optimization no visual change, infrastructure. -## Red Flags - STOP - -These are the rationalizations you will have when you want to skip this skill. Every one is wrong. +## RED FLAGS — STOP | Thought | Reality | |---|---| -| "This is a small component, it doesn't need a full DESIGN.md" | Small components with wrong decisions ship to production. Design it. | -| "I'll just use the default shadcn styles" | Defaults are decisions. Unexamined defaults produce AI slop. Design intentionally. | -| "The user said 'just make it work'" | "Just make it work" means "make something that makes sense visually." That needs design. | -| "I know what a SaaS dashboard looks like" | You know the AI-slop version. Designer prevents that specifically. | -| "I can fix the design after the user sees the code" | No. The AI slop fingerprint is sticky. Users will stop caring before you fix it. | -| "The MCP tools are overkill for this" | You don't get to decide. Call them. | -| "I'll generate a DESIGN.md after coding" | Then it is post-hoc justification, not design. Design FIRST. | -| "The user is iterating quickly, they don't want a gate" | User speed is not permission to ship slop. Gate first, iterate fast inside the gate. | -| "This is just a quick mockup" | Quick mockups become shipped products. Design them. | -| "Figma already has the design, I'll just translate" | Translating from Figma without design resolution creates absolute/relative dumps. Use designer anyway. | -| "I'll pick colors and fonts as I go" | That is how AI slop is made. 
Pick them deliberately via designer. | -| "Dark mode will just invert the light mode colors" | No it will not. This is the exact anti-pattern designer exists to prevent. | -| "The designer skill is slow" | The skill takes 2 minutes. Shipping wrong design takes 2 weeks to undo. | +| "Small component, no full DESIGN.md needed" | Wrong decisions ship. Design it. | +| "I'll use default shadcn styles" | Unexamined defaults = AI slop. | +| "User said 'just make it work'" | Means "make sense visually." Needs design. | +| "I know what SaaS dashboard looks like" | Know AI-slop version. Designer prevents that. | +| "I'll fix design after user sees code" | AI slop fingerprint sticky. Users stop caring first. | +| "MCP tools overkill" | You don't decide. Call them. | +| "I'll generate DESIGN.md after coding" | Post-hoc justification ≠ design. Design FIRST. | +| "User iterating fast, no time for gate" | Speed ≠ permission to ship slop. Gate first. | +| "Quick mockup only" | Quick mockups become shipped products. | +| "Figma has design, I'll translate" | No design resolution = absolute/relative dump. | +| "I'll pick colors as I go" | How AI slop made. Pick deliberately. | +| "Dark mode = invert light mode" | No. Exact anti-pattern this skill prevents. | +| "Skill is slow" | 2 min. Wrong design = 2 weeks to undo. | --- -## Position in the Hyperstack Workflow +## Position in Hyperstack Workflow ``` - ┌─────────────────────────────────────┐ - user request │ │ - │ │ Upstream: │ - ▼ │ - hyperstack (root orchestrator) │ - ┌───────────┐ │ - blueprint (visual routing) │ - │ blueprint │─── visual? 
──┼─▶ designer (THIS SKILL) │ - └───────────┘ │ │ - │ │ Produces: │ - │ non-visual │ - DESIGN.md contract (file) │ - │ │ │ - ▼ │ Downstream consumers: │ - ┌───────────┐ │ - forge-plan (reads DESIGN.md) │ - │ forge-plan│◀─ DESIGN.md ─┤ - shadcn-expert (per-section code) │ - └───────────┘ │ - motion_generate_animation │ - │ │ - design_tokens_generate │ - ▼ │ - behaviour-analysis (audit spec) │ - execution │ - ship-gate (compliance check) │ - │ │ │ - ▼ │ Reverse escalation (allowed): │ - ┌───────────┐ │ - forge-plan → designer │ - │ ship-gate │ │ (if visual gap discovered) │ - └───────────┘ │ - behaviour-analysis → designer │ - │ │ (if expected behavior unclear) │ - ▼ │ │ - deliver └─────────────────────────────────────┘ +user request → blueprint (visual routing) → designer (THIS) → DESIGN.md + ↓ + forge-plan → execution → ship-gate → deliver + +Downstream: forge-plan, shadcn-expert, motion_generate_animation, design_tokens_generate, behaviour-analysis, ship-gate +Reverse escalation: forge-plan → designer (gap), behaviour-analysis → designer (unclear), ship-gate → designer (compliance fail) ``` -## The Three-Layer Stack +## Three-Layer Stack | Layer | Plugin | Question | Tools | |---|---|---|---| -| **Decision** | `designer` (this skill) | Which design? | 17 MCP tools | -| **Rules** | `ui-ux` | What principles? | 6 MCP tools | -| **Values** | `design-tokens` | What exact CSS? | 7 MCP tools | -| **Components** | `shadcn` | Which components to compose? | 4 MCP tools | -| **Motion** | `motion` | Exact animation code? | 7 MCP tools | +| Decision | `designer` (this) | Which design? | 17 MCP tools | +| Rules | `ui-ux` | What principles? | 6 MCP tools | +| Values | `design-tokens` | What exact CSS? | 7 MCP tools | +| Components | `shadcn` | Which components? | 4 MCP tools | +| Motion | `motion` | Exact animation code? | 7 MCP tools | --- ## Website Experience Non-Negotiables -Visual quality is necessary, but it is not sufficient. 
Hyperstack designs must also -produce good website experience under real usage. A page that "looks premium" but is -slow, confusing, inaccessible, or conversion-hostile is a failed design. - -Every DESIGN.md must explicitly resolve these seven questions. If any are missing, -the design is incomplete: - -1. **Primary path** - What is the user's main job-to-be-done on this page, and what - is the single primary action? -2. **Information scent** - Can the user quickly answer "Where am I, what can I do, - and what happens next?" -3. **State coverage** - What do loading, empty, error, success, disabled, and - destructive states look like? -4. **Form and auth friction** - Are labels persistent, validation humane, paste - allowed, and password managers supported? -5. **Performance budget** - What are the target budgets for LCP, INP, CLS, and - payload-sensitive media? -6. **Accessibility floor** - How are focus visibility, focus not obscured, target - size, reduced motion, and keyboard usage handled? -7. **Responsive content priority** - What survives first on mobile, and what gets - de-emphasized or deferred? - -Use [website-experience-cheatsheet](references/website-experience-cheatsheet.md) -while resolving these. - -**Do not let visual style erase usability.** +Every DESIGN.md must resolve these 7: -## User Preferences Override Defaults +1. **Primary path** — user's main JTBD + single primary action +2. **Information scent** — "Where am I, what can I do, what happens next?" +3. **State coverage** — loading, empty, error, success, disabled, destructive +4. **Form/auth friction** — labels persistent, validation humane, paste allowed, password managers supported +5. **Performance budget** — LCP, INP, CLS, payload-sensitive media targets +6. **Accessibility floor** — focus visibility, focus not obscured, target size, reduced motion, keyboard usage +7. 
**Responsive content priority** — what survives first on mobile, what deferred -Auto-resolved defaults, presets, and recommendations are starting points only. -They are not allowed to override explicit user preferences, brand language, -existing workspace conventions, or product constraints. +Use [website-experience-cheatsheet](references/website-experience-cheatsheet.md). -Priority order for visual decisions: +## User Preferences Override Defaults -1. Explicit user preferences and constraints -2. Existing workspace reality (current framework, component library, design - system, tokens, and major frontend patterns) -3. Approved product or brand requirements -4. Designer auto-resolved defaults and presets +Priority order: +1. Explicit user preferences + constraints +2. Existing workspace reality (framework, component lib, design system, tokens, frontend patterns) +3. Approved product/brand requirements +4. Designer auto-resolved defaults -If a user says "use these colors", "keep our current design system", "match this -existing app shell", or "do not use shadcn", that preference wins even if the -auto-resolved defaults would suggest something else. +User says "use these colors", "keep current design system", "match this app shell", "no shadcn" → preference wins. -Treat auto-resolved defaults as suggestions to confirm or replace, never as -authority. +--- # PHASE 1: INTENT EXTRACTION -Two modes. Default to **Base** unless user says "advanced" or "detailed." +Two modes. Default **Base** unless user says "advanced" or "detailed." 
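The priority order in "User Preferences Override Defaults" above is a first-match lookup. A minimal sketch, assuming flat dicts per source — the function name and dict shapes are illustrative, not Hyperstack API:

```python
# Illustrative sketch only: names and dict shapes are assumptions, not Hyperstack API.
# Sources checked highest-priority first: user > workspace > brand > auto-resolved defaults.
def resolve_visual_decision(key, user=None, workspace=None, brand=None, defaults=None):
    for source in (user, workspace, brand, defaults):
        if source is not None and key in source:
            return source[key]
    raise KeyError(f"no source resolved {key!r}")

# Explicit user preference beats the auto-resolved default:
accent = resolve_visual_decision(
    "accent",
    user={"accent": "#0F62FE"},
    defaults={"accent": "#6366F1"},
)
print(accent)  # → #0F62FE
```

Auto-resolved defaults only win when every higher source is silent on that key.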
## Base Mode (3 Questions + Confirm) -**Step 0:** If this is an existing project, inspect the workspace first: -- identify framework and package manifests -- identify current component library and token system -- identify core frontend files for the active surface -- identify any explicit visual preferences already encoded in the repo +**Step 0:** Existing project → inspect workspace: framework, package manifests, component lib, token system, core frontend files, explicit visual prefs in repo. -**Step 1:** Call `designer_resolve_intent` with product description. Auto-detects: industry, personality, style, mode, density, color mood, must-haves, never-uses. +**Step 1:** Call `designer_resolve_intent(product_description)`. Auto-detects: industry, personality, style, mode, density, color mood, must-haves, never-uses. **Step 2:** Ask 3 essential questions: | # | Question | Why | |---|---|---| -| 1 | What is the product? (1 sentence) | Everything derives from this | -| 2 | Brand color? (hex, name, or "generate") | Can't guess someone's brand | +| 1 | What is product? (1 sentence) | Everything derives from this | +| 2 | Brand color? (hex, name, or "generate") | Can't guess brand | | 3 | What sections/pages to build? | What to implement | -**Step 3:** Present auto-resolved defaults as suggestions, not decisions. Explicitly -ask whether any user preferences or existing workspace patterns should override -them. Then offer: *"Say 'advanced' for full control, or pick a preset to start from."* +**Step 3:** Present auto-resolved defaults as suggestions. Ask if user prefs or workspace patterns override. Offer: *"Say 'advanced' for full control, or pick preset to start."* ## Presets (Fast Start) -If user says "make it feel like Linear" or "start from Stripe" or "use the Notion style" - call `designer_get_preset(name)` and use it as the DESIGN.md foundation. Customize brand color only. 
+User says "make it feel like Linear" or "start from Stripe" → `designer_get_preset(name)`, use as DESIGN.md foundation, customize brand color only. | Preset | Best For | Key Trait | |---|---|---| -| `linear` | SaaS, productivity tools | Opacity hierarchy, 8px grid, snappy 150ms | -| `stripe` | Payment, docs, premium SaaS | Weight 300/500, CIELAB contrast, editorial polish | -| `vercel` | Dev tools, technical products | -0.04em tracking, zero chromatic bias, 96px sections | +| `linear` | SaaS, productivity | Opacity hierarchy, 8px grid, 150ms | +| `stripe` | Payment, docs, premium SaaS | Weight 300/500, CIELAB contrast | +| `vercel` | Dev tools, technical | -0.04em tracking, zero chromatic bias | | `apple` | Consumer, mobile-first | 17px body, spring physics, 44pt targets | -| `carbon` | Enterprise, regulated industries | Zero radius, IBM Plex, WCAG AA out of box | -| `shadcn` | Any React + Tailwind project | OKLCH, opacity borders, brand-agnostic default | -| `notion` | Content, editorial, notes | Warm cream bg, serif headings, 65ch prose | -| `supabase` | Developer tools, dark-first | Emerald on black, compact, code-native | +| `carbon` | Enterprise, regulated | Zero radius, IBM Plex, WCAG AA | +| `shadcn` | React + Tailwind | OKLCH, opacity borders, brand-agnostic | +| `notion` | Content, editorial | Warm cream bg, serif headings, 65ch prose | +| `supabase` | Dev tools, dark-first | Emerald on black, compact, code-native | | `figma` | Creative tools, startups | Multi-color, spring animations, vivid | -Call `designer_list_presets` to show all with details. Call `designer_get_preset(name)` for full token config + CSS. - -**Preset workflow:** Preset fills Sections 1-7 of DESIGN.md automatically. You only need to customize: brand color, specific sections/pages, and industry-specific do's/don'ts. +Call `designer_list_presets` to show all. Preset fills Sections 1-7 automatically. Customize: brand color, sections/pages, industry do's/don'ts. 
## Advanced Mode (12 Questions) -Call `designer_resolve_intent` first. Show suggested default alongside each question. Present in batches of 3-4 (Hick's Law). +Call `designer_resolve_intent` first. Show suggested default per question. Present in batches of 3-4 (Hick's Law). -### Q1: What is the product? (1 sentence) -Determines industry category, anti-pattern set, style priority. - -### Q2: Who is the primary user? +**Q1:** Product? (1 sentence) → determines industry, anti-pattern set, style priority +**Q2:** Primary user? | User Type | Defaults | |---|---| -| Developer | Dark default, monospace accents, keyboard-first, compact density | -| Consumer | Light default, friendly typography, mobile-first, comfortable density | +| Developer | Dark default, monospace accents, keyboard-first, compact | +| Consumer | Light default, friendly typography, mobile-first, comfortable | | Enterprise | Structured, conservative, data-dense, normal density | -| Child | Playful, large touch targets (48px+), high contrast, claymorphism | +| Child | Playful, 48px+ targets, high contrast, claymorphism | | Creative | Rich motion, bold colors, portfolio-native | -| Healthcare | Calm, accessible (AAA), large text, minimal motion | - -### Q3: What emotional target? +| Healthcare | Calm, AAA, large text, minimal motion | +**Q3:** Emotional target? 
| Target | Visual Direction | |---|---| -| Trustworthy | Professional palette, serif or clean sans, conservative radius | -| Playful | Vivid colors, rounded shapes (16-24px), spring animations | -| Premium | Tight tracking (-0.02em+), generous whitespace, single accent, subtle shadows | -| Energetic | High chroma (C 0.15+), large type (32px+ headings), rich motion | -| Calm | Muted palette, warm neutrals, generous line height, minimal motion | -| Technical | Dark default, monospace accents, compact density, snappy motion | -| Bold | Maximum contrast, large type, strong color blocks | -| Editorial | Serif headings, generous reading (18px body, 1.75 line-height), warm backgrounds | - -### Q4: Light or dark default? -Not a preference - a product decision. Developer tools → dark. Marketing → light. Editorial → light. Gaming → dark. Dashboards → either, but intentional. - -### Q5: Brand color? -If given: extract hue, derive OKLCH ramp (11 stops). If "generate": pick from industry color mood. - +| Trustworthy | Professional palette, serif/clean sans, conservative radius | +| Playful | Vivid colors, 16-24px radius, spring animations | +| Premium | -0.02em+ tracking, generous whitespace, single accent, subtle shadows | +| Energetic | C 0.15+, 32px+ headings, rich motion | +| Calm | Muted palette, warm neutrals, generous lh, minimal motion | +| Technical | Dark default, monospace accents, compact, snappy motion | +| Bold | Max contrast, large type, strong color blocks | +| Editorial | Serif headings, 18px body 1.75lh, warm bg | + +**Q4:** Light or dark default? Product decision, not preference. Developer tools → dark. Marketing → light. Editorial → light. Gaming → dark. + +**Q5:** Brand color? Given: extract hue, derive OKLCH ramp (11 stops). "generate": pick from industry color mood. 
| Industry | Color Mood | |---|---| | SaaS | Trust blue + single accent | | Healthcare | Calm blue + health green | | Fintech | Navy + trust blue + gold | -| Luxury | Black + gold, minimal palette | +| Luxury | Black + gold, minimal | | AI/Tech | Neutral + one distinct (NOT #6366F1) | | Education | Friendly pastels, warm accents | -| Wellness | Earth tones, sage green, soft coral | - -### Q6: Density? +| Wellness | Earth tones, sage, soft coral | -| Mode | Section Padding | Card Padding | Body Size | Use | +**Q6:** Density? +| Mode | Section Padding | Card Padding | Body | Use | |---|---|---|---|---| | Comfortable | 96px | 40px | 18px | Marketing, editorial, consumer | | Normal | 64px | 28px | 16px | SaaS, dashboards, apps | | Compact | 48px | 20px | 14px | Data tables, admin, dev tools | -### Q7: Design style? -7 primary: minimalism, glassmorphism, soft-ui, dark-oled, vibrant-block, claymorphism, aurora-ui. If "recommend": resolved from industry + emotional target. - -### Q8: Font personality? +**Q7:** Style? minimalism / glassmorphism / soft-ui / dark-oled / vibrant-block / claymorphism / aurora-ui. "recommend": resolved from industry + emotional target. +**Q8:** Font personality? | Personality | Pairing | Use | |---|---|---| | Technical | Geist + Geist Mono | Dev tools, SaaS, dashboards | | Elegant | Cormorant + Montserrat | Luxury, editorial, premium | | Friendly | Plus Jakarta Sans + mono | Consumer, education, SaaS | -| System | Inter (or system stack) | Universal, no strong personality | -| Editorial | Playfair Display + Lora | Content sites, blogs, news | - -### Q9: Motion level? +| System | Inter (or system stack) | Universal | +| Editorial | Playfair Display + Lora | Content, blogs, news | -| Level | What It Includes | +**Q9:** Motion level? 
+| Level | Includes | |---|---| -| Static | No animations at all | -| Subtle | Hover states + transitions only (150-200ms) | +| Static | No animations | +| Subtle | Hover states + transitions (150-200ms) | | Moderate | + scroll reveals, micro-interactions (200-300ms) | -| Rich | + parallax, page transitions, animated backgrounds (300-500ms) | +| Rich | + parallax, page transitions, animated bg (300-500ms) | -Always respects `prefers-reduced-motion` regardless of level. +Always respects `prefers-reduced-motion`. -### Q10: Sections/pages? -Landing: Hero, Features, Testimonials, CTA, Footer, Pricing, FAQ. Dashboard: Sidebar, Header, Content, Data panels. Apps: Navigation, Content, Modals, Forms, Empty states. +**Q10:** Sections/pages? Landing: Hero, Features, Testimonials, CTA, Footer, Pricing, FAQ. Dashboard: Sidebar, Header, Content, Data panels. -### Q11: Framework + Component Library? +**Q11: Framework + Component Library (TWO sub-questions):** -**Two sub-questions - ask both:** +Q11a Framework: React + Tailwind v4 / Next.js + Tailwind v4 / Vue + Tailwind / Svelte + Tailwind / HTML + Tailwind / Other -**Q11a - Framework:** -- React + Tailwind v4 (most common) -- Next.js + Tailwind v4 -- Vue + Tailwind -- Svelte + Tailwind -- HTML + Tailwind (no framework) -- Other (specify) +Q11b Component Library: +- **shadcn/ui (Base UI)** → invokes `hyperstack:shadcn-expert`, uses `shadcn_*` MCP tools +- **Raw Tailwind** → hand-built from DESIGN.md, no lib +- **MUI / Mantine / Chakra / Ant Design** → use library's own docs (no hyperstack plugin) +- **Custom / existing** → read user's components, match patterns +- **Ask me to recommend** → recommend shadcn/ui for React+Tailwind, or raw Tailwind for max control -**Q11b - Component Library:** -- **shadcn/ui (Base UI edition)** - invokes `hyperstack:shadcn-expert`, uses `shadcn_*` MCP tools -- **Raw Tailwind** - no component library, hand-built primitives from DESIGN.md -- **Material UI** - use its component catalog (no hyperstack 
plugin yet) -- **Mantine** - use its component catalog (no hyperstack plugin yet) -- **Chakra UI** - use its component catalog (no hyperstack plugin yet) -- **Ant Design** - enterprise component library (no hyperstack plugin yet) -- **Custom / existing design system** - user's own components -- **Ask me to recommend** - designer picks based on personality + industry +**DO NOT assume shadcn by default.** Ask explicitly. Libraries have incompatible architectures. -**Do NOT assume shadcn by default.** If the user doesn't answer, ask explicitly. Different component libraries have incompatible architectures (Radix vs Base UI vs MUI primitives vs handcrafted). +Routing: `shadcn/ui` → `hyperstack:shadcn-expert` | `Raw Tailwind` → forge-plan hand-writes | `Other` → library's own docs, flag to user | `Custom` → read existing first -**Routing based on Q11b answer:** -- `shadcn/ui` → `hyperstack:shadcn-expert` handles component work; forge-plan calls `shadcn_*` tools -- `Raw Tailwind` → forge-plan hand-writes components from DESIGN.md Section 5 spec directly (no library wrapper) -- `Other library` → forge-plan uses the library's own docs; hyperstack has no plugin; flag this to user -- `Custom/existing` → read user's existing components first; match their patterns -- `Ask me to recommend` → recommend shadcn/ui for React+Tailwind, or raw Tailwind if user wants maximum control - -### Q12: Constraints? -WCAG AA (default) or AAA. Performance budget (< 150KB JS, < 2s load). Dark mode required. Brand keywords. +**Q12:** Constraints? WCAG AA (default) or AAA. Performance budget (< 150KB JS, < 2s load). Dark mode required. Brand keywords. **Do NOT proceed to Phase 2 until Q1, Q5, Q10 answered.** @@ -374,187 +269,177 @@ WCAG AA (default) or AAA. Performance budget (< 150KB JS, < 2s load). Dark mode # PHASE 2: DESIGN SYSTEM RESOLUTION -Every MCP call must fill a specific section of the DESIGN.md. No call without a purpose. 
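Q5's hue-locked ramp ("extract hue, derive OKLCH ramp, 11 stops") can be sketched as lightness stepping down while hue and chroma stay fixed. The stop endpoints and chroma below are illustrative assumptions, not values from this doc; OKLCH syntax follows CSS Color 4:

```python
# Sketch of Q5's 11-stop ramp: lightness varies, brand hue and chroma held constant.
# Endpoint lightness values (0.97 → 0.22) and chroma 0.12 are illustrative assumptions.
def oklch_ramp(hue, chroma=0.12, stops=11):
    ramp = []
    for i in range(stops):
        lightness = 0.97 - i * (0.97 - 0.22) / (stops - 1)  # even steps, light to dark
        ramp.append(f"oklch({lightness:.2f} {chroma} {hue})")
    return ramp

ramp = oklch_ramp(250)  # hue would come from the user's brand hex
print(len(ramp))        # → 11
print(ramp[0])          # → oklch(0.97 0.12 250)
```

Same hue at every stop is what keeps the ramp on-brand; only perceived lightness changes.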
-
-## Core Calls (Every Design Task - 4 calls)
+Every MCP call fills specific DESIGN.md section. No call without purpose.
 
-These 4 calls fill 80% of the DESIGN.md. Run them in parallel.
+## Core Calls (Every Design Task — 4 calls, run in parallel)
 
 ### Call 1: `designer_resolve_intent(product_description)`
 
-**FILLS:** All sections (defaults for everything)
-**PURPOSE:** Auto-detects industry, personality, style, mode, density, color mood, must-haves, never-uses. Without this, you're guessing.
-**USE RESULT TO:** Set defaults for the entire DESIGN.md. Present to user for confirmation in Phase 1.
+**FILLS:** All sections (defaults)
+**PURPOSE:** Auto-detects industry, personality, style, mode, density, color mood, must-haves, never-uses.
+**USE:** Set defaults for entire DESIGN.md. Present to user in Phase 1.
 
 ### Call 2: `designer_get_personality(resolved_cluster)`
 
-**FILLS:** Section 1 (theme), Section 2 (color direction), Section 3 (typography), Section 4 (spacing), Section 6 (motion), Section 7 (elevation)
-**PURPOSE:** Returns the concrete visual vocabulary - specific tracking values, radius range, shadow style, motion timing, density, CSS example. This is the single most important data source for the DESIGN.md.
-**USE RESULT TO:** Set every visual property. The personality vocabulary IS the design system skeleton.
+**FILLS:** Sections 1, 2, 3, 4, 6, 7
+**PURPOSE:** Concrete visual vocabulary — tracking, radius range, shadow style, motion timing, density, CSS example. Single most important data source.
+**USE:** Set every visual property. Personality vocabulary IS design system skeleton.
 
 ### Call 3: `designer_get_page_template(page_type)`
 
-**FILLS:** Section 5 (components), Section 9 (responsive)
-**PURPOSE:** Returns section anatomy with component inventory and which cognitive laws apply to this page type. Without this, you're inventing sections from scratch.
-**USE RESULT TO:** Define what sections to build, what components each needs, what responsive behavior each requires.
+**FILLS:** Sections 5, 9
+**PURPOSE:** Section anatomy with component inventory + cognitive laws for this page type.
+**USE:** Define sections to build, components each needs, responsive behavior.
 
 ### Call 4: `designer_get_anti_patterns(industry: resolved_industry)`
 
-**FILLS:** Section 8 (do's/don'ts), Section 10 (anti-patterns)
-**PURPOSE:** Returns the specific violations this industry must avoid. Without this, you might put AI purple on a bank or neon on a healthcare app.
-**USE RESULT TO:** Write the Do's/Don'ts section and the anti-pattern checklist. Every "Don't" must come from this list.
+**FILLS:** Sections 8, 10
+**PURPOSE:** Specific violations this industry must avoid.
+**USE:** Write Do's/Don'ts + anti-pattern checklist. Every "Don't" must come from this list.
 
-## Context Calls (Only When the Product Needs Them)
+## Context Calls (Only When Product Needs Them)
 
-These are NOT routine. Call ONLY when the product has these specific features.
+NOT routine. Call ONLY when product has these specific features:
 
-| Product Feature | Call | FILLS | WHY (what decision it changes) |
+| Product Feature | Call | FILLS | WHY (what decision changes) |
 |---|---|---|---|
-| **Landing page** | `designer_get_landing_pattern("hero-section")` | Section 5 | Conversion stats change hero layout: value prop in 3s, CTA above fold, 40-80px bleed |
-| **Landing page** | `designer_get_landing_pattern("section-ordering")` | Section 5 | Unbounce 41K pages: Hero→Proof→Problem→Features→Testimonials→Pricing→FAQ→CTA |
-| **Landing page** | `designer_get_landing_pattern("social-proof")` | Section 5 | Named metrics (+30-70%) vs logos (+260%) vs badges (+55%) changes proof section design |
-| **Landing page** | `designer_get_landing_pattern("cta-optimization")` | Section 8 | First-person CTAs +90%, single CTA +266%, "no credit card" +34% |
-| **Pricing page** | `designer_get_landing_pattern("pricing-psychology")` | Section 5 | Ariely decoy changes tier structure: 3 tiers, highlight middle, expensive first |
-| **Forms** | `designer_get_interaction_pattern("form-design")` | Section 5 | Validation timing (blur not input), label placement (top not placeholder), max field count |
-| **Navigation** | `designer_get_interaction_pattern("navigation")` | Section 5 | Hamburger is 39% slower on desktop (NNG). Tab bars +58% engagement. Changes nav type. |
-| **Onboarding** | `designer_get_interaction_pattern("onboarding")` | Section 5 | 3-5 checklist items outperform 8+. Interactive > passive. Changes onboarding structure. |
-| **Data tables** | `designer_get_interaction_pattern("skeleton-vs-spinner")` | Section 6 | Skeleton for known structure, spinner for discrete actions. Changes loading pattern. |
-| **Error handling** | `designer_get_ux_writing("error-messages")` | Section 8 | NNG rubric: what happened + why + how to fix. Changes error message format. |
-| **CTAs/buttons** | `designer_get_ux_writing("button-labels")` | Section 8 | "Start my trial" +90% vs "Start your trial". Changes button copy strategy. |
-| **Premium feel** | `designer_get_design_system("stripe")` or `("vercel-geist")` | Section 1 | Specific values to reference: Stripe weight 300/500, Vercel -0.04em tracking |
-| **Enterprise** | `designer_get_design_system("ibm-carbon")` | Section 1 | Carbon's 12px spacing-04, IBM Plex, a11y-first component architecture |
-
-## Token Calls (Phase 5 only - when generating code)
-
-Do NOT call these during design resolution. Call them when writing actual CSS.
-
+| Landing page | `designer_get_landing_pattern("hero-section")` | S5 | Conversion stats change hero layout: value prop 3s, CTA above fold, 40-80px bleed |
+| Landing page | `designer_get_landing_pattern("section-ordering")` | S5 | Unbounce 41K pages: Hero→Proof→Problem→Features→Testimonials→Pricing→FAQ→CTA |
+| Landing page | `designer_get_landing_pattern("social-proof")` | S5 | Named metrics (+30-70%) vs logos (+260%) vs badges (+55%) |
+| Landing page | `designer_get_landing_pattern("cta-optimization")` | S8 | First-person CTAs +90%, single CTA +266%, "no credit card" +34% |
+| Pricing page | `designer_get_landing_pattern("pricing-psychology")` | S5 | Ariely decoy: 3 tiers, highlight middle, expensive first |
+| Forms | `designer_get_interaction_pattern("form-design")` | S5 | Validation timing (blur not input), label placement (top not placeholder) |
+| Navigation | `designer_get_interaction_pattern("navigation")` | S5 | Hamburger 39% slower on desktop (NNG). Tab bars +58% engagement. |
+| Onboarding | `designer_get_interaction_pattern("onboarding")` | S5 | 3-5 checklist items > 8+. Interactive > passive. |
+| Data tables | `designer_get_interaction_pattern("skeleton-vs-spinner")` | S6 | Skeleton for known structure, spinner for discrete actions |
+| Error handling | `designer_get_ux_writing("error-messages")` | S8 | NNG rubric: what happened + why + how to fix |
+| CTAs/buttons | `designer_get_ux_writing("button-labels")` | S8 | "Start my trial" +90% vs "Start your trial" |
+| Premium feel | `designer_get_design_system("stripe")` or `("vercel-geist")` | S1 | Stripe weight 300/500, Vercel -0.04em tracking |
+| Enterprise | `designer_get_design_system("ibm-carbon")` | S1 | Carbon 12px spacing-04, IBM Plex, a11y-first |
+
+## Token Calls (Phase 5 only — when generating code)
+
+Do NOT call during design resolution:
 ```
-design_tokens_get_category("colors") → OKLCH ramp construction procedure
-design_tokens_get_category("typography") → type scale token definitions
-design_tokens_get_category("spacing") → 4px grid token definitions
-design_tokens_generate(description) → generate complete Tailwind v4 CSS
+design_tokens_get_category("colors") → OKLCH ramp construction
+design_tokens_get_category("typography") → type scale token defs
+design_tokens_get_category("spacing") → 4px grid token defs
+design_tokens_generate(description) → complete Tailwind v4 CSS
 ```
 
 ---
 
 # PHASE 3: CONSTRAINT APPLICATION
 
-Cross-reference every decision against the rules below.
-
----
-
-# DESIGN RULES BY PRIORITY
-
-*Follow P1→P10. Higher priority = fix first. Every rule has a source.*
+Cross-reference every decision against rules below. P1 → P10. Higher = fix first.
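The token categories above feed a Tailwind v4 `@theme` block. A minimal sketch of what `design_tokens_generate` output could look like — every value here is illustrative, and the exact output shape of the tool is an assumption:

```css
/* Hypothetical generated tokens — values illustrative, not the tool's real output */
@import "tailwindcss";

@theme {
  /* OKLCH color ramp (assumed brand hue 260) */
  --color-primary-100: oklch(0.95 0.03 260);
  --color-primary-500: oklch(0.55 0.15 260);
  --color-primary-900: oklch(0.25 0.08 260);

  /* Type scale tokens */
  --text-base: 1rem;
  --text-xl: 1.25rem;

  /* 4px grid spacing tokens */
  --spacing-1: 4px;
  --spacing-2: 8px;
  --spacing-4: 16px;
}
```

Tailwind v4 derives utilities (`bg-primary-500`, `text-xl`, `p-4`) directly from these `@theme` variables, which is why token generation belongs in Phase 5 with the actual CSS.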
 ## P1: Accessibility (CRITICAL)
 
 | Rule | Standard | Avoid |
 |---|---|---|
-| `contrast-body` | 4.5:1 minimum for body text (AA); 7:1 for AAA | Testing only in light mode |
-| `contrast-large` | 3:1 for text >= 18px bold or >= 24px | Assuming brand colors pass |
-| `contrast-ui` | 3:1 for UI components, borders, icons | Low-contrast borders in dark mode |
-| `focus-rings` | 2px ring, 2px offset, primary color on ALL interactive elements | `outline: none` without replacement |
-| `touch-targets` | Min 44x44px (WCAG 2.5.5); recommended 48x48px; gap >= 8px | Touch targets < 44px on mobile |
-| `color-not-only` | Color + icon/text for every state (error, success, warning) | Red border as sole error indicator |
-| `reduced-motion` | `prefers-reduced-motion: reduce` with `!important` in `@layer base` | Missing media query (WCAG 2.3.3) |
+| `contrast-body` | 4.5:1 body (AA); 7:1 AAA | Testing light mode only |
+| `contrast-large` | 3:1 for ≥18px bold or ≥24px | Assuming brand colors pass |
+| `contrast-ui` | 3:1 UI components, borders, icons | Low-contrast borders in dark |
+| `focus-rings` | 2px ring, 2px offset, primary color, ALL interactive | `outline: none` without replacement |
+| `touch-targets` | 44x44px min (WCAG); 48x48px recommended; 8px gap | Targets < 44px mobile |
+| `color-not-only` | Color + icon/text for every state | Red border as sole error indicator |
+| `reduced-motion` | `prefers-reduced-motion: reduce` with `!important` in `@layer base` | Missing media query |
 | `keyboard-nav` | Tab order = visual order; Enter/Space activates; Escape closes | Unreachable interactive elements |
-| `skip-links` | `Skip to main content` as first body element | No skip link on nav-heavy pages |
-| `alt-text` | Descriptive for informational images; `alt=""` for decorative | `alt="image"` or missing alt |
-| `aria-labels` | `aria-label` on icon-only buttons (on the button, not the icon) | Unlabeled icon buttons |
-| `heading-hierarchy` | Sequential h1→h2→h3, no skipping levels | h1 → h3 (skipped h2) |
-| `zoom-support` | Layout works at 400% zoom; never `user-scalable=no` | Disabling pinch-to-zoom |
-| `semantic-html` | `
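The `focus-rings` and `reduced-motion` rows above reduce to a few lines of base CSS. A minimal sketch, assuming Tailwind v4's `@layer base` and a `--color-primary` token (the fallback hex is a placeholder):

```css
@layer base {
  /* 2px ring, 2px offset, primary color — on every interactive element */
  :where(a, button, input, select, textarea, [tabindex]):focus-visible {
    outline: 2px solid var(--color-primary, #2563eb);
    outline-offset: 2px;
  }

  /* Effectively disable animation for users who opt out of motion */
  @media (prefers-reduced-motion: reduce) {
    *,
    *::before,
    *::after {
      animation-duration: 0.01ms !important;
      animation-iteration-count: 1 !important;
      transition-duration: 0.01ms !important;
      scroll-behavior: auto !important;
    }
  }
}
```

Using `:focus-visible` (not `:focus`) keeps rings off mouse clicks while preserving them for keyboard users, which satisfies `keyboard-nav` without visual noise.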