From cfd47a712d23f039abc728cfe3da7c34b6d40b46 Mon Sep 17 00:00:00 2001 From: Kailas Mahavarkar <66670953+KailasMahavarkar@users.noreply.github.com> Date: Tue, 14 Apr 2026 15:14:31 +0530 Subject: [PATCH 1/2] fix: require verification evidence in message body and auto-invoke designer_verify_implementation Addresses two critical enforcement gaps from behaviour analysis: 1. VERIFICATION CLAIMS UNVERIFIED - Add Evidence Format section with concrete examples - Require actual command output pasted in message (not "I ran it") - Add red flags: "I ran tests" without output = no claim 2. DESIGN.md COMPLIANCE NOT AUTO-CHECKED - Replace manual grep checks with automated designer_verify_implementation tool - Require tool output as compliance evidence - Add red flags: manual checks miss edge cases, use the tool Changes: - Add "Evidence Format" section after "The Gate" with examples for tests, types, builds, MCP patterns - Clarify: "No output = no claim. Period." - Replace DESIGN.md Compliance Gate 10-row manual checklist with auto-invoke tool - Add Step 1 (Auto-Invoke), Step 2 (Show Output), Step 3 (Handle Failures) - Add 5 new red flags targeting common rationalizations around evidence and DESIGN.md Impact: Blocks two major verification loopholes. Moves compliance checking from optional manual steps to mandatory tool invocation. Co-Authored-By: Claude Haiku 4.5 --- skills/ship-gate/SKILL.md | 119 ++++++++++++++++++++++++++++++++++---- 1 file changed, 108 insertions(+), 11 deletions(-) diff --git a/skills/ship-gate/SKILL.md b/skills/ship-gate/SKILL.md index df9db87..f850650 100644 --- a/skills/ship-gate/SKILL.md +++ b/skills/ship-gate/SKILL.md @@ -34,6 +34,58 @@ Before claiming any status or expressing satisfaction: Skipping any step = lying, not verifying. +**CRITICAL: Evidence must be visible in this message.** Not "I ran the command." Not "I checked earlier." Actual command output, pasted into this message, proves your claim. No output = no claim. + +## Evidence Format + +Every completion claim requires evidence embedded in your message. Here is what valid evidence looks like: + +### For test claims: +``` +✅ Tests pass: + +$ npm test +PASS src/components/__tests__/Button.test.tsx + Button + ✓ renders with label (5ms) + ✓ handles click event (3ms) + ✓ disables when prop is set (2ms) + +Test Suites: 1 passed, 1 total +Tests: 3 passed, 3 total +``` + +**Not valid:** "Tests pass" with no output. "I ran npm test" with no results. "Should pass" with no evidence. + +### For type checks: +``` +✅ Types check: + +$ tsc --noEmit +[no output = success] +``` + +### For build claims: +``` +✅ Build succeeds: + +$ npm run build +[build output showing exit 0] +``` + +### For MCP-verified patterns: +``` +✅ Code matches MCP output: + +MCP tool: reactflow_get_api(Handle) +Output: props = { children, position, type } + +Code implementation: + +``` + +**If you cannot show evidence, you cannot make the claim.** Period. + ## Verification Map | Claim | What is required | What is not sufficient | @@ -50,21 +102,61 @@ Skipping any step = lying, not verifying. ## DESIGN.md Compliance Gate (visual/UX tasks only) -If the task involved `hyperstack:designer` (a DESIGN.md exists), the completion claim must pass this gate: +If the task involved `hyperstack:designer` (a DESIGN.md exists in the repo), the completion claim must pass this automated gate: + +### Step 1: Auto-Invoke Verification Tool -| DESIGN.md Section | Verification Command/Check | +Before claiming any visual task is complete, you MUST run the automated compliance checker: + +```bash +# Call the MCP tool directly (or via your agent framework) +designer_verify_implementation( + design_md_path: "path/to/DESIGN.md", + code_paths: ["src/components/**/*.tsx", "src/styles/**/*.css"] +) +``` + +This tool programmatically verifies all 10 DESIGN.md sections against your code: + +| DESIGN.md Section | What the tool checks | |---|---| -| 2. Color Palette | `grep -r "oklch" ` - all OKLCH tokens present. Run contrast checker. | -| 3. Typography | Font family loaded. Type scale tokens defined. Tracking values match. | -| 4. Spacing | Spacing tokens defined on 4px grid. No arbitrary pixel values. | -| 5. Components | For EACH component: `grep` for ALL required states (default/hover/focus/active/disabled/loading). `grep` for semantic HTML (`