fix: require verification evidence and auto-invoke DESIGN.md compliance checks#48

Closed
KailasMahavarkar wants to merge 103 commits into main from fix/ship-gate-enforcement-evidence

@KailasMahavarkar
Collaborator

Summary

Fixes two critical enforcement gaps identified in behaviour analysis:

  1. Verification claims went unverified: agents could claim "tests pass" without showing any output
  2. DESIGN.md compliance was not auto-checked: agents checked compliance by hand instead of using the automated tool

Changes

Fix 1: Require Verification Output in Message Body

  • Add "Evidence Format" section with concrete examples (tests, types, builds, MCP patterns)
  • Explicitly require command output pasted into message, not just "I ran it"
  • Add 2 new red flags targeting this behavior
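
The "no output = no claim" rule above can be illustrated with a small check. This is only a sketch under stated assumptions: `hasEvidence`, the claim phrases, and the rule that evidence means a non-empty fenced block in the same message are all hypothetical, since the actual Evidence Format section is not shown in this PR.

```typescript
// Hypothetical sketch: a verification claim counts only if the same message
// also contains a fenced block with non-empty command output.
const FENCE = "\u0060\u0060\u0060"; // three backticks, the markdown code fence

// Illustrative claim phrases; the skill's real wording may differ.
const CLAIM_PATTERN = /\b(tests? pass(?:ed|es)?|build succeeds|I ran)\b/i;

function hasEvidence(message: string): boolean {
  // A message that claims nothing has nothing to prove.
  if (!CLAIM_PATTERN.test(message)) return true;

  // Splitting on the fence leaves fenced content at the odd indexes.
  const parts = message.split(FENCE);
  for (let i = 1; i < parts.length; i += 2) {
    const block = parts[i] ?? "";
    // Drop the first line, which holds the optional info string (e.g. "text").
    const body = block.split("\n").slice(1).join("\n");
    if (body.trim().length > 0) return true;
  }
  return false; // claim made, but no pasted output
}
```

A message such as "I ran the tests and they pass." would be rejected by this check, while the same claim followed by a fenced block of test runner output would be accepted.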

Fix 2: Auto-Invoke designer_verify_implementation for DESIGN.md

  • Replace 10-row manual checklist (grep checks, eyeballing) with automated tool
  • Add Step 1: Auto-invoke verification tool
  • Add Step 2: Show tool output as compliance evidence
  • Add Step 3: Handle failures (fix or escalate)
  • Add 3 new red flags targeting manual checks and DESIGN.md blocking
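
The three steps above might be sketched as follows. Only the tool name `designer_verify_implementation` comes from this PR. The `VerifyResult` shape, the `GateDecision` type, and the `designComplianceGate` function are hypothetical, and the tool is passed in as a synchronous callback purely to keep the sketch self-contained (a real MCP tool call would likely be asynchronous).

```typescript
// Hypothetical result shape; the real tool's output format is not shown in this PR.
interface VerifyResult {
  passed: boolean;
  output: string;
}

type GateDecision =
  | { status: "ship"; evidence: string }
  | { status: "fix-or-escalate"; evidence: string };

// `verify` stands in for an invocation of designer_verify_implementation.
function designComplianceGate(verify: () => VerifyResult): GateDecision {
  // Step 1: always invoke the tool; no manual checklist as a substitute.
  const result = verify();

  // Step 2: the tool output is the compliance evidence, so it must exist.
  if (result.output.trim() === "") {
    throw new Error("No output = no claim");
  }

  // Step 3: a failure is never shipped; it is fixed or escalated.
  return result.passed
    ? { status: "ship", evidence: result.output }
    : { status: "fix-or-escalate", evidence: result.output };
}
```

Returning the evidence alongside the decision keeps the output attached to the claim, which is the behavior this fix is meant to enforce.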

Testing

  • Verified ship-gate skill structure is syntactically valid
  • Verified frontmatter and markdown formatting
  • Verified new sections integrate with existing content

Impact

Blocks two major verification loopholes. Moves compliance checking from optional manual steps to mandatory tool invocation with required evidence output.

Addresses:

🤖 Generated with Claude Code

KailasMahavarkar and others added 30 commits April 8, 2026 20:19
* feat: add 7 new plugins with snippet-based architecture

Adds lenis, react, echo, golang, rust, design-tokens, and ui-ux plugins
to unified-mcp. All code examples are stored as .md files in snippets/
and loaded at runtime via createSnippetLoader — no inline template
literals in any data.ts file.

- 370 snippet files across 7 new plugins
- Each plugin follows the established reactflow/motion pattern
- golang merges best-practices + design-patterns from two skill sources
- design-tokens covers full Tailwind v4 OKLCH token system with procedures
- ui-ux covers typography, color, spacing, elevation, motion, a11y principles
- All plugins registered in src/index.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
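
For readers unfamiliar with the snippet pattern described in the commit above, a loader of this kind could look roughly like the following. The name `createSnippetLoader` comes from the commit message, but its real signature is not shown here. The parameters, the caching, and the injectable `read` function are assumptions made for illustration.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical signature: returns a function that loads `<name>.md`
// from a plugin's snippets directory and caches the result.
function createSnippetLoader(
  snippetsDir: string,
  // Injectable reader (an assumption) so the loader can be exercised without disk access.
  read: (path: string) => string = (path) => readFileSync(path, "utf8"),
): (name: string) => string {
  const cache = new Map<string, string>();

  return (name: string): string => {
    let text = cache.get(name);
    if (text === undefined) {
      // Each snippet lives in its own markdown file, not in a template literal.
      text = read(join(snippetsDir, name + ".md"));
      cache.set(name, text);
    }
    return text;
  };
}
```

Under this sketch a plugin's `data.ts` would hold only snippet names, and the markdown bodies would be read from disk on first use.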

* docs: update README with all 9 plugins and snippet architecture

Adds all 7 new plugins (lenis, react, echo, golang, rust, design-tokens,
ui-ux) to the plugins table, tools section, and architecture diagram.
Each plugin section is collapsible. Architecture section now documents
the snippet-based .md file pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: improve README badges - split into rows, proper logos per library

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…rop Go logo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: migrate unified-skill into unified-mcp repo
chore: rename project to hyperstack
refactor: encapsulate snippets inside plugins and remove dist compilation
ci: add github action to publish ghcr docker image and update readme
feat: integrate 10 new static engineering skills
This reverts commit 64b8a1e, reversing
changes made to f28e2a8.
fix: move skills out of MCP plugins into top-level directory
chore: remove redundant and unrelated skills
docs: rewrite README following evidence-writer rules
KailasMahavarkar and others added 27 commits April 10, 2026 18:10
Uses GH_PACKAGES_PAT (admin:packages scope) to call GitHub API
and set the container package to public on every new release.
GITHUB_TOKEN alone cannot change visibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(ci): revert to ghcr.io from Docker Hub
Triggers first successful ghcr.io push + auto-sets package public
via GH_PACKAGES_PAT after build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GitHub API PATCH for org package visibility is restricted - must be
set manually once via UI. Package is now public, step is dead weight.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chore(ci): remove dead Make package public step
Clean slate after deleting all test releases and packages.
Next version bump will create the first real v1.0.0 release
with a clean Docker image on ghcr.io.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… pre-check

- Step 2: one-liner for clone-or-pull so upgrades work without manual cleanup
- Option A: docker pull reminder so cached images don't run stale versions
- Step 4: pre-check command to validate MCP server starts before opening IDE
  (server must be running - this is a core requirement)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(install): upgrade handling, docker pull reminder, server pre-check
…timeout note

MCP servers have a short init timeout. If Docker pulls the image during
startup it times out and reports failed. Fix: pull before configuring,
document this as a required step and in troubleshooting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(install): require docker pull before MCP config
Previous docker run --rm pattern spawned a new container for every
claude CLI invocation. Combined with the server not exiting on stdin
close, this left orphaned containers running across sessions.

Fix:
- install.md Option A now documents a persistent `hyperstack-mcp`
  container started once with `sleep infinity` as PID 1, and each
  MCP session `docker exec`s a fresh bun process into it. One container
  total, zero per-invocation container startup cost.
- src/index.ts now handles stdin close / SIGTERM / SIGINT to exit
  cleanly when the client disconnects, preventing zombie bun processes
  inside the persistent container.
- Troubleshooting section adds a cleanup command for users migrating
  from the legacy docker run --rm config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
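
The stdin close and signal handling described in the commit above might be sketched like this. It is not the actual `src/index.ts` code: `installShutdownHandlers` and its parameters are hypothetical, and the emitters are passed in so the behavior can be checked without terminating a real process.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: exit once when the client disconnects or a
// termination signal arrives, so no bun process lingers in the container.
function installShutdownHandlers(
  proc: EventEmitter, // in production: process
  stdin: EventEmitter, // in production: process.stdin
  exit: (code: number) => void, // in production: (code) => process.exit(code)
): void {
  let exiting = false;

  const shutdown = (): void => {
    if (exiting) return; // several events can fire for a single disconnect
    exiting = true;
    exit(0);
  };

  // stdin ending or closing means the MCP client went away.
  stdin.on("end", shutdown);
  stdin.on("close", shutdown);

  // docker stop sends SIGTERM; Ctrl+C sends SIGINT.
  proc.on("SIGTERM", shutdown);
  proc.on("SIGINT", shutdown);
}

// Production wiring (left commented out so the sketch has no side effects):
// installShutdownHandlers(process, process.stdin, (code) => process.exit(code));
```

The `exiting` guard matters because a stream typically emits both `end` and `close` for one disconnect, and the exit path should run only once.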
Triggers rebuild of ghcr.io image with stdin close handler fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(docker): persistent container + exec pattern
- Step 2: adds Qwen Code row to skills clone/upgrade table with note
  explaining Qwen has no plugin system / SessionStart hook
- Step 3: adds MCP config file table with Qwen Code path
  (~/.qwen/settings.json) and a note about root-level mcpServers key
- Step 4: adds a Verification 0 sanity check and adjusts Verification A/B
  to clearly call out platforms without hook support
- Step 5: clarifies hook expectation per platform
- Troubleshooting: adds Qwen-specific notes for config path and hook
  limitations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(install): add Qwen Code support
… mode inversion

- @theme generates Tailwind utilities, NOT CSS custom properties on :root.
  If :root references var() values only defined in @theme, they resolve to
  undefined at runtime — all text/backgrounds go white.
- Before refactoring a site's color system, verify its existing aesthetic
  direction. Dark-first sites break if :root is set to light values.
fix(design-tokens): add gotchas for @theme var resolution and default mode inversion
The MCP config references 'hyperstack-mcp' but users often have a
differently-named container from a prior 'docker run' without --name.
This is the #1 cause of 'tool not found' errors.

- Add name mismatch repair script in Step 3 (before config)
- Add auto-repair in Step 4 pre-check verification
- Promote to #1 item in troubleshooting section
Existing containers cause 'docker run --name' to fail since the name
is already in use. They also may run stale images or have leftover
state. Add 'docker rm -f hyperstack-mcp 2>/dev/null' before container
creation in both fresh install and upgrade paths.
fix(install): always delete old hyperstack-mcp before creating new one
docs: clarify harness identity and install flow
…signer_verify_implementation

Addresses two critical enforcement gaps from behaviour analysis:

1. VERIFICATION CLAIMS UNVERIFIED
   - Add Evidence Format section with concrete examples
   - Require actual command output pasted in message (not "I ran it")
   - Add red flags: "I ran tests" without output = no claim

2. DESIGN.md COMPLIANCE NOT AUTO-CHECKED
   - Replace manual grep checks with automated designer_verify_implementation tool
   - Require tool output as compliance evidence
   - Add red flags: manual checks miss edge cases, use the tool

Changes:
- Add "Evidence Format" section after "The Gate" with examples for tests, types, builds, MCP patterns
- Clarify: "No output = no claim. Period."
- Replace DESIGN.md Compliance Gate 10-row manual checklist with auto-invoke tool
- Add Step 1 (Auto-Invoke), Step 2 (Show Output), Step 3 (Handle Failures)
- Add 5 new red flags targeting common rationalizations around evidence and DESIGN.md

Impact: Blocks two major verification loopholes. Moves compliance checking from optional manual steps to mandatory tool invocation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@KailasMahavarkar KailasMahavarkar deleted the fix/ship-gate-enforcement-evidence branch April 14, 2026 21:56