diff --git a/.github/skills/experimental/hifi-prototype/SKILL.md b/.github/skills/experimental/hifi-prototype/SKILL.md
new file mode 100644
index 000000000..f209d085d
--- /dev/null
+++ b/.github/skills/experimental/hifi-prototype/SKILL.md
@@ -0,0 +1,182 @@
---
name: hifi-prototype
description: 'Opinionated scaffold and iteration loop for local-only high-fidelity prototypes that treat every build as a measurable experiment - Brought to you by microsoft/hve-core'
license: MIT
compatibility: 'Requires a web browser. Optional: Python 3.11+ (Flask), Node.js 18+ (Express), or .NET 8+ (Minimal API)'
metadata:
  authors: "microsoft/hve-core"
  spec_version: "1.0"
  last_updated: "2026-04-10"
---

# High-Fidelity Prototype Builder

## Overview

Builds local-only, experiment-framed, intentionally rough functional prototypes
with telemetry and Markdown reporting. Every prototype is an experiment with a
hypothesis, success criteria, and a clear way to know whether it failed.

Core design constraints:

* A hypothesis and success criteria are required before scaffolding begins.
* Telemetry is built in from the start so every session produces measurable data.
* Rough UI is enforced deliberately to keep stakeholder feedback on behavior, not aesthetics.
* Everything runs locally: no cloud accounts, no deployments, no waiting.
* Simulated components are visibly labeled so prototypes are never confused with production.
* Prototypes are disposable. Once the experiment concludes, archive or delete it.
+ +## When to Use + +- Validating whether a concept works functionally before investing in production +- Testing user workflows with real-ish data and measuring actual behavior +- Building a prototype that needs to run on your machine with no cloud accounts +- Creating something stakeholders can click through while you watch what they do +- Generating structured experiment documentation alongside the prototype + +## When Not to Use + +- You need a polished, production-ready application +- The work requires cloud infrastructure, multi-user auth, or scalability +- You're past the experiment phase and need production code +- You only need a static mockup or wireframe (use Figma or paper) +- You need to deploy this for unsupervised remote user testing + +## Prerequisites + +No installation is required for the default HTML/CSS/JS stack. Open `index.html` in any modern browser. + +| Stack | Runtime | +|----------------|----------------------------------------| +| HTML (default) | Any modern browser | +| Python | Python 3.11+ with Flask | +| Node.js | Node.js 18+ with Express | +| .NET | .NET 8+ SDK | + +Optional dependencies: + +* OpenTelemetry SDK for backend telemetry (installed per-stack) +* An LLM provider (Ollama or remote API) only if simulation requires one + +## Inputs + +| Input | Required | Description | +|--------------------|----------|--------------------------------------------------------------------------------------------------------| +| Hypothesis | Yes | What you believe to be true and want to validate | +| Success criteria | Yes | Measurable conditions that confirm or reject the hypothesis | +| Stack preference | No | `html` (default), `python` (Flask), `node` (Express), or `dotnet` (minimal API) | +| Storage | No | `sqlite` (default) or `files` (JSON/Markdown flat files) | +| Simulation needs | No | What parts of the system should be simulated rather than built | +| LLM provider | No | Endpoint and model for simulation (e.g., `ollama/llama3`). 
Defaults to no LLM | +| Telemetry level | No | `basic` (page views, clicks, task timing) or `detailed` (basic + custom events, session replay) | + +## Architecture Principles + +### Local-Only, Zero Cloud + +Everything runs on the developer's machine. No cloud accounts, no deployments, +no API keys unless the user explicitly opts into an LLM provider for simulation. + +### Intentionally Rough UI + +Enforced through specific design constraints: + +* System fonts only (`system-ui, sans-serif`). No custom fonts. +* Maximum 2 colors: one neutral (gray), one accent. +* Visible 1px dashed borders on major layout sections. No rounded corners beyond `4px`, no shadows, no gradients. +* Minimum `16px` body text, `44px` touch targets. +* A visible banner on every page: **"⚠ EXPERIMENT — not a real product. [Prototype Name] | Hypothesis: [one-liner]"** + +This is a deliberate Design Thinking technique (Method 7) that prevents stakeholders from giving feedback on visual polish when the goal is behavior validation. + +### Simulation Layers + +Simulated components must be: + +1. **Visibly labeled** in the UI with a `[SIMULATED]` badge. +2. **Documented** in the experiment card with assumptions. +3. **Swappable** via isolated modules in a `sim/` directory. + +See [stack-reference.md](references/stack-reference.md#simulation-approaches) for simulation approaches by need. + +### Telemetry from Day One + +Telemetry is not optional. **Basic** telemetry (page views, clicks, task timing, errors, session UUID) is always included. **Detailed** telemetry (custom events, funnel tracking, rage-click detection, session recording) is opt-in. + +See [stack-reference.md](references/stack-reference.md#telemetry-implementation) for implementation details per stack. 
+ +## Project Structure + +``` +{prototype-name}/ +├── experiment-card.md # Hypothesis, criteria, measurement plan +├── index.html # Entry point (or app.py / server.js / Program.cs) +├── style.css # Rough UI styles (pre-populated with constraints) +├── app.js # Frontend logic and telemetry +├── telemetry.js # Telemetry capture module +├── sim/ # Simulation layer +│ ├── fixtures/ # JSON/CSV mock data +│ └── stubs.js # Stub functions for simulated services +├── data/ # SQLite file or JSON/Markdown data files +│ └── prototype.db # (or *.json files if file storage chosen) +├── telemetry/ # Telemetry output +│ └── events.json # Captured events (append-only) +├── reports/ # Markdown experiment reports +│ └── session-{n}.md # Per-session observation report +└── README.md # Setup, run instructions, and experiment context +``` + +## Workflow + +Follow the six-step workflow to build and evaluate a prototype experiment. +Each step has a checkpoint that must pass before proceeding. + +| Step | Name | Purpose | +|------|----------------------------|-----------------------------------------------------------| +| 1 | Write the Experiment Card | Define hypothesis, success/failure criteria, measurements | +| 2 | Scaffold the Prototype | Generate project structure, styles, telemetry, sim stubs | +| 3 | Build the Core Interaction | Implement the minimum UI that tests the hypothesis | +| 4 | Add Secondary Views | Add supporting views if needed (max 5 total) | +| 5 | Run a Test Session | Execute task script, capture telemetry, write session report| +| 6 | Generate Experiment Report | Aggregate data, evaluate criteria, declare verdict | + +See [workflow.md](references/workflow.md) for detailed step instructions and checkpoints. +See [templates.md](references/templates.md) for experiment card, session report, and summary templates. 
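The `sim/stubs.js` module in the structure above can stay very small: functions that log the call and return canned, flagged data. The `inventoryService` name and payload here are hypothetical examples, not part of the scaffold:

```javascript
// Illustrative sim/stubs.js: the service name and canned payload are made up.
// Each stub logs the call (useful when writing the session report) and returns
// data flagged so the UI can render a [SIMULATED] badge.
const calls = []; // call log

function simulated(service, cannedResponse) {
  return async function stub(...args) {
    calls.push({ service, args, ts: Date.now() }); // record that the stub was hit
    return { simulated: true, ...cannedResponse }; // flag for the [SIMULATED] badge
  };
}

// Hypothetical external service replaced by a stub:
const fetchInventory = simulated("inventoryService", { items: ["widget-a", "widget-b"] });

fetchInventory("warehouse-1").then((res) => {
  console.log(res.simulated, res.items.length); // → true 2
});
```

Because each stub is an isolated function, swapping in a real implementation later means replacing one export without touching the UI code.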
+ +## Validation + +- [ ] `experiment-card.md` exists before any code was written +- [ ] Hypothesis is falsifiable (failure criteria are specific) +- [ ] Prototype runs locally with a single command (no cloud setup) +- [ ] Experiment banner is visible on every page +- [ ] All simulated components are labeled `[SIMULATED]` +- [ ] Telemetry captures events to a local file +- [ ] Rough UI constraints are applied (system fonts, 2 colors, dashed borders) +- [ ] No view exists that does not test the hypothesis +- [ ] Session reports are in Markdown with structured data +- [ ] Experiment summary evaluates each success criterion with evidence + +## Troubleshooting + +| Issue | Cause | Solution | +|----------------------------------------|-------------------------------------|----------------------------------------------------------------------------------------------------| +| Code written before experiment card | Skipped hypothesis definition | Refuse to scaffold until the experiment card is complete | +| UI looks polished | Design constraints not enforced | Enforce rough constraints in `style.css`; remove any shadows, gradients, or custom fonts | +| No telemetry data captured | Telemetry module missing or unwired | Telemetry module is scaffolded in Step 2; verify events fire in Step 3 | +| Feature creep beyond hypothesis | Scope expanded past experiment card | If a feature does not appear in the experiment card, it does not get built | +| Simulated output mistaken for real | Missing simulation labels | Every simulated component gets a `[SIMULATED]` badge; the experiment card catalogs all simulations | +| Hypothesis not testable | No failure criteria defined | Ask "what would convince you this is wrong?"; if unanswerable, the hypothesis needs refinement | +| Conclusions drawn from one session | Insufficient session count | Experiment card defines session count target; do not write the summary until it is reached | +| Prototype kept past experiment | Over-investment in 
disposable code | Archive or delete prototypes when the experiment concludes |

## References

| File | Covers |
|---------------------------------------------------------|------------------------------------------------------------|
| [workflow.md](references/workflow.md) | Six-step workflow with detailed instructions and checkpoints |
| [templates.md](references/templates.md) | Experiment card, session report, and summary templates |
| [stack-reference.md](references/stack-reference.md) | Per-stack setup, simulation approaches, telemetry details |

> Brought to you by microsoft/hve-core

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*

diff --git a/.github/skills/experimental/hifi-prototype/references/stack-reference.md b/.github/skills/experimental/hifi-prototype/references/stack-reference.md
new file mode 100644
index 000000000..0e0d743df
--- /dev/null
+++ b/.github/skills/experimental/hifi-prototype/references/stack-reference.md
@@ -0,0 +1,74 @@
# Stack Quick-Reference

Per-stack setup commands, storage options, and telemetry implementation details.

## HTML/CSS/JS (default, no backend)

```bash
# Just open it
open index.html
# Or use a simple server for telemetry flush
npx serve .
```

Storage: `localStorage` + JSON files in `data/` (manual export).
Telemetry: events buffer in `localStorage`, export to JSON on demand.

## Python (Flask)

```bash
pip install flask
python app.py
# → http://localhost:5000
```

Storage: SQLite via the `sqlite3` stdlib or JSON files.
Telemetry: OpenTelemetry SDK with `opentelemetry-exporter-otlp` or file export.

## Node.js (Express)

```bash
npm install express better-sqlite3
node server.js
# → http://localhost:3000
```

Storage: SQLite via `better-sqlite3` or JSON files.
Telemetry: `@opentelemetry/sdk-node` with file exporter.

## .NET (Minimal API)

```bash
dotnet new web -n {name}
dotnet run
# → http://localhost:5000
```

Storage: SQLite via `Microsoft.Data.Sqlite` or JSON files.
Telemetry: `OpenTelemetry.Extensions.Hosting` with file exporter.

## Simulation Approaches

| Need | Approach |
|------------------------|-----------------------------------------------------------------------------------|
| API responses | JSON fixture files in `sim/fixtures/` returned by a mock route |
| Sensor/IoT data | CSV or JSON time-series files replayed at configurable speed |
| AI/ML predictions | LLM call with a system prompt describing expected behavior, or a decision tree |
| User-generated content | Seeded SQLite database or JSON files with realistic sample data |
| External service calls | Stub functions that log the call and return canned responses |

## Telemetry Implementation

**Frontend**: a small `telemetry.js` module (~50 lines) that captures events
and writes them to `localStorage`, then flushes to a local JSON file via
a backend endpoint or on page unload.

**Backend** (if present): OpenTelemetry SDK with a file exporter writing to
`telemetry/traces.json` and `telemetry/events.json`.

No external services unless the user explicitly requests one of:

* **Application Insights**: instrument with the JS SDK, connection string in `.env`.
* **OpenTelemetry Collector**: export spans and metrics to a local OTLP endpoint
  or a remote collector. Provide a `docker-compose.yml` for a local Jaeger
  or Zipkin instance.

diff --git a/.github/skills/experimental/hifi-prototype/references/templates.md b/.github/skills/experimental/hifi-prototype/references/templates.md
new file mode 100644
index 000000000..beabc641d
--- /dev/null
+++ b/.github/skills/experimental/hifi-prototype/references/templates.md
@@ -0,0 +1,135 @@
# Experiment Templates

Templates for experiment cards, session reports, and experiment summaries used
by the hifi-prototype skill.
+ +## Experiment Card Template + +Every prototype starts with an experiment card. No card, no code. + +```markdown +# Experiment Card: {Prototype Name} + +## Status + +🔬 Active | 📊 Collecting Data | ✅ Concluded | ❌ Invalidated + +## Hypothesis + +{One clear statement of what you believe to be true.} + +## Success Criteria + +| Metric | Target | How Measured | +|--------|--------|-------------| +| {metric} | {target} | {telemetry event or observation} | + +## Failure Criteria + +What evidence would REJECT the hypothesis? Be specific: + +- {condition that disproves the hypothesis} + +## What Is Simulated + +| Component | Real or Simulated | Assumptions | +|-----------|-------------------|-------------| +| {component} | {Real / Simulated} | {what the simulation assumes} | + +## Measurement Plan + +- Telemetry level: {basic / detailed} +- Session count target: {how many sessions before analysis} +- Key events to track: {list specific telemetry events} + +## Risks and Limitations + +- {known risk or limitation of the experiment design} + +## Dates + +- Started: {date} +- Target conclusion: {date} +``` + +## Session Report Template + +Generate a session report in `reports/session-{n}.md` after each test session. + +```markdown +# Session {n} Report + +**Date**: {date} +**Participant**: {role or persona — no PII} +**Duration**: {minutes} + +## Task Completion + +| Task | Completed | Time | Errors | Notes | +|------|-----------|------|--------|-------| +| {task} | Yes/No | {time} | {count} | {observation} | + +## Telemetry Summary + +- Events captured: {count} +- Key events: {summary of notable telemetry} + +## Observations + +- {what the user did, not what they said} +- {confusion points, workarounds, unexpected behavior} + +## Quotes + +- "{anything the user said that reveals intent or frustration}" + +## Preliminary Signal + +Does this session support or weaken the hypothesis? 
{brief assessment — not a conclusion from one session}
```

## Experiment Summary Template

After the target number of sessions, produce a summary in `reports/experiment-summary.md`.

```markdown
# Experiment Summary: {Prototype Name}

## Hypothesis

{restated from experiment card}

## Verdict

✅ Supported | ⚠️ Weakened | ❌ Invalidated

## Evidence

| Criterion | Target | Actual | Verdict |
|-----------|--------|--------|---------|
| {metric} | {target} | {measured} | ✅/⚠️/❌ |

## Telemetry Findings

- {aggregated telemetry insights}

## What We Learned

- {insight — valuable regardless of hypothesis outcome}

## What Surprised Us

- {unexpected behavior or finding}

## Recommended Next Step

{iterate / pivot / proceed} — {rationale}

## Artifacts

- Experiment card: `experiment-card.md`
- Session reports: `reports/session-*.md`
- Telemetry data: `telemetry/events.json`
- Prototype source: `{entry point}`
```

diff --git a/.github/skills/experimental/hifi-prototype/references/workflow.md b/.github/skills/experimental/hifi-prototype/references/workflow.md
new file mode 100644
index 000000000..10d686257
--- /dev/null
+++ b/.github/skills/experimental/hifi-prototype/references/workflow.md
@@ -0,0 +1,104 @@
# Prototype Workflow

Six-step workflow for building and evaluating a high-fidelity prototype experiment.

## Step 1: Write the Experiment Card

Before any code is written, create `experiment-card.md`:

1. Ask the user for their hypothesis in plain language.
2. Ask what success looks like. Push for measurable criteria, not vibes.
   If the user says "users should like it," respond: "What would users DO
   differently if they liked it? Complete a task faster? Come back again?
   That's your metric."
3. Ask what would prove the hypothesis wrong. This is the hardest question
   and the most important one.
4. Identify which parts will be real and which will be simulated.
5. Define the measurement plan: which telemetry events, how many sessions.
6. Write the experiment card using the [experiment card template](templates.md#experiment-card-template).

**Checkpoint**: `experiment-card.md` exists with hypothesis, success criteria,
failure criteria, simulation inventory, and measurement plan. User has confirmed
it reflects their intent.

## Step 2: Scaffold the Prototype

1. Create the project directory using the prototype name as a kebab-case slug.
2. Detect or confirm the stack preference:
   - Default: plain HTML/CSS/JS (no build step, no framework, open `index.html`).
   - If a backend is needed: ask for `python`, `node`, or `dotnet` and scaffold a minimal server.
3. Generate `style.css` with the rough UI constraints pre-applied.
4. Generate `telemetry.js` with the appropriate telemetry level.
5. Create the `sim/` directory with fixture templates matching the simulation
   inventory from the experiment card.
6. Create the `data/` directory with an empty SQLite database or starter JSON
   files.
7. Generate `README.md` with setup and run instructions.

**Checkpoint**: Project runs locally with `open index.html` or a single terminal
command. The experiment banner is visible. Telemetry is capturing events to a
local file.

## Step 3: Build the Core Interaction

Focus exclusively on the interaction that tests the hypothesis. Do not build
features that do not directly contribute to validating or invalidating the
hypothesis.

1. Identify the core user task from the experiment card.
2. Build the minimum UI to support that task:
   - Form inputs, buttons, display areas, and nothing more than needed.
   - Wire up simulation stubs for any components marked as simulated.
   - Connect to SQLite or file storage for any state that must persist.
3. Add telemetry events for:
   - Task start and completion.
   - Each meaningful interaction point.
   - Error cases.
4. If an LLM is used for simulation, implement the call with:
   - A system prompt in `sim/prompts/` describing expected behavior.
   - A visible `[SIMULATED]` badge on any LLM-generated output.
   - Fallback to a canned response if the LLM is unavailable.

**Checkpoint**: A user can walk through the core task end-to-end. Simulated
parts are visibly labeled. Telemetry events fire correctly (verify by checking
the telemetry output file).

## Step 4: Add Secondary Views

If the hypothesis requires context beyond the core interaction:

1. Add navigation between views (plain `<a>` links or minimal routing).
2. Build supporting views: dashboards, lists, detail pages, all rough.
3. Populate with realistic sample data from `sim/fixtures/`.
4. Do not exceed 5 total views. If you need more, your hypothesis is too broad.

**Checkpoint**: All views needed to test the hypothesis are navigable. No view
exists that does not directly support the experiment.

## Step 5: Run a Test Session

Guide the user through running a test session:

1. Open the prototype in a browser.
2. Follow a task script derived from the experiment card's success criteria.
3. After the session, generate a Markdown report in `reports/session-{n}.md`
   using the [session report template](templates.md#session-report-template).

**Checkpoint**: Session report exists. Telemetry data is captured and parseable.

## Step 6: Generate the Experiment Report

After the target number of sessions, produce a summary report:

1. Aggregate telemetry data across all sessions.
2. Evaluate each success criterion against collected evidence.
3. Declare the hypothesis supported, weakened, or invalidated.
4. Document what was learned regardless of outcome.
5. Recommend next steps: iterate (refine and retest), pivot (new hypothesis),
   or proceed (move toward production).

Write to `reports/experiment-summary.md` using the
[experiment summary template](templates.md#experiment-summary-template).
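Aggregating telemetry for the summary (step 1 above) can be a small script. The event shape below, objects with `name` and `sessionId` fields, is an assumption carried over from the telemetry scaffold:

```javascript
// Sketch: aggregate captured events by name and count distinct sessions.
// Assumes events look like { name, sessionId, ts }, which is an assumption
// about the scaffolded telemetry output, not a guaranteed schema.
function aggregate(events) {
  const counts = {};
  const sessions = new Set();
  for (const e of events) {
    counts[e.name] = (counts[e.name] || 0) + 1; // tally per event name
    if (e.sessionId) sessions.add(e.sessionId); // distinct sessions seen
  }
  return { sessions: sessions.size, counts };
}

// Inline sample data; in the skill this would come from
// JSON.parse(fs.readFileSync("telemetry/events.json", "utf8")).
const summary = aggregate([
  { name: "task_start", sessionId: "s1" },
  { name: "task_complete", sessionId: "s1" },
  { name: "task_start", sessionId: "s2" },
]);
console.log(summary.sessions, summary.counts.task_start); // → 2 2
```

The per-name counts map directly onto the evidence table in the summary template (completion counts, error counts, and so on), so the verdict can cite raw numbers rather than impressions.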
+ +**Checkpoint**: Experiment summary exists with evidence-backed verdict. Team +has a clear next step.