microsoft · TechPreacher · Apr 29, 2026 · May 4, 2026
@@ -70,8 +70,11 @@
     "collab",
     "easyops",
     "figjam",
+    "graphify",
+    "graphifyy",
     "hideable",
     "learning",
+    "safishamsi",
     "smol",
     "subcat",
     "whiteboarding",

@@ -0,0 +1,84 @@
+---
+name: Graph Researcher
+description: "Answers structural questions about a codebase by querying a graphify-built knowledge graph through MCP tools, returning evidence-tagged findings"
+---
+
+# Graph Researcher
+
+Specialized researcher that answers questions about a codebase using a pre-built [graphify](../../skills/experimental/graphify/SKILL.md) knowledge graph. Use when the user asks structural questions ("what depends on X", "what cluster is Y in", "what connects A and B") that grep cannot answer cleanly.
+
+This agent does not build the graph. It assumes `graphify-out/graph.json` exists in the workspace and the `graphify` MCP server is registered in `.vscode/mcp.json`. If either is missing, surface the gap to the user with the exact remediation step from the skill.
+
+Read and follow the conventions in [graphify.instructions.md](../../instructions/experimental/graphify.instructions.md) for working-directory layout, audit-tag reporting, and confidence-score handling.
+
+## Required Phases
+
+### Phase 1: Verify the graph is available
+
+Before answering any question:
+
+1. Confirm `graphify-out/graph.json` exists at the workspace root.
+2. Confirm at least one `mcp_graphify_*` tool is available in the current Copilot Chat session.
+3. If either check fails, stop and report the message that matches the first failing check:
+   * "No `graphify-out/graph.json` found. Build the graph first: `graphify . --mode standard --update`."
+   * "Graphify MCP server not registered. Add the snippet from the [graphify skill Quick Start](../../skills/experimental/graphify/SKILL.md#quick-start) to `.vscode/mcp.json` and reload the window."
+
+Do not proceed with speculative answers when the graph is unavailable.
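The two checks above can be approximated outside the chat session. The sketch below is illustrative only: `preflight` is a hypothetical helper, and it uses registration in `.vscode/mcp.json` as a stand-in for tool availability, which a script cannot observe directly. It also assumes `mcp.json` is plain JSON without comments.

```python
import json
from pathlib import Path


def preflight(workspace: Path) -> list[str]:
    """Return remediation messages; an empty list means both checks passed."""
    problems: list[str] = []

    # Check 1: the graph file must exist at the workspace root.
    if not (workspace / "graphify-out" / "graph.json").is_file():
        problems.append("No graphify-out/graph.json found. Build the graph first.")

    # Check 2 (proxy): the graphify server should be listed under "servers".
    # A missing, unreadable, or malformed file counts as "not registered".
    try:
        text = (workspace / ".vscode" / "mcp.json").read_text(encoding="utf-8")
        config = json.loads(text)
    except (OSError, ValueError):
        config = {}
    servers = config.get("servers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or "graphify" not in servers:
        problems.append("Graphify MCP server not registered in .vscode/mcp.json.")

    return problems
```

Messages come back in check order, so the first entry corresponds to the first failing check.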
+
+### Phase 2: Pick the right tool for the question
+
+Map the user's question to the smallest sufficient MCP tool:
+
+| Question shape                           | Tool                         | Why                                                         |
+|------------------------------------------|------------------------------|-------------------------------------------------------------|
+| "What is X?" / "Tell me about X"         | `mcp_graphify_get_node`      | Direct node fetch with metadata                             |
+| "What does X depend on / call / import?" | `mcp_graphify_get_neighbors` | Edge-typed neighbour lookup                                 |
+| "What connects A and B?"                 | `mcp_graphify_shortest_path` | Returns path nodes + edge types                             |
+| "What are the central pieces here?"      | `mcp_graphify_god_nodes`     | High-centrality top-N                                       |
+| "What clusters / themes exist?"          | `mcp_graphify_graph_stats`   | Communities, density, clustering coefficient                |
+| "What community contains X?"             | `mcp_graphify_get_community` | Returns the cluster X belongs to                            |
+| Open-ended exploration                   | `mcp_graphify_query_graph`   | Use last; expensive and less deterministic than typed tools |
+
+Prefer typed tools over `query_graph` when the question fits a typed shape. Reserve `query_graph` for genuine exploration.
+
+### Phase 3: Report findings with audit tags
+
+Every answer must:
+
+1. Name the MCP tool(s) used and the node IDs touched.
+2. Tag each load-bearing edge in the answer with its audit tag (`EXTRACTED`, `INFERRED`, `AMBIGUOUS`) and confidence score where present.
+3. Distinguish "the graph says" from "I conclude". The graph is evidence, not an oracle.
+4. Surface `AMBIGUOUS` edges to the user as open questions, not facts.
+5. When the graph contradicts the user's stated assumption, say so directly.
+
+Example reporting shape:
+
+```text
+Tool: mcp_graphify_shortest_path(from="auth_middleware.py", to="legacy_session_store")
+Path: auth_middleware.py -> session_manager.py -> legacy_session_store
+Edge tags: EXTRACTED, INFERRED (confidence 0.71)
+Conclusion: There is a 2-hop dependency, but the second hop is INFERRED — the
+LLM saw a likely reference. Verify by reading session_manager.py.
+```
+
+### Phase 4: Suggest the next read
+
+End every non-trivial answer with one suggested file or symbol the user should read next, picked from the graph result. This keeps the conversation grounded in the source rather than the graph.
+
+## Required Protocol
+
+1. Never invent edges or nodes. If a question cannot be answered from `mcp_graphify_*` tool output, say so and suggest a graph rebuild scope (e.g., "the docs aren't in this graph; rebuild with the `docs/` folder included").
+2. Never trigger a graph rebuild yourself. Builds are user-initiated because they have cost and time implications.
+3. Never claim a path or relationship without naming the MCP tool call that produced it.
+4. When the user asks a question that grep would answer faster (e.g., "where is the string 'TODO'?"), say so and decline gracefully — this agent is for structural questions.
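+5. When `mcp_graphify_graph_stats` shows the graph has more than ~30% `INFERRED` or `AMBIGUOUS` edges, warn the user that conclusions are tentative and suggest that the user re-run the build with `--mode deep`, subject to the cost and privacy checks in [graphify.instructions.md](../../instructions/experimental/graphify.instructions.md).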
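The ~30% rule in item 5 reduces to a simple ratio. The sketch below assumes the edge tags have already been collected into a plain list; how they are read out of the `mcp_graphify_graph_stats` result is not specified here, and the function names are hypothetical.

```python
from collections import Counter


def tentative_share(edge_tags: list[str]) -> float:
    """Fraction of edges tagged INFERRED or AMBIGUOUS (0.0 for an empty list)."""
    if not edge_tags:
        return 0.0
    counts = Counter(edge_tags)
    return (counts["INFERRED"] + counts["AMBIGUOUS"]) / len(edge_tags)


def needs_warning(edge_tags: list[str], threshold: float = 0.30) -> bool:
    """True when the tentative share is strictly above the threshold."""
    return tentative_share(edge_tags) > threshold
```

The threshold is a parameter because the rule itself is approximate ("~30%").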
+
+## Out of Scope
+
+* Building or rebuilding the graph (use the [graphify skill](../../skills/experimental/graphify/SKILL.md) Quick Start directly).
+* Editing source files in response to graph findings (use a separate implementor agent).
+* Semantic code review (use a code-review agent — graph centrality is not the same as code quality).
+
+> Brought to you by microsoft/hve-core
+
+*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
@@ -0,0 +1,68 @@
+---
+description: "Conventions for working with graphify-out/ directories and graph-derived evidence"
+applyTo: '**/graphify-out/**'
+---
+
+# Graphify Instructions
+
+Conventions that apply whenever Copilot reads, writes, or references files under any `graphify-out/` directory. These instructions govern the [graphify skill](../../skills/experimental/graphify/) and the [Graph Researcher agent](../../agents/experimental/graph-researcher.agent.md).
+
+## Working Directory
+
+A `graphify-out/` directory is generated build output. It contains:
+
+```text
+graphify-out/
+├── graph.json          # Canonical graph data — read-only for agents
+├── graph.html          # Interactive visualization
+├── GRAPH_REPORT.md     # God nodes, surprising connections, suggested questions
+├── wiki/               # Per-community markdown articles
+└── cache/              # SHA256 incremental cache (do not edit)
+```
+
+Rules:
+
+* Treat every file under `graphify-out/` as build output. Do not edit by hand.
+* Add `graphify-out/` to the target repository's `.gitignore` before the first build.
+* When reading `graph.json`, prefer MCP queries over direct JSON parsing. The MCP server applies confidence filtering and edge typing that raw JSON does not.
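The `.gitignore` rule can be applied idempotently before the first build. This is a minimal sketch, not part of graphify: `ensure_ignored` is a hypothetical helper, and it only detects an exact `graphify-out/` line, so equivalent patterns such as `/graphify-out/` would not be recognised.

```python
from pathlib import Path


def ensure_ignored(repo_root: Path, entry: str = "graphify-out/") -> bool:
    """Append entry to .gitignore if absent. Returns True when the file changed."""
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""

    # Exact-line match only; other spellings of the same pattern are not detected.
    if entry in (line.strip() for line in existing.splitlines()):
        return False

    # Make sure the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + prefix + entry + "\n", encoding="utf-8")
    return True
```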
+
+## Audit-Tag Reporting
+
+Every edge in a graphify graph carries an audit tag: `EXTRACTED`, `INFERRED`, or `AMBIGUOUS`. When summarizing graph findings:
+
+| Tag         | How to report                                                               |
+|-------------|-----------------------------------------------------------------------------|
+| `EXTRACTED` | State as fact: "X depends on Y."                                            |
+| `INFERRED`  | Hedge with the confidence score: "X likely depends on Y (confidence 0.74)." |
+| `AMBIGUOUS` | Surface as a question, not a claim: "It is unclear whether X depends on Y." |
+
+Never collapse multiple audit tags into a single sentence without distinguishing them. A path through the graph that contains both `EXTRACTED` and `INFERRED` edges is an `INFERRED` path overall — the chain is only as strong as its weakest edge.
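The weakest-edge rule and the reporting table can be sketched as two small functions. Ranking `AMBIGUOUS` below `INFERRED` is an assumption drawn from the tag descriptions; the text above only states the `EXTRACTED` plus `INFERRED` case. The function names and the "depends on" wording are illustrative.

```python
from __future__ import annotations

# Tags ranked from strongest (0) to weakest (2) evidence.
STRENGTH = {"EXTRACTED": 0, "INFERRED": 1, "AMBIGUOUS": 2}


def path_tag(edge_tags: list[str]) -> str:
    """Overall tag for a path: the weakest tag among its edges."""
    if not edge_tags:
        raise ValueError("a path needs at least one edge")
    return max(edge_tags, key=STRENGTH.__getitem__)


def phrase(source: str, target: str, tag: str, confidence: float | None = None) -> str:
    """Word a single edge according to the reporting table."""
    if tag == "EXTRACTED":
        return f"{source} depends on {target}."
    if tag == "INFERRED":
        score = f" (confidence {confidence:.2f})" if confidence is not None else ""
        return f"{source} likely depends on {target}{score}."
    # AMBIGUOUS, or any unrecognised tag, is surfaced as an open question.
    return f"It is unclear whether {source} depends on {target}."
```

For example, `path_tag(["EXTRACTED", "INFERRED"])` yields `"INFERRED"`, matching the rule stated above.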
+
+## Reading GRAPH_REPORT.md
+
+`GRAPH_REPORT.md` is a generated summary. When the user asks an open-ended exploration question ("what's interesting in this codebase?"), prefer reading `GRAPH_REPORT.md` over running multiple MCP queries — the report already contains god-node, surprising-connection, and suggested-question sections that are cheaper to read than to recompute.
+
+If `GRAPH_REPORT.md` is older than the most recent commit on the default branch, recommend a `graphify . --update` rebuild before relying on it.
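One way to test staleness is to compare the report's modification time with the latest commit timestamp. The sketch below is an illustration with hypothetical helper names; it defaults to `HEAD`, so pass the default branch ref (for example `origin/main`) when the working tree is on another branch.

```python
import subprocess
from pathlib import Path


def last_commit_time(repo_root: Path, ref: str = "HEAD") -> int:
    """Unix timestamp of the latest commit on ref, read via git."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", ref],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def report_is_stale(report_mtime: float, commit_time: float) -> bool:
    """True when the report was written before the latest commit."""
    return report_mtime < commit_time


def check_report(repo_root: Path, ref: str = "HEAD") -> bool:
    """Combine both steps for a report at graphify-out/GRAPH_REPORT.md."""
    report = repo_root / "graphify-out" / "GRAPH_REPORT.md"
    return report_is_stale(report.stat().st_mtime, last_commit_time(repo_root, ref))
```

A `True` result is a cue to recommend a rebuild, not to start one.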
+
+## Cost Discipline
+
+The deep-mode rebuild path issues many parallel Claude API calls. Agents must not trigger rebuilds autonomously. When a user's question would benefit from a fresher graph, surface the recommendation and the approximate cost-shape ("roughly N files changed since last build, expect a partial rebuild") and let the user decide.
+
+## Privacy and Upload Discipline
+
+Graphify's deep-extraction stage uploads file *contents* to the Claude API. Before recommending a deep rebuild, check:
+
+* Does the target tree contain secrets, credentials, or `.env` files that are not gitignored?
+* Does the tree contain content under upload restrictions (customer data, regulated material)?
+
+If either is true, recommend `--mode fast` (no LLM, AST-only) instead, and note the reduced fidelity in the conversation.
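The first question can be partly automated with a filename screen. This sketch assumes the caller supplies paths that are not gitignored (for instance from `git ls-files --cached --others --exclude-standard`). The pattern list is hypothetical and should be adapted to local policy. It checks names only, so it cannot detect secrets embedded in ordinary files or content under upload restrictions.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical name patterns; extend to match local policy.
SENSITIVE_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa", "credentials*")


def flag_sensitive(paths: list[str]) -> list[str]:
    """Return the paths whose file name matches a sensitive pattern."""
    return [
        p for p in paths
        if any(fnmatch(PurePosixPath(p).name, pat) for pat in SENSITIVE_PATTERNS)
    ]


def recommended_mode(paths: list[str]) -> str:
    """Fall back to the AST-only mode when anything is flagged."""
    return "fast" if flag_sensitive(paths) else "deep"
```

An empty result is not proof that a tree is safe to upload; the second question still needs a human answer.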
+
+## Out of Scope
+
+These instructions do not cover:
+
+* How to install or configure `graphifyy` — see the [skill](../../skills/experimental/graphify/SKILL.md).
+* How to register the MCP server with Copilot Chat — see the skill Quick Start.
+* General code-review or refactor practices — graph centrality is not a code-quality signal.
@@ -0,0 +1,149 @@
+---
+name: graphify
+description: 'Build and query knowledge graphs over a codebase using the graphifyy CLI and MCP server - Brought to you by microsoft/hve-core'
+license: MIT
+compatibility: 'Requires Python 3.10+, the graphifyy PyPI package (pinned), and an ANTHROPIC_API_KEY environment variable for the semantic-extraction stage. GitHub Copilot Chat uses the graphify MCP server for queries.'
+metadata:
+  authors: "microsoft/hve-core"
+  spec_version: "1.0"
+  last_updated: "2026-04-29"
+---
+
+# Graphify Skill
+
+Use this skill to build a knowledge graph over a folder of source code, documentation, PDFs, and images, and to query that graph from GitHub Copilot Chat. The graph surfaces structural relationships, high-centrality nodes, and clusters that are not visible from grep alone.
+
+This skill wraps the upstream [`graphifyy`](https://pypi.org/project/graphifyy/) PyPI package — it does not reimplement Graphify. Pin a single version of `graphifyy` so behaviour stays stable as the upstream project iterates.
+
+## Third-Party Attribution
+
+Graphify is an MIT-licensed project by Safi Shamsi. See <https://github.com/safishamsi/graphify>. This skill orchestrates the upstream CLI and MCP server; no upstream source is vendored.
+
+## Prerequisites
+
+| Requirement         | Notes                                                                                                    |
+|---------------------|----------------------------------------------------------------------------------------------------------|
+| Python 3.10+        | Match the upstream `graphifyy` minimum                                                                   |
+| `graphifyy`         | Install with `pip install graphifyy==0.5.4`. The CLI binary is `graphify` (single `y`)                   |
+| `ANTHROPIC_API_KEY` | Required for deep-mode semantic extraction. Each build issues parallel Claude calls — budget accordingly |
+| MCP-capable client  | GitHub Copilot Chat in VS Code 1.97+ reads `.vscode/mcp.json` and surfaces tools as `mcp_graphify_*`     |
+
+Optional extras:
+
+| Extra              | Purpose                                                            |
+|--------------------|--------------------------------------------------------------------|
+| `graphifyy[video]` | Adds yt-dlp + Whisper for transcribing audio/video sources         |
+| Neo4j driver       | Required only if pushing the graph to a Neo4j instance for queries |
+| `obsidian` extras  | Required only when exporting an Obsidian vault                     |
+
+## Quick Start
+
+### 1. Build a graph
+
+```bash
+graphify ./path/to/repo --mode deep --update
+```
+
+This writes outputs into `./path/to/repo/graphify-out/`:
+
+| File              | Purpose                                                                  |
+|-------------------|--------------------------------------------------------------------------|
+| `graph.json`      | Canonical graph data (nodes, edges, communities, audit tags)             |
+| `graph.html`      | Interactive vis.js visualization                                         |
+| `GRAPH_REPORT.md` | God nodes, surprising connections, suggested questions, token-cost table |
+| `wiki/`           | One markdown article per community (agent-crawlable)                     |
+| `cache/`          | SHA256 incremental cache for `--update` and `--watch`                    |
+
+The `graphify-out/` directory **must be gitignored** in target repositories. See [graphify.instructions.md](../../../instructions/experimental/graphify.instructions.md) for the canonical pattern.
+
+### 2. Register the MCP server with Copilot Chat
+
+Add `graphify` to the workspace's `.vscode/mcp.json`:
+
+```json
+{
+  "servers": {
+    "graphify": {
+      "command": "python3",
+      "args": ["-m", "graphify.serve", "graphify-out/graph.json"],
+      "type": "stdio"
+    }
+  }
+}
+```
+
+Reload the VS Code window. Copilot Chat surfaces these tools (names follow GitHub Copilot Chat's `mcp_<server>_<tool>` convention):
+
+| Tool                         | Purpose                                                       |
+|------------------------------|---------------------------------------------------------------|
+| `mcp_graphify_query_graph`   | Free-form natural-language query against graph + communities  |
+| `mcp_graphify_get_node`      | Fetch a node by ID with metadata                              |
+| `mcp_graphify_get_neighbors` | Direct neighbours of a node, optionally filtered by edge type |
+| `mcp_graphify_get_community` | All nodes in a community (cluster)                            |
+| `mcp_graphify_god_nodes`     | High-centrality nodes (top connectors)                        |
+| `mcp_graphify_graph_stats`   | Counts, density, clustering coefficient                       |
+| `mcp_graphify_shortest_path` | Shortest path between two nodes                               |
+
+### 3. Ask Copilot Chat structural questions
+
+Once the MCP server is registered, the `@graph-researcher` agent (included in this collection) can answer questions like:
+
+* "What other modules are implicitly affected if I change `auth_middleware.py`?"
+* "Which agents in `.github/agents/` are most connected to security artifacts?"
+* "Show me the shortest path between `feature_x` and `legacy_config_y`."
+* "What communities exist in this repo, and which one is the auth code in?"
+
+## Build Modes
+
+| Mode               | Flag          | When to use                                                                                    |
+|--------------------|---------------|------------------------------------------------------------------------------------------------|
+| Fast               | `--mode fast` | AST/tree-sitter only. Deterministic, no LLM calls, no API key required. Use for CI smoke tests |
+| Standard (default) | (no flag)     | AST + selective semantic extraction. Reasonable cost, good coverage                            |
+| Deep               | `--mode deep` | Full parallel Claude semantic extraction. Highest fidelity, highest cost                       |
+| Update             | `--update`    | Reuses the SHA256 cache; rebuilds only changed files. Safe to combine with any mode            |
+| Watch              | `--watch`     | Daemon mode; rebuilds on file change                                                           |
+
+For HVE Core's primary use case (analysing the artifact library itself), prefer `--mode standard --update`.
+
+## Edge Audit Tags
+
+Every edge in `graph.json` carries one of:
+
+| Tag         | Meaning                                                                          |
+|-------------|----------------------------------------------------------------------------------|
+| `EXTRACTED` | Derived deterministically from AST/tree-sitter — high confidence                 |
+| `INFERRED`  | Derived from LLM semantic extraction — medium confidence, has `confidence_score` |
+| `AMBIGUOUS` | Multiple candidate interpretations — low confidence, surface to user             |
+
+When the `@graph-researcher` agent answers a question, it must report the audit tag of the edges its conclusion rests on. Do not collapse `INFERRED` and `EXTRACTED` evidence in summaries.
+
+## Cost and Safety Notes
+
+* Deep-mode builds dispatch many parallel Claude calls. A first build over a 10k-file repo can cost several US dollars; budget before enabling.
+* Graphify uploads file *contents* to the Claude API during semantic extraction. Do not run deep mode against repositories containing secrets or content under upload restrictions. Use `--mode fast` (no LLM) for sensitive trees.
+* The `cache/` directory under `graphify-out/` contains hashed file content snapshots. Treat it like build output — gitignore it.
+* The MCP server reads `graph.json` from disk and exposes it over stdio. Do not commit `graph.json` to repos with private content.
+
+## Troubleshooting
+
+| Symptom                                          | Cause                                                     | Resolution                                                                  |
+|--------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------------------------|
+| `graphify: command not found`                    | Wrong package name installed                              | The PyPI distribution name is `graphifyy` (double y); the CLI is `graphify` |
+| `ANTHROPIC_API_KEY is not set`                   | Deep mode invoked without API credentials                 | Export the key, or downgrade to `--mode fast`                               |
+| `graphify-out/graph.json not found`              | MCP server started before first build                     | Run `graphify <path>` once before reloading the VS Code window              |
+| MCP tools not visible in Copilot Chat            | `.vscode/mcp.json` missing or VS Code not reloaded        | Confirm file path, then `Developer: Reload Window`                          |
+| Graph contains no edges                          | Repository contains only file types Graphify cannot parse | Verify with `graphify <path> --dry-run` to see detected file types          |
+| Stale results after edits                        | Cache hit on changed files                                | Run with `--update` (recommended) or delete `graphify-out/cache/`           |
+| Many `INFERRED` edges have low confidence scores | Deep extraction over an unfamiliar codebase               | Increase `--mode deep` budget or treat low-confidence edges as hypotheses   |
+
+## Version Pinning Policy
+
+The upstream `graphifyy` project uses `v5` as its default branch and releases frequently. This skill pins to a specific version. Bumps to the pinned version follow the standard `feat(skills)` / `fix(skills)` commit flow and require:
+
+1. A re-run of the skill's regression tests in [tests/](tests/).
+2. A diff review of the upstream `CHANGELOG` for breaking tool-name or output-shape changes that would invalidate `graph-researcher` agent assumptions.
+3. A note in the version-bump commit body referencing the upstream tag.
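A pin check can run before any of the steps above. This is a minimal sketch using the standard library's `importlib.metadata`; the pinned value is taken from the Prerequisites table, and the helper names are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

PINNED = "0.5.4"  # version named in the Prerequisites table


def installed_version(dist: str = "graphifyy") -> str | None:
    """Installed version of the distribution, or None when it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def pin_status(installed: str | None, pinned: str = PINNED) -> str:
    """Classify the environment as 'ok', 'mismatch', or 'missing'."""
    if installed is None:
        return "missing"
    return "ok" if installed == pinned else "mismatch"
```

Calling `pin_status(installed_version())` in a regression test makes an accidental upgrade fail fast instead of surfacing later as changed tool names or output shapes.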