Add graphify skill, Graph Researcher agent, and graphify instructions. #1518
Open: TechPreacher wants to merge 2 commits into microsoft:main from TechPreacher:feat/add-graphify
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| --- | ||
| name: Graph Researcher | ||
| description: "Answers structural questions about a codebase by querying a graphify-built knowledge graph through MCP tools, returning evidence-tagged findings" | ||
| --- | ||
|
|
||
| # Graph Researcher | ||
|
|
||
| Specialized researcher that answers questions about a codebase using a pre-built [graphify](../../skills/experimental/graphify/SKILL.md) knowledge graph. Use when the user asks structural questions ("what depends on X", "what cluster is Y in", "what connects A and B") that grep cannot answer cleanly. | ||
|
|
||
| This agent does not build the graph. It assumes `graphify-out/graph.json` exists in the workspace and the `graphify` MCP server is registered in `.vscode/mcp.json`. If either is missing, surface the gap to the user with the exact remediation step from the skill. | ||
|
|
||
| Read and follow the conventions in [graphify.instructions.md](../../instructions/experimental/graphify.instructions.md) for working-directory layout, audit-tag reporting, and confidence-score handling. | ||
|
|
||
| ## Required Phases | ||
|
|
||
| ### Phase 1: Verify the graph is available | ||
|
|
||
| Before answering any question: | ||
|
|
||
| 1. Confirm `graphify-out/graph.json` exists at the workspace root. | ||
| 2. Confirm at least one `mcp_graphify_*` tool is available in the current Copilot Chat session. | ||
| 3. If either check fails, stop and report exactly one of: | ||
| * "No `graphify-out/graph.json` found. Build the graph first: `graphify . --mode standard --update`." | ||
| * "Graphify MCP server not registered. Add the snippet from the [graphify skill Quick Start](../../skills/experimental/graphify/SKILL.md#quick-start) to `.vscode/mcp.json` and reload the window." | ||
|
|
||
| Do not proceed with speculative answers when the graph is unavailable. | ||
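A minimal sketch of the two Phase 1 checks as a standalone script. It assumes the `servers` layout shown in the skill's Quick Start, and it uses registration in `.vscode/mcp.json` as a proxy for tool availability, since a script cannot see which `mcp_graphify_*` tools the chat session exposes. The function and constant names are illustrative, not part of graphify.

```python
import json
from pathlib import Path
from typing import Optional

# Messages condensed from the Phase 1 text above.
NO_GRAPH = (
    "No `graphify-out/graph.json` found. Build the graph first: "
    "`graphify . --mode standard --update`."
)
NO_SERVER = (
    "Graphify MCP server not registered. Add the snippet from the graphify "
    "skill Quick Start to `.vscode/mcp.json` and reload the window."
)


def check_prerequisites(workspace: Path) -> Optional[str]:
    """Return the first failing Phase 1 message, or None when both checks pass."""
    if not (workspace / "graphify-out" / "graph.json").is_file():
        return NO_GRAPH

    mcp_config = workspace / ".vscode" / "mcp.json"
    try:
        config = json.loads(mcp_config.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing file, or a config with comments that strict JSON rejects.
        return NO_SERVER

    servers = config.get("servers", {}) if isinstance(config, dict) else {}
    return None if "graphify" in servers else NO_SERVER
```

Returning a single message mirrors the "report exactly one of" rule: the graph check runs first, so a missing graph is reported before a missing server registration.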
|
|
||
| ### Phase 2: Pick the right tool for the question | ||
|
|
||
| Map the user's question to the smallest sufficient MCP tool: | ||
|
|
||
| | Question shape | Tool | Why | | ||
| |------------------------------------------|------------------------------|-------------------------------------------------------------| | ||
| | "What is X?" / "Tell me about X" | `mcp_graphify_get_node` | Direct node fetch with metadata | | ||
| | "What does X depend on / call / import?" | `mcp_graphify_get_neighbors` | Edge-typed neighbour lookup | | ||
| | "What connects A and B?" | `mcp_graphify_shortest_path` | Returns path nodes + edge types | | ||
| | "What are the central pieces here?" | `mcp_graphify_god_nodes` | High-centrality top-N | | ||
| | "What clusters / themes exist?" | `mcp_graphify_graph_stats` | Communities, density, clustering coefficient | | ||
| | "What community contains X?" | `mcp_graphify_get_community` | Returns the cluster X belongs to | | ||
| | Open-ended exploration | `mcp_graphify_query_graph` | Use last; expensive and less deterministic than typed tools | | ||
|
|
||
| Prefer typed tools over `query_graph` when the question fits a typed shape. Reserve `query_graph` for genuine exploration. | ||
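The table can be read as a lookup with `query_graph` as the fallback. The sketch below encodes it that way; the `QuestionShape` enum is a hypothetical label set invented for illustration, while the tool names are copied from the table.

```python
from enum import Enum
from typing import Optional


class QuestionShape(Enum):
    """Hypothetical labels for the question shapes in the table above."""

    DESCRIBE = "what is X"
    DEPENDENCIES = "what does X depend on"
    CONNECTION = "what connects A and B"
    CENTRAL = "what are the central pieces"
    CLUSTERS = "what clusters exist"
    COMMUNITY_OF = "what community contains X"


# Typed tools, one per recognized question shape.
TOOL_FOR_SHAPE = {
    QuestionShape.DESCRIBE: "mcp_graphify_get_node",
    QuestionShape.DEPENDENCIES: "mcp_graphify_get_neighbors",
    QuestionShape.CONNECTION: "mcp_graphify_shortest_path",
    QuestionShape.CENTRAL: "mcp_graphify_god_nodes",
    QuestionShape.CLUSTERS: "mcp_graphify_graph_stats",
    QuestionShape.COMMUNITY_OF: "mcp_graphify_get_community",
}

# Open-ended exploration: used only when no typed shape fits.
FALLBACK_TOOL = "mcp_graphify_query_graph"


def pick_tool(shape: Optional[QuestionShape]) -> str:
    """Return the typed tool for a recognized shape, else the fallback."""
    return TOOL_FOR_SHAPE.get(shape, FALLBACK_TOOL)
```

Passing `None` for an unclassified question yields `mcp_graphify_query_graph`, which keeps the expensive tool as the last resort rather than the default.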
|
|
||
| ### Phase 3: Report findings with audit tags | ||
|
|
||
| Every answer must: | ||
|
|
||
| 1. Name the MCP tool(s) used and the node IDs touched. | ||
| 2. Tag each load-bearing edge in the answer with its audit tag (`EXTRACTED`, `INFERRED`, `AMBIGUOUS`) and confidence score where present. | ||
| 3. Distinguish "the graph says" from "I conclude". The graph is evidence, not an oracle. | ||
| 4. Surface `AMBIGUOUS` edges to the user as open questions, not facts. | ||
| 5. When the graph contradicts the user's stated assumption, say so directly. | ||
|
|
||
| Example reporting shape: | ||
|
|
||
| ```text | ||
| Tool: mcp_graphify_shortest_path(from="auth_middleware.py", to="legacy_session_store") | ||
| Path: auth_middleware.py -> session_manager.py -> legacy_session_store | ||
| Edge tags: EXTRACTED, INFERRED (confidence 0.71) | ||
| Conclusion: There is a 2-hop dependency, but the second hop is INFERRED — the | ||
| LLM saw a likely reference. Verify by reading session_manager.py. | ||
| ``` | ||
|
|
||
| ### Phase 4: Suggest the next read | ||
|
|
||
| End every non-trivial answer with one suggested file or symbol the user should read next, picked from the graph result. This keeps the conversation grounded in the source rather than the graph. | ||
|
|
||
| ## Required Protocol | ||
|
|
||
| 1. Never invent edges or nodes. If a question cannot be answered from `mcp_graphify_*` tool output, say so and suggest a graph rebuild scope (e.g., "the docs aren't in this graph; rebuild with the `docs/` folder included"). | ||
| 2. Never trigger a graph rebuild yourself. Builds are user-initiated because they have cost and time implications. | ||
| 3. Never claim a path or relationship without naming the MCP tool call that produced it. | ||
| 4. When the user asks a question that grep would answer faster (e.g., "where is the string 'TODO'?"), say so and decline gracefully — this agent is for structural questions. | ||
| 5. When `mcp_graphify_graph_stats` shows the graph has more than ~30% `INFERRED` or `AMBIGUOUS` edges, warn the user that conclusions are tentative and suggest re-running with `--mode deep`. | ||
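Rule 5 reduces to a ratio check. The sketch below assumes the audit tags are already available as a flat list of strings, one per edge; how `mcp_graphify_graph_stats` actually reports them is not specified here, so the input shape is an assumption.

```python
from collections import Counter
from typing import Iterable

UNCERTAIN_TAGS = ("INFERRED", "AMBIGUOUS")
WARN_THRESHOLD = 0.30  # the "~30%" figure from rule 5


def uncertain_fraction(edge_tags: Iterable[str]) -> float:
    """Share of edges tagged INFERRED or AMBIGUOUS; 0.0 for an empty graph."""
    counts = Counter(edge_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[tag] for tag in UNCERTAIN_TAGS) / total


def should_warn(edge_tags: Iterable[str]) -> bool:
    """True when more than the threshold share of edges is uncertain."""
    return uncertain_fraction(edge_tags) > WARN_THRESHOLD
```

The comparison is strict, so a graph at exactly 30% does not trigger the warning; since the rule says "~30%", treat the cutoff as a tunable constant.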
|
|
||
| ## Out of Scope | ||
|
|
||
| * Building or rebuilding the graph (use the [graphify skill](../../skills/experimental/graphify/SKILL.md) Quick Start directly). | ||
| * Editing source files in response to graph findings (use a separate implementor agent). | ||
| * Semantic code review (use a code-review agent — graph centrality is not the same as code quality). | ||
|
|
||
| > Brought to you by microsoft/hve-core | ||
|
|
||
| *🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.* | ||
.github/instructions/experimental/graphify.instructions.md (68 changes: 68 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| --- | ||
| description: "Conventions for working with graphify-out/ directories and graph-derived evidence" | ||
| applyTo: '**/graphify-out/**' | ||
| --- | ||
|
|
||
| # Graphify Instructions | ||
|
|
||
| Conventions that apply whenever Copilot reads, writes, or references files under any `graphify-out/` directory. These instructions govern the [graphify skill](../../skills/experimental/graphify/) and the [Graph Researcher agent](../../agents/experimental/graph-researcher.agent.md). | ||
|
|
||
| ## Working Directory | ||
|
|
||
| A `graphify-out/` directory is generated build output. It contains: | ||
|
|
||
| ```text | ||
| graphify-out/ | ||
| ├── graph.json # Canonical graph data — read-only for agents | ||
| ├── graph.html # Interactive visualization | ||
| ├── GRAPH_REPORT.md # God nodes, surprising connections, suggested questions | ||
| ├── wiki/ # Per-community markdown articles | ||
| └── cache/ # SHA256 incremental cache (do not edit) | ||
| ``` | ||
|
|
||
| Rules: | ||
|
|
||
| * Treat every file under `graphify-out/` as build output. Do not edit by hand. | ||
| * Add `graphify-out/` to the target repository's `.gitignore` before the first build. | ||
| * When reading `graph.json`, prefer MCP queries over direct JSON parsing. The MCP server applies confidence filtering and edge typing that raw JSON does not. | ||
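The gitignore rule can be automated before the first build. This is a sketch under simple assumptions: it matches only the literal entries `graphify-out/` and `graphify-out`, so an equivalent glob pattern already in the file would not be detected. The helper name is illustrative.

```python
from pathlib import Path

ENTRY = "graphify-out/"


def ensure_gitignored(repo: Path) -> bool:
    """Append graphify-out/ to .gitignore if absent. Return True if the file changed."""
    gitignore = repo / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    lines = [line.strip() for line in existing.splitlines()]
    if ENTRY in lines or ENTRY.rstrip("/") in lines:
        return False

    # Keep the new entry on its own line even if the file lacks a trailing newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + separator + ENTRY + "\n", encoding="utf-8")
    return True
```

The function only touches `.gitignore`, never anything under `graphify-out/`, so it stays within the "do not edit by hand" rule.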
|
|
||
| ## Audit-Tag Reporting | ||
|
|
||
| Every edge in a graphify graph carries an audit tag: `EXTRACTED`, `INFERRED`, or `AMBIGUOUS`. When summarizing graph findings: | ||
|
|
||
| | Tag | How to report | | ||
| |-------------|-----------------------------------------------------------------------------| | ||
| | `EXTRACTED` | State as fact: "X depends on Y." | | ||
| | `INFERRED` | Hedge with the confidence score: "X likely depends on Y (confidence 0.74)." | | ||
| | `AMBIGUOUS` | Surface as a question, not a claim: "It is unclear whether X depends on Y." | | ||
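The table maps directly onto a small formatting helper. The sketch below hard-codes the "depends on" relationship from the examples; the function name and the handling of a missing confidence score are assumptions added for illustration.

```python
from typing import Optional


def phrase_edge(
    source: str, target: str, tag: str, confidence: Optional[float] = None
) -> str:
    """Phrase one edge according to its audit tag."""
    if tag == "EXTRACTED":
        return f"{source} depends on {target}."
    if tag == "INFERRED":
        if confidence is None:
            # Assumption: say so explicitly rather than drop the hedge.
            return f"{source} likely depends on {target} (confidence not reported)."
        return f"{source} likely depends on {target} (confidence {confidence:.2f})."
    if tag == "AMBIGUOUS":
        return f"It is unclear whether {source} depends on {target}."
    raise ValueError(f"unknown audit tag: {tag!r}")
```

Raising on an unknown tag is deliberate: silently defaulting to a factual phrasing would overstate the evidence.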
|
|
||
| Never collapse multiple audit tags into a single sentence without distinguishing them. A path through the graph that contains both `EXTRACTED` and `INFERRED` edges is an `INFERRED` path overall — the chain is only as strong as its weakest edge. | ||
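The weakest-edge rule can be expressed as a minimum over an ordering of the tags. The text only states the `EXTRACTED` plus `INFERRED` case; ranking `AMBIGUOUS` below `INFERRED` is an assumption that follows the same logic.

```python
from typing import Sequence

# Higher number means stronger evidence. The AMBIGUOUS rank is an assumption.
STRENGTH = {"EXTRACTED": 2, "INFERRED": 1, "AMBIGUOUS": 0}


def path_tag(edge_tags: Sequence[str]) -> str:
    """Return the overall tag of a path: the tag of its weakest edge."""
    if not edge_tags:
        raise ValueError("a path needs at least one edge")
    return min(edge_tags, key=lambda tag: STRENGTH[tag])
```

For example, `path_tag(["EXTRACTED", "INFERRED"])` evaluates to `"INFERRED"`, matching the mixed-path case described above.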
|
|
||
| ## Reading GRAPH_REPORT.md | ||
|
|
||
| `GRAPH_REPORT.md` is a generated summary. When the user asks an open-ended exploration question ("what's interesting in this codebase?"), prefer reading `GRAPH_REPORT.md` over running multiple MCP queries — the report already contains god-node, surprising-connection, and suggested-question sections that are cheaper to read than to recompute. | ||
|
|
||
| If `GRAPH_REPORT.md` is older than the most recent commit on the default branch, recommend a `graphify . --update` rebuild before relying on it. | ||
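One way to approximate the staleness check is to compare the report's file modification time with the committer timestamp of the branch tip. This is a sketch: the default branch name `main` is an assumption, and the file's mtime reflects when it was last written locally, which may differ from when the graph was built.

```python
import subprocess
from pathlib import Path


def last_commit_time(repo: Path, branch: str) -> int:
    """Committer timestamp (Unix seconds) of the newest commit on `branch`."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log", "-1", "--format=%ct", branch],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def is_stale(report_mtime: float, commit_time: float) -> bool:
    """True when the report was written before the latest commit."""
    return report_mtime < commit_time


def report_is_stale(repo: Path, branch: str = "main") -> bool:
    """Compare graphify-out/GRAPH_REPORT.md against the tip of `branch`."""
    report = repo / "graphify-out" / "GRAPH_REPORT.md"
    return is_stale(report.stat().st_mtime, last_commit_time(repo, branch))
```

When `report_is_stale` returns true, the agent should recommend the rebuild rather than run it, in line with the cost rules in this file.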
|
|
||
| ## Cost Discipline | ||
|
|
||
| The deep-mode rebuild path issues many parallel Claude API calls. Agents must not trigger rebuilds autonomously. When a user's question would benefit from a fresher graph, surface the recommendation and the approximate cost-shape ("roughly N files changed since last build, expect a partial rebuild") and let the user decide. | ||
|
|
||
| ## Privacy and Upload Discipline | ||
|
|
||
| Graphify's deep-extraction stage uploads file *contents* to the Claude API. Before recommending a deep rebuild, check: | ||
|
|
||
| * Does the target tree contain secrets, credentials, or `.env` files that are not gitignored? | ||
| * Does the tree contain content under upload restrictions (customer data, regulated material)? | ||
|
|
||
| If either is true, recommend `--mode fast` (no LLM, AST-only) instead, and note the reduced fidelity in the conversation. | ||
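A rough pre-flight scan for the first question might look like the sketch below. The filename patterns are hypothetical examples, not a complete secrets detector, and the scan does not consult `.gitignore`, so it can flag files that are already ignored. It cannot detect regulated or customer content at all; that check stays with the user.

```python
import fnmatch
import os
from pathlib import Path
from typing import List

# Hypothetical filename patterns that often indicate credentials.
SENSITIVE_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa")
SKIPPED_DIRS = {".git", "graphify-out"}


def find_sensitive_files(root: Path) -> List[Path]:
    """List files under `root` whose names match a sensitive pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS):
                hits.append(Path(dirpath) / name)
    return sorted(hits)


def should_avoid_deep_mode(root: Path) -> bool:
    """True when the scan finds anything that argues for `--mode fast`."""
    return bool(find_sensitive_files(root))
```

Erring toward false positives is the safer failure mode here, because the cost of a wrong "fast" recommendation is reduced fidelity, while a wrong "deep" recommendation uploads file contents.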
|
|
||
| ## Out of Scope | ||
|
|
||
| These instructions do not cover: | ||
|
|
||
| * How to install or configure `graphifyy` — see the [skill](../../skills/experimental/graphify/SKILL.md). | ||
| * How to register the MCP server with Copilot Chat — see the skill Quick Start. | ||
| * General code-review or refactor practices — graph centrality is not a code-quality signal. | ||
|
|
||
| *🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.* | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| --- | ||
| name: graphify | ||
| description: 'Build and query knowledge graphs over a codebase using the graphifyy CLI and MCP server - Brought to you by microsoft/hve-core' | ||
| license: MIT | ||
| compatibility: 'Requires Python 3.10+, the graphifyy PyPI package (pinned), and an ANTHROPIC_API_KEY environment variable for the semantic-extraction stage. GitHub Copilot Chat uses the graphify MCP server for queries.' | ||
| metadata: | ||
| authors: "microsoft/hve-core" | ||
| spec_version: "1.0" | ||
| last_updated: "2026-04-29" | ||
| --- | ||
|
|
||
| # Graphify Skill | ||
|
|
||
| Use this skill to build a knowledge graph over a folder of source code, documentation, PDFs, and images, and to query that graph from GitHub Copilot Chat. The graph surfaces structural relationships, high-centrality nodes, and clusters that are not visible from grep alone. | ||
|
|
||
| This skill wraps the upstream [`graphifyy`](https://pypi.org/project/graphifyy/) PyPI package — it does not reimplement Graphify. Pin a single version of `graphifyy` so behaviour stays stable as the upstream project iterates. | ||
|
|
||
| ## Third-Party Attribution | ||
|
|
||
| Graphify is an MIT-licensed project by Safi Shamsi. See <https://github.com/safishamsi/graphify>. This skill orchestrates the upstream CLI and MCP server; no upstream source is vendored. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| | Requirement | Notes | | ||
| |---------------------|----------------------------------------------------------------------------------------------------------| | ||
| | Python 3.10+ | Match the upstream `graphifyy` minimum | | ||
| | `graphifyy` | Install with `pip install graphifyy==0.5.4`. The CLI binary is `graphify` (single `y`) | | ||
| | `ANTHROPIC_API_KEY` | Required for deep-mode semantic extraction. Each build issues parallel Claude calls — budget accordingly | | ||
| | MCP-capable client | GitHub Copilot Chat in VS Code 1.99+ reads `.vscode/mcp.json` and surfaces tools as `mcp_graphify_*` | ||
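The first two rows can be verified from Python before a build. The sketch below uses the standard library's `importlib.metadata` to read the installed distribution version; the function name and message wording are illustrative, and the pinned version is taken from the table.

```python
import sys
from importlib import metadata
from typing import List, Tuple


def check_environment(
    package: str = "graphifyy",
    pinned: str = "0.5.4",
    min_python: Tuple[int, int] = (3, 10),
) -> List[str]:
    """Return a list of human-readable problems; empty when all checks pass."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required")

    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        problems.append(
            f"{package} is not installed; run: pip install {package}=={pinned}"
        )
    else:
        if installed != pinned:
            problems.append(f"{package} {installed} is installed; expected {pinned}")
    return problems
```

The lookup uses the distribution name `graphifyy` (double y), not the CLI name, which is the same distinction the Troubleshooting table calls out.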
|
|
||
| Optional extras: | ||
|
|
||
| | Extra | Purpose | | ||
| |--------------------|--------------------------------------------------------------------| | ||
| | `graphifyy[video]` | Adds yt-dlp + Whisper for transcribing audio/video sources | | ||
| | Neo4j driver | Required only if pushing the graph to a Neo4j instance for queries | | ||
| | `obsidian` extras | Required only when exporting an Obsidian vault | | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ### 1. Build a graph | ||
|
|
||
| ```bash | ||
| graphify ./path/to/repo --mode deep --update | ||
| ``` | ||
|
|
||
| This writes outputs into `./path/to/repo/graphify-out/`: | ||
|
|
||
| | File | Purpose | | ||
| |-------------------|--------------------------------------------------------------------------| | ||
| | `graph.json` | Canonical graph data (nodes, edges, communities, audit tags) | | ||
| | `graph.html` | Interactive vis.js visualization | | ||
| | `GRAPH_REPORT.md` | God nodes, surprising connections, suggested questions, token-cost table | | ||
| | `wiki/` | One markdown article per community (agent-crawlable) | | ||
| | `cache/` | SHA256 incremental cache for `--update` and `--watch` | | ||
|
|
||
| The `graphify-out/` directory **must be gitignored** in target repositories. See [graphify.instructions.md](../../../instructions/experimental/graphify.instructions.md) for the canonical pattern. | ||
|
|
||
| ### 2. Register the MCP server with Copilot Chat | ||
|
|
||
| Add `graphify` to the workspace's `.vscode/mcp.json`: | ||
|
|
||
| ```json | ||
| { | ||
| "servers": { | ||
| "graphify": { | ||
| "command": "python3", | ||
| "args": ["-m", "graphify.serve", "graphify-out/graph.json"], | ||
| "type": "stdio" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Reload the VS Code window. GitHub Copilot Chat then surfaces these tools, named by its `mcp_<server>_<tool>` convention: | ||
|
|
||
| | Tool | Purpose | | ||
| |------------------------------|---------------------------------------------------------------| | ||
| | `mcp_graphify_query_graph` | Free-form natural-language query against graph + communities | | ||
| | `mcp_graphify_get_node` | Fetch a node by ID with metadata | | ||
| | `mcp_graphify_get_neighbors` | Direct neighbours of a node, optionally filtered by edge type | | ||
| | `mcp_graphify_get_community` | All nodes in a community (cluster) | | ||
| | `mcp_graphify_god_nodes` | High-centrality nodes (top connectors) | | ||
| | `mcp_graphify_graph_stats` | Counts, density, clustering coefficient | | ||
| | `mcp_graphify_shortest_path` | Shortest path between two nodes | | ||
|
|
||
| ### 3. Ask Copilot Chat structural questions | ||
|
|
||
| Once the MCP server is registered, the `@graph-researcher` agent (included in this collection) can answer questions like: | ||
|
|
||
| * "What other modules are implicitly affected if I change `auth_middleware.py`?" | ||
| * "Which agents in `.github/agents/` are most connected to security artifacts?" | ||
| * "Show me the shortest path between `feature_x` and `legacy_config_y`." | ||
| * "What communities exist in this repo, and which one is the auth code in?" | ||
|
|
||
| ## Build Modes | ||
|
|
||
| | Mode | Flag | When to use | | ||
| |--------------------|---------------|------------------------------------------------------------------------------------------------| | ||
| | Fast | `--mode fast` | AST/tree-sitter only. Deterministic, no LLM calls, no API key required. Use for CI smoke tests | | ||
| | Standard (default) | (no flag) | AST + selective semantic extraction. Reasonable cost, good coverage | | ||
| | Deep | `--mode deep` | Full parallel Claude semantic extraction. Highest fidelity, highest cost | | ||
| | Update | `--update` | Reuses the SHA256 cache; rebuilds only changed files. Safe to combine with any mode | | ||
| | Watch | `--watch` | Daemon mode; rebuilds on file change | | ||
|
|
||
| For HVE Core's primary use case (analysing the artifact library itself), prefer `--mode standard --update`. | ||
|
|
||
| ## Edge Audit Tags | ||
|
|
||
| Every edge in `graph.json` carries one of: | ||
|
|
||
| | Tag | Meaning | | ||
| |-------------|----------------------------------------------------------------------------------| | ||
| | `EXTRACTED` | Derived deterministically from AST/tree-sitter — high confidence | | ||
| | `INFERRED` | Derived from LLM semantic extraction — medium confidence, has `confidence_score` | | ||
| | `AMBIGUOUS` | Multiple candidate interpretations — low confidence, surface to user | | ||
|
|
||
| When the `@graph-researcher` agent answers a question, it must report the audit tag of the edges its conclusion rests on. Do not collapse `INFERRED` and `EXTRACTED` evidence in summaries. | ||
|
|
||
| ## Cost and Safety Notes | ||
|
|
||
| * Deep-mode builds dispatch many parallel Claude calls. A first build over a 10k-file repo can cost several USD; budget before enabling. | ||
| * Graphify uploads file *contents* to the Claude API during semantic extraction. Do not run deep mode against repositories containing secrets or content under upload restrictions. Use `--mode fast` (no LLM) for sensitive trees. | ||
| * The `cache/` directory under `graphify-out/` contains hashed file content snapshots. Treat it like build output — gitignore it. | ||
| * The MCP server reads `graph.json` from disk and exposes it over stdio. Do not commit `graph.json` to repos with private content. | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| | Symptom | Cause | Resolution | | ||
| |--------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------------------------| | ||
| | `graphify: command not found` | Wrong package name installed | The PyPI distribution name is `graphifyy` (double y); the CLI is `graphify` | | ||
| | `ANTHROPIC_API_KEY is not set` | Deep mode invoked without API credentials | Export the key, or downgrade to `--mode fast` | | ||
| | `graphify-out/graph.json not found` | MCP server started before first build | Run `graphify <path>` once before reloading the VS Code window | | ||
| | MCP tools not visible in Copilot Chat | `.vscode/mcp.json` missing or VS Code not reloaded | Confirm file path, then `Developer: Reload Window` | | ||
| | Graph contains no edges | Repository contains only file types Graphify cannot parse | Verify with `graphify <path> --dry-run` to see detected file types | | ||
| | Stale results after edits | Cache hit on changed files | Run with `--update` (recommended) or delete `graphify-out/cache/` | | ||
| | Many `INFERRED` edges have low confidence scores | Deep extraction over an unfamiliar codebase | Allow a larger budget for `--mode deep` builds, or treat low-confidence edges as hypotheses and verify them in the source | ||
|
|
||
| ## Version Pinning Policy | ||
|
|
||
| The upstream `graphifyy` project is on default branch `v5` with frequent releases. This skill pins to a specific version. Bumps to the pinned version follow the standard `feat(skills)` / `fix(skills)` commit flow and require: | ||
|
|
||
| 1. A re-run of the skill's regression tests in [tests/](tests/). | ||
| 2. A diff review of the upstream `CHANGELOG` for breaking tool-name or output-shape changes that would invalidate `graph-researcher` agent assumptions. | ||
| 3. A note in the version-bump commit body referencing the upstream tag. | ||
|
|
||
| > Brought to you by microsoft/hve-core | ||
|
|
||
| *🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.* | ||