3 changes: 3 additions & 0 deletions .cspell.json
@@ -70,8 +70,11 @@
"collab",
"easyops",
"figjam",
"graphify",
"graphifyy",
"hideable",
"learning",
"safishamsi",
"smol",
"subcat",
"whiteboarding",
84 changes: 84 additions & 0 deletions .github/agents/experimental/graph-researcher.agent.md
@@ -0,0 +1,84 @@
---
name: Graph Researcher
description: "Answers structural questions about a codebase by querying a graphify-built knowledge graph through MCP tools, returning evidence-tagged findings"
---

# Graph Researcher

Specialized researcher that answers questions about a codebase using a pre-built [graphify](../../skills/experimental/graphify/SKILL.md) knowledge graph. Use when the user asks structural questions ("what depends on X", "what cluster is Y in", "what connects A and B") that grep cannot answer cleanly.

This agent does not build the graph. It assumes `graphify-out/graph.json` exists in the workspace and the `graphify` MCP server is registered in `.vscode/mcp.json`. If either is missing, surface the gap to the user with the exact remediation step from the skill.

Read and follow the conventions in [graphify.instructions.md](../../instructions/experimental/graphify.instructions.md) for working-directory layout, audit-tag reporting, and confidence-score handling.

## Required Phases

### Phase 1: Verify the graph is available

Before answering any question:

1. Confirm `graphify-out/graph.json` exists at the workspace root.
2. Confirm at least one `mcp_graphify_*` tool is available in the current Copilot Chat session.
3. If either check fails, stop and report exactly one of:
   * "No `graphify-out/graph.json` found. Build the graph first: `graphify . --mode standard --update`."
   * "Graphify MCP server not registered. Add the snippet from the [graphify skill Quick Start](../../skills/experimental/graphify/SKILL.md#quick-start) to `.vscode/mcp.json` and reload the window."

Do not proceed with speculative answers when the graph is unavailable.
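
The two checks can be approximated outside the chat session. The following is a minimal sketch, assuming the workspace layout described above. `first_gap` is a hypothetical helper, and looking for a `graphify` entry in `.vscode/mcp.json` is only a stand-in for confirming that `mcp_graphify_*` tools are live, which the chat session alone can verify.

```python
import json
from pathlib import Path
from typing import Optional


def first_gap(workspace: Path) -> Optional[str]:
    """Return the first failed check as a message, or None when both pass."""
    if not (workspace / "graphify-out" / "graph.json").is_file():
        return "No graphify-out/graph.json found."

    # Assumption: a "graphify" key under "servers" means the server is registered.
    # A config that contains comments (JSONC) fails to parse and counts as missing.
    try:
        text = (workspace / ".vscode" / "mcp.json").read_text(encoding="utf-8")
        config = json.loads(text)
    except (OSError, ValueError):
        config = {}
    servers = config.get("servers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or "graphify" not in servers:
        return "Graphify MCP server not registered."
    return None
```

Returning only the first failure mirrors the "report exactly one of" rule in step 3.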

### Phase 2: Pick the right tool for the question

Map the user's question to the smallest sufficient MCP tool:

| Question shape | Tool | Why |
|------------------------------------------|------------------------------|-------------------------------------------------------------|
| "What is X?" / "Tell me about X" | `mcp_graphify_get_node` | Direct node fetch with metadata |
| "What does X depend on / call / import?" | `mcp_graphify_get_neighbors` | Edge-typed neighbour lookup |
| "What connects A and B?" | `mcp_graphify_shortest_path` | Returns path nodes + edge types |
| "What are the central pieces here?" | `mcp_graphify_god_nodes` | High-centrality top-N |
| "What clusters / themes exist?" | `mcp_graphify_graph_stats` | Communities, density, clustering coefficient |
| "What community contains X?" | `mcp_graphify_get_community` | Returns the cluster X belongs to |
| Open-ended exploration | `mcp_graphify_query_graph` | Use last; expensive and less deterministic than typed tools |

Prefer typed tools over `query_graph` when the question fits a typed shape. Reserve `query_graph` for genuine exploration.
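
The mapping in the table can be written down as a lookup with `query_graph` as the fallback. This is an illustrative sketch only: the `Shape` enum and `pick_tool` are hypothetical names, and classifying a free-text question into a shape is left to the agent.

```python
from enum import Enum


class Shape(Enum):
    DESCRIBE = "what is X"
    DEPENDENCIES = "what does X depend on"
    CONNECTION = "what connects A and B"
    CENTRAL = "what are the central pieces"
    CLUSTERS = "what clusters exist"
    MEMBERSHIP = "what community contains X"
    OPEN_ENDED = "open-ended exploration"


# Typed tools only; tool names are taken from the table above.
TOOL_FOR_SHAPE = {
    Shape.DESCRIBE: "mcp_graphify_get_node",
    Shape.DEPENDENCIES: "mcp_graphify_get_neighbors",
    Shape.CONNECTION: "mcp_graphify_shortest_path",
    Shape.CENTRAL: "mcp_graphify_god_nodes",
    Shape.CLUSTERS: "mcp_graphify_graph_stats",
    Shape.MEMBERSHIP: "mcp_graphify_get_community",
}


def pick_tool(shape: Shape) -> str:
    """Prefer a typed tool; fall back to the free-form query last."""
    return TOOL_FOR_SHAPE.get(shape, "mcp_graphify_query_graph")
```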

### Phase 3: Report findings with audit tags

Every answer must:

1. Name the MCP tool(s) used and the node IDs touched.
2. Tag each load-bearing edge in the answer with its audit tag (`EXTRACTED`, `INFERRED`, `AMBIGUOUS`) and confidence score where present.
3. Distinguish "the graph says" from "I conclude". The graph is evidence, not an oracle.
4. Surface `AMBIGUOUS` edges to the user as open questions, not facts.
5. When the graph contradicts the user's stated assumption, say so directly.

Example reporting shape:

```text
Tool: mcp_graphify_shortest_path(from="auth_middleware.py", to="legacy_session_store")
Path: auth_middleware.py -> session_manager.py -> legacy_session_store
Edge tags: EXTRACTED, INFERRED (confidence 0.71)
Conclusion: There is a 2-hop dependency, but the second hop is INFERRED — the
LLM saw a likely reference. Verify by reading session_manager.py.
```

### Phase 4: Suggest the next read

End every non-trivial answer with one suggested file or symbol the user should read next, picked from the graph result. This keeps the conversation grounded in the source rather than the graph.

## Required Protocol

1. Never invent edges or nodes. If a question cannot be answered from `mcp_graphify_*` tool output, say so and suggest a graph rebuild scope (e.g., "the docs aren't in this graph; rebuild with the `docs/` folder included").
2. Never trigger a graph rebuild yourself. Builds are user-initiated because they have cost and time implications.
3. Never claim a path or relationship without naming the MCP tool call that produced it.
4. When the user asks a question that grep would answer faster (e.g., "where is the string 'TODO'?"), say so and decline gracefully — this agent is for structural questions.
5. When `mcp_graphify_graph_stats` shows the graph has more than ~30% `INFERRED` or `AMBIGUOUS` edges, warn the user that conclusions are tentative and suggest re-running with `--mode deep`.
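
The threshold in rule 5 comes down to a simple ratio. The sketch below assumes the edge tags are already available as a flat list of strings; how they are obtained from `mcp_graphify_graph_stats` is not specified here, and `tentative_share` and `needs_warning` are hypothetical helpers.

```python
from collections import Counter

TENTATIVE_TAGS = ("INFERRED", "AMBIGUOUS")


def tentative_share(edge_tags):
    """Fraction of edges tagged INFERRED or AMBIGUOUS (0.0 for an empty graph)."""
    counts = Counter(edge_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[tag] for tag in TENTATIVE_TAGS) / total


def needs_warning(edge_tags, threshold=0.30):
    """True when the tentative share exceeds the (approximate) 30% threshold."""
    return tentative_share(edge_tags) > threshold
```

For example, a graph with six `EXTRACTED`, three `INFERRED`, and one `AMBIGUOUS` edge has a tentative share of 0.4 and would trigger the warning.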

## Out of Scope

* Building or rebuilding the graph (use the [graphify skill](../../skills/experimental/graphify/SKILL.md) Quick Start directly).
* Editing source files in response to graph findings (use a separate implementor agent).
* Semantic code review (use a code-review agent — graph centrality is not the same as code quality).

> Brought to you by microsoft/hve-core

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
68 changes: 68 additions & 0 deletions .github/instructions/experimental/graphify.instructions.md
@@ -0,0 +1,68 @@
---
description: "Conventions for working with graphify-out/ directories and graph-derived evidence"
applyTo: '**/graphify-out/**'
---

# Graphify Instructions

Conventions that apply whenever Copilot reads, writes, or references files under any `graphify-out/` directory. These instructions govern the [graphify skill](../../skills/experimental/graphify/) and the [Graph Researcher agent](../../agents/experimental/graph-researcher.agent.md).

## Working Directory

A `graphify-out/` directory is generated build output. It contains:

```text
graphify-out/
├── graph.json # Canonical graph data — read-only for agents
├── graph.html # Interactive visualization
├── GRAPH_REPORT.md # God nodes, surprising connections, suggested questions
├── wiki/ # Per-community markdown articles
└── cache/ # SHA256 incremental cache (do not edit)
```

Rules:

* Treat every file under `graphify-out/` as build output. Do not edit by hand.
* Add `graphify-out/` to the target repository's `.gitignore` before the first build.
* When reading `graph.json`, prefer MCP queries over direct JSON parsing. The MCP server applies confidence filtering and edge typing that raw JSON does not.
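
The `.gitignore` rule can be automated before a first build. This is a minimal sketch, assuming a plain line-by-line comparison is enough; `ensure_gitignored` is a hypothetical helper and does not interpret gitignore glob patterns such as `**/graphify-out`.

```python
from pathlib import Path


def ensure_gitignored(repo: Path, entry: str = "graphify-out/") -> bool:
    """Append entry to .gitignore if absent; return True when the file changed."""
    gitignore = repo / ".gitignore"
    lines = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    # Compare without leading or trailing slashes so "graphify-out" also matches.
    if any(line.strip().strip("/") == entry.strip("/") for line in lines):
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```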

## Audit-Tag Reporting

Every edge in a graphify graph carries an audit tag: `EXTRACTED`, `INFERRED`, or `AMBIGUOUS`. When summarizing graph findings:

| Tag | How to report |
|-------------|-----------------------------------------------------------------------------|
| `EXTRACTED` | State as fact: "X depends on Y." |
| `INFERRED` | Hedge with the confidence score: "X likely depends on Y (confidence 0.74)." |
| `AMBIGUOUS` | Surface as a question, not a claim: "It is unclear whether X depends on Y." |

Never collapse multiple audit tags into a single sentence without distinguishing them. A path through the graph that contains both `EXTRACTED` and `INFERRED` edges is an `INFERRED` path overall — the chain is only as strong as its weakest edge.
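
The reporting table and the weakest-edge rule can be sketched as two small functions. The names `report_edge` and `path_tag` are hypothetical, and ranking `AMBIGUOUS` below `INFERRED` is an assumption that extends the rule stated above.

```python
# Higher number = stronger evidence. The AMBIGUOUS ranking is an assumption.
STRENGTH = {"EXTRACTED": 2, "INFERRED": 1, "AMBIGUOUS": 0}


def path_tag(edge_tags):
    """A path is only as strong as its weakest edge (raises on an empty path)."""
    return min(edge_tags, key=STRENGTH.__getitem__)


def report_edge(src, dst, tag, confidence=None):
    """Phrase a single 'depends on' edge according to its audit tag."""
    if tag == "EXTRACTED":
        return f"{src} depends on {dst}."
    if tag == "INFERRED":
        suffix = f" (confidence {confidence:.2f})" if confidence is not None else ""
        return f"{src} likely depends on {dst}{suffix}."
    return f"It is unclear whether {src} depends on {dst}."
```

With these definitions, `path_tag(["EXTRACTED", "INFERRED"])` yields `"INFERRED"`, matching the rule that a mixed path is reported as inferred.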

## Reading GRAPH_REPORT.md

`GRAPH_REPORT.md` is a generated summary. When the user asks an open-ended exploration question ("what's interesting in this codebase?"), prefer reading `GRAPH_REPORT.md` over running multiple MCP queries — the report already contains god-node, surprising-connection, and suggested-question sections that are cheaper to read than to recompute.

If `GRAPH_REPORT.md` is older than the most recent commit on the default branch, recommend a `graphify . --update` rebuild before relying on it.
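
One way to approximate the staleness check is to compare the report's modification time with the latest commit timestamp. The sketch below rests on two assumptions: file mtime is a fair proxy for build time, and the default branch is called `main`. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def last_commit_timestamp(repo: Path, branch: str = "main") -> int:
    """Unix timestamp of the newest commit on the given branch (requires git)."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log", "-1", "--format=%ct", branch],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def is_report_stale(report_mtime: float, commit_ts: float) -> bool:
    """True when the report was written before the latest commit."""
    return report_mtime < commit_ts
```

A caller might pass `Path("graphify-out/GRAPH_REPORT.md").stat().st_mtime` as `report_mtime`.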

## Cost Discipline

The deep-mode rebuild path issues many parallel Claude API calls. Agents must not trigger rebuilds autonomously. When a user's question would benefit from a fresher graph, surface the recommendation and the approximate cost-shape ("roughly N files changed since last build, expect a partial rebuild") and let the user decide.

## Privacy and Upload Discipline

Graphify's deep-extraction stage uploads file *contents* to the Claude API. Before recommending a deep rebuild, check:

* Does the target tree contain secrets, credentials, or `.env` files that are not gitignored?
* Does the tree contain content under upload restrictions (customer data, regulated material)?

If either is true, recommend `--mode fast` (no LLM, AST-only) instead, and note the reduced fidelity in the conversation.
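
A rough pre-flight scan for the first question might look like the sketch below. The filename patterns are illustrative, not exhaustive. The helper is deliberately conservative: it flags matching files whether or not they are gitignored, and it cannot detect upload-restricted content, which still needs a human check.

```python
from fnmatch import fnmatch
from pathlib import Path

# Illustrative patterns only; extend for the target repository.
SENSITIVE_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa")


def find_sensitive_files(root: Path):
    """Return files under root whose names match a sensitive pattern."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and any(fnmatch(path.name, pat) for pat in SENSITIVE_PATTERNS)
    )


def safe_for_deep_mode(root: Path) -> bool:
    """False when any candidate secret file is present; recommend --mode fast then."""
    return not find_sensitive_files(root)
```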

## Out of Scope

These instructions do not cover:

* How to install or configure `graphifyy` — see the [skill](../../skills/experimental/graphify/SKILL.md).
* How to register the MCP server with Copilot Chat — see the skill Quick Start.
* General code-review or refactor practices — graph centrality is not a code-quality signal.

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
149 changes: 149 additions & 0 deletions .github/skills/experimental/graphify/SKILL.md
@@ -0,0 +1,149 @@
---
name: graphify
description: 'Build and query knowledge graphs over a codebase using the graphifyy CLI and MCP server - Brought to you by microsoft/hve-core'
license: MIT
compatibility: 'Requires Python 3.10+, the graphifyy PyPI package (pinned), and an ANTHROPIC_API_KEY environment variable for the semantic-extraction stage. GitHub Copilot Chat uses the graphify MCP server for queries.'
metadata:
  authors: "microsoft/hve-core"
  spec_version: "1.0"
  last_updated: "2026-04-29"
---

# Graphify Skill

Use this skill to build a knowledge graph over a folder of source code, documentation, PDFs, and images, and to query that graph from GitHub Copilot Chat. The graph surfaces structural relationships, high-centrality nodes, and clusters that are not visible from grep alone.

This skill wraps the upstream [`graphifyy`](https://pypi.org/project/graphifyy/) PyPI package — it does not reimplement Graphify. Pin a single version of `graphifyy` so behaviour stays stable as the upstream project iterates.

## Third-Party Attribution

Graphify is an MIT-licensed project by Safi Shamsi. See <https://github.com/safishamsi/graphify>. This skill orchestrates the upstream CLI and MCP server; no upstream source is vendored.

## Prerequisites

| Requirement | Notes |
|---------------------|----------------------------------------------------------------------------------------------------------|
| Python 3.10+ | Match the upstream `graphifyy` minimum |
| `graphifyy` | Install with `pip install graphifyy==0.5.4`. The CLI binary is `graphify` (single `y`) |
| `ANTHROPIC_API_KEY` | Required for deep-mode semantic extraction. Each build issues parallel Claude calls — budget accordingly |
| MCP-capable client | GitHub Copilot Chat in VS Code 1.97+ reads `.vscode/mcp.json` and surfaces tools as `mcp_graphify_*` |

Optional extras:

| Extra | Purpose |
|--------------------|--------------------------------------------------------------------|
| `graphifyy[video]` | Adds yt-dlp + Whisper for transcribing audio/video sources |
| Neo4j driver | Required only if pushing the graph to a Neo4j instance for queries |
| `obsidian` extras | Required only when exporting an Obsidian vault |

## Quick Start

### 1. Build a graph

```bash
graphify ./path/to/repo --mode deep --update
```

This writes outputs into `./path/to/repo/graphify-out/`:

| File | Purpose |
|-------------------|--------------------------------------------------------------------------|
| `graph.json` | Canonical graph data (nodes, edges, communities, audit tags) |
| `graph.html` | Interactive vis.js visualization |
| `GRAPH_REPORT.md` | God nodes, surprising connections, suggested questions, token-cost table |
| `wiki/` | One markdown article per community (agent-crawlable) |
| `cache/` | SHA256 incremental cache for `--update` and `--watch` |

The `graphify-out/` directory **must be gitignored** in target repositories. See [graphify.instructions.md](../../../instructions/experimental/graphify.instructions.md) for the canonical pattern.

### 2. Register the MCP server with Copilot Chat

Add `graphify` to the workspace's `.vscode/mcp.json`:

```json
{
  "servers": {
    "graphify": {
      "command": "python3",
      "args": ["-m", "graphify.serve", "graphify-out/graph.json"],
      "type": "stdio"
    }
  }
}
```
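
When `.vscode/mcp.json` already lists other servers, the entry should be merged rather than overwriting the file. The following is a minimal sketch of that merge; `register_graphify` is a hypothetical helper, and it assumes the existing file is strict JSON (a file with comments would fail to parse).

```python
import json
from pathlib import Path

# Server entry copied from the snippet above.
GRAPHIFY_SERVER = {
    "command": "python3",
    "args": ["-m", "graphify.serve", "graphify-out/graph.json"],
    "type": "stdio",
}


def register_graphify(workspace: Path) -> dict:
    """Add or replace the graphify entry while keeping other servers intact."""
    config_path = workspace / ".vscode" / "mcp.json"
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    config.setdefault("servers", {})["graphify"] = GRAPHIFY_SERVER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```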

Reload the VS Code window. Copilot Chat surfaces these tools (names follow GitHub Copilot Chat's `mcp_<server>_<tool>` convention):

| Tool | Purpose |
|------------------------------|---------------------------------------------------------------|
| `mcp_graphify_query_graph` | Free-form natural-language query against graph + communities |
| `mcp_graphify_get_node` | Fetch a node by ID with metadata |
| `mcp_graphify_get_neighbors` | Direct neighbours of a node, optionally filtered by edge type |
| `mcp_graphify_get_community` | All nodes in a community (cluster) |
| `mcp_graphify_god_nodes` | High-centrality nodes (top connectors) |
| `mcp_graphify_graph_stats` | Counts, density, clustering coefficient |
| `mcp_graphify_shortest_path` | Shortest path between two nodes |

### 3. Ask Copilot Chat structural questions

Once the MCP server is registered, the `@graph-researcher` agent (part of this collection) can answer questions like:

* "What other modules are implicitly affected if I change `auth_middleware.py`?"
* "Which agents in `.github/agents/` are most connected to security artifacts?"
* "Show me the shortest path between `feature_x` and `legacy_config_y`."
* "What communities exist in this repo, and which one is the auth code in?"

## Build Modes

| Mode | Flag | When to use |
|--------------------|---------------|------------------------------------------------------------------------------------------------|
| Fast | `--mode fast` | AST/tree-sitter only. Deterministic, no LLM calls, no API key required. Use for CI smoke tests |
| Standard (default) | (no flag) | AST + selective semantic extraction. Reasonable cost, good coverage |
| Deep | `--mode deep` | Full parallel Claude semantic extraction. Highest fidelity, highest cost |
| Update | `--update` | Reuses the SHA256 cache; rebuilds only changed files. Safe to combine with any mode |
| Watch | `--watch` | Daemon mode; rebuilds on file change |

For HVE Core's primary use case (analysing the artifact library itself), prefer `--mode standard --update`.
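
For scripted builds, the flags in the table can be assembled into an argument list. This sketch uses only the flags listed above and follows the table in passing no `--mode` flag for standard mode. `build_command` is a hypothetical helper, and it only builds the list without running anything.

```python
def build_command(path: str, mode: str = "standard", update: bool = False) -> list:
    """Assemble a graphify argv from the documented modes and --update flag."""
    if mode not in {"fast", "standard", "deep"}:
        raise ValueError(f"unknown mode: {mode}")
    argv = ["graphify", path]
    if mode != "standard":  # the table lists no flag for the default mode
        argv += ["--mode", mode]
    if update:
        argv.append("--update")
    return argv
```

The resulting list can be handed to `subprocess.run` by a user-initiated script; agents should not run it autonomously.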

## Edge Audit Tags

Every edge in `graph.json` carries one of:

| Tag | Meaning |
|-------------|----------------------------------------------------------------------------------|
| `EXTRACTED` | Derived deterministically from AST/tree-sitter — high confidence |
| `INFERRED` | Derived from LLM semantic extraction — medium confidence, has `confidence_score` |
| `AMBIGUOUS` | Multiple candidate interpretations — low confidence, surface to user |

When the `@graph-researcher` agent answers a question, it must report the audit tag of the edges its conclusion rests on. Do not collapse `INFERRED` and `EXTRACTED` evidence in summaries.

## Cost and Safety Notes

* Deep-mode builds dispatch many parallel Claude calls. A first build over a 10k-file repo can cost several USD; budget before enabling.
* Graphify uploads file *contents* to the Claude API during semantic extraction. Do not run deep mode against repositories containing secrets or content under upload restrictions. Use `--mode fast` (no LLM) for sensitive trees.
* The `cache/` directory under `graphify-out/` contains hashed file content snapshots. Treat it like build output — gitignore it.
* The MCP server reads `graph.json` from disk and exposes it over stdio. Do not commit `graph.json` to repos with private content.

## Troubleshooting

| Symptom | Cause | Resolution |
|--------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------------------------|
| `graphify: command not found` | Wrong package name installed | The PyPI distribution name is `graphifyy` (double y); the CLI is `graphify` |
| `ANTHROPIC_API_KEY is not set` | Deep mode invoked without API credentials | Export the key, or downgrade to `--mode fast` |
| `graphify-out/graph.json not found` | MCP server started before first build | Run `graphify <path>` once before reloading the VS Code window |
| MCP tools not visible in Copilot Chat | `.vscode/mcp.json` missing or VS Code not reloaded | Confirm file path, then `Developer: Reload Window` |
| Graph contains no edges | Repository contains only file types Graphify cannot parse | Verify with `graphify <path> --dry-run` to see detected file types |
| Stale results after edits | Cache hit on changed files | Run with `--update` (recommended) or delete `graphify-out/cache/` |
| Edge `INFERRED` confidence is low for many edges | Deep extraction over an unfamiliar codebase | Increase `--mode deep` budget or treat low-confidence edges as hypotheses |

## Version Pinning Policy

The upstream `graphifyy` project uses `v5` as its default branch and releases frequently. This skill pins to a specific version. Bumps to the pinned version follow the standard `feat(skills)` / `fix(skills)` commit flow and require:

1. A re-run of the skill's regression tests in [tests/](tests/).
2. A diff review of the upstream `CHANGELOG` for breaking tool-name or output-shape changes that would invalidate `graph-researcher` agent assumptions.
3. A note in the version-bump commit body referencing the upstream tag.
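
A regression test or pre-flight script could confirm that the installed package matches the pin. The sketch below uses the standard-library `importlib.metadata`; the pin value is taken from the Prerequisites table, and `installed_version` and `pin_status` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED_VERSION = "0.5.4"  # pin from the Prerequisites table


def installed_version(dist: str = "graphifyy") -> Optional[str]:
    """Installed version of the distribution, or None when it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def pin_status(installed: Optional[str], pinned: str = PINNED_VERSION) -> str:
    """Classify the environment as 'ok', 'mismatch', or 'missing'."""
    if installed is None:
        return "missing"
    return "ok" if installed == pinned else "mismatch"
```

Calling `pin_status(installed_version())` then reports the state of the current environment.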

> Brought to you by microsoft/hve-core

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*