Merged
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@ memex/__pycache__/
.memex/
.env
dist/
tests/__pycache__/
tests/__pycache__/
.mcp.json
13 changes: 9 additions & 4 deletions CLAUDE.md
@@ -50,11 +50,12 @@ memex/
│ ├── writer.py # Renders KnowledgeRecord to .md and commits it
│ ├── action.py # GitHub Action entry point — reads env vars, orchestrates
│ ├── adr.py # ADR parser — find_adr_files, parse_adr, index_adrs
│ ├── cli.py # Click CLI — `memex configure/init/update/index/query`
│ ├── cli.py # Click CLI — `memex configure/init/update/index/query/serve`
│ ├── config.py # API key resolution — load_api_key, save_api_key, CONFIG_FILE
│ ├── nudge.py # Low-confidence nudge comment — should_nudge, post_nudge_comment
│ ├── init.py # `memex init` — bootstrap from repo scan
│ └── update.py # `memex update` — incremental extraction from git history
│ ├── update.py # `memex update` — incremental extraction from git history
│ └── mcp_server.py # MCP server — memex_query, memex_get_decision, memex_list_recent
├── tests/
│ └── ...
├── pyproject.toml
@@ -86,14 +87,16 @@ The index cache lives at:
| Structured output | `instructor` + `pydantic` | Guaranteed schema compliance, auto-retry |
| Vector search | `numpy` cosine similarity over `index.json` | No database needed at MVP scale (<5k records) |
| CLI | `click` | Standard, simple |
| MCP server | `mcp` (official SDK, `mcp.server.fastmcp`) | Exposes knowledge tools to AI coding agents via stdio |
| GitHub API | `gh` CLI in Actions, `PyGithub` if needed in Python | Already available in Actions runner |
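The "no database" choice above works because cosine similarity over a JSON index is a few lines of numpy. A minimal sketch of that search, assuming a hypothetical index layout (the field names and paths here are illustrative, not the actual `index.json` format):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical index shape, loaded from index.json in the real tool.
index = {
    "knowledge/decisions/2024-11-14-migrate-billing.md": {
        "title": "Migrate billing to Stripe",
        "embedding": [0.1, 0.9, 0.0],
    },
}

query_embedding = [0.1, 0.8, 0.1]
scored = sorted(
    ((cosine_similarity(query_embedding, e["embedding"]), path)
     for path, e in index.items()),
    reverse=True,
)
best_score, best_path = scored[0]
```

At MVP scale (<5k records) this brute-force scan is milliseconds of work, which is why no vector database is needed.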

**There is no database.** Knowledge records are markdown files in the repo.
The index is a JSON file. Do not introduce SQLite, PostgreSQL, Redis, or any other
persistence layer in Phase 1.

**There is no server.** The Action runs in GitHub's infrastructure. The CLI runs
locally. Do not introduce FastAPI, Flask, or any web framework in Phase 1.
**There is no HTTP server.** The Action runs in GitHub's infrastructure. The CLI runs
locally. The MCP server uses stdio transport (subprocess-based, no network port).
Do not introduce FastAPI, Flask, or any web framework in Phase 1.

**One API key.** Everything goes through `ANTHROPIC_API_KEY`. Do not introduce
OpenAI, Cohere, or any other LLM provider dependency.
@@ -213,6 +216,7 @@ memex index # embed all .md files in knowledge/,
memex query "why did we move off MongoDB" # cosine similarity search, top 3 results
memex query --min-score 0.5 "..." # broaden search by lowering the relevance threshold
memex query --expand "vague question" # rewrite query via Claude Haiku before embedding
memex serve # start the MCP server (stdio) for AI coding agents
```

`memex index` should be incremental — only embed files whose content has changed since
@@ -288,6 +292,7 @@ When you make any of the changes below, update CLAUDE.md **in the same commit**:
|---|---|
| New/removed/renamed `.py` in `memex/` | File structure section |
| New/removed `@cli.command()` in `cli.py` | CLI behaviour section |
| New/removed `@mcp.tool()` in `mcp_server.py` | File structure section |
| `model=` string in `extractor.py` or `init.py` | Tech stack table + decisions section |
| New dependency in `pyproject.toml` | Tech stack table |
| New `os.environ["VAR"]` in `action.py` | Environment variables table |
37 changes: 37 additions & 0 deletions README.md
@@ -275,10 +275,47 @@ memex query QUESTION Semantic search over indexed knowledge
--top N Return top N results (default: 3)
--min-score F Hide results below this similarity score (default: 0.70)
--expand Rewrite query via Claude Haiku before searching
memex serve Start the MCP server (stdio) for AI coding agents
```

---

## MCP server (AI agent integration)

`memex serve` exposes your knowledge index as an [MCP](https://modelcontextprotocol.io) server, making it available to AI coding agents (Claude Code, Cursor, Copilot, Windsurf) as a set of callable tools. The agent can query your team's decisions automatically before suggesting architectural changes — without you having to prompt it.

Three tools are exposed:

| Tool | Description |
|---|---|
| `memex_query(question, top, min_score)` | Semantic search — same as `memex query` but callable by the agent; default `min_score` is 0.5 so borderline matches are surfaced with their score |
| `memex_get_decision(id)` | Fetch the full text of a specific record by file path or title slug |
| `memex_list_recent(domain, limit)` | List recent decisions, optionally filtered by a domain keyword (e.g. `"auth"`, `"database"`) |
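Under the hood, the agent invokes these tools via MCP's JSON-RPC `tools/call` method over stdio; the SDK handles the framing, so you never write this by hand. A rough sketch of the request an agent sends (the argument values are illustrative):

```python
import json

# Approximate shape of an MCP tools/call request, per the MCP spec.
# The stdio envelope and response handling are managed by the MCP SDK.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memex_query",
        "arguments": {"question": "why did we move off MongoDB", "top": 3},
    },
}
wire = json.dumps(request)
```

The server replies with the tool's formatted text result, which the agent folds into its reasoning before proposing changes.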

### Setup

Run `memex index` first so the server has an index to query.

Create `.mcp.json` in your repo root (this file is git-ignored — paths are machine-specific):

```json
{
"mcpServers": {
"memex": {
"command": "/path/to/python3.12",
"args": ["-m", "memex.mcp_server"],
"cwd": "/path/to/your/repo"
}
}
}
```

Replace `/path/to/python3.12` with the Python 3.12+ interpreter that has `memex-oss` installed (`which python3.12` or `which python3`), and `/path/to/your/repo` with the absolute path to your repo.

Reload your editor and the three tools will appear in the agent's tool list automatically.

---

## Querying your knowledge

`memex query` runs a local semantic search — no data leaves your machine.
20 changes: 20 additions & 0 deletions memex/cli.py
@@ -370,6 +370,26 @@ def index(force, include_adrs):
click.echo(f"Done. {len(existing)} records total.")


@cli.command()
def serve():
"""Start the MCP server (stdio transport) for AI coding agents.

Exposes three tools: memex_query, memex_get_decision, memex_list_recent.

\b
Configure in .mcp.json or claude_desktop_config.json:
{
"mcpServers": {
"memex": {"command": "memex", "args": ["serve"], "cwd": "/your/repo"}
}
}

Run `memex index` first so the server has an index to query.
"""
from .mcp_server import mcp
mcp.run()


@cli.command()
@click.argument("query", nargs=-1)
@click.option("--top", default=3, help="Number of results")
160 changes: 160 additions & 0 deletions memex/mcp_server.py
@@ -0,0 +1,160 @@
"""MCP server for Memex β€” exposes institutional knowledge to AI coding agents.

Three tools:
memex_query β€” semantic search over indexed decisions
memex_get_decision β€” fetch a specific record by path/slug
memex_list_recent β€” browse recent decisions, optionally filtered by domain

Start with: memex serve
Configure in .mcp.json / claude_desktop_config.json:
{"mcpServers": {"memex": {"command": "memex", "args": ["serve"], "cwd": "/your/repo"}}}
"""
from __future__ import annotations

from pathlib import Path

from mcp.server.fastmcp import FastMCP

from .cli import (
KNOWLEDGE_DIR,
cosine_similarity,
embed,
extract_confidence,
load_index,
)

mcp = FastMCP("memex")

_NO_INDEX = (
"No index found. Run `memex index` first to embed your knowledge records."
)


@mcp.tool()
def memex_query(question: str, top: int = 3, min_score: float = 0.5) -> str:
"""Semantic search over indexed architectural decisions.

Returns the most relevant decisions matching the question.
Low-confidence records are included with their confidence score surfaced
so you can hedge your answer. Default min_score is 0.5 (lower than the CLI
default of 0.7 — agents benefit from seeing borderline matches).
Run `memex index` first if the index is empty.
"""
index = load_index()
if not index:
return _NO_INDEX

[query_embedding] = embed([question])

scored = [
(cosine_similarity(query_embedding, entry["embedding"]), entry)
for entry in index.values()
]
scored.sort(key=lambda x: x[0], reverse=True)
results = [(s, e) for s, e in scored if s >= min_score][:top]

if not results:
return (
f"No results above similarity threshold {min_score:.2f}. "
"Try a lower threshold or rephrase your question."
)

lines = [f"Results for: {question}\n"]
for i, (score, entry) in enumerate(results, 1):
confidence = entry.get("confidence", 1.0)
if confidence < 0.65:
conf_note = " ⚠️ limited rationale"
elif confidence < 0.80:
conf_note = " πŸ’‘ partial rationale"
else:
conf_note = ""

lines.append(f"{i}. {entry['title']} [score {score:.2f}{conf_note}]")
if entry.get("excerpt"):
lines.append(f" {entry['excerpt'][:300]}")
lines.append(f" {entry['path']}\n")

return "\n".join(lines)


@mcp.tool()
def memex_get_decision(id: str) -> str:
"""Fetch the full text of a specific decision record by path or title slug.

`id` can be an exact file path
(e.g. 'knowledge/decisions/2024-11-14-migrate-billing.md'),
a filename fragment (e.g. 'migrate-billing'), or any partial path suffix.
Returns the raw markdown including frontmatter.
"""
# Exact path first
exact = Path(id)
if exact.exists():
return exact.read_text()

# Match against indexed paths
index = load_index()
for path_str in index:
if id in path_str:
p = Path(path_str)
if p.exists():
return p.read_text()

# Fallback: glob knowledge dir directly (works even without an index)
matches = list(KNOWLEDGE_DIR.rglob(f"*{id}*.md"))
if matches:
return matches[0].read_text()

return (
f"No record found matching '{id}'. "
"Use memex_query to search by topic, then pass the returned path here."
)


@mcp.tool()
def memex_list_recent(domain: str = "", limit: int = 10) -> str:
"""List recent architectural decisions, optionally filtered by domain keyword.

`domain` is matched case-insensitively against each record's title and excerpt
(e.g. 'auth', 'database', 'api', 'infra'). Returns up to `limit` records
sorted most-recent first (by filename date).
"""
index = load_index()
if not index:
return _NO_INDEX

entries = list(index.values())

if domain:
kw = domain.lower()
entries = [
e for e in entries
if kw in e.get("title", "").lower() or kw in e.get("excerpt", "").lower()
]
if not entries:
return f"No decisions found matching domain '{domain}'."

def _date_key(entry: dict) -> str:
stem = Path(entry["path"]).stem
return stem[:10] if len(stem) >= 10 else "0000-00-00"

entries.sort(key=_date_key, reverse=True)
entries = entries[:limit]

header = "Recent decisions"
if domain:
header += f" in '{domain}'"
header += f" ({len(entries)} shown):\n"

lines = [header]
for entry in entries:
confidence = entry.get("confidence", 1.0)
conf_note = " ⚠️" if confidence < 0.65 else ""
date_str = _date_key(entry)
lines.append(f" {date_str} {entry['title']}{conf_note}")
lines.append(f" {entry['path']}\n")

return "\n".join(lines)


if __name__ == "__main__":
mcp.run()
1 change: 1 addition & 0 deletions pyproject.toml
@@ -25,6 +25,7 @@ dependencies = [
"pydantic>=2.8.0",
"numpy>=1.26.0",
"click>=8.1.0",
"mcp>=1.0.0",
]

[project.optional-dependencies]