This repository is a markdown-first knowledge base for articles, papers, repos, datasets, and derived research notes about building with AI.
It supports two operating modes:
- personal/local use through filesystem workflows and a local stdio MCP server
- shared/team use through git, pull requests, and a hosted Streamable HTTP MCP server
This repo treats AI research as a durable system rather than a chat transcript:
- `raw/` stores source material and provenance
- `wiki/` stores synthesized understanding
- `.kb/` (or `KB_CACHE_DIR`) stores a rebuildable retrieval index
- MCP exposes the KB to Codex, Claude Code, and other compatible clients
- `raw/articles/`: source notes and imported material. Source notes include a canonical repo-relative `path` in frontmatter for agent-readable provenance.
- `raw/images/`: local images related to source material
- `wiki/concepts/`: concept pages synthesized across many sources
- `wiki/summaries/`: multi-source summaries, memos, and reports
- `wiki/index/`: maps of content and navigation pages
- `bin/`: executable entrypoints like `search.ts` and `mcp-http.ts`
- `src/core/`: indexing, ingest, lint, refresh, and search internals
- `src/mcp/`: MCP server construction and stdio transport
- `src/http/`: HTTP server, config, and route handlers
- `__tests__/`: test suites organized by subsystem
- `.kb/`: derived search index built from markdown files
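As a sketch, a source note in `raw/articles/` might carry frontmatter like the following. Only `path`, `status`, and `superseded_by` are mentioned by this repo's docs; the other fields here are illustrative, not the actual schema:

```yaml
---
# Illustrative frontmatter for a source note (field set is an assumption).
title: Managed agents and context engineering
path: raw/articles/managed-agents.md   # canonical repo-relative path for provenance
status: superseded                     # set when a newer canonical note replaces this one
superseded_by: wiki/concepts/managed-agents.md
---
```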
- Ingest source material into `raw/articles/`.
- Synthesize durable notes into `wiki/`.
- Rebuild the KB index with `bun run kb:refresh`.
- Retrieve relevant notes with search or MCP tools.
- Keep the current canonical view active and mark older contradictory notes `status: superseded`.
Install dependencies:

```sh
bun install
```

Refresh the KB:

```sh
bun run kb:refresh
```

Search by topic:

```sh
bun run kb:search --query "managed agents context engineering"
```

Search from active file context:

```sh
bun run kb:search --file /absolute/path/to/file.ts
```

Summarize repeated remote HTTP MCP search observations:

```sh
bun run kb:search-report
```

Run the local stdio MCP server:

```sh
bun run kb:mcp
```

Run the hosted Streamable HTTP MCP server:

```sh
bun run kb:mcp:http
```

Containerize it:

```sh
docker build -t ai-research-kb .
```
Railway is the recommended first host for the shared/team HTTP server:
- add this repo as a Railway service
- let Railway build from the included `Dockerfile`
- keep `KB_STATEFUL_SESSIONS=false`
- keep `KB_ENABLE_WRITES=false`
- expose the service with Railway Public Networking
Current shared MCP endpoint:
https://kb-production-1c43.up.railway.app/mcp
Attach it in Codex:

```sh
codex mcp add ai-research-kb-shared --url https://kb-production-1c43.up.railway.app/mcp
```

Attach it in Claude Code:

```sh
claude mcp add --transport http --scope user ai-research-kb-shared https://kb-production-1c43.up.railway.app/mcp
```

Sanity-check the deployed service:

- root: https://kb-production-1c43.up.railway.app/
- health: https://kb-production-1c43.up.railway.app/health
- MCP endpoint: https://kb-production-1c43.up.railway.app/mcp
For hosted stateless deployments, `POST /mcp` is the normal MCP path. A plain `GET /mcp` may return `405 Method Not Allowed`, which is expected when standalone SSE is disabled.
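As a minimal sketch of why that GET behavior is expected: in stateless mode the server only serves the POST leg of Streamable HTTP. The handler below is a simplified illustration, not the repo's actual routing code:

```typescript
// Hypothetical sketch of stateless /mcp method handling (not the repo's real handler).
type McpResponse = { status: number; body: string };

function handleMcpRequest(method: string): McpResponse {
  if (method === "POST") {
    // Normal Streamable HTTP MCP path: the client POSTs a JSON-RPC message.
    return { status: 200, body: "{}" };
  }
  // GET would open a standalone SSE stream; with stateless sessions that is
  // disabled, so anything other than POST is refused.
  return { status: 405, body: "Method Not Allowed" };
}
```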
Hosted HTTP MCP search telemetry is enabled by default so the team can learn from real retrieval traffic. The observation log keeps query text, query-expansion diagnostics, and top-result diagnostics. For `kb_search_file`, it keeps the `contextLabel` and text size only, not the raw pasted file text. Review the log with `bun run kb:search-report`.
Optional hosted search telemetry environment variables:
- `KB_SEARCH_TELEMETRY_ENABLED=false` disables HTTP MCP search observations
- `KB_SEARCH_OBSERVATION_LOG_PATH=/absolute/path/to/search-observations.ndjson` overrides the default log path
- `KB_SEARCH_TELEMETRY_SALT=...` enables privacy-safe client hashing so repeated bad queries can be grouped by caller without storing raw IPs
- `KB_ADMIN_TOKEN=...` enables protected admin telemetry endpoints for remote inspection
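If you need to inspect the NDJSON log by hand, a rough sketch looks like the following. The field names (`tool`, `query`) are assumptions about the record shape, not the repo's documented schema; prefer `bun run kb:search-report` for real analysis:

```typescript
// Hypothetical NDJSON observation record; real field names may differ.
interface SearchObservation {
  tool: string;
  query?: string;
}

// Count observations per tool from raw NDJSON text, skipping blank lines.
function summarizeObservations(ndjson: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const obs = JSON.parse(line) as SearchObservation;
    counts.set(obs.tool, (counts.get(obs.tool) ?? 0) + 1);
  }
  return counts;
}
```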
For shared deployment details, see docs/mcp-deployment.md. For the Railway-specific path, see docs/railway.md.
If `KB_ADMIN_TOKEN` is set, you can inspect remote telemetry without shell access to the host:

```sh
curl -H "Authorization: Bearer $KB_ADMIN_TOKEN" \
  "https://kb.example.com/admin/search-observations/report?format=text"

curl -H "Authorization: Bearer $KB_ADMIN_TOKEN" \
  "https://kb.example.com/admin/search-observations/export?format=json&tool=kb_search&limit=200"
```

Tools:
- `kb_build_context`
- `kb_find_gaps`
- `kb_list_catalog`
- `kb_make_handoff`
- `kb_search`
- `kb_search_file`
- `kb_read_note`
- `kb_refresh`
- `kb_trace_claim`
- `kb_ingest`
Resources:
- `kb://stats`
- `kb://catalog`
- `kb://catalog/page/{page}`
`kb://catalog` is an overview resource, not a full corpus dump. Use `kb_list_catalog` or `kb://catalog/page/{page}` when you need to browse at scale.
The higher-level tools are intentionally grounded in this repo's own wiki model:
- `kb_build_context` compiles a task-specific context pack from concept pages and source notes. Pass `compact=true` when you want a lighter-weight context pack that is less likely to flood client context windows.
- `kb_find_gaps` runs wiki health checks for orphan notes, thin concepts, uncovered tags, and unreviewed ingests.
- `kb_trace_claim` traces a claim through synthesis notes and primary source paths.
- `kb_make_handoff` turns the current wiki view into a reusable long-running-agent handoff artifact.
- `kb_search_file` accepts either a host-local `filePath` or raw `text`, which makes it usable both locally and over shared HTTP MCP.
By default, search and catalog browsing exclude notes marked `status: superseded`. Include them only when you intentionally want historical contradictions.
Run the full repo quality check before opening a PR:
```sh
bun run check
```
This runs:
- Biome formatting and lint checks
- TypeScript typechecking
- Bun tests from `__tests__/`
- KB refresh and linting
- Repo workflow and authoring rules: AGENTS.md
- Claude Code usage: CLAUDE.md
- Contributing guide: CONTRIBUTING.md
- Security policy: SECURITY.md
- Code of conduct: CODE_OF_CONDUCT.md
- MCP setup and commands: docs/mcp-server.md
- Shared deployment: docs/mcp-deployment.md
- Cross-repo access: docs/external-agent-access.md
- Release checklist: docs/release-checklist.md
If the KB contains contradictory knowledge, keep the current canonical note active. Preserve older material for provenance, but mark it `status: superseded` and point `superseded_by` at the current note so it drops out of default retrieval.
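The default-retrieval rule above can be sketched like this. It is a simplified model of the behavior, not the repo's actual index code (which lives in `src/core/`):

```typescript
// Simplified note model for illustrating superseded filtering; not the repo's real types.
interface Note {
  path: string;
  status?: string;        // e.g. "superseded"
  superseded_by?: string; // repo-relative path of the current canonical note
}

// Default retrieval view: drop superseded notes unless history is explicitly requested.
function retrievable(notes: Note[], includeSuperseded = false): Note[] {
  return notes.filter((n) => includeSuperseded || n.status !== "superseded");
}
```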