diff --git a/README.md b/README.md
index 973eb39d..28c6357c 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,9 @@
 [![CI](https://github.com/m1rl0k/Context-Engine/actions/workflows/ci.yml/badge.svg)](https://github.com/m1rl0k/Context-Engine/actions/workflows/ci.yml)
+**Documentation:** README · [Configuration](docs/CONFIGURATION.md) · [IDE Clients](docs/IDE_CLIENTS.md) · [MCP API](docs/MCP_API.md) · [ctx CLI](docs/CTX_CLI.md) · [Memory Guide](docs/MEMORY_GUIDE.md) · [Architecture](docs/ARCHITECTURE.md) · [Multi-Repo](docs/MULTI_REPO_COLLECTIONS.md) · [Kubernetes](deploy/kubernetes/README.md) · [VS Code Extension](docs/vscode-extension.md) · [Troubleshooting](docs/TROUBLESHOOTING.md) · [Development](docs/DEVELOPMENT.md)
+
+---
+
 ## Context-Engine at a Glance

 Context-Engine is a plug-and-play MCP retrieval stack that unifies code indexing, hybrid search, and optional llama.cpp decoding so product teams can ship context-aware agents in minutes, not weeks.

@@ -9,26 +13,36 @@ Context-Engine is a plug-and-play MCP retrieval stack

 **Key differentiators**
-- One-command bring-up delivers dual SSE/RMCP endpoints, seeded Qdrant, and live watch/reindex loops for fast local validation.
-- ReFRAG-inspired micro-chunking, token budgeting, and gate-first filtering surface precise spans while keeping prompts lean.
-- Shared memory/indexer schema and reranker tooling make it easy to mix dense, lexical, and semantic signals without bespoke glue code.
-- **NEW: Performance optimizations** including connection pooling, intelligent caching, request deduplication, and async subprocess management that cut redundant calls and smooth spikes under load.
-- Operational playbooks (prune, warm, health, cache) plus rich tests give teams confidence to take the stack from laptop to production.
+- One-command bring-up delivers dual SSE/RMCP endpoints, seeded Qdrant, and live watch/reindex loops
+- ReFRAG-inspired micro-chunking, token budgeting, and gate-first filtering surface precise spans
+- Shared memory/indexer schema and reranker tooling for dense, lexical, and semantic signals
+- **ctx CLI prompt enhancer** with multi-pass unicorn mode for code-grounded prompt rewriting
+- VS Code extension with Prompt+ button and automatic workspace sync
+- Kubernetes deployment with Kustomize for remote/scalable setups
+- Performance optimizations: connection pooling, caching, deduplication, async subprocess management

 **Built for**
-- AI platform and IDE tooling teams that need an MCP-compliant context layer without rebuilding indexing, embeddings, or retrieval heuristics.
-- DevEx and documentation groups standing up internal assistants that must ingest large or fast-changing codebases with minimal babysitting.
+- AI platform and IDE tooling teams needing an MCP-compliant context layer
+- DevEx groups standing up internal assistants for large or fast-changing codebases

-**Solves**
-- Slow agent onboarding caused by fractured infra—ship a consistent stack for memory, search, and decoding under one config.
-- Context drift in monorepos—automatic micro-chunking and watcher-driven reindexing keep embeddings aligned with reality.
-- Fragmented client compatibility—serve both legacy SSE and modern HTTP RMCP clients from the same deployment.
-- **NEW: Performance relief** via intelligent caching, connection pooling, and async I/O patterns that eliminate redundant processing.
+## Supported Clients
-## Context-Engine
+
+| Client | Transport | Notes |
+|--------|-----------|-------|
+| Roo | SSE/RMCP | Both SSE and RMCP connections |
+| Cline | SSE/RMCP | Both SSE and RMCP connections |
+| Windsurf | SSE/RMCP | Both SSE and RMCP connections |
+| Zed | SSE | Uses mcp-remote bridge |
+| Kiro | SSE | Uses mcp-remote bridge |
+| Qodo | RMCP | Direct HTTP endpoints |
+| OpenAI Codex | RMCP | TOML config |
+| Augment | SSE | Simple JSON configs |
+| AmpCode | SSE | Simple URL for SSE endpoints |
+| Claude Code CLI | SSE | Simple JSON configs |
+
+> **See [docs/IDE_CLIENTS.md](docs/IDE_CLIENTS.md) for detailed configuration examples.**

-## Context-Engine Quickstart (5 minutes)
+## Quickstart (5 minutes)

 This gets you from zero to “search works” in under five minutes.

@@ -67,85 +81,25 @@ HOST_INDEX_PATH=. COLLECTION_NAME=codebase docker compose run --rm indexer --roo
 - Ports: 8000/8001 (/sse) and 8002/8003 (/mcp)
 - Command: `INDEX_MICRO_CHUNKS=1 MAX_MICRO_CHUNKS_PER_FILE=200 make reset-dev-dual`
-### Environment Configuration
-
-**Default Setup:**
-- The repository includes `.env.example` with sensible defaults for local development
-- On first run, copy it to `.env`: `cp .env.example .env`
-- The `make reset-dev*` targets will use your `.env` settings automatically
-
-**Key Configuration Files:**
-- `.env` — Your local environment variables (gitignored, safe to customize)
-- `.env.example` — Template with documented defaults (committed to repo)
-- `docker-compose.yml` — Service definitions that read from `.env`
-
-**Recommended Customizations:**
-
-1. **Enable micro-chunking** (better retrieval quality):
-   ```bash
-   INDEX_MICRO_CHUNKS=1
-   MAX_MICRO_CHUNKS_PER_FILE=200
-   ```
-
-2. **Enable decoder for Q&A** (context_answer tool):
-   ```bash
-   REFRAG_DECODER=1         # Enable decoder (default: 1)
-   REFRAG_RUNTIME=llamacpp  # Use llama.cpp (default) or glm
-   ```
+### Environment Setup
-3. **GPU acceleration** (Apple Silicon Metal):
-   ```bash
-   # Option A: Use the toggle script (recommended)
-   scripts/gpu_toggle.sh gpu
-   scripts/gpu_toggle.sh start
-
-   # Option B: Manual .env settings
-   USE_GPU_DECODER=1
-   LLAMACPP_URL=http://host.docker.internal:8081
-   LLAMACPP_GPU_LAYERS=32  # or -1 for all layers
-   ```
-
-4. **Alternative: GLM API** (instead of local llama.cpp):
-   ```bash
-   REFRAG_RUNTIME=glm
-   GLM_API_KEY=your-api-key-here
-   GLM_MODEL=glm-4.6  # Optional, defaults to glm-4.6
-   ```
-
-5. **Collection name** (unified by default):
-   ```bash
-   COLLECTION_NAME=codebase  # Default: single unified collection for all code
-   # Only change this if you need isolated collections per project
-   ```
-
-**After changing `.env`:**
-- Restart services: `docker compose restart mcp_indexer mcp_indexer_http`
-- For indexing changes: `make reindex` or `make reindex-hard`
-- For decoder changes: `docker compose up -d --force-recreate llamacpp` (or restart native server)
-
-### Switch decoder model (llama.cpp)
-- Default tiny model: Granite 4.0 Micro (Q4_K_M GGUF)
-- Change the model by overriding Make vars (downloads to ./models/model.gguf):
 ```bash
-LLAMACPP_MODEL_URL="https://huggingface.co/ORG/MODEL/resolve/main/model.gguf" \
-  INDEX_MICRO_CHUNKS=1 MAX_MICRO_CHUNKS_PER_FILE=200 make reset-dev-dual
+cp .env.example .env  # Copy template on first run
 ```
-- Want GPU acceleration? Set `LLAMACPP_USE_GPU=1` (optionally `LLAMACPP_GPU_LAYERS=-1`) in your `.env` before `docker compose up`, or simply run `scripts/gpu_toggle.sh gpu` (described below) to flip the switch for you.
- Embeddings: set EMBEDDING_MODEL in .env and reindex (make reindex)
-
-Decoder env toggles (set in `.env` and managed automatically by `scripts/gpu_toggle.sh`):
+Key settings (see [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for full reference):

-| Variable | Description | Typical values |
-|-----------------------|-------------------------------------------------------|----------------|
-| `USE_GPU_DECODER` | Feature-flag for native Metal decoder | `0` (docker), `1` (native) |
-| `LLAMACPP_URL` | Decoder endpoint containers should use | `http://llamacpp:8080` or `http://host.docker.internal:8081` |
-| `LLAMACPP_GPU_LAYERS` | Number of layers to offload to GPU (`-1` = all) | `0`, `32`, `-1` |
+| Setting | Purpose | Default |
+|---------|---------|---------|
+| `INDEX_MICRO_CHUNKS` | Enable micro-chunking (1 = on) | 0 |
+| `REFRAG_DECODER` | Enable LLM decoder (1 = on) | 1 |
+| `REFRAG_RUNTIME` | Decoder backend | llamacpp |
+| `COLLECTION_NAME` | Qdrant collection | codebase |
-
-Alternative (compose only)
+
+**GPU acceleration (Apple Silicon):**
 ```bash
-HOST_INDEX_PATH="$(pwd)" FASTMCP_INDEXER_PORT=8001 docker compose up -d qdrant mcp mcp_indexer indexer watcher
+scripts/gpu_toggle.sh gpu    # Switch to native Metal
+scripts/gpu_toggle.sh start  # Start GPU decoder
 ```

 ### Recommended development flow
@@ -194,578 +148,117 @@ This re-enables the `llamacpp` container and resets `.env` to `http://llamacpp:8

 ### CLI: ctx prompt enhancer

-A thin CLI that retrieves code context and rewrites your input into a better, context-aware prompt using the local LLM decoder. Works with both questions and commands/instructions. By default it prints ONLY the improved prompt.
-
-Examples:
-````bash
-# Questions: Enhanced with specific details and multiple aspects
-scripts/ctx.py "What is ReFRAG?"
-# Output: Two detailed question paragraphs with file/line references - -# Commands: Enhanced with concrete targets and implementation details -scripts/ctx.py "Refactor ctx.py" -# Output: Two detailed instruction paragraphs with specific steps - -# Unicorn mode: staged 2–3 pass enhancement for best results -scripts/ctx.py "Refactor ctx.py" --unicorn - -# Via Make target (default improved prompt only) -make ctx Q="Explain the caching logic to me in detail" - -# Filter by language/path or adjust tokens -make ctx Q="Hybrid search details" ARGS="--language python --under scripts/ --limit 2 --rewrite-max-tokens 200" -```` - - -### Detail mode (short snippets) - -Include compact code snippets in the retrieved context for richer rewrites (trades a bit of speed for quality): - -````bash -# Enable detail mode (adds short snippets) - works with questions -scripts/ctx.py "Explain the caching logic" --detail - -# Detail mode with commands - gets more specific implementation details -scripts/ctx.py "Add error handling to ctx.py" --detail - -# Adjust snippet size if needed (default is 1 line when --detail is used) -make ctx Q="Explain hybrid search" ARGS="--detail --context-lines 2" -```` - -Notes: -- Default behavior is header-only (fastest). `--detail` adds short snippets. -- If `--detail` is set and `--context-lines` remains at its default (0), ctx.py automatically uses 1 line to keep snippets concise. Override with `--context-lines N`. -- Detail mode is optimized for speed: automatically clamps to max 4 results and 1 result per file. - -### Unicorn mode (staged multi-pass for best quality) - -Use `--unicorn` for the highest quality prompt enhancement with a staged 2-3 pass approach: - -````bash -# Unicorn mode with commands - produces exceptional, detailed instructions -scripts/ctx.py "refactor ctx.py" --unicorn - -# Unicorn mode with questions - produces highly intelligent, multi-faceted questions -scripts/ctx.py "what is ReFRAG and how does it work?" 
--unicorn +A CLI that retrieves code context and rewrites your input into a better, code-grounded prompt using the local LLM decoder. -# Works with all filters -scripts/ctx.py "add error handling" --unicorn --language python -```` +**Features:** +- **Unicorn mode** (`--unicorn`): Multi-pass enhancement with 2-3 refinement stages +- **Detail mode** (`--detail`): Include compact code snippets for richer context +- **Memory blending**: Falls back to stored memories when code search returns no hits +- **Streaming**: Real-time token output for instant feedback +- **Filters**: `--language`, `--under`, `--limit` to scope retrieval -**How it works:** +```bash +scripts/ctx.py "What is ReFRAG?" # Basic question +scripts/ctx.py "Refactor ctx.py" --unicorn # Multi-pass enhancement +scripts/ctx.py "Add error handling" --detail # With code snippets +make ctx Q="Explain caching" # Via Make target +```
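The memory-blending behavior can be sketched in a few lines. This is an illustrative sketch only; `blend_context` and its arguments are hypothetical names, not the actual internals of `scripts/ctx.py`:

```python
def blend_context(code_hits: list, memory_hits: list) -> list:
    """Prefer code search results; fall back to stored memories when empty."""
    if code_hits:
        return [("code", hit) for hit in code_hits]
    # No code matched the query: ground the rewrite in memories instead
    return [("memory", note) for note in memory_hits]

print(blend_context([], ["ADR-12: auth uses OIDC"]))
# → [('memory', 'ADR-12: auth uses OIDC')]
```

The same precedence applies transparently: code spans win whenever the indexer returns any, so memories never dilute a grounded rewrite.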


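The staged unicorn flow (draft, refine, optional polish) can be sketched as follows. All names here are hypothetical stand-ins; `enhance` represents one decoder pass over retrieved snippets, and the "generic wording" check is a simplified placeholder for the real heuristic:

```python
def needs_polish(text: str) -> bool:
    """Trigger a third pass on short or generic output.

    The 180-character floor matches the documented heuristic; the
    phrase check below is an illustrative stand-in, not the real rule.
    """
    return len(text) < 180 or "various aspects" in text

def unicorn(prompt: str, enhance) -> str:
    """enhance(text, context_lines) stands in for one decoder pass."""
    draft = enhance(prompt, context_lines=8)    # pass 1: draft with rich snippets
    refined = enhance(draft, context_lines=12)  # pass 2: refine with richer snippets
    if needs_polish(refined):                   # pass 3: optional polish
        refined = enhance(refined, context_lines=12)
    return refined
```

The widening `context_lines` values (8, then 12) mirror how each pass retrieves progressively richer snippets before rewriting.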
-Unicorn mode uses multiple LLM passes with progressively richer code context: - -1. **Pass 1 (Draft)**: Retrieves rich code snippets (8 lines of context per match) to understand the codebase and sharpen the intent -2. **Pass 2 (Refine)**: Retrieves even richer snippets (12 lines of context) based on the draft to ground the prompt with concrete code behaviors -3. **Pass 3 (Polish)**: Optional cleanup pass that runs only if the output appears generic or incomplete - -**Key features:** - -- **Code-grounded**: References actual code behaviors and patterns from your codebase, not file paths or line numbers -- **No hallucinations**: Only uses real code from your indexed repository - never invents references -- **Multi-paragraph output**: Produces detailed, comprehensive prompts that explore multiple aspects -- **Works with both questions and commands**: Enhances any type of prompt - -**When to use:** - -- **Normal mode**: Quick, everyday prompts (fastest) -- **--detail**: Richer context without multi-pass overhead (balanced) -- **--unicorn**: When you need the absolute best prompt quality (highest quality) - -### Advanced Features - -#### 1. Streaming Output (Default) - -All modes now stream tokens as they arrive for instant feedback: - -````bash -# Streaming is enabled by default - see output appear immediately -scripts/ctx.py "refactor ctx.py" --unicorn -```` - -To disable streaming (wait for full response): -- Set `"streaming": false` in `~/.ctx_config.json` - -#### 2. Memory Blending - -Automatically falls back to `context_search` with memories when repo search returns no hits: - -````bash -# If no code matches, ctx.py will search design docs and ADRs -scripts/ctx.py "What is our authentication strategy?" -```` - -This ensures you get relevant context even when the query doesn't match code directly. - -#### 3. 
Adaptive Context Sizing - -Automatically adjusts `limit` and `context_lines` based on query characteristics: - -- **Short/vague queries** → More context for richer grounding -- **Queries with file/function names** → Lighter settings for speed - -````bash -# Short query → auto-increases context -scripts/ctx.py "caching" - -# Specific query → optimized for speed -scripts/ctx.py "refactor fetch_context function in ctx.py" -```` +> **See [docs/CTX_CLI.md](docs/CTX_CLI.md) for full documentation.** -#### 4. Automatic Quality Assurance +## Index Another Codebase -Enhanced `_needs_polish()` heuristic automatically triggers a third polish pass when: - -- Output is too short (< 180 chars) -- Contains generic/vague language -- Missing concrete code references -- Lacks proper paragraph structure +```bash +# Index a specific path +make index-path REPO_PATH=/path/to/repo [RECREATE=1] -This happens transparently in `--unicorn` mode - no user action needed. +# Index current directory +cd /path/to/repo && make -C /path/to/Context-Engine index-here -#### 5. 
Personalized Templates +# Raw docker compose +docker compose run --rm -v /path/to/repo:/work indexer --root /work --recreate +``` -Create `~/.ctx_config.json` to customize prompt enhancement behavior: +> **See [docs/MULTI_REPO_COLLECTIONS.md](docs/MULTI_REPO_COLLECTIONS.md) for multi-repo architecture and remote deployment.** -````json -{ - "always_include_tests": true, - "prefer_bullet_commands": false, - "extra_instructions": "Always consider error handling and edge cases", - "streaming": true -} -```` +## Verify Endpoints -**Available preferences:** +```bash +curl -sSf http://localhost:6333/readyz && echo "Qdrant OK" +curl -sI http://localhost:8001/sse | head -n1 # SSE +curl -sI http://localhost:8003/mcp | head -n1 # RMCP +``` -- `always_include_tests`: Add testing considerations to all prompts -- `prefer_bullet_commands`: Format commands as bullet points -- `extra_instructions`: Custom instructions added to every rewrite -- `streaming`: Enable/disable streaming output (default: true) +--- -See `ctx_config.example.json` for a template. +## Documentation -GPU Acceleration (Apple Silicon): -For faster prompt rewriting, use the native Metal-accelerated decoder: -````bash -# 1. Set USE_GPU_DECODER=1 in your .env file (already set by default) -# 2. 
Start the native llama.cpp server with Metal GPU -scripts/gpu_toggle.sh start +| Topic | Description | +|-------|-------------| +| [Configuration](docs/CONFIGURATION.md) | Complete environment variable reference | +| [IDE Clients](docs/IDE_CLIENTS.md) | Setup for Roo, Cline, Windsurf, Zed, Kiro, Qodo, Codex, Augment | +| [MCP API](docs/MCP_API.md) | Full API reference for all MCP tools | +| [ctx CLI](docs/CTX_CLI.md) | Prompt enhancer CLI with unicorn mode | +| [Memory Guide](docs/MEMORY_GUIDE.md) | Memory patterns and metadata schema | +| [Architecture](docs/ARCHITECTURE.md) | System design and component interactions | +| [Multi-Repo](docs/MULTI_REPO_COLLECTIONS.md) | Multi-repository indexing and remote deployment | +| [Kubernetes](deploy/kubernetes/README.md) | Kubernetes deployment with Kustomize | +| [VS Code Extension](docs/vscode-extension.md) | Workspace uploader and Prompt+ integration | +| [Troubleshooting](docs/TROUBLESHOOTING.md) | Common issues and solutions | +| [Development](docs/DEVELOPMENT.md) | Contributing and development setup | -# Now ctx.py will automatically use the GPU decoder on port 8081 -make ctx Q="Explain the caching logic to me in detail" +--- -# Stop the native GPU server -scripts/gpu_toggle.sh stop +## Available MCP Tools -# To use Docker decoder instead, set USE_GPU_DECODER=0 in .env and restart: -docker compose up -d llamacpp -```` +**Memory MCP** (port 8000 SSE, 8002 RMCP): +- `store` — save memories with metadata +- `find` — hybrid memory search +- `set_session_defaults` — set default collection for session -Notes: -- Defaults to the Indexer HTTP RMCP endpoint at http://localhost:8003/mcp (override with MCP_INDEXER_URL) -- Decoder endpoint: automatically detects GPU mode via USE_GPU_DECODER env var (set by gpu_toggle.sh) -- Docker decoder (default): http://localhost:8080/completion -- GPU decoder (after gpu_toggle.sh gpu): http://localhost:8081/completion -- See also: `make ctx` +**Indexer MCP** (port 8001 SSE, 8003 RMCP): +- 
**Search**: `repo_search`, `code_search`, `context_search`, `context_answer` +- **Specialized**: `search_tests_for`, `search_config_for`, `search_callers_for`, `search_importers_for` +- **Indexing**: `qdrant_index_root`, `qdrant_index`, `qdrant_prune` +- **Status**: `qdrant_status`, `qdrant_list`, `workspace_info`, `list_workspaces`, `collection_map` +- **Utilities**: `expand_query`, `change_history_for_path`, `set_session_defaults` -## Index another codebase (outside this repo) +> **See [docs/MCP_API.md](docs/MCP_API.md) for complete API documentation.** -You can index any local folder by mounting it at /work. Three easy ways: +## Language Support -1) Make target: index a specific path -```bash -make index-path REPO_PATH=/abs/path/to/other/repo [RECREATE=1] [REPO_NAME=name] [COLLECTION=name] -``` -- RECREATE=1 drops and recreates the collection before indexing -- Defaults: REPO_NAME and COLLECTION fall back to the folder name +Python, JavaScript/TypeScript, Go, Java, Rust, Shell, Terraform, PowerShell, YAML, C#, PHP -2) Make target: index the current working directory -```bash -cd /abs/path/to/other/repo -make -C /Users/user/Desktop/Context-Engine index-here [RECREATE=1] [REPO_NAME=name] [COLLECTION=name] -``` +## Running Tests -3) Raw docker compose (one‑shot ingest without Make) ```bash -docker compose run --rm \ - -v /abs/path/to/other/repo:/work \ - indexer --root /work [--recreate] -``` -Notes: -- No need to bind-mount this repository; the images bake /app/scripts and set WORK_ROOTS="/work,/app" so utilities import correctly. -- MCP clients can connect to the running servers and operate on whichever folder is mounted at /work. 
- -## Supported IDE clients/extensions -- Roo (SSE/RMCP): supports both SSE and RMCP connections; see config examples below -- Cline (SSE/RMCP): supports both SSE and RMCP connections; see config examples below -- Windsurf (SSE/RMCP): supports both SSE and RMCP connections; see config examples below -- Zed (SSE): uses mcp-remote bridge via command/args; see config below -- Kiro (SSE): uses mcp-remote bridge via command/args; see config below -- Qodo (RMCP): connects directly to HTTP endpoints; add each tool individually -- OpenAI Codex (RMCP): TOML config for memory/indexer URLs -- Augment (SSE): simple JSON configs for both servers -- AmpCode (SSE): simple URL for both legacy sse endpoints -- Claude Code CLI(SSE): simple JSON configs for both servers - -3) Verify endpoints -````bash -# Qdrant DB -curl -sSf http://localhost:6333/readyz >/dev/null && echo "Qdrant OK" -# Decoder (llama.cpp sidecar) -curl -s http://localhost:8080/health -# SSE endpoints (Memory, Indexer) -curl -sI http://localhost:8000/sse | head -n1 -curl -sI http://localhost:8001/sse | head -n1 -# RMCP endpoints (HTTP JSON-RPC) -curl -sI http://localhost:8002/mcp | head -n1 -curl -sI http://localhost:8003/mcp | head -n1 -```` - -## Configuration reference (env vars) - -Core -- COLLECTION_NAME: Qdrant collection to use (defaults to repo name if unset in some flows) -- REPO_NAME: Logical name for the indexed repo; stored in payload for filtering -- HOST_INDEX_PATH: Absolute host path to index (mounted to /work in containers) - -Indexing / micro-chunks -- INDEX_MICRO_CHUNKS: 1 to enable micro‑chunking; off falls back to line chunks -- MAX_MICRO_CHUNKS_PER_FILE: Cap micro‑chunks per file (e.g., 200 default) -- TOKENIZER_URL, TOKENIZER_PATH: Hugging Face tokenizer.json URL and local path -- USE_TREE_SITTER: 1 to enable tree-sitter parsing (optional; off by default) - -Watcher -- WATCH_DEBOUNCE_SECS: Debounce between change events (e.g., 1.5) -- INDEX_UPSERT_BATCH / INDEX_UPSERT_RETRIES / 
INDEX_UPSERT_BACKOFF: Upsert tuning -- QDRANT_TIMEOUT: Request timeout in seconds for upserts/queries (e.g., 60–90) -- MCP_TOOL_TIMEOUT_SECS: Max duration for long-running MCP tools (index/prune); default 3600s - - -Reranker -- RERANKER_ONNX_PATH, RERANKER_TOKENIZER_PATH: Paths for local ONNX cross‑encoder -- RERANKER_ENABLED: 1/true to enable, 0/false to disable; default is enabled in server - - Timeouts/failures automatically fall back to hybrid results - -Decoder (llama.cpp / GLM) -- REFRAG_DECODER: 1 to enable decoder for context_answer; 0 to disable (default: 1) -- REFRAG_RUNTIME: llamacpp or glm (default: llamacpp) -- LLAMACPP_URL: llama.cpp server endpoint (default: http://llamacpp:8080 or http://host.docker.internal:8081 for GPU) -- LLAMACPP_TIMEOUT_SEC: Decoder request timeout in seconds (default: 300) -- DECODER_MAX_TOKENS: Max tokens for decoder responses (default: 4000) -- REFRAG_DECODER_MODE: prompt or soft (default: prompt; soft requires patched llama.cpp) -- GLM_API_KEY: API key for GLM provider (required when REFRAG_RUNTIME=glm) -- GLM_MODEL: GLM model name (default: glm-4.6) -- USE_GPU_DECODER: 1 for native Metal decoder on host, 0 for Docker (managed by gpu_toggle.sh) -- LLAMACPP_GPU_LAYERS: Number of layers to offload to GPU, -1 for all (default: 32) - -ReFRAG (micro-chunking and retrieval) -- REFRAG_MODE: 1 to enable micro-chunking and span budgeting (default: 1) -- REFRAG_GATE_FIRST: 1 to enable mini-vector gating before dense search (default: 1) -- REFRAG_CANDIDATES: Number of candidates for gate-first filtering (default: 200) -- MICRO_BUDGET_TOKENS: Global token budget for context_answer spans (default: 512) -- MICRO_OUT_MAX_SPANS: Max number of spans to return per query (default: 3) - -Ports -- FASTMCP_PORT (SSE/RMCP): Override Memory MCP ports (defaults: 8000/8002) -- FASTMCP_INDEXER_PORT (SSE/RMCP): Override Indexer MCP ports (defaults: 8001/8003) - - -### Env var quick table - -| Name | Description | Default | 
-|------|-------------|---------| -| COLLECTION_NAME | Qdrant collection name (unified across all repos) | codebase | -| REPO_NAME | Logical repo tag stored in payload for filtering | auto-detect from git/folder | -| HOST_INDEX_PATH | Host path mounted at /work in containers | current repo (.) | -| QDRANT_URL | Qdrant base URL | container: http://qdrant:6333; local: http://localhost:6333 | -| INDEX_MICRO_CHUNKS | Enable token-based micro-chunking | 0 (off) | -| HYBRID_EXPAND | Enable heuristic multi-query expansion | 0 (off) | -| MAX_MICRO_CHUNKS_PER_FILE | Cap micro-chunks per file | 200 | -| TOKENIZER_URL | HF tokenizer.json URL (for Make download) | n/a (use Make target) | -| TOKENIZER_PATH | Local path where tokenizer is saved (Make) | models/tokenizer.json | -| TOKENIZER_JSON | Runtime path for tokenizer (indexer) | models/tokenizer.json | -| USE_TREE_SITTER | Enable tree-sitter parsing (py/js/ts) | 0 (off) | -| WATCH_DEBOUNCE_SECS | Debounce between FS events (watcher) | 1.5 | -| INDEX_UPSERT_BATCH | Upsert batch size (watcher) | 128 | -| INDEX_UPSERT_RETRIES | Retry count (watcher) | 5 | -| MCP_TOOL_TIMEOUT_SECS | Max duration for long-running MCP tools | 3600 | -| INDEX_UPSERT_BACKOFF | Seconds between retries (watcher) | 0.5 | -| QDRANT_TIMEOUT | HTTP timeout seconds | watcher: 60; search: 20 | -| RERANKER_ONNX_PATH | Local ONNX cross-encoder model path | unset (see make setup-reranker) | -| RERANKER_TOKENIZER_PATH | Tokenizer path for reranker | unset | -| RERANKER_ENABLED | Enable reranker by default | 1 (enabled) | -| FASTMCP_PORT | Memory MCP server port (SSE/RMCP) | 8000 (container-internal) | -| FASTMCP_INDEXER_PORT | Indexer MCP server port (SSE/RMCP) | 8001 (container-internal) | -| FASTMCP_HTTP_PORT | Memory RMCP host port mapping | 8002 | -| FASTMCP_INDEXER_HTTP_PORT | Indexer RMCP host port mapping | 8003 | -| FASTMCP_HEALTH_PORT | Health port (memory/indexer) | memory: 18000; indexer: 18001 | -| LLM_EXPAND_MAX | Max alternate queries generated 
via LLM | 0 | -| REFRAG_DECODER | Enable decoder for context_answer | 1 (enabled) | -| REFRAG_RUNTIME | Decoder backend: llamacpp or glm | llamacpp | -| LLAMACPP_URL | llama.cpp server endpoint | http://llamacpp:8080 or http://host.docker.internal:8081 | -| LLAMACPP_TIMEOUT_SEC | Decoder request timeout | 300 | -| DECODER_MAX_TOKENS | Max tokens for decoder responses | 4000 | -| GLM_API_KEY | API key for GLM provider | unset | -| GLM_MODEL | GLM model name | glm-4.6 | -| USE_GPU_DECODER | Native Metal decoder (1) vs Docker (0) | 0 (docker) | -| REFRAG_MODE | Enable micro-chunking and span budgeting | 1 (enabled) | -| REFRAG_GATE_FIRST | Enable mini-vector gating | 1 (enabled) | -| REFRAG_CANDIDATES | Candidates for gate-first filtering | 200 | -| MICRO_BUDGET_TOKENS | Token budget for context_answer | 512 | - -## Running tests - -Local (recommended) -- Python 3.11+ -- Create venv and install deps: -````bash -python3 -m venv .venv -source .venv/bin/activate +python3 -m venv .venv && source .venv/bin/activate pip install -r requirements.txt -```` -- Run the full suite: -````bash pytest -q -```` -- Run a single file or test: -````bash -pytest tests/test_ingest_micro_chunks.py -q -pytest tests/test_php_support.py::test_imports -q -```` -- Tips: - - RERANKER_ENABLED=0 can speed up some tests locally; functionality still validated via hybrid fallback. - - Some integration tests may start ephemeral containers via testcontainers; ensure Docker is running. - -Inside Docker (optional, ad-hoc) -- You can run tests in the indexer image by overriding the entrypoint: -````bash -docker compose run --rm --entrypoint pytest mcp-indexer -q -```` -Note: the provided dev images focus on runtime; local venv is faster for iterative testing. 
- - -## Language support -- Python, JavaScript/TypeScript, Go, Java, Rust, Shell, Terraform, PowerShell, YAML, C#, PHP - -## Watcher behavior and tips -- Handles delete and move: removes/migrates points to avoid stale entries -- Live reloads ignore patterns: changes to .qdrantignore are applied without restart -- path_glob matches against relative paths (e.g., src/**/*.py), not absolute /work paths -- If upserts time out, lower INDEX_UPSERT_BATCH (e.g., 96) or raise QDRANT_TIMEOUT (e.g., 90) -- For very large files, reduce MAX_MICRO_CHUNKS_PER_FILE (e.g., 200) during dev - -## Expected HTTP behaviors -- GET /mcp may return 400 (normal): the RMCP endpoint is POST-only for JSON-RPC -- SSE requires a session handshake; raw POST /messages without it will error (expected) - -```bash -curl -sSf http://localhost:6333/readyz >/dev/null && echo "Qdrant OK" -curl -sI http://localhost:8000/sse | head -n1 -curl -sI http://localhost:8001/sse | head -n1 -``` - -4) Single command to index + search -```bash -# Fresh index of your repo and a quick hybrid example -make reindex-hard -make qdrant-status -make hybrid ARGS="--query 'async file watcher' --limit 5 --include-snippet" ``` -5) Example MCP client configurations - -Kiro (SSE): -Create `.kiro/settings/mcp.json` in your workspace: -````json -{ - "mcpServers": { - "qdrant-indexer": { "command": "npx", "args": ["mcp-remote", "http://localhost:8001/sse", "--transport", "sse-only"] }, - "memory": { "command": "npx", "args": ["mcp-remote", "http://localhost:8000/sse", "--transport", "sse-only"] } - } -} -```` - -Zed (SSE): -Add to your Zed `settings.json` (accessed via Command Palette → "Settings: Open Settings (JSON)"): -````json -{ - /// The name of your MCP server - "qdrant-indexer": { - /// The command which runs the MCP server - "command": "npx", - /// The arguments to pass to the MCP server - "args": [ - "mcp-remote", - "http://localhost:8001/sse", - "--transport", - "sse-only" - ], - /// The environment variables to set - 
"env": {} - } -} -```` -Notes: -- Zed expects MCP servers at the root level of settings.json -- Uses command/args (stdio). mcp-remote bridges to remote SSE endpoints -- If npx prompts, add `-y` right after npx: `"command": "npx", "args": ["-y", "mcp-remote", ...]` -- Alternative: Use direct HTTP connection if mcp-remote has issues: - ```json - { - "qdrant-indexer": { - "type": "http", - "url": "http://localhost:8001/sse" - } - } - ``` -- For Qodo (RMCP) clients, see "Qodo Integration (RMCP config)" below for the direct `url`-based snippet. - -6) Common troubleshooting -- Tree-sitter not found or parser errors: - - Feature is optional. If you set USE_TREE_SITTER=1 and see errors, unset it or install tree-sitter deps, then reindex. -- Tokenizer missing for micro-chunks: - - Run make tokenizer or set TOKENIZER_JSON to a valid tokenizer.json; otherwise we fall back to line-based chunking. -- SSE “Invalid session ID” when POSTing /messages directly: - - Expected if you didn’t initiate an SSE session first. Use an MCP client (e.g., mcp-remote) to handle the handshake. -- llama.cpp platform warning on Apple Silicon: - - Prefer the native path above (`scripts/gpu_toggle.sh gpu`). If you stick with Docker, add `platform: linux/amd64` to the service or ignore the warning during local dev. -- Indexing feels stuck on very large files: - - Use MAX_MICRO_CHUNKS_PER_FILE=200 during dev runs. - - -- Watcher timeouts (-9) or Qdrant "ResponseHandlingException: timed out": - - Set watcher-safe defaults to reduce payload size and add headroom during upserts: - - ````ini - # Watcher-safe defaults (compose already applies these to the watcher service) - QDRANT_TIMEOUT=60 - MAX_MICRO_CHUNKS_PER_FILE=200 - INDEX_UPSERT_BATCH=128 - INDEX_UPSERT_RETRIES=5 - INDEX_UPSERT_BACKOFF=0.5 - WATCH_DEBOUNCE_SECS=1.5 - ```` - - - - If issues persist, try lowering INDEX_UPSERT_BATCH to 96 or raising QDRANT_TIMEOUT to 90. 
- -ReFRAG background: https://arxiv.org/abs/2509.01092 - -Endpoints - -| Component | URL | -|-------------|------------------------------| -| Memory MCP | http://localhost:8000/sse | -| Indexer MCP | http://localhost:8001/sse | -| Qdrant DB | http://localhost:6333 | - - -### Streamable HTTP (RMCP) endpoints + OpenAI Codex config - -- Memory HTTP (RMCP): http://localhost:8002/mcp -- Indexer HTTP (RMCP): http://localhost:8003/mcp +> **See [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) for full development setup.** -OpenAI Codex config (RMCP client): +## Endpoints -````toml -experimental_use_rmcp_client = true +| Component | SSE | RMCP | +|-----------|-----|------| +| Memory MCP | http://localhost:8000/sse | http://localhost:8002/mcp | +| Indexer MCP | http://localhost:8001/sse | http://localhost:8003/mcp | +| Qdrant DB | http://localhost:6333 | - | +| Decoder | http://localhost:8080 | - | -[mcp_servers.memory_http] -url = "http://127.0.0.1:8002/mcp" +> **See [docs/IDE_CLIENTS.md](docs/IDE_CLIENTS.md) for client setup and [docs/TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md) for common issues.** -[mcp_servers.qdrant_indexer_http] -url = "http://127.0.0.1:8003/mcp" -```` - - -### Kiro Integration (workspace config) - -Add this to your workspace-level Kiro config at `.kiro/settings/mcp.json` (restart Kiro after saving): - -````json -{ - "mcpServers": { - "qdrant-indexer": { "command": "npx", "args": ["mcp-remote", "http://localhost:8001/sse", "--transport", "sse-only"] }, - "memory": { "command": "npx", "args": ["mcp-remote", "http://localhost:8000/sse", "--transport", "sse-only"] } - } -} -```` - -Notes: -- Kiro expects command/args (stdio). `mcp-remote` bridges to remote SSE endpoints. -- If `npx` prompts in your environment, add `-y` right after `npx`. -- Workspace config overrides user-level config (`~/.kiro/settings/mcp.json`). 
- -Troubleshooting: -- Error: “Enabled MCP Server must specify a command, ignoring.” - - Fix: Use the `command`/`args` form above; do not use `type:url` in Kiro. -- ImportError: `deps: No module named 'scripts'` when calling `memory_store` on the indexer MCP - - Fix applied: server now adds `/work` and `/app` to `sys.path`. Restart `mcp_indexer`. - -## Available MCP tools - -Memory MCP (8000 SSE, 8002 RMCP): -- store(information, metadata?, collection?) — write a memory entry into the default collection (dual vectors: dense + lexical) -- find(query, limit=5, collection?, top_k?) — hybrid memory search over memory-like entries - -Indexer/Search MCP (8001 SSE, 8003 RMCP): -- repo_search — hybrid code search (dense + lexical + optional reranker) -- context_search — search that can also blend memory results (include_memories) -- context_answer — natural-language Q&A with retrieval + local LLM (llama.cpp or GLM) -- code_search — alias of repo_search -- repo_search_compat — permissive wrapper that normalizes q/text/queries/top_k payloads -- context_answer_compat — permissive wrapper for context_answer with lenient argument handling -- expand_query(query, max_new?) — LLM-assisted query expansion (generates 1-2 alternates) -- qdrant_index_root — index /work (mounted repo root) with safe defaults -- qdrant_index(subdir?, recreate?, collection?) — index a subdir or recreate collection -- qdrant_prune — remove points for missing files or file_hash mismatch -- qdrant_list — list Qdrant collections -- qdrant_status — collection counts and recent ingestion timestamps -- workspace_info(workspace_path?) — read .codebase/state.json and resolve default collection -- list_workspaces(search_root?) 
— scan for multiple workspaces in multi-repo environments -- memory_store — convenience memory store from the indexer (uses default collection) -- search_tests_for — intent wrapper for test files -- search_config_for — intent wrapper for likely config files -- search_callers_for — intent wrapper for probable callers/usages -- search_importers_for — intent wrapper for files importing a module/symbol -- change_history_for_path(path) — summarize recent changes using stored metadata -- collection_map - return collection↔repo mappings -- default_collection - set the collection to use for the session - -Notes: -- Most search tools accept filters like language, under, path_glob, kind, symbol, ext. -- Reranker enabled by default; timeouts fall back to hybrid results. -- context_answer requires decoder enabled (REFRAG_DECODER=1) with llama.cpp or GLM backend. - -### Qodo Integration (RMCP config) - -Add this to your Qodo MCP settings to target the RMCP (HTTP) endpoints: +ReFRAG background: https://arxiv.org/abs/2509.01092 -````json -{ - "mcpServers": { - "memory": { "url": "http://localhost:8002/mcp" }, - "qdrant-indexer": { "url": "http://localhost:8003/mcp" } - } -} -```` +--- -Note: Qodo can talk to the RMCP endpoints directly, so no `mcp-remote` wrapper is required. 
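Several of the search tools above accept flexible glob filters: a single pattern, an array, or a comma-separated string, with OR semantics for `path_glob` and reject-on-any for `not_glob`. A rough sketch of that normalization, assuming `fnmatch`-style matching rather than the server's exact matcher:

```python
from fnmatch import fnmatch

def normalize_globs(value):
    """Accept a single pattern, a list of patterns, or a comma-separated string."""
    if value is None:
        return []
    if isinstance(value, str):
        return [p.strip() for p in value.split(",") if p.strip()]
    return list(value)

def path_allowed(path, path_glob=None, not_glob=None):
    """OR semantics for path_glob; reject if ANY not_glob pattern matches."""
    include = normalize_globs(path_glob)
    exclude = normalize_globs(not_glob)
    if include and not any(fnmatch(path, p) for p in include):
        return False
    return not any(fnmatch(path, p) for p in exclude)
```

Note that `fnmatch` treats `*` as matching across `/`, so `**/src/**` here behaves like "path contains /src/"; the server's matcher may differ.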
- - -## Architecture overview - -- Agents connect via MCP over SSE: - - Memory MCP: http://localhost:8000/sse - - Indexer MCP: http://localhost:8001/sse -- Both MCP servers talk to Qdrant inside Docker at http://qdrant:6333 (DB HTTP API) -- Supporting jobs (indexer, watcher, init_payload) write to/read from Qdrant directly +## Architecture ```mermaid flowchart LR @@ -791,804 +284,11 @@ flowchart LR class G opt ``` -## Production-ready local development -## One-line bring-up (ship-ready) - -Start Qdrant, the Memory MCP (8000), the Indexer MCP (8001), and run a fresh index of your current repo: - -```bash -HOST_INDEX_PATH="$(pwd)" FASTMCP_INDEXER_PORT=8001 docker compose up -d qdrant mcp mcp_indexer indexer watcher -``` - -Then wire your MCP-aware IDE/tooling to: -- Memory MCP: http://localhost:8000/sse -- Indexer MCP: http://localhost:8001/sse - -Tip: add `watcher` to the command if you want live reindex-on-save. - -### SSE Memory Server (port 8000) - -- URL: http://localhost:8000/sse -- Tools: `store`, `find` -- Env (used by the indexer to blend memory): - - `MEMORY_SSE_ENABLED=true` - - `MEMORY_MCP_URL=http://mcp:8000/sse` - - `MEMORY_MCP_TIMEOUT=6` - -IDE/Agent config (recommended): - -```json -{ - "mcpServers": { - "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false }, - "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false } - } -} -``` - -Blended search: - -## Memory usage patterns (how to get the most from memories) - -### When to use memories vs code search -- Use memories when the information isn’t in your repository or is transient/user-authored: conventions, runbooks, decisions, links, known issues, FAQs, “how we do X here”. -- Use code search for facts that live in the repo: APIs, functions/classes, configuration, and cross-file relationships. -- Blend both for tasks like “how to run E2E tests” where instructions (memory) reference scripts in the repo (code). 
-- Rule of thumb: if you’d write it in a team wiki or ticket comment, store it as a memory; if you’d grep for it, use code search. - -### Recommended metadata schema (best practices) -We store memory entries as points in Qdrant with a small, consistent payload. Recommended keys: -- kind: "memory" (string) – required. Enables filtering and blending. -- topic: short category string (e.g., "dev-env", "release-process"). -- tags: list of strings (e.g., ["qdrant", "indexing", "prod"]). -- source: where this came from (e.g., "chat", "manual", "tool", "issue-123"). -- author: who added it (e.g., username or email). -- created_at: ISO8601 timestamp (UTC). -- expires_at: ISO8601 timestamp if this memory should be pruned later. -- repo: optional repo identifier if sharing a Qdrant instance across repos. -- link: optional URL to docs, tickets, or dashboards. -- priority: 0.0–1.0 weight that clients can use to bias ranking when blending. - -Notes: -- Keep values small (short strings, small lists). Don’t store large blobs in payload; put details in the `information` text. -- Use lowercase snake_case keys for consistency. -- For secrets/PII: do not store plaintext. Store references or vault paths instead. - -### Example memory operations -Store a memory (via MCP Memory server tool `store` – use your MCP client): -``` -{ - "information": "Run full reset: INDEX_MICRO_CHUNKS=1 MAX_MICRO_CHUNKS_PER_FILE=200 make reset-dev", - "metadata": { - "kind": "memory", - "topic": "dev-env", - "tags": ["make", "reset"], - "source": "chat" - } -} -``` - -Find memories (via MCP Memory server tool `find`): -``` -{ - "query": "reset-dev", - "limit": 5 -} -``` - -Blend memories into code search (Indexer MCP `context_search`): -``` -{ - "query": "async file watcher", - "include_memories": true, - "limit": 5, - "include_snippet": true -} -``` - -Tips: -- Use precise queries (2–5 tokens). Add a couple synonyms if needed; the server supports multiple phrasings. 
-- Combine `topic`/`tags` in your memory text to make them easier to find (they also live in payload for filtering). - -### Backup and migration (advanced) -For production-grade backup/migration strategies, see the official Qdrant documentation for snapshots and export/import. For local development, we recommend relying on Docker volumes and reindexing when needed. - -Operational notes: -- Collection name comes from `COLLECTION_NAME` (see .env). This stack defaults to a single collection for both code and memories; filtering uses `metadata.kind`. -- If you switch to a dedicated memory collection, update the MCP Memory server and the Indexer's memory blending env to point at it. -- Consider pruning expired memories by filtering `expires_at < now`. - -- Call `context_search` on :8001 (SSE) or :8003 (RMCP) with `{ "include_memories": true }` to return both memory and code results. - -### Collection Naming Strategies - -Different hash lengths are used for different workspace types: - -**Local Workspaces:** `repo-name-8charhash` -- Example: `Anesidara-e8d0f5fc` -- Used by local indexer/watcher -- Assumes unique repo names within workspace - -**Remote Uploads:** `folder-name-16charhash-8charhash` -- Example: `testupload2-04e680d5939dd035-b8b8d4cc` -- Collision avoidance for duplicate folder names for different codebases -- 16-char hash identifies workspace, 8-char hash identifies collection - - -### Enable memory blending (for context_search) - -1) Ensure the Memory MCP is running on :8000 (default in compose). 
-2) Enable SSE memory blending on the Indexer MCP by setting these env vars for the mcp_indexer service (docker-compose.yml): - - -````yaml -services: - mcp_indexer: - environment: - - MEMORY_SSE_ENABLED=true - - MEMORY_MCP_URL=http://mcp:8000/sse - - MEMORY_MCP_TIMEOUT=6 -```` - - -3) Restart the indexer service: - -````bash -docker compose up -d mcp_indexer -```` - - -4) Validate by calling context_search with include_memories=true for a query that matches a stored memory: - - -````json -{ - "query": "your test memory text", - "include_memories": true, - "limit": 5 -} -```` - - -Expected: non-zero results with blended items; memory hits will have memory-like payloads (e.g., metadata.kind = "memory"). - - -- Idempotent + incremental indexing out of the box: - - Skips unchanged files automatically using a file content hash stored in payload (metadata.file_hash) - - De-duplicates per-file points by deleting prior entries for the same path before insert - - Payload indexes are auto-created on first run (metadata.language, metadata.path_prefix, metadata.repo, metadata.kind, metadata.symbol, metadata.symbol_path, metadata.imports, metadata.calls) -- Commands: - - Full rebuild: `make reindex` - - Fast incremental: `make index` (skips unchanged files) - - Health check: `make health` (verifies collection vector name/dim, HNSW, and filtered queries with kind/symbol) - - Hybrid search: `make hybrid` (dense + lexical bump with RRF) -- Bootstrap all services + index + checks: `make bootstrap` -- Discover commands: `make help` lists all targets and descriptions - -- Ingest Git history: `make history` (messages + file lists) - - If the repo has no local commits yet, the history ingester will shallow-fetch from the remote (default: origin) and use its HEAD. Configure with `--remote` and `--fetch-depth`. -- Local reranker (ONNX): `make rerank-local` (set RERANKER_ONNX_PATH and RERANKER_TOKENIZER_PATH) -- Setup ONNX reranker quickly: `make setup-reranker ONNX_URL=... 
TOKENIZER_URL=...` (updates .env paths) -- Enable Tree-sitter parsing (more accurate symbols/scopes): set `USE_TREE_SITTER=1` in `.env` then reindex - -- Flags (advanced): - - Disable de-duplication: `docker compose run --rm indexer --root /work --no-dedupe` - - Disable unchanged skipping: `docker compose run --rm indexer --root /work --no-skip-unchanged` - -Notes: -- Named vector remains aligned with the MCP server (fast-bge-base-en-v1.5). If you change EMBEDDING_MODEL, run `make reindex` to recreate the collection. -- For very large repos, consider running `make index` on a schedule (or pre-commit) to keep Qdrant warm without full reingestion. - -### Multi-repo indexing (unified search) - -The stack uses a **single unified `codebase` collection** by default, making multi-repo search seamless: - -**Index another repo into the same collection:** -```bash -# From your qdrant directory -make index-here HOST_INDEX_PATH=/path/to/other/repo REPO_NAME=other-repo - -# Or with full control: -HOST_INDEX_PATH=/path/to/other/repo \ -COLLECTION_NAME=codebase \ -REPO_NAME=other-repo \ -docker compose run --rm indexer --root /work -``` - -**What happens:** -- Files from the other repo get indexed into the unified `codebase` collection -- Each file is tagged with `metadata.repo = "other-repo"` for filtering -- Search across all repos by default, or filter by specific repo - -**Search examples:** -```bash -# Search across all indexed repos -make hybrid QUERY="authentication logic" - -# Filter by specific repo -python scripts/hybrid_search.py \ - --query "authentication logic" \ - --repo other-repo - -# Filter by repo + language -python scripts/hybrid_search.py \ - --query "authentication logic" \ - --repo other-repo \ - --language python -``` - -**Benefits:** -- One collection = unified search across all your code -- No fragmentation or collection management overhead -- Filter by repo when you need isolation -- All repos share the same vector space for better semantic search - 
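Because every point in the unified collection carries `metadata.repo`, a client can scope searches server-side instead of filtering results afterward. A sketch of the Qdrant-style `must`/`match` filter payload this implies; the helper name is hypothetical:

```python
def repo_filter(repo=None, language=None):
    """Build a Qdrant payload filter for the unified `codebase` collection.

    Payload keys mirror the metadata documented above; the must/match
    shape follows Qdrant's filter convention.
    """
    must = []
    if repo:
        must.append({"key": "metadata.repo", "match": {"value": repo}})
    if language:
        must.append({"key": "metadata.language", "match": {"value": language}})
    return {"must": must} if must else None
```

Passing the result as the search request's filter restricts hits to one repo (and optionally one language) without a second pass client-side.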
-### Multi-query re-ranker (no new deps) - -- Run a fused query with several phrasings and metadata-aware boosts: - -```bash -make rerank -``` - -- Customize: - - Add more `--query` flags - - Prefer language: `--language python` - - Prefer under path: `--under /work/scripts` - -### Watch mode (incremental indexing) - -- Reindex changed files on save (runs until Ctrl+C): - -```bash -make watch -``` - -### HNSW recall tuning - -- Collection creation is tuned for higher recall: `m=16`, `ef_construct=256`. -- If you change embeddings, run `make reindex` to recreate the collection with the tuned HNSW settings. - -### Warm start (optional) - -- Preload the embedding model and warm Qdrant's HNSW search path to reduce first-query latency and improve recall: - -```bash -make warm -``` - - - - - - - -Or, since this stack already exposes SSE, you can configure the client to use `http://localhost:8000/sse` directly (recommended for Cursor/Windsurf). - -### Search filters (repo_search/context_search) - -Most MCP clients let you pass structured tool arguments. The Indexer/search MCP supports applying server-side filters in repo_search/context_search when these keys are present: -- `language`: value matches `metadata.language` -- `path_prefix`: value matches `metadata.path_prefix` (e.g., `/work/src`) -- `kind`: value matches `metadata.kind` (e.g., `function`, `class`, `method`) - -Tip: Combine multiple query phrasings and apply these filters for best precision on large codebases. - - -## Notes - -## Index your repository (code search quality) - -We added a dockerized indexer that chunks code, embeds with `BAAI/bge-base-en-v1.5`, and stores metadata (`path`, `path_prefix`, `language`, `start_line`, `end_line`, `code`) in Qdrant. This boosts recall and relevance for the MCP tools. 
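The default chunking (~120-line chunks with a 20-line overlap) can be sketched as a sliding window over a file's lines. This is an illustration of the scheme, not the indexer's exact code:

```python
def chunk_lines(lines, chunk=120, overlap=20):
    """Split a file into overlapping line windows (defaults mirror the
    indexer's ~120-line chunks with 20-line overlap)."""
    if chunk <= overlap:
        raise ValueError("chunk must be larger than overlap")
    chunks, start, step = [], 0, chunk - overlap
    while start < len(lines):
        end = min(start + chunk, len(lines))
        chunks.append((start + 1, end, lines[start:end]))  # 1-based line numbers
        if end == len(lines):
            break
        start += step
    return chunks
```

The overlap keeps symbols that straddle a chunk boundary retrievable from both neighboring chunks.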
-
-```bash
-# Index current workspace (does not drop data)
-make index
-
-# Full reindex (drops existing points in the collection)
-make reindex
-```
-
-### Companion MCP: Index/Prune/List (Option B)
-
-A second MCP server runs alongside the search MCP and exposes tools:
-- qdrant-list: list collections
-- qdrant-index: index the mounted path (/work or subdir)
-- qdrant-prune: prune stale points for the mounted path
-
-Configuration
-- FASTMCP_INDEXER_PORT (default 8001)
-- HOST_INDEX_PATH bind-mounts the target repo into /work (read-only)
-
-Add to your agent as a separate MCP endpoint (SSE):
-- URL: http://localhost:8001/sse
-
-Example calls (semantics vary by client):
-- qdrant-index with args {"subdir":"scripts","recreate":true}
-
-### MCP client configuration examples
-
-Roo (SSE/RMCP):
-
-```json
-{
-  "mcpServers": {
-    "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false },
-    "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false }
-  }
-}
-```
-
-Cline (SSE/RMCP):
-
-```json
-{
-  "mcpServers": {
-    "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false },
-    "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false }
-  }
-}
-```
-
-Windsurf (SSE/RMCP):
-
-```json
-{
-  "mcpServers": {
-    "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false },
-    "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false }
-  }
-}
-```
-
-Windsurf/Cursor (stdio for search + SSE for indexer):
-
-```json
-{
-  "mcpServers": {
-    "qdrant": {
-      "command": "uvx",
-      "args": ["mcp-server-qdrant"],
-      "env": {
-        "QDRANT_URL": "http://localhost:6333",
-        "COLLECTION_NAME": "my-collection",
-        "EMBEDDING_MODEL": "BAAI/bge-base-en-v1.5"
-      },
-      "disabled": false
-    }
-  }
-}
-```
-
-Augment (SSE for both servers – recommended):
-
-```json
-{
-  "mcpServers": {
-    "memory": { "type": "sse", "url": "http://localhost:8000/sse",
"disabled": false }, - "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false } - } -} -``` - -Qodo (RMCP; add each tool individually): - -**Note**: In Qodo, you must add each MCP tool separately through the UI, not as a single JSON config. - -For each tool, use this format: - -**Tool 1 - memory:** -```json -{ - "memory": { "url": "http://localhost:8002/mcp" } -} -``` - -**Tool 2 - qdrant-indexer:** -```json -{ - "qdrant-indexer": { "url": "http://localhost:8003/mcp" } -} -``` - -#### Important for IDE agents (Cursor/Windsurf/Augment) -- Do not send null values to MCP tools. Omit the field or pass an empty string "" instead. -- qdrant-index examples: - - {"subdir":"","recreate":false,"collection":"my-collection","repo_name":"workspace"} - - {"subdir":"scripts","recreate":true} -- For indexing the repo root with no params, use the zero-arg tool `qdrant_index_root` (new) or call `qdrant-index` with `subdir:""`. - - -##### Zero-config search tool (new) -- repo_search: run code search without filters or config. - - Structured fields supported (parity with DSL): language, under, kind, symbol, ext, not_, case, path_regex, path_glob, not_glob - - Response shaping: compact (bool) returns only path/start_line/end_line - - Smart default: compact=true when query is an array with multiple queries (unless explicitly set) - - If include_snippet is true, compact is forced off so snippet fields are returned - - - Glob fields accept a single string or an array; you can also pass a comma-separated string which will be split - - Query parsing: accepts query or queries; JSON arrays, JSON-stringified arrays, comma-separated strings; also supports q/text aliases - - - Parity note: path_glob/not_glob list handling works in both modes — in-process and subprocess — with OR semantics for path_glob and reject-on-any for not_glob. 
- - Examples: - - {"query": "semantic chunking"} - - {"query": ["function to split code", "overlapping chunks"], "limit": 15, "per_path": 3} - - {"query": "watcher debounce", "language": "python", "under": "scripts/", "include_snippet": true, "context_lines": 2} - - {"query": "parser", "ext": "ts", "path_regex": "/services/.+", "compact": true} - - {"query": "adapter", "path_glob": ["**/src/**", "**/pkg/**"], "not_glob": "**/tests/**"} - - Returns structured results: score, path, symbol, start_line, end_line, and optional snippet; or compact form. -- code_search: alias of repo_search (same args) for easier discovery in some clients. - -- qdrant_status: return collection size and last index times (safe, read-only). - - {"collection": "my-collection"} - - -Verification: -- You should see tools from both servers (e.g., `store`, `find`, `repo_search`, `code_search`, `context_search`, `qdrant_list`, `qdrant_index`, `qdrant_prune`, `qdrant_status`). -- Call `qdrant_list` to confirm Qdrant connectivity. -- Call `qdrant_index` with args like `{ "subdir": "scripts", "recreate": true }` to (re)index the mounted repo. -- Call `context_search` with `{ "include_memories": true }` to blend memory+code (requires enabling MEMORY_SSE_ENABLED on the indexer service). - -- qdrant_list with no args -- qdrant_prune with no args - - -Notes: -- The indexer reads env from `.env` (QDRANT_URL, COLLECTION_NAME, EMBEDDING_MODEL). -- Default chunking: ~120 lines with 20-line overlap. -- Skips typical build/venv directories. -- Populates `metadata.kind`, `metadata.symbol`, and `metadata.symbol_path` for Python/JS/TS/Go/Java/Rust/Terraform (best-effort), per chunk. -- Uses the same collection as the MCP server. - -### Exclusions (.qdrantignore) and defaults - -- The indexer now supports a `.qdrantignore` file at the repo root (similar to `.gitignore`). Use it to exclude directories/files from indexing. 
-- Sensible defaults are excluded automatically (overridable): `/models`, `/node_modules`, `/dist`, `/build`, `/.venv`, `/venv`, `/__pycache__`, `/.git`, and files matching `*.onnx`, `*.bin`, `*.safetensors`, `tokenizer.json`, `*.whl`, `*.tar.gz`. -- Override via env or flags: - - Env: `QDRANT_DEFAULT_EXCLUDES=0` to disable defaults; `QDRANT_IGNORE_FILE=.myignore`; `QDRANT_EXCLUDES='tokenizer.json,*.onnx,/third_party'` - - CLI examples: - - `docker compose run --rm indexer --root /work --ignore-file .qdrantignore` - - `docker compose run --rm indexer --root /work --no-default-excludes --exclude '/vendor' --exclude '*.bin'` - -### Scaling and tuning (small → large codebases) - -- Chunking and batching are tunable via env or flags: - - `INDEX_CHUNK_LINES` (default 120), `INDEX_CHUNK_OVERLAP` (default 20) - - `INDEX_BATCH_SIZE` (default 64) - - `INDEX_PROGRESS_EVERY` (default 200 files; 0 disables) -### Prune stale points (optional) - -If files were deleted or significantly changed outside the indexer, remove stale points safely: - -```bash -make prune -``` - -- CLI equivalents: `--chunk-lines`, `--chunk-overlap`, `--batch-size`, `--progress-every`. -- Recommendations: - - Small repos (<100 files): chunk 80–120, overlap 16–24, batch-size 32–64 - - Medium (100s–1k files): chunk 120–160, overlap ~20, batch-size 64–128 - - Large monorepos (1k+): start with defaults; consider `INDEX_PROGRESS_EVERY=200` for visibility and `INDEX_BATCH_SIZE=128` if RAM allows - - -## ReFRAG micro-chunking (retrieval-side, production-ready) - -ReFRAG-lite is enabled in this repo and can be toggled via env. 
It provides: -- Token-level micro-chunking at ingest (tiny k-token windows with stride) -- Compact vector gating and optional gate-first candidate restriction -- Span compaction and a global token budget at search time - -Enable and tune: - -````ini -# Enable compressed retrieval with micro-chunks -REFRAG_MODE=1 -INDEX_MICRO_CHUNKS=1 - -# Micro windowing -MICRO_CHUNK_TOKENS=16 -MICRO_CHUNK_STRIDE=8 - -# Output shaping and budget -MICRO_OUT_MAX_SPANS=3 -MICRO_MERGE_LINES=4 -MICRO_BUDGET_TOKENS=512 -MICRO_TOKENS_PER_LINE=32 - -# Optional: gate-first using mini vectors to prefilter dense search -REFRAG_GATE_FIRST=0 -REFRAG_CANDIDATES=200 -```` - -Reindex after changing chunking: - -````bash -# Recreate collection (safe for local dev) -docker compose exec mcp_indexer python -c "from scripts.mcp_indexer_server import qdrant_index_root; qdrant_index_root(recreate=True)" -```` - -What results look like (context_search / code_search return shape): - -````json -{ - "score": 0.9234, - "path": "scripts/ingest_code.py", - "start_line": 120, - "end_line": 148, - "span_budgeted": true, - "budget_tokens_used": 224, - "components": { "dense": 0.78, "lex": 0.35, "mini": 0.81 }, - "why": ["dense", "mini"] -} -```` - -Notes: -- span_budgeted=true indicates adjacent micro hits were merged and counted toward the global token budget. -- Tune MICRO_* to control prompt footprint. Increase MICRO_MERGE_LINES to merge looser spans; reduce MICRO_OUT_MAX_SPANS for more file diversity. -- Gate-first reduces dense search compute on large collections; keep off for tiny repos. - - -## Decoder-path ReFRAG (feature-flagged) - -This stack ships a feature-flagged decoder integration path via a llama.cpp sidecar. -It is production-safe by default (off) and can run in a fallback “prompt” mode -that uses a compressed textual context. A future “soft” mode will inject projected -chunk embeddings into a patched llama.cpp server. 
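The fallback "prompt" mode described above can be approximated as: pack budgeted micro-spans into a compressed context, append a Sources footer, and send the result to the decoder. A hedged sketch; the token accounting and field names are assumptions, not the code in `scripts/`:

```python
def build_compressed_prompt(question, spans, budget_tokens=512, tokens_per_line=32):
    """Assemble a prompt-mode context: concatenate selected micro-spans until
    a rough token budget is spent, then append a Sources footer.

    Token estimate (lines * tokens_per_line) mirrors MICRO_TOKENS_PER_LINE;
    the exact accounting is an assumption here.
    """
    blocks, cited, used = [], [], 0
    for span in spans:  # each: {"path", "start_line", "end_line", "code"}
        lines = span["end_line"] - span["start_line"] + 1
        cost = lines * tokens_per_line
        if used + cost > budget_tokens:
            break
        used += cost
        ref = f'{span["path"]}:{span["start_line"]}-{span["end_line"]}'
        blocks.append(f'# {ref}\n{span["code"]}')
        cited.append(ref)
    context = "\n\n".join(blocks)
    footer = "Sources:\n" + "\n".join(cited)
    return f"{context}\n\n{footer}\n\nQuestion: {question}\nAnswer:"
```

The resulting string is what a prompt-mode client would POST to the llama.cpp completion endpoint; soft mode would instead ship projected embeddings.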
- - -### Decoder-path dataflow (compress → sense → expand) - -```mermaid -flowchart LR - %% Retrieval side - Q[Query] --> R[Hybrid search + span budgeting] - R --> S[Selected micro-spans] - - %% Projection (φ) and modes - S -->|project via φ| P[(Soft embeddings)] - S -. prompt compress .-> C[Compressed prompt] - - %% Decoder service - subgraph Decoder - G[[llama.cpp :8080]] - end - - %% Mode routing - P -->|soft mode| G - C -->|prompt mode| G - - %% Output - G --> O[Completion] - - %% Notes - classDef opt stroke-dasharray: 5 5 - class C opt -``` - -Enable (safe default is off): - -````ini -REFRAG_DECODER=1 -REFRAG_RUNTIME=llamacpp -LLAMACPP_URL=http://llamacpp:8080 -REFRAG_DECODER_MODE=prompt # prompt|soft (soft requires patched llama.cpp) -REFRAG_ENCODER_MODEL=BAAI/bge-base-en-v1.5 -REFRAG_PHI_PATH=/work/models/refrag_phi_768_to_dmodel.json -```` - -Bring up llama.cpp sidecar (optional): - -````bash -docker compose up -d llamacpp -```` - -Make-based provisioning (recommended): - -````bash -# downloads a tiny GGUF to ./models/model.gguf (override URL via LLAMACPP_MODEL_URL) -make llamacpp-up -# or just fetch the model without starting the service -make llama-model -```` - -Optional: bake the model into the image (no host volume required): - -````bash -# builds an image that includes the model specified by MODEL_URL -make llamacpp-build-image LLAMACPP_MODEL_URL=https://huggingface.co/.../tiny.gguf -# then in docker-compose.yml, either remove the ./models volume for llamacpp -# or override the service to use image: context-llamacpp:tiny -```` - - -Programmatic use: - -````python -from scripts.refrag_llamacpp import LlamaCppRefragClient -c = LlamaCppRefragClient() # uses LLAMACPP_URL -text = c.generate_with_soft_embeddings("Question: ...\n", soft_embeddings=None, max_tokens=128) -```` - - -Notes: -- φ file format: JSON 2D array with shape (d_in, d_model). See scripts/refrag_phi.py. Set REFRAG_PHI_PATH to your JSON file. 
- -- In prompt mode, the client calls /completion on the llama.cpp server with a compressed prompt. -- In soft mode, the client will require a patched server to accept soft embeddings. The flag ensures no breakage. - - -### Alternative: GLM API Provider - -Instead of running llama.cpp locally, you can use the GLM API (ZhipuAI) as your decoder backend: - -**Setup:** -```bash -# In .env -REFRAG_DECODER=1 -REFRAG_RUNTIME=glm # Switch from llamacpp to glm -GLM_API_KEY=your-api-key # Required -GLM_MODEL=glm-4.6 # Optional, defaults to glm-4.6 -``` - -**How it works:** -- Uses OpenAI SDK with `base_url="https://api.z.ai/api/paas/v4/"` -- Supports prompt mode only (soft embeddings ignored) -- Handles GLM-4.6's reasoning mode (`reasoning_content` field) -- Drop-in replacement for llama.cpp—same interface, no code changes needed - -**Switch back to llama.cpp:** -```bash -REFRAG_RUNTIME=llamacpp -``` - -The GLM provider is implemented in `scripts/refrag_glm.py` and automatically selected when `REFRAG_RUNTIME=glm`. - - -## How context_answer works (with decoder) - -The `context_answer` MCP tool answers natural-language questions using retrieval + a decoder sidecar. - -- Inputs (most relevant): `query`, `limit`, `per_path`, `budget_tokens`, `include_snippet`, `collection`, `language`, `path_glob/not_glob` -- Outputs: - - `answer` (string) - - `citations`: `[ { path, start_line, end_line, container_path? }, ... 
]` - - `query`: list of query strings actually used - - `used`: `{ "gate_first": true|false, "refrag": true|false }` - -Pipeline -1) Hybrid search (gate-first): Uses MINI-vector gating when `REFRAG_GATE_FIRST=1` to prefilter candidates, then runs dense+lexical fusion -2) Micro-span budgeting: Merges adjacent micro hits and applies a global token budget (`REFRAG_MODE=1`, `MICRO_BUDGET_TOKENS`, `MICRO_OUT_MAX_SPANS`) -3) Prompt assembly: Builds compact context blocks and a “Sources” footer -4) Decoder call: When `REFRAG_DECODER=1`, calls the configured runtime (`REFRAG_RUNTIME=llamacpp` or `glm`) to synthesize the final answer -5) Return: Answer + citations + usage flags; errors keep citations for debugging - -Environment toggles -- Retrieval: `REFRAG_MODE=1`, `REFRAG_GATE_FIRST=1`, `REFRAG_CANDIDATES=200` -- Budgeting/output: `MICRO_BUDGET_TOKENS`, `MICRO_OUT_MAX_SPANS` -- Decoder: `REFRAG_DECODER=1`, `LLAMACPP_URL=http://localhost:8080` - -Fallbacks and safety -- If gate-first yields 0 items and no strict language filter is set, the tool automatically retries without gating -- If the decoder call fails, the response contains `{ "error": "..." 
-}` plus `citations`, so you can still inspect sources
-
-Quick health + example
-```bash
-# Decoder health (llama.cpp sidecar)
-curl -s http://localhost:8080/health
-
-# Qdrant
-curl -sSf http://localhost:6333/readyz >/dev/null && echo "Qdrant OK"
-```
-
-```python
-# Minimal local call (uses the running MCP indexer server code)
-import os, asyncio
-os.environ.update(
-    QDRANT_URL="http://localhost:6333",
-    COLLECTION_NAME="my-collection",
-    REFRAG_MODE="1", REFRAG_GATE_FIRST="1",
-    REFRAG_DECODER="1", LLAMACPP_URL="http://localhost:8080",
-)
-from scripts import mcp_indexer_server as srv
-async def t():
-    out = await srv.context_answer(query="How does hybrid search work?", limit=5)
-    print(out["used"], len(out.get("citations", [])), len(out.get("answer", "")))
-asyncio.run(t())
-```
-
-Implementation
-- See `scripts/mcp_indexer_server.py` (`context_answer` tool) for the full pipeline, env knobs, and debug flags (`DEBUG_CONTEXT_ANSWER=1`).
-
-### MCP search filtering (language, path, kind)
-
-- The indexer creates payload indexes for efficient filtering.
-- When querying (via MCP client or scripts), you can filter by:
-  - `metadata.language` (e.g., python, typescript, javascript, go, rust)
-  - `metadata.path_prefix` (e.g., `/work/src`)
-  - `metadata.kind` (e.g., function, class, method)
-- Example: in the provided reranker script you can do:
-
-```bash
-make rerank ARGS="--language python --under /work/scripts"
-```
-
-### Operational safeguards and troubleshooting
-
-- Tokenizer for micro-chunking: set TOKENIZER_JSON to a valid tokenizer.json path (default: models/tokenizer.json). If missing, the indexer falls back to line-based chunking.
-- Cap micro-chunks per file: MAX_MICRO_CHUNKS_PER_FILE (default 2000) to prevent runaway chunk counts on very large files.
-- Qdrant client timeout: QDRANT_TIMEOUT (seconds, default 20) applies to all MCP Qdrant calls.
-- Memory auto-detect caching: MEMORY_AUTODETECT=1 by default with MEMORY_COLLECTION_TTL_SECS (default 300s) to avoid repeatedly sampling all collections.
-- Schema repair: ensure_collection repairs missing named vectors (lex, and mini when REFRAG_MODE=1) on existing collections.
-
-- Most MCP clients allow passing tool args that map to server-side filters; if your client supports adding structured args to `qdrant-find`, prefer these filters to reduce noise.
-
-### Payload indexes (created for you)
-
-We create payload indexes to accelerate filtered searches:
-- `metadata.language` (keyword)
-- `metadata.path_prefix` (keyword)
-- `metadata.repo` (keyword)
-- `metadata.kind` (keyword)
-- `metadata.symbol` (keyword)
-- `metadata.symbol_path` (keyword)
-- `metadata.imports` (keyword)
-- `metadata.calls` (keyword)
-- `metadata.file_hash` (keyword)
-- `metadata.ingested_at` (keyword)
-- Git history fields available in payload: `commit_id`, `author_name`, `authored_date`, `message`, `files`
-
-Payload indexes enable fast server-side filters (e.g., language, path_prefix, kind, symbol). Prefer using the MCP tools repo_search/context_search with filter arguments rather than raw Qdrant REST/Python snippets. See the Qdrant documentation if you need low-level API examples.
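Multi-query fusion of the kind used by `make rerank` and `make hybrid` is often implemented with reciprocal-rank fusion (RRF): each ranked list contributes 1/(k + rank) per hit, and the sums decide the final order. A generic sketch, not the repo's exact scorer:

```python
def rrf_fuse(result_lists, k=60):
    """Reciprocal-rank fusion across several ranked hit lists
    (e.g. one list per query phrasing). Score = sum of 1/(k + rank)."""
    scores = {}
    for hits in result_lists:
        for rank, path in enumerate(hits, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Items that rank well in several phrasings bubble to the top even if no single query ranks them first, which is why multi-query consensus works well on large codebases.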
-### Best-practice querying - -- Use precise intent + language: “python chunking function for Qdrant indexing” -- Add path hints when you know the area: “under scripts or ingestion code” -- Try 2–3 alternative phrasings (multi-query) and pick the consensus -- Prefer results where `metadata.language` matches your target file -- For navigation, prefer results where `metadata.path_prefix` matches your directory - -Client tips: -- MCP tools: issue multiple finds with variant phrasings and re-rank by score + metadata match -- Direct Qdrant: use `vector={name: ..., vector: ...}` with the named vector above -- Data persists in the `qdrant_storage` Docker volume. -- The MCP server uses SSE transport and will auto-create the collection if it doesn't exist. -- Only FastEmbed models are supported at this time. - -## Troubleshooting - -### Collection Health & Cache Sync - -The stack includes automatic health checks that detect and fix cache/collection sync issues: - -**Check collection health:** -```bash -python scripts/collection_health.py --workspace . --collection codebase -``` - -**Auto-heal cache issues:** -```bash -python scripts/collection_health.py --workspace . 
--collection codebase --auto-heal -``` - -**What it detects:** -- Empty collection with cached files (cache thinks files are indexed but they're not) -- Significant mismatch between cached files and actual collection contents -- Missing metadata in collection points +> **See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for detailed system design.** -**When to use:** -- After manually deleting collections -- If searches return no results despite indexing -- After Qdrant crashes or data loss -- When switching between collection names +--- -**Automatic healing:** -- Health checks run automatically on watcher and indexer startup -- Cache is cleared when sync issues are detected -- Files are reindexed on next run +## License -### General Issues +MIT -- If the MCP servers can’t reach Qdrant, confirm both containers are up: `make ps`. -- If the SSE port collides, change `FASTMCP_PORT` in `.env` and the mapped port in `docker-compose.yml`. -- If you customize tool descriptions, restart: `make restart`. -- If searches return no results, check collection health (see above). diff --git a/deploy/kubernetes/README.md b/deploy/kubernetes/README.md index 3573917e..5adfeb8c 100644 --- a/deploy/kubernetes/README.md +++ b/deploy/kubernetes/README.md @@ -1,5 +1,9 @@ # Kubernetes Deployment Guide +**Documentation:** [README](../../README.md) · [Configuration](../../docs/CONFIGURATION.md) · [IDE Clients](../../docs/IDE_CLIENTS.md) · [MCP API](../../docs/MCP_API.md) · [ctx CLI](../../docs/CTX_CLI.md) · [Memory Guide](../../docs/MEMORY_GUIDE.md) · [Architecture](../../docs/ARCHITECTURE.md) · [Multi-Repo](../../docs/MULTI_REPO_COLLECTIONS.md) · Kubernetes · [VS Code Extension](../../docs/vscode-extension.md) · [Troubleshooting](../../docs/TROUBLESHOOTING.md) · [Development](../../docs/DEVELOPMENT.md) + +--- + ## Overview This directory contains Kubernetes manifests for deploying Context Engine on a remote cluster using **Kustomize**. 
This enables: diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 71172431..fa59d133 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -1,5 +1,18 @@ # Context Engine Architecture +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Overview](#overview) +- [Core Principles](#core-principles) +- [System Architecture](#system-architecture) +- [Data Flow](#data-flow) +- [ReFRAG Pipeline](#refrag-pipeline) + +--- + ## Overview Context Engine is a production-ready MCP (Model Context Protocol) retrieval stack that unifies code indexing, hybrid search, and optional LLM decoding. It enables teams to ship context-aware AI agents by providing sophisticated semantic and lexical search capabilities with dual-transport compatibility. diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md new file mode 100644 index 00000000..dda96fdd --- /dev/null +++ b/docs/CONFIGURATION.md @@ -0,0 +1,161 @@ +# Configuration Reference + +Complete environment variable reference for Context Engine. 
+
+**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)
+
+---
+
+**On this page:**
+- [Core Settings](#core-settings)
+- [Indexing & Micro-Chunks](#indexing--micro-chunks)
+- [Watcher Settings](#watcher-settings)
+- [Reranker](#reranker)
+- [Decoder (llama.cpp / GLM)](#decoder-llamacpp--glm)
+- [ReFRAG](#refrag-micro-chunking--retrieval)
+- [Ports](#ports)
+- [Search & Expansion](#search--expansion)
+- [Memory Blending](#memory-blending)
+
+---
+
+## Core Settings
+
+| Name | Description | Default |
+|------|-------------|---------|
+| COLLECTION_NAME | Qdrant collection name (unified across all repos) | codebase |
+| REPO_NAME | Logical repo tag stored in payload for filtering | auto-detect from git/folder |
+| HOST_INDEX_PATH | Host path mounted at /work in containers | current repo (.)
| +| QDRANT_URL | Qdrant base URL | container: http://qdrant:6333; local: http://localhost:6333 | + +## Indexing & Micro-Chunks + +| Name | Description | Default | +|------|-------------|---------| +| INDEX_MICRO_CHUNKS | Enable token-based micro-chunking | 0 (off) | +| MAX_MICRO_CHUNKS_PER_FILE | Cap micro-chunks per file | 200 | +| TOKENIZER_URL | HF tokenizer.json URL (for Make download) | n/a | +| TOKENIZER_PATH | Local path where tokenizer is saved (Make) | models/tokenizer.json | +| TOKENIZER_JSON | Runtime path for tokenizer (indexer) | models/tokenizer.json | +| USE_TREE_SITTER | Enable tree-sitter parsing (py/js/ts) | 0 (off) | +| INDEX_CHUNK_LINES | Lines per chunk (non-micro mode) | 120 | +| INDEX_CHUNK_OVERLAP | Overlap lines between chunks | 20 | +| INDEX_BATCH_SIZE | Upsert batch size | 64 | +| INDEX_PROGRESS_EVERY | Log progress every N files | 200 | + +## Watcher Settings + +| Name | Description | Default | +|------|-------------|---------| +| WATCH_DEBOUNCE_SECS | Debounce between FS events | 1.5 | +| INDEX_UPSERT_BATCH | Upsert batch size (watcher) | 128 | +| INDEX_UPSERT_RETRIES | Retry count | 5 | +| INDEX_UPSERT_BACKOFF | Seconds between retries | 0.5 | +| QDRANT_TIMEOUT | HTTP timeout seconds | watcher: 60; search: 20 | +| MCP_TOOL_TIMEOUT_SECS | Max duration for long-running MCP tools | 3600 | + +## Reranker + +| Name | Description | Default | +|------|-------------|---------| +| RERANKER_ONNX_PATH | Local ONNX cross-encoder model path | unset | +| RERANKER_TOKENIZER_PATH | Tokenizer path for reranker | unset | +| RERANKER_ENABLED | Enable reranker by default | 1 (enabled) | + +## Decoder (llama.cpp / GLM) + +| Name | Description | Default | +|------|-------------|---------| +| REFRAG_DECODER | Enable decoder for context_answer | 1 (enabled) | +| REFRAG_RUNTIME | Decoder backend: llamacpp or glm | llamacpp | +| LLAMACPP_URL | llama.cpp server endpoint | http://llamacpp:8080 or http://host.docker.internal:8081 | +| LLAMACPP_TIMEOUT_SEC | 
Decoder request timeout | 300 | +| DECODER_MAX_TOKENS | Max tokens for decoder responses | 4000 | +| REFRAG_DECODER_MODE | prompt or soft (soft requires patched llama.cpp) | prompt | +| GLM_API_KEY | API key for GLM provider | unset | +| GLM_MODEL | GLM model name | glm-4.6 | +| USE_GPU_DECODER | Native Metal decoder (1) vs Docker (0) | 0 (docker) | +| LLAMACPP_GPU_LAYERS | Number of layers to offload to GPU, -1 for all | 32 | + +## ReFRAG (Micro-Chunking & Retrieval) + +| Name | Description | Default | +|------|-------------|---------| +| REFRAG_MODE | Enable micro-chunking and span budgeting | 1 (enabled) | +| REFRAG_GATE_FIRST | Enable mini-vector gating | 1 (enabled) | +| REFRAG_CANDIDATES | Candidates for gate-first filtering | 200 | +| MICRO_BUDGET_TOKENS | Token budget for context_answer | 512 | +| MICRO_OUT_MAX_SPANS | Max spans returned per query | 3 | +| MICRO_CHUNK_TOKENS | Tokens per micro-chunk window | 16 | +| MICRO_CHUNK_STRIDE | Stride between windows | 8 | +| MICRO_MERGE_LINES | Lines to merge adjacent spans | 4 | +| MICRO_TOKENS_PER_LINE | Estimated tokens per line | 32 | + +## Ports + +| Name | Description | Default | +|------|-------------|---------| +| FASTMCP_PORT | Memory MCP server port (SSE) | 8000 | +| FASTMCP_INDEXER_PORT | Indexer MCP server port (SSE) | 8001 | +| FASTMCP_HTTP_PORT | Memory RMCP host port mapping | 8002 | +| FASTMCP_INDEXER_HTTP_PORT | Indexer RMCP host port mapping | 8003 | +| FASTMCP_HEALTH_PORT | Health port (memory/indexer) | memory: 18000; indexer: 18001 | + +## Search & Expansion + +| Name | Description | Default | +|------|-------------|---------| +| HYBRID_EXPAND | Enable heuristic multi-query expansion | 0 (off) | +| LLM_EXPAND_MAX | Max alternate queries via LLM | 0 | + +## Memory Blending + +| Name | Description | Default | +|------|-------------|---------| +| MEMORY_SSE_ENABLED | Enable SSE memory blending | false | +| MEMORY_MCP_URL | Memory MCP endpoint for blending | http://mcp:8000/sse | +| 
MEMORY_MCP_TIMEOUT | Timeout for memory queries | 6 | +| MEMORY_AUTODETECT | Auto-detect memory collection | 1 | +| MEMORY_COLLECTION_TTL_SECS | Cache TTL for collection detection | 300 | + +--- + +## Exclusions (.qdrantignore) + +The indexer supports a `.qdrantignore` file at the repo root (similar to `.gitignore`). + +**Default exclusions** (overridable): +- `/models`, `/node_modules`, `/dist`, `/build` +- `/.venv`, `/venv`, `/__pycache__`, `/.git` +- `*.onnx`, `*.bin`, `*.safetensors`, `tokenizer.json`, `*.whl`, `*.tar.gz` + +**Override via env or flags:** +```bash +# Disable defaults +QDRANT_DEFAULT_EXCLUDES=0 + +# Custom ignore file +QDRANT_IGNORE_FILE=.myignore + +# Additional excludes +QDRANT_EXCLUDES='tokenizer.json,*.onnx,/third_party' +``` + +**CLI examples:** +```bash +docker compose run --rm indexer --root /work --ignore-file .qdrantignore +docker compose run --rm indexer --root /work --no-default-excludes --exclude '/vendor' --exclude '*.bin' +``` + +--- + +## Scaling Recommendations + +| Repo Size | Chunk Lines | Overlap | Batch Size | +|-----------|------------|---------|------------| +| Small (<100 files) | 80-120 | 16-24 | 32-64 | +| Medium (100s-1k files) | 120-160 | ~20 | 64-128 | +| Large (1k+ files) | 120 (default) | 20 | 128+ | + +For large monorepos, set `INDEX_PROGRESS_EVERY=200` for visibility. + diff --git a/docs/CTX_CLI.md b/docs/CTX_CLI.md new file mode 100644 index 00000000..2a0f620c --- /dev/null +++ b/docs/CTX_CLI.md @@ -0,0 +1,166 @@ +# ctx.py - Prompt Enhancer CLI + +A thin CLI that retrieves code context and rewrites your input into a better, context-aware prompt using the local LLM decoder. Works with both questions and commands/instructions. 
+ +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Basic Usage](#basic-usage) +- [Detail Mode](#detail-mode) +- [Unicorn Mode](#unicorn-mode) +- [Advanced Features](#advanced-features) +- [GPU Acceleration](#gpu-acceleration) +- [Configuration](#configuration) + +--- + +## Basic Usage + +```bash +# Questions: Enhanced with specific details and multiple aspects +scripts/ctx.py "What is ReFRAG?" + +# Commands: Enhanced with concrete targets and implementation details +scripts/ctx.py "Refactor ctx.py" + +# Via Make target +make ctx Q="Explain the caching logic to me in detail" + +# Filter by language/path or adjust tokens +make ctx Q="Hybrid search details" ARGS="--language python --under scripts/ --limit 2 --rewrite-max-tokens 200" +``` + +## Detail Mode + +Include compact code snippets in the retrieved context for richer rewrites (trades speed for quality): + +```bash +# Enable detail mode (adds short snippets) +scripts/ctx.py "Explain the caching logic" --detail + +# Detail mode with commands +scripts/ctx.py "Add error handling to ctx.py" --detail + +# Adjust snippet size (default is 1 line when --detail is used) +make ctx Q="Explain hybrid search" ARGS="--detail --context-lines 2" +``` + +**Notes:** +- Default behavior is header-only (fastest). `--detail` adds short snippets. +- Detail mode is optimized for speed: automatically clamps to max 4 results and 1 result per file. 
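As a rough illustration of that clamping behavior, the logic amounts to deduplicating by file and capping the total. This is a hedged sketch only; the function and field names are assumptions, not the actual `ctx.py` internals:

```python
def clamp_detail_results(results, max_results=4, max_per_file=1):
    """Illustrative sketch of detail-mode clamping: keep at most one
    result per file and at most four results overall."""
    seen_per_file = {}
    clamped = []
    for result in results:
        path = result["path"]
        if seen_per_file.get(path, 0) >= max_per_file:
            continue  # skip extra hits from the same file
        seen_per_file[path] = seen_per_file.get(path, 0) + 1
        clamped.append(result)
        if len(clamped) >= max_results:
            break
    return clamped
```

The effect is that detail mode spends its snippet budget across distinct files instead of returning several overlapping spans from one.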
+ +## Unicorn Mode + +Use `--unicorn` for the highest quality prompt enhancement with a staged 2-3 pass approach: + +```bash +# Unicorn mode with commands +scripts/ctx.py "refactor ctx.py" --unicorn + +# Unicorn mode with questions +scripts/ctx.py "what is ReFRAG and how does it work?" --unicorn + +# Works with all filters +scripts/ctx.py "add error handling" --unicorn --language python +``` + +**How it works:** + +1. **Pass 1 (Draft)**: Retrieves rich code snippets (8 lines of context) to understand the codebase +2. **Pass 2 (Refine)**: Retrieves even richer snippets (12 lines) to ground the prompt with concrete code +3. **Pass 3 (Polish)**: Optional cleanup pass if output appears generic or incomplete + +**Key features:** +- **Code-grounded**: References actual code behaviors and patterns +- **No hallucinations**: Only uses real code from your indexed repository +- **Multi-paragraph output**: Produces detailed, comprehensive prompts +- **Works with both questions and commands** + +**When to use:** +- **Normal mode**: Quick, everyday prompts (fastest) +- **--detail**: Richer context without multi-pass overhead (balanced) +- **--unicorn**: When you need the absolute best prompt quality + +## Advanced Features + +### Streaming Output (Default) + +All modes stream tokens as they arrive for instant feedback: + +```bash +scripts/ctx.py "refactor ctx.py" --unicorn +``` + +To disable streaming, set `"streaming": false` in `~/.ctx_config.json` + +### Memory Blending + +Automatically falls back to `context_search` with memories when repo search returns no hits: + +```bash +# If no code matches, ctx.py will search design docs and ADRs +scripts/ctx.py "What is our authentication strategy?" 
+``` + +### Adaptive Context Sizing + +Automatically adjusts `limit` and `context_lines` based on query characteristics: +- **Short/vague queries** → More context for richer grounding +- **Queries with file/function names** → Lighter settings for speed + +### Automatic Quality Assurance + +Enhanced `_needs_polish()` heuristic triggers a third polish pass when: +- Output is too short (< 180 chars) +- Contains generic/vague language +- Missing concrete code references +- Lacks proper paragraph structure + +### Personalized Templates + +Create `~/.ctx_config.json` to customize behavior: + +```json +{ + "always_include_tests": true, + "prefer_bullet_commands": false, + "extra_instructions": "Always consider error handling and edge cases", + "streaming": true +} +``` + +**Available preferences:** +- `always_include_tests`: Add testing considerations to all prompts +- `prefer_bullet_commands`: Format commands as bullet points +- `extra_instructions`: Custom instructions added to every rewrite +- `streaming`: Enable/disable streaming output (default: true) + +See `ctx_config.example.json` for a template. 
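The polish-trigger conditions above can be sketched as a small predicate. This is illustrative only; the thresholds and phrase list are assumptions, not the real `_needs_polish()` implementation:

```python
# Hypothetical phrase list; the actual heuristic's vocabulary may differ.
GENERIC_PHRASES = ("in general", "various", "as appropriate", "and so on")

def needs_polish(output: str) -> bool:
    """Illustrative polish-pass trigger: flag output that is short,
    generic, free of code references, or missing paragraph breaks."""
    too_short = len(output) < 180
    generic = any(p in output.lower() for p in GENERIC_PHRASES)
    no_code_refs = "`" not in output and "(" not in output
    no_paragraphs = "\n\n" not in output
    return too_short or generic or no_code_refs or no_paragraphs
```

Any single failing check is enough to schedule the optional third pass, which keeps the common case (a good two-pass result) fast.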
+ +## GPU Acceleration + +For faster prompt rewriting, use the native Metal-accelerated decoder: + +```bash +# Start the native llama.cpp server with Metal GPU +scripts/gpu_toggle.sh start + +# Now ctx.py will automatically use the GPU decoder on port 8081 +make ctx Q="Explain the caching logic" + +# Stop the native GPU server +scripts/gpu_toggle.sh stop +``` + +## Configuration + +| Setting | Description | Default | +|---------|-------------|---------| +| MCP_INDEXER_URL | Indexer HTTP RMCP endpoint | http://localhost:8003/mcp | +| USE_GPU_DECODER | Auto-detect GPU mode | 0 | +| LLAMACPP_URL | Docker decoder endpoint | http://localhost:8080 | + +GPU decoder (after `gpu_toggle.sh gpu`): http://localhost:8081/completion + diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index 75c32172..9f44357f 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -2,6 +2,19 @@ This guide covers setting up a development environment, understanding the codebase structure, and contributing to Context Engine. +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Prerequisites](#prerequisites) +- [Quick Start](#quick-start) +- [Project Structure](#project-structure) +- [Testing](#testing) +- [Docker Development](#docker-development) + +--- + ## Prerequisites ### Required Software diff --git a/docs/IDE_CLIENTS.md b/docs/IDE_CLIENTS.md new file mode 100644 index 00000000..2988577f --- /dev/null +++ b/docs/IDE_CLIENTS.md @@ -0,0 +1,193 @@ +# IDE & Client Configuration + +Configuration examples for connecting various IDEs and MCP clients to Context Engine. 
+ +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Supported Clients](#supported-clients) +- [SSE Clients](#sse-clients-port-80008001) +- [RMCP Clients](#rmcp-clients-port-80028003) +- [Mixed Transport](#mixed-transport-examples) +- [Verification](#verification) + +--- + +## Supported Clients + +| Client | Transport | Notes | +|--------|-----------|-------| +| Roo | SSE/RMCP | Both SSE and RMCP connections | +| Cline | SSE/RMCP | Both SSE and RMCP connections | +| Windsurf | SSE/RMCP | Both SSE and RMCP connections | +| Zed | SSE | Uses mcp-remote bridge | +| Kiro | SSE | Uses mcp-remote bridge | +| Qodo | RMCP | Direct HTTP endpoints | +| OpenAI Codex | RMCP | TOML config | +| Augment | SSE | Simple JSON configs | +| AmpCode | SSE | Simple URL for SSE endpoints | +| Claude Code CLI | SSE | Simple JSON configs | + +--- + +## SSE Clients (port 8000/8001) + +### Roo / Cline / Windsurf + +```json +{ + "mcpServers": { + "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false }, + "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false } + } +} +``` + +### Augment + +```json +{ + "mcpServers": { + "memory": { "type": "sse", "url": "http://localhost:8000/sse", "disabled": false }, + "qdrant-indexer": { "type": "sse", "url": "http://localhost:8001/sse", "disabled": false } + } +} +``` + +### Kiro + +Create `.kiro/settings/mcp.json` in your workspace: + +```json +{ + "mcpServers": { + "qdrant-indexer": { "command": "npx", "args": ["mcp-remote", "http://localhost:8001/sse", "--transport", "sse-only"] }, + 
"memory": { "command": "npx", "args": ["mcp-remote", "http://localhost:8000/sse", "--transport", "sse-only"] } + } +} +``` + +**Notes:** +- Kiro expects command/args (stdio). `mcp-remote` bridges to remote SSE endpoints. +- If `npx` prompts in your environment, add `-y` right after `npx`. +- Workspace config overrides user-level config (`~/.kiro/settings/mcp.json`). + +**Troubleshooting:** +- Error: "Enabled MCP Server must specify a command, ignoring." → Use the command/args form; do not use type:url in Kiro. + +### Zed + +Add to your Zed `settings.json` (Command Palette → "Settings: Open Settings (JSON)"): + +```json +{ + "qdrant-indexer": { + "command": "npx", + "args": ["mcp-remote", "http://localhost:8001/sse", "--transport", "sse-only"], + "env": {} + } +} +``` + +**Notes:** +- Zed expects MCP servers at the root level of settings.json +- Uses command/args (stdio). mcp-remote bridges to remote SSE endpoints +- If npx prompts, add `-y` right after npx: `"args": ["-y", "mcp-remote", ...]` + +**Alternative (direct HTTP):** +```json +{ + "qdrant-indexer": { + "type": "http", + "url": "http://localhost:8001/sse" + } +} +``` + +--- + +## RMCP Clients (port 8002/8003) + +### Qodo + +Add each MCP tool separately through the UI: + +**Tool 1 - memory:** +```json +{ + "memory": { "url": "http://localhost:8002/mcp" } +} +``` + +**Tool 2 - qdrant-indexer:** +```json +{ + "qdrant-indexer": { "url": "http://localhost:8003/mcp" } +} +``` + +**Note:** Qodo can talk to RMCP endpoints directly, no `mcp-remote` wrapper needed. 
+ +### OpenAI Codex + +TOML configuration: + +```toml +experimental_use_rmcp_client = true + +[mcp_servers.memory_http] +url = "http://127.0.0.1:8002/mcp" + +[mcp_servers.qdrant_indexer_http] +url = "http://127.0.0.1:8003/mcp" +``` + +--- + +## Mixed Transport (stdio + SSE) + +### Windsurf/Cursor + +```json +{ + "mcpServers": { + "qdrant": { + "command": "uvx", + "args": ["mcp-server-qdrant"], + "env": { + "QDRANT_URL": "http://localhost:6333", + "COLLECTION_NAME": "my-collection", + "EMBEDDING_MODEL": "BAAI/bge-base-en-v1.5" + }, + "disabled": false + } + } +} +``` + +--- + +## Important Notes for IDE Agents + +- **Do not send null values** to MCP tools. Omit the field or pass an empty string "" instead. +- **qdrant-index examples:** + - `{"subdir":"","recreate":false,"collection":"my-collection","repo_name":"workspace"}` + - `{"subdir":"scripts","recreate":true}` +- For indexing repo root with no params, use `qdrant_index_root` (zero-arg) or call `qdrant-index` with `subdir:""`. + +--- + +## Verification + +After configuring, you should see tools from both servers: +- `store`, `find` (Memory) +- `repo_search`, `code_search`, `context_search`, `context_answer` (Indexer) +- `qdrant_list`, `qdrant_index`, `qdrant_prune`, `qdrant_status` (Indexer) + +Test connectivity: +- Call `qdrant_list` to confirm Qdrant connectivity +- Call `qdrant_index` with `{ "subdir": "scripts", "recreate": true }` to test indexing +- Call `context_search` with `{ "include_memories": true }` to test memory blending + diff --git a/docs/MCP_API.md b/docs/MCP_API.md index 490c3dfc..73b1b8ef 100644 --- a/docs/MCP_API.md +++ b/docs/MCP_API.md @@ -2,6 +2,19 @@ This document provides comprehensive API documentation for all MCP (Model Context Protocol) tools exposed by Context Engine's dual-server architecture. 
+**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Overview](#overview) +- [Memory Server API](#memory-server-api) - `store()`, `find()` +- [Indexer Server API](#indexer-server-api) - `repo_search()`, `context_search()`, `context_answer()`, etc. +- [Response Schemas](#response-schemas) +- [Error Handling](#error-handling) + +--- + ## Overview Context Engine exposes two MCP servers: @@ -504,6 +517,110 @@ Generate alternative query variations using local LLM (requires decoder enabled) } ``` +### code_search() + +Exact alias of `repo_search()` for discoverability. Same parameters and return shape. + +### qdrant_index_root() + +Index the entire workspace root (`/work`). + +**Parameters:** +- `recreate` (bool, default false): Drop and recreate collection before indexing +- `collection` (str, optional): Target collection name + +**Returns:** Subprocess result with indexing status. + +### search_tests_for() + +Find test files related to a query. Presets common test file globs. + +**Parameters:** +- `query` (str or list[str], required): Search query +- `limit` (int, optional): Max results +- `include_snippet` (bool, optional): Include code snippets +- `language` (str, optional): Filter by language + +**Returns:** Same shape as `repo_search()`. + +### search_config_for() + +Find configuration files related to a query. Presets config file globs (yaml/json/toml/etc). + +**Parameters:** Same as `search_tests_for()`. + +**Returns:** Same shape as `repo_search()`. + +### search_callers_for() + +Heuristic search for callers/usages of a symbol. 
+ +**Parameters:** +- `query` (str, required): Symbol name to find callers for +- `limit` (int, optional): Max results +- `language` (str, optional): Filter by language + +**Returns:** Same shape as `repo_search()`. + +### search_importers_for() + +Find files likely importing or referencing a module/symbol. + +**Parameters:** Same as `search_callers_for()`. + +**Returns:** Same shape as `repo_search()`. + +### change_history_for_path() + +Summarize recent change metadata for a file path from the index. + +**Parameters:** +- `path` (str, required): Relative path under /work +- `collection` (str, optional): Target collection +- `max_points` (int, optional): Cap on scanned points + +**Returns:** +```json +{ + "ok": true, + "summary": { + "path": "scripts/ctx.py", + "last_modified": "2025-01-15T14:22:00" + } +} +``` + +### collection_map() + +Return collection↔repo mappings with optional Qdrant payload samples. + +**Parameters:** +- `search_root` (str, optional): Directory to scan +- `collection` (str, optional): Filter by collection +- `repo_name` (str, optional): Filter by repo +- `include_samples` (bool, optional): Include payload samples +- `limit` (int, optional): Max entries + +**Returns:** Mapping of collections to repositories. + +### set_session_defaults() (Indexer) + +Set default collection for subsequent calls on the same session. + +**Parameters:** +- `collection` (str, optional): Default collection name +- `session` (str, optional): Session token for cross-connection reuse + +**Returns:** +```json +{ + "ok": true, + "session": "abc123", + "defaults": {"collection": "codebase"}, + "applied": "connection" +} +``` + ## Error Handling All API methods follow consistent error handling patterns: diff --git a/docs/MEMORY_GUIDE.md b/docs/MEMORY_GUIDE.md new file mode 100644 index 00000000..c0904335 --- /dev/null +++ b/docs/MEMORY_GUIDE.md @@ -0,0 +1,171 @@ +# Memory Usage Guide + +Best practices for using Context Engine's memory system effectively. 
+
+**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)
+
+---
+
+**On this page:**
+- [When to Use Memories vs Code Search](#when-to-use-memories-vs-code-search)
+- [Recommended Metadata Schema](#recommended-metadata-schema)
+- [Example Operations](#example-operations)
+- [Memory Blending](#enable-memory-blending)
+- [Collection Naming](#collection-naming-strategies)
+
+---
+
+## When to Use Memories vs Code Search
+
+| Use Memories For | Use Code Search For |
+|------------------|---------------------|
+| Conventions, runbooks, decisions | APIs, functions, classes |
+| Links, known issues, FAQs | Configuration files |
+| "How we do X here" notes | Cross-file relationships |
+| Team wiki-style content | Anything you'd grep for |
+
+**Blend both** for tasks like "how to run E2E tests" where instructions (memory) reference scripts in the repo (code).
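Blending ultimately means merging two ranked lists into one. A minimal sketch of the idea, assuming a down-weight on memory scores so code hits win ties (the server's actual scoring is more involved, and `memory_weight` is an invented knob for illustration):

```python
def blend_results(code_hits, memory_hits, memory_weight=0.8):
    """Illustrative blend: scale memory scores down slightly, then
    return one list sorted by the adjusted score, highest first."""
    blended = [(hit["score"], hit) for hit in code_hits]
    blended += [(hit["score"] * memory_weight, hit) for hit in memory_hits]
    blended.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in blended]
```

In practice the memory hits keep `metadata.kind = "memory"`, so callers can still tell the two sources apart after blending.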
+ +--- + +## Recommended Metadata Schema + +Memory entries are stored as points in Qdrant with a consistent payload: + +| Key | Type | Description | +|-----|------|-------------| +| `kind` | string | **Required.** Always "memory" to enable filtering/blending | +| `topic` | string | Short category (e.g., "dev-env", "release-process") | +| `tags` | list[str] | Searchable tags (e.g., ["qdrant", "indexing", "prod"]) | +| `source` | string | Origin (e.g., "chat", "manual", "tool", "issue-123") | +| `author` | string | Who added it (username or email) | +| `created_at` | string | ISO8601 timestamp (UTC) | +| `expires_at` | string | ISO8601 timestamp if memory should be pruned later | +| `repo` | string | Optional repo identifier for shared instances | +| `link` | string | Optional URL to docs, tickets, or dashboards | +| `priority` | float | 0.0-1.0 weight for ranking when blending | + +**Tips:** +- Keep values small (short strings, small lists) +- Put details in the `information` text, not payload +- Use lowercase snake_case keys +- For secrets/PII: store references or vault paths, never plaintext + +--- + +## Example Operations + +### Store a Memory + +Via MCP Memory server tool `store`: + +```json +{ + "information": "Run full reset: INDEX_MICRO_CHUNKS=1 MAX_MICRO_CHUNKS_PER_FILE=200 make reset-dev", + "metadata": { + "kind": "memory", + "topic": "dev-env", + "tags": ["make", "reset"], + "source": "chat" + } +} +``` + +### Find Memories + +Via MCP Memory server tool `find`: + +```json +{ + "query": "reset-dev", + "limit": 5 +} +``` + +### Blend Memories into Code Search + +Via Indexer MCP `context_search`: + +```json +{ + "query": "async file watcher", + "include_memories": true, + "limit": 5, + "include_snippet": true +} +``` + +--- + +## Query Tips + +- Use precise queries (2-5 tokens) +- Add synonyms if needed; the server supports multiple phrasings +- Combine `topic`/`tags` in your memory text to make them easier to find + +--- + +## Enable Memory Blending + +1. 
Ensure the Memory MCP is running on :8000 (default in compose) + +2. Enable SSE memory blending on the Indexer MCP by setting these env vars: + +```yaml +services: + mcp_indexer: + environment: + - MEMORY_SSE_ENABLED=true + - MEMORY_MCP_URL=http://mcp:8000/sse + - MEMORY_MCP_TIMEOUT=6 +``` + +3. Restart the indexer: + +```bash +docker compose up -d mcp_indexer +``` + +4. Validate with `context_search`: + +```json +{ + "query": "your test memory text", + "include_memories": true, + "limit": 5 +} +``` + +Expected: non-zero results with blended items; memory hits will have `metadata.kind = "memory"`. + +--- + +## Collection Naming Strategies + +Different hash lengths for different workspace types: + +**Local Workspaces:** `repo-name-8charhash` +- Example: `Anesidara-e8d0f5fc` +- Used by local indexer/watcher +- Assumes unique repo names within workspace + +**Remote Uploads:** `folder-name-16charhash-8charhash` +- Example: `testupload2-04e680d5939dd035-b8b8d4cc` +- Collision avoidance for duplicate folder names +- 16-char hash identifies workspace, 8-char hash identifies collection + +--- + +## Operational Notes + +- Collection name comes from `COLLECTION_NAME` (see .env) +- This stack defaults to a single collection for both code and memories +- Filtering uses `metadata.kind` to distinguish memory from code +- Consider pruning expired memories by filtering `expires_at < now` + +--- + +## Backup and Migration + +For production-grade backup/migration strategies, see the official Qdrant documentation for snapshots and export/import. For local development, rely on Docker volumes and reindexing when needed. 
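The local `repo-name-8charhash` naming scheme above can be mirrored with a short sketch. Hedged: the hash input and algorithm the indexer actually uses are implementation details; this only reproduces the shape of the name:

```python
import hashlib

def local_collection_name(repo_name: str, workspace_path: str) -> str:
    # Illustrative only: derive an 8-char suffix from the workspace path.
    # The real indexer's hash input and algorithm may differ.
    digest = hashlib.sha256(workspace_path.encode("utf-8")).hexdigest()
    return f"{repo_name}-{digest[:8]}"
```

The key property is determinism: the same workspace always maps to the same collection name, so the watcher and indexer agree without coordination.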
+ diff --git a/docs/MULTI_REPO_COLLECTIONS.md b/docs/MULTI_REPO_COLLECTIONS.md index e43a5d60..991d2cac 100644 --- a/docs/MULTI_REPO_COLLECTIONS.md +++ b/docs/MULTI_REPO_COLLECTIONS.md @@ -1,5 +1,18 @@ # Multi-Repository Collection Architecture +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Overview](#overview) +- [Architecture Principles](#architecture-principles) +- [Indexing Multiple Repositories](#indexing-multiple-repositories) +- [Filtering by Repository](#filtering-by-repository) +- [Remote Deployment](#remote-deployment) + +--- + ## Overview Context Engine supports first-class multi-repository operation through a unified collection architecture. This enables: diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md new file mode 100644 index 00000000..34913d72 --- /dev/null +++ b/docs/TROUBLESHOOTING.md @@ -0,0 +1,161 @@ +# Troubleshooting Guide + +Common issues and solutions for Context Engine. 
+ +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Collection Health & Cache Sync](#collection-health--cache-sync) +- [Common Issues](#common-issues) +- [Connectivity Issues](#connectivity-issues) +- [Endpoint Verification](#endpoint-verification) +- [Debug Logging](#debug-logging) + +--- + +## Collection Health & Cache Sync + +The stack includes automatic health checks that detect and fix cache/collection sync issues. + +### Check collection health +```bash +python scripts/collection_health.py --workspace . --collection codebase +``` + +### Auto-heal cache issues +```bash +python scripts/collection_health.py --workspace . --collection codebase --auto-heal +``` + +### What it detects +- Empty collection with cached files (cache thinks files are indexed but they're not) +- Significant mismatch between cached files and actual collection contents +- Missing metadata in collection points + +### When to use +- After manually deleting collections +- If searches return no results despite indexing +- After Qdrant crashes or data loss +- When switching between collection names + +### Automatic healing +- Health checks run automatically on watcher and indexer startup +- Cache is cleared when sync issues are detected +- Files are reindexed on next run + +--- + +## Common Issues + +### Tree-sitter not found or parser errors +This feature is optional. If you set `USE_TREE_SITTER=1` and see errors, unset it or install the tree-sitter deps, then reindex. + +### Tokenizer missing for micro-chunks +Run `make tokenizer` or set `TOKENIZER_JSON` to a valid tokenizer.json.
Otherwise, the indexer falls back to line-based chunking. + +### SSE "Invalid session ID" when POSTing /messages directly +Expected if you didn't initiate an SSE session first. Use an MCP client (e.g., mcp-remote) to handle the handshake. + +### llama.cpp platform warning on Apple Silicon +Prefer the native path (`scripts/gpu_toggle.sh gpu`). If you stick with Docker, add `platform: linux/amd64` to the service or ignore the warning during local dev. + +### Indexing feels stuck on very large files +Use `MAX_MICRO_CHUNKS_PER_FILE=200` during dev runs. + +### Watcher timeouts (-9) or Qdrant "ResponseHandlingException: timed out" +Set watcher-safe defaults to reduce payload size and add headroom during upserts: + +```ini +QDRANT_TIMEOUT=60 +MAX_MICRO_CHUNKS_PER_FILE=200 +INDEX_UPSERT_BATCH=128 +INDEX_UPSERT_RETRIES=5 +INDEX_UPSERT_BACKOFF=0.5 +WATCH_DEBOUNCE_SECS=1.5 +``` + +If issues persist, try lowering `INDEX_UPSERT_BATCH` to 96 or raising `QDRANT_TIMEOUT` to 90. + +--- + +## Connectivity Issues + +### MCP servers can't reach Qdrant +Confirm both containers are up: `make ps`. + +### SSE port collides +Change `FASTMCP_PORT` in `.env` and the mapped port in `docker-compose.yml`. + +### Searches return no results +Check collection health (see above). + +### Tool descriptions out of date +Restart: `make restart`.
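The upsert settings covered earlier in this guide (`INDEX_UPSERT_BATCH`, `INDEX_UPSERT_RETRIES`, `INDEX_UPSERT_BACKOFF`) tune batching and retry behavior along the following lines. This is a minimal sketch, assuming exponential backoff and using `TimeoutError` as a stand-in for the Qdrant client's timeout exception; the real indexer's logic may differ:

```python
import time

def upsert_with_retry(upsert_fn, points, batch_size=128, retries=5, backoff=0.5):
    """Hypothetical sketch of batched upserts with retry/backoff.

    Mirrors the semantics of INDEX_UPSERT_BATCH, INDEX_UPSERT_RETRIES,
    and INDEX_UPSERT_BACKOFF; not the actual indexer implementation.
    """
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        for attempt in range(retries):
            try:
                upsert_fn(batch)  # e.g. a Qdrant upsert call
                break
            except TimeoutError:
                if attempt == retries - 1:
                    raise  # out of retries; surface the timeout
                time.sleep(backoff * (2 ** attempt))  # back off before retrying
```

Smaller batches shrink each request's payload (fewer chances to hit `QDRANT_TIMEOUT`), while the backoff spreads retries out instead of hammering a busy Qdrant instance.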
+ +--- + +## Endpoint Verification + +```bash +# Qdrant DB +curl -sSf http://localhost:6333/readyz >/dev/null && echo "Qdrant OK" + +# Decoder (llama.cpp sidecar) +curl -s http://localhost:8080/health + +# SSE endpoints (Memory, Indexer) +curl -sI http://localhost:8000/sse | head -n1 +curl -sI http://localhost:8001/sse | head -n1 + +# RMCP endpoints (HTTP JSON-RPC) +curl -sI http://localhost:8002/mcp | head -n1 +curl -sI http://localhost:8003/mcp | head -n1 +``` + +--- + +## Expected HTTP Behaviors + +- **GET /mcp returns 400**: This is normal; the RMCP endpoint is POST-only for JSON-RPC +- **SSE requires session handshake**: A raw POST to /messages without one will error (expected) + +--- + +## Operational Safeguards + +| Setting | Purpose | Default | +|---------|---------|---------| +| TOKENIZER_JSON | Tokenizer for micro-chunking | models/tokenizer.json | +| MAX_MICRO_CHUNKS_PER_FILE | Prevent runaway chunk counts | 2000 | +| QDRANT_TIMEOUT | HTTP timeout for MCP Qdrant calls | 20s | +| MEMORY_AUTODETECT | Auto-detect memory collection | 1 | +| MEMORY_COLLECTION_TTL_SECS | Cache TTL for collection detection | 300s | + +**Schema repair:** `ensure_collection` now repairs missing named vectors (lex, mini when REFRAG_MODE=1) on existing collections. + +--- + +## Debug Logging + +Enable debug environment variables for detailed logging: + +```bash +export DEBUG_CONTEXT_ANSWER=1 +export HYBRID_DEBUG=1 +export CACHE_DEBUG=1 + +# Restart services +docker compose restart +``` + +--- + +## Getting Help + +1. Check this troubleshooting guide +2. Review logs: `docker compose logs mcp_indexer` +3. Verify health: `make health` +4.
Check Qdrant status: `make qdrant-status` + diff --git a/docs/vscode-extension.md b/docs/vscode-extension.md index ec86cc16..b83cf772 100644 --- a/docs/vscode-extension.md +++ b/docs/vscode-extension.md @@ -1,8 +1,29 @@ -Context Engine Uploader VS Code Extension -========================================= +# VS Code Extension -Build Prerequisites -------------------- +Context Engine Uploader extension for automatic workspace sync and Prompt+ integration. + +**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md) + +--- + +**On this page:** +- [Features](#features) +- [Installation](#installation) +- [Configuration](#configuration) +- [Commands](#commands-and-lifecycle) + +--- + +## Features + +- **Auto-sync**: Force sync on startup + watch mode keeps your workspace indexed +- **Prompt+ button**: Status bar button to enhance selected text with unicorn mode +- **Output channel**: Real-time logs for force-sync and watch operations +- **GPU decoder support**: Configure llama.cpp, Ollama, or GLM as decoder backend + +## Installation + +### Build Prerequisites - Node.js 18+ and npm - Python 3 available on PATH for runtime testing - VS Code Extension Manager `vsce` (`npm install -g @vscode/vsce`) or run via `npx`