
Add multi-collection support and remote delta upload tooling#20

Merged
m1rl0k merged 20 commits into Context-Engine-AI:test from voarsh2:multi-repo-support-collections-11
Nov 15, 2025

Conversation

Contributor

@voarsh2 voarsh2 commented Nov 15, 2025

TL;DR:

Summary

  • add multi-collection (per-repo) indexing alongside the existing single-collection default; expose sticky collection selection in the MCP search/memory tools
  • remote upload pipeline: a core client (living in-repo) plus a standalone one-off script (which can be run anywhere outside this repo); both stream file deltas to the upload service so a remote watcher can re-embed code from a LAN workstation
  • ship a lightweight memory backup/restore helper to avoid data loss when collections are wiped

Multi-repo collections provide separation (focused searching on specific codebases, which is less overwhelming for LLMs); single-collection support is maintained.
Search/memory tools can search all collections or a specific one (and store memories in a specified collection). A sticky collection (session default) lets you run queries against a collection without re-specifying it on every call.
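The sticky-collection behaviour described above can be sketched as a small resolution rule: a per-call collection overrides the session default, and no value at all means "search everything". This is a minimal illustration; the function names and the `None` convention are assumptions, not the PR's actual MCP API.

```python
# Hypothetical sketch of sticky collection resolution for MCP tool calls.
# Names and conventions are illustrative, not the PR's actual API.

_session_default = None  # sticky collection for this session


def set_collection(name):
    """Pin a session-default collection so later queries can omit it."""
    global _session_default
    _session_default = name


def resolve_collection(explicit=None):
    """Per-call override wins; otherwise fall back to the sticky default.

    Returning None means "search all collections".
    """
    return explicit if explicit is not None else _session_default


set_collection("repo-a")
assert resolve_collection() == "repo-a"          # sticky default applies
assert resolve_collection("repo-b") == "repo-b"  # explicit override wins
```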

The remote upload client uploads code changes to the remotely running stack (ideally over LAN, not the internet), where they are processed by the watcher. Clone the repo in your local environment, then run the remote upload script with path arguments and the server address/port; in watch mode it uploads file changes to the upload service, and the watcher sees them and re-embeds.
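The watch-mode change detection above can be sketched as a periodic snapshot-and-diff over content hashes: hash every file each interval, compare against the previous snapshot, and ship only the delta. This is an assumption-laden sketch of the idea, not the client's real implementation; `snapshot` and `diff` are hypothetical helpers.

```python
# Illustrative sketch of a --watch style polling loop: hash files each
# interval and compute the delta. Helper names are hypothetical.
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to a content hash."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*")
        if p.is_file()
    }


def diff(old, new):
    """Split two snapshots into created / updated / deleted path lists."""
    created = [p for p in new if p not in old]
    updated = [p for p in new if p in old and new[p] != old[p]]
    deleted = [p for p in old if p not in new]
    return created, updated, deleted
```

A watch loop would call `snapshot` every N seconds, `diff` it against the last snapshot, and upload a bundle only when one of the three lists is non-empty.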

Includes a mini memory backup/restore script, which is a nice-to-have: code can be reindexed, but memories are lost if you clear a collection. This adds some safety if you use the feature a lot without backups (beyond a Docker volume backup; a scripted Kubernetes CronJob can also make use of the script).
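The backup idea can be sketched as filtering a collection's points down to the non-code "memory" entries and serialising them to JSON, optionally keeping vectors so restore can skip re-embedding. The point shape and the `kind` payload field here are assumptions for illustration, not Qdrant's or this repo's actual schema.

```python
# Hedged sketch of the memory backup step: keep only non-code points and
# optionally include vectors. Point/payload shape is illustrative only.
import json


def export_memories(points, include_vectors=False):
    """Serialise memory points to JSON; code points are skipped."""
    records = []
    for pt in points:
        if pt["payload"].get("kind") != "memory":  # skip indexed code points
            continue
        rec = {"id": pt["id"], "payload": pt["payload"]}
        if include_vectors:
            rec["vector"] = pt["vector"]  # lets restore skip re-embedding
        records.append(rec)
    return json.dumps(records)
```

Restore would be the inverse: load the JSON and upsert each record, re-embedding payload text whenever a vector is absent.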

The stack is assumed to run in containers (e.g. Kubernetes) with RWX storage for repo code and metadata; the MCP server is reached via NodePort IP:port from your local thin client/CLI tool/IDE.

Ref #11

voarsh2 and others added 20 commits October 25, 2025 01:06
Add comprehensive Kubernetes deployment configuration for Context-Engine:
- Complete service manifests converted from docker-compose
- Persistent storage for Qdrant database
- ConfigMaps with environment variables (local-first defaults)
- NodePort services for external access
- Optional Ingress configuration for domain-based access
- Automated deployment and cleanup scripts
- Makefile for development and management
- Comprehensive documentation and troubleshooting guide

Key features:
- Maintains local development defaults
- Optional remote hosting capabilities
- Health checks and resource limits
- Scalable MCP server deployments
- Support for both SSE and HTTP transports
- Optional Llama.cpp integration

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Add missing QDRANT_URL to ConfigMap for proper service discovery
- Fix healthcheck paths from /health to /readyz to match MCP server endpoints
- Standardize QDRANT_URL environment variable references across all deployments
- Update mcp-memory, mcp-indexer, mcp-http, and indexer-services manifests

Resolves localhost fallback issues in Kubernetes deployment where services
were defaulting to localhost:6333 instead of using proper service names.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add 4 missing environment variables from docker-compose.yml to Kubernetes ConfigMap:
- QDRANT_API_KEY: For Qdrant Cloud/remote authentication (optional)
- REPO_NAME: Repository name for payload tracking
- FASTMCP_SERVER_NAME: MCP server identifier
- HOST_INDEX_PATH: Work directory mounting path

This ensures full compatibility between docker-compose and Kubernetes deployments,
allowing all services to reference the same environment variables regardless of deployment method.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…025-0052

Resolves merge conflict in configmap.yaml by combining:
- QDRANT_URL configuration for proper service discovery
- Additional environment variables for full compatibility

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…vice-specific images

- Add comprehensive build-images.sh script with registry support
- Update all deployment manifests to use service-specific image names
- Replace hardcoded context-engine:latest with proper image names
- Add image override generation for Kubernetes deployment
- Support separate images for better maintainability and scaling

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Replace hardcoded 'fast-ssd' storageClassName with commented configuration
- QDRANT StatefulSet will now use cluster's default storage class
- Users can uncomment and specify custom storage class if needed
- Ensures better compatibility across different Kubernetes clusters

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Implements comprehensive Git-based source code synchronization to solve
the critical issue of source code distribution in remote Kubernetes deployments.

### Key Features:
- Git sync sidecar containers for automatic source code synchronization
- Flexible deployment modes: local (hostPath) vs Git-based
- Support for public and private Git repositories
- SSH and HTTPS authentication methods
- Automated deployment script with mode selection
- Comprehensive documentation and setup guides

### Files Added:
- deploy/kubernetes/deploy-with-source.sh - Smart deployment script
- deploy/kubernetes/mcp-indexer-git.yaml - Git-enabled indexer deployment
- deploy/kubernetes/mcp-memory-git.yaml - Git-enabled memory server deployment
- deploy/kubernetes/GIT_SYNC_SETUP.md - Comprehensive setup documentation

### Files Modified:
- deploy/kubernetes/configmap.yaml - Added Git configuration variables
- deploy/kubernetes/README.md - Updated with Git sync documentation

### Configuration Variables Added:
- SOURCE_CODE_MODE: Switch between 'local' and 'git' modes
- GIT_REPO_URL: Git repository URL for synchronization
- GIT_BRANCH: Git branch to checkout
- GIT_SYNC_PERIOD: Synchronization frequency
- GIT_USERNAME/GIT_PASSWORD: HTTPS authentication
- GIT_SSH_KEY: SSH authentication configuration

This solution enables production-ready Kubernetes deployments with automatic
source code management, eliminating the need for manual code distribution
across cluster nodes while maintaining compatibility with existing local
development workflows.

Resolves the critical remote source code access issue identified in issue #1.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Combine environment variable configuration from kubernetes branch with
Git sync functionality from claude/issue-1-20251026-0047:

- QDRANT_URL and complete environment variable coverage
- Source code mode configuration (local/git)
- Git repository settings for remote source code access
- Authentication support for private repositories

Resolves merge conflict by integrating both configuration sets.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add complete delta upload system enabling real-time code synchronization across distributed environments. The system includes:

- **Upload Service**: FastAPI-based HTTP service for receiving and processing delta bundles with integration to existing indexing pipeline
- **Remote Upload Client**: Python client for creating delta bundles, detecting file changes (create/update/delete/move), and uploading with retry logic and sequence tracking
- **Enhanced Watch System**: Extended watch_index.py to support both local and remote modes with automatic fallback
- **Development Environment**: Complete docker-compose.dev-remote.yml setup simulating Kubernetes CephFS RWX behavior with shared volumes
- **Kubernetes Deployment**: Production-ready manifests with persistent volumes, health checks, and proper resource limits
- **Comprehensive Documentation**: Architecture docs, design specifications, setup guides, and usage documentation
- **Build Tooling**: Development setup script and Make targets for remote upload workflows

The delta upload system uses efficient tarball bundles with JSON metadata to transmit only changed files, supporting move detection, hash-based change tracking, and robust error handling with exponential backoff retries.
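The bundle format described above can be sketched as a gzipped tarball carrying the changed files plus a `manifest.json` listing operations and a sequence number. This is a minimal illustration; the manifest field names (`seq`, `ops`, `op`, `path`) and layout are assumptions, not the PR's actual wire format.

```python
# Sketch of a delta bundle: a tar.gz of changed files plus a JSON manifest.
# Manifest field names are assumptions, not the PR's actual format.
import io
import json
import tarfile


def build_bundle(seq, files, deleted):
    """Pack changed files and a manifest into an in-memory tar.gz bundle."""
    manifest = {
        "seq": seq,
        "ops": [{"op": "upsert", "path": p} for p in files]
        + [{"op": "delete", "path": p} for p in deleted],
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("manifest.json", json.dumps(manifest).encode())
        for path, data in files.items():
            add(f"files/{path}", data)  # deleted paths ship metadata only
    return buf.getvalue()
```

The receiving service would unpack the tarball, apply `upsert`/`delete` operations in order, and use `seq` to detect gaps or replays.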
- Simulates Kubernetes-hosted environment locally
- Enables per-collection repositories and search
- Maintains backward compatibility via env var
- Supports both single and multi-collection modes
- Adds memory search capabilities per collection
…orkspaces

- Add collection_map MCP tool to enumerate collection↔repo mappings with optional Qdrant payload samples
- Implement origin metadata persistence in workspace_state.py for remote source tracking
- Enhance remote upload client with mapping summary and --show-mapping option
- Add source_path parameter to upload service for complete origin tracking
- Simplify watch_index.py by removing remote mode complexity and focusing on local indexing
- Update workspace state functions to support collection mappings enumeration

These changes provide comprehensive visibility into collection mappings across local and remote workspaces, enabling better tracking and management of distributed indexing operations.
Add continuous file monitoring capability with --watch flag that automatically
detects changes and uploads delta bundles at configurable intervals. Also
introduce standalone_upload_client.py as a self-contained version that
includes embedded dependencies, allowing delta uploads without requiring the full repository.
Streamline upload client implementations by:
- Removing complex jitter calculations in favor of simple exponential backoff
- Consolidating error response formatting and dictionary structures
- Simplifying exception handling across upload and status check methods
- Reducing code verbosity while maintaining identical functionality
- Making error messages more concise and consistent
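The simplified retry policy described above (plain exponential backoff, no jitter) can be sketched as a small wrapper. This is an illustrative sketch only; the function name, delay schedule, and attempt count are assumptions, not the client's actual values.

```python
# Minimal sketch of simple exponential backoff without jitter.
# Names and defaults are illustrative, not the PR's actual code.
import time


def with_retries(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on any exception with doubling delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Dropping jitter trades a little resilience to thundering-herd retries for code that is much easier to reason about, which matches the "reducing verbosity while maintaining identical functionality" goal above.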
Add comprehensive utilities for backing up and restoring memories (non-code points)
from Qdrant collections. The backup utility exports user-added notes and context
to JSON with optional vector embeddings, while the restore utility can import
these backups to existing or new collections with support for re-embedding
when vectors are not included in the backup. Both tools provide batch processing,
CLI interfaces, and robust error handling for production use.
Add documentation for new collection mapping features and detailed
explanation of collection naming strategies for local workspaces versus
remote uploads. Includes information about collision avoidance and hash
lengths used for different workspace types.
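The naming strategy described above can be sketched as deriving a collection name from the workspace's basename plus a truncated content hash of its full path, with a longer hash for remote uploads where collisions are more likely. The prefix scheme and the specific hash lengths below are assumptions for illustration, not the documented values.

```python
# Hedged sketch of hash-based collection naming for collision avoidance.
# The separator and hash lengths here are assumptions, not the real scheme.
import hashlib


def collection_name(workspace_path, remote=False):
    """Derive a collection name from a workspace path.

    Remote uploads get a longer hash suffix because many workstations
    may index repos with the same basename.
    """
    digest = hashlib.sha256(workspace_path.encode()).hexdigest()
    suffix = digest[:12] if remote else digest[:8]
    base = workspace_path.rstrip("/").rsplit("/", 1)[-1]
    return f"{base}-{suffix}"
```

Hashing the full path keeps names deterministic (the same workspace always maps to the same collection) while two different checkouts named `repo` still get distinct collections.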
- remove REMOTE_UPLOAD_ENABLED guard from standalone_upload_client
- do the same for remote_upload_client so both run without extra env setup
@voarsh2 voarsh2 requested a review from m1rl0k November 15, 2025 06:57
@voarsh2 voarsh2 self-assigned this Nov 15, 2025
@voarsh2 voarsh2 changed the title Multi repo support (multi collection) Add multi-collection support and remote delta upload tooling Nov 15, 2025
@m1rl0k m1rl0k marked this pull request as ready for review November 15, 2025 13:44
@m1rl0k m1rl0k merged commit 619e4f8 into Context-Engine-AI:test Nov 15, 2025
1 check passed
@voarsh2 voarsh2 mentioned this pull request Nov 17, 2025
@voarsh2 voarsh2 deleted the multi-repo-support-collections-11 branch December 10, 2025 04:09
m1rl0k added a commit that referenced this pull request Mar 1, 2026
Add multi-collection support and remote delta upload tooling
