Add multi-collection support and remote delta upload tooling #20
Merged
m1rl0k merged 20 commits into Context-Engine-AI:test on Nov 15, 2025
Conversation
Add comprehensive Kubernetes deployment configuration for Context-Engine:

- Complete service manifests converted from docker-compose
- Persistent storage for Qdrant database
- ConfigMaps with environment variables (local-first defaults)
- NodePort services for external access
- Optional Ingress configuration for domain-based access
- Automated deployment and cleanup scripts
- Makefile for development and management
- Comprehensive documentation and troubleshooting guide

Key features:

- Maintains local development defaults
- Optional remote hosting capabilities
- Health checks and resource limits
- Scalable MCP server deployments
- Support for both SSE and HTTP transports
- Optional Llama.cpp integration

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Add missing QDRANT_URL to ConfigMap for proper service discovery
- Fix healthcheck paths from /health to /readyz to match MCP server endpoints
- Standardize QDRANT_URL environment variable references across all deployments
- Update mcp-memory, mcp-indexer, mcp-http, and indexer-services manifests

Resolves localhost fallback issues in the Kubernetes deployment, where services were defaulting to localhost:6333 instead of using proper service names.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add 4 missing environment variables from docker-compose.yml to the Kubernetes ConfigMap:

- QDRANT_API_KEY: for Qdrant Cloud/remote authentication (optional)
- REPO_NAME: repository name for payload tracking
- FASTMCP_SERVER_NAME: MCP server identifier
- HOST_INDEX_PATH: work directory mounting path

This ensures full compatibility between docker-compose and Kubernetes deployments, allowing all services to reference the same environment variables regardless of deployment method.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…025-0052

Resolves merge conflict in configmap.yaml by combining:

- QDRANT_URL configuration for proper service discovery
- Additional environment variables for full compatibility

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…vice-specific images

- Add comprehensive build-images.sh script with registry support
- Update all deployment manifests to use service-specific image names
- Replace hardcoded context-engine:latest with proper image names
- Add image override generation for Kubernetes deployment
- Support separate images for better maintainability and scaling

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Replace hardcoded 'fast-ssd' storageClassName with commented configuration
- The Qdrant StatefulSet will now use the cluster's default storage class
- Users can uncomment and specify a custom storage class if needed
- Ensures better compatibility across different Kubernetes clusters

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Implements comprehensive Git-based source code synchronization to solve the critical issue of source code distribution in remote Kubernetes deployments.

### Key Features:
- Git sync sidecar containers for automatic source code synchronization
- Flexible deployment modes: local (hostPath) vs Git-based
- Support for public and private Git repositories
- SSH and HTTPS authentication methods
- Automated deployment script with mode selection
- Comprehensive documentation and setup guides

### Files Added:
- deploy/kubernetes/deploy-with-source.sh - Smart deployment script
- deploy/kubernetes/mcp-indexer-git.yaml - Git-enabled indexer deployment
- deploy/kubernetes/mcp-memory-git.yaml - Git-enabled memory server deployment
- deploy/kubernetes/GIT_SYNC_SETUP.md - Comprehensive setup documentation

### Files Modified:
- deploy/kubernetes/configmap.yaml - Added Git configuration variables
- deploy/kubernetes/README.md - Updated with Git sync documentation

### Configuration Variables Added:
- SOURCE_CODE_MODE: switch between 'local' and 'git' modes
- GIT_REPO_URL: Git repository URL for synchronization
- GIT_BRANCH: Git branch to checkout
- GIT_SYNC_PERIOD: synchronization frequency
- GIT_USERNAME/GIT_PASSWORD: HTTPS authentication
- GIT_SSH_KEY: SSH authentication configuration

This solution enables production-ready Kubernetes deployments with automatic source code management, eliminating the need for manual code distribution across cluster nodes while maintaining compatibility with existing local development workflows.

Resolves the critical remote source code access issue identified in issue #1.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
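The sidecar's behavior boils down to "clone once, then fast-forward pull every `GIT_SYNC_PERIOD`". A minimal sketch of one sync tick — the real sidecar is a container image, and the function name and injectable `run` parameter here are illustrative, not the PR's code:

```python
import os
import subprocess


def sync_once(repo_url: str, branch: str, dest: str, run=subprocess.run):
    """One git-sync tick: clone on the first run, fast-forward pull afterwards.

    `run` is injectable so command construction can be tested without a
    real git remote. Returns the command that was executed.
    """
    if not os.path.isdir(os.path.join(dest, ".git")):
        # First tick: shallow clone of the requested branch.
        cmd = ["git", "clone", "--branch", branch, "--depth", "1", repo_url, dest]
    else:
        # Later ticks: refuse anything but a fast-forward update.
        cmd = ["git", "-C", dest, "pull", "--ff-only", "origin", branch]
    run(cmd, check=True)
    return cmd
```

A sidecar would call this on a timer and share `dest` with the main container through a shared volume, which is what makes the synced source visible to the indexer.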
Combine environment variable configuration from the kubernetes branch with Git sync functionality from claude/issue-1-20251026-0047:

- QDRANT_URL and complete environment variable coverage
- Source code mode configuration (local/git)
- Git repository settings for remote source code access
- Authentication support for private repositories

Resolves merge conflict by integrating both configuration sets.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add a complete delta upload system enabling real-time code synchronization across distributed environments. The system includes:

- **Upload Service**: FastAPI-based HTTP service for receiving and processing delta bundles, integrated with the existing indexing pipeline
- **Remote Upload Client**: Python client for creating delta bundles, detecting file changes (create/update/delete/move), and uploading with retry logic and sequence tracking
- **Enhanced Watch System**: Extended watch_index.py to support both local and remote modes with automatic fallback
- **Development Environment**: Complete docker-compose.dev-remote.yml setup simulating Kubernetes CephFS RWX behavior with shared volumes
- **Kubernetes Deployment**: Production-ready manifests with persistent volumes, health checks, and proper resource limits
- **Comprehensive Documentation**: Architecture docs, design specifications, setup guides, and usage documentation
- **Build Tooling**: Development setup script and Make targets for remote upload workflows

The delta upload system uses efficient tarball bundles with JSON metadata to transmit only changed files, supporting move detection, hash-based change tracking, and robust error handling with exponential backoff retries.
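The tarball-plus-JSON-metadata bundle format described above can be sketched as follows. Field names such as `sequence` and `op` are illustrative, not the actual wire format used by the upload service:

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file's contents for change tracking."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_delta_bundle(root: Path, changes: dict, seq: int) -> bytes:
    """Pack changed files into a gzipped tarball with a JSON manifest.

    `changes` maps repo-relative paths to an operation: "create",
    "update", or "delete". Deleted files carry no payload, so only
    changed content crosses the wire.
    """
    manifest = {"sequence": seq, "files": []}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for rel, op in sorted(changes.items()):
            entry = {"path": rel, "op": op}
            if op != "delete":
                full = root / rel
                entry["sha256"] = file_sha256(full)
                tar.add(full, arcname=f"files/{rel}")
            manifest["files"].append(entry)
        meta = json.dumps(manifest).encode()
        info = tarfile.TarInfo("manifest.json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
    return buf.getvalue()
```

The receiving service unpacks `manifest.json` first, applies the listed operations against the shared volume, and hands the touched paths to the indexing pipeline.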
- Simulates Kubernetes-hosted environment locally
- Enables per-collection repositories and search
- Maintains backward compatibility via env var
- Supports both single and multi-collection modes
- Adds memory search capabilities per collection
…orkspaces

- Add collection_map MCP tool to enumerate collection↔repo mappings with optional Qdrant payload samples
- Implement origin metadata persistence in workspace_state.py for remote source tracking
- Enhance remote upload client with mapping summary and --show-mapping option
- Add source_path parameter to upload service for complete origin tracking
- Simplify watch_index.py by removing remote mode complexity and focusing on local indexing
- Update workspace state functions to support collection mappings enumeration

These changes provide comprehensive visibility into collection mappings across local and remote workspaces, enabling better tracking and management of distributed indexing operations.
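A `collection_map`-style enumeration over persisted workspace state might look like the sketch below. The state shape shown is an assumption; the real schema in workspace_state.py may differ:

```python
def collection_map(state: dict) -> list:
    """Summarize collection↔repo mappings from workspace state.

    Assumes state shaped like:
      {"collections": {"repo-a1b2c3": {"repo": "repo",
                                       "source_path": "/src/repo"}}}
    Returns one row per collection, sorted by collection name.
    """
    return [
        {
            "collection": name,
            "repo": info.get("repo"),
            "source_path": info.get("source_path"),
        }
        for name, info in sorted(state.get("collections", {}).items())
    ]
```

The MCP tool would serve this list (optionally decorated with sample Qdrant payloads) so a client can see which collection holds which repository.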
Add continuous file monitoring capability with --watch flag that automatically detects changes and uploads delta bundles at configurable intervals. Also introduce standalone_upload_client.py as a self-contained version that includes embedded dependencies, allowing delta uploads without requiring the full repository.
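The hash-based change detection behind watch mode amounts to diffing two snapshots of the tree between polls. A minimal sketch (function names are illustrative):

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's path (relative to root) to a content hash."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(old: dict, new: dict) -> dict:
    """Classify per-file changes between two polls as create/update/delete."""
    changes = {
        path: ("create" if path not in old else "update")
        for path, digest in new.items()
        if old.get(path) != digest
    }
    changes.update({path: "delete" for path in old if path not in new})
    return changes
```

A `--watch` loop would then sleep for the configured interval, take a fresh snapshot, and ship a delta bundle whenever `diff_snapshots` returns a non-empty result.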
Streamline upload client implementations by:

- Removing complex jitter calculations in favor of simple exponential backoff
- Consolidating error response formatting and dictionary structures
- Simplifying exception handling across upload and status check methods
- Reducing code verbosity while maintaining identical functionality
- Making error messages more concise and consistent
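"Simple exponential backoff" without jitter reduces to a few lines. A sketch with an injectable sleep for testability — not the PR's exact code:

```python
import time


def with_retries(fn, attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, retrying on any exception with delays 1s, 2s, 4s, ...

    The final failure is re-raised once attempts are exhausted, so the
    caller still sees the original exception.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Dropping jitter is a deliberate simplification: with a single upload client per workstation there is no thundering-herd problem for jitter to solve.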
Add comprehensive utilities for backing up and restoring memories (non-code points) from Qdrant collections. The backup utility exports user-added notes and context to JSON with optional vector embeddings, while the restore utility can import these backups to existing or new collections with support for re-embedding when vectors are not included in the backup. Both tools provide batch processing, CLI interfaces, and robust error handling for production use.
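Filtering memories out of a collection dump might look like the sketch below. The `kind` payload field used here to separate memories from code points is an assumption, and the real utilities talk to Qdrant rather than plain dicts:

```python
import json
from pathlib import Path


def backup_memories(points, out_path, include_vectors: bool = False) -> int:
    """Write non-code ("memory") points to a JSON backup file.

    Each point is a dict shaped like {"id", "payload", "vector"}.
    Code points are skipped: they can always be re-embedded from
    source, whereas memories cannot. Returns the number of points
    written.
    """
    saved = []
    for point in points:
        if point.get("payload", {}).get("kind") != "memory":
            continue
        entry = {"id": point["id"], "payload": point["payload"]}
        if include_vectors and "vector" in point:
            entry["vector"] = point["vector"]
        saved.append(entry)
    Path(out_path).write_text(json.dumps({"points": saved}, indent=2))
    return len(saved)
```

Omitting vectors keeps backups small; on restore, points without vectors are re-embedded before being upserted into the target collection.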
Add documentation for new collection mapping features and detailed explanation of collection naming strategies for local workspaces versus remote uploads. Includes information about collision avoidance and hash lengths used for different workspace types.
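The naming scheme described — repo name plus a short workspace-path hash, with different hash lengths for local versus remote workspaces — can be illustrated like this. The specific prefix lengths (8 and 12 characters) are assumptions, not the values the PR actually uses:

```python
import hashlib


def collection_name(repo_name: str, workspace_path: str, remote: bool = False) -> str:
    """Derive a collision-resistant collection name from a workspace path.

    Two checkouts of the same repo in different directories get
    different collections; the longer remote hash reflects the larger
    namespace shared by many uploaders.
    """
    digest = hashlib.sha256(workspace_path.encode()).hexdigest()
    return f"{repo_name}-{digest[:12 if remote else 8]}"
```

Because the hash is derived from the path rather than random, re-running the indexer on the same workspace always maps back to the same collection.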
- remove REMOTE_UPLOAD_ENABLED guard from standalone_upload_client - do the same for remote_upload_client so both run without extra env setup
m1rl0k approved these changes on Nov 15, 2025
m1rl0k added a commit that referenced this pull request on Mar 1, 2026
Add multi-collection support and remote delta upload tooling
Summary
Multi-repo collections provide separation: searches can focus on a specific codebase, which is less overwhelming for LLMs. Single-collection support is maintained.
Search and memory tools can query all collections or a specific one (and memories can be stored in a specified collection). A sticky collection (session default) lets you run queries against one collection without re-specifying it each time.
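The sticky-collection behavior reduces to a small precedence rule: an explicit collection on the query wins, then the session default, then "search everything". A sketch with illustrative names:

```python
class CollectionSession:
    """Resolve which collection(s) a search or memory op should target.

    Precedence: explicit per-query choice > sticky session default >
    all collections. The class and sentinel value here are
    illustrative, not the PR's actual API.
    """

    def __init__(self):
        self.default = None

    def set_default(self, name: str):
        """Make `name` the sticky collection for this session."""
        self.default = name

    def resolve(self, explicit=None) -> list:
        if explicit:
            return [explicit]
        if self.default:
            return [self.default]
        return ["*"]  # sentinel meaning: fan out across all collections
```

This keeps per-query overrides cheap while sparing the client from repeating the collection name on every call.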
The remote upload client pushes code changes to a remotely running stack (ideally over a LAN, not the internet), where the watcher processes them: clone the repo in your local environment, then run the remote upload script with path arguments and the server address/port. With watch mode enabled, file changes are uploaded to the upload service, and the watcher picks them up and re-embeds them.
Also includes a small memory backup/restore script. It is a nice-to-have: code can always be reindexed, but memories are lost if you clear a collection, so this adds some safety if you use the memory feature heavily without other backups (beyond Docker volume backups; a Kubernetes CronJob can also invoke the script).
The stack is assumed to run in containers (e.g. Kubernetes) with RWX storage for repo code and metadata; MCP clients connect via NodePort IP:port from your local thin client, CLI tool, or IDE.
Ref #11