
Add multi-collection support and remote delta upload tooling#20

Merged
m1rl0k merged 20 commits into Context-Engine-AI:test from voarsh2:multi-repo-support-collections-11
Nov 15, 2025

Conversation

Contributor

@voarsh2 voarsh2 commented Nov 15, 2025

TL;DR:

Summary

  • add multi-collection (per-repo) indexing alongside the existing single-collection default; expose sticky collection selection in the MCP search/memory tools
  • remote upload pipeline: a core client (living in-repo) plus a standalone one-off script (which can be run anywhere outside this repo); both stream file deltas to the upload service so a remote watcher can re-embed code from a LAN workstation
  • ship a lightweight memory backup/restore helper to avoid data loss when collections are wiped

Multi-repo collections provide separation (focused searching on specific codebases, which is less overwhelming for LLMs); single-collection support is maintained.
Search/memory tools can search all collections or a specific one (and store memories in a specified collection). A sticky collection (session default) lets you run queries against a collection without re-specifying it on every call.
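The sticky-collection behaviour described above can be sketched as a small resolution rule: a per-call collection overrides the session default, and no value at all means "search everything". This is a minimal illustration; the function names and the `None` convention are assumptions, not the PR's actual MCP API.

```python
# Hypothetical sketch of sticky collection resolution for MCP tool calls.
# Names and conventions are illustrative, not the PR's actual API.

_session_default = None  # sticky collection for this session


def set_collection(name):
    """Pin a session-default collection so later queries can omit it."""
    global _session_default
    _session_default = name


def resolve_collection(explicit=None):
    """Per-call override wins; otherwise fall back to the sticky default.

    Returning None means "search all collections".
    """
    return explicit if explicit is not None else _session_default


set_collection("repo-a")
assert resolve_collection() == "repo-a"          # sticky default applies
assert resolve_collection("repo-b") == "repo-b"  # explicit override wins
```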

The remote upload client uploads code changes to the remotely running stack (ideally over LAN, not the internet), where they are processed by the watcher. Clone the repo in your local environment, then run the remote upload script with path arguments and the server address/port; in watch mode it uploads file changes to the upload service, and the watcher sees them and re-embeds.
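The watch-mode change detection above can be sketched as a periodic snapshot-and-diff over content hashes: hash every file each interval, compare against the previous snapshot, and ship only the delta. This is an assumption-laden sketch of the idea, not the client's real implementation; `snapshot` and `diff` are hypothetical helpers.

```python
# Illustrative sketch of a --watch style polling loop: hash files each
# interval and compute the delta. Helper names are hypothetical.
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to a content hash."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*")
        if p.is_file()
    }


def diff(old, new):
    """Split two snapshots into created / updated / deleted path lists."""
    created = [p for p in new if p not in old]
    updated = [p for p in new if p in old and new[p] != old[p]]
    deleted = [p for p in old if p not in new]
    return created, updated, deleted
```

A watch loop would call `snapshot` every N seconds, `diff` it against the last snapshot, and upload a bundle only when one of the three lists is non-empty.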

Includes a mini memory backup/restore script, which is a nice-to-have: code can be reindexed, but memories are lost if you clear a collection. This adds some safety if you use the feature a lot without backups (beyond a Docker volume backup; a scripted Kubernetes CronJob can also make use of the script).
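The backup idea can be sketched as filtering a collection's points down to the non-code "memory" entries and serialising them to JSON, optionally keeping vectors so restore can skip re-embedding. The point shape and the `kind` payload field here are assumptions for illustration, not Qdrant's or this repo's actual schema.

```python
# Hedged sketch of the memory backup step: keep only non-code points and
# optionally include vectors. Point/payload shape is illustrative only.
import json


def export_memories(points, include_vectors=False):
    """Serialise memory points to JSON; code points are skipped."""
    records = []
    for pt in points:
        if pt["payload"].get("kind") != "memory":  # skip indexed code points
            continue
        rec = {"id": pt["id"], "payload": pt["payload"]}
        if include_vectors:
            rec["vector"] = pt["vector"]  # lets restore skip re-embedding
        records.append(rec)
    return json.dumps(records)
```

Restore would be the inverse: load the JSON and upsert each record, re-embedding payload text whenever a vector is absent.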

The stack is assumed to run in containers (e.g. Kubernetes) with RWX storage for repo code and metadata; the MCP server is reached via NodePort IP:port from your local thin client/CLI tool/IDE.

Ref #11

voarsh2 and others added 20 commits October 25, 2025 01:06
Add comprehensive Kubernetes deployment configuration for Context-Engine:
- Complete service manifests converted from docker-compose
- Persistent storage for Qdrant database
- ConfigMaps with environment variables (local-first defaults)
- NodePort services for external access
- Optional Ingress configuration for domain-based access
- Automated deployment and cleanup scripts
- Makefile for development and management
- Comprehensive documentation and troubleshooting guide

Key features:
- Maintains local development defaults
- Optional remote hosting capabilities
- Health checks and resource limits
- Scalable MCP server deployments
- Support for both SSE and HTTP transports
- Optional Llama.cpp integration

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Add missing QDRANT_URL to ConfigMap for proper service discovery
- Fix healthcheck paths from /health to /readyz to match MCP server endpoints
- Standardize QDRANT_URL environment variable references across all deployments
- Update mcp-memory, mcp-indexer, mcp-http, and indexer-services manifests

Resolves localhost fallback issues in Kubernetes deployment where services
were defaulting to localhost:6333 instead of using proper service names.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add 4 missing environment variables from docker-compose.yml to Kubernetes ConfigMap:
- QDRANT_API_KEY: For Qdrant Cloud/remote authentication (optional)
- REPO_NAME: Repository name for payload tracking
- FASTMCP_SERVER_NAME: MCP server identifier
- HOST_INDEX_PATH: Work directory mounting path

This ensures full compatibility between docker-compose and Kubernetes deployments,
allowing all services to reference the same environment variables regardless of deployment method.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…025-0052

Resolves merge conflict in configmap.yaml by combining:
- QDRANT_URL configuration for proper service discovery
- Additional environment variables for full compatibility

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
…vice-specific images

- Add comprehensive build-images.sh script with registry support
- Update all deployment manifests to use service-specific image names
- Replace hardcoded context-engine:latest with proper image names
- Add image override generation for Kubernetes deployment
- Support separate images for better maintainability and scaling

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
- Replace hardcoded 'fast-ssd' storageClassName with commented configuration
- QDRANT StatefulSet will now use cluster's default storage class
- Users can uncomment and specify custom storage class if needed
- Ensures better compatibility across different Kubernetes clusters

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Implements comprehensive Git-based source code synchronization to solve
the critical issue of source code distribution in remote Kubernetes deployments.

### Key Features:
- Git sync sidecar containers for automatic source code synchronization
- Flexible deployment modes: local (hostPath) vs Git-based
- Support for public and private Git repositories
- SSH and HTTPS authentication methods
- Automated deployment script with mode selection
- Comprehensive documentation and setup guides

### Files Added:
- deploy/kubernetes/deploy-with-source.sh - Smart deployment script
- deploy/kubernetes/mcp-indexer-git.yaml - Git-enabled indexer deployment
- deploy/kubernetes/mcp-memory-git.yaml - Git-enabled memory server deployment
- deploy/kubernetes/GIT_SYNC_SETUP.md - Comprehensive setup documentation

### Files Modified:
- deploy/kubernetes/configmap.yaml - Added Git configuration variables
- deploy/kubernetes/README.md - Updated with Git sync documentation

### Configuration Variables Added:
- SOURCE_CODE_MODE: Switch between 'local' and 'git' modes
- GIT_REPO_URL: Git repository URL for synchronization
- GIT_BRANCH: Git branch to checkout
- GIT_SYNC_PERIOD: Synchronization frequency
- GIT_USERNAME/GIT_PASSWORD: HTTPS authentication
- GIT_SSH_KEY: SSH authentication configuration

This solution enables production-ready Kubernetes deployments with automatic
source code management, eliminating the need for manual code distribution
across cluster nodes while maintaining compatibility with existing local
development workflows.

Resolves the critical remote source code access issue identified in issue #1.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Combine environment variable configuration from kubernetes branch with
Git sync functionality from claude/issue-1-20251026-0047:

- QDRANT_URL and complete environment variable coverage
- Source code mode configuration (local/git)
- Git repository settings for remote source code access
- Authentication support for private repositories

Resolves merge conflict by integrating both configuration sets.

Co-authored-by: voarsh2 <voarsh2@users.noreply.github.com>
Add complete delta upload system enabling real-time code synchronization across distributed environments. The system includes:

- **Upload Service**: FastAPI-based HTTP service for receiving and processing delta bundles with integration to existing indexing pipeline
- **Remote Upload Client**: Python client for creating delta bundles, detecting file changes (create/update/delete/move), and uploading with retry logic and sequence tracking
- **Enhanced Watch System**: Extended watch_index.py to support both local and remote modes with automatic fallback
- **Development Environment**: Complete docker-compose.dev-remote.yml setup simulating Kubernetes CephFS RWX behavior with shared volumes
- **Kubernetes Deployment**: Production-ready manifests with persistent volumes, health checks, and proper resource limits
- **Comprehensive Documentation**: Architecture docs, design specifications, setup guides, and usage documentation
- **Build Tooling**: Development setup script and Make targets for remote upload workflows

The delta upload system uses efficient tarball bundles with JSON metadata to transmit only changed files, supporting move detection, hash-based change tracking, and robust error handling with exponential backoff retries.
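The bundle format described above can be sketched as a gzipped tarball carrying the changed files plus a `manifest.json` listing operations and a sequence number. This is a minimal illustration; the manifest field names (`seq`, `ops`, `op`, `path`) and layout are assumptions, not the PR's actual wire format.

```python
# Sketch of a delta bundle: a tar.gz of changed files plus a JSON manifest.
# Manifest field names are assumptions, not the PR's actual format.
import io
import json
import tarfile


def build_bundle(seq, files, deleted):
    """Pack changed files and a manifest into an in-memory tar.gz bundle."""
    manifest = {
        "seq": seq,
        "ops": [{"op": "upsert", "path": p} for p in files]
        + [{"op": "delete", "path": p} for p in deleted],
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("manifest.json", json.dumps(manifest).encode())
        for path, data in files.items():
            add(f"files/{path}", data)  # deleted paths ship metadata only
    return buf.getvalue()
```

The receiving service would unpack the tarball, apply `upsert`/`delete` operations in order, and use `seq` to detect gaps or replays.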
- Simulates Kubernetes-hosted environment locally
- Enables per-collection repositories and search
- Maintains backward compatibility via env var
- Supports both single and multi-collection modes
- Adds memory search capabilities per collection
…orkspaces

- Add collection_map MCP tool to enumerate collection↔repo mappings with optional Qdrant payload samples
- Implement origin metadata persistence in workspace_state.py for remote source tracking
- Enhance remote upload client with mapping summary and --show-mapping option
- Add source_path parameter to upload service for complete origin tracking
- Simplify watch_index.py by removing remote mode complexity and focusing on local indexing
- Update workspace state functions to support collection mappings enumeration

These changes provide comprehensive visibility into collection mappings across local and remote workspaces, enabling better tracking and management of distributed indexing operations.
Add continuous file monitoring capability with --watch flag that automatically
detects changes and uploads delta bundles at configurable intervals. Also
introduce standalone_upload_client.py as a self-contained version that
includes embedded dependencies, allowing delta uploads without requiring the full repository.
Streamline upload client implementations by:
- Removing complex jitter calculations in favor of simple exponential backoff
- Consolidating error response formatting and dictionary structures
- Simplifying exception handling across upload and status check methods
- Reducing code verbosity while maintaining identical functionality
- Making error messages more concise and consistent
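The simplified retry policy described above (plain exponential backoff, no jitter) can be sketched as a small wrapper. This is an illustrative sketch only; the function name, delay schedule, and attempt count are assumptions, not the client's actual values.

```python
# Minimal sketch of simple exponential backoff without jitter.
# Names and defaults are illustrative, not the PR's actual code.
import time


def with_retries(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on any exception with doubling delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Dropping jitter trades a little resilience to thundering-herd retries for code that is much easier to reason about, which matches the "reducing verbosity while maintaining identical functionality" goal above.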
Add comprehensive utilities for backing up and restoring memories (non-code points)
from Qdrant collections. The backup utility exports user-added notes and context
to JSON with optional vector embeddings, while the restore utility can import
these backups to existing or new collections with support for re-embedding
when vectors are not included in the backup. Both tools provide batch processing,
CLI interfaces, and robust error handling for production use.
Add documentation for new collection mapping features and detailed
explanation of collection naming strategies for local workspaces versus
remote uploads. Includes information about collision avoidance and hash
lengths used for different workspace types.
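The naming strategy described above can be sketched as deriving a collection name from the workspace's basename plus a truncated content hash of its full path, with a longer hash for remote uploads where collisions are more likely. The prefix scheme and the specific hash lengths below are assumptions for illustration, not the documented values.

```python
# Hedged sketch of hash-based collection naming for collision avoidance.
# The separator and hash lengths here are assumptions, not the real scheme.
import hashlib


def collection_name(workspace_path, remote=False):
    """Derive a collection name from a workspace path.

    Remote uploads get a longer hash suffix because many workstations
    may index repos with the same basename.
    """
    digest = hashlib.sha256(workspace_path.encode()).hexdigest()
    suffix = digest[:12] if remote else digest[:8]
    base = workspace_path.rstrip("/").rsplit("/", 1)[-1]
    return f"{base}-{suffix}"
```

Hashing the full path keeps names deterministic (the same workspace always maps to the same collection) while two different checkouts named `repo` still get distinct collections.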
- remove REMOTE_UPLOAD_ENABLED guard from standalone_upload_client
- do the same for remote_upload_client so both run without extra env setup
@voarsh2 voarsh2 requested a review from m1rl0k November 15, 2025 06:57
@voarsh2 voarsh2 self-assigned this Nov 15, 2025
@voarsh2 voarsh2 changed the title Multi repo support (multi collection) Add multi-collection support and remote delta upload tooling Nov 15, 2025
@m1rl0k m1rl0k marked this pull request as ready for review November 15, 2025 13:44
@m1rl0k m1rl0k merged commit 619e4f8 into Context-Engine-AI:test Nov 15, 2025
1 check passed
@voarsh2 voarsh2 mentioned this pull request Nov 17, 2025
@voarsh2 voarsh2 deleted the multi-repo-support-collections-11 branch December 10, 2025 04:09
m1rl0k added a commit that referenced this pull request Mar 1, 2026
Add multi-collection support and remote delta upload tooling
