feat: Add embedding server daemon with auto-unload#5
Merged
Conversation
Implements a persistent daemon that shares embedding models across multiple repository sessions, reducing memory footprint and improving processing time. Features: - Unix socket server with lazy model loading - SentenceTransformer-compatible client proxy - Automatic GPU/CPU detection - Graceful fallback (works without daemon) - CLI commands: embed start/stop/status Performance improvements (tested): - Memory: 50-80% savings with multiple repos (1.1GB vs 2.3GB for 2 repos) - Speed: 16-24x faster after warmup (4.9s → 0.3s) - Data isolation: Complete separation between repos verified Changes: - Add sia_code/embed_server/ package (protocol, daemon, client) - Modify usearch_backend.py to use client when available - Add 'embed' command group to CLI - Add psutil dependency for memory monitoring Tests: - Unit tests: Protocol, daemon lifecycle, client availability - Integration tests: 2 repos with speed and data separation verification - All tests passed (see TEST_RESULTS.md)
Implements automatic model unloading after idle timeout (default: 1 hour) to save memory while keeping daemon running for instant reload. Features: - Track last request time for each model - Background cleanup thread checks idle models every 10 minutes - Auto-unload models idle > timeout (default 3600s = 1 hour) - Models reload automatically on next request (2-3s) - Configurable timeout via --idle-timeout flag - Enhanced status command shows idle time per model Benefits: - Memory efficiency: 58 MB idle vs 1164 MB active - No manual management: daemon auto-manages itself - Transparent: models reload automatically when needed - Flexible: configurable timeout for different workflows CLI additions: - sia-code embed start --idle-timeout N (default: 3600) - sia-code embed status -v (shows idle times) Testing: - test_auto_unload.py: Verifies unload/reload cycle - Tested with 10s timeout: model unloads and reloads successfully - Initial load: 5.08s, cached: 0.01s, reload: 2.13s Documentation: - DAEMON_USAGE_GUIDE.md: Complete usage guide with examples
Comprehensive summary covering: - Answers to original questions (when to run, auto-unload) - Implementation details (2 commits) - Test results (all passing) - Performance metrics (50-97% memory savings, 20x speed) - Usage examples and best practices - CLI reference and architecture diagrams Ready for merge to main.
- Remove unused imports: time, timedelta, EmbedRequest, HealthRequest, Any - Fix f-string without placeholders in stop_daemon - All ruff checks now pass
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements a persistent daemon that shares embedding models across multiple repository sessions, reducing memory footprint and improving processing time. Includes automatic model unloading after idle timeout to optimize memory usage.
Features
Core Daemon (Commit 1: 40a67ce)
sia-code embed start/stop/statusAuto-Unload (Commit 2: 7ff3223)
--idle-timeoutflagPerformance Improvements
Memory Efficiency
Speed Improvement
Test Results
Unit Tests ✅
Integration Tests (2 Repos) ✅
Auto-Unload Test ✅
Usage
Start Daemon
```bash
Default: 1 hour idle timeout
sia-code embed start
Custom: 2 hours
sia-code embed start --idle-timeout 7200
Foreground (debugging)
sia-code embed start --foreground
```
Check Status
```bash
Basic status
sia-code embed status
Detailed (shows idle times)
sia-code embed status -v
```
Use in Multiple Repos
```bash
cd ~/project-1 && sia-code search "authentication"
cd ~/project-2 && sia-code search "http server"
cd ~/project-3 && sia-code search "database query"
All searches < 100ms after warmup! ⚡
```
Architecture
Model Sharing (Memory)
```
┌──────────────────────────┐
│ sia-embed daemon │
│ Model: 1164 MB (shared) │ ← ONE MODEL FOR ALL
└──────────┬───────────────┘
│
┌──────┼──────┐
▼ ▼ ▼
Repo A Repo B Repo C
(0 MB) (0 MB) (0 MB)
```
Data Separation (Storage)
```
Repo A: .sia-code/index.db (separate)
Repo B: .sia-code/index.db (separate)
Repo C: .sia-code/index.db (separate)
Daemon: Only computes embeddings (stateless)
```
Auto-Unload Cycle
```
Active → Requests → Model loaded (1164 MB)
Idle 1h → Auto-unload → 58 MB
Next request → Auto-reload (2-3s) → Fast again
```
Files Changed
Documentation
Breaking Changes
None - daemon is optional, all existing functionality works unchanged.
Migration Guide
No migration needed. To use the new features:
All tests passing ✅
Ready for merge 🚀