
[Feature Request]: Persistent queue backend for semantic/embedding processing #613

@lazmo88


Feature Description

The semantic and embedding processing queues are currently backed by AGFS in-memory state. When the server restarts (or is killed due to stuck shutdown from in-flight jobs), all pending queue items are lost. This creates significant operational pain during bulk imports.

Problem

When importing large batches of resources (e.g., 1,500+ session transcripts), the embedding queue processes relatively quickly, but the semantic queue (VLM-based L0/L1 abstract generation) takes hours at moderate concurrency. If the server needs to restart for any reason during this time — config change, OOM, stuck shutdown — all pending semantic jobs are silently lost.

Specific issues encountered:

  1. Config changes require restart: Adjusting vlm.max_concurrent requires editing ov.conf and restarting the server. This kills all in-flight queue items. There is no hot-reload for config changes.

  2. Stuck shutdown on restart: When systemctl restart is issued while semantic jobs are in-flight, the server hangs on shutdown waiting for jobs to complete. The only option is SIGKILL, which guarantees queue loss.

  3. No way to detect or re-queue missing work: After restart, there is no command to identify resources that have embeddings but are missing L0/L1 abstracts, and no reindex or reprocess command to re-trigger semantic processing for existing resources.

  4. Workaround is destructive: The only way to re-trigger semantic processing is to ov rm -r each affected resource and re-import it from the original source file. This works but is slow and requires maintaining a mapping of viking URIs back to source paths.
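To illustrate why the workaround is painful, here is a minimal sketch of the command sequence it requires, driven by a hand-maintained URI-to-source mapping. `ov rm -r` is from the report above; `ov add` is a hypothetical stand-in for whatever import command your setup uses.

```python
def rebuild_commands(mapping: dict[str, str]) -> list[list[str]]:
    """For each affected viking URI, emit the destructive remove
    followed by a re-import of the original source file.
    `ov add` is a placeholder; substitute the real import command."""
    commands = []
    for uri, source_path in mapping.items():
        commands.append(["ov", "rm", "-r", uri])
        commands.append(["ov", "add", source_path])  # hypothetical import command
    return commands

# The URI -> source-path mapping has to be maintained by hand today,
# e.g. written out as JSON at import time.
mapping = {"viking://resources/session-0001": "/data/transcripts/session-0001.md"}
for cmd in rebuild_commands(mapping):
    print(" ".join(cmd))
```

Every resource goes through a full remove-and-reimport cycle, which is why this scales so poorly past a few hundred lost jobs.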

Proposed Solutions

Option A: Persistent queue backend
Add a SQLite or file-based queue backend option (alongside the current in-memory AGFS queue). Pending items survive restarts and are automatically resumed.
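A minimal sketch of what a SQLite-backed queue could look like, assuming a simple `pending / in_flight / done` state machine; the real implementation would hook into the existing AGFS enqueue/dequeue primitives rather than expose its own class:

```python
import sqlite3

class SqliteQueue:
    """Sketch of a restart-safe job queue. On startup, any job that was
    in-flight when the process died is reclaimed as pending."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " id INTEGER PRIMARY KEY,"
            " uri TEXT NOT NULL,"
            " state TEXT NOT NULL DEFAULT 'pending')"
        )
        # Crash recovery: anything left in_flight was lost mid-run.
        self.db.execute("UPDATE queue SET state='pending' WHERE state='in_flight'")
        self.db.commit()

    def enqueue(self, uri):
        self.db.execute("INSERT INTO queue (uri) VALUES (?)", (uri,))
        self.db.commit()

    def dequeue(self):
        row = self.db.execute(
            "SELECT id, uri FROM queue WHERE state='pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE queue SET state='in_flight' WHERE id=?", (row[0],))
        self.db.commit()
        return row

    def ack(self, job_id):
        self.db.execute("UPDATE queue SET state='done' WHERE id=?", (job_id,))
        self.db.commit()
```

Because jobs are only marked `done` after completion, a SIGKILL during processing costs at most a re-run of the in-flight jobs, not the whole backlog.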

Option B: Startup recovery scan
On server start, scan for resources that have stored content but are missing .abstract.md / .overview.md files, and automatically re-enqueue them for semantic processing.
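A sketch of the scan, assuming (hypothetically) that each resource lives in its own directory with a `content.md` alongside the derived `.abstract.md` / `.overview.md` files; the actual on-disk layout may differ:

```python
from pathlib import Path

def find_missing_semantic(root: Path) -> list[Path]:
    """Return resource directories that have stored content but are
    missing either L0/L1 output file. Layout is an assumption:
    one directory per resource containing content.md."""
    missing = []
    for content in root.rglob("content.md"):  # hypothetical content file name
        res_dir = content.parent
        if not (res_dir / ".abstract.md").exists() or not (res_dir / ".overview.md").exists():
            missing.append(res_dir)
    return missing
```

Running this once at startup and re-enqueueing the results would make semantic processing self-healing without requiring any queue persistence at all.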

Option C: Manual reindex command
Add ov reindex [URI] or ov system reprocess CLI command that identifies resources missing L0/L1 content and re-queues them.
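The intended behaviour can be sketched as a small function: with a URI argument, re-queue just that resource; without one, re-queue everything missing L0/L1 content. The `resources` mapping here is a stand-in for the real metadata lookup, and `enqueue` for the real queue call:

```python
def reprocess(resources, enqueue, uri=None):
    """Sketch of the proposed `ov reindex [URI]` semantics.
    `resources` maps URI -> whether L0/L1 content already exists."""
    if uri is not None:
        targets = [uri]
    else:
        targets = [u for u, has_semantic in resources.items() if not has_semantic]
    for target in targets:
        enqueue(target)
    return targets
```

This is essentially Option B exposed as an on-demand command instead of a startup hook, so the two options could share one implementation.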

Option D: Hot config reload
Support SIGHUP or API endpoint to reload ov.conf without restarting, avoiding the need to restart during active queue processing.
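A minimal SIGHUP-reload sketch, assuming (hypothetically) that ov.conf is simple `key = value` lines; only settings safe to change at runtime, such as vlm.max_concurrent, would be re-applied:

```python
import signal

class Config:
    """Sketch: re-read ov.conf on SIGHUP without restarting the server."""

    def __init__(self, path):
        self.path = path
        self.values = self._load()
        signal.signal(signal.SIGHUP, self._on_hup)

    def _load(self):
        # Assumption: ov.conf is plain "key = value" lines.
        values = {}
        with open(self.path) as fh:
            for line in fh:
                if "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip()
        return values

    def _on_hup(self, signum, frame):
        self.values = self._load()  # swap in the new config in place
```

With this in place, `systemctl reload` (or `kill -HUP <pid>`) would adjust concurrency without touching the queues at all.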

Environment

  • OpenViking v0.2.6 (pip install)
  • Rust CLI (cargo install)
  • Server mode (openviking-server on port 1933, systemd user service)
  • VLM via OpenAI-compatible proxy (LiteLLM → Kimi-K2.5)
  • Embedding via OpenAI-compatible proxy (LiteLLM → Gemini Embedding 2, 3072d)
  • ~1,500 resources imported in batch, ~700 semantic jobs lost on restart

Additional Context

The AGFS transaction system (#431) provides journal-based crash recovery for file operations, but the semantic/embedding processing queue does not benefit from it. The queue uses AGFS enqueue/dequeue primitives, so its durability depends entirely on the AGFS backend; for most self-hosted deployments the local backend is in-memory and therefore volatile.

This is likely a common issue for anyone doing bulk imports or running OpenViking as a long-lived service that may need occasional restarts.
