Merged

16 commits
51efd49
fix: repair broken README doc links and convert docs index to MDX
SimplyLiz Apr 13, 2026
cca6e8f
feat: v2.1.0 — stream_context (token-budgeted RAG context streaming)
SimplyLiz Apr 15, 2026
6126b9b
Merge pull request #8 from nyxCore-Systems/feat/2.1.0-stream-context
SimplyLiz Apr 15, 2026
ef7cc55
docs: mention v2.1 stream_context in README status
SimplyLiz Apr 15, 2026
c1c485f
feat: embed_text — unary text-to-vector embedding
SimplyLiz Apr 15, 2026
e7d43cc
Merge pull request #9 from nyxCore-Systems/feat/embed-text
SimplyLiz Apr 15, 2026
b43e63f
feat(2.1): capability discovery + graceful unknown-variant handling
SimplyLiz Apr 15, 2026
0a443c8
Merge pull request #10 from nyxCore-Systems/feat/capability-discovery
SimplyLiz Apr 15, 2026
cb7cf28
feat(2.1): structured error codes on all Error responses (#11)
SimplyLiz Apr 15, 2026
b3b6721
fix(query_expansion): honor model pin, restrict ranking to matching m…
SimplyLiz Apr 15, 2026
ef453c6
feat(2.1): split ErrorCode conflations, drift-guard supported_message…
SimplyLiz Apr 15, 2026
7139526
feat(2.1): Tier 3 provenance via RegisterTier3Source (#14)
SimplyLiz Apr 15, 2026
f9be039
feat(2.1): lip import --no-provenance opt-out flag (#15)
SimplyLiz Apr 15, 2026
8351cfa
fix(2.1): wire ErrorCode::UnknownModel; pin QueryExpansion contract (…
SimplyLiz Apr 15, 2026
b24a0da
Merge branch 'develop' into release/v2.1.0
SimplyLiz Apr 15, 2026
9b1732b
test(watcher): poll instead of fixed sleep to reduce CI flakes
SimplyLiz Apr 15, 2026
41 changes: 41 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,47 @@ All notable changes to this project are documented here.

---

## [2.1.0] — 2026-04-15

### Added

**v2.1 — Streaming context + forward-compat primitives**

**Streaming**

- **`StreamContext { file_uri, cursor_position, max_tokens, model? }`** — new streaming wire message. Daemon ranks symbols relevant to the cursor and emits one `SymbolInfo { symbol_info, relevance_score, token_cost }` frame at a time, terminating with exactly one `EndStream { reason, emitted, total_candidates, error? }` frame. Reasons: `budget_reached`, `exhausted`, `error`. Replaces the broken "fetch top-k, locally truncate to prompt budget" pattern with stream-until-full. Spec §9.2.
- **Relevance ordering** (spec §2.3): direct symbol at cursor → callers (from blast-radius CPG walk) → callees / references → related types. Conservative token-cost estimate `ceil((len(signature) + len(documentation)) / 4) + 8` per symbol. Daemon does not buffer ahead of the socket; `BrokenPipe` from a closing client aborts the ranking walk cleanly. `StreamContext` is rejected from `Batch` / `BatchQuery`.
- **`protocol_version` bumped from `1` → `2`** in `HandshakeResult`. Clients can detect streaming support via handshake.
- **`lip stream-context <file_uri> <line:col> --max-tokens N [--model M]`** — new CLI subcommand prints frames as JSON for manual testing.
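
The per-symbol cost estimate above can be sketched as follows — a minimal illustration of the formula, not the daemon's actual types:

```rust
/// Conservative per-symbol token cost:
/// ceil((len(signature) + len(documentation)) / 4) + 8.
/// The ~4 chars/token ratio is a rough heuristic; the +8 covers
/// framing overhead (name, location, separators).
fn estimate_token_cost(signature: &str, documentation: &str) -> u32 {
    let chars = signature.len() + documentation.len();
    // Integer ceil-division by 4, plus the fixed overhead.
    (chars as u32 + 3) / 4 + 8
}
```

A client reading the stream subtracts each frame's `token_cost` from its remaining budget and closes the socket once the budget is spent; the daemon's `EndStream { reason: budget_reached, .. }` covers the case where the daemon hits the limit first.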

**New primitives**

- **`EmbedText { text, model? }`** — embed an arbitrary text string and return the raw vector. Closes the gap left by `EmbeddingBatch` (URI-only) and `QueryNearestByText` (embeds internally but discards the vector). Callers re-ranking with their own scoring (centroid arithmetic, federated nearest-neighbour, lexical-then-semantic re-rank) get the embedding directly instead of building a centroid out of nearest-neighbour seeds. Returns `EmbedTextResult { vector: Vec<f32>, embedding_model: String }`. Not permitted inside `BatchQuery` (requires async HTTP).
- **`RegisterTier3Source { source: Tier3Source }`** + **`IndexStatusResult.tier3_sources`** — expose provenance for Tier 3 ingestion batches (SCIP imports). `Tier3Source { source_id, tool_name, tool_version, project_root, imported_at_ms }` records *what* producer generated the symbols and *when* the daemon accepted them. Re-registering the same `source_id` overwrites in place, refreshing `imported_at_ms`. The daemon deliberately does no staleness detection: stale Tier 3 symbols remain in the graph at their original confidence until the caller re-imports. Surfacing provenance lets clients decide when to warn a user that imported data has aged (e.g. `scip-rust imported 3 days ago`). `lip import --push-to-daemon` now sends this before streaming SCIP deltas, with `source_id = sha256(tool_name + ":" + project_root)`. `IndexStatusResult.tier3_sources` is `#[serde(default)]`; older daemons yield an empty vector. Ack'd with `DeltaAck`. Not permitted inside `BatchQuery` (mutation).
- **`lip import --no-provenance`** — opt out of Tier 3 provenance registration for ephemeral or test imports that should not pollute a long-lived daemon's `tier3_sources` list. No effect on the default EventStream-JSON output path.
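
The overwrite-in-place semantics of re-registration can be sketched with a plain map keyed by `source_id` — the struct mirrors the changelog's `Tier3Source`, but this is an illustration, not the daemon's actual code:

```rust
use std::collections::HashMap;

// Illustrative shape of a Tier 3 provenance record.
struct Tier3Source {
    source_id: String,
    tool_name: String,
    tool_version: String,
    project_root: String,
    imported_at_ms: u64,
}

/// Registering the same `source_id` replaces the previous entry
/// wholesale, refreshing `imported_at_ms` — no staleness detection.
fn register(sources: &mut HashMap<String, Tier3Source>, src: Tier3Source) {
    sources.insert(src.source_id.clone(), src);
}
```

Clients that want to warn about aged imports compare `imported_at_ms` against the current time themselves; the daemon only surfaces the record.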

**Forward-compat & capability discovery**

- **`HandshakeResult.supported_messages: Vec<String>`** — handshake response now lists every `ClientMessage` `type` tag this daemon understands. Lets clients probe for an individual message (e.g. `stream_context`, `embed_text`) without writing "handshake then pray" code or comparing `protocol_version` integers. Field is `#[serde(default)]`; older daemons yield an empty vector, which clients should treat as "fall back to `protocol_version`."
- **`ServerMessage::UnknownMessage { message_type, supported }`** — when a client sends a well-formed JSON envelope whose `type` tag is unknown, the daemon now replies with `UnknownMessage` (carrying the tag plus the same supported list as handshake) *and keeps the socket open*, instead of closing after a generic parse `Error`. Lets forward-compatible clients downgrade gracefully to a supported call instead of reconnecting.
- **`ServerMessage::Error { message, code }`** — `code: ErrorCode` is a stable, machine-readable category. Clients branch on this instead of string-matching `message`. `#[serde(default)]`; older daemons deserialize as `ErrorCode::Internal`.
- **`ErrorCode`** enum — small, stable set: `unknown_message_type`, `unknown_model`, `embedding_not_configured`, `no_embedding`, `cursor_out_of_range`, `index_locked`, `invalid_request`, `internal` (default). Adding a code is non-breaking; renaming or removing one is breaking.
- `embedding_not_configured` — daemon has no embedding service (`LIP_EMBEDDING_URL` unset).
- `no_embedding` — URI has no cached embedding yet; call `EmbeddingBatch` first.
- `unknown_model` — the embedding endpoint rejected the requested model. Emitted by the daemon when the HTTP backend returns 404 or a 4xx body matching `model_not_found` / `"unknown model"` / `"model … not found/invalid/unsupported"`. Transport, rate-limit, and auth errors stay on `internal` — retrying with the same model only makes sense after a real config change. Classification lives in `daemon/embedding.rs::classify_http_error`.
- `invalid_request` — request was well-formed on the wire but used incorrectly (e.g. nested `Batch`, or `StreamContext` inside a `Batch`). Distinct from `internal` so clients can avoid retry loops on caller-side mistakes.
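
A client-side capability probe against `supported_messages` might look like this sketch; the fallback rule for older daemons is an assumption based on the `protocol_version` bump described above, not daemon code:

```rust
/// Returns true when the daemon advertises support for `tag`.
/// An empty `supported_messages` means an older daemon that predates
/// the field, so fall back to comparing `protocol_version`.
fn supports(supported_messages: &[String], protocol_version: u32, tag: &str) -> bool {
    if supported_messages.is_empty() {
        // Older daemon: stream_context shipped with protocol_version 2
        // (per this changelog); other tags cannot be inferred, so be
        // conservative and report them unsupported.
        return tag == "stream_context" && protocol_version >= 2;
    }
    supported_messages.iter().any(|m| m == tag)
}
```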

**Drift guard**

- **`ClientMessage::variant_tag`** + `supported_messages_covers_all_variants` test — exhaustive-match helper plus paired test that fails compilation when a new `ClientMessage` variant is added without being advertised in `supported_messages()`. Prevents capability-list drift from silently shrinking the handshake surface.
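
The pattern can be illustrated with an abbreviated enum (the real `ClientMessage` has many more variants; names below are a subset for illustration):

```rust
// Abbreviated stand-in for the wire enum.
enum ClientMessage {
    Handshake,
    StreamContext,
    EmbedText,
}

impl ClientMessage {
    fn variant_tag(&self) -> &'static str {
        // No wildcard arm: adding a variant without a tag here is a
        // compile error, which is what makes the guard exhaustive.
        match self {
            ClientMessage::Handshake => "handshake",
            ClientMessage::StreamContext => "stream_context",
            ClientMessage::EmbedText => "embed_text",
        }
    }
}

// The handshake's advertised list; the paired test asserts it covers
// every variant_tag, so the two cannot drift apart silently.
fn supported_messages() -> Vec<&'static str> {
    vec!["handshake", "stream_context", "embed_text"]
}
```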

### Fixed

- **`QueryExpansion` handler contract pinned by a db-level test.** The post-embedding ranking is now encapsulated in `LipDatabase::query_expansion_terms(query_vec, actual_model, top_k)`, which the handler calls in one line. A regression that drops the model filter would cause `query_expansion_terms_rejects_cross_model_scoring` (db.rs) to fail, closing the earlier gap where the fix shipped without a paired assertion.
- **`QueryExpansion` now honors the caller's model pin.** Previously the handler embedded the query with the requested model but then ranked candidates across *all* stored symbol embeddings regardless of which model produced them — cross-model cosine scores are not meaningful, so the returned "expansion terms" were effectively noise whenever the index held mixed-model vectors. Handler now captures the actual model returned by `embed_texts` and passes it through a new `model_filter: Option<&str>` parameter on `LipDatabase::nearest_symbol_by_vector`, restricting candidates to symbols embedded with the same model. `SimilarSymbols` (which resolves from a URI's own cached embedding) keeps the old unfiltered behavior by passing `None`.
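
The restriction can be sketched as a pre-filter applied before cosine scoring — `Candidate` and `filter_by_model` are illustrative names, not `LipDatabase`'s actual API:

```rust
// Illustrative candidate record: a symbol plus the model that
// produced its stored embedding.
struct Candidate {
    symbol: String,
    embedding_model: String,
}

/// `Some(model)` restricts ranking to same-model embeddings
/// (QueryExpansion path); `None` keeps the unfiltered behavior
/// (SimilarSymbols path).
fn filter_by_model<'a>(
    candidates: &'a [Candidate],
    model_filter: Option<&str>,
) -> Vec<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| model_filter.map_or(true, |m| c.embedding_model == m))
        .collect()
}
```

Scoring then runs only over the surviving candidates, so cross-model cosine distances never enter the ranking.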

---

## [2.0.1] — 2026-04-13

### Changed
6 changes: 3 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -7,7 +7,7 @@ members = [
]

[workspace.package]
-version = "2.0.1"
+version = "2.1.0"
edition = "2021"
rust-version = "1.78"
authors = ["Lisa Welsch <lisa@nyxcore.cloud>"]
2 changes: 1 addition & 1 deletion README.md
@@ -379,7 +379,7 @@ Requires Rust 1.78+. No system `protoc` required.

## Status

-v2.0 — `ExplainMatch` (chunk-level explanation: which lines in a result file drove the match), model provenance (`FileStatus` exposes the embedding model per file; `IndexStatus` warns when the index contains mixed-model vectors). v1.9: `filter` glob + `min_score` on all NN calls, `GetCentroid`, `QueryStaleEmbeddings`. v1.8: `FindBoundaries`, `SemanticDiff`, `QueryNearestInStore` (cross-repo federation), `QueryNoveltyScore`, `ExtractTerminology`, `PruneDeleted`. v1.7: 6 semantic retrieval primitives. v1.6: `ReindexFiles`, `Similarity`, `QueryExpansion`, `Cluster`, `ExportEmbeddings`. Wire format is JSON.
+v2.1 — `StreamContext` (token-budgeted RAG context streaming): callers stream symbols ranked by relevance to a cursor and stop reading when the prompt budget is full instead of fetching top-k and locally truncating; `protocol_version` bumped to `2`. v2.0 — `ExplainMatch` (chunk-level explanation: which lines in a result file drove the match), model provenance (`FileStatus` exposes the embedding model per file; `IndexStatus` warns when the index contains mixed-model vectors). v1.9: `filter` glob + `min_score` on all NN calls, `GetCentroid`, `QueryStaleEmbeddings`. v1.8: `FindBoundaries`, `SemanticDiff`, `QueryNearestInStore` (cross-repo federation), `QueryNoveltyScore`, `ExtractTerminology`, `PruneDeleted`. v1.7: 6 semantic retrieval primitives. v1.6: `ReindexFiles`, `Similarity`, `QueryExpansion`, `Cluster`, `ExportEmbeddings`. Wire format is JSON.

---

3 changes: 2 additions & 1 deletion bindings/rust/benches/framing.rs
@@ -7,7 +7,7 @@ use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criteri
use tokio::runtime::Runtime;

use lip_core::daemon::session::{read_message, write_message};
-use lip_core::query_graph::ServerMessage;
+use lip_core::query_graph::{ErrorCode, ServerMessage};

fn make_rt() -> Runtime {
tokio::runtime::Builder::new_current_thread()
@@ -19,6 +19,7 @@ fn make_rt() -> Runtime {
fn make_message(payload_bytes: usize) -> ServerMessage {
ServerMessage::Error {
message: "x".repeat(payload_bytes),
code: ErrorCode::Internal,
}
}

154 changes: 142 additions & 12 deletions bindings/rust/src/daemon/embedding.rs
@@ -10,9 +10,79 @@
//! When `LIP_EMBEDDING_URL` is unset, [`EmbeddingClient::from_env`] returns `None`
//! and all embedding requests return a sensible error to the caller.

use anyhow::Context;
use serde::{Deserialize, Serialize};

/// Classified failure from the embedding HTTP endpoint.
///
/// The variants map directly to [`crate::query_graph::ErrorCode`]
/// categories so the daemon can propagate a precise classification to
/// clients instead of collapsing every endpoint failure into `Internal`.
/// Callers that only need a display string should use the `Display` impl.
#[derive(Debug)]
pub enum EmbedError {
/// The endpoint rejected the requested model name — either 404, or
/// a 4xx whose body names the model. Maps to `ErrorCode::UnknownModel`.
/// Retrying with the same model is pointless.
UnknownModel(String),
/// HTTP transport failure, timeout, or TLS error. Maps to
/// `ErrorCode::Internal`. Retry is often safe.
Transport(String),
/// The endpoint returned a response we could not parse, or the
/// vector count did not match the input count. Maps to
/// `ErrorCode::Internal`. Indicates a backend misconfiguration.
Protocol(String),
/// Non-2xx status that does not clearly match any of the above.
/// Maps to `ErrorCode::Internal`.
Http(String),
}

impl std::fmt::Display for EmbedError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
EmbedError::UnknownModel(m)
| EmbedError::Transport(m)
| EmbedError::Protocol(m)
| EmbedError::Http(m) => f.write_str(m),
}
}
}

impl std::error::Error for EmbedError {}

/// Classify an embedding endpoint's non-2xx response into the narrowest
/// applicable [`EmbedError`] variant.
///
/// Heuristic: 404 is always an unknown-model signal (OpenAI, Ollama, and
/// most compatible backends 404 on an unrecognised model). Other 4xx are
/// classified as `UnknownModel` only when the body mentions the model —
/// OpenAI-compatible errors typically carry `"code":"model_not_found"`
/// or a message containing `"model"` for this case. Everything else
/// (5xx, 4xx without model keyword) falls through to `Http`.
fn classify_http_error(status: reqwest::StatusCode, body: &str) -> EmbedError {
let msg = format!("embedding endpoint returned {status}: {body}");
if status == reqwest::StatusCode::NOT_FOUND {
return EmbedError::UnknownModel(msg);
}
if status.is_client_error() {
let lower = body.to_ascii_lowercase();
if lower.contains("model_not_found") || lower.contains("unknown model") {
return EmbedError::UnknownModel(msg);
}
// Conservative: generic 4xx with "model" mention, treat as model issue
// only when combined with a "not found" / "invalid" / "unsupported" hint,
// to avoid misclassifying auth / rate-limit errors.
let looks_model_shaped = lower.contains("model")
&& (lower.contains("not found")
|| lower.contains("invalid")
|| lower.contains("unsupported")
|| lower.contains("does not exist"));
if looks_model_shaped {
return EmbedError::UnknownModel(msg);
}
}
EmbedError::Http(msg)
}

/// Thin client around a single OpenAI-compatible embedding endpoint.
pub struct EmbeddingClient {
url: String,
@@ -73,12 +143,14 @@ impl EmbeddingClient {
///
/// # Errors
///
-    /// Propagates HTTP, serialisation, and API errors.
+    /// Returns an [`EmbedError`] classified so the daemon can map directly
+    /// to a [`crate::query_graph::ErrorCode`] without inspecting the
+    /// message string.
pub async fn embed_texts(
&self,
texts: &[String],
model_override: Option<&str>,
-    ) -> anyhow::Result<(Vec<Vec<f32>>, String)> {
+    ) -> Result<(Vec<Vec<f32>>, String), EmbedError> {
if texts.is_empty() {
return Ok((vec![], self.default_model.clone()));
}
@@ -93,31 +165,89 @@
.json(&body)
.send()
.await
-            .context("embedding HTTP request failed")?;
+            .map_err(|e| EmbedError::Transport(format!("embedding HTTP request failed: {e}")))?;

if !resp.status().is_success() {
let status = resp.status();
let text = resp.text().await.unwrap_or_default();
-            anyhow::bail!("embedding endpoint returned {status}: {text}");
+            return Err(classify_http_error(status, &text));
}

let parsed: EmbedResponse = resp
.json()
.await
-            .context("failed to parse embedding response")?;
+            .map_err(|e| EmbedError::Protocol(format!("failed to parse embedding response: {e}")))?;

// Re-order by index field to match the input order.
let mut data = parsed.data;
data.sort_by_key(|d| d.index);

-        anyhow::ensure!(
-            data.len() == texts.len(),
-            "embedding endpoint returned {} vectors for {} inputs",
-            data.len(),
-            texts.len()
-        );
+        if data.len() != texts.len() {
+            return Err(EmbedError::Protocol(format!(
+                "embedding endpoint returned {} vectors for {} inputs",
+                data.len(),
+                texts.len()
+            )));
+        }

let vectors = data.into_iter().map(|d| d.embedding).collect();
Ok((vectors, parsed.model))
}
}

#[cfg(test)]
mod tests {
use super::*;
use reqwest::StatusCode;

#[test]
fn classify_404_is_unknown_model() {
let e = classify_http_error(StatusCode::NOT_FOUND, "model not found");
assert!(matches!(e, EmbedError::UnknownModel(_)));
}

#[test]
fn classify_openai_model_not_found_code() {
// OpenAI API shape.
let body = r#"{"error":{"code":"model_not_found","message":"The model 'foo' does not exist"}}"#;
let e = classify_http_error(StatusCode::BAD_REQUEST, body);
assert!(matches!(e, EmbedError::UnknownModel(_)));
}

#[test]
fn classify_ollama_model_unknown() {
let body = r#"{"error":"model 'nomic-embed-text' not found, try pulling it first"}"#;
let e = classify_http_error(StatusCode::NOT_FOUND, body);
assert!(matches!(e, EmbedError::UnknownModel(_)));
}

#[test]
fn classify_auth_error_stays_http() {
// 401 unauthorized must not be misclassified as UnknownModel just
// because a token payload might mention "model".
let body = "Unauthorized";
let e = classify_http_error(StatusCode::UNAUTHORIZED, body);
assert!(matches!(e, EmbedError::Http(_)));
}

#[test]
fn classify_rate_limit_stays_http() {
let e = classify_http_error(StatusCode::TOO_MANY_REQUESTS, "rate limit");
assert!(matches!(e, EmbedError::Http(_)));
}

#[test]
fn classify_5xx_stays_http() {
let e = classify_http_error(StatusCode::INTERNAL_SERVER_ERROR, "backend died");
assert!(matches!(e, EmbedError::Http(_)));
}

#[test]
fn classify_4xx_mentioning_model_without_not_found_keyword_stays_http() {
// "model temperature too high" would mention "model" but is not
// an unknown-model signal. Conservative classifier keeps it Http.
let body = "model temperature parameter rejected";
let e = classify_http_error(StatusCode::BAD_REQUEST, body);
assert!(matches!(e, EmbedError::Http(_)));
}
}