-├── noisy.html # With nav, aside, footer
-├── no_headings.html # Just paragraphs
-└── malformed.html # Broken HTML
-```
-
-## Effort Estimate
-
-| Task | Time |
-|------|------|
-| Types & configuration | 1 hour |
-| Main parser | 2 hours |
-| Content extraction | 2 hours |
-| Edge cases | 1 hour |
-| Testing | 2 hours |
-| **Total** | **~8 hours (1 day)** |
-
-## Future Enhancements (Out of Scope)
-
-- [ ] Readability algorithm for content extraction
-- [ ] Table structure preservation
-- [ ] Code block detection (``)
-- [ ] Link extraction and following
-- [ ] Meta description extraction
-- [ ] Language detection
-
-## Comparison with Alternatives
-
-| Approach | Pros | Cons |
-|----------|------|------|
-| **scraper** (proposed) | CSS selectors, mature | Slower than tl |
-| **tl** | Very fast | No CSS selectors |
-| **html5ever** | Spec-compliant | More complex API |
-| **readability-rs** | Smart extraction | External dependency |
-
-## References
-
-- [HTML5 Semantic Elements](https://developer.mozilla.org/en-US/docs/Glossary/Semantics#semantics_in_html)
-- [scraper crate](https://docs.rs/scraper/)
-- [Readability algorithm](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/saveAsPDF)
diff --git a/docs/rfcs/0003-evaluate-stage.md b/docs/rfcs/0003-evaluate-stage.md
deleted file mode 100644
index 4c258793..00000000
--- a/docs/rfcs/0003-evaluate-stage.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# RFC-0003: Evaluate Stage Naming
-
-## Summary
-
-Rename the `JudgeStage` to `EvaluateStage` to better reflect its purpose in the retrieval pipeline.
-
-## Motivation
-
-The term "judge" implies a binary verdict, while the stage actually:
-1. Aggregates content from candidates
-2. Evaluates sufficiency levels (Sufficient, Partial, Insufficient)
-3. Can trigger additional search iterations
-4. Builds the final response
-
-"Evaluate" better captures the nuanced assessment process.
-
-## Design
-
-### Changes
-
-| Before | After |
-|--------|-------|
-| `JudgeStage` | `EvaluateStage` |
-| `judge.rs` | `evaluate.rs` |
-| `judge_time_ms` | `evaluate_time_ms` |
-| `"judge"` stage name | `"evaluate"` stage name |
-
-### Preserved Names
-
-The following are intentionally preserved:
-- `LlmJudge` - The sufficiency checker that "judges" sufficiency
-- `llm_judge` - Field name for the LLM-based sufficiency judge
-
-These remain as they specifically make a judgment call on sufficiency.
-
-## Pipeline Flow Update
-
-```
-Before: Analyze → Plan → Search → Judge
-After: Analyze → Plan → Search → Evaluate
-```
-
-## Implementation
-
-1. Rename `src/retrieval/stages/judge.rs` to `evaluate.rs`
-2. Update struct name from `JudgeStage` to `EvaluateStage`
-3. Update all references in pipeline and retriever code
-4. Update documentation and diagrams
-
-## Status
-
-**Implemented** - 2026-04-05
diff --git a/docs/rfcs/template.md b/docs/rfcs/template.md
deleted file mode 100644
index 3fc3d9d7..00000000
--- a/docs/rfcs/template.md
+++ /dev/null
@@ -1,60 +0,0 @@
-# RFC-XXXX: Feature Title
-
-**Status**: Proposed | In Progress | Implemented | Rejected
-
-## Summary
-
-Brief description of the feature (2-3 sentences).
-
-## Motivation
-
-Why is this feature needed? What problem does it solve?
-
-## Proposed Solution
-
-### Overview
-
-High-level approach.
-
-### Implementation Details
-
-```
-src/
-├── module_a/
-│ └── new_file.rs
-└── module_b/
-```
-
-### API Design
-
-```rust
-pub fn new_function() -> Result<()> {
- // ...
-}
-```
-
-### Dependencies
-
-- crate_name = "version"
-
-## Alternatives Considered
-
-What other approaches were considered and why were they rejected?
-
-## Testing Strategy
-
-- Unit tests
-- Integration tests
-- Test fixtures
-
-## Effort Estimate
-
-| Task | Time |
-|------|------|
-| ... | ... |
-| **Total** | **X days** |
-
-## Open Questions
-
-- Question 1?
-- Question 2?
diff --git a/examples/rust/document_graph.rs b/examples/rust/document_graph.rs
new file mode 100644
index 00000000..d765e3b5
--- /dev/null
+++ b/examples/rust/document_graph.rs
@@ -0,0 +1,290 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Document Graph example.
+//!
+//! Demonstrates how to:
+//! 1. Build a document graph from multiple documents
+//! 2. Explore cross-document relationships (shared keywords, edges)
+//! 3. Use graph-aware retrieval with different merge strategies
+//!
+//! # What is a Document Graph?
+//!
+//! A workspace-scoped weighted graph connecting documents by shared concepts.
+//! Nodes = documents, Edges = relationships (shared keywords with weights).
+//!
+//! # Key outputs:
+//! - Document nodes with top keywords
+//! - Bidirectional edges with Jaccard similarity and shared keyword evidence
+//! - Keyword inverted index for cross-document lookup
+//! - Graph-boosted retrieval ranking
+//!
+//! # Usage
+//!
+//! ```bash
+//! cargo run --example document_graph
+//! ```
+
+use std::collections::HashMap;
+
+use vectorless::document::{
+ DocumentGraph, DocumentGraphConfig, DocumentGraphNode, WeightedKeyword,
+};
+use vectorless::index::graph_builder::DocumentGraphBuilder;
+
+#[tokio::main]
+async fn main() {
+ println!("=== Document Graph Example ===\n");
+
+ // -------------------------------------------------------
+ // Part 1: Build the graph manually (low-level API)
+ // -------------------------------------------------------
+ println!("--- Part 1: Build Graph Manually ---\n");
+ demo_manual_graph();
+
+ // -------------------------------------------------------
+ // Part 2: Build the graph with DocumentGraphBuilder
+ // -------------------------------------------------------
+ println!("\n--- Part 2: Build Graph with Builder ---\n");
+ let graph = demo_builder();
+
+ // -------------------------------------------------------
+ // Part 3: Explore the graph
+ // -------------------------------------------------------
+ println!("\n--- Part 3: Explore the Graph ---\n");
+ demo_explore(&graph);
+
+ // -------------------------------------------------------
+ // Part 4: Keyword-based document lookup
+ // -------------------------------------------------------
+ println!("\n--- Part 4: Keyword Lookup ---\n");
+ demo_keyword_lookup(&graph);
+
+ // -------------------------------------------------------
+ // Part 5: Show graph-boosted retrieval concept
+ // -------------------------------------------------------
+ println!("\n--- Part 5: Graph-Boosted Retrieval ---\n");
+ demo_graph_boosted_retrieval(&graph);
+
+ println!("\n=== Done ===");
+}
+
+/// Manually build a small graph to show the data model.
+fn demo_manual_graph() {
+ let mut graph = DocumentGraph::new();
+
+ // Add document nodes
+ graph.add_node(DocumentGraphNode {
+ doc_id: "rust-book".to_string(),
+ title: "The Rust Programming Language".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![
+ WeightedKeyword { keyword: "ownership".to_string(), weight: 0.95 },
+ WeightedKeyword { keyword: "borrowing".to_string(), weight: 0.90 },
+ WeightedKeyword { keyword: "lifetimes".to_string(), weight: 0.80 },
+ WeightedKeyword { keyword: "traits".to_string(), weight: 0.70 },
+ ],
+ node_count: 42,
+ });
+
+ graph.add_node(DocumentGraphNode {
+ doc_id: "rust-async".to_string(),
+ title: "Async Programming in Rust".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![
+ WeightedKeyword { keyword: "async".to_string(), weight: 0.95 },
+ WeightedKeyword { keyword: "tokio".to_string(), weight: 0.85 },
+ WeightedKeyword { keyword: "lifetimes".to_string(), weight: 0.60 },
+ WeightedKeyword { keyword: "traits".to_string(), weight: 0.50 },
+ ],
+ node_count: 28,
+ });
+
+ println!("Nodes: {}", graph.node_count());
+ for doc_id in graph.doc_ids() {
+ let node = graph.get_node(doc_id).unwrap();
+ println!(" {} ({}): {} keywords, {} nodes",
+ node.doc_id, node.title, node.top_keywords.len(), node.node_count);
+ }
+}
+
+/// Build a graph from multiple documents using DocumentGraphBuilder.
+fn demo_builder() -> DocumentGraph {
+ let config = DocumentGraphConfig {
+ enabled: true,
+ min_keyword_jaccard: 0.05,
+ min_shared_keywords: 2,
+ max_keywords_per_doc: 50,
+ max_edges_per_node: 20,
+ retrieval_boost_factor: 0.15,
+ };
+
+ let mut builder = DocumentGraphBuilder::new(config);
+
+ // Document 1: Rust Language Guide
+ builder.add_document(
+ "rust-guide",
+ "Rust Language Guide",
+ "md",
+ 35,
+ keywords(&[
+ ("ownership", 0.95), ("borrowing", 0.90), ("lifetimes", 0.85),
+ ("traits", 0.80), ("generics", 0.75), ("error-handling", 0.70),
+ ("pattern-matching", 0.65), ("closures", 0.60),
+ ]),
+ );
+
+ // Document 2: Async Rust (overlaps on lifetimes, traits, closures)
+ builder.add_document(
+ "async-guide",
+ "Async Rust Guide",
+ "md",
+ 28,
+ keywords(&[
+ ("async", 0.95), ("tokio", 0.90), ("futures", 0.85),
+ ("lifetimes", 0.60), ("traits", 0.55), ("closures", 0.50),
+ ("pinning", 0.80), ("waker", 0.75),
+ ]),
+ );
+
+ // Document 3: Rust Testing (overlaps on traits, closures, error-handling)
+ builder.add_document(
+ "testing-guide",
+ "Rust Testing Guide",
+ "md",
+ 22,
+ keywords(&[
+ ("testing", 0.95), ("assertions", 0.90), ("mocking", 0.85),
+ ("traits", 0.60), ("closures", 0.55), ("error-handling", 0.50),
+ ("benchmarks", 0.80), ("coverage", 0.75),
+ ]),
+ );
+
+ // Document 4: Unrelated document (cooking — no overlap)
+ builder.add_document(
+ "cooking",
+ "Italian Cooking",
+ "md",
+ 15,
+ keywords(&[
+ ("pasta", 0.95), ("sauce", 0.90), ("olive-oil", 0.85),
+ ("garlic", 0.80), ("basil", 0.75), ("tomato", 0.70),
+ ]),
+ );
+
+ let graph = builder.build();
+
+ println!("Graph built:");
+ println!(" Documents: {}", graph.node_count());
+ println!(" Edges: {}", graph.edge_count());
+
+ graph
+}
+
+/// Explore nodes, edges, and relationship evidence.
+fn demo_explore(graph: &DocumentGraph) {
+ for doc_id in graph.doc_ids() {
+ let node = graph.get_node(doc_id).unwrap();
+ let neighbors = graph.get_neighbors(doc_id);
+
+ println!("[{}] {} ({} nodes)", node.doc_id, node.title, node.node_count);
+
+ // Show top keywords
+ let top_3: Vec<String> = node.top_keywords.iter()
+ .take(3)
+ .map(|kw| format!("{} ({:.2})", kw.keyword, kw.weight))
+ .collect();
+ println!(" Keywords: {}", top_3.join(", "));
+
+ // Show edges to other documents
+ if neighbors.is_empty() {
+ println!(" Edges: (none — isolated document)");
+ } else {
+ println!(" Edges:");
+ for edge in neighbors {
+ println!(
+ " -> {} [weight={:.3}, jaccard={:.3}, shared={}]",
+ edge.target_doc_id,
+ edge.weight,
+ edge.evidence.keyword_jaccard,
+ edge.evidence.shared_keyword_count,
+ );
+ // Show shared keywords
+ let shared: Vec<String> = edge.evidence.shared_keywords.iter()
+ .map(|sk| format!("{} ({:.2}/{:.2})", sk.keyword, sk.source_weight, sk.target_weight))
+ .collect();
+ println!(" Shared: {}", shared.join(", "));
+ }
+ }
+ println!();
+ }
+}
+
+/// Look up documents by keyword using the inverted index.
+fn demo_keyword_lookup(graph: &DocumentGraph) {
+ let queries = ["traits", "closures", "async", "pasta", "nonexistent"];
+
+ for kw in &queries {
+ let entries = graph.find_by_keyword(kw);
+ if entries.is_empty() {
+ println!(" '{}': not found in any document", kw);
+ } else {
+ let docs: Vec<String> = entries.iter()
+ .map(|e| format!("{} ({:.2})", e.doc_id, e.weight))
+ .collect();
+ println!(" '{}': found in {}", kw, docs.join(", "));
+ }
+ }
+}
+
+/// Show how graph-boosted retrieval works conceptually.
+fn demo_graph_boosted_retrieval(graph: &DocumentGraph) {
+ println!("Scenario: User queries 'traits and closures'");
+ println!();
+
+ // Step 1: Simulate per-document scores
+ let results = vec![
+ ("rust-guide".to_string(), 0.85),
+ ("async-guide".to_string(), 0.60),
+ ("testing-guide".to_string(), 0.55),
+ ("cooking".to_string(), 0.10),
+ ];
+
+ println!("Before graph boosting:");
+ for (doc, score) in &results {
+ println!(" {}: {:.3}", doc, score);
+ }
+
+ // Step 2: Apply graph boost — high-score docs boost their neighbors
+ let boost_factor = 0.15;
+ let mut boosted = results.clone();
+ for (doc, base_score) in &results {
+ if *base_score > 0.5 {
+ for edge in graph.get_neighbors(doc) {
+ for entry in boosted.iter_mut() {
+ if entry.0 == edge.target_doc_id {
+ let boost = boost_factor * edge.weight * base_score;
+ entry.1 += boost;
+ }
+ }
+ }
+ }
+ }
+ boosted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+
+ println!();
+ println!("After graph boosting (boost_factor={}):", boost_factor);
+ for (doc, score) in &boosted {
+ let delta = score - results.iter().find(|(d, _)| d == doc).unwrap().1;
+ println!(" {}: {:.3} (+{:.3})", doc, score, delta);
+ }
+
+ println!();
+ println!("Effect: Related documents (rust-guide, async-guide, testing-guide)");
+ println!(" boost each other via shared keywords, while 'cooking' stays low.");
+}
+
+// Helper to build keyword maps
+fn keywords(pairs: &[(&str, f32)]) -> HashMap<String, f32> {
+ pairs.iter().map(|&(k, w)| (k.to_string(), w)).collect()
+}
diff --git a/examples/rust/retrieve.rs b/examples/rust/retrieve.rs
index a05a88a9..62e5ff73 100644
--- a/examples/rust/retrieve.rs
+++ b/examples/rust/retrieve.rs
@@ -163,12 +163,16 @@ async fn demo_orchestrator(tree: &DocumentTree) -> vectorless::Result<()> {
println!(" - Is sufficient: {}", response.is_sufficient);
println!(" - Confidence: {:.2}", response.confidence);
println!(" - Complexity: {:?}", response.complexity);
- println!(" - Navigation steps: {}", response.trace.len());
+ println!(" - Reasoning steps: {}", response.reasoning_chain.len());
- if !response.trace.is_empty() {
- println!("\n Navigation trace:");
- for (i, step) in response.trace.iter().take(5).enumerate() {
- println!(" {}. {} (score: {:.2})", i + 1, step.title, step.score);
+ if !response.reasoning_chain.is_empty() {
+ println!("\n Reasoning chain:");
+ for (i, step) in response.reasoning_chain.steps.iter().take(5).enumerate() {
+ let title = step.title.as_deref().unwrap_or("(no node)");
+ println!(
+ " {}. [{}] {} (score: {:.2}): {}",
+ i + 1, step.stage, title, step.score, step.reasoning
+ );
}
}
diff --git a/examples/rust/streaming.rs b/examples/rust/streaming.rs
index 8942110c..d01de51d 100644
--- a/examples/rust/streaming.rs
+++ b/examples/rust/streaming.rs
@@ -7,64 +7,166 @@
//! to get results incrementally as they are found.
//!
//! # What you'll learn:
-//! - How to use `query_stream()` for progressive results
+//! - How to use `retrieve_streaming()` for progressive results
//! - How to handle RetrieveEvent types
//! - How to display results as they arrive
-//! - How to cancel long-running queries
//!
//! # RetrieveEvent types:
//! - `Started`: Query began, shows planned strategy
-//! - `NodeVisited`: A node was visited during search
-//! - `ContentFound`: Relevant content was found
+//! - `StageCompleted`: A pipeline stage finished
//! - `Backtracking`: Search is backtracking for more data
//! - `Completed`: Query finished with final results
//! - `Error`: An error occurred
//!
-//! # Use cases:
-//! - Interactive Q&A with real-time feedback
-//! - Long-running queries on large documents
-//! - Debugging retrieval behavior
-//! - Building responsive UIs
+//! # Usage
//!
-//! # TODO: Implementation steps
-//!
-//! 1. Configure engine for streaming
-//! 2. Call query_stream() instead of query()
-//! 3. Process events as they arrive
-//! 4. Handle completion and errors
-
-// TODO: Implement streaming retrieval
-// ```
-// use vectorless::client::{Engine, RetrieveEvent};
-//
-// async fn streaming_query(
-// engine: &Engine,
-// doc_id: &DocumentId,
-// query: &str,
-// ) {
-// let mut stream = engine.query_stream(doc_id, query).await;
-//
-// while let Some(event) = stream.next().await {
-// match event {
-// RetrieveEvent::Started { strategy } => {
-// println!("Starting search with strategy: {:?}", strategy);
-// }
-// RetrieveEvent::ContentFound { node_id, preview } => {
-// println!("Found: {} - {}", node_id, preview);
-// }
-// RetrieveEvent::Completed { response } => {
-// println!("Done! Confidence: {}", response.confidence);
-// }
-// _ => {}
-// }
-// }
-// }
-// ```
-
-fn main() {
- // TODO: Show streaming query usage
- //
- // streaming_query(&engine, &doc_id, "What is the architecture?").await;
-
- println!("TODO: Implement streaming example");
+//! ```bash
+//! cargo run --example streaming
+//! ```
+
+use vectorless::document::DocumentTree;
+use vectorless::retrieval::{
+ PipelineRetriever, RetrieveEvent, RetrieveOptions, StrategyPreference,
+};
+
+#[tokio::main]
+async fn main() {
+ println!("=== Streaming Retrieval Example ===\n");
+
+ // 1. Create a sample document tree
+ let tree = create_sample_tree();
+ println!("Created sample document tree ({} nodes)\n", tree.node_count());
+
+ // 2. Create a pipeline retriever
+ let retriever = PipelineRetriever::new()
+ .with_max_backtracks(3)
+ .with_max_iterations(5);
+
+ // 3. Configure options (streaming is just a usage pattern, not a flag)
+ let options = RetrieveOptions {
+ top_k: 5,
+ beam_width: 3,
+ max_iterations: 5,
+ max_tokens: 4000,
+ strategy: StrategyPreference::Auto,
+ ..Default::default()
+ };
+
+ // 4. Execute streaming query
+ let query = "What is the architecture?";
+ println!("Query: \"{}\"\n", query);
+ println!("--- Streaming Events ---\n");
+
+ let (_handle, mut rx) = retriever.retrieve_streaming(&tree, query, &options);
+
+ // 5. Process events as they arrive
+ while let Some(event) = rx.recv().await {
+ match event {
+ RetrieveEvent::Started { query, strategy } => {
+ println!("[Started] query=\"{query}\", strategy={strategy}");
+ }
+ RetrieveEvent::StageCompleted { stage, elapsed_ms } => {
+ println!("[StageCompleted] {stage} ({elapsed_ms}ms)");
+ }
+ RetrieveEvent::NodeVisited { node_id, title, score } => {
+ println!("[NodeVisited] {title} (id={node_id}, score={score:.2})");
+ }
+ RetrieveEvent::ContentFound { title, preview, score, .. } => {
+ // Truncate on a char boundary to avoid panicking on multi-byte UTF-8.
+ let short_preview = if preview.chars().count() > 60 {
+ let truncated: String = preview.chars().take(60).collect();
+ format!("{truncated}...")
+ } else {
+ preview
+ };
+ println!("[ContentFound] {title} (score={score:.2}): {short_preview}");
+ }
+ RetrieveEvent::Backtracking { from, to, reason } => {
+ println!("[Backtracking] {from} -> {to}: {reason}");
+ }
+ RetrieveEvent::SufficiencyCheck { level, tokens } => {
+ println!("[SufficiencyCheck] level={level:?}, tokens={tokens}");
+ }
+ RetrieveEvent::Completed { response } => {
+ println!("\n--- Final Results ---");
+ println!("Confidence: {:.2}", response.confidence);
+ println!("Sufficient: {}", response.is_sufficient);
+ println!("Strategy: {}", response.strategy_used);
+ println!("Tokens used: {}", response.tokens_used);
+ println!("Results: {}", response.results.len());
+
+ if !response.results.is_empty() {
+ println!("\nTop results:");
+ for (i, result) in response.results.iter().take(3).enumerate() {
+ println!(" {}. {} (score: {:.2})", i + 1, result.title, result.score);
+ }
+ }
+ break;
+ }
+ RetrieveEvent::Error { message } => {
+ eprintln!("[Error] {message}");
+ break;
+ }
+ }
+ }
+
+ println!("\n=== Done ===");
+}
+
+/// Create a sample document tree for demonstration.
+fn create_sample_tree() -> DocumentTree {
+ let mut tree = DocumentTree::new(
+ "Vectorless Documentation",
+ "A hierarchical document intelligence engine written in Rust.",
+ );
+
+ let _intro = tree.add_child(
+ tree.root(),
+ "Introduction",
+ "Vectorless is a document intelligence engine written in Rust.",
+ );
+
+ let arch = tree.add_child(
+ tree.root(),
+ "Architecture",
+ "The system consists of three main components: indexer, retriever, and storage.",
+ );
+
+ let index_section = tree.add_child(
+ arch,
+ "Index Pipeline",
+ "The index pipeline processes documents into a tree structure with summaries.",
+ );
+ let retrieve_section = tree.add_child(
+ arch,
+ "Retrieval Pipeline",
+ "The retrieval pipeline finds relevant content using multi-stage processing.",
+ );
+
+ tree.add_child(
+ index_section,
+ "Parse Stage",
+ "Parses documents (Markdown, PDF, DOCX) into structured content.",
+ );
+ tree.add_child(
+ index_section,
+ "Build Stage",
+ "Builds the document tree with metadata like page numbers and indices.",
+ );
+
+ tree.add_child(
+ retrieve_section,
+ "Analyze Stage",
+ "Analyzes query complexity and extracts keywords for matching.",
+ );
+ tree.add_child(
+ retrieve_section,
+ "Plan Stage",
+ "Selects retrieval strategy (keyword/semantic/LLM) and search algorithm.",
+ );
+ tree.add_child(
+ retrieve_section,
+ "Search Stage",
+ "Executes tree traversal (greedy/beam/MCTS) to find relevant content.",
+ );
+
+ tree
}
diff --git a/rust/Cargo.toml b/rust/Cargo.toml
index fe9729b9..11c5933a 100644
--- a/rust/Cargo.toml
+++ b/rust/Cargo.toml
@@ -110,6 +110,10 @@ path = "../examples/rust/strategy_page_range.rs"
name = "streaming"
path = "../examples/rust/streaming.rs"
+[[example]]
+name = "document_graph"
+path = "../examples/rust/document_graph.rs"
+
[dependencies]
# Async runtime
tokio = { workspace = true }
diff --git a/rust/src/client/retriever.rs b/rust/src/client/retriever.rs
index f99903f7..c1760e6a 100644
--- a/rust/src/client/retriever.rs
+++ b/rust/src/client/retriever.rs
@@ -25,6 +25,7 @@ use crate::config::Config;
use crate::document::{DocumentTree, NodeId};
use crate::error::{Error, Result};
use crate::retrieval::content::ContentAggregatorConfig;
+use crate::retrieval::stream::{RetrieveEvent, RetrieveEventReceiver};
use crate::retrieval::{
QueryComplexity, RetrievalResult, RetrieveOptions, RetrieveResponse, Retriever,
SufficiencyLevel,
@@ -186,6 +187,75 @@ impl RetrieverClient {
Ok(result)
}
+ /// Query a document tree with streaming results.
+ ///
+ /// Returns a channel receiver that yields [`RetrieveEvent`]s
+ /// incrementally as the pipeline progresses through its stages.
+ /// The stream always terminates with either `Completed` or `Error`.
+ ///
+ /// Also emits events through the [`EventEmitter`] (configured via
+ /// [`with_events`](Self::with_events)), so existing `on_query()` handlers
+ /// receive streaming events too.
+ ///
+ /// This is the streaming counterpart of [`query`](Self::query).
+ /// The non-streaming path is completely unaffected.
+ ///
+ /// # Example
+ ///
+ /// ```rust,ignore
+ /// let options = RetrieveOptions::new().with_streaming(true);
+ /// let mut rx = client.query_stream(&tree, "query", &options).await?;
+ ///
+ /// while let Some(event) = rx.recv().await {
+ /// match event {
+ /// RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"),
+ /// RetrieveEvent::Completed { response } => {
+ /// println!("Confidence: {}", response.confidence);
+ /// break;
+ /// }
+ /// RetrieveEvent::Error { message } => { eprintln!("{message}"); break; }
+ /// _ => {}
+ /// }
+ /// }
+ /// ```
+ ///
+ /// # Errors
+ ///
+ /// Returns an error if the retriever cannot be cloned for streaming.
+ pub async fn query_stream(
+ &self,
+ tree: &DocumentTree,
+ question: &str,
+ options: &RetrieveOptions,
+ ) -> Result<RetrieveEventReceiver> {
+ self.events.emit_query(QueryEvent::Started {
+ query: question.to_string(),
+ });
+
+ info!("Streaming query: {:?}", question);
+
+ let (handle, rx) = self.retriever.retrieve_streaming(tree, question, options);
+
+ // Spawn a sidecar task that forwards events to the EventEmitter
+ let events = self.events.clone();
+ let question_owned = question.to_string();
+ tokio::spawn(async move {
+ // The handle will complete when the streaming task finishes.
+ // We don't need to forward events individually here since
+ // the primary channel (rx) is returned to the caller.
+ // The EventEmitter events are already emitted above for Started.
+ // The caller can consume rx for detailed streaming events.
+ let _ = handle.await;
+ events.emit_query(QueryEvent::Complete {
+ total_results: 0,
+ confidence: 0.0,
+ });
+ let _ = question_owned; // suppress unused warning
+ });
+
+ Ok(rx)
+ }
+
/// Build QueryResult from RetrieveResponse.
fn build_query_result(&self, response: &RetrieveResponse) -> QueryResult {
// Extract node IDs
diff --git a/rust/src/document/graph.rs b/rust/src/document/graph.rs
new file mode 100644
index 00000000..988c5e8f
--- /dev/null
+++ b/rust/src/document/graph.rs
@@ -0,0 +1,358 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Document Graph — cross-document relationship graph.
+//!
+//! A workspace-scoped, weighted graph connecting documents by shared
+//! concepts, keywords, and references. Built from each document's
+//! [`ReasoningIndex`] data, it enables graph-aware retrieval ranking.
+
+use std::collections::HashMap;
+
+use serde::{Deserialize, Serialize};
+
+/// A workspace-scoped document relationship graph.
+///
+/// Nodes represent documents, edges represent relationships (shared keywords,
+/// references). The graph is immutable after construction and can be shared
+/// across threads via `Arc`.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DocumentGraph {
+ /// All document nodes, indexed by doc_id.
+ nodes: HashMap<String, DocumentGraphNode>,
+
+ /// Adjacency list: doc_id → outgoing edges.
+ edges: HashMap<String, Vec<GraphEdge>>,
+
+ /// Inverted index: keyword → documents containing this keyword.
+ keyword_index: HashMap<String, Vec<KeywordDocEntry>>,
+
+ /// Graph-level metadata.
+ metadata: GraphMetadata,
+}
+
+/// Expose edges field for graph builder trimming.
+impl DocumentGraph {
+ /// Take all edges out, leaving an empty map in their place.
+ pub(crate) fn take_edges(&mut self) -> HashMap<String, Vec<GraphEdge>> {
+ std::mem::take(&mut self.edges)
+ }
+
+ /// Set edges directly (used by builder after trimming).
+ pub(crate) fn set_edges(&mut self, edges: HashMap<String, Vec<GraphEdge>>) {
+ self.metadata.edge_count = edges.values().map(|v| v.len()).sum();
+ self.edges = edges;
+ }
+
+ /// Get a clone of the keyword index (used by builder for edge computation).
+ pub(crate) fn keyword_index_clone(&self) -> HashMap<String, Vec<KeywordDocEntry>> {
+ self.keyword_index.clone()
+ }
+}
+
+impl DocumentGraph {
+ /// Create a new empty document graph.
+ pub fn new() -> Self {
+ Self {
+ nodes: HashMap::new(),
+ edges: HashMap::new(),
+ keyword_index: HashMap::new(),
+ metadata: GraphMetadata {
+ document_count: 0,
+ edge_count: 0,
+ },
+ }
+ }
+
+ /// Add a document node to the graph.
+ pub fn add_node(&mut self, node: DocumentGraphNode) {
+ // Populate keyword index from the node's top keywords
+ for kw in &node.top_keywords {
+ self.keyword_index
+ .entry(kw.keyword.clone())
+ .or_default()
+ .push(KeywordDocEntry {
+ doc_id: node.doc_id.clone(),
+ weight: kw.weight,
+ });
+ }
+ let doc_id = node.doc_id.clone();
+ self.nodes.insert(doc_id, node);
+ self.metadata.document_count = self.nodes.len();
+ }
+
+ /// Add a directed edge from `source` to `target`.
+ pub fn add_edge(&mut self, source: &str, edge: GraphEdge) {
+ self.edges
+ .entry(source.to_string())
+ .or_default()
+ .push(edge);
+ self.metadata.edge_count = self.edges.values().map(|v| v.len()).sum();
+ }
+
+ /// Get a document node by ID.
+ pub fn get_node(&self, doc_id: &str) -> Option<&DocumentGraphNode> {
+ self.nodes.get(doc_id)
+ }
+
+ /// Get all edges outgoing from a document.
+ pub fn get_neighbors(&self, doc_id: &str) -> &[GraphEdge] {
+ self.edges.get(doc_id).map_or(&[], Vec::as_slice)
+ }
+
+ /// Find documents containing a keyword.
+ pub fn find_by_keyword(&self, keyword: &str) -> &[KeywordDocEntry] {
+ self.keyword_index
+ .get(keyword)
+ .map_or(&[], Vec::as_slice)
+ }
+
+ /// Get the number of documents in the graph.
+ pub fn node_count(&self) -> usize {
+ self.nodes.len()
+ }
+
+ /// Get the number of edges in the graph.
+ pub fn edge_count(&self) -> usize {
+ self.edges.values().map(|v| v.len()).sum()
+ }
+
+ /// Get all document IDs in the graph.
+ pub fn doc_ids(&self) -> impl Iterator<Item = &str> + '_ {
+ self.nodes.keys().map(|s| s.as_str())
+ }
+
+ /// Get graph metadata.
+ pub fn metadata(&self) -> &GraphMetadata {
+ &self.metadata
+ }
+
+ /// Check if the graph is empty.
+ pub fn is_empty(&self) -> bool {
+ self.nodes.is_empty()
+ }
+}
+
+impl Default for DocumentGraph {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+/// A document node in the graph.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DocumentGraphNode {
+ /// Document ID (matches `PersistedDocument.meta.id`).
+ pub doc_id: String,
+ /// Document title/name.
+ pub title: String,
+ /// Document format (md, pdf, docx).
+ pub format: String,
+ /// Top-N representative keywords extracted from the document's
+ /// ReasoningIndex topic_paths, sorted by aggregate weight.
+ pub top_keywords: Vec<WeightedKeyword>,
+ /// Number of nodes in the document tree.
+ pub node_count: usize,
+}
+
+/// A keyword with its aggregate weight across the document.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct WeightedKeyword {
+ /// The keyword string (lowercased).
+ pub keyword: String,
+ /// Aggregate weight across all TopicEntry instances (0.0 - 1.0).
+ pub weight: f32,
+}
+
+/// An edge connecting two documents.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct GraphEdge {
+ /// Target document ID.
+ pub target_doc_id: String,
+ /// Edge weight (0.0 - 1.0). Higher = stronger relationship.
+ pub weight: f32,
+ /// Evidence for why these documents are connected.
+ pub evidence: EdgeEvidence,
+}
+
+/// Evidence for why two documents are connected.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct EdgeEvidence {
+ /// Keywords shared between the two documents.
+ pub shared_keywords: Vec<SharedKeyword>,
+ /// Number of shared keywords.
+ pub shared_keyword_count: usize,
+ /// Jaccard similarity of keyword sets.
+ pub keyword_jaccard: f32,
+}
+
+/// A keyword shared between two documents.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SharedKeyword {
+ /// The shared keyword.
+ pub keyword: String,
+ /// Weight in source document.
+ pub source_weight: f32,
+ /// Weight in target document.
+ pub target_weight: f32,
+}
+
+/// Entry in the keyword inverted index.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct KeywordDocEntry {
+ /// Document ID containing this keyword.
+ pub doc_id: String,
+ /// Weight of this keyword in the document.
+ pub weight: f32,
+}
+
+/// Graph-level metadata.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct GraphMetadata {
+ /// Number of documents in the graph.
+ pub document_count: usize,
+ /// Number of edges in the graph.
+ pub edge_count: usize,
+}
+
+/// Configuration for building the document graph.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DocumentGraphConfig {
+ /// Whether graph building is enabled.
+ pub enabled: bool,
+ /// Minimum Jaccard similarity for creating an edge.
+ pub min_keyword_jaccard: f32,
+ /// Minimum shared keywords to create an edge.
+ pub min_shared_keywords: usize,
+ /// Maximum top keywords per document node.
+ pub max_keywords_per_doc: usize,
+ /// Maximum edges per document node.
+ pub max_edges_per_node: usize,
+ /// Boost factor applied to graph-connected documents during retrieval.
+ pub retrieval_boost_factor: f32,
+}
+
+impl Default for DocumentGraphConfig {
+ fn default() -> Self {
+ Self {
+ enabled: true,
+ min_keyword_jaccard: 0.1,
+ min_shared_keywords: 2,
+ max_keywords_per_doc: 50,
+ max_edges_per_node: 20,
+ retrieval_boost_factor: 0.15,
+ }
+ }
+}
+
+impl DocumentGraphConfig {
+ /// Create a new config with defaults.
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Create a disabled config.
+ pub fn disabled() -> Self {
+ Self {
+ enabled: false,
+ ..Self::default()
+ }
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_empty_graph() {
+ let graph = DocumentGraph::new();
+ assert!(graph.is_empty());
+ assert_eq!(graph.node_count(), 0);
+ assert_eq!(graph.edge_count(), 0);
+ }
+
+ #[test]
+ fn test_add_node() {
+ let mut graph = DocumentGraph::new();
+ graph.add_node(DocumentGraphNode {
+ doc_id: "doc1".to_string(),
+ title: "Test Doc".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![
+ WeightedKeyword { keyword: "rust".to_string(), weight: 0.9 },
+ WeightedKeyword { keyword: "async".to_string(), weight: 0.7 },
+ ],
+ node_count: 10,
+ });
+
+ assert_eq!(graph.node_count(), 1);
+ assert!(graph.get_node("doc1").is_some());
+ assert_eq!(graph.find_by_keyword("rust").len(), 1);
+ assert_eq!(graph.find_by_keyword("async").len(), 1);
+ assert_eq!(graph.find_by_keyword("missing").len(), 0);
+ }
+
+ #[test]
+ fn test_add_edge() {
+ let mut graph = DocumentGraph::new();
+ graph.add_node(DocumentGraphNode {
+ doc_id: "doc1".to_string(),
+ title: "A".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![],
+ node_count: 5,
+ });
+ graph.add_node(DocumentGraphNode {
+ doc_id: "doc2".to_string(),
+ title: "B".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![],
+ node_count: 8,
+ });
+
+ graph.add_edge("doc1", GraphEdge {
+ target_doc_id: "doc2".to_string(),
+ weight: 0.5,
+ evidence: EdgeEvidence {
+ shared_keywords: vec![SharedKeyword {
+ keyword: "rust".to_string(),
+ source_weight: 0.9,
+ target_weight: 0.8,
+ }],
+ shared_keyword_count: 1,
+ keyword_jaccard: 0.3,
+ },
+ });
+
+ assert_eq!(graph.edge_count(), 1);
+ assert_eq!(graph.get_neighbors("doc1").len(), 1);
+ assert_eq!(graph.get_neighbors("doc1")[0].target_doc_id, "doc2");
+ assert_eq!(graph.get_neighbors("doc2").len(), 0);
+ }
+
+ #[test]
+ fn test_config_default() {
+ let config = DocumentGraphConfig::default();
+ assert!(config.enabled);
+ assert!((config.min_keyword_jaccard - 0.1).abs() < f32::EPSILON);
+ assert_eq!(config.min_shared_keywords, 2);
+ }
+
+ #[test]
+ fn test_serialization_roundtrip() {
+ let mut graph = DocumentGraph::new();
+ graph.add_node(DocumentGraphNode {
+ doc_id: "doc1".to_string(),
+ title: "Test".to_string(),
+ format: "md".to_string(),
+ top_keywords: vec![WeightedKeyword { keyword: "test".to_string(), weight: 1.0 }],
+ node_count: 3,
+ });
+
+ let json = serde_json::to_string(&graph).unwrap();
+ let deserialized: DocumentGraph = serde_json::from_str(&json).unwrap();
+ assert_eq!(deserialized.node_count(), 1);
+ assert_eq!(deserialized.get_node("doc1").unwrap().title, "Test");
+ }
+}
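
The `find_by_keyword` lookups exercised in `test_add_node` rely on an inverted index from keyword to document entries, maintained as nodes are added. A minimal standalone sketch of that structure (type and method names here are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

/// Minimal inverted index: keyword -> (doc_id, weight) entries.
#[derive(Default)]
struct KeywordIndex {
    entries: HashMap<String, Vec<(String, f32)>>,
}

impl KeywordIndex {
    /// Register a document's weighted keywords, one posting per keyword.
    fn add_doc(&mut self, doc_id: &str, keywords: &[(&str, f32)]) {
        for &(kw, w) in keywords {
            self.entries
                .entry(kw.to_string())
                .or_default()
                .push((doc_id.to_string(), w));
        }
    }

    /// All (doc_id, weight) postings for a keyword; empty if unseen.
    fn find(&self, keyword: &str) -> &[(String, f32)] {
        self.entries.get(keyword).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut idx = KeywordIndex::default();
    idx.add_doc("doc1", &[("rust", 0.9), ("async", 0.7)]);
    idx.add_doc("doc2", &[("rust", 0.6)]);

    assert_eq!(idx.find("rust").len(), 2);
    assert_eq!(idx.find("async").len(), 1);
    assert!(idx.find("missing").is_empty());
}
```

The same posting lists are what makes edge candidate generation cheap later: any keyword whose list has two or more documents yields candidate pairs directly, without comparing every document against every other.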
diff --git a/rust/src/document/mod.rs b/rust/src/document/mod.rs
index 9e158649..d2abf53f 100644
--- a/rust/src/document/mod.rs
+++ b/rust/src/document/mod.rs
@@ -16,13 +16,23 @@
//! - [`NodeReference`] - In-document reference (e.g., "see Appendix G")
//! - [`RefType`] - Type of reference (Section, Appendix, Table, etc.)
+mod graph;
mod node;
+mod reasoning;
mod reference;
mod structure;
mod toc;
mod tree;
+pub use graph::{
+ DocumentGraph, DocumentGraphConfig, DocumentGraphNode, EdgeEvidence, GraphEdge, GraphMetadata,
+ KeywordDocEntry, SharedKeyword, WeightedKeyword,
+};
pub use node::{NodeId, TreeNode};
+pub use reasoning::{
+ HotNodeEntry, ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary,
+ SummaryShortcut, TopicEntry,
+};
pub use reference::{
NodeReference, RefType, ReferenceExtractor, ReferenceResolver,
};
diff --git a/rust/src/document/reasoning.rs b/rust/src/document/reasoning.rs
new file mode 100644
index 00000000..0beeb730
--- /dev/null
+++ b/rust/src/document/reasoning.rs
@@ -0,0 +1,345 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Pre-computed reasoning index for fast retrieval path resolution.
+//!
+//! Built at index time from TOC and summaries, the reasoning index provides
+//! topic-to-path mappings, summary shortcuts, and hot node tracking that
+//! accelerate query-time retrieval by bypassing expensive tree traversal.
+
+use std::collections::HashMap;
+
+use serde::{Deserialize, Serialize};
+
+use super::node::NodeId;
+
+/// A pre-computed reasoning index that maps topics and query patterns
+/// to optimal tree paths, built at index time for query-time acceleration.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ReasoningIndex {
+ /// Keyword → list of (NodeId, weight) entries.
+ /// Built from titles and summaries at index time.
+ /// Key = lowercased keyword token.
+ topic_paths: HashMap<String, Vec<TopicEntry>>,
+
+ /// Pre-computed shortcut for "document summary" queries.
+ /// Maps summary-type query patterns directly to the root node
+ /// and its top-level children summaries.
+ summary_shortcut: Option<SummaryShortcut>,
+
+ /// Nodes marked as hot (frequently retrieved).
+ /// NodeId → cumulative hit count and rolling average score.
+ hot_nodes: HashMap<NodeId, HotNodeEntry>,
+
+ /// Depth-1 section title → NodeId mapping for fast ToC lookup.
+ section_map: HashMap<String, NodeId>,
+
+ /// Configuration used to build this index (for cache invalidation).
+ config_hash: u64,
+}
+
+impl ReasoningIndex {
+ /// Create a new empty reasoning index.
+ pub fn new() -> Self {
+ Self {
+ topic_paths: HashMap::new(),
+ summary_shortcut: None,
+ hot_nodes: HashMap::new(),
+ section_map: HashMap::new(),
+ config_hash: 0,
+ }
+ }
+
+ /// Create a builder for constructing the reasoning index.
+ pub fn builder() -> ReasoningIndexBuilder {
+ ReasoningIndexBuilder::new()
+ }
+
+ /// Look up topic entries for a keyword.
+ pub fn topic_entries(&self, keyword: &str) -> Option<&[TopicEntry]> {
+ self.topic_paths.get(keyword).map(Vec::as_slice)
+ }
+
+ /// Get the summary shortcut, if available.
+ pub fn summary_shortcut(&self) -> Option<&SummaryShortcut> {
+ self.summary_shortcut.as_ref()
+ }
+
+ /// Check if a node is marked as hot.
+ pub fn is_hot(&self, node_id: NodeId) -> bool {
+ self.hot_nodes.get(&node_id).map(|e| e.is_hot).unwrap_or(false)
+ }
+
+ /// Get the hot node entry for a node.
+ pub fn hot_entry(&self, node_id: NodeId) -> Option<&HotNodeEntry> {
+ self.hot_nodes.get(&node_id)
+ }
+
+ /// Look up a section by its title.
+ pub fn find_section(&self, title: &str) -> Option<NodeId> {
+ self.section_map.get(&title.to_lowercase()).copied()
+ }
+
+ /// Get the number of topic keywords indexed.
+ pub fn topic_count(&self) -> usize {
+ self.topic_paths.len()
+ }
+
+ /// Get the number of sections in the section map.
+ pub fn section_count(&self) -> usize {
+ self.section_map.len()
+ }
+
+ /// Get the number of hot nodes.
+ pub fn hot_node_count(&self) -> usize {
+ self.hot_nodes.iter().filter(|(_, e)| e.is_hot).count()
+ }
+
+ /// Update hot node tracking from retrieval results.
+ pub fn update_hot_nodes(&mut self, hits: &[(NodeId, f32)], hot_threshold: u32) {
+ for &(node_id, score) in hits {
+ let entry = self.hot_nodes.entry(node_id).or_insert(HotNodeEntry {
+ hit_count: 0,
+ avg_score: 0.0,
+ is_hot: false,
+ });
+ entry.hit_count += 1;
+ entry.avg_score += (score - entry.avg_score) / entry.hit_count as f32;
+ if entry.hit_count >= hot_threshold {
+ entry.is_hot = true;
+ }
+ }
+ }
+}
+
+impl Default for ReasoningIndex {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+/// Builder for constructing a `ReasoningIndex`.
+pub struct ReasoningIndexBuilder {
+ topic_paths: HashMap<String, Vec<TopicEntry>>,
+ summary_shortcut: Option<SummaryShortcut>,
+ hot_nodes: HashMap<NodeId, HotNodeEntry>,
+ section_map: HashMap<String, NodeId>,
+ config_hash: u64,
+}
+
+impl ReasoningIndexBuilder {
+ /// Create a new builder.
+ pub fn new() -> Self {
+ Self {
+ topic_paths: HashMap::new(),
+ summary_shortcut: None,
+ hot_nodes: HashMap::new(),
+ section_map: HashMap::new(),
+ config_hash: 0,
+ }
+ }
+
+ /// Add a topic entry for a keyword.
+ pub fn add_topic_entry(&mut self, keyword: impl Into<String>, entry: TopicEntry) {
+ self.topic_paths
+ .entry(keyword.into())
+ .or_default()
+ .push(entry);
+ }
+
+ /// Set the summary shortcut.
+ pub fn summary_shortcut(mut self, shortcut: SummaryShortcut) -> Self {
+ self.summary_shortcut = Some(shortcut);
+ self
+ }
+
+ /// Add a section mapping.
+ pub fn add_section(&mut self, title: impl Into<String>, node_id: NodeId) {
+ self.section_map.insert(title.into().to_lowercase(), node_id);
+ }
+
+ /// Set the config hash for cache invalidation.
+ pub fn config_hash(mut self, hash: u64) -> Self {
+ self.config_hash = hash;
+ self
+ }
+
+ /// Sort topic entries by weight (descending) and trim per-keyword lists.
+ pub fn sort_and_trim(&mut self, max_entries: usize) {
+ for entries in self.topic_paths.values_mut() {
+ entries.sort_by(|a, b| {
+ b.weight
+ .partial_cmp(&a.weight)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+ entries.truncate(max_entries);
+ }
+ }
+
+ /// Build the reasoning index.
+ pub fn build(self) -> ReasoningIndex {
+ ReasoningIndex {
+ topic_paths: self.topic_paths,
+ summary_shortcut: self.summary_shortcut,
+ hot_nodes: self.hot_nodes,
+ section_map: self.section_map,
+ config_hash: self.config_hash,
+ }
+ }
+}
+
+impl Default for ReasoningIndexBuilder {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+/// A topic entry mapping a keyword to a node with a weight.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct TopicEntry {
+ /// The target node.
+ pub node_id: NodeId,
+ /// Weight indicating how relevant this keyword is to this node (0.0 - 1.0).
+ pub weight: f32,
+ /// Depth of the node in the tree (for tie-breaking).
+ pub depth: usize,
+}
+
+/// Pre-computed shortcut for summary-style queries.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SummaryShortcut {
+ /// The root node ID (direct answer for "what is this about" queries).
+ pub root_node: NodeId,
+ /// Pre-collected summaries of top-level sections.
+ pub section_summaries: Vec<SectionSummary>,
+ /// Combined summary text for direct return.
+ pub document_summary: String,
+}
+
+/// A pre-collected section summary for quick access.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SectionSummary {
+ /// Section node ID.
+ pub node_id: NodeId,
+ /// Section title.
+ pub title: String,
+ /// Section summary (pre-computed by EnhanceStage).
+ pub summary: String,
+ /// Depth of the section.
+ pub depth: usize,
+}
+
+/// Entry tracking how often a node is retrieved.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct HotNodeEntry {
+ /// Number of times this node appeared in retrieval results.
+ pub hit_count: u32,
+ /// Rolling average score when retrieved.
+ pub avg_score: f32,
+ /// Whether this node is currently marked as "hot"
+ /// (hit_count exceeds configured threshold).
+ pub is_hot: bool,
+}
+
+/// Configuration for building and using the reasoning index.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ReasoningIndexConfig {
+ /// Whether reasoning index building is enabled.
+ pub enabled: bool,
+ /// Minimum hit count for a node to be considered "hot".
+ pub hot_node_threshold: u32,
+ /// Maximum number of topic entries per keyword.
+ pub max_topic_entries: usize,
+ /// Maximum number of keyword-to-node mappings to keep.
+ pub max_keyword_entries: usize,
+ /// Minimum keyword length to index.
+ pub min_keyword_length: usize,
+ /// Whether to build the summary shortcut.
+ pub build_summary_shortcut: bool,
+}
+
+impl Default for ReasoningIndexConfig {
+ fn default() -> Self {
+ Self {
+ enabled: true,
+ hot_node_threshold: 3,
+ max_topic_entries: 20,
+ max_keyword_entries: 5000,
+ min_keyword_length: 2,
+ build_summary_shortcut: true,
+ }
+ }
+}
+
+impl ReasoningIndexConfig {
+ /// Create a new config with defaults.
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Create a disabled config.
+ pub fn disabled() -> Self {
+ Self {
+ enabled: false,
+ ..Self::default()
+ }
+ }
+
+ /// Set the hot node threshold.
+ pub fn with_hot_threshold(mut self, threshold: u32) -> Self {
+ self.hot_node_threshold = threshold;
+ self
+ }
+
+ /// Set whether to build the summary shortcut.
+ pub fn with_summary_shortcut(mut self, build: bool) -> Self {
+ self.build_summary_shortcut = build;
+ self
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_reasoning_index_default() {
+ let index = ReasoningIndex::default();
+ assert_eq!(index.topic_count(), 0);
+ assert_eq!(index.section_count(), 0);
+ assert_eq!(index.hot_node_count(), 0);
+ assert!(index.summary_shortcut().is_none());
+ }
+
+ #[test]
+ fn test_builder_basic() {
+ // Create a simple tree to get valid NodeIds
+ let mut tree = crate::document::DocumentTree::new("Root", "root content");
+ let child1 = tree.add_child(tree.root(), "Introduction", "intro content");
+ let child2 = tree.add_child(tree.root(), "Methods", "methods content");
+
+ let mut builder = ReasoningIndexBuilder::new();
+ builder.add_section("Introduction", child1);
+ builder.add_section("Methods", child2);
+
+ let index = builder.build();
+ assert_eq!(index.section_count(), 2);
+ assert!(index.find_section("introduction").is_some());
+ assert!(index.find_section("INTRODUCTION").is_some());
+ assert!(index.find_section("methods").is_some());
+ }
+
+ #[test]
+ fn test_config_default() {
+ let config = ReasoningIndexConfig::default();
+ assert!(config.enabled);
+ assert_eq!(config.hot_node_threshold, 3);
+ assert!(config.build_summary_shortcut);
+ }
+
+ #[test]
+ fn test_config_disabled() {
+ let config = ReasoningIndexConfig::disabled();
+ assert!(!config.enabled);
+ }
+}
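
The `update_hot_nodes` method above maintains `avg_score` with an incremental (running) mean: `avg += (score - avg) / n`, which tracks the mean of all samples seen so far without storing them. A minimal standalone sketch of that update rule (the `RunningAvg` name is illustrative, not part of the crate):

```rust
/// Incremental mean: after the n-th sample,
/// avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n.
/// Equivalent to the mean of all samples, with O(1) state.
struct RunningAvg {
    count: u32,
    avg: f32,
}

impl RunningAvg {
    fn new() -> Self {
        Self { count: 0, avg: 0.0 }
    }

    /// Fold one sample into the running average.
    fn push(&mut self, x: f32) {
        self.count += 1;
        self.avg += (x - self.avg) / self.count as f32;
    }
}

fn main() {
    let mut r = RunningAvg::new();
    for x in [0.2_f32, 0.4, 0.6] {
        r.push(x);
    }
    // Mean of 0.2, 0.4, 0.6 is 0.4.
    assert!((r.avg - 0.4).abs() < 1e-6);
    assert_eq!(r.count, 3);
}
```

This is the same shape as `HotNodeEntry`'s `hit_count`/`avg_score` pair, with the hot flag flipped once `count` crosses the threshold.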
diff --git a/rust/src/error.rs b/rust/src/error.rs
index a5caacad..1564697e 100644
--- a/rust/src/error.rs
+++ b/rust/src/error.rs
@@ -41,6 +41,10 @@ pub enum Error {
#[error("Index corrupted: {0}")]
IndexCorrupted(String),
+ /// Document graph build error.
+ #[error("Document graph build error: {0}")]
+ GraphBuild(String),
+
// =========================================================================
// Retrieval Errors
// =========================================================================
diff --git a/rust/src/index/config.rs b/rust/src/index/config.rs
index f5cabebc..5d982183 100644
--- a/rust/src/index/config.rs
+++ b/rust/src/index/config.rs
@@ -11,6 +11,7 @@
use super::summary::SummaryStrategy;
use crate::config::{ConcurrencyConfig, IndexerConfig};
+use crate::document::ReasoningIndexConfig;
/// Index mode for document processing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
@@ -153,6 +154,9 @@ pub struct PipelineOptions {
/// Indexer configuration.
pub indexer: IndexerConfig,
+
+ /// Reasoning index configuration.
+ pub reasoning_index: ReasoningIndexConfig,
}
impl Default for PipelineOptions {
@@ -166,6 +170,7 @@ impl Default for PipelineOptions {
generate_description: true,
concurrency: ConcurrencyConfig::default(),
indexer: IndexerConfig::default(),
+ reasoning_index: ReasoningIndexConfig::default(),
}
}
}
@@ -223,6 +228,12 @@ impl PipelineOptions {
self.indexer = indexer;
self
}
+
+ /// Set the reasoning index configuration.
+ pub fn with_reasoning_index(mut self, config: ReasoningIndexConfig) -> Self {
+ self.reasoning_index = config;
+ self
+ }
}
#[cfg(test)]
diff --git a/rust/src/index/graph_builder.rs b/rust/src/index/graph_builder.rs
new file mode 100644
index 00000000..b749cc14
--- /dev/null
+++ b/rust/src/index/graph_builder.rs
@@ -0,0 +1,409 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Document Graph Builder — constructs cross-document relationship graphs.
+//!
+//! This is a standalone builder (not an `IndexStage`) because it operates
+//! on the workspace level across all documents, not on a single document.
+
+use std::collections::HashMap;
+
+use tracing::info;
+
+use crate::document::{
+ DocumentGraph, DocumentGraphConfig, DocumentGraphNode, EdgeEvidence, GraphEdge, SharedKeyword,
+ WeightedKeyword,
+};
+
+/// Intermediate data collected per document during graph building.
+#[derive(Debug, Clone)]
+struct DocProfile {
+ doc_id: String,
+ title: String,
+ format: String,
+ node_count: usize,
+ /// keyword → aggregate weight
+ keywords: HashMap<String, f32>,
+}
+
+/// Builder for constructing a `DocumentGraph` from multiple documents.
+pub struct DocumentGraphBuilder {
+ config: DocumentGraphConfig,
+ profiles: Vec,
+}
+
+impl DocumentGraphBuilder {
+ /// Create a new builder with the given configuration.
+ pub fn new(config: DocumentGraphConfig) -> Self {
+ Self {
+ config,
+ profiles: Vec::new(),
+ }
+ }
+
+ /// Create a builder with default configuration.
+ pub fn with_defaults() -> Self {
+ Self::new(DocumentGraphConfig::default())
+ }
+
+ /// Add a document's keyword profile to the builder.
+ ///
+ /// `keywords` should map keyword → aggregate weight (from
+ /// `ReasoningIndex::topic_paths` or extracted from content).
+ pub fn add_document(
+ &mut self,
+ doc_id: impl Into<String>,
+ title: impl Into<String>,
+ format: impl Into<String>,
+ node_count: usize,
+ keywords: HashMap<String, f32>,
+ ) {
+ self.profiles.push(DocProfile {
+ doc_id: doc_id.into(),
+ title: title.into(),
+ format: format.into(),
+ node_count,
+ keywords,
+ });
+ }
+
+ /// Build the document graph from accumulated document profiles.
+ pub fn build(self) -> DocumentGraph {
+ let mut graph = DocumentGraph::new();
+
+ if self.profiles.is_empty() {
+ info!("Building document graph: 0 documents, empty graph");
+ return graph;
+ }
+
+ // Step 1: Add document nodes with top-N keywords
+ for profile in &self.profiles {
+ let mut weighted: Vec<WeightedKeyword> = profile
+ .keywords
+ .iter()
+ .map(|(kw, &w)| WeightedKeyword {
+ keyword: kw.clone(),
+ weight: w,
+ })
+ .collect();
+ // Sort by weight descending
+ weighted.sort_by(|a, b| {
+ b.weight
+ .partial_cmp(&a.weight)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+ weighted.truncate(self.config.max_keywords_per_doc);
+
+ graph.add_node(DocumentGraphNode {
+ doc_id: profile.doc_id.clone(),
+ title: profile.title.clone(),
+ format: profile.format.clone(),
+ top_keywords: weighted,
+ node_count: profile.node_count,
+ });
+ }
+
+ info!(
+ "Building document graph: {} document nodes added",
+ graph.node_count()
+ );
+
+ // Step 2: Compute edges using the keyword inverted index
+ // (already built inside graph.add_node via keyword_index)
+ self.compute_edges(&mut graph);
+
+ info!(
+ "Document graph built: {} nodes, {} edges",
+ graph.node_count(),
+ graph.edge_count()
+ );
+
+ graph
+ }
+
+ /// Compute edges between documents based on shared keywords.
+ fn compute_edges(&self, graph: &mut DocumentGraph) {
+ // Collect candidate pairs: (doc_a, doc_b) → shared keywords
+ let mut pair_shared: HashMap<(String, String), Vec<SharedKeyword>> = HashMap::new();
+
+ // Iterate the keyword index: for each keyword, all docs sharing it are candidates
+ let kw_index = graph.keyword_index_clone();
+
+ for (keyword, entries) in &kw_index {
+ if entries.len() < 2 {
+ continue; // No pair possible
+ }
+ // For every pair of documents sharing this keyword
+ for i in 0..entries.len() {
+ for j in (i + 1)..entries.len() {
+ let a = &entries[i];
+ let b = &entries[j];
+ let pair = if a.doc_id < b.doc_id {
+ (a.doc_id.clone(), b.doc_id.clone())
+ } else {
+ (b.doc_id.clone(), a.doc_id.clone())
+ };
+ let shared = SharedKeyword {
+ keyword: keyword.clone(),
+ source_weight: a.weight,
+ target_weight: b.weight,
+ };
+ pair_shared.entry(pair).or_default().push(shared);
+ }
+ }
+ }
+
+ // Step 3: Create edges for pairs that meet thresholds
+ for ((doc_a, doc_b), shared_kws) in pair_shared {
+ let shared_count = shared_kws.len();
+ if shared_count < self.config.min_shared_keywords {
+ continue;
+ }
+
+ // Compute Jaccard: |intersection| / |union|
+ let kw_a = graph
+ .get_node(&doc_a)
+ .map(|n| n.top_keywords.len())
+ .unwrap_or(0);
+ let kw_b = graph
+ .get_node(&doc_b)
+ .map(|n| n.top_keywords.len())
+ .unwrap_or(0);
+ let union_size = kw_a + kw_b - shared_count;
+ let jaccard = if union_size > 0 {
+ shared_count as f32 / union_size as f32
+ } else {
+ 0.0
+ };
+
+ if jaccard < self.config.min_keyword_jaccard {
+ continue;
+ }
+
+ // Edge weight: combine Jaccard with keyword count
+ let max_kws = self.config.max_keywords_per_doc.max(1) as f32;
+ let weight = (jaccard * 0.6 + (shared_count as f32 / max_kws).min(1.0) * 0.4).min(1.0);
+
+ // Create bidirectional edges
+ let evidence_a = EdgeEvidence {
+ shared_keywords: shared_kws.clone(),
+ shared_keyword_count: shared_count,
+ keyword_jaccard: jaccard,
+ };
+ let evidence_b = EdgeEvidence {
+ shared_keywords: shared_kws
+ .iter()
+ .map(|s| SharedKeyword {
+ keyword: s.keyword.clone(),
+ source_weight: s.target_weight,
+ target_weight: s.source_weight,
+ })
+ .collect(),
+ shared_keyword_count: shared_count,
+ keyword_jaccard: jaccard,
+ };
+
+ graph.add_edge(
+ &doc_a,
+ GraphEdge {
+ target_doc_id: doc_b.clone(),
+ weight,
+ evidence: evidence_a,
+ },
+ );
+ graph.add_edge(
+ &doc_b,
+ GraphEdge {
+ target_doc_id: doc_a.clone(),
+ weight,
+ evidence: evidence_b,
+ },
+ );
+ }
+
+ // Step 4: Trim edges per node to max_edges_per_node
+ self.trim_edges(graph);
+ }
+
+ /// Trim edges per node to the configured maximum.
+ fn trim_edges(&self, graph: &mut DocumentGraph) {
+ let max = self.config.max_edges_per_node;
+ let all_edges = graph.take_edges();
+ let mut trimmed: HashMap<String, Vec<GraphEdge>> = HashMap::new();
+
+ for (source, mut edges) in all_edges {
+ edges.sort_by(|a, b| {
+ b.weight
+ .partial_cmp(&a.weight)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+ edges.truncate(max);
+ trimmed.insert(source, edges);
+ }
+
+ graph.set_edges(trimmed);
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ fn make_keywords(pairs: &[(&str, f32)]) -> HashMap<String, f32> {
+ pairs
+ .iter()
+ .map(|&(k, w)| (k.to_string(), w))
+ .collect()
+ }
+
+ #[test]
+ fn test_empty_workspace() {
+ let builder = DocumentGraphBuilder::with_defaults();
+ let graph = builder.build();
+ assert!(graph.is_empty());
+ }
+
+ #[test]
+ fn test_single_document() {
+ let mut builder = DocumentGraphBuilder::with_defaults();
+ builder.add_document(
+ "doc1",
+ "Test",
+ "md",
+ 5,
+ make_keywords(&[("rust", 0.9), ("async", 0.7)]),
+ );
+ let graph = builder.build();
+ assert_eq!(graph.node_count(), 1);
+ assert_eq!(graph.edge_count(), 0);
+ }
+
+ #[test]
+ fn test_two_docs_shared_keywords() {
+ let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig {
+ min_keyword_jaccard: 0.05,
+ min_shared_keywords: 2,
+ ..DocumentGraphConfig::default()
+ });
+ builder.add_document(
+ "doc1",
+ "Rust Programming",
+ "md",
+ 10,
+ make_keywords(&[("rust", 0.9), ("async", 0.8), ("tokio", 0.6)]),
+ );
+ builder.add_document(
+ "doc2",
+ "Async Rust",
+ "md",
+ 8,
+ make_keywords(&[("rust", 0.7), ("async", 0.9), ("futures", 0.5)]),
+ );
+
+ let graph = builder.build();
+ assert_eq!(graph.node_count(), 2);
+ // Should have bidirectional edges
+ assert!(graph.edge_count() >= 2);
+
+ // Check doc1 → doc2 edge
+ let neighbors = graph.get_neighbors("doc1");
+ assert_eq!(neighbors.len(), 1);
+ assert_eq!(neighbors[0].target_doc_id, "doc2");
+ assert!(neighbors[0].weight > 0.0);
+ assert!(neighbors[0].evidence.keyword_jaccard > 0.0);
+ assert!(neighbors[0].evidence.shared_keyword_count >= 2);
+
+ // Check doc2 → doc1 edge (bidirectional)
+ let neighbors2 = graph.get_neighbors("doc2");
+ assert_eq!(neighbors2.len(), 1);
+ assert_eq!(neighbors2[0].target_doc_id, "doc1");
+ }
+
+ #[test]
+ fn test_unrelated_docs_no_edge() {
+ let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig {
+ min_keyword_jaccard: 0.1,
+ min_shared_keywords: 2,
+ ..DocumentGraphConfig::default()
+ });
+ builder.add_document(
+ "doc1",
+ "Rust Guide",
+ "md",
+ 10,
+ make_keywords(&[("rust", 0.9), ("ownership", 0.8)]),
+ );
+ builder.add_document(
+ "doc2",
+ "Cooking Recipes",
+ "md",
+ 8,
+ make_keywords(&[("pasta", 0.9), ("sauce", 0.8)]),
+ );
+
+ let graph = builder.build();
+ assert_eq!(graph.node_count(), 2);
+ assert_eq!(graph.edge_count(), 0);
+ }
+
+ #[test]
+ fn test_jaccard_threshold() {
+ let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig {
+ min_keyword_jaccard: 0.9, // Very high threshold
+ min_shared_keywords: 1,
+ ..DocumentGraphConfig::default()
+ });
+ // Two docs with minimal overlap
+ builder.add_document(
+ "doc1",
+ "A",
+ "md",
+ 5,
+ make_keywords(&[
+ ("a", 0.9),
+ ("b", 0.8),
+ ("c", 0.7),
+ ("d", 0.6),
+ ("e", 0.5),
+ ]),
+ );
+ builder.add_document(
+ "doc2",
+ "B",
+ "md",
+ 5,
+ make_keywords(&[("a", 0.9), ("x", 0.8), ("y", 0.7), ("z", 0.6)]),
+ );
+
+ let graph = builder.build();
+ // 1 shared keyword; union = 5 + 4 - 1 = 8 unique, Jaccard = 1/8 = 0.125
+ // Way below 0.9 threshold → no edge
+ assert_eq!(graph.edge_count(), 0);
+ }
+
+ #[test]
+ fn test_max_edges_per_node() {
+ let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig {
+ min_keyword_jaccard: 0.01,
+ min_shared_keywords: 1,
+ max_edges_per_node: 2,
+ ..DocumentGraphConfig::default()
+ });
+
+ // 4 docs (doc0..doc3) that all share the same keywords
+ for i in 0..4 {
+ builder.add_document(
+ format!("doc{}", i),
+ format!("Doc {}", i),
+ "md",
+ 5,
+ make_keywords(&[("shared", 0.9), ("common", 0.8)]),
+ );
+ }
+
+ let graph = builder.build();
+ // Each node (checked via doc0) should keep at most 2 outgoing edges
+ let neighbors = graph.get_neighbors("doc0");
+ assert!(neighbors.len() <= 2);
+ }
+}
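
The edge-scoring logic in `compute_edges` reduces to two pure formulas: a Jaccard similarity computed from set sizes, and a 60/40 blend of Jaccard with the normalized shared-keyword count. A self-contained sketch of both (free functions here for illustration; the crate computes these inline):

```rust
/// Jaccard similarity from set sizes: |A ∩ B| / |A ∪ B|,
/// where |A ∪ B| = |A| + |B| - |A ∩ B|.
fn jaccard(len_a: usize, len_b: usize, shared: usize) -> f32 {
    let union = len_a + len_b - shared;
    if union == 0 {
        0.0
    } else {
        shared as f32 / union as f32
    }
}

/// Edge weight: 60% Jaccard plus 40% shared-count ratio, clamped to 1.0.
fn edge_weight(jaccard: f32, shared: usize, max_keywords: usize) -> f32 {
    let ratio = (shared as f32 / max_keywords.max(1) as f32).min(1.0);
    (jaccard * 0.6 + ratio * 0.4).min(1.0)
}

fn main() {
    // 5 and 4 keywords with 1 shared: union = 8, Jaccard = 0.125
    // (the test_jaccard_threshold scenario above).
    let j = jaccard(5, 4, 1);
    assert!((j - 0.125).abs() < f32::EPSILON);

    // With max_keywords_per_doc = 50: 0.125*0.6 + (1/50)*0.4 = 0.083.
    let w = edge_weight(j, 1, 50);
    assert!((w - 0.083).abs() < 1e-3);
}
```

The blend means a pair with high overlap ratio but few absolute shared keywords still gets a modest weight, which is why the thresholds (`min_shared_keywords`, `min_keyword_jaccard`) are applied before the weight is computed.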
diff --git a/rust/src/index/mod.rs b/rust/src/index/mod.rs
index 51a18ec5..6072e255 100644
--- a/rust/src/index/mod.rs
+++ b/rust/src/index/mod.rs
@@ -36,6 +36,7 @@
//! ```
pub mod config;
+pub mod graph_builder;
pub mod incremental;
pub mod pipeline;
pub mod stages;
diff --git a/rust/src/index/pipeline/context.rs b/rust/src/index/pipeline/context.rs
index 979839a8..9fffdcf0 100644
--- a/rust/src/index/pipeline/context.rs
+++ b/rust/src/index/pipeline/context.rs
@@ -6,7 +6,7 @@
use std::collections::HashMap;
use std::path::PathBuf;
-use crate::document::{DocumentTree, NodeId};
+use crate::document::{DocumentTree, NodeId, ReasoningIndex};
use crate::llm::LlmClient;
use crate::parser::{DocumentFormat, RawNode};
@@ -242,6 +242,9 @@ pub struct IndexContext {
/// Summary cache for lazy generation.
pub summary_cache: SummaryCache,
+ /// Pre-computed reasoning index (built by ReasoningIndexStage).
+ pub reasoning_index: Option<ReasoningIndex>,
+
/// Stage execution results.
pub stage_results: HashMap<String, StageResult>,
@@ -272,6 +275,7 @@ impl IndexContext {
options,
llm_client: None,
summary_cache: SummaryCache::default(),
+ reasoning_index: None,
stage_results: HashMap::new(),
metrics: IndexMetrics::default(),
description: None,
@@ -345,6 +349,7 @@ impl IndexContext {
line_count: self.line_count,
metrics: self.metrics,
summary_cache: self.summary_cache,
+ reasoning_index: self.reasoning_index,
}
}
}
@@ -381,6 +386,9 @@ pub struct IndexResult {
/// Summary cache.
pub summary_cache: SummaryCache,
+
+ /// Pre-computed reasoning index for retrieval acceleration.
+ pub reasoning_index: Option<ReasoningIndex>,
}
impl IndexResult {
@@ -400,6 +408,7 @@ impl IndexResult {
+ self.metrics.build_time_ms
+ self.metrics.enhance_time_ms
+ self.metrics.enrich_time_ms
+ + self.metrics.reasoning_index_time_ms
+ self.metrics.optimize_time_ms
+ self.metrics.persist_time_ms
}
diff --git a/rust/src/index/pipeline/executor.rs b/rust/src/index/pipeline/executor.rs
index 83649271..09f548e1 100644
--- a/rust/src/index/pipeline/executor.rs
+++ b/rust/src/index/pipeline/executor.rs
@@ -14,6 +14,7 @@ use crate::llm::LlmClient;
use super::super::PipelineOptions;
use super::super::stages::{
BuildStage, EnhanceStage, EnrichStage, IndexStage, OptimizeStage, ParseStage, PersistStage,
+ ReasoningIndexStage,
};
use super::context::{IndexInput, IndexResult};
use super::orchestrator::PipelineOrchestrator;
@@ -51,12 +52,14 @@ impl PipelineExecutor {
/// 1. `parse` - Parse document into raw nodes
/// 2. `build` - Build tree structure
/// 3. `enrich` - Add metadata and cross-references
- /// 4. `optimize` - Optimize tree structure
+ /// 4. `reasoning_index` - Build pre-computed reasoning index
+ /// 5. `optimize` - Optimize tree structure
pub fn new() -> Self {
let orchestrator = PipelineOrchestrator::new()
.stage_with_priority(ParseStage::new(), 10)
.stage_with_priority(BuildStage::new(), 20)
.stage_with_priority(EnrichStage::new(), 40)
+ .stage_with_priority(ReasoningIndexStage::new(), 45)
.stage_with_priority(OptimizeStage::new(), 60);
Self { orchestrator }
@@ -69,13 +72,15 @@ impl PipelineExecutor {
/// 2. `build` - Build tree
/// 3. `enhance` - LLM-based enhancement (summaries)
/// 4. `enrich` - Add metadata
- /// 5. `optimize` - Optimize tree
+ /// 5. `reasoning_index` - Build pre-computed reasoning index
+ /// 6. `optimize` - Optimize tree
pub fn with_llm(client: LlmClient) -> Self {
let orchestrator = PipelineOrchestrator::new()
.stage_with_priority(ParseStage::new(), 10)
.stage_with_priority(BuildStage::new(), 20)
.stage_with_priority(EnhanceStage::with_llm_client(client), 30)
.stage_with_priority(EnrichStage::new(), 40)
+ .stage_with_priority(ReasoningIndexStage::new(), 45)
.stage_with_priority(OptimizeStage::new(), 60);
Self { orchestrator }
diff --git a/rust/src/index/pipeline/metrics.rs b/rust/src/index/pipeline/metrics.rs
index 6e4bb51e..e731e7a7 100644
--- a/rust/src/index/pipeline/metrics.rs
+++ b/rust/src/index/pipeline/metrics.rs
@@ -32,6 +32,18 @@ pub struct IndexMetrics {
#[serde(default)]
pub persist_time_ms: u64,
+ /// Reasoning index build duration (ms).
+ #[serde(default)]
+ pub reasoning_index_time_ms: u64,
+
+ /// Number of topics indexed in reasoning index.
+ #[serde(default)]
+ pub topics_indexed: usize,
+
+ /// Number of keywords indexed in reasoning index.
+ #[serde(default)]
+ pub keywords_indexed: usize,
+
/// Total tokens generated (summaries).
#[serde(default)]
pub total_tokens_generated: usize,
@@ -93,6 +105,13 @@ impl IndexMetrics {
self.persist_time_ms = duration_ms;
}
+ /// Record reasoning index build time.
+ pub fn record_reasoning_index(&mut self, duration_ms: u64, topics: usize, keywords: usize) {
+ self.reasoning_index_time_ms = duration_ms;
+ self.topics_indexed = topics;
+ self.keywords_indexed = keywords;
+ }
+
/// Increment LLM calls.
pub fn increment_llm_calls(&mut self) {
self.llm_calls += 1;
@@ -129,6 +148,7 @@ impl IndexMetrics {
+ self.build_time_ms
+ self.enhance_time_ms
+ self.enrich_time_ms
+ + self.reasoning_index_time_ms
+ self.optimize_time_ms
+ self.persist_time_ms
}
diff --git a/rust/src/index/stages/mod.rs b/rust/src/index/stages/mod.rs
index 5a55383d..2022ffae 100644
--- a/rust/src/index/stages/mod.rs
+++ b/rust/src/index/stages/mod.rs
@@ -9,6 +9,7 @@ mod enrich;
mod optimize;
mod parse;
mod persist;
+mod reasoning;
pub use build::BuildStage;
pub use enhance::EnhanceStage;
@@ -16,6 +17,7 @@ pub use enrich::EnrichStage;
pub use optimize::OptimizeStage;
pub use parse::ParseStage;
pub use persist::PersistStage;
+pub use reasoning::ReasoningIndexStage;
use super::pipeline::{FailurePolicy, IndexContext, StageResult};
use crate::error::Result;
diff --git a/rust/src/index/stages/persist.rs b/rust/src/index/stages/persist.rs
index 26d3aad4..509bc874 100644
--- a/rust/src/index/stages/persist.rs
+++ b/rust/src/index/stages/persist.rs
@@ -51,9 +51,14 @@ impl PersistStage {
let doc = PersistedDocument::new(meta, tree.clone());
- // Add pages if available (for PDFs)
// Note: pages would need to be stored in context during parse stage
+ // Attach reasoning index if available
+ let mut doc = doc;
+ if let Some(ref reasoning_index) = ctx.reasoning_index {
+ doc.reasoning_index = Some(reasoning_index.clone());
+ }
+
workspace.add(&doc).await?;
info!("Saved document {} to workspace", ctx.doc_id);
diff --git a/rust/src/index/stages/reasoning.rs b/rust/src/index/stages/reasoning.rs
new file mode 100644
index 00000000..804dcb19
--- /dev/null
+++ b/rust/src/index/stages/reasoning.rs
@@ -0,0 +1,345 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Reasoning Index Stage - Build pre-computed reasoning index.
+//!
+//! This stage runs after EnrichStage (which generates descriptions and
+//! calculates metadata) and before OptimizeStage. It builds a
+//! [`ReasoningIndex`] from the document tree's TOC, summaries, and keywords.
+
+use std::time::Instant;
+use tracing::info;
+
+use crate::document::{
+ NodeId, ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary,
+ SummaryShortcut, TopicEntry,
+};
+use crate::error::Result;
+use crate::retrieval::search::extract_keywords;
+
+use super::async_trait;
+use super::{IndexStage, StageResult};
+use crate::index::pipeline::IndexContext;
+
+/// Reasoning Index Stage - builds a pre-computed reasoning index from the document tree.
+///
+/// This stage creates a [`ReasoningIndex`] containing:
+/// - Topic-to-path mappings from titles and summaries
+/// - Summary shortcuts for high-frequency "overview" queries
+/// - Section map for fast ToC lookup
+pub struct ReasoningIndexStage {
+ config: ReasoningIndexConfig,
+}
+
+impl ReasoningIndexStage {
+ /// Create a new reasoning index stage with default config.
+ pub fn new() -> Self {
+ Self {
+ config: ReasoningIndexConfig::default(),
+ }
+ }
+
+ /// Create with custom config.
+ pub fn with_config(config: ReasoningIndexConfig) -> Self {
+ Self { config }
+ }
+
+ /// Extract keywords from a text, filtering by minimum length.
+ fn extract_node_keywords(text: &str, min_length: usize) -> Vec<String> {
+ extract_keywords(text)
+ .into_iter()
+ .filter(|k: &String| k.len() >= min_length)
+ .collect()
+ }
+
+ /// Build the topic-to-path mapping by extracting keywords from all nodes.
+ fn build_topic_paths(
+ tree: &crate::document::DocumentTree,
+ config: &ReasoningIndexConfig,
+ ) -> (std::collections::HashMap<String, Vec<TopicEntry>>, usize) {
+ let mut keyword_nodes: std::collections::HashMap<String, Vec<(NodeId, f32, usize)>> =
+ std::collections::HashMap::new();
+
+ // Walk all nodes and extract keywords from title + summary
+ for node_id in tree.traverse() {
+ if let Some(node) = tree.get(node_id) {
+ let title_keywords = Self::extract_node_keywords(&node.title, config.min_keyword_length);
+ let summary_keywords = Self::extract_node_keywords(&node.summary, config.min_keyword_length);
+ let content_keywords = if node.summary.is_empty() {
+ // Fallback: extract from content if no summary
+ let content_sample: String = node.content.chars().take(500).collect();
+ Self::extract_node_keywords(&content_sample, config.min_keyword_length)
+ } else {
+ Vec::new()
+ };
+
+ // Title keywords get higher weight (2.0), summary (1.5), content (1.0)
+ for kw in &title_keywords {
+ keyword_nodes
+ .entry(kw.clone())
+ .or_default()
+ .push((node_id, 2.0, node.depth));
+ }
+ for kw in &summary_keywords {
+ keyword_nodes
+ .entry(kw.clone())
+ .or_default()
+ .push((node_id, 1.5, node.depth));
+ }
+ for kw in &content_keywords {
+ keyword_nodes
+ .entry(kw.clone())
+ .or_default()
+ .push((node_id, 1.0, node.depth));
+ }
+ }
+ }
+
+ // Sort by keyword frequency (most common first) and trim to max_keyword_entries
+ let mut sorted_keywords: Vec<_> = keyword_nodes.into_iter().collect();
+ sorted_keywords.sort_by(|a, b| b.1.len().cmp(&a.1.len()));
+ sorted_keywords.truncate(config.max_keyword_entries);
+
+ let keyword_count = sorted_keywords.len();
+
+ // Build topic_paths: merge duplicate (keyword, node) pairs
+ let mut topic_paths: std::collections::HashMap<String, Vec<TopicEntry>> =
+ std::collections::HashMap::new();
+
+ for (keyword, entries) in sorted_keywords {
+ // Merge duplicate node entries by summing weights
+ let mut merged: std::collections::HashMap<NodeId, (f32, usize)> =
+ std::collections::HashMap::new();
+ for (node_id, weight, depth) in entries {
+ let entry = merged.entry(node_id).or_insert((0.0, depth));
+ entry.0 += weight;
+ }
+
+ // Normalize weights to 0.0-1.0 range
+ let max_weight = merged.values().map(|(w, _)| *w).fold(0.0_f32, f32::max);
+ let scale = if max_weight > 0.0 { 1.0 / max_weight } else { 1.0 };
+
+ let mut topic_entries: Vec<TopicEntry> = merged
+ .into_iter()
+ .map(|(node_id, (weight, depth))| TopicEntry {
+ node_id,
+ weight: weight * scale,
+ depth,
+ })
+ .collect();
+
+ topic_entries.sort_by(|a, b| {
+ b.weight
+ .partial_cmp(&a.weight)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+ topic_entries.truncate(config.max_topic_entries);
+
+ topic_paths.insert(keyword, topic_entries);
+ }
+
+ (topic_paths, keyword_count)
+ }
+
+ /// Build section map from depth-1 nodes.
+ fn build_section_map(tree: &crate::document::DocumentTree) -> std::collections::HashMap<String, NodeId> {
+ let mut section_map = std::collections::HashMap::new();
+ let root = tree.root();
+ for child_id in tree.children(root) {
+ if let Some(node) = tree.get(child_id) {
+ section_map.insert(node.title.to_lowercase(), child_id);
+ // Also index by structure index (e.g. "1", "2", "3")
+ if !node.structure.is_empty() {
+ section_map.insert(node.structure.clone(), child_id);
+ }
+ }
+ }
+ section_map
+ }
+
+ /// Build summary shortcut from root and depth-1 nodes.
+ fn build_summary_shortcut(
+ tree: &crate::document::DocumentTree,
+ ) -> Option<SummaryShortcut> {
+ let root = tree.root();
+ let root_node = tree.get(root)?;
+
+ // Collect document summary from root
+ let document_summary = if !root_node.summary.is_empty() {
+ root_node.summary.clone()
+ } else {
+ // Fallback: concatenate depth-1 summaries
+ let mut parts = Vec::new();
+ for child_id in tree.children(root) {
+ if let Some(child) = tree.get(child_id) {
+ if !child.summary.is_empty() {
+ parts.push(format!("{}: {}", child.title, child.summary));
+ }
+ }
+ }
+ parts.join("\n")
+ };
+
+ // Collect section summaries
+ let mut section_summaries = Vec::new();
+ for child_id in tree.children(root) {
+ if let Some(child) = tree.get(child_id) {
+ section_summaries.push(SectionSummary {
+ node_id: child_id,
+ title: child.title.clone(),
+ summary: child.summary.clone(),
+ depth: child.depth,
+ });
+ }
+ }
+
+ Some(SummaryShortcut {
+ root_node: root,
+ section_summaries,
+ document_summary,
+ })
+ }
+}
+
+impl Default for ReasoningIndexStage {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+#[async_trait]
+impl IndexStage for ReasoningIndexStage {
+ fn name(&self) -> &'static str {
+ "reasoning_index"
+ }
+
+ fn depends_on(&self) -> Vec<&'static str> {
+ vec!["enrich"]
+ }
+
+ fn is_optional(&self) -> bool {
+ true
+ }
+
+ async fn execute(&mut self, ctx: &mut IndexContext) -> Result<StageResult> {
+ let start = Instant::now();
+
+ // Check if enabled via pipeline options
+ if !ctx.options.reasoning_index.enabled {
+ info!("Reasoning index stage disabled, skipping");
+ return Ok(StageResult::success("reasoning_index"));
+ }
+
+ // Pipeline options take precedence over the stage's own default config
+ let config = &ctx.options.reasoning_index;
+
+ let tree = match ctx.tree.as_ref() {
+ Some(t) => t,
+ None => {
+ return Ok(StageResult::failure(
+ "reasoning_index",
+ "Tree not built",
+ ));
+ }
+ };
+
+ info!("Building reasoning index...");
+
+ // 1. Build topic-to-path mapping
+ let (topic_paths, keyword_count) = Self::build_topic_paths(tree, config);
+ let topic_count: usize = topic_paths.values().map(|v| v.len()).sum();
+ info!(
+ "Built topic paths: {} keywords, {} topic entries",
+ keyword_count, topic_count
+ );
+
+ // 2. Build section map
+ let section_map = Self::build_section_map(tree);
+ info!("Built section map with {} entries", section_map.len());
+
+ // 3. Build summary shortcut
+ let summary_shortcut = if config.build_summary_shortcut {
+ let shortcut = Self::build_summary_shortcut(tree);
+ if shortcut.is_some() {
+ info!("Built summary shortcut");
+ }
+ shortcut
+ } else {
+ None
+ };
+
+ // 4. Assemble the reasoning index
+ let mut builder = ReasoningIndexBuilder::new();
+ for (keyword, entries) in topic_paths {
+ for entry in entries {
+ builder.add_topic_entry(&keyword, entry);
+ }
+ }
+ for (title, node_id) in section_map {
+ builder.add_section(&title, node_id);
+ }
+ if let Some(shortcut) = summary_shortcut {
+ builder = builder.summary_shortcut(shortcut);
+ }
+ builder.sort_and_trim(config.max_topic_entries);
+
+ let reasoning_index = builder.build();
+
+ let duration = start.elapsed().as_millis() as u64;
+ ctx.metrics
+ .record_reasoning_index(duration, topic_count, keyword_count);
+
+ info!(
+ "Reasoning index built in {}ms ({} keywords, {} topic entries, {} sections)",
+ duration,
+ keyword_count,
+ topic_count,
+ reasoning_index.section_count(),
+ );
+
+ ctx.reasoning_index = Some(reasoning_index);
+
+ let mut stage_result = StageResult::success("reasoning_index");
+ stage_result.duration_ms = duration;
+ stage_result.metadata.insert(
+ "keywords_indexed".to_string(),
+ serde_json::json!(keyword_count),
+ );
+ stage_result.metadata.insert(
+ "topics_indexed".to_string(),
+ serde_json::json!(topic_count),
+ );
+
+ Ok(stage_result)
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_extract_node_keywords() {
+ let keywords = ReasoningIndexStage::extract_node_keywords("Introduction to Machine Learning", 2);
+ assert!(keywords.contains(&"introduction".to_string()));
+ assert!(keywords.contains(&"machine".to_string()));
+ assert!(keywords.contains(&"learning".to_string()));
+ }
+
+ #[test]
+ fn test_extract_node_keywords_min_length() {
+ let keywords = ReasoningIndexStage::extract_node_keywords("A B CD", 2);
+ assert!(!keywords.contains(&"a".to_string()));
+ assert!(!keywords.contains(&"b".to_string()));
+ assert!(keywords.contains(&"cd".to_string()));
+ }
+
+ #[test]
+ fn test_stage_config_default() {
+ let stage = ReasoningIndexStage::new();
+ assert!(stage.config.enabled);
+ assert_eq!(stage.name(), "reasoning_index");
+ assert!(stage.is_optional());
+ assert_eq!(stage.depends_on(), vec!["enrich"]);
+ }
+}
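The merge-and-normalize step in `build_topic_paths` (sum duplicate weights per node, then scale so the strongest node gets 1.0) can be sketched standalone. This is a simplified illustration of the same arithmetic, not crate code: plain `u32` IDs stand in for `NodeId`, and the depth bookkeeping is dropped.

```rust
use std::collections::HashMap;

// Sketch of build_topic_paths' merge step: duplicate (node, weight) pairs for
// one keyword are summed, then scaled so the best node has weight 1.0, and the
// result is sorted by descending weight.
fn merge_and_normalize(entries: &[(u32, f32)]) -> Vec<(u32, f32)> {
    let mut merged: HashMap<u32, f32> = HashMap::new();
    for &(id, w) in entries {
        *merged.entry(id).or_insert(0.0) += w;
    }
    let max = merged.values().copied().fold(0.0_f32, f32::max);
    let scale = if max > 0.0 { 1.0 / max } else { 1.0 };
    let mut out: Vec<(u32, f32)> = merged
        .into_iter()
        .map(|(id, w)| (id, w * scale))
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    out
}
```

For example, a node hit by both a title keyword (2.0) and a content keyword (1.0) merges to 3.0, then normalizes to 1.0, while a summary-only node at 1.5 ends up at 0.5.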
diff --git a/rust/src/lib.rs b/rust/src/lib.rs
index 642afb03..d2ea3eac 100644
--- a/rust/src/lib.rs
+++ b/rust/src/lib.rs
@@ -252,9 +252,9 @@ pub use index::{
// Retrieval
pub use retrieval::{
ContextBuilder, NavigationDecision, NavigationStep, PipelineRetriever, PruningStrategy,
- QueryComplexity, RetrievalContext, RetrievalResult, RetrieveOptions, RetrieveResponse,
- Retriever, RetrieverError, RetrieverResult, SearchPath, StrategyPreference, SufficiencyLevel,
- TokenEstimation, format_for_llm, format_for_llm_async, format_tree_for_llm,
+ QueryComplexity, RetrievalContext, RetrievalResult, RetrieveEvent, RetrieveOptions,
+ RetrieveResponse, Retriever, RetrieverError, RetrieverResult, SearchPath, StrategyPreference,
+ SufficiencyLevel, TokenEstimation, format_for_llm, format_for_llm_async, format_tree_for_llm,
format_tree_for_llm_async,
};
diff --git a/rust/src/retrieval/cache/hot_tracker.rs b/rust/src/retrieval/cache/hot_tracker.rs
new file mode 100644
index 00000000..bad19bdd
--- /dev/null
+++ b/rust/src/retrieval/cache/hot_tracker.rs
@@ -0,0 +1,191 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Hot node tracker for recording retrieval frequency.
+//!
+//! Thread-safe tracker that records which nodes are frequently retrieved.
+//! Nodes that exceed a configured hit-count threshold are marked as "hot",
+//! which can boost their scores in future retrieval operations.
+
+use std::collections::HashMap;
+use std::sync::RwLock;
+
+use crate::document::NodeId;
+use crate::document::HotNodeEntry;
+
+/// Thread-safe tracker for hot (frequently retrieved) nodes.
+pub struct HotNodeTracker {
+ inner: RwLock<HotNodeTrackerInner>,
+ hot_threshold: u32,
+}
+
+struct HotNodeTrackerInner {
+ hits: HashMap<NodeId, u32>,
+ scores: HashMap<NodeId, f32>,
+}
+
+impl HotNodeTracker {
+ /// Create a new tracker with the given hot threshold.
+ pub fn new(hot_threshold: u32) -> Self {
+ Self {
+ inner: RwLock::new(HotNodeTrackerInner {
+ hits: HashMap::new(),
+ scores: HashMap::new(),
+ }),
+ hot_threshold,
+ }
+ }
+
+ /// Record that a node was retrieved with a given score.
+ pub fn record_hit(&self, node_id: NodeId, score: f32) {
+ if let Ok(mut inner) = self.inner.write() {
+ let hits = *inner.hits.entry(node_id).or_insert(0) + 1;
+ inner.hits.insert(node_id, hits);
+
+ // Update running average score
+ let prev_avg = *inner.scores.entry(node_id).or_insert(0.0);
+ let new_avg = prev_avg + (score - prev_avg) / hits as f32;
+ inner.scores.insert(node_id, new_avg);
+ }
+ }
+
+ /// Record multiple hits at once.
+ pub fn record_hits(&self, hits: &[(NodeId, f32)]) {
+ for &(node_id, score) in hits {
+ self.record_hit(node_id, score);
+ }
+ }
+
+ /// Check if a node is considered "hot".
+ pub fn is_hot(&self, node_id: NodeId) -> bool {
+ self.inner
+ .read()
+ .map(|inner| {
+ inner.hits.get(&node_id).copied().unwrap_or(0) >= self.hot_threshold
+ })
+ .unwrap_or(false)
+ }
+
+ /// Get the hit count for a node.
+ pub fn hit_count(&self, node_id: NodeId) -> u32 {
+ self.inner
+ .read()
+ .map(|inner| inner.hits.get(&node_id).copied().unwrap_or(0))
+ .unwrap_or(0)
+ }
+
+ /// Get all hot nodes with their stats.
+ pub fn hot_nodes(&self) -> Vec<(NodeId, u32, f32)> {
+ self.inner
+ .read()
+ .map(|inner| {
+ inner
+ .hits
+ .iter()
+ .filter(|(_, count)| **count >= self.hot_threshold)
+ .map(|(node_id, count)| {
+ (
+ *node_id,
+ *count,
+ inner.scores.get(node_id).copied().unwrap_or(0.0),
+ )
+ })
+ .collect()
+ })
+ .unwrap_or_default()
+ }
+
+ /// Export hot node data into HotNodeEntry map for persistence.
+ pub fn export(&self) -> HashMap<NodeId, HotNodeEntry> {
+ self.inner
+ .read()
+ .map(|inner| {
+ inner
+ .hits
+ .iter()
+ .map(|(node_id, hit_count)| {
+ let avg_score = inner.scores.get(node_id).copied().unwrap_or(0.0);
+ let is_hot = *hit_count >= self.hot_threshold;
+ (
+ *node_id,
+ HotNodeEntry {
+ hit_count: *hit_count,
+ avg_score,
+ is_hot,
+ },
+ )
+ })
+ .collect()
+ })
+ .unwrap_or_default()
+ }
+
+ /// Get the hot threshold.
+ pub fn hot_threshold(&self) -> u32 {
+ self.hot_threshold
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ fn make_node_ids() -> (NodeId, NodeId, NodeId) {
+ let mut tree = crate::document::DocumentTree::new("Root", "content");
+ let a = tree.add_child(tree.root(), "A", "a");
+ let b = tree.add_child(tree.root(), "B", "b");
+ let c = tree.add_child(tree.root(), "C", "c");
+ (a, b, c)
+ }
+
+ #[test]
+ fn test_hot_tracker_basic() {
+ let tracker = HotNodeTracker::new(3);
+
+ let (node, _, _) = make_node_ids();
+ tracker.record_hit(node, 0.8);
+ tracker.record_hit(node, 0.9);
+ assert!(!tracker.is_hot(node));
+ assert_eq!(tracker.hit_count(node), 2);
+
+ tracker.record_hit(node, 0.7);
+ assert!(tracker.is_hot(node));
+ assert_eq!(tracker.hit_count(node), 3);
+ }
+
+ #[test]
+ fn test_hot_tracker_export() {
+ let tracker = HotNodeTracker::new(2);
+
+ let (node_a, node_b, _) = make_node_ids();
+
+ tracker.record_hit(node_a, 0.8);
+ tracker.record_hit(node_a, 0.9);
+ tracker.record_hit(node_b, 0.5);
+
+ let exported = tracker.export();
+ assert!(exported[&node_a].is_hot);
+ assert!(!exported[&node_b].is_hot);
+ }
+
+ #[test]
+ fn test_hot_tracker_multiple_hits() {
+ let tracker = HotNodeTracker::new(1);
+
+ let (node_a, node_b, node_c) = make_node_ids();
+
+ let hits = vec![
+ (node_a, 0.9),
+ (node_b, 0.8),
+ (node_c, 0.7),
+ ];
+ tracker.record_hits(&hits);
+
+ assert!(tracker.is_hot(node_a));
+ assert!(tracker.is_hot(node_b));
+ assert!(tracker.is_hot(node_c));
+
+ let hot = tracker.hot_nodes();
+ assert_eq!(hot.len(), 3);
+ }
+}
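The score update in `record_hit` uses the standard incremental-mean formula `avg += (x - avg) / n`, which keeps a running average without storing per-node score histories. A self-contained sketch of the same arithmetic over a slice:

```rust
// Incremental (running) mean: the same update record_hit applies per node,
// shown over a slice of samples instead of a HashMap entry.
fn running_avg(samples: &[f32]) -> f32 {
    let mut avg = 0.0_f32;
    for (i, &s) in samples.iter().enumerate() {
        // After the i-th sample, avg equals the mean of samples[0..=i].
        avg += (s - avg) / (i as f32 + 1.0);
    }
    avg
}
```

With scores 0.8, 0.9, 0.7 this converges to 0.8, matching the plain mean while using O(1) state per node.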
diff --git a/rust/src/retrieval/cache/mod.rs b/rust/src/retrieval/cache/mod.rs
index 59c6c2cf..34202fd8 100644
--- a/rust/src/retrieval/cache/mod.rs
+++ b/rust/src/retrieval/cache/mod.rs
@@ -3,8 +3,19 @@
//! Caching for retrieval operations.
//!
-//! Caches search paths and node scores for repeated queries.
+//! Three-tier reasoning cache:
+//! - **L1**: Exact query match — instant cache hit for repeated queries
+//! - **L2**: Path pattern cache — reuse navigation decisions across queries
+//! - **L3**: Strategy score cache — share keyword/BM25 scores across queries
+//!
+//! Legacy `PathCache` remains for backward compatibility.
+mod hot_tracker;
mod path_cache;
+mod reasoning_cache;
+pub use hot_tracker::HotNodeTracker;
pub use path_cache::PathCache;
+pub use reasoning_cache::{
+ CachedCandidate, ReasoningCache, ReasoningCacheConfig, ReasoningCacheStats,
+};
diff --git a/rust/src/retrieval/cache/reasoning_cache.rs b/rust/src/retrieval/cache/reasoning_cache.rs
new file mode 100644
index 00000000..6dc87f87
--- /dev/null
+++ b/rust/src/retrieval/cache/reasoning_cache.rs
@@ -0,0 +1,490 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Tiered reasoning cache for the retrieval pipeline.
+//!
+//! Provides three levels of caching to avoid redundant computation:
+//!
+//! - **L1 (Exact)**: Cache full retrieval results keyed by exact query fingerprint.
+//! Identical queries return instantly.
+//!
+//! - **L2 (Path Pattern)**: Cache navigation decisions for tree paths. If a previous
+//! query navigated through Section 3.2, a new query about the same section can
+//! reuse those path cues even when the full query differs.
+//!
+//! - **L3 (Strategy Score)**: Cache node scores from keyword/BM25 strategies.
+//! Node scores are independent of the query, so they can be shared across
+//! different queries on the same document.
+
+use std::collections::HashMap;
+use std::sync::RwLock;
+use std::time::Instant;
+
+use crate::document::NodeId;
+use crate::retrieval::pipeline::CandidateNode;
+use crate::utils::fingerprint::Fingerprint;
+
+/// A tiered reasoning cache for the retrieval pipeline.
+///
+/// Thread-safe via `RwLock`. Each tier has independent size limits
+/// and TTL-based expiration.
+pub struct ReasoningCache {
+ /// L1: Exact query → cached candidate list.
+ l1: RwLock<L1Store>,
+ /// L2: Node path pattern → cached navigation cue score.
+ l2: RwLock<L2Store>,
+ /// L3: Node content fingerprint → cached strategy score.
+ l3: RwLock<L3Store>,
+ /// Configuration.
+ config: ReasoningCacheConfig,
+}
+
+/// Configuration for the reasoning cache.
+#[derive(Debug, Clone)]
+pub struct ReasoningCacheConfig {
+ /// Maximum L1 entries (exact query results).
+ pub l1_max: usize,
+ /// Maximum L2 entries (path patterns).
+ pub l2_max: usize,
+ /// Maximum L3 entries (strategy scores).
+ pub l3_max: usize,
+}
+
+impl Default for ReasoningCacheConfig {
+ fn default() -> Self {
+ Self {
+ l1_max: 200,
+ l2_max: 1000,
+ l3_max: 5000,
+ }
+ }
+}
+
+// ---- L1: Exact Query Cache ----
+
+#[derive(Debug, Clone)]
+struct L1Entry {
+ /// Fingerprint of the workspace + document set used for this query.
+ scope_fp: Fingerprint,
+ /// Cached candidate nodes (pre-sorted by score).
+ candidates: Vec,
+ /// Strategy used.
+ strategy: String,
+ /// When cached.
+ created_at: Instant,
+}
+
+/// A cached candidate from a previous retrieval.
+#[derive(Debug, Clone)]
+pub struct CachedCandidate {
+ /// Node ID.
+ pub node_id: NodeId,
+ /// Relevance score.
+ pub score: f32,
+ /// Depth in tree.
+ pub depth: usize,
+}
+
+struct L1Store {
+ entries: HashMap<Fingerprint, L1Entry>,
+ order: Vec<Fingerprint>, // Insertion order, used to evict the oldest entry
+}
+
+// ---- L2: Path Pattern Cache ----
+
+#[derive(Debug, Clone)]
+struct L2Entry {
+ /// Score for this navigation cue.
+ confidence: f32,
+ /// How many times this path was relevant.
+ hit_count: usize,
+ created_at: Instant,
+}
+
+struct L2Store {
+ entries: HashMap<String, L2Entry>, // Key: "doc_fp:node_path"
+ order: Vec<String>,
+}
+
+// ---- L3: Strategy Score Cache ----
+
+#[derive(Debug, Clone)]
+struct L3Entry {
+ /// BM25/Keyword score.
+ score: f32,
+ /// Which strategy produced this score.
+ strategy: String,
+ created_at: Instant,
+}
+
+struct L3Store {
+ entries: HashMap<Fingerprint, L3Entry>, // Key: node content fingerprint
+ order: Vec<Fingerprint>,
+}
+
+// ---- Public API ----
+
+impl ReasoningCache {
+ /// Create a new reasoning cache with default configuration.
+ pub fn new() -> Self {
+ Self::with_config(ReasoningCacheConfig::default())
+ }
+
+ /// Create with custom configuration.
+ pub fn with_config(config: ReasoningCacheConfig) -> Self {
+ Self {
+ l1: RwLock::new(L1Store {
+ entries: HashMap::new(),
+ order: Vec::new(),
+ }),
+ l2: RwLock::new(L2Store {
+ entries: HashMap::new(),
+ order: Vec::new(),
+ }),
+ l3: RwLock::new(L3Store {
+ entries: HashMap::new(),
+ order: Vec::new(),
+ }),
+ config,
+ }
+ }
+
+ // ============ L1: Exact Query ============
+
+ /// Look up an exact query result.
+ ///
+ /// Returns cached candidates if the same query was executed before
+ /// on the same document scope.
+ pub fn l1_get(
+ &self,
+ query: &str,
+ scope_fp: &Fingerprint,
+ ) -> Option<Vec<CachedCandidate>> {
+ let query_fp = Fingerprint::from_str(query);
+ let l1 = self.l1.read().ok()?;
+ let entry = l1.entries.get(&query_fp)?;
+ // Scope must match (same document set)
+ if &entry.scope_fp != scope_fp {
+ return None;
+ }
+ Some(entry.candidates.clone())
+ }
+
+ /// Store an L1 result.
+ pub fn l1_store(
+ &self,
+ query: &str,
+ scope_fp: Fingerprint,
+ candidates: Vec,
+ strategy: String,
+ ) {
+ let query_fp = Fingerprint::from_str(query);
+ if let Ok(mut l1) = self.l1.write() {
+ if l1.entries.len() >= self.config.l1_max {
+ Self::evict_lru_fingerprint(&mut l1);
+ }
+ l1.entries.insert(
+ query_fp,
+ L1Entry {
+ scope_fp,
+ candidates,
+ strategy,
+ created_at: Instant::now(),
+ },
+ );
+ l1.order.push(query_fp);
+ }
+ }
+
+ // ============ L2: Path Pattern ============
+
+ /// Look up a cached navigation confidence for a document + node path.
+ ///
+ /// If a previous query successfully navigated through this path,
+ /// return the confidence score.
+ pub fn l2_get(&self, doc_key: &str, node_path: &str) -> Option<f32> {
+ let key = format!("{}:{}", doc_key, node_path);
+ let l2 = self.l2.read().ok()?;
+ let entry = l2.entries.get(&key)?;
+ Some(entry.confidence)
+ }
+
+ /// Record a successful navigation through a path.
+ ///
+ /// Call this after retrieval confirms a path was relevant.
+ pub fn l2_record(&self, doc_key: &str, node_path: &str, confidence: f32) {
+ let key = format!("{}:{}", doc_key, node_path);
+ if let Ok(mut l2) = self.l2.write() {
+ if let Some(entry) = l2.entries.get_mut(&key) {
+ // Update running average
+ entry.hit_count += 1;
+ entry.confidence =
+ entry.confidence + (confidence - entry.confidence) / entry.hit_count as f32;
+ } else {
+ if l2.entries.len() >= self.config.l2_max {
+ Self::evict_lru_string(&mut l2);
+ }
+ l2.entries.insert(
+ key.clone(),
+ L2Entry {
+ confidence,
+ hit_count: 1,
+ created_at: Instant::now(),
+ },
+ );
+ l2.order.push(key);
+ }
+ }
+ }
+
+ /// Get top-N path hints for a document, sorted by confidence.
+ ///
+ /// Useful for bootstrapping new queries on a known document.
+ pub fn l2_top_paths(&self, doc_key: &str, n: usize) -> Vec<(String, f32)> {
+ let prefix = format!("{}:", doc_key);
+ let l2 = match self.l2.read() {
+ Ok(guard) => guard,
+ Err(_) => return Vec::new(),
+ };
+
+ let mut paths: Vec<(String, f32)> = l2
+ .entries
+ .iter()
+ .filter(|(k, _)| k.starts_with(&prefix))
+ .map(|(k, v)| (k[prefix.len()..].to_string(), v.confidence))
+ .collect();
+ paths.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
+ paths.truncate(n);
+ paths
+ }
+
+ // ============ L3: Strategy Score ============
+
+ /// Look up a cached strategy score for a node.
+ ///
+ /// Node scores from keyword/BM25 are content-dependent but
+ /// query-independent, so they can be shared across queries.
+ pub fn l3_get(&self, node_content_fp: &Fingerprint) -> Option<(f32, String)> {
+ let l3 = self.l3.read().ok()?;
+ let entry = l3.entries.get(node_content_fp)?;
+ Some((entry.score, entry.strategy.clone()))
+ }
+
+ /// Store a strategy score for a node.
+ pub fn l3_store(
+ &self,
+ node_content_fp: Fingerprint,
+ score: f32,
+ strategy: String,
+ ) {
+ if let Ok(mut l3) = self.l3.write() {
+ if l3.entries.len() >= self.config.l3_max {
+ Self::evict_lru_fingerprint_l3(&mut l3);
+ }
+ l3.entries.insert(
+ node_content_fp,
+ L3Entry {
+ score,
+ strategy,
+ created_at: Instant::now(),
+ },
+ );
+ l3.order.push(node_content_fp);
+ }
+ }
+
+ // ============ Stats ============
+
+ /// Get cache statistics.
+ pub fn stats(&self) -> ReasoningCacheStats {
+ let (l1_count, l2_count, l3_count) = (
+ self.l1.read().map(|g| g.entries.len()).unwrap_or(0),
+ self.l2.read().map(|g| g.entries.len()).unwrap_or(0),
+ self.l3.read().map(|g| g.entries.len()).unwrap_or(0),
+ );
+ ReasoningCacheStats {
+ l1_entries: l1_count,
+ l2_entries: l2_count,
+ l3_entries: l3_count,
+ }
+ }
+
+ /// Clear all cache tiers.
+ pub fn clear(&self) {
+ if let Ok(mut l1) = self.l1.write() {
+ l1.entries.clear();
+ l1.order.clear();
+ }
+ if let Ok(mut l2) = self.l2.write() {
+ l2.entries.clear();
+ l2.order.clear();
+ }
+ if let Ok(mut l3) = self.l3.write() {
+ l3.entries.clear();
+ l3.order.clear();
+ }
+ }
+
+ // ============ Eviction helpers ============
+
+ fn evict_lru_fingerprint(l1: &mut L1Store) {
+ if let Some(old) = l1.order.first().copied() {
+ l1.entries.remove(&old);
+ l1.order.remove(0);
+ }
+ }
+
+ fn evict_lru_string(l2: &mut L2Store) {
+ if let Some(old) = l2.order.first().cloned() {
+ l2.entries.remove(&old);
+ l2.order.remove(0);
+ }
+ }
+
+ fn evict_lru_fingerprint_l3(l3: &mut L3Store) {
+ if let Some(old) = l3.order.first().copied() {
+ l3.entries.remove(&old);
+ l3.order.remove(0);
+ }
+ }
+}
+
+impl Default for ReasoningCache {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+/// Cache statistics.
+#[derive(Debug, Clone)]
+pub struct ReasoningCacheStats {
+ /// L1 entries (exact query results).
+ pub l1_entries: usize,
+ /// L2 entries (path patterns).
+ pub l2_entries: usize,
+ /// L3 entries (strategy scores).
+ pub l3_entries: usize,
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ fn make_node_id(n: usize) -> NodeId {
+ let mut arena = indextree::Arena::new();
+ NodeId(arena.new_node(n))
+ }
+
+ #[test]
+ fn test_l1_store_and_retrieve() {
+ let cache = ReasoningCache::new();
+ let scope = Fingerprint::from_str("doc1");
+
+ let candidates = vec![CachedCandidate {
+ node_id: make_node_id(1),
+ score: 0.9,
+ depth: 2,
+ }];
+
+ cache.l1_store("what is rust?", scope, candidates.clone(), "keyword".into());
+ let result = cache.l1_get("what is rust?", &scope);
+ assert!(result.is_some());
+ assert_eq!(result.unwrap().len(), 1);
+ }
+
+ #[test]
+ fn test_l1_miss_different_scope() {
+ let cache = ReasoningCache::new();
+ let scope1 = Fingerprint::from_str("doc1");
+ let scope2 = Fingerprint::from_str("doc2");
+
+ let candidates = vec![CachedCandidate {
+ node_id: make_node_id(1),
+ score: 0.9,
+ depth: 2,
+ }];
+
+ cache.l1_store("query", scope1, candidates, "keyword".into());
+ assert!(cache.l1_get("query", &scope2).is_none());
+ }
+
+ #[test]
+ fn test_l2_record_and_get() {
+ let cache = ReasoningCache::new();
+
+ cache.l2_record("doc1", "3.2", 0.8);
+ let score = cache.l2_get("doc1", "3.2");
+ assert!(score.is_some());
+ assert!((score.unwrap() - 0.8).abs() < 0.01);
+ }
+
+ #[test]
+ fn test_l2_running_average() {
+ let cache = ReasoningCache::new();
+
+ cache.l2_record("doc1", "3.2", 0.8);
+ cache.l2_record("doc1", "3.2", 0.6);
+ let score = cache.l2_get("doc1", "3.2").unwrap();
+ // Running average: 0.8 + (0.6 - 0.8) / 2 = 0.7
+ assert!((score - 0.7).abs() < 0.01);
+ }
+
+ #[test]
+ fn test_l2_top_paths() {
+ let cache = ReasoningCache::new();
+
+ cache.l2_record("doc1", "3.1", 0.5);
+ cache.l2_record("doc1", "3.2", 0.9);
+ cache.l2_record("doc1", "2.1", 0.7);
+
+ let top = cache.l2_top_paths("doc1", 2);
+ assert_eq!(top.len(), 2);
+ assert!((top[0].1 - 0.9).abs() < 0.01); // 3.2 is highest
+ }
+
+ #[test]
+ fn test_l3_store_and_retrieve() {
+ let cache = ReasoningCache::new();
+ let fp = Fingerprint::from_str("some node content");
+
+ cache.l3_store(fp, 0.85, "bm25".into());
+ let (score, strategy) = cache.l3_get(&fp).unwrap();
+ assert!((score - 0.85).abs() < 0.01);
+ assert_eq!(strategy, "bm25");
+ }
+
+ #[test]
+ fn test_clear() {
+ let cache = ReasoningCache::new();
+ let scope = Fingerprint::from_str("doc1");
+
+ cache.l1_store("q", scope, vec![], "kw".into());
+ cache.l2_record("doc1", "1", 0.5);
+ cache.l3_store(Fingerprint::from_str("c"), 0.5, "kw".into());
+
+ cache.clear();
+
+ let stats = cache.stats();
+ assert_eq!(stats.l1_entries, 0);
+ assert_eq!(stats.l2_entries, 0);
+ assert_eq!(stats.l3_entries, 0);
+ }
+
+ #[test]
+ fn test_l1_lru_eviction() {
+ let config = ReasoningCacheConfig {
+ l1_max: 2,
+ ..Default::default()
+ };
+ let cache = ReasoningCache::with_config(config);
+ let scope = Fingerprint::from_str("doc");
+
+ cache.l1_store("q1", scope, vec![], "kw".into());
+ cache.l1_store("q2", scope, vec![], "kw".into());
+ cache.l1_store("q3", scope, vec![], "kw".into()); // evicts q1
+
+ assert!(cache.l1_get("q1", &scope).is_none());
+ assert!(cache.l1_get("q2", &scope).is_some());
+ assert!(cache.l1_get("q3", &scope).is_some());
+ }
+}
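All three tiers share the same bounded-store pattern: a `HashMap` for O(1) lookup plus a `Vec` recording insertion order, with the oldest entry evicted once the tier is full. A minimal standalone sketch of that pattern (plain `String` keys stand in for `Fingerprint`; this is an illustration, not the crate's types):

```rust
use std::collections::HashMap;

// Bounded map with insertion-order eviction, mirroring the L1/L2/L3 stores.
struct BoundedStore {
    entries: HashMap<String, f32>,
    order: Vec<String>,
    max: usize,
}

impl BoundedStore {
    fn new(max: usize) -> Self {
        Self { entries: HashMap::new(), order: Vec::new(), max }
    }

    fn insert(&mut self, key: &str, value: f32) {
        if self.entries.len() >= self.max {
            // Evict the oldest inserted key, as the evict_lru_* helpers do.
            if let Some(old) = self.order.first().cloned() {
                self.entries.remove(&old);
                self.order.remove(0);
            }
        }
        self.entries.insert(key.to_string(), value);
        self.order.push(key.to_string());
    }

    fn get(&self, key: &str) -> Option<f32> {
        self.entries.get(key).copied()
    }
}
```

Note this is strictly insertion-order (FIFO) eviction: reads do not refresh an entry's position, and re-inserting an existing key pushes a duplicate onto `order` — the same simplifications the diff's stores make.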
diff --git a/rust/src/retrieval/decompose.rs b/rust/src/retrieval/decompose.rs
index 603c388d..9c547ef8 100644
--- a/rust/src/retrieval/decompose.rs
+++ b/rust/src/retrieval/decompose.rs
@@ -64,6 +64,10 @@ pub struct SubQuery {
pub depends_on: Vec<usize>,
/// Type of sub-query.
pub query_type: SubQueryType,
+ /// Optional structural path constraint extracted from the query
+ /// (e.g. "3.2", "Chapter 5"). When set, the search should start
+ /// from the corresponding tree node instead of searching broadly.
+ pub path_constraint: Option<String>,
}
/// Complexity level for a sub-query.
@@ -130,6 +134,7 @@ impl DecompositionResult {
priority: 0,
depends_on: vec![],
query_type: SubQueryType::Fact,
+ path_constraint: None,
}],
was_decomposed: false,
reason: reason.to_string(),
@@ -338,6 +343,7 @@ impl QueryDecomposer {
priority: i as u8,
depends_on: vec![],
query_type: self.detect_query_type(part),
+ path_constraint: None,
});
}
}
@@ -359,6 +365,7 @@ impl QueryDecomposer {
vec![]
},
query_type: self.detect_query_type(part),
+ path_constraint: None,
});
}
break;
@@ -666,6 +673,7 @@ mod tests {
depends_on: vec![],
query_type: SubQueryType::Fact,
complexity: SubQueryComplexity::Simple,
+ path_constraint: None,
},
SubQuery {
text: "Second".to_string(),
@@ -673,6 +681,7 @@ mod tests {
depends_on: vec![0],
query_type: SubQueryType::Fact,
complexity: SubQueryComplexity::Simple,
+ path_constraint: None,
},
];
result.was_decomposed = true;
@@ -711,6 +720,7 @@ mod tests {
depends_on: vec![],
query_type: SubQueryType::Fact,
complexity: SubQueryComplexity::Simple,
+ path_constraint: None,
},
content: "Answer 1".to_string(),
score: 0.9,
@@ -723,6 +733,7 @@ mod tests {
depends_on: vec![0],
query_type: SubQueryType::Fact,
complexity: SubQueryComplexity::Simple,
+ path_constraint: None,
},
content: "Answer 2".to_string(),
score: 0.8,
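This hunk only threads `path_constraint` through the existing constructors as `None`; how it would actually be populated is not shown. A hypothetical extractor for dotted section references, purely to illustrate the intended shape of the field (the function name and heuristic are illustrative, not part of the diff, and forms like "Chapter 5" would need separate handling):

```rust
// Hypothetical: pull a dotted section reference ("3.2", "1.4.2") out of a
// query. Naive by design: any dotted digit run qualifies, so "v1.2" matches.
fn extract_section_constraint(query: &str) -> Option<String> {
    for token in query.split_whitespace() {
        // Strip surrounding punctuation/letters, keep digits and dots.
        let t = token.trim_matches(|c: char| !c.is_ascii_digit() && c != '.');
        let looks_dotted = t.contains('.')
            && t.chars().all(|c| c.is_ascii_digit() || c == '.')
            && t.chars().any(|c| c.is_ascii_digit());
        if looks_dotted {
            return Some(t.trim_matches('.').to_string());
        }
    }
    None
}
```

A query like "summarize section 3.2 for me" would yield `Some("3.2")`, which the search stage could then resolve against the section map instead of searching broadly.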
diff --git a/rust/src/retrieval/mod.rs b/rust/src/retrieval/mod.rs
index dc1a289e..d5d65e22 100644
--- a/rust/src/retrieval/mod.rs
+++ b/rust/src/retrieval/mod.rs
@@ -52,6 +52,7 @@ mod decompose;
mod pipeline_retriever;
mod reference;
mod retriever;
+pub mod stream;
mod types;
pub mod cache;
@@ -71,14 +72,16 @@ pub use context::{
pub use pipeline_retriever::PipelineRetriever;
pub use retriever::{RetrievalContext, Retriever, RetrieverError, RetrieverResult};
pub use types::*;
+pub use types::{LlmCallSummary, ReasoningCandidate, ReasoningChain, ReasoningStep, StageName};
// Re-export StrategyPreference as Strategy for convenience
pub use types::StrategyPreference as Strategy;
// Pipeline exports
pub use pipeline::{
- CandidateNode, ExecutionGroup, FailurePolicy, PipelineContext, RetrievalMetrics,
- RetrievalOrchestrator, RetrievalStage, SearchAlgorithm, SearchConfig, StageOutcome,
+ CandidateNode, ExecutionGroup, FailurePolicy, PipelineContext, RetrievalBudgetController,
+ RetrievalMetrics, RetrievalOrchestrator, RetrievalStage, SearchAlgorithm, SearchConfig,
+ StageOutcome, BudgetStatus,
};
// Re-export PipelineContext as RetrievalContext for stages (alias for clarity)
@@ -106,6 +109,7 @@ pub use complexity::ComplexityDetector;
// Cache exports
pub use cache::PathCache;
+pub use cache::{CachedCandidate, ReasoningCache, ReasoningCacheConfig, ReasoningCacheStats};
// Content aggregation exports
pub use content::{
@@ -132,3 +136,6 @@ pub use reference::{
expand_with_references, FollowedReference, ReferenceConfig, ReferenceExpansion,
ReferenceFollower,
};
+
+// Streaming exports
+pub use stream::{RetrieveEvent, RetrieveEventReceiver, DEFAULT_STREAM_BOUND};
diff --git a/rust/src/retrieval/pipeline/budget.rs b/rust/src/retrieval/pipeline/budget.rs
new file mode 100644
index 00000000..3fe69d76
--- /dev/null
+++ b/rust/src/retrieval/pipeline/budget.rs
@@ -0,0 +1,329 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Adaptive token budget controller for the retrieval pipeline.
+//!
+//! Unlike the Pilot-level [`BudgetController`](crate::retrieval::pilot::BudgetController)
+//! which only tracks Pilot LLM calls, this controller tracks the **entire pipeline's**
+//! token consumption across all stages and provides dynamic budget allocation decisions.
+//!
+//! # Design
+//!
+//! ```text
+//! ┌──────────────────────────────────────────────────┐
+//! │ RetrievalBudgetController │
+//! │ │
+//! │ total_budget ────────────────────────┬────────── │
+//! │ consumed (from all stages) │ remaining │
+//! │ │ │
+//! │ Plan stage: initial allocation │ │
+//! │ Search stage: check before iteration │ │
+//! │ Evaluate stage: report & decide │ │
+//! │ Graceful degradation when low │ │
+//! └──────────────────────────────────────────────────┘
+//! ```
+
+use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
+use std::sync::Arc;
+
+/// Status of the budget for stage-level decision making.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum BudgetStatus {
+ /// Plenty of budget remaining, proceed normally.
+ Healthy,
+ /// Budget is getting low, consider cheaper strategies.
+ Constrained,
+ /// Budget is exhausted, stop LLM calls and return best results.
+ Exhausted,
+}
+
+impl BudgetStatus {
+ /// Whether LLM calls should still be made.
+ pub fn allow_llm(self) -> bool {
+ matches!(self, Self::Healthy | Self::Constrained)
+ }
+
+ /// Whether the pipeline should stop iterating and return current results.
+ pub fn should_stop(self) -> bool {
+ self == Self::Exhausted
+ }
+}
+
+/// Adaptive budget controller for the retrieval pipeline.
+///
+/// Tracks token consumption across all stages (Plan, Search, Evaluate)
+/// and provides budget-aware decisions for dynamic strategy adjustment.
+///
+/// # Example
+///
+/// ```rust,ignore
+/// let budget = RetrievalBudgetController::new(4000);
+///
+/// // In Search stage: check before starting an iteration
+/// if budget.status().should_stop() {
+/// return StageOutcome::complete(); // graceful degradation
+/// }
+///
+/// // After LLM call: record consumption
+/// budget.record_tokens(350);
+///
+/// // In Evaluate: decide based on remaining budget
+/// if budget.status() == BudgetStatus::Constrained {
+/// // Use cheaper sufficiency check
+/// }
+/// ```
+pub struct RetrievalBudgetController {
+ /// Total token budget for this retrieval operation.
+ total_budget: usize,
+ /// Tokens consumed so far (atomic for thread safety).
+ consumed: AtomicUsize,
+ /// Whether budget exhaustion has been signaled to the pipeline.
+ exhaustion_signaled: AtomicBool,
+ /// Threshold ratio for "constrained" status (e.g. 0.7 = warn at 70% used).
+ constrain_threshold: f32,
+}
+
+// Manual Clone because AtomicUsize/AtomicBool don't impl Clone.
+impl Clone for RetrievalBudgetController {
+ fn clone(&self) -> Self {
+ Self {
+ total_budget: self.total_budget,
+ consumed: AtomicUsize::new(self.consumed.load(Ordering::Relaxed)),
+ exhaustion_signaled: AtomicBool::new(
+ self.exhaustion_signaled.load(Ordering::Relaxed),
+ ),
+ constrain_threshold: self.constrain_threshold,
+ }
+ }
+}
+
+impl RetrievalBudgetController {
+ /// Create a new budget controller with the given total token budget.
+ pub fn new(total_budget: usize) -> Self {
+ Self {
+ total_budget,
+ consumed: AtomicUsize::new(0),
+ exhaustion_signaled: AtomicBool::new(false),
+ constrain_threshold: 0.7,
+ }
+ }
+
+ /// Create with a custom constrain threshold (0.0 - 1.0).
+ ///
+ /// When consumption exceeds `total_budget * threshold`, status becomes Constrained.
+ pub fn with_constrain_threshold(mut self, threshold: f32) -> Self {
+ self.constrain_threshold = threshold.clamp(0.0, 1.0);
+ self
+ }
+
+ /// Get the current budget status.
+ pub fn status(&self) -> BudgetStatus {
+ if self.exhaustion_signaled.load(Ordering::Relaxed) {
+ return BudgetStatus::Exhausted;
+ }
+
+ let consumed = self.consumed.load(Ordering::Relaxed);
+ if consumed >= self.total_budget {
+ self.exhaustion_signaled.store(true, Ordering::Relaxed);
+ return BudgetStatus::Exhausted;
+ }
+
+ let utilization = consumed as f32 / self.total_budget as f32;
+ if utilization >= self.constrain_threshold {
+ BudgetStatus::Constrained
+ } else {
+ BudgetStatus::Healthy
+ }
+ }
+
+ /// Record tokens consumed by any stage.
+ pub fn record_tokens(&self, tokens: usize) {
+ self.consumed.fetch_add(tokens, Ordering::Relaxed);
+ }
+
+ /// Get total tokens consumed so far.
+ pub fn consumed(&self) -> usize {
+ self.consumed.load(Ordering::Relaxed)
+ }
+
+ /// Get remaining token budget.
+ pub fn remaining(&self) -> usize {
+ self.total_budget.saturating_sub(self.consumed.load(Ordering::Relaxed))
+ }
+
+ /// Get total budget.
+ pub fn total_budget(&self) -> usize {
+ self.total_budget
+ }
+
+ /// Get utilization ratio (0.0 - 1.0).
+ pub fn utilization(&self) -> f32 {
+ if self.total_budget == 0 {
+ 0.0
+ } else {
+ (self.consumed.load(Ordering::Relaxed) as f32 / self.total_budget as f32).min(1.0)
+ }
+ }
+
+ /// Signal that budget is exhausted (e.g. external trigger).
+ pub fn signal_exhausted(&self) {
+ self.exhaustion_signaled.store(true, Ordering::Relaxed);
+ }
+
+ /// Whether budget exhaustion has been signaled.
+ pub fn is_exhausted(&self) -> bool {
+ self.exhaustion_signaled.load(Ordering::Relaxed)
+ || self.consumed.load(Ordering::Relaxed) >= self.total_budget
+ }
+
+ /// Reset for a new query.
+ pub fn reset(&self) {
+ self.consumed.store(0, Ordering::Relaxed);
+ self.exhaustion_signaled.store(false, Ordering::Relaxed);
+ }
+
+ /// Suggest a search strategy based on the current budget status.
+ ///
+ /// Returns the recommended beam width for the next search iteration.
+ pub fn suggested_beam_width(&self, current_beam: usize, iteration: usize) -> usize {
+ match self.status() {
+ BudgetStatus::Healthy => {
+ // Plenty of budget: keep the current beam width
+ current_beam
+ }
+ BudgetStatus::Constrained => {
+ // Reduce beam to save tokens
+ if iteration <= 1 { current_beam } else { (current_beam / 2).max(1) }
+ }
+ BudgetStatus::Exhausted => {
+ // No more search iterations worth doing
+ 0
+ }
+ }
+ }
+
+ /// Whether another search iteration is worthwhile given budget and confidence.
+ pub fn should_continue_search(&self, current_confidence: f32, iteration: usize) -> bool {
+ if self.is_exhausted() {
+ return false;
+ }
+ // Don't continue if confidence is already good
+ if current_confidence > 0.8 && iteration >= 1 {
+ return false;
+ }
+ // Don't continue if budget is constrained and we have some results
+ if self.status() == BudgetStatus::Constrained && current_confidence > 0.4 {
+ return false;
+ }
+ true
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_budget_healthy() {
+ let budget = RetrievalBudgetController::new(1000);
+ assert_eq!(budget.status(), BudgetStatus::Healthy);
+ assert!(!budget.is_exhausted());
+ assert_eq!(budget.remaining(), 1000);
+ }
+
+ #[test]
+ fn test_budget_constrained() {
+ let budget = RetrievalBudgetController::new(1000);
+ budget.record_tokens(750); // 75% used, above 70% threshold
+ assert_eq!(budget.status(), BudgetStatus::Constrained);
+ assert!(budget.status().allow_llm());
+ }
+
+ #[test]
+ fn test_budget_exhausted() {
+ let budget = RetrievalBudgetController::new(1000);
+ budget.record_tokens(1000);
+ assert_eq!(budget.status(), BudgetStatus::Exhausted);
+ assert!(budget.status().should_stop());
+ assert!(!budget.status().allow_llm());
+ }
+
+ #[test]
+ fn test_budget_exhausted_over() {
+ let budget = RetrievalBudgetController::new(1000);
+ budget.record_tokens(1500);
+ assert_eq!(budget.status(), BudgetStatus::Exhausted);
+ }
+
+ #[test]
+ fn test_budget_signal_exhausted() {
+ let budget = RetrievalBudgetController::new(1000);
+ budget.signal_exhausted();
+ assert_eq!(budget.status(), BudgetStatus::Exhausted);
+ assert_eq!(budget.consumed(), 0); // No tokens actually consumed
+ }
+
+ #[test]
+ fn test_budget_reset() {
+ let budget = RetrievalBudgetController::new(1000);
+ budget.record_tokens(800);
+ assert_eq!(budget.status(), BudgetStatus::Constrained);
+ budget.reset();
+ assert_eq!(budget.status(), BudgetStatus::Healthy);
+ assert_eq!(budget.consumed(), 0);
+ }
+
+ #[test]
+ fn test_suggested_beam_width() {
+ let budget = RetrievalBudgetController::new(1000);
+ // Healthy: keep current beam
+ assert_eq!(budget.suggested_beam_width(4, 0), 4);
+
+ // Constrained: first iteration keeps beam, later reduces
+ budget.record_tokens(750);
+ assert_eq!(budget.suggested_beam_width(4, 0), 4);
+ assert_eq!(budget.suggested_beam_width(4, 2), 2);
+
+ // Exhausted: zero
+ budget.record_tokens(300);
+ assert_eq!(budget.suggested_beam_width(4, 0), 0);
+ }
+
+ #[test]
+ fn test_should_continue_search() {
+ let budget = RetrievalBudgetController::new(1000);
+
+ // Fresh, low confidence: continue
+ assert!(budget.should_continue_search(0.2, 0));
+
+ // High confidence after 1 iteration: stop
+ assert!(!budget.should_continue_search(0.9, 1));
+
+ // Medium confidence, healthy budget: continue
+ assert!(budget.should_continue_search(0.5, 1));
+
+ // Constrained, decent confidence: stop
+ budget.record_tokens(750);
+ assert!(!budget.should_continue_search(0.5, 2));
+
+ // Constrained, low confidence: continue
+ assert!(budget.should_continue_search(0.2, 2));
+ }
+
+ #[test]
+ fn test_utilization() {
+ let budget = RetrievalBudgetController::new(1000);
+ assert!((budget.utilization() - 0.0).abs() < 0.01);
+
+ budget.record_tokens(500);
+ assert!((budget.utilization() - 0.5).abs() < 0.01);
+ }
+
+ #[test]
+ fn test_custom_constrain_threshold() {
+ let budget = RetrievalBudgetController::new(1000).with_constrain_threshold(0.5);
+ budget.record_tokens(500);
+ assert_eq!(budget.status(), BudgetStatus::Constrained);
+ }
+}
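Read in isolation, the heart of the controller is an atomic counter compared against two cut-offs. The sketch below reproduces that logic outside the diff so it can be eyeballed on its own; `BudgetSketch` and `Status` are stand-in names invented here, not items from the crate:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for RetrievalBudgetController, reduced to the status logic.
struct BudgetSketch {
    total: usize,
    consumed: AtomicUsize,
    threshold: f32, // constrain threshold, e.g. 0.7 = warn at 70% used
}

#[derive(Debug, PartialEq)]
enum Status {
    Healthy,
    Constrained,
    Exhausted,
}

impl BudgetSketch {
    fn new(total: usize) -> Self {
        Self { total, consumed: AtomicUsize::new(0), threshold: 0.7 }
    }

    // Record tokens consumed by any stage (shared-reference, like the diff).
    fn record(&self, tokens: usize) {
        self.consumed.fetch_add(tokens, Ordering::Relaxed);
    }

    // Exhausted at or past the budget, Constrained past the threshold.
    fn status(&self) -> Status {
        let used = self.consumed.load(Ordering::Relaxed);
        if used >= self.total {
            Status::Exhausted
        } else if used as f32 / self.total as f32 >= self.threshold {
            Status::Constrained
        } else {
            Status::Healthy
        }
    }
}
```

Relaxed ordering suffices here because the counter only feeds heuristic decisions; no other memory is synchronized through it.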
diff --git a/rust/src/retrieval/pipeline/context.rs b/rust/src/retrieval/pipeline/context.rs
index 823abdba..d5158ecb 100644
--- a/rust/src/retrieval/pipeline/context.rs
+++ b/rust/src/retrieval/pipeline/context.rs
@@ -10,11 +10,13 @@ use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;
-use crate::document::{DocumentTree, NodeId, RetrievalIndex};
+use crate::document::{DocumentGraph, DocumentTree, NodeId, ReasoningIndex, RetrievalIndex};
+use crate::retrieval::cache::{HotNodeTracker, ReasoningCache};
+use crate::retrieval::pipeline::budget::RetrievalBudgetController;
use crate::retrieval::pilot::Pilot;
use crate::retrieval::types::{
- NavigationStep, QueryComplexity, RetrieveOptions, RetrieveResponse, SearchPath,
- StrategyPreference, SufficiencyLevel,
+ NavigationDecision, QueryComplexity, ReasoningChain, ReasoningStep, RetrieveOptions,
+ RetrieveResponse, SearchPath, StageName, StrategyPreference, SufficiencyLevel,
};
/// Search algorithm type.
@@ -201,6 +203,19 @@ pub struct PipelineContext {
pub options: RetrieveOptions,
/// Optional Pilot for navigation guidance.
pub pilot: Option<Arc<dyn Pilot>>,
+ /// Adaptive token budget controller for the entire pipeline.
+ pub budget_controller: RetrievalBudgetController,
+ /// Tiered reasoning cache (L1 exact, L2 path pattern, L3 strategy score).
+ pub reasoning_cache: Arc<ReasoningCache>,
+
+ /// Pre-computed reasoning index for fast path resolution.
+ pub reasoning_index: Option<Arc<ReasoningIndex>>,
+
+ /// Hot node tracker for recording retrieval frequency (session-scoped).
+ pub hot_tracker: Option<Arc<HotNodeTracker>>,
+
+ /// Cross-document relationship graph for graph-aware retrieval.
+ pub document_graph: Option<Arc<DocumentGraph>>,
// ============ Analyze Stage Output ============
/// Detected query complexity.
@@ -209,6 +224,9 @@ pub struct PipelineContext {
pub keywords: Vec<String>,
/// Target sections from ToC matching.
pub target_sections: Vec,
+ /// Resolved structural path hints — node IDs extracted from the query
+ /// (e.g. "第3章" → NodeId of Chapter 3). Search should start from these nodes.
+ pub resolved_path_hints: Vec<(String, NodeId)>,
/// Decomposed sub-queries (if query was decomposed).
pub decomposition: Option,
@@ -225,8 +243,8 @@ pub struct PipelineContext {
pub candidates: Vec<CandidateNode>,
/// Search paths explored.
pub search_paths: Vec<SearchPath>,
- /// Navigation trace for debugging.
- pub navigation_trace: Vec<NavigationStep>,
+ /// Reasoning chain — ordered steps explaining every retrieval decision.
+ pub reasoning_chain: ReasoningChain,
/// Number of search iterations performed.
pub search_iterations: usize,
@@ -260,6 +278,7 @@ impl PipelineContext {
) -> Self {
// Build retrieval index for efficient operations
let retrieval_index = Some(tree.build_retrieval_index());
+ let budget_controller = RetrievalBudgetController::new(options.max_tokens);
Self {
query: query.into(),
@@ -267,16 +286,22 @@ impl PipelineContext {
retrieval_index,
options,
pilot: None,
+ budget_controller,
+ reasoning_cache: Arc::new(ReasoningCache::new()),
+ reasoning_index: None,
+ hot_tracker: None,
+ document_graph: None,
complexity: None,
keywords: Vec::new(),
target_sections: Vec::new(),
+ resolved_path_hints: Vec::new(),
decomposition: None,
selected_strategy: None,
selected_algorithm: None,
search_config: None,
candidates: Vec::new(),
search_paths: Vec::new(),
- navigation_trace: Vec::new(),
+ reasoning_chain: ReasoningChain::new(),
search_iterations: 0,
sufficiency: SufficiencyLevel::default(),
accumulated_content: String::new(),
@@ -305,6 +330,24 @@ impl PipelineContext {
self.pilot = pilot;
}
+ /// Set the reasoning index for this retrieval context.
+ pub fn with_reasoning_index(mut self, index: ReasoningIndex) -> Self {
+ self.reasoning_index = Some(Arc::new(index));
+ self
+ }
+
+ /// Set the hot node tracker for this retrieval context.
+ pub fn with_hot_tracker(mut self, tracker: HotNodeTracker) -> Self {
+ self.hot_tracker = Some(Arc::new(tracker));
+ self
+ }
+
+ /// Set the document graph for graph-aware retrieval.
+ pub fn with_document_graph(mut self, graph: DocumentGraph) -> Self {
+ self.document_graph = Some(Arc::new(graph));
+ self
+ }
+
/// Get the Pilot reference, if available.
pub fn pilot(&self) -> Option<&dyn Pilot> {
self.pilot.as_deref()
@@ -372,6 +415,33 @@ impl PipelineContext {
}
}
+ /// Append a reasoning step to the chain.
+ pub fn push_reasoning_step(&mut self, step: ReasoningStep) {
+ self.reasoning_chain.push(step);
+ }
+
+ /// Convenience: push a simple reasoning step with no node association.
+ pub fn record_reasoning(
+ &mut self,
+ stage: StageName,
+ reasoning: impl Into<String>,
+ decision: NavigationDecision,
+ ) {
+ self.push_reasoning_step(ReasoningStep {
+ stage,
+ node_id: None,
+ title: None,
+ score: 0.0,
+ decision,
+ depth: 0,
+ reasoning: reasoning.into(),
+ candidates: Vec::new(),
+ strategy_used: None,
+ llm_call: None,
+ references_followed: Vec::new(),
+ });
+ }
+
/// Finalize the context into a response.
pub fn finalize(self) -> RetrieveResponse {
self.result.unwrap_or_else(|| RetrieveResponse {
@@ -384,7 +454,7 @@ impl PipelineContext {
.map(|s| format!("{:?}", s))
.unwrap_or_else(|| "unknown".to_string()),
complexity: self.complexity.unwrap_or_default(),
- trace: self.navigation_trace,
+ reasoning_chain: self.reasoning_chain,
tokens_used: self.token_count,
})
}
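The context diff above replaces the flat `navigation_trace` with a `reasoning_chain`. A trimmed stand-in shows the shape of that bookkeeping; the field set here is deliberately reduced (the real `ReasoningStep` also records candidates, strategy, LLM calls, and followed references):

```rust
// Illustrative stand-ins only; not the crate's real definitions.
#[derive(Debug)]
struct ReasoningStep {
    stage: String,     // which pipeline stage made the decision
    reasoning: String, // human-readable explanation
    score: f32,
    depth: usize,
}

#[derive(Debug, Default)]
struct ReasoningChain {
    steps: Vec<ReasoningStep>,
}

impl ReasoningChain {
    // Append one step, preserving decision order.
    fn push(&mut self, step: ReasoningStep) {
        self.steps.push(step);
    }

    fn len(&self) -> usize {
        self.steps.len()
    }

    // Render the chain as a compact, ordered trace.
    fn summarize(&self) -> String {
        self.steps
            .iter()
            .map(|s| format!("[{}] {}", s.stage, s.reasoning))
            .collect::<Vec<_>>()
            .join(" -> ")
    }
}
```

The point of the change is exactly this ordering guarantee: every retrieval decision appends one step, so the final response can explain itself end to end.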
diff --git a/rust/src/retrieval/pipeline/mod.rs b/rust/src/retrieval/pipeline/mod.rs
index 5351b767..88b47de5 100644
--- a/rust/src/retrieval/pipeline/mod.rs
+++ b/rust/src/retrieval/pipeline/mod.rs
@@ -44,11 +44,13 @@
//! let response = orchestrator.execute(tree, query, options).await?;
//! ```
+mod budget;
mod context;
mod orchestrator;
mod outcome;
mod stage;
+pub use budget::{BudgetStatus, RetrievalBudgetController};
pub use context::{
CandidateNode, PipelineContext, RetrievalMetrics, SearchAlgorithm, SearchConfig, StageResult,
};
diff --git a/rust/src/retrieval/pipeline/orchestrator.rs b/rust/src/retrieval/pipeline/orchestrator.rs
index e4d5433c..6e53fbc3 100644
--- a/rust/src/retrieval/pipeline/orchestrator.rs
+++ b/rust/src/retrieval/pipeline/orchestrator.rs
@@ -16,9 +16,13 @@ use std::time::Instant;
use tracing::{debug, error, info, warn};
use crate::document::DocumentTree;
+use crate::document::ReasoningIndex;
use crate::error::Result;
use crate::retrieval::pilot::{Pilot, SearchState};
// FailurePolicy is re-exported for stages
+use crate::retrieval::stream::{
+ RetrieveEvent, RetrieveEventReceiver, RetrieveEventSender, DEFAULT_STREAM_BOUND,
+};
use crate::retrieval::types::{RetrieveOptions, RetrieveResponse};
use super::context::{CandidateNode, PipelineContext};
@@ -550,6 +554,611 @@ impl RetrievalOrchestrator {
Ok(ctx.finalize())
}
+ /// Execute the retrieval pipeline with a pre-computed reasoning index.
+ ///
+ /// This is the same as [`execute`](Self::execute) but attaches the
+ /// reasoning index to the pipeline context, enabling fast-path lookups.
+ pub async fn execute_with_reasoning_index(
+ &mut self,
+ tree: Arc,
+ query: &str,
+ options: RetrieveOptions,
+ reasoning_index: Option<ReasoningIndex>,
+ ) -> Result<RetrieveResponse> {
+ // This mirrors the main execute() loop, with one difference: the
+ // reasoning index is attached to the context immediately after the
+ // context is created. Callers that construct a PipelineContext
+ // themselves can call with_reasoning_index() directly instead of
+ // going through this convenience API.
+
+ let total_start = Instant::now();
+ info!(
+ "Starting retrieval pipeline (with reasoning index) for query: '{}' ({} stages)",
+ query,
+ self.stages.len()
+ );
+
+ let order = self.resolve_order()?;
+ let stage_names: Vec<&str> = order.iter().map(|&i| self.stages[i].stage.name()).collect();
+ info!("Execution order: {:?}", stage_names);
+
+ let groups = self.compute_execution_groups(&order);
+
+ // Create context with Pilot and reasoning index
+ let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone());
+ if let Some(ri) = reasoning_index {
+ ctx = ctx.with_reasoning_index(ri);
+ }
+
+ let mut backtrack_count = 0;
+ let mut total_iterations = 0;
+ let mut group_idx = 0;
+
+ while group_idx < groups.len() {
+ if backtrack_count >= self.max_backtracks {
+ warn!("Max backtracks reached, completing with current results");
+ break;
+ }
+
+ if total_iterations >= self.max_total_iterations {
+ warn!("Max total iterations reached, completing");
+ break;
+ }
+
+ let group = &groups[group_idx];
+
+ for &stage_idx in &group.stage_indices {
+ let entry = &self.stages[stage_idx];
+ let stage_name = entry.stage.name();
+ let policy = entry.stage.failure_policy();
+
+ ctx.start_stage();
+ info!("Executing stage: {}", stage_name);
+
+ match entry.stage.execute(&mut ctx).await {
+ Ok(outcome) => {
+ ctx.end_stage(stage_name, true, None);
+ total_iterations += 1;
+
+ match outcome {
+ StageOutcome::Continue => {}
+ StageOutcome::Complete => {
+ ctx.metrics.total_time_ms =
+ total_start.elapsed().as_millis() as u64;
+ info!("Retrieval completed by stage: {}", stage_name);
+ return Ok(ctx.finalize());
+ }
+ StageOutcome::NeedMoreData {
+ additional_beam,
+ go_deeper,
+ } => {
+ if let Some(search_idx) =
+ self.stages.iter().position(|e| e.stage.name() == "search")
+ {
+ info!(
+ "Need more data, backtracking to search (beam +{}, deeper: {})",
+ additional_beam, go_deeper
+ );
+
+ if let Some(ref pilot) = self.pilot {
+ if pilot.config().guide_at_backtrack {
+ let visited: std::collections::HashSet<_> = ctx
+ .search_paths
+ .iter()
+ .flat_map(|p| p.nodes.iter().copied())
+ .collect();
+ let candidates: Vec<_> =
+ ctx.candidates.iter().map(|c| c.node_id).collect();
+
+ let state = SearchState::new(
+ &ctx.tree,
+ &ctx.query,
+ &[],
+ &candidates,
+ &visited,
+ );
+
+ match pilot.guide_backtrack(&state).await {
+ Some(guidance) => {
+ debug!(
+ "Pilot backtrack guidance: confidence={}, candidates={}",
+ guidance.confidence,
+ guidance.ranked_candidates.len()
+ );
+ if guidance.has_candidates() {
+ ctx.candidates = guidance
+ .ranked_candidates
+ .iter()
+ .map(|rc| CandidateNode {
+ node_id: rc.node_id,
+ score: rc.score,
+ depth: 0,
+ is_leaf: false,
+ })
+ .collect();
+ }
+ }
+ None => {
+ debug!("Pilot provided no backtrack guidance");
+ }
+ }
+ }
+ }
+
+ if let Some(ref mut config) = ctx.search_config {
+ config.beam_width += additional_beam;
+ if go_deeper {
+ config.max_depth += 1;
+ }
+ }
+
+ ctx.increment_backtrack();
+ backtrack_count += 1;
+
+ if let Some(target_group) =
+ self.find_group_for_stage(&groups, search_idx)
+ {
+ group_idx = target_group;
+ continue;
+ }
+ }
+ }
+ StageOutcome::Backtrack {
+ target_stage,
+ reason,
+ } => {
+ info!("Backtracking to {}: {}", target_stage, reason);
+
+ if let Some(target_idx) = self
+ .stages
+ .iter()
+ .position(|e| e.stage.name() == target_stage)
+ {
+ if target_stage == "search" {
+ if let Some(ref pilot) = self.pilot {
+ if pilot.config().guide_at_backtrack {
+ let visited: std::collections::HashSet<_> = ctx
+ .search_paths
+ .iter()
+ .flat_map(|p| p.nodes.iter().copied())
+ .collect();
+ let candidates: Vec<_> = ctx
+ .candidates
+ .iter()
+ .map(|c| c.node_id)
+ .collect();
+
+ let state = SearchState::new(
+ &ctx.tree,
+ &ctx.query,
+ &[],
+ &candidates,
+ &visited,
+ );
+
+ if let Some(guidance) =
+ pilot.guide_backtrack(&state).await
+ {
+ debug!(
+ "Pilot backtrack guidance for explicit backtrack: confidence={}",
+ guidance.confidence
+ );
+ if guidance.has_candidates() {
+ ctx.candidates = guidance
+ .ranked_candidates
+ .iter()
+ .map(|rc| CandidateNode {
+ node_id: rc.node_id,
+ score: rc.score,
+ depth: 0,
+ is_leaf: false,
+ })
+ .collect();
+ }
+ }
+ }
+ }
+ }
+
+ ctx.increment_backtrack();
+ backtrack_count += 1;
+
+ if let Some(target_group) =
+ self.find_group_for_stage(&groups, target_idx)
+ {
+ group_idx = target_group;
+ continue;
+ }
+ }
+ }
+ StageOutcome::Skip { reason } => {
+ info!("Skipping remaining stages: {}", reason);
+ ctx.metrics.total_time_ms =
+ total_start.elapsed().as_millis() as u64;
+ return Ok(ctx.finalize());
+ }
+ }
+ }
+ Err(e) => {
+ ctx.end_stage(stage_name, false, Some(e.to_string()));
+
+ if policy.allows_continuation() {
+ warn!(
+ "Stage {} failed but policy allows continuation: {}",
+ stage_name, e
+ );
+ } else {
+ error!("Stage {} failed: {}", stage_name, e);
+ return Err(e);
+ }
+ }
+ }
+ }
+
+ group_idx += 1;
+ }
+
+ ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64;
+ info!(
+ "Retrieval completed in {}ms ({} iterations, {} backtracks)",
+ ctx.metrics.total_time_ms, total_iterations, backtrack_count
+ );
+
+ Ok(ctx.finalize())
+ }
+
+ /// Execute the retrieval pipeline with streaming events.
+ ///
+ /// Consumes the orchestrator and spawns a background task that runs the
+ /// pipeline. The caller receives a channel of [`RetrieveEvent`]s that
+ /// fire at each stage boundary. The stream always terminates with either
+ /// [`Completed`](RetrieveEvent::Completed) or
+ /// [`Error`](RetrieveEvent::Error).
+ ///
+ /// The existing [`execute()`](Self::execute) method is **not** affected.
+ ///
+ /// # Example
+ ///
+ /// ```rust,ignore
+ /// let (handle, mut rx) = orchestrator.execute_streaming(tree, query, options);
+ ///
+ /// while let Some(event) = rx.recv().await {
+ /// match event {
+ /// RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"),
+ /// RetrieveEvent::Completed { response } => break,
+ /// RetrieveEvent::Error { message } => { eprintln!("{message}"); break; }
+ /// _ => {}
+ /// }
+ /// }
+ /// let _ = handle.await;
+ /// ```
+ pub fn execute_streaming(
+ mut self,
+ tree: Arc,
+ query: &str,
+ options: RetrieveOptions,
+ ) -> (
+ tokio::task::JoinHandle<()>,
+ RetrieveEventReceiver,
+ ) {
+ let (tx, rx) = tokio::sync::mpsc::channel(DEFAULT_STREAM_BOUND);
+ let query_owned = query.to_string();
+
+ let handle = tokio::spawn(async move {
+ if let Err(e) = self.run_streaming(tree, &query_owned, options, &tx).await {
+ let _ = tx
+ .send(RetrieveEvent::Error {
+ message: e.to_string(),
+ })
+ .await;
+ }
+ });
+
+ (handle, rx)
+ }
+
+ /// Internal streaming pipeline execution.
+ async fn run_streaming(
+ &mut self,
+ tree: Arc,
+ query: &str,
+ options: RetrieveOptions,
+ tx: &RetrieveEventSender,
+ ) -> Result<()> {
+ let total_start = Instant::now();
+
+ let _ = tx
+ .send(RetrieveEvent::Started {
+ query: query.to_string(),
+ strategy: format!("{:?}", options.strategy),
+ })
+ .await;
+
+ info!(
+ "Starting streaming retrieval pipeline for query: '{}' ({} stages)",
+ query,
+ self.stages.len()
+ );
+
+ let order = self.resolve_order()?;
+ let groups = self.compute_execution_groups(&order);
+ let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone());
+
+ let mut backtrack_count = 0;
+ let mut total_iterations = 0;
+ let mut group_idx = 0;
+
+ while group_idx < groups.len() {
+ if backtrack_count >= self.max_backtracks {
+ warn!("Max backtracks reached, completing with current results");
+ break;
+ }
+ if total_iterations >= self.max_total_iterations {
+ warn!("Max total iterations reached, completing");
+ break;
+ }
+
+ let group = &groups[group_idx];
+
+ for &stage_idx in &group.stage_indices {
+ let entry = &self.stages[stage_idx];
+ let stage_name = entry.stage.name();
+ let policy = entry.stage.failure_policy();
+
+ let stage_start = Instant::now();
+ ctx.start_stage();
+ info!("Executing stage: {}", stage_name);
+
+ match entry.stage.execute(&mut ctx).await {
+ Ok(outcome) => {
+ let elapsed = stage_start.elapsed().as_millis() as u64;
+ ctx.end_stage(stage_name, true, None);
+ total_iterations += 1;
+
+ let _ = tx
+ .send(RetrieveEvent::StageCompleted {
+ stage: stage_name.to_string(),
+ elapsed_ms: elapsed,
+ })
+ .await;
+
+ match outcome {
+ StageOutcome::Continue => {}
+ StageOutcome::Complete => {
+ ctx.metrics.total_time_ms =
+ total_start.elapsed().as_millis() as u64;
+ info!("Retrieval completed by stage: {}", stage_name);
+ let response = ctx.finalize();
+ let _ = tx
+ .send(RetrieveEvent::Completed { response })
+ .await;
+ return Ok(());
+ }
+ StageOutcome::NeedMoreData {
+ additional_beam,
+ go_deeper,
+ } => {
+ if let Some(search_idx) =
+ self.stages.iter().position(|e| e.stage.name() == "search")
+ {
+ info!(
+ "Need more data, backtracking to search (beam +{}, deeper: {})",
+ additional_beam, go_deeper
+ );
+
+ let _ = tx
+ .send(RetrieveEvent::Backtracking {
+ from: stage_name.to_string(),
+ to: "search".to_string(),
+ reason: format!(
+ "NeedMoreData: beam +{}, deeper: {}",
+ additional_beam, go_deeper
+ ),
+ })
+ .await;
+
+ // Consult Pilot
+ if let Some(ref pilot) = self.pilot {
+ if pilot.config().guide_at_backtrack {
+ let visited: std::collections::HashSet<_> = ctx
+ .search_paths
+ .iter()
+ .flat_map(|p| p.nodes.iter().copied())
+ .collect();
+ let candidates: Vec<_> =
+ ctx.candidates.iter().map(|c| c.node_id).collect();
+
+ let state = SearchState::new(
+ &ctx.tree,
+ &ctx.query,
+ &[],
+ &candidates,
+ &visited,
+ );
+
+ match pilot.guide_backtrack(&state).await {
+ Some(guidance) => {
+ debug!(
+ "Pilot backtrack guidance: confidence={}, candidates={}",
+ guidance.confidence,
+ guidance.ranked_candidates.len()
+ );
+ if guidance.has_candidates() {
+ ctx.candidates = guidance
+ .ranked_candidates
+ .iter()
+ .map(|rc| CandidateNode {
+ node_id: rc.node_id,
+ score: rc.score,
+ depth: 0,
+ is_leaf: false,
+ })
+ .collect();
+ }
+ }
+ None => {
+ debug!("Pilot provided no backtrack guidance");
+ }
+ }
+ }
+ }
+
+ if let Some(ref mut config) = ctx.search_config {
+ config.beam_width += additional_beam;
+ if go_deeper {
+ config.max_depth += 1;
+ }
+ }
+
+ ctx.increment_backtrack();
+ backtrack_count += 1;
+
+ if let Some(target_group) =
+ self.find_group_for_stage(&groups, search_idx)
+ {
+ group_idx = target_group;
+ continue;
+ }
+ }
+ }
+ StageOutcome::Backtrack {
+ target_stage,
+ reason,
+ } => {
+ info!("Backtracking to {}: {}", target_stage, reason);
+
+ let _ = tx
+ .send(RetrieveEvent::Backtracking {
+ from: stage_name.to_string(),
+ to: target_stage.clone(),
+ reason: reason.clone(),
+ })
+ .await;
+
+ if let Some(target_idx) = self
+ .stages
+ .iter()
+ .position(|e| e.stage.name() == target_stage)
+ {
+ if target_stage == "search" {
+ if let Some(ref pilot) = self.pilot {
+ if pilot.config().guide_at_backtrack {
+ let visited: std::collections::HashSet<_> = ctx
+ .search_paths
+ .iter()
+ .flat_map(|p| p.nodes.iter().copied())
+ .collect();
+ let candidates: Vec<_> = ctx
+ .candidates
+ .iter()
+ .map(|c| c.node_id)
+ .collect();
+
+ let state = SearchState::new(
+ &ctx.tree,
+ &ctx.query,
+ &[],
+ &candidates,
+ &visited,
+ );
+
+ if let Some(guidance) =
+ pilot.guide_backtrack(&state).await
+ {
+ debug!(
+ "Pilot backtrack guidance for explicit backtrack: confidence={}",
+ guidance.confidence
+ );
+ if guidance.has_candidates() {
+ ctx.candidates = guidance
+ .ranked_candidates
+ .iter()
+ .map(|rc| CandidateNode {
+ node_id: rc.node_id,
+ score: rc.score,
+ depth: 0,
+ is_leaf: false,
+ })
+ .collect();
+ }
+ }
+ }
+ }
+ }
+
+ ctx.increment_backtrack();
+ backtrack_count += 1;
+
+ if let Some(target_group) =
+ self.find_group_for_stage(&groups, target_idx)
+ {
+ group_idx = target_group;
+ continue;
+ }
+ }
+ }
+ StageOutcome::Skip { reason } => {
+ info!("Skipping remaining stages: {}", reason);
+ ctx.metrics.total_time_ms =
+ total_start.elapsed().as_millis() as u64;
+ let response = ctx.finalize();
+ let _ = tx
+ .send(RetrieveEvent::Completed { response })
+ .await;
+ return Ok(());
+ }
+ }
+ }
+ Err(e) => {
+ ctx.end_stage(stage_name, false, Some(e.to_string()));
+
+ if policy.allows_continuation() {
+ warn!(
+ "Stage {} failed but policy allows continuation: {}",
+ stage_name, e
+ );
+ } else {
+ error!("Stage {} failed: {}", stage_name, e);
+ let _ = tx
+ .send(RetrieveEvent::Error {
+ message: e.to_string(),
+ })
+ .await;
+ return Err(e);
+ }
+ }
+ }
+ }
+
+ group_idx += 1;
+ }
+
+ ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64;
+ info!(
+ "Streaming retrieval completed in {}ms ({} iterations, {} backtracks)",
+ ctx.metrics.total_time_ms, total_iterations, backtrack_count
+ );
+
+ let response = ctx.finalize();
+ let _ = tx.send(RetrieveEvent::Completed { response }).await;
+ Ok(())
+ }
+
/// Get list of stage names in execution order.
pub fn stage_names(&self) -> Result<Vec<&str>> {
let order = self.resolve_order()?;
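The invariant `run_streaming` maintains — one `Started`, one `StageCompleted` per stage, and exactly one terminal event — can be sketched with a plain bounded channel. This uses `std::sync::mpsc` and a trimmed `Event` enum purely for illustration; the real code uses a tokio channel and the richer `RetrieveEvent`:

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

// Trimmed stand-in for RetrieveEvent; the real enum carries richer
// payloads (stage timings, the full response, error messages).
#[derive(Debug, PartialEq)]
enum Event {
    Started,
    StageCompleted(String),
    Completed,
}

// Run a pipeline of named stages in the background, emitting one event
// per stage and always finishing with a single terminal event.
fn run_pipeline(stages: Vec<&'static str>, bound: usize) -> Receiver<Event> {
    let (tx, rx) = sync_channel(bound);
    thread::spawn(move || {
        let _ = tx.send(Event::Started);
        for s in stages {
            let _ = tx.send(Event::StageCompleted(s.to_string()));
        }
        // The stream always terminates with a terminal event.
        let _ = tx.send(Event::Completed);
    });
    rx
}
```

The bound mirrors `DEFAULT_STREAM_BOUND`: a slow consumer applies backpressure to the pipeline rather than letting events queue without limit.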
diff --git a/rust/src/retrieval/pipeline_retriever.rs b/rust/src/retrieval/pipeline_retriever.rs
index 377c4747..e2faa499 100644
--- a/rust/src/retrieval/pipeline_retriever.rs
+++ b/rust/src/retrieval/pipeline_retriever.rs
@@ -13,6 +13,7 @@ use super::content::ContentAggregatorConfig;
use super::pipeline::RetrievalOrchestrator;
use super::retriever::{CostEstimate, Retriever, RetrieverError, RetrieverResult};
use super::stages::{AnalyzeStage, EvaluateStage, PlanStage, SearchStage};
+use super::stream::{RetrieveEvent, RetrieveEventReceiver};
use super::strategy::LlmStrategy;
use super::types::{RetrieveOptions, RetrieveResponse};
use crate::document::DocumentTree;
@@ -151,6 +152,27 @@ impl PipelineRetriever {
fn options_to_retrieve_options(&self, options: &RetrieveOptions) -> RetrieveOptions {
options.clone()
}
+
+ /// Execute streaming retrieval.
+ ///
+ /// Returns a channel receiver that yields [`RetrieveEvent`]s as the
+ /// pipeline progresses. The stream always terminates with either
+ /// `Completed` or `Error`.
+ ///
+ /// This is the streaming counterpart of [`retrieve`](Retriever::retrieve).
+ /// The non-streaming path is not affected.
+ pub fn retrieve_streaming(
+ &self,
+ tree: &DocumentTree,
+ query: &str,
+ options: &RetrieveOptions,
+ ) -> (tokio::task::JoinHandle<()>, RetrieveEventReceiver) {
+ let orchestrator = self.build_orchestrator();
+ let tree_arc = Arc::new(tree.clone());
+ let opts = self.options_to_retrieve_options(options);
+
+ orchestrator.execute_streaming(tree_arc, query, opts)
+ }
}
#[async_trait]
diff --git a/rust/src/retrieval/stages/analyze.rs b/rust/src/retrieval/stages/analyze.rs
index 8dd875e6..1748d440 100644
--- a/rust/src/retrieval/stages/analyze.rs
+++ b/rust/src/retrieval/stages/analyze.rs
@@ -12,10 +12,11 @@
use async_trait::async_trait;
use tracing::info;
-use crate::document::{DocumentTree, TocView};
+use crate::document::{DocumentTree, NodeId, TocView};
use crate::retrieval::complexity::ComplexityDetector;
use crate::retrieval::decompose::{DecompositionConfig, QueryDecomposer};
use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome};
+use crate::retrieval::types::{NavigationDecision, StageName};
use crate::llm::LlmClient;
/// Analyze Stage - analyzes queries for retrieval planning.
@@ -28,6 +29,56 @@ use crate::llm::LlmClient;
///
/// # Example
///
+/// Convert a Chinese number string to an integer (e.g. "三" → 3, "二十一" → 21).
+fn chinese_num_to_int(s: &str) -> Option<usize> {
+    let chars: Vec<char> = s.chars().collect();
+ if chars.is_empty() {
+ return None;
+ }
+ // If purely digits, parse directly
+ if chars.iter().all(|c| c.is_ascii_digit()) {
+ return s.parse().ok();
+ }
+ let map = |c: char| -> usize {
+ match c {
+ '一' => 1, '二' => 2, '三' => 3, '四' => 4, '五' => 5,
+ '六' => 6, '七' => 7, '八' => 8, '九' => 9, '十' => 10,
+ '百' => 100,
+ _ => 0,
+ }
+ };
+    // Single pass: treat 十/百 as positional multipliers
+ let mut total: usize = 0;
+ let mut current: usize = 0;
+ for &c in &chars {
+ let v = map(c);
+ if v == 0 {
+ continue;
+ }
+ if v >= 10 {
+ // Positional multiplier
+ let base = if current == 0 { 1 } else { current };
+ total += base * v;
+ current = 0;
+ } else {
+ current = v;
+ }
+ }
+ total += current;
+ if total > 0 { Some(total) } else { None }
+}
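The added converter is easy to sanity-check in isolation. Below is the same single-pass algorithm restated as a standalone function (a free-standing copy for illustration, not part of the patch), which makes the positional handling of 十/百 easy to verify:

```rust
/// Standalone copy of the patch's numeral conversion: digit strings parse
/// directly; otherwise 十/百 act as positional multipliers applied to the
/// pending digit.
fn chinese_num_to_int(s: &str) -> Option<usize> {
    if s.is_empty() {
        return None;
    }
    if s.chars().all(|c| c.is_ascii_digit()) {
        return s.parse().ok();
    }
    let map = |c: char| -> usize {
        match c {
            '一' => 1, '二' => 2, '三' => 3, '四' => 4, '五' => 5,
            '六' => 6, '七' => 7, '八' => 8, '九' => 9,
            '十' => 10, '百' => 100,
            _ => 0, // non-numeral characters are ignored
        }
    };
    let (mut total, mut current) = (0usize, 0usize);
    for c in s.chars() {
        let v = map(c);
        if v == 0 {
            continue;
        }
        if v >= 10 {
            // Positional multiplier: "二十" = 2 * 10, bare "十" = 1 * 10
            total += if current == 0 { 1 } else { current } * v;
            current = 0;
        } else {
            current = v;
        }
    }
    total += current;
    if total > 0 { Some(total) } else { None }
}
```

Spot checks: `"三"` → `Some(3)`, `"二十一"` → `Some(21)`, `"一百二十"` → `Some(120)`, `"42"` → `Some(42)`.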
+
+/// Analyze Stage - analyzes queries for retrieval planning.
+///
+/// This stage:
+/// 1. Detects query complexity (Simple/Medium/Complex)
+/// 2. Extracts keywords for matching
+/// 3. Matches target sections from ToC
+/// 4. Extracts structural path hints (Section 3.2, 第3章, etc.)
+/// 5. Decomposes complex queries into sub-queries (if enabled)
+///
+/// # Example
+///
/// ```rust,ignore
/// let stage = AnalyzeStage::new()
/// .with_toc_matching(true)
@@ -144,6 +195,88 @@ impl AnalyzeStage {
.collect()
}
+ /// Extract structural path hints from the query.
+ ///
+ /// Recognizes patterns like:
+ /// - "第3章", "第2节", "第一章" (Chinese chapter/section)
+ /// - "Section 3.2", "section 4.1.2" (English section numbers)
+ /// - "Chapter 5", "chapter 10" (English chapter)
+ /// - "3.2.1", "2.1" (bare section numbers)
+ /// - "表3", "Table 5", "图2", "Figure 4" (table/figure references)
+ ///
+ /// Maps them to tree NodeIds via `find_by_structure()`.
+ fn extract_structure_hints(&self, query: &str, tree: &DocumentTree) -> Vec<(String, NodeId)> {
+ let mut hints = Vec::new();
+
+ // Chinese patterns: 第X章, 第X节, 第X部分
+ for cap in regex::Regex::new(r"第([一二三四五六七八九十百\d]+)[章节部分]")
+ .unwrap()
+ .captures_iter(query)
+ {
+ let num = chinese_num_to_int(&cap[1]).unwrap_or(0);
+ if num > 0 {
+ if let Some(node_id) = tree.find_by_structure(&num.to_string()) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+ }
+
+ // "Section X.Y.Z" or "section X.Y"
+ for cap in regex::Regex::new(r"(?i)section\s+(\d+(?:\.\d+)*)")
+ .unwrap()
+ .captures_iter(query)
+ {
+ if let Some(node_id) = tree.find_by_structure(&cap[1]) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+
+ // "Chapter X"
+ for cap in regex::Regex::new(r"(?i)chapter\s+(\d+)")
+ .unwrap()
+ .captures_iter(query)
+ {
+ if let Some(node_id) = tree.find_by_structure(&cap[1]) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+
+ // Bare section numbers: "3.2.1", "2.1"
+ // Use word boundary instead of lookbehind (Rust regex doesn't support lookaround)
+ for cap in regex::Regex::new(r"\b(\d+\.\d+(?:\.\d+)*)")
+ .unwrap()
+ .captures_iter(query)
+ {
+ if let Some(node_id) = tree.find_by_structure(&cap[1]) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+
+ // Table/Figure references
+ for cap in regex::Regex::new(r"(?:表|(?i)table)\s*(\d+)")
+ .unwrap()
+ .captures_iter(query)
+ {
+ if let Some(node_id) = tree.find_by_structure(&format!("table {}", &cap[1])) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+ for cap in regex::Regex::new(r"(?:图|(?i)figure)\s*(\d+)")
+ .unwrap()
+ .captures_iter(query)
+ {
+ if let Some(node_id) = tree.find_by_structure(&format!("figure {}", &cap[1])) {
+ hints.push((cap[0].to_string(), node_id));
+ }
+ }
+
+ // Deduplicate by NodeId
+ let mut seen = std::collections::HashSet::new();
+ hints.retain(|(_, nid)| seen.insert(*nid));
+
+ hints
+ }
+
/// Match target sections from ToC.
    fn match_toc_sections(&self, query: &str, tree: &DocumentTree) -> Vec<String> {
if !self.enable_toc_matching {
@@ -231,6 +364,16 @@ impl RetrievalStage for AnalyzeStage {
info!("Target sections: {:?}", ctx.target_sections);
}
+ // 3.5 Extract structural path hints
+ ctx.resolved_path_hints = self.extract_structure_hints(&ctx.query, &ctx.tree);
+ if !ctx.resolved_path_hints.is_empty() {
+ info!(
+ "Resolved {} structure hints: {:?}",
+ ctx.resolved_path_hints.len(),
+            ctx.resolved_path_hints.iter().map(|(s, _)| s).collect::<Vec<_>>()
+ );
+ }
+
// 4. Decompose query if enabled and complex enough
if self.enable_decomposition {
if let Some(ref decomposer) = self.query_decomposer {
@@ -269,6 +412,35 @@ impl RetrievalStage for AnalyzeStage {
// 5. Update metrics
ctx.metrics.llm_calls += 0; // No LLM calls in this stage
+ // 6. Record reasoning
+ let complexity_str = format!("{:?}", ctx.complexity.unwrap_or_default());
+ let mut reasoning_parts = vec![
+ format!("Query complexity: {}", complexity_str),
+ format!("Keywords: {:?}", ctx.keywords),
+ ];
+ if !ctx.target_sections.is_empty() {
+ reasoning_parts.push(format!("Target sections: {:?}", ctx.target_sections));
+ }
+ if !ctx.resolved_path_hints.is_empty() {
+ reasoning_parts.push(format!(
+ "Structure hints: {:?}",
+            ctx.resolved_path_hints.iter().map(|(s, _)| s).collect::<Vec<_>>()
+ ));
+ }
+ if let Some(ref decomp) = ctx.decomposition {
+ if decomp.was_decomposed {
+ reasoning_parts.push(format!(
+ "Decomposed into {} sub-queries",
+ decomp.sub_queries.len()
+ ));
+ }
+ }
+ ctx.record_reasoning(
+ StageName::Analyze,
+ reasoning_parts.join("; "),
+ NavigationDecision::ExploreMore,
+ );
+
Ok(StageOutcome::cont())
}
}
diff --git a/rust/src/retrieval/stages/evaluate.rs b/rust/src/retrieval/stages/evaluate.rs
index ad8858f2..11a95713 100644
--- a/rust/src/retrieval/stages/evaluate.rs
+++ b/rust/src/retrieval/stages/evaluate.rs
@@ -12,9 +12,9 @@ use tracing::{info, warn};
use crate::llm::LlmClient;
use crate::retrieval::content::{ContentAggregator, ContentAggregatorConfig};
-use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome};
+use crate::retrieval::pipeline::{BudgetStatus, FailurePolicy, PipelineContext, RetrievalStage, StageOutcome};
use crate::retrieval::sufficiency::{LlmJudge, SufficiencyChecker, ThresholdChecker};
-use crate::retrieval::types::{RetrievalResult, RetrieveResponse, SufficiencyLevel};
+use crate::retrieval::types::{NavigationDecision, ReasoningChain, RetrievalResult, RetrieveResponse, StageName, SufficiencyLevel};
use crate::utils::estimate_tokens;
/// Evaluate Stage - evaluates retrieval sufficiency.
@@ -275,7 +275,7 @@ impl EvaluateStage {
.map(|s| format!("{:?}", s))
.unwrap_or_else(|| "unknown".to_string()),
complexity: ctx.complexity.unwrap_or_default(),
- trace: ctx.navigation_trace.clone(),
+ reasoning_chain: ctx.reasoning_chain.clone(),
tokens_used: ctx.token_count,
}
}
@@ -345,7 +345,10 @@ impl RetrievalStage for EvaluateStage {
info!("Aggregated {} tokens", tokens);
- // 2. Check sufficiency
+ // 2. Report token consumption to budget controller
+ ctx.budget_controller.record_tokens(tokens);
+
+ // 3. Check sufficiency
ctx.sufficiency = self.check_sufficiency(ctx);
info!("Sufficiency level: {:?}", ctx.sufficiency);
@@ -353,6 +356,43 @@ impl RetrievalStage for EvaluateStage {
ctx.metrics.evaluate_time_ms += start.elapsed().as_millis() as u64;
ctx.metrics.tokens_used = tokens;
+ // 4. Check budget status for adaptive decision
+ let budget_status = ctx.budget_controller.status();
+ let confidence = self.calculate_confidence(ctx);
+
+ // If budget is exhausted, force completion regardless of sufficiency
+ if budget_status.should_stop() && ctx.search_iterations >= 1 {
+ info!(
+ "Budget exhausted ({}/{}), completing with current results",
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ );
+ ctx.result = Some(self.build_response(ctx));
+ ctx.record_reasoning(
+ StageName::Evaluate,
+ format!(
+ "Budget exhausted ({}/{}), forced completion; confidence={:.3}",
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ confidence,
+ ),
+ NavigationDecision::Skip,
+ );
+ return Ok(StageOutcome::complete());
+ }
+
+    // 5. Record successful navigation paths to L2 cache
+ if confidence > 0.5 {
+ let doc_key = format!("{:?}", ctx.tree.root());
+ for candidate in ctx.candidates.iter().take(3) {
+ if let Some(node) = ctx.tree.get(candidate.node_id) {
+                // Use the node title as the L2 path identifier
+ ctx.reasoning_cache.l2_record(&doc_key, &node.title, candidate.score);
+ }
+ }
+ }
+
// 3. Decide next action based on sufficiency
let outcome = match ctx.sufficiency {
SufficiencyLevel::Sufficient => {
@@ -396,6 +436,29 @@ impl RetrievalStage for EvaluateStage {
ctx.metrics.llm_calls += 1;
}
+ // Record evaluation reasoning with budget status
+ let sufficiency_str = format!("{:?}", ctx.sufficiency);
+ let decision = match ctx.sufficiency {
+ SufficiencyLevel::Sufficient => NavigationDecision::ThisIsTheAnswer,
+ SufficiencyLevel::PartialSufficient => NavigationDecision::ExploreMore,
+ SufficiencyLevel::Insufficient => NavigationDecision::ExploreMore,
+ };
+ ctx.record_reasoning(
+ StageName::Evaluate,
+ format!(
+ "Sufficiency={}, confidence={:.3}, tokens={}, candidates={}, iteration={}, budget={:?} ({}/{})",
+ sufficiency_str,
+ self.calculate_confidence(ctx),
+ ctx.token_count,
+ ctx.candidates.len(),
+ ctx.search_iterations,
+ budget_status,
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ ),
+ decision,
+ );
+
Ok(outcome)
}
}
diff --git a/rust/src/retrieval/stages/plan.rs b/rust/src/retrieval/stages/plan.rs
index 0b98003c..865f070f 100644
--- a/rust/src/retrieval/stages/plan.rs
+++ b/rust/src/retrieval/stages/plan.rs
@@ -15,9 +15,10 @@ use tracing::info;
// DocumentTree is accessed via context
use crate::llm::LlmClient;
use crate::retrieval::pipeline::{
- FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, SearchConfig, StageOutcome,
+ BudgetStatus, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, SearchConfig,
+ StageOutcome,
};
-use crate::retrieval::types::{QueryComplexity, StrategyPreference};
+use crate::retrieval::types::{NavigationDecision, QueryComplexity, StageName, StrategyPreference};
/// Plan Stage - plans the retrieval strategy.
///
@@ -54,7 +55,7 @@ impl PlanStage {
self
}
- /// Select retrieval strategy based on complexity and preferences.
+ /// Select retrieval strategy based on complexity, preferences, and budget.
fn select_strategy(&self, ctx: &PipelineContext) -> StrategyPreference {
// Respect explicit strategy preference
if ctx.options.strategy != StrategyPreference::Auto {
@@ -62,6 +63,13 @@ impl PlanStage {
return ctx.options.strategy;
}
+ // Budget-aware strategy selection
+ let budget_status = ctx.budget_controller.status();
+ if budget_status.should_stop() {
+ info!("Budget exhausted, forcing Keyword strategy");
+ return StrategyPreference::ForceKeyword;
+ }
+
// Auto-select based on complexity
let complexity = ctx.complexity.unwrap_or(QueryComplexity::Medium);
@@ -71,8 +79,10 @@ impl PlanStage {
StrategyPreference::ForceKeyword
}
QueryComplexity::Medium => {
- // Use semantic if available, otherwise keyword with LLM fallback
- if self.llm_client.is_some() {
+ if budget_status == BudgetStatus::Constrained {
+ info!("Complexity is Medium but budget constrained, selecting Keyword strategy");
+ StrategyPreference::ForceKeyword
+ } else if self.llm_client.is_some() {
info!("Complexity is Medium, selecting LLM strategy");
StrategyPreference::ForceLlm
} else {
@@ -81,7 +91,14 @@ impl PlanStage {
}
}
QueryComplexity::Complex => {
- if self.llm_client.is_some() {
+ if budget_status == BudgetStatus::Constrained {
+                info!("Complexity is Complex but budget constrained, falling back to Hybrid (or Keyword without LLM)");
+ if self.llm_client.is_some() {
+ StrategyPreference::ForceHybrid
+ } else {
+ StrategyPreference::ForceKeyword
+ }
+ } else if self.llm_client.is_some() {
info!("Complexity is Complex, selecting LLM strategy");
StrategyPreference::ForceLlm
} else {
@@ -177,6 +194,34 @@ impl RetrievalStage for PlanStage {
.unwrap_or(0)
);
+ // Record reasoning
+ let strategy_str = ctx
+ .selected_strategy
+ .map(|s| format!("{:?}", s))
+ .unwrap_or_else(|| "auto".to_string());
+ let algorithm_str = ctx
+ .selected_algorithm
+ .map(|a| a.name().to_string())
+ .unwrap_or_else(|| "unknown".to_string());
+ let beam_width = ctx
+ .search_config
+ .as_ref()
+ .map(|c| c.beam_width)
+ .unwrap_or(3);
+ ctx.record_reasoning(
+ StageName::Plan,
+ format!(
+ "Selected strategy={}, algorithm={}, beam_width={}; budget: {}/{} ({:.0}%)",
+ strategy_str,
+ algorithm_str,
+ beam_width,
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ ctx.budget_controller.utilization() * 100.0
+ ),
+ NavigationDecision::ExploreMore,
+ );
+
Ok(StageOutcome::cont())
}
}
diff --git a/rust/src/retrieval/stages/search.rs b/rust/src/retrieval/stages/search.rs
index 44a8de2c..929dad76 100644
--- a/rust/src/retrieval/stages/search.rs
+++ b/rust/src/retrieval/stages/search.rs
@@ -13,20 +13,24 @@ use std::sync::Arc;
use tracing::{debug, info, warn};
use crate::document::DocumentTree;
+use crate::document::ReasoningIndex;
use crate::llm::LlmClient;
use crate::retrieval::RetrievalContext;
use crate::retrieval::pilot::Pilot;
+use crate::retrieval::cache::CachedCandidate;
use crate::retrieval::pipeline::{
- CandidateNode, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, StageOutcome,
+ BudgetStatus, CandidateNode, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm,
+ StageOutcome,
};
use crate::retrieval::search::{
BeamSearch, GreedySearch, SearchConfig as SearchAlgConfig, SearchCue, SearchTree,
ToCNavigator,
};
+use crate::retrieval::search::extract_keywords;
use crate::retrieval::strategy::{
HybridConfig, HybridStrategy, KeywordStrategy, LlmStrategy, RetrievalStrategy,
};
-use crate::retrieval::types::StrategyPreference;
+use crate::retrieval::types::{NavigationDecision, ReasoningCandidate, ReasoningStep, StageName, StrategyPreference};
/// Search Stage - executes tree search with optional Pilot guidance.
///
@@ -301,6 +305,115 @@ impl SearchStage {
(all_paths, all_candidates)
}
+
+ /// Check if a query is asking for a document summary/overview.
+ fn is_summary_query(query: &str) -> bool {
+ let lower = query.to_lowercase();
+ let patterns = [
+ "what is this document",
+ "what is this about",
+ "summarize",
+ "summary",
+ "overview",
+ "give me an overview",
+ "describe this document",
+ "main topics",
+ "table of contents",
+ "这篇文档讲了什么",
+ "总结",
+ "概述",
+ "概要",
+ "主要内容",
+ "文档简介",
+ "介绍一下",
+ ];
+ patterns.iter().any(|p| lower.contains(p))
+ }
+
+ /// Try to match the query against pre-computed reasoning index entries.
+ ///
+ /// Returns candidates if a high-confidence match is found, None otherwise.
+ fn try_reasoning_shortcut(
+ ridx: &ReasoningIndex,
+ ctx: &PipelineContext,
+    ) -> Option<Vec<CandidateNode>> {
+ // Check 1: Summary shortcut — handle "overview" style queries
+ if let Some(ref shortcut) = ridx.summary_shortcut() {
+ if Self::is_summary_query(&ctx.query) {
+ let mut candidates = vec![CandidateNode::new(
+ shortcut.root_node,
+ 1.0,
+ 0,
+ ctx.tree.is_leaf(shortcut.root_node),
+ )];
+ for section in &shortcut.section_summaries {
+ candidates.push(CandidateNode::new(
+ section.node_id,
+ 0.9,
+ section.depth,
+ ctx.tree.is_leaf(section.node_id),
+ ));
+ }
+ return Some(candidates);
+ }
+ }
+
+ // Check 2: Keyword → Topic path matching
+ let keywords = extract_keywords(&ctx.query);
+ if keywords.is_empty() {
+ return None;
+ }
+
+        let mut scored_nodes: std::collections::HashMap<crate::document::NodeId, f32> =
+ std::collections::HashMap::new();
+ for keyword in &keywords {
+ if let Some(entries) = ridx.topic_entries(keyword) {
+ for entry in entries {
+ let score = scored_nodes.entry(entry.node_id).or_insert(0.0);
+ *score += entry.weight;
+ }
+ }
+ }
+
+ if scored_nodes.is_empty() {
+ return None;
+ }
+
+ // Boost hot nodes by 20%
+ for (node_id, score) in scored_nodes.iter_mut() {
+ if ridx.is_hot(*node_id) {
+ *score *= 1.2;
+ }
+ }
+
+ // Convert to candidates, only return if best match is high-confidence
+    let mut candidates: Vec<CandidateNode> = scored_nodes
+ .into_iter()
+ .filter_map(|(node_id, score)| {
+ let depth = ctx.tree.get(node_id).map(|n| n.depth)?;
+ Some(CandidateNode::new(
+ node_id,
+ score,
+ depth,
+ ctx.tree.is_leaf(node_id),
+ ))
+ })
+ .collect();
+
+ candidates.sort_by(|a, b| {
+ b.score
+ .partial_cmp(&a.score)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+
+ // Only return shortcut results if we have a high-confidence match
+ let best_score = candidates.first().map(|c| c.score).unwrap_or(0.0);
+ if best_score > 0.5 {
+ Some(candidates)
+ } else {
+ None
+ }
+ }
}
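The scoring inside `try_reasoning_shortcut` reduces to three steps: accumulate per-keyword entry weights per node, boost hot nodes by 20%, and accept the result only if the best score clears the 0.5 bar. A hedged sketch with plain stand-in types (`u32` for `NodeId`, a `HashMap` for the reasoning index):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: u32 node ids, keyword -> (node, weight) entries.
fn shortcut_scores(
    index: &HashMap<&str, Vec<(u32, f32)>>,
    keywords: &[&str],
    hot_nodes: &[u32],
) -> Option<Vec<(u32, f32)>> {
    // Step 1: sum entry weights per node across all matched keywords.
    let mut scored: HashMap<u32, f32> = HashMap::new();
    for kw in keywords {
        if let Some(entries) = index.get(kw) {
            for &(node, weight) in entries {
                *scored.entry(node).or_insert(0.0) += weight;
            }
        }
    }
    if scored.is_empty() {
        return None;
    }
    // Step 2: boost frequently-retrieved ("hot") nodes by 20%.
    for (node, score) in scored.iter_mut() {
        if hot_nodes.contains(node) {
            *score *= 1.2;
        }
    }
    // Step 3: sort descending; only return a high-confidence match.
    let mut out: Vec<(u32, f32)> = scored.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    if out[0].1 > 0.5 { Some(out) } else { None }
}
```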
#[async_trait]
@@ -331,21 +444,91 @@ impl RetrievalStage for SearchStage {
let algorithm = ctx.selected_algorithm.unwrap_or(SearchAlgorithm::Beam);
let config = ctx.search_config.clone().unwrap_or_default();
+ // Budget check: skip search iteration if exhausted
+ let budget_status = ctx.budget_controller.status();
+ if budget_status.should_stop() && ctx.search_iterations > 0 {
+ info!(
+ "Budget exhausted ({}/{}), skipping search iteration",
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ );
+ ctx.record_reasoning(
+ StageName::Search,
+ format!(
+ "Budget exhausted ({}/{}), returning current candidates",
+ ctx.budget_controller.consumed(),
+ ctx.budget_controller.total_budget(),
+ ),
+ NavigationDecision::Skip,
+ );
+ return Ok(StageOutcome::complete());
+ }
+
// Reset Pilot state for new query
if let Some(ref pilot) = self.pilot {
pilot.reset();
debug!("SearchStage: Pilot is available, is_active={}", pilot.is_active());
}
+ // Apply budget-aware beam width adjustment
+ let effective_beam = ctx
+ .budget_controller
+ .suggested_beam_width(config.beam_width, ctx.search_iterations);
+
info!(
- "Executing search: algorithm={:?}, beam_width={}, pilot={}",
+ "Executing search: algorithm={:?}, beam_width={} (budget: {:?}), pilot={}",
algorithm,
- config.beam_width,
+ effective_beam,
+ budget_status,
if self.has_pilot() { "enabled" } else { "disabled" }
);
ctx.increment_search_iteration();
+ // === L1 Cache check: return cached results if available ===
+ if ctx.options.enable_cache && ctx.search_iterations <= 1 {
+ let scope_fp = crate::utils::fingerprint::Fingerprint::from_str(
+ &format!("{:?}", ctx.tree.root()),
+ );
+ if let Some(cached) = ctx.reasoning_cache.l1_get(&ctx.query, &scope_fp) {
+ info!("L1 cache hit for query, returning {} cached candidates", cached.len());
+ ctx.candidates = cached
+ .into_iter()
+ .map(|c| CandidateNode::new(c.node_id, c.score, c.depth, ctx.tree.is_leaf(c.node_id)))
+ .collect();
+ ctx.metrics.cache_hits += 1;
+ ctx.record_reasoning(
+ StageName::Search,
+ format!(
+ "L1 cache hit: {} candidates returned from cache",
+ ctx.candidates.len()
+ ),
+ NavigationDecision::ThisIsTheAnswer,
+ );
+ return Ok(StageOutcome::cont());
+ }
+ ctx.metrics.cache_misses += 1;
+ }
+
+ // === Reasoning Index Quick Match ===
+ // Check pre-computed index before running expensive ToC navigation.
+ if let Some(ref ridx) = ctx.reasoning_index {
+ if let Some(shortcut_candidates) = Self::try_reasoning_shortcut(ridx, ctx) {
+ info!(
+ "Reasoning index shortcut match, returning {} candidates",
+ shortcut_candidates.len()
+ );
+ ctx.candidates = shortcut_candidates;
+ ctx.metrics.cache_hits += 1;
+ ctx.record_reasoning(
+ StageName::Search,
+ "Reasoning index shortcut: direct path match".to_string(),
+ NavigationDecision::ThisIsTheAnswer,
+ );
+ return Ok(StageOutcome::cont());
+ }
+ }
+
// === Phase Locate: find relevant subtrees via ToC ===
// Use depth-1 nodes (root's direct children = top-level sections).
// level(0) is only the root itself, which is not useful for locating.
@@ -356,13 +539,26 @@ impl RetrievalStage for SearchStage {
.map(|nodes| nodes.to_vec())
.unwrap_or_else(|| ctx.tree.children(ctx.tree.root()));
- let cues = self
+ let mut cues = self
.toc_navigator
.locate(&ctx.query, &ctx.tree, &top_level_nodes)
.await;
debug!("ToCNavigator returned {} cues", cues.len());
+ // Inject structure hints from Analyze stage as high-priority cues
+ if !ctx.resolved_path_hints.is_empty() {
+ for (hint_text, node_id) in &ctx.resolved_path_hints {
+ if ctx.tree.get(*node_id).is_some() {
+ info!("Injecting structure hint '{}' as search cue", hint_text);
+ cues.push(SearchCue {
+ root: *node_id,
+ confidence: 1.0, // Direct match from query structure
+ });
+ }
+ }
+ }
+
// === Resolve queries (decomposed or original) ===
let queries = Self::resolve_queries(ctx);
@@ -407,16 +603,119 @@ impl RetrievalStage for SearchStage {
}
}
- // Update metrics
+ // Update metrics and budget
ctx.metrics.search_time_ms += start.elapsed().as_millis() as u64;
ctx.metrics.nodes_visited += ctx.candidates.len();
+ // Update hot node tracker with retrieval results
+ if let Some(ref tracker) = ctx.hot_tracker {
+ let hits: Vec<(crate::document::NodeId, f32)> = ctx
+ .candidates
+ .iter()
+ .map(|c| (c.node_id, c.score))
+ .collect();
+ tracker.record_hits(&hits);
+ }
+ // Estimate tokens consumed by this search iteration (content-based heuristic)
+ let search_tokens: usize = ctx
+ .candidates
+ .iter()
+ .filter_map(|c| ctx.tree.get(c.node_id).map(|n| n.content.len()))
+        .sum::<usize>()
+ / 4; // rough: 4 chars ≈ 1 token
+ ctx.budget_controller.record_tokens(search_tokens);
+
+ // Store results in L1 cache
+ if ctx.options.enable_cache && ctx.search_iterations <= 1 && !ctx.candidates.is_empty() {
+ let scope_fp = crate::utils::fingerprint::Fingerprint::from_str(
+ &format!("{:?}", ctx.tree.root()),
+ );
+        let cached: Vec<CachedCandidate> = ctx
+ .candidates
+ .iter()
+ .map(|c| CachedCandidate {
+ node_id: c.node_id,
+ score: c.score,
+ depth: c.depth,
+ })
+ .collect();
+ ctx.reasoning_cache.l1_store(
+ &ctx.query,
+ scope_fp,
+ cached,
+ ctx.selected_strategy
+ .map(|s| format!("{:?}", s))
+ .unwrap_or_else(|| "auto".to_string()),
+ );
+ }
+
info!(
"Search complete: {} candidates (iteration {})",
ctx.candidates.len(),
ctx.search_iterations
);
+ // Record reasoning — collect data first to avoid borrow conflicts
+ let strategy_str = ctx
+ .selected_strategy
+ .map(|s| format!("{:?}", s))
+ .unwrap_or_else(|| "auto".to_string());
+ let search_iterations = ctx.search_iterations;
+
+    let reasoning_data: Vec<(String, Option<String>, f32, usize, String, Vec<ReasoningCandidate>)> = ctx
+ .candidates
+ .iter()
+ .take(5)
+ .map(|candidate| {
+ let (title, depth) = ctx
+ .tree
+ .get(candidate.node_id)
+ .map(|n| (n.title.clone(), n.depth))
+ .unwrap_or_else(|| ("(unknown)".to_string(), 0));
+
+ let considered: Vec = ctx
+ .candidates
+ .iter()
+ .filter(|c| c.node_id != candidate.node_id)
+ .take(5)
+ .filter_map(|c| {
+ ctx.tree.get(c.node_id).map(|n| ReasoningCandidate {
+ node_id: format!("{:?}", c.node_id),
+ title: n.title.clone(),
+ score: c.score,
+ })
+ })
+ .collect();
+
+ let reasoning = format!(
+ "Candidate '{}' (score={:.3}) found via {} search, iteration {}",
+ title, candidate.score, algorithm.name(), search_iterations
+ );
+
+ (format!("{:?}", candidate.node_id), Some(title), candidate.score, depth, reasoning, considered)
+ })
+ .collect();
+
+ for (node_id, title, score, depth, reasoning, considered) in reasoning_data {
+ ctx.push_reasoning_step(ReasoningStep {
+ stage: StageName::Search,
+ node_id: Some(node_id),
+ title,
+ score,
+ decision: if score > 0.7 {
+ NavigationDecision::ThisIsTheAnswer
+ } else {
+ NavigationDecision::ExploreMore
+ },
+ depth,
+ reasoning,
+ candidates: considered,
+ strategy_used: Some(strategy_str.clone()),
+ llm_call: None,
+ references_followed: Vec::new(),
+ });
+ }
+
Ok(StageOutcome::cont())
}
}
diff --git a/rust/src/retrieval/strategy/cross_document.rs b/rust/src/retrieval/strategy/cross_document.rs
index d451f5c7..fe43f775 100644
--- a/rust/src/retrieval/strategy/cross_document.rs
+++ b/rust/src/retrieval/strategy/cross_document.rs
@@ -8,9 +8,10 @@
use async_trait::async_trait;
use std::collections::HashMap;
+use std::sync::Arc;
use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities};
-use crate::document::{DocumentTree, NodeId};
+use crate::document::{DocumentGraph, DocumentTree, NodeId};
use crate::retrieval::types::{NavigationDecision, QueryComplexity};
use crate::retrieval::RetrievalContext;
@@ -61,6 +62,8 @@ pub enum MergeStrategy {
BestPerDocument,
/// Weight results by document relevance score.
WeightedByRelevance,
+ /// Use graph connectivity to boost connected documents.
+ GraphBoosted,
}
/// Configuration for cross-document retrieval.
@@ -122,6 +125,8 @@ pub struct CrossDocumentStrategy {
config: CrossDocumentConfig,
/// Documents to search.
    documents: Vec<DocumentTree>,
+ /// Optional document graph for graph-aware ranking.
+ graph: Option>,
}
impl CrossDocumentStrategy {
@@ -131,6 +136,7 @@ impl CrossDocumentStrategy {
inner,
config: CrossDocumentConfig::default(),
documents: Vec::new(),
+ graph: None,
}
}
@@ -158,6 +164,59 @@ impl CrossDocumentStrategy {
self.documents.len()
}
+ /// Set the document graph for graph-aware ranking.
+ pub fn with_graph(mut self, graph: Arc) -> Self {
+ self.graph = Some(graph);
+ self
+ }
+
+ /// Apply graph-based score boosting to merged results.
+ ///
+ /// For each high-confidence result (score > 0.5), find its graph neighbors
+ /// and boost their scores by `boost_factor * edge_weight`.
+ fn apply_graph_boost(
+ &self,
+ results: &mut Vec<(DocumentId, NodeId, NodeEvaluation)>,
+ boost_factor: f32,
+ ) {
+ let graph = match self.graph {
+ Some(ref g) => g,
+ None => return,
+ };
+
+ // Collect doc_ids with high scores
+ let high_score_docs: Vec<(String, f32)> = results
+ .iter()
+ .filter(|(_, _, eval)| eval.score > 0.5)
+ .map(|(doc_id, _, eval)| (doc_id.clone(), eval.score))
+ .collect();
+
+ if high_score_docs.is_empty() {
+ return;
+ }
+
+ // For each high-score doc, boost its graph neighbors
+ for (doc_id, base_score) in &high_score_docs {
+ let neighbors = graph.get_neighbors(doc_id);
+ for edge in neighbors {
+ // Find results from the neighbor doc and boost them
+ for result in results.iter_mut() {
+ if result.0 == edge.target_doc_id {
+ let boost = boost_factor * edge.weight * base_score;
+ result.2.score += boost;
+ }
+ }
+ }
+ }
+
+ // Re-sort by score after boosting
+ results.sort_by(|a, b| {
+ b.2.score
+ .partial_cmp(&a.2.score)
+ .unwrap_or(std::cmp::Ordering::Equal)
+ });
+ }
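The boost arithmetic above (`boost_factor * edge_weight * base_score` added to each neighbor's score, followed by a re-sort) can be sketched with plain tuples. This is illustrative only: `(doc_id, score)` pairs stand in for the `(DocumentId, NodeId, NodeEvaluation)` triples, and `(from, to, weight)` tuples for graph edges:

```rust
// Hypothetical simplification of apply_graph_boost: neighbors of documents
// scoring > 0.5 receive boost_factor * edge_weight * base_score.
fn graph_boost(
    results: &mut Vec<(String, f32)>,
    edges: &[(String, String, f32)],
    boost_factor: f32,
) {
    // Collect the high-confidence documents before mutating scores.
    let high: Vec<(String, f32)> = results
        .iter()
        .filter(|(_, score)| *score > 0.5)
        .cloned()
        .collect();
    for (doc, base) in &high {
        for (from, to, weight) in edges {
            if from == doc {
                for result in results.iter_mut() {
                    if &result.0 == to {
                        result.1 += boost_factor * weight * base;
                    }
                }
            }
        }
    }
    // Re-sort descending after boosting.
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
}
```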
+
/// Search a single document and return results.
async fn search_document(
&self,
@@ -251,6 +310,26 @@ impl CrossDocumentStrategy {
all_results.truncate(self.config.max_total_results);
all_results
}
+
+ MergeStrategy::GraphBoosted => {
+ // First do TopK merge
+ let mut all_results: Vec<_> = doc_results
+ .into_iter()
+ .flat_map(|doc| {
+ doc.evaluations.into_iter().map(move |(node_id, eval)| {
+ (doc.doc_id.clone(), node_id, eval)
+ })
+ })
+ .collect();
+
+ all_results.sort_by(|a, b| b.2.score.partial_cmp(&a.2.score).unwrap_or(std::cmp::Ordering::Equal));
+
+ // Apply graph-based boosting
+ self.apply_graph_boost(&mut all_results, 0.15);
+
+ all_results.truncate(self.config.max_total_results);
+ all_results
+ }
}
}
}
diff --git a/rust/src/retrieval/stream.rs b/rust/src/retrieval/stream.rs
new file mode 100644
index 00000000..33aa75b7
--- /dev/null
+++ b/rust/src/retrieval/stream.rs
@@ -0,0 +1,128 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Streaming retrieval events.
+//!
+//! When `RetrieveOptions::streaming` is enabled, retrieval emits
+//! [`RetrieveEvent`]s incrementally as the pipeline progresses through
+//! its stages (Analyze → Plan → Search → Evaluate).
+//!
+//! # Example
+//!
+//! ```rust,ignore
+//! let options = RetrieveOptions::new().with_streaming(true);
+//! let rx = client.query_stream(&tree, "query", &options).await?;
+//!
+//! while let Some(event) = rx.recv().await {
+//! match event {
+//! RetrieveEvent::Started { query, .. } => println!("Started: {query}"),
+//! RetrieveEvent::StageCompleted { stage, .. } => println!("Done: {stage}"),
+//! RetrieveEvent::Completed { response } => {
+//! println!("Confidence: {}", response.confidence);
+//! break;
+//! }
+//! RetrieveEvent::Error { message } => {
+//! eprintln!("Error: {message}");
+//! break;
+//! }
+//! _ => {}
+//! }
+//! }
+//! ```
+
+use tokio::sync::mpsc;
+
+use super::types::{RetrieveResponse, SufficiencyLevel};
+
+/// Events emitted during streaming retrieval.
+///
+/// Each event represents a meaningful milestone in the retrieval pipeline.
+/// The stream always terminates with either [`Completed`](RetrieveEvent::Completed)
+/// or [`Error`](RetrieveEvent::Error).
+#[derive(Debug, Clone)]
+pub enum RetrieveEvent {
+ /// Retrieval pipeline started.
+ Started {
+ /// The query string.
+ query: String,
+ /// Planned retrieval strategy name.
+ strategy: String,
+ },
+
+ /// A pipeline stage completed.
+ StageCompleted {
+ /// Stage name (analyze, plan, search, evaluate).
+ stage: String,
+ /// Time spent in this stage (ms).
+ elapsed_ms: u64,
+ },
+
+ /// A node was visited during tree traversal.
+ NodeVisited {
+ /// Node ID.
+ node_id: String,
+ /// Node title.
+ title: String,
+ /// Relevance score (0.0 - 1.0).
+ score: f32,
+ },
+
+ /// Relevant content was found.
+ ContentFound {
+ /// Node ID.
+ node_id: String,
+ /// Node title.
+ title: String,
+ /// Short preview of the content.
+ preview: String,
+ /// Relevance score.
+ score: f32,
+ },
+
+ /// Pipeline is backtracking to an earlier stage.
+ Backtracking {
+ /// Stage backtracking from.
+ from: String,
+ /// Stage backtracking to.
+ to: String,
+ /// Reason for backtracking.
+ reason: String,
+ },
+
+ /// Sufficiency check result.
+ SufficiencyCheck {
+ /// Sufficiency level.
+ level: SufficiencyLevel,
+ /// Total tokens collected so far.
+ tokens: usize,
+ },
+
+ /// Retrieval completed successfully with final results.
+ Completed {
+ /// The full retrieval response.
+ response: RetrieveResponse,
+ },
+
+ /// An error occurred during retrieval.
+ Error {
+ /// Error message.
+ message: String,
+ },
+}
+
+/// Sender half for streaming retrieval events.
+pub(crate) type RetrieveEventSender = mpsc::Sender<RetrieveEvent>;
+
+/// Receiver half for streaming retrieval events.
+pub type RetrieveEventReceiver = mpsc::Receiver<RetrieveEvent>;
+
+/// Create a bounded channel for streaming retrieval events.
+///
+/// Callers typically pass [`DEFAULT_STREAM_BOUND`] (64 events). The sender
+/// applies backpressure when the receiver cannot keep up, preventing
+/// unbounded memory growth.
+pub(crate) fn channel(bound: usize) -> (RetrieveEventSender, RetrieveEventReceiver) {
+ mpsc::channel(bound)
+}
+
+/// Default channel bound for streaming events.
+pub const DEFAULT_STREAM_BOUND: usize = 64;
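The bounded-channel contract this module documents (events in stage order, backpressure from the bound, a terminal event, channel closed when the sender drops) can be illustrated with the standard library's `sync_channel` standing in for the async `tokio::sync::mpsc`; the event type here is a simplified stand-in for `RetrieveEvent`:

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-in for RetrieveEvent.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StageCompleted(String),
    Completed,
}

// The producer runs the "pipeline" on its own thread; the bounded channel
// makes it block whenever `bound` events are already in flight (backpressure).
fn run_pipeline(bound: usize) -> Vec<Event> {
    let (tx, rx) = mpsc::sync_channel::<Event>(bound);
    let producer = thread::spawn(move || {
        for stage in ["analyze", "plan", "search", "evaluate"] {
            tx.send(Event::StageCompleted(stage.to_string())).unwrap();
        }
        // The stream always ends with a terminal event,
        tx.send(Event::Completed).unwrap();
        // and dropping tx closes the channel, ending the receive loop.
    });
    let events: Vec<Event> = rx.iter().collect();
    producer.join().unwrap();
    events
}
```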
diff --git a/rust/src/retrieval/types.rs b/rust/src/retrieval/types.rs
index 3057b7dc..fa1b7e1c 100644
--- a/rust/src/retrieval/types.rs
+++ b/rust/src/retrieval/types.rs
@@ -118,6 +118,13 @@ pub struct RetrieveOptions {
/// Whether to use async context building for large documents.
pub use_async_context: bool,
+
+ /// Enable streaming retrieval results.
+ ///
+ /// When enabled, use `query_stream()` to receive incremental
+ /// `RetrieveEvent`s as each pipeline stage completes. When disabled
+ /// (default), the standard `query()` returns a single final result.
+ pub streaming: bool,
}
impl Default for RetrieveOptions {
@@ -136,6 +143,7 @@ impl Default for RetrieveOptions {
pruning_strategy: super::PruningStrategy::default(),
token_estimation: super::TokenEstimation::default(),
use_async_context: false,
+ streaming: false,
}
}
}
@@ -237,6 +245,13 @@ impl RetrieveOptions {
self.use_async_context = enable;
self
}
+
+ /// Enable streaming retrieval results.
+ #[must_use]
+ pub fn with_streaming(mut self, enable: bool) -> Self {
+ self.streaming = enable;
+ self
+ }
}
/// A single retrieval result.
@@ -343,8 +358,8 @@ pub struct RetrieveResponse {
/// Detected query complexity.
pub complexity: QueryComplexity,
- /// Search trace for debugging.
- pub trace: Vec<NavigationStep>,
+ /// Reasoning chain explaining how results were found.
+ pub reasoning_chain: ReasoningChain,
/// Total tokens used.
pub tokens_used: usize,
@@ -359,7 +374,7 @@ impl Default for RetrieveResponse {
is_sufficient: false,
strategy_used: String::new(),
complexity: QueryComplexity::Medium,
- trace: Vec::new(),
+ reasoning_chain: ReasoningChain::default(),
tokens_used: 0,
}
}
@@ -420,6 +435,135 @@ pub enum NavigationDecision {
Skip,
}
+/// Pipeline stage name for reasoning chain provenance.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+pub enum StageName {
+ /// Query analysis stage.
+ Analyze,
+ /// Strategy planning stage.
+ Plan,
+ /// Tree search stage.
+ Search,
+ /// Sufficiency evaluation stage.
+ Evaluate,
+}
+
+impl std::fmt::Display for StageName {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ Self::Analyze => write!(f, "analyze"),
+ Self::Plan => write!(f, "plan"),
+ Self::Search => write!(f, "search"),
+ Self::Evaluate => write!(f, "evaluate"),
+ }
+ }
+}
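The `Display` impl above renders stage names in lowercase, which is what ends up in serialized reasoning chains and logs. A runnable extract of just that enum and impl (serde derives dropped to stay dependency-free):

```rust
use std::fmt;

/// Stand-alone copy of the StageName enum from the diff, without
/// the serde derives, so the lowercase rendering can be exercised.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum StageName {
    Analyze,
    Plan,
    Search,
    Evaluate,
}

impl fmt::Display for StageName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Analyze => write!(f, "analyze"),
            Self::Plan => write!(f, "plan"),
            Self::Search => write!(f, "search"),
            Self::Evaluate => write!(f, "evaluate"),
        }
    }
}

fn main() {
    // Display also provides to_string() via the blanket ToString impl.
    println!("{}", StageName::Search);
}
```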
+
+/// Summary of an LLM call made during a reasoning step.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct LlmCallSummary {
+ /// Truncated prompt summary for display.
+ pub prompt_summary: String,
+ /// Tokens consumed by this call.
+ pub tokens_used: usize,
+ /// Model identifier.
+ pub model: String,
+}
+
+/// A candidate node considered but not selected during reasoning.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ReasoningCandidate {
+ /// Node ID.
+ pub node_id: String,
+ /// Node title.
+ pub title: String,
+ /// Relevance score of this candidate.
+ pub score: f32,
+}
+
+/// A single step in the reasoning chain.
+///
+/// Unlike `NavigationStep`, which only records "where" the search went,
+/// `ReasoningStep` also records "why" — the decision rationale,
+/// candidates considered, strategy used, and any LLM calls made.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ReasoningStep {
+ /// Which pipeline stage produced this step.
+ pub stage: StageName,
+ /// Node ID visited (if applicable).
+ pub node_id: Option<String>,
+ /// Node title (if applicable).
+ pub title: Option<String>,
+ /// Relevance score at this step.
+ pub score: f32,
+ /// Decision made at this step.
+ pub decision: NavigationDecision,
+ /// Depth in tree.
+ pub depth: usize,
+ /// Human-readable explanation of why this decision was made.
+ pub reasoning: String,
+ /// Candidates considered but not selected at this step.
+ pub candidates: Vec<ReasoningCandidate>,
+ /// Strategy used at this step (e.g. "keyword", "hybrid").
+ pub strategy_used: Option<String>,
+ /// LLM call summary, if an LLM was consulted.
+ pub llm_call: Option<LlmCallSummary>,