diff --git a/README.md b/README.md index 77f8eb78..96b68658 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,20 @@
-Vectorless - -

Document intelligence engine for AI

+
+ Vectorless +   + + Vectorless + +
+

Reasoning-native Document Intelligence Engine

[![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/) [![Python](https://img.shields.io/pypi/pyversions/vectorless.svg)](https://pypi.org/project/vectorless/) [![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless) [![Crates.io](https://img.shields.io/crates/v/vectorless.svg)](https://crates.io/crates/vectorless) -[![Crates.io Downloads](https://img.shields.io/crates/d/vectorless.svg)](https://crates.io/crates/vectorless) +[![Crates.io Downloads](https://img.shields.io/crates/v/vectorless.svg)](https://crates.io/crates/vectorless) [![Docs](https://docs.rs/vectorless/badge.svg)](https://docs.rs/vectorless) [![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE) [![Rust](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org/) @@ -18,102 +23,40 @@ **Vectorless** is an ultra-performant reasoning-native document intelligence engine for AI, with the core written in Rust. It transforms documents into rich semantic trees and uses LLMs to intelligently traverse the hierarchy — retrieving the most relevant content through structural reasoning and deep contextual understanding. -⭐ Drop a star to help us grow! - -## How It Works - -How it works - -### 1. Index: Build a Navigable Tree - -``` -Technical Manual (root) -├── Chapter 1: Introduction -├── Chapter 2: Architecture -│ ├── 2.1 System Design -│ └── 2.2 Implementation -└── Chapter 3: API Reference -``` - -Each node gets an AI-generated summary, enabling fast navigation. - -### 2. Query: Navigate with LLM - -When you ask "How do I reset the device?": - -1. **Analyze** — Understand query intent and complexity -2. **Navigate** — LLM guides tree traversal -3. **Retrieve** — Return the exact section with context -4. **Verify** — Check if more information is needed +Vectorless -## Traditional RAG vs Vectorless -Traditional RAG vs Vectorless - -| Aspect | Traditional RAG | Vectorless | -|--------|----------------|------------| -| **Infrastructure** | Vector DB + Embedding Model | Just LLM API | -| **Document Structure** | Lost in chunking | Preserved | -| **Context** | Fragment only | Section + surrounding context | -| **Setup Time** | Hours to Days | Minutes | -| **Best For** | Unstructured text | Structured documents | - -## Example - -**Input:** -``` -Document: 100-page technical manual (PDF) -Query: "How do I reset the device?" -``` +## Quick Start -**Output:** -``` -Answer: "To reset the device, hold the power button for 10 seconds -until the LED flashes blue, then release..." +### Install -Source: Chapter 4 > Section 4.2 > Reset Procedure +```bash +pip install vectorless ``` -## When to Use - -✅ **Good fit:** -- Technical documentation -- Manuals and guides -- Structured reports -- Policy documents -- Any document with clear hierarchy - -❌ **Not ideal:** -- Unstructured text (tweets, chat logs) -- Very short documents (< 1 page) -- Pure Q&A datasets without structure - -## Quick Start - -
-Python +### Set your API key ```bash -pip install vectorless +export OPENAI_API_KEY="sk-..." ``` +### Index and Query + ```python from vectorless import Engine, IndexContext -# Create engine (uses OPENAI_API_KEY env var) +# Create engine with a workspace directory engine = Engine(workspace="./data") -# Index a document -ctx = IndexContext.from_file("./report.pdf") -doc_id = engine.index(ctx) +# Index a document (PDF, Markdown, DOCX, HTML) +doc_id = engine.index(IndexContext.from_file("./report.pdf")) # Query result = engine.query(doc_id, "What is the total revenue?") -print(f"Answer: {result.content}") +print(result.content) +print(f"Score: {result.score}") ``` -
-
Rust @@ -122,158 +65,30 @@ print(f"Answer: {result.content}") vectorless = "0.1" ``` -```bash -cp vectorless.example.toml ./vectorless.toml -``` - ```rust -use vectorless::Engine; +use vectorless::client::{Engine, EngineBuilder, IndexContext}; #[tokio::main] async fn main() -> vectorless::Result<()> { - let client = Engine::builder() - .with_workspace("./workspace") - .build()?; - - let doc_id = client.index("./document.pdf").await?; + let engine = EngineBuilder::new() + .with_workspace("./data") + .build() + .await?; - let result = client.query(&doc_id, - "What are the system requirements?").await?; + // Index + let doc_id = engine.index(IndexContext::from_path("./report.pdf")).await?; + // Query + let result = engine.query(&doc_id, "What is the total revenue?").await?; println!("Answer: {}", result.content); - println!("Source: {}", result.path); Ok(()) } ``` - -
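The quick-start examples above end with a query result. The sketch below shows one way the result and the workspace might be inspected afterwards. It is a minimal sketch rather than the published API: `result.path`, `result.score`, `list_documents`, `remove`, and `IndexContext::from_path` are taken from other parts of this diff (the previous README and the client-module design notes) and may differ in the current crate.

```rust
// Minimal sketch, assuming the fields and helpers named above exist as described
// in the design notes elsewhere in this diff.
use vectorless::client::{EngineBuilder, IndexContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_workspace("./data")
        .build()
        .await?;

    let doc_id = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let result = engine.query(&doc_id, "What is the total revenue?").await?;

    // Besides the answer text, the result carries trace information.
    println!("Answer: {}", result.content);
    println!("Score: {}", result.score);
    println!("Source: {}", result.path); // navigation path, e.g. "Ch.4 > 4.2"

    // Workspace housekeeping, per the client-module design notes in this diff.
    for doc in engine.list_documents() {
        println!("indexed: {:?}", doc);
    }
    engine.remove(&doc_id)?;

    Ok(())
}
```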
- -## Features - -| Feature | Description | -|---------|-------------| -| **Zero Infrastructure** | No vector DB, no embedding model — just an LLM API | -| **Multi-format Support** | PDF, Markdown, DOCX, HTML out of the box | -| **Incremental Updates** | Add/remove documents without full re-index | -| **Traceable Results** | See the exact navigation path taken | -| **Feedback Learning** | Improves from user feedback over time | -| **Multi-turn Queries** | Handles complex questions with decomposition | - -## Configuration - -### Zero Configuration (Recommended) - -Just set `OPENAI_API_KEY` and you're ready to go: - -```bash -export OPENAI_API_KEY="sk-..." -``` - -
-Python - -```python -from vectorless import Engine - -# Uses OPENAI_API_KEY from environment -engine = Engine(workspace="./data") -``` -
-
-Rust - -```rust -use vectorless::Engine; - -let client = Engine::builder() - .with_workspace("./workspace") - .build().await?; -``` - -
- -### Environment Variables - -| Variable | Description | -|----------|-------------| -| `OPENAI_API_KEY` | LLM API key | -| `VECTORLESS_MODEL` | Default model (e.g., `gpt-4o-mini`) | -| `VECTORLESS_ENDPOINT` | API endpoint URL | -| `VECTORLESS_WORKSPACE` | Workspace directory | - -### Advanced Configuration - -For fine-grained control, use a config file: - -```bash -cp config.toml ./vectorless.toml -``` - -
-Python - -```python -from vectorless import Engine - -# Use full configuration file -engine = Engine(config_path="./vectorless.toml") - -# Or override specific settings -engine = Engine( - config_path="./vectorless.toml", - model="gpt-4o", # Override model from config -) -``` - -
- -
-Rust - -```rust -use vectorless::Engine; - -// Use full configuration file -let client = Engine::builder() - .with_config_path("./vectorless.toml") - .build().await?; - -// Or override specific settings -let client = Engine::builder() - .with_config_path("./vectorless.toml") - .with_model("gpt-4o", None) // Override model - .build().await?; -``` - -
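Between the zero-configuration defaults and an explicit config file sits the environment-variable layer from the table above. The fragment below is a sketch of how that layer might be exercised from Rust; it assumes the documented `VECTORLESS_*` variables are read when the engine is built, which is inferred from this README rather than confirmed by it.

```rust
use vectorless::Engine;

// Sketch: the documented VECTORLESS_* variables are assumed to be picked up
// at build time; builder parameters still take precedence (see the priority
// list that follows).
std::env::set_var("VECTORLESS_MODEL", "gpt-4o-mini");
std::env::set_var("VECTORLESS_WORKSPACE", "./data");

let client = Engine::builder()
    .build()
    .await?;
```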
- -### Configuration Priority - -Later overrides earlier: - -1. Default configuration -2. Auto-detected config file (`vectorless.toml`, `config.toml`, `.vectorless.toml`) -3. Explicit config file (`config_path` / `with_config_path`) -4. Environment variables -5. Constructor/builder parameters (highest priority) - -## Architecture - -Architecture - -### Core Components - -- **Index Pipeline** — Parses documents, builds tree, generates summaries -- **Retrieval Pipeline** — Analyzes query, navigates tree, returns results -- **Pilot** — LLM-powered navigator that guides retrieval decisions -- **Metrics Hub** — Unified observability for LLM calls, retrieval, and feedback - ## Examples - -See the [examples/](examples/) directory for more usage patterns. +See [examples/](examples/) for more Rust patterns — streaming, document graph, custom pilot, cross-document retrieval, and more.| ## Contributing diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index 9e0b0096..00000000 --- a/docs/README.md +++ /dev/null @@ -1,54 +0,0 @@ -# Vectorless Documentation - -Welcome to the Vectorless documentation. - -## What is Vectorless? - -Vectorless is a **reasoning-native document intelligence engine** that uses LLM-powered tree navigation instead of vector embeddings. It preserves document structure and uses intelligent navigation to find relevant content. - -## Key Features - -- **Dual Pipeline Architecture** - Separate Index and Retrieval pipelines -- **Pilot System** - LLM-guided navigation with layered fallback -- **Multi-Strategy Retrieval** - Keyword, LLM, and Structure-aware strategies -- **Zero Infrastructure** - No vector database, no embeddings -- **Multi-Format Support** - Markdown, PDF, DOCX, HTML - -## Getting Started - -- [Quick Start Guide](guides/quick-start.md) - Get up and running in 5 minutes - -## Guides - -| Guide | Description | -|-------|-------------| -| [Quick Start](guides/quick-start.md) | Get up and running quickly | -| [Dual Pipeline](guides/dual-pipeline.md) | Understand Index + Retrieval pipelines | -| [Pilot System](guides/pilot-system.md) | LLM-guided navigation | -| [Multi-Strategy Retrieval](guides/multi-strategy.md) | Keyword, LLM, Structure strategies | - -## Design Documents - -System architecture and core mechanism documentation. - -| Document | Description | -|----------|-------------| -| [pilot.md](design/pilot.md) | Pilot system design | -| [content-aggregation.md](design/content-aggregation.md) | Content aggregation design | -| [client-module.md](design/client-module.md) | Client API design | -| [v3.md](design/v3.md) | Version 3 architecture | - -## RFCs (Feature Proposals) - -Detailed design documents for new features. - -| RFC | Title | Status | -|-----|-------|--------| -| [0001](rfcs/0001-docx-parser.md) | DOCX Parser | Implemented | -| [0002](rfcs/0002-html-parser.md) | HTML Parser | Implemented | - -### RFC Process - -1. Create `rfcs/0XXX-feature-name.md` using the [template](rfcs/template.md) -2. Discuss and refine the design -3. 
Once approved, implement and update status to "Implemented" diff --git a/docs/design/architecture.svg b/docs/design/architecture.svg deleted file mode 100644 index cb782610..00000000 --- a/docs/design/architecture.svg +++ /dev/null @@ -1,198 +0,0 @@ - - - - - - Vectorless Architecture - - - - Engine Client - - - - Config (TOML) - • LLM pool settings - • Metrics config - • Pilot + feedback - • Scoring strategy - - - - Workspace - • Persistence (JSON) - • LRU Cache - • Feedback Store - - - - Index Pipeline - - - - Parse - MD/PDF - - - - - Build - Tree - - - - - Enhance - ToC - Sections - - - - - Enrich - LLM - Summary - - - - - Optimize - Thin - - - - Retrieval Pipeline - - - - Analyze - • Complexity detect - • Decompose query - - - - - Plan - • Strategy select - • Algorithm config - - - - - Search - • Tree traversal - • Pilot guidance - - - - - Judge - • Sufficiency check - • Backtrack control - - - Scoring Strategies: - - - Keyword Only - • TF-IDF overlap - • Fast, no API calls - - - BM25 - • IDF + TF normalization - • Better relevance - - - Hybrid (Default) - • 40% keyword + 60% BM25 - • Best balance - - - - Query Decomposition - • Split complex queries into sub-queries - • Execute in dependency order - - - - Pilot + Feedback Learning - • LLM-guided navigation - • Learns from user feedback - - - - LLM Executor - • Throttle control - • Retry with backoff - • Fallback chain - • Unified metrics - - - - Unified Metrics Hub - - - LLM Metrics - calls • tokens • latency • cost - - - Pilot Metrics - decisions • confidence • accuracy - - - Retrieval Metrics - paths • scores • cache hits - - - Feedback Stats - accuracy • samples • trends - - - - NeedMoreData / Backtrack - - - - Feedback Loop: User feedback → Store → Learner → Adjusted decisions - - - - - - - - Design Philosophy - - Zero Vectors - No embedding model - LLM-powered navigation - - Algorithm + LLM - Efficient + Semantic - Hybrid scoring - - Feedback Learning - Continuous improvement - Context-aware adjustments - - Multi-turn Support - Query decomposition - Dependency ordering - - - - - - - - - - - - - - - - - diff --git a/docs/design/client-module.md b/docs/design/client-module.md deleted file mode 100644 index e4ab796b..00000000 --- a/docs/design/client-module.md +++ /dev/null @@ -1,794 +0,0 @@ -# Client Module Refactoring Design - -## Overview - -This document describes the refactoring of the `client` module to achieve a more professional, product-level architecture with clear separation of concerns. - -## Current Problems - -### 1. God Object Anti-pattern -`engine.rs` (600+ lines) handles too many responsibilities: -- Document indexing -- Document retrieval -- Workspace management -- Configuration management -- Format detection -- Page parsing - -### 2. Mixed Abstraction Levels -High-level operations (`query()`) mixed with low-level utilities (`parse_page_range()`). - -### 3. No Session Management -Each operation is independent; no way to maintain context across multiple operations. - -### 4. Missing Event System -No progress callbacks or event hooks for long-running operations. - -### 5. Scattered State Management -State split across `Arc>`, `Arc>`, `Arc`. 
- ---- - -## Proposed Architecture - -### Module Structure - -``` -src/client/ -├── mod.rs # Re-exports and documentation -├── engine.rs # Core orchestrator (simplified) -├── builder.rs # Builder pattern (enhanced) -├── types.rs # Public API types -├── context.rs # Request context and configuration -├── session.rs # Session management -├── indexer.rs # Document indexing operations -├── retriever.rs # Query and retrieval operations -├── workspace.rs # Workspace operations (CRUD) -└── events.rs # Event system and callbacks -``` - -### Architecture Diagram - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Client API │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ EngineBuilder │───▶│ Engine │◀───│ Session │ │ -│ └──────────────┘ └──────┬───────┘ └──────────────┘ │ -│ │ │ -│ ┌──────────────┼──────────────┐ │ -│ ▼ ▼ ▼ │ -│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ -│ │ Indexer │ │ Retriever │ │ Workspace │ │ -│ │ Client │ │ Client │ │ Client │ │ -│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ -│ │ │ │ │ -│ └───────────────┴───────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌────────────────┐ │ -│ │ Context │ │ -│ │ (Request State)│ │ -│ └────────────────┘ │ -│ │ -│ ┌────────────────┐ │ -│ │ Events │ │ -│ │ (Callbacks) │ │ -│ └────────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ -``` - ---- - -## Component Design - -### 1. Context (`context.rs`) - -Request-scoped configuration and state management. - -```rust -/// Request context for client operations. -pub struct ClientContext { - /// Unique request ID for tracing. - pub request_id: Uuid, - - /// Request-specific configuration overrides. - pub config: RequestContextConfig, - - /// Event emitter for this request. - pub events: EventEmitter, - - /// Request metadata. - pub metadata: HashMap, - - /// Request deadline (for timeout). - pub deadline: Option, -} - -/// Request-specific configuration overrides. -pub struct RequestContextConfig { - /// Override top_k for retrieval. - pub top_k: Option, - - /// Override token budget. - pub token_budget: Option, - - /// Override content format. - pub content_format: Option, - - /// Enable/disable features. - pub features: FeatureFlags, -} - -/// Feature flags for request. -pub struct FeatureFlags { - pub include_summaries: bool, - pub include_content: bool, - pub enable_cache: bool, - pub enable_sufficiency_check: bool, -} -``` - -### 2. Session (`session.rs`) - -Multi-document session management. - -```rust -/// Session for managing multiple document operations. -pub struct Session { - /// Session ID. - pub id: Uuid, - - /// Session configuration. - config: SessionConfig, - - /// Active document contexts. - documents: HashMap, - - /// Shared engine reference. - engine: Engine, - - /// Session statistics. - stats: SessionStats, - - /// Created at timestamp. - created_at: DateTime, -} - -/// Document context within a session. -pub struct DocumentContext { - /// Document ID. - pub doc_id: String, - - /// Preloaded tree (cached). - tree: Option>, - - /// Document metadata. - meta: DocumentMeta, - - /// Access statistics. - access_count: usize, - last_accessed: DateTime, -} - -/// Session configuration. -pub struct SessionConfig { - /// Maximum documents to keep in memory. - pub max_cached_documents: usize, - - /// Preload strategy. - pub preload_strategy: PreloadStrategy, - - /// Cache eviction policy. 
- pub eviction_policy: EvictionPolicy, -} - -impl Session { - /// Create a new session. - pub fn new(engine: Engine) -> Self; - - /// Index a document into this session. - pub async fn index(&self, path: impl AsRef) -> Result; - - /// Query a document within this session. - pub async fn query(&self, doc_id: &str, question: &str) -> Result; - - /// Query across all documents in session. - pub async fn query_all(&self, question: &str) -> Result>; - - /// Get document tree (cached). - pub fn get_tree(&self, doc_id: &str) -> Result>; - - /// Preload documents for faster access. - pub async fn preload(&self, doc_ids: &[&str]) -> Result<()>; - - /// Clear session cache. - pub fn clear_cache(&self); - - /// Get session statistics. - pub fn stats(&self) -> &SessionStats; -} -``` - -### 3. Indexer Client (`indexer.rs`) - -Document indexing operations. - -```rust -/// Document indexing client. -pub struct IndexerClient { - /// Pipeline executor. - executor: Arc>, - - /// Configuration. - config: IndexerConfig, -} - -/// Indexing configuration. -pub struct IndexerConfig { - /// Default index mode. - pub default_mode: IndexMode, - - /// Summary generation strategy. - pub summary_strategy: SummaryStrategy, - - /// Whether to generate node IDs. - pub generate_ids: bool, - - /// Whether to generate descriptions. - pub generate_descriptions: bool, -} - -impl IndexerClient { - /// Create a new indexer client. - pub fn new(executor: PipelineExecutor) -> Self; - - /// Index a document from file. - pub async fn index_file( - &self, - path: impl AsRef, - options: IndexOptions, - events: &EventEmitter, - ) -> Result; - - /// Index from raw content. - pub async fn index_content( - &self, - content: &str, - format: DocumentFormat, - options: IndexOptions, - ) -> Result; - - /// Detect document format. - pub fn detect_format(&self, path: &Path, options: &IndexOptions) -> Result; - - /// Validate document before indexing. - pub fn validate(&self, path: &Path) -> Result; -} - -/// Indexing events. -pub enum IndexEvent { - /// Started indexing. - Started { path: String }, - - /// Format detected. - FormatDetected { format: DocumentFormat }, - - /// Parsing progress. - ParsingProgress { percent: u8 }, - - /// Tree building complete. - TreeBuilt { node_count: usize }, - - /// Summary generation progress. - SummaryProgress { completed: usize, total: usize }, - - /// Indexing complete. - Complete { doc_id: String }, - - /// Error occurred. - Error { message: String }, -} -``` - -### 4. Retriever Client (`retriever.rs`) - -Query and retrieval operations. - -```rust -/// Document retrieval client. -pub struct RetrieverClient { - /// Pipeline retriever. - retriever: Arc, - - /// Configuration. - config: RetrieverConfig, -} - -/// Retrieval configuration. -pub struct RetrieverConfig { - /// Default top_k. - pub default_top_k: usize, - - /// Default token budget. - pub default_token_budget: usize, - - /// Content aggregator config. - pub content_config: ContentAggregatorConfig, - - /// Enable caching. - pub enable_cache: bool, -} - -impl RetrieverClient { - /// Create a new retriever client. - pub fn new(retriever: PipelineRetriever) -> Self; - - /// Query a document tree. - pub async fn query( - &self, - tree: &DocumentTree, - question: &str, - options: RetrieveOptions, - ctx: &ClientContext, - ) -> Result; - - /// Query with streaming results. - pub async fn query_stream( - &self, - tree: &DocumentTree, - question: &str, - options: RetrieveOptions, - ) -> impl Stream; - - /// Get similar nodes. 
- pub fn find_similar( - &self, - tree: &DocumentTree, - node_id: NodeId, - top_k: usize, - ) -> Result>; - - /// Get node context (ancestors + siblings). - pub fn get_node_context( - &self, - tree: &DocumentTree, - node_id: NodeId, - depth: usize, - ) -> Result; -} - -/// Query events for streaming. -pub enum QueryEvent { - /// Search started. - SearchStarted { query: String }, - - /// Node visited during search. - NodeVisited { node_id: String, title: String, score: f32 }, - - /// Candidate found. - CandidateFound { node_id: String, score: f32 }, - - /// Sufficiency check result. - SufficiencyCheck { level: SufficiencyLevel, tokens: usize }, - - /// Result ready. - ResultReady { result: RetrievalResult }, - - /// Query complete. - Complete { total_results: usize, confidence: f32 }, -} -``` - -### 5. Workspace Client (`workspace.rs`) - -Document persistence operations. - -```rust -/// Workspace management client. -pub struct WorkspaceClient { - /// Workspace storage. - workspace: Arc>, - - /// Configuration. - config: WorkspaceConfig, -} - -/// Workspace configuration. -pub struct WorkspaceConfig { - /// Auto-save interval (seconds). - pub auto_save_interval: Option, - - /// Maximum cache size. - pub max_cache_size: usize, -} - -impl WorkspaceClient { - /// Create a new workspace client. - pub fn new(workspace: Workspace) -> Self; - - /// Save a document. - pub fn save(&self, doc: &PersistedDocument) -> Result<()>; - - /// Load a document. - pub fn load(&self, doc_id: &str) -> Result>; - - /// Remove a document. - pub fn remove(&self, doc_id: &str) -> Result; - - /// Check if document exists. - pub fn exists(&self, doc_id: &str) -> Result; - - /// List all documents. - pub fn list(&self) -> Result>; - - /// Get document metadata. - pub fn get_meta(&self, doc_id: &str) -> Result>; - - /// Batch operations. - pub fn batch_remove(&self, doc_ids: &[&str]) -> Result; - - /// Clear workspace. - pub fn clear(&self) -> Result; - - /// Get workspace statistics. - pub fn stats(&self) -> WorkspaceStats; -} - -/// Workspace statistics. -pub struct WorkspaceStats { - pub document_count: usize, - pub total_size_bytes: u64, - pub cache_hit_rate: f32, - pub oldest_document: Option>, - pub newest_document: Option>, -} -``` - -### 6. Events (`events.rs`) - -Event system for callbacks and progress reporting. - -```rust -/// Event emitter for client operations. -pub struct EventEmitter { - /// Event handlers. - handlers: Vec>, - - /// Async handlers (for non-blocking events). - async_handlers: Vec>, -} - -/// Event handler trait. -pub trait EventHandler: Send + Sync { - fn handle(&self, event: &Event); -} - -/// Async event handler trait. -#[async_trait] -pub trait AsyncEventHandler: Send + Sync { - async fn handle(&self, event: &Event); -} - -/// Event types. -#[derive(Debug, Clone)] -pub enum Event { - /// Indexing events. - Index(IndexEvent), - - /// Query events. - Query(QueryEvent), - - /// Workspace events. - Workspace(WorkspaceEvent), - - /// Session events. - Session(SessionEvent), -} - -/// Workspace events. -pub enum WorkspaceEvent { - DocumentSaved { doc_id: String }, - DocumentLoaded { doc_id: String, cache_hit: bool }, - DocumentRemoved { doc_id: String }, - WorkspaceCleared { count: usize }, -} - -/// Session events. -pub enum SessionEvent { - SessionCreated { session_id: Uuid }, - DocumentAdded { doc_id: String }, - DocumentEvicted { doc_id: String, reason: EvictionReason }, - SessionClosed { session_id: Uuid }, -} - -impl EventEmitter { - /// Create a new event emitter. 
- pub fn new() -> Self; - - /// Add a sync handler. - pub fn on(mut self, handler: H) -> Self; - - /// Add an async handler. - pub fn on_async(mut self, handler: Arc) -> Self; - - /// Emit an event. - pub fn emit(&self, event: Event); - - /// Emit an event asynchronously. - pub async fn emit_async(&self, event: Event); -} - -/// Convenience handler builders. -impl EventEmitter { - /// Create handler from closure. - pub fn on_index(self, f: F) -> Self; - - /// Create handler from closure. - pub fn on_query(self, f: F) -> Self; - - /// Create progress callback. - pub fn on_progress(self, f: F) -> Self; -} - -/// Progress information. -pub struct Progress { - pub operation: Operation, - pub current: usize, - pub total: usize, - pub message: String, -} - -pub enum Operation { - Indexing, - Querying, - Loading, - Saving, -} -``` - -### 7. Simplified Engine (`engine.rs`) - -The main orchestrator, now much simpler. - -```rust -/// The main Engine client - orchestrates sub-clients. -pub struct Engine { - /// Configuration. - config: Arc, - - /// Indexer client. - indexer: IndexerClient, - - /// Retriever client. - retriever: RetrieverClient, - - /// Workspace client (optional). - workspace: Option, - - /// Event emitter. - events: EventEmitter, -} - -impl Engine { - /// Create a builder for custom configuration. - pub fn builder() -> EngineBuilder; - - // ============================================================ - // Convenience Methods (delegate to sub-clients) - // ============================================================ - - /// Index a document. - pub async fn index(&self, path: impl AsRef) -> Result { - self.index_with_options(path, IndexOptions::default()).await - } - - /// Index with options. - pub async fn index_with_options( - &self, - path: impl AsRef, - options: IndexOptions, - ) -> Result; - - /// Query a document. - pub async fn query(&self, doc_id: &str, question: &str) -> Result; - - /// Create a session for multi-document operations. - pub fn session(&self) -> Session; - - /// Get the indexer client. - pub fn indexer(&self) -> &IndexerClient; - - /// Get the retriever client. - pub fn retriever(&self) -> &RetrieverClient; - - /// Get the workspace client. - pub fn workspace(&self) -> Option<&WorkspaceClient>; - - /// Get configuration. - pub fn config(&self) -> &Config; - - // ============================================================ - // Document Operations (delegate to workspace) - // ============================================================ - - /// List documents. - pub fn list_documents(&self) -> Vec; - - /// Get document structure. - pub fn get_structure(&self, doc_id: &str) -> Result; - - /// Get page content. - pub fn get_page_content(&self, doc_id: &str, pages: &str) -> Result; - - /// Remove document. - pub fn remove(&self, doc_id: &str) -> Result; - - /// Check existence. - pub fn exists(&self, doc_id: &str) -> Result; -} -``` - ---- - -## API Examples - -### Basic Usage (Same as Before) - -```rust -let client = EngineBuilder::new() - .with_workspace("./workspace") - .build()?; - -// Index -let doc_id = client.index("./document.md").await?; - -// Query -let result = client.query(&doc_id, "What is this?").await?; -``` - -### With Events - -```rust -let client = EngineBuilder::new() - .with_workspace("./workspace") - .with_events( - EventEmitter::new() - .on_index(|e| match e { - IndexEvent::Complete { doc_id } => println!("Indexed: {}", doc_id), - _ => {} - }) - .on_query(|e| match e { - QueryEvent::NodeVisited { title, score, .. 
} => { - println!("Visited: {} (score: {:.2})", title, score); - } - _ => {} - }) - ) - .build()?; -``` - -### Session-Based Multi-Document - -```rust -let client = EngineBuilder::new() - .with_workspace("./workspace") - .build()?; - -// Create session -let session = client.session(); - -// Index multiple documents -let doc1 = session.index("./doc1.md").await?; -let doc2 = session.index("./doc2.md").await?; -let doc3 = session.index("./doc3.md").await?; - -// Query across all documents -let results = session.query_all("What is the architecture?").await?; - -// Query single document (cached tree) -let result = session.query(&doc1, "Summary?").await?; - -// Session stats -println!("Cache hit rate: {:.2}%", session.stats().cache_hit_rate * 100.0); -``` - -### Streaming Query - -```rust -let client = EngineBuilder::new() - .with_workspace("./workspace") - .build()?; - -// Stream query results -let mut stream = client.retriever() - .query_stream(&tree, "What is X?", RetrieveOptions::default()); - -while let Some(event) = stream.next().await { - match event { - QueryEvent::NodeVisited { title, score, .. } => { - println!("Exploring: {}", title); - } - QueryEvent::ResultReady { result } => { - println!("Found: {}", result.title); - } - QueryEvent::Complete { total_results, confidence } => { - println!("Done: {} results, confidence: {:.2}", total_results, confidence); - } - _ => {} - } -} -``` - -### Request Context - -```rust -let ctx = ClientContext::new() - .with_top_k(10) - .with_token_budget(8000) - .with_deadline(Duration::from_secs(30)); - -let result = client.retriever() - .query(&tree, "complex question", options, &ctx) - .await?; -``` - ---- - -## Migration Path - -### Phase 1: Add New Modules (Non-Breaking) -1. Create `context.rs`, `events.rs` -2. Create `indexer.rs`, `retriever.rs`, `workspace.rs` as wrappers -3. Update `engine.rs` to use sub-clients internally -4. All existing API remains unchanged - -### Phase 2: Add Session Support (Non-Breaking) -1. Add `session.rs` -2. Add `Engine::session()` method -3. Add multi-document query support - -### Phase 3: Enhance Events (Non-Breaking) -1. Add streaming query support -2. Add progress callbacks -3. Add async event handlers - -### Phase 4: Deprecate Old API (Breaking, Future) -1. Mark direct workspace access as deprecated -2. Encourage use of sub-clients -3. Eventually remove deprecated methods - ---- - -## File Structure After Refactoring - -``` -src/client/ -├── mod.rs # ~50 lines - exports and docs -├── engine.rs # ~150 lines - orchestration only -├── builder.rs # ~200 lines - enhanced builder -├── types.rs # ~250 lines - public types -├── context.rs # ~150 lines - request context -├── session.rs # ~200 lines - session management -├── indexer.rs # ~200 lines - indexing ops -├── retriever.rs # ~200 lines - retrieval ops -├── workspace.rs # ~150 lines - workspace ops -└── events.rs # ~200 lines - event system -``` - -Total: ~1750 lines (vs current ~1000 lines, but much better organized) - ---- - -## Benefits - -1. **Single Responsibility**: Each module has one clear purpose -2. **Testability**: Sub-clients can be tested independently -3. **Extensibility**: Easy to add new features without touching Engine -4. **Performance**: Session caching reduces redundant loads -5. **Observability**: Events provide visibility into operations -6. **API Clarity**: Clear separation between indexing, retrieval, and storage -7. **Streaming**: Support for progressive results -8. 
**Context Management**: Request-scoped configuration diff --git a/docs/design/comparison.svg b/docs/design/comparison.svg deleted file mode 100644 index d78e3ea7..00000000 --- a/docs/design/comparison.svg +++ /dev/null @@ -1,134 +0,0 @@ - - - - - - - Traditional RAG - - - - Document - - - - - - - - - chunk - - - - - - - Chunks - - - - embed - - - - Vector DB - [0.12, 0.45, ...] - [0.33, 0.21, ...] - [0.87, 0.03, ...] - [0.56, 0.78, ...] - ... - - - - Result - Fragment #47 - (no context) - - - - - - ❌ Structure lost - ❌ No context - - - - - - - vs - - - - Vectorless - - - - Document - - - - Root - - - Ch.1 - - - Ch.2 - - - 2.1 - - - 2.2 - - - - query - - - - LLM - Navigator - - - "Ch.2 looks right" - - - "Try section 2.1" - - - - - - - Result - Section 2.1 - + parent context - + sibling context - traceable path - - - ✓ Structure preserved - ✓ Full context - - - - Infrastructure: Vector DB + Embedding Model + Chunking Strategy - Setup time: Hours to Days - - - Infrastructure: Just an LLM API - Setup time: Minutes - - - - - - - - - - - diff --git a/docs/design/content-aggregation.md b/docs/design/content-aggregation.md deleted file mode 100644 index 22a7d7dd..00000000 --- a/docs/design/content-aggregation.md +++ /dev/null @@ -1,361 +0,0 @@ -# Content Aggregation Design - -> Version: 1.0 -> Status: Draft -> Last Updated: 2026-04-04 - -## Overview - -Content Aggregation is the final stage of the retrieval pipeline that transforms candidate nodes into structured, relevant content for the user. This document describes the design for a precision-focused, budget-aware content aggregation system. - -## Problem Statement - -### Current Implementation - -The current `aggregate_content` in `JudgeStage` collects content naively: - -``` -Candidate Node → Node's own content + ALL descendant leaf content -``` - -### Issues - -| Issue | Impact | -|-------|--------| -| **No relevance filtering** | Returns all content from subtree, including irrelevant parts | -| **No token budget** | Large documents may return tens of thousands of tokens | -| **No prioritization** | All leaf content treated equally | -| **Lost structure** | Flat concatenation loses hierarchical context | - -## Design Goals - -1. **Precision First** - Only return truly relevant content -2. **Budget Aware** - Optimize within token constraints -3. **Structure Aware** - Maintain hierarchical context -4. **Incremental** - Support progressive refinement -5. **Explainable** - Traceable selection decisions - -## Architecture - -### High-Level Flow - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Content Aggregator │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Relevance │ │ Budget │ │ Structure │ │ -│ │ Scorer │─▶│ Allocator │─▶│ Builder │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -│ ↑ ↑ ↑ │ -│ │ │ │ │ -│ ┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐ │ -│ │ Query- │ │ Token │ │ Hierarchy │ │ -│ │ Node │ │ Budget │ │ Context │ │ -│ │ Scoring │ │ Config │ │ Assembly │ │ -│ └─────────────┘ └─────────────┘ └─────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Processing Pipeline - -``` -Candidate Nodes - │ - ▼ -┌─────────────────┐ -│ 1. Collect │ Gather all nodes from candidates + descendants -│ Nodes │ -└────────┬────────┘ - │ - ▼ -┌─────────────────┐ -│ 2. 
Score │ Compute relevance score for each content chunk -│ Relevance │ -└────────┬────────┘ - │ - ▼ -┌─────────────────┐ -│ 3. Filter │ Remove content below relevance threshold -│ by Score │ -└────────┬────────┘ - │ - ▼ -┌─────────────────┐ -│ 4. Allocate │ Distribute token budget optimally -│ Budget │ -└────────┬────────┘ - │ - ▼ -┌─────────────────┐ -│ 5. Build │ Assemble structured output -│ Structure │ -└────────┬────────┘ - │ - ▼ - Final Content -``` - -## Module Design - -### 1. RelevanceScorer - -Computes fine-grained relevance scores for content. - -```rust -pub struct RelevanceScorer { - query_keywords: Vec, - strategy: ScoringStrategy, -} - -pub enum ScoringStrategy { - /// Fast keyword matching only - KeywordOnly, - /// Keyword + BM25 scoring - KeywordWithBM25, - /// Keyword + LLM reranking - Hybrid { rerank_top_k: usize }, -} - -pub struct ContentRelevance { - pub node_id: NodeId, - pub chunk: ContentChunk, - pub score: f32, - pub components: ScoreComponents, -} - -pub struct ScoreComponents { - pub keyword_score: f32, // Keyword match quality - pub depth_penalty: f32, // Distance from candidate node - pub path_bonus: f32, // Parent node relevance - pub density_score: f32, // Information density -} -``` - -#### Scoring Formula - -``` -final_score = ( - keyword_score * 0.50 + - depth_penalty * 0.20 + - path_bonus * 0.15 + - density_score * 0.15 -).clamp(0.0, 1.0) - -where: - depth_penalty = 0.9^depth // 10% penalty per level - path_bonus = parent_score * 0.2 - density_score = (1 - stopword_ratio) * 0.7 + entity_ratio * 0.3 -``` - -### 2. BudgetAllocator - -Distributes token budget across scored content. - -```rust -pub struct BudgetAllocator { - total_budget: usize, - strategy: AllocationStrategy, -} - -pub enum AllocationStrategy { - /// Select highest-scoring content first - Greedy, - /// Distribute proportionally to scores - Proportional, - /// Ensure each depth level has representation - Hierarchical { min_per_level: f32 }, -} - -pub struct AllocationResult { - pub selected: Vec, - pub tokens_used: usize, - pub remaining_budget: usize, -} - -pub struct SelectedContent { - pub node_id: NodeId, - pub content: String, - pub tokens: usize, - pub score: f32, - pub truncation: Option, -} -``` - -#### Hierarchical Allocation - -``` -For each depth level (0 to max_depth): - 1. Sort content by score - 2. Allocate up to min_per_level budget - 3. Continue until level budget exhausted - 4. Move to next level - -Benefits: -- Ensures context from all levels -- Prevents shallow-only or deep-only results -- Maintains document structure awareness -``` - -### 3. StructureBuilder - -Assembles selected content into structured output. - -```rust -pub struct StructureBuilder { - format: OutputFormat, - include_metadata: bool, -} - -pub enum OutputFormat { - Markdown, - Json, - Tree, - Flat, -} - -pub struct StructuredContent { - pub content: String, - pub structure: Option, - pub metadata: ContentMetadata, -} -``` - -#### Markdown Output Format - -```markdown -## Parent Section -Parent content here... - -### Child Section A -Child A content here... - -### Child Section B -Child B content here... 
-``` - -## Configuration - -```toml -[retrieval.content] -# Maximum tokens to return -token_budget = 4000 - -# Minimum relevance score (0.0 - 1.0) -min_relevance_score = 0.3 - -# Scoring strategy: "keyword_only" | "keyword_bm25" | "hybrid" -scoring_strategy = "keyword_bm25" - -# Output format: "markdown" | "json" | "tree" -output_format = "markdown" - -# Include relevance scores in output -include_scores = false - -# Hierarchical allocation minimum per level -hierarchical_min_per_level = 0.1 -``` - -## Integration Points - -### JudgeStage Integration - -```rust -impl JudgeStage { - pub fn with_content_aggregator(mut self, config: ContentAggregatorConfig) -> Self { - self.content_aggregator = Some(ContentAggregator::new(config)); - self - } - - fn aggregate_content(&self, ctx: &PipelineContext) -> (String, usize) { - if let Some(aggregator) = &self.content_aggregator { - aggregator.aggregate(&ctx.candidates, &ctx.tree, &ctx.query) - } else { - // Fallback to legacy behavior - self.aggregate_content_legacy(ctx) - } - } -} -``` - -### RetrieveOptions Extension - -```rust -impl RetrieveOptions { - pub fn with_content_config(mut self, config: ContentAggregatorConfig) -> Self { - self.content_config = Some(config); - self - } -} -``` - -## Performance Characteristics - -### Latency by Strategy - -| Strategy | Latency | Precision | Use Case | -|----------|---------|-----------|----------| -| `KeywordOnly` | ~1ms | Medium | Quick preview | -| `KeywordWithBM25` | ~5ms | High | Default choice | -| `Hybrid` | ~200ms | Highest | Precision queries | - -### Memory Usage - -- Scorer: O(n) where n = total content length -- Allocator: O(m) where m = number of chunks -- Builder: O(k) where k = selected content size - -## Future Enhancements - -1. **Semantic Chunking** - Split content by semantic boundaries, not just nodes -2. **LLM Reranking** - Use LLM to rerank top-k chunks -3. **Query-Aware Truncation** - Truncate based on query relevance, not just length -4. **Caching** - Cache aggregation results for repeated queries -5. 
**Streaming** - Stream content as it's selected - -## File Structure - -``` -src/retrieval/content/ -├── mod.rs # Module entry point -├── aggregator.rs # Main aggregator logic -├── scorer.rs # Relevance scoring -├── budget.rs # Token budget allocation -├── builder.rs # Structured output building -├── truncation.rs # Smart truncation utilities -└── config.rs # Configuration types -``` - -## Implementation Priority - -| Phase | Component | Priority | -|-------|-----------|----------| -| P0 | `RelevanceScorer` (keyword) | High | -| P0 | `BudgetAllocator` (greedy) | High | -| P1 | `StructureBuilder` (markdown) | Medium | -| P1 | BM25 scoring | Medium | -| P2 | Hybrid strategy (LLM rerank) | Low | -| P2 | Caching layer | Low | - -## Testing Strategy - -### Unit Tests - -- Scorer: Test keyword extraction, BM25 calculation, density scoring -- Allocator: Test budget distribution, truncation, edge cases -- Builder: Test output formats, structure preservation - -### Integration Tests - -- End-to-end aggregation with real documents -- Performance benchmarks -- Token budget compliance - -### Quality Metrics - -- Precision@k: Relevance of top-k selected chunks -- Recall: Coverage of relevant content -- Latency: P50, P95, P99 response times diff --git a/docs/design/feedback-learning.md b/docs/design/feedback-learning.md deleted file mode 100644 index 3bb90076..00000000 --- a/docs/design/feedback-learning.md +++ /dev/null @@ -1,587 +0,0 @@ -# Feedback Learning Design Document - -> Pilot Feedback Learning System - Continuously improving decisions from user feedback - -## Overview - -Feedback Learning is Pilot's learning subsystem that continuously optimizes Pilot's decision-making capabilities by collecting user feedback on retrieval results. The system tracks decision accuracy across different scenarios and adjusts confidence levels and strategies for subsequent decisions accordingly. - -### Design Goals - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Design Goals │ -├─────────────────────────────────────────────────────────────────┤ -│ 1. Collect Feedback - Record user ratings on retrieval results │ -│ 2. Learn Patterns - Identify scenarios where Pilot performs │ -│ well or poorly │ -│ 3. Adjust Decisions - Modify confidence and strategies based │ -│ on historical performance │ -│ 4. Continuous Improvement - Decision quality improves over time │ -│ as data accumulates │ -└─────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 1. Overall Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Feedback Learning System Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌───────────────────────────────────────────────────────────────────────┐ │ -│ │ Data Flow │ │ -│ │ │ │ -│ │ Retrieval Complete │ │ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ Feedback │────▶│ Feedback │────▶│ Pilot │ │ │ -│ │ │ Record │ │ Store │ │ Learner │ │ │ -│ │ └─────────────┘ └─────────────┘ └──────┬──────┘ │ │ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ ┌─────────────┐ │ │ -│ │ │ Decision │ │ │ -│ │ │ Adjustment │ │ │ -│ │ └─────────────┘ │ │ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ Next Retrieval Decision │ │ -│ │ │ │ -│ └───────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 2. 
Core Components - -### 2.1 FeedbackRecord - Feedback Record - -```rust -/// Feedback record -pub struct FeedbackRecord { - /// Unique feedback ID - pub id: FeedbackId, - /// Associated decision ID - pub decision_id: DecisionId, - /// Whether the decision was correct - pub was_correct: bool, - /// Pilot's confidence at that time - pub pilot_confidence: f64, - /// Intervention point type - pub intervention_point: InterventionPoint, - /// Query hash (for aggregating similar queries) - pub query_hash: u64, - /// Path hash (for aggregating similar paths) - pub path_hash: u64, - /// Timestamp - pub timestamp_ms: u64, - /// Optional user comment - pub comment: Option, -} -``` - -### 2.2 FeedbackStore - Feedback Storage - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ FeedbackStore Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ FeedbackStore │ │ -│ │ │ │ -│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ -│ │ │ records │ │ intervention_ │ │ query_stats │ │ │ -│ │ │ Vec │ │ stats │ │ HashMap │ │ │ -│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ -│ │ │ │ -│ │ ┌─────────────────┐ │ │ -│ │ │ path_stats │ │ │ -│ │ │ HashMap │ │ │ -│ │ └─────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Statistics Dimensions: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ 1. Aggregate by InterventionPoint │ │ -│ │ - Accuracy for each: START / FORK / BACKTRACK / EVALUATE │ │ -│ │ │ │ -│ │ 2. Aggregate by Query │ │ -│ │ - Historical performance for similar queries │ │ -│ │ │ │ -│ │ 3. Aggregate by Path │ │ -│ │ - Historical performance for similar paths │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 2.3 PilotLearner - Learner - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ PilotLearner Workflow │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Input: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ - intervention_point: Current intervention point type │ │ -│ │ - query_hash: Hash value of the query │ │ -│ │ - path_hash: Hash value of the path │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Query Historical Statistics: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ 1. Get overall accuracy for intervention_point │ │ -│ │ 2. Get specific accuracy for query_hash (if available) │ │ -│ │ 3. 
Get specific accuracy for path_hash (if available) │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Output DecisionAdjustment: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ pub struct DecisionAdjustment { │ │ -│ │ /// Confidence adjustment (added to Pilot confidence) │ │ -│ │ pub confidence_delta: f64, │ │ -│ │ /// Whether to skip intervention (trust algorithm) │ │ -│ │ pub skip_intervention: bool, │ │ -│ │ /// Algorithm weight vs LLM weight │ │ -│ │ pub algorithm_weight: f64, │ │ -│ │ } │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 3. Learning Strategies - -### 3.1 Accuracy Thresholds - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Accuracy Threshold Strategy │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Configuration Parameters: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ min_samples: 10 // Minimum samples before adjusting │ │ -│ │ high_accuracy_threshold: 0.8 // High accuracy threshold │ │ -│ │ low_accuracy_threshold: 0.5 // Low accuracy threshold │ │ -│ │ max_confidence_delta: 0.2 // Maximum confidence adjustment │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Decision Logic: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ if accuracy >= high_accuracy_threshold (0.8): │ │ -│ │ // High accuracy: trust LLM, boost confidence │ │ -│ │ confidence_delta = +0.2 │ │ -│ │ algorithm_weight = 0.3 // More reliance on LLM │ │ -│ │ │ │ -│ │ elif accuracy <= low_accuracy_threshold (0.5): │ │ -│ │ // Low accuracy: trust algorithm, reduce confidence │ │ -│ │ confidence_delta = -0.2 │ │ -│ │ algorithm_weight = 0.7 // More reliance on algorithm │ │ -│ │ │ │ -│ │ if accuracy < 0.3: │ │ -│ │ // Very low: skip LLM call, use algorithm only │ │ -│ │ skip_intervention = true │ │ -│ │ │ │ -│ │ else: │ │ -│ │ // Medium accuracy: keep defaults │ │ -│ │ confidence_delta = 0.0 │ │ -│ │ algorithm_weight = 0.5 │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 3.2 Multi-Layer Statistics Fusion - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Multi-Layer Statistics Fusion │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Three-Layer Statistics: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ Layer 1: InterventionPoint Level (Coarse-grained) │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ Example: FORK point overall accuracy = 0.75 │ │ │ -│ │ │ Impact: Base adjustment │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ Layer 2: Query Level (Medium-grained) │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ Example: Similar query accuracy = 0.85 │ │ │ -│ │ │ Impact: If higher than overall, +0.05 confidence │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ Layer 3: Path Level (Fine-grained) │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ Example: 
Similar path accuracy = 0.92 │ │ │ -│ │ │ Impact: If very high, +0.05 confidence │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Fusion Example: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ Scenario: FORK point, similar query, similar path │ │ -│ │ │ │ -│ │ 1. FORK overall accuracy 0.75 → confidence_delta = +0.1 │ │ -│ │ 2. Query-specific accuracy 0.85 > 0.75 → confidence_delta += 0.05 │ │ -│ │ 3. Path-specific accuracy 0.92 > 0.9 → confidence_delta += 0.05 │ │ -│ │ │ │ -│ │ Final: confidence_delta = +0.2 (reached maximum) │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 4. Integration with LlmPilot - -### 4.1 Integration Points - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ LlmPilot and Learner Integration │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ LlmPilot Structure: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ pub struct LlmPilot { │ │ -│ │ client: LlmClient, │ │ -│ │ executor: Option>, │ │ -│ │ config: PilotConfig, │ │ -│ │ budget: BudgetController, │ │ -│ │ context_builder: ContextBuilder, │ │ -│ │ prompt_builder: PromptBuilder, │ │ -│ │ response_parser: ResponseParser, │ │ -│ │ learner: Option>, // ← Feedback learner │ │ -│ │ } │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Key Methods: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ // Add learner │ │ -│ │ pub fn with_learner(self, learner: Arc) -> Self │ │ -│ │ │ │ -│ │ // Create learner from feedback store │ │ -│ │ pub fn with_feedback_store(self, store: Arc) -> Self│ │ -│ │ │ │ -│ │ // Record feedback │ │ -│ │ pub fn record_feedback(&self, record: FeedbackRecord) │ │ -│ │ │ │ -│ │ // Get learner (read-only) │ │ -│ │ pub fn learner(&self) -> Option<&PilotLearner> │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 4.2 Decision Flow - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Decision Flow with Learning │ -└─────────────────────────────────────────────────────────────────────────────┘ - - ┌─────────────────┐ - │ call_llm() │ - └────────┬────────┘ - │ - ▼ - ┌──────────────────────────────┐ - │ 1. Build Context (Builder) │ - │ - query_section │ - │ - path_section │ - │ - candidates_section │ - └──────────────┬───────────────┘ - │ - ▼ - ┌──────────────────────────────┐ - │ 2. Get Learner Adjustment │ - │ if learner.is_some() { │ - │ query_hash = ctx.hash() │ - │ path_hash = ctx.hash() │ - │ adjustment = learner │ - │ .get_adjustment( │ - │ point, │ - │ query_hash, │ - │ path_hash │ - │ ) │ - │ } │ - └──────────────┬───────────────┘ - │ - ▼ - ┌──────────────────────────────┐ - │ 3. Check Skip Intervention │ - │ if adjustment.skip { │ - │ return default_decision │ - │ } │ - └──────────────┬───────────────┘ - │ - ▼ - ┌──────────────────────────────┐ - │ 4. Call LLM for Decision │ - │ decision = llm.complete() │ - └──────────────┬───────────────┘ - │ - ▼ - ┌──────────────────────────────┐ - │ 5. 
Apply Learner Adjustment │ - │ decision.confidence += │ - │ adjustment.confidence │ - │ .delta │ - └──────────────┬───────────────┘ - │ - ▼ - ┌─────────────────┐ - │ Return Adjusted │ - │ Decision │ - └─────────────────┘ -``` - ---- - -## 5. Usage Examples - -### 5.1 Basic Usage - -```rust -use std::sync::Arc; -use vectorless::retrieval::pilot::{ - LlmPilot, PilotConfig, - FeedbackStore, FeedbackRecord, PilotLearner, -}; -use vectorless::llm::LlmClient; - -// 1. Create feedback store -let store = Arc::new(FeedbackStore::in_memory()); - -// 2. Create Pilot with learner -let client = LlmClient::for_model("gpt-4o-mini"); -let pilot = LlmPilot::new(client, PilotConfig::default()) - .with_feedback_store(store.clone()); - -// 3. Execute retrieval (Pilot automatically applies learning adjustments) -let decision = pilot.decide(&state).await; - -// 4. Record user feedback -let record = FeedbackRecord::new( - decision_id, - was_correct, // User rating - decision.confidence as f64, - InterventionPoint::Fork, - query_hash, - path_hash, -); -pilot.record_feedback(record); - -// 5. Subsequent retrievals automatically leverage historical feedback -``` - -### 5.2 Persisting Feedback - -```rust -use vectorless::retrieval::pilot::feedback::FeedbackStoreConfig; - -// Create feedback store with persistence -let config = FeedbackStoreConfig::with_persistence("./data/feedback.json"); -let store = Arc::new(FeedbackStore::new(config)); - -// Load historical feedback at startup -store.load()?; - -// Persist periodically -store.persist()?; -``` - -### 5.3 Viewing Learning Effects - -```rust -// Get overall accuracy -let accuracy = learner.overall_accuracy(); -println!("Overall accuracy: {:.2}%", accuracy * 100.0); - -// Get statistics by intervention point -let stats = store.intervention_stats(); -println!("Fork accuracy: {:.2}%", stats.fork.accuracy() * 100.0); -println!("Start accuracy: {:.2}%", stats.start.accuracy() * 100.0); - -// Check if sufficient data exists -if learner.has_sufficient_data() { - println!("Learner has sufficient data for adjustments"); -} -``` - ---- - -## 6. Configuration Options - -```rust -/// Feedback store configuration -pub struct FeedbackStoreConfig { - /// Maximum number of records (memory limit) - pub max_records: usize, - /// Whether to persist - pub persist: bool, - /// Persistence path - pub storage_path: Option, -} - -/// Learner configuration -pub struct LearnerConfig { - /// Minimum samples (no adjustment below this) - pub min_samples: u64, - /// High accuracy threshold - pub high_accuracy_threshold: f64, - /// Low accuracy threshold - pub low_accuracy_threshold: f64, - /// Maximum confidence adjustment - pub max_confidence_delta: f64, -} - -impl Default for LearnerConfig { - fn default() -> Self { - Self { - min_samples: 10, - high_accuracy_threshold: 0.8, - low_accuracy_threshold: 0.5, - max_confidence_delta: 0.2, - } - } -} -``` - ---- - -## 7. 
Implementation Details - -### 7.1 Hash Calculation - -```rust -impl PilotContext { - /// Calculate query hash (for aggregating similar queries) - pub fn query_hash(&self) -> u64 { - use std::collections::hash_map::DefaultHasher; - use std::hash::{Hash, Hasher}; - let mut hasher = DefaultHasher::new(); - self.query_section.hash(&mut hasher); - hasher.finish() - } - - /// Calculate path hash (for aggregating similar paths) - pub fn path_hash(&self) -> u64 { - use std::collections::hash_map::DefaultHasher; - use std::hash::{Hash, Hasher}; - let mut hasher = DefaultHasher::new(); - self.path_section.hash(&mut hasher); - hasher.finish() - } -} -``` - -### 7.2 Statistics Calculation - -```rust -impl ContextStats { - /// Calculate accuracy - pub fn accuracy(&self) -> f64 { - if self.total == 0 { - 0.0 - } else { - self.correct as f64 / self.total as f64 - } - } - - /// Record new feedback (incremental update) - fn record(&mut self, was_correct: bool, confidence: f64) { - self.total += 1; - if was_correct { - self.correct += 1; - // Incremental update of average confidence - self.avg_confidence_correct = - (self.avg_confidence_correct * (self.correct - 1) as f64 + confidence) - / self.correct as f64; - } else { - let incorrect = self.total - self.correct; - self.avg_confidence_incorrect = - (self.avg_confidence_incorrect * (incorrect - 1) as f64 + confidence) - / incorrect as f64; - } - } -} -``` - ---- - -## 8. Future Extensions - -### 8.1 Potential Improvements - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Future Extension Directions │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ 1. Semantic Similarity Aggregation │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Current: Aggregate using exact hash │ │ -│ │ Future: Use embeddings to calculate semantic similarity, │ │ -│ │ aggregate semantically similar queries │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ 2. Time Decay │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Current: All historical feedback has equal weight │ │ -│ │ Future: Recent feedback has higher weight, old feedback │ │ -│ │ gradually decays │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ 3. Online Learning │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Current: Offline analysis, online application │ │ -│ │ Future: Real-time model parameter updates │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ 4. Personalized Learning │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Current: Global learning │ │ -│ │ Future: Learn separately per user/scenario │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 9. Code Structure - -``` -src/retrieval/pilot/ -├── mod.rs # Module entry point -├── feedback.rs # FeedbackStore, PilotLearner implementation -├── llm_pilot.rs # LlmPilot (integrates learner) -├── builder.rs # ContextBuilder (adds hash methods) -└── ... -``` diff --git a/docs/design/how-it-works.svg b/docs/design/how-it-works.svg deleted file mode 100644 index 62f4d139..00000000 --- a/docs/design/how-it-works.svg +++ /dev/null @@ -1,98 +0,0 @@ - - - - - - - 1. 
Your Document - - - - manual.pdf - - - - - - PDF, MD, DOCX - HTML, Text - - - - - - - 2. Build Tree - - - - 📖 Technical Manual - - - Ch.1 Introduction - - - Ch.2 Architecture - - - 2.1 System - - - 2.2 Implementation - - - - - - - 3. Query - - - - "How do I - reset?" - - - - 🧠 LLM Navigator - "Check Chapter 4" - "Try section 4.2" - - - - - - - 4. Result - - - - Section 4.2 - ## Reset Procedure - To reset, hold the - power button for... - ... - - - Path: Ch.4 → 4.2 - + surrounding context - - - - - 💡 Like reading a table of contents, then going to the right chapter — instead of searching every word in the book. - - - The LLM navigates the tree structure, just like a human would. - - - - - - - - - - - - diff --git a/docs/design/logo-horizontal.svg b/docs/design/logo-horizontal.svg deleted file mode 100644 index 5e31759e..00000000 --- a/docs/design/logo-horizontal.svg +++ /dev/null @@ -1,20 +0,0 @@ - - - - - - - - - - -vectorless - - - diff --git a/docs/design/lovable-vectorless.png b/docs/design/lovable-vectorless.png new file mode 100644 index 00000000..40bb7047 Binary files /dev/null and b/docs/design/lovable-vectorless.png differ diff --git a/docs/design/memo.md b/docs/design/memo.md deleted file mode 100644 index 99d6f851..00000000 --- a/docs/design/memo.md +++ /dev/null @@ -1,314 +0,0 @@ -# LLM Memoization System - -## Overview - -The memoization system provides intelligent caching for expensive LLM operations, reducing API costs and latency while maintaining semantic correctness. - -## Architecture - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ Memoization Layer │ -├─────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Engine │───▶│ Retriever │───▶│ LlmPilot │ │ -│ │ Builder │ │ Pipeline │ │ │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -│ │ │ │ │ -│ └───────────────────┴───────────────────┘ │ -│ │ │ -│ ┌────────▼────────┐ │ -│ │ MemoStore │ │ -│ │ │ │ -│ │ ┌───────────┐ │ │ -│ │ │ LRU Cache │ │ │ -│ │ └───────────┘ │ │ -│ │ ┌───────────┐ │ │ -│ │ │ Stats │ │ │ -│ │ └───────────┘ │ │ -│ │ ┌───────────┐ │ │ -│ │ │ TTL │ │ │ -│ │ └───────────┘ │ │ -│ └─────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -## Key Components - -### MemoKey - -Content-addressed cache key that ensures cache hits only occur when inputs are semantically identical. - -```rust -pub struct MemoKey { - /// Type of operation (Summary, PilotDecision, QueryAnalysis, etc.) - pub op_type: MemoOpType, - - /// Fingerprint of the input content (BLAKE2b-128) - pub input_fp: Fingerprint, - - /// Model identifier for cache invalidation when model changes - pub model_id: Option, - - /// Version for cache invalidation when algorithm changes - pub version: u32, - - /// Additional context fingerprint (e.g., navigation context for pilot) - pub context_fp: Fingerprint, -} -``` - -### MemoStore - -Thread-safe LRU cache with TTL expiration and optional disk persistence. 
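-
-To make the caching pattern concrete, here is a self-contained toy sketch of the get-or-compute flow the store wraps around expensive LLM calls; it uses a plain `HashMap` and a fake fingerprint, whereas the real `MemoStore` adds LRU eviction, TTL expiration, statistics, and persistence:
-
-```rust
-use std::collections::HashMap;
-
-// Toy illustration only: not the MemoStore API.
-fn summarize(cache: &mut HashMap<u64, String>, fingerprint: u64, text: &str) -> String {
-    if let Some(hit) = cache.get(&fingerprint) {
-        return hit.clone(); // cache hit: no LLM call, no cost
-    }
-    let summary = format!("summary ({} chars)", text.len()); // stand-in for the LLM call
-    cache.insert(fingerprint, summary.clone());
-    summary
-}
-
-fn main() {
-    let mut cache = HashMap::new();
-    let text = "Chapter 2: Architecture ...";
-    let fp = 0xDEAD_BEEF_u64; // stands in for the BLAKE2b-128 fingerprint in MemoKey
-    println!("{}", summarize(&mut cache, fp, text)); // miss: computes and stores
-    println!("{}", summarize(&mut cache, fp, text)); // hit: served from cache
-}
-```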
- -```rust -pub struct MemoStore { - cache: Arc>>, - stats: Arc>, - ttl: Duration, - model_id: Option, - version: u32, -} -``` - -**Features:** -- LRU eviction policy (default: 10,000 entries) -- TTL-based expiration (default: 7 days) -- Optional disk persistence (JSON format) -- Thread-safe access via `parking_lot::RwLock` - -### Integration Points - -| Component | Operation Type | Description | -|-----------|---------------|-------------| -| `LlmSummaryGenerator` | `Summary` | Node summary generation | -| `LlmPilot` | `PilotDecision` | Navigation decision caching | -| Query Analyzer | `QueryAnalysis` | Query complexity/intent analysis | -| Content Extractor | `Extraction` | Structured data extraction | - -## Design Principles - -### 1. Layered Architecture - -Each layer can be independently configured and tested: - -``` -Engine → PipelineRetriever → LlmPilot → MemoStore -``` - -Benefits: -- `MemoStore` can be reused by multiple components -- Each layer has single responsibility -- Easy to mock for testing - -### 2. Non-Intrusive Integration - -Memoization is optional and doesn't break existing APIs: - -```rust -// Without memoization (works as before) -let pilot = LlmPilot::new(client, config); - -// With memoization (opt-in) -let pilot = LlmPilot::new(client, config) - .with_memo_store(store); -``` - -### 3. Smart Cache Key Design - -Cache keys include semantic context for precise invalidation: - -```rust -// Key automatically invalidates when: -// - Model changes (model_id field) -// - Algorithm version changes (version field) -// - Input content changes (input_fp field) -// - Navigation context changes (context_fp field) -``` - -### 4. Cost Tracking - -The system tracks savings to quantify the value of caching: - -```rust -pub struct MemoStats { - pub entries: usize, - pub hits: u64, - pub misses: u64, - pub tokens_saved: u64, - pub cost_saved: f64, -} - -impl MemoStats { - pub fn hit_rate(&self) -> f64 { - let total = self.hits + self.misses; - if total == 0 { 0.0 } else { self.hits as f64 / total as f64 } - } -} -``` - -### 5. 
Flexible Invalidation Strategies - -```rust -// Time-based (automatic) -store.with_ttl(Duration::days(7)) - -// By operation type -store.invalidate_by_op_type(MemoOpType::PilotDecision) - -// By model prefix -store.invalidate_by_model_prefix("gpt-4") - -// Manual -store.remove(&key) -store.clear() -``` - -## Usage Examples - -### Basic Setup - -```rust -use vectorless::memo::MemoStore; -use chrono::Duration; - -// Create with custom settings -let store = MemoStore::new() - .with_ttl(Duration::days(7)) - .with_model("gpt-4o") - .with_version(1); -``` - -### With Engine Builder - -```rust -use vectorless::client::EngineBuilder; - -// Option 1: Custom memo store -let memo_store = MemoStore::new() - .with_ttl(Duration::days(7)) - .with_model("gpt-4o"); - -let engine = EngineBuilder::new() - .with_workspace("./data") - .with_memo_store(memo_store) - .with_openai(api_key) - .build() - .await?; - -// Option 2: Default (auto-created with config model) -let engine = EngineBuilder::new() - .with_workspace("./data") - .with_openai(api_key) - .build() - .await?; -``` - -### Monitoring Cache Performance - -```rust -// Async stats (includes all metrics) -let stats = store.stats().await; -println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0); -println!("Tokens saved: {}", stats.tokens_saved); - -// Sync snapshot (for monitoring without async) -let stats = store.stats_snapshot(); -println!("Cache entries: {}", stats.entries); -``` - -### Cache Invalidation - -```rust -// When switching models -store.invalidate_by_model_prefix("gpt-3.5"); - -// When algorithm changes -store.invalidate_by_op_type(MemoOpType::PilotDecision); - -// Manual pruning of expired entries -let removed = store.prune_expired(); -``` - -### Persistence - -```rust -// Save to disk -store.save(Path::new("./cache/memo.json")).await?; - -// Load from disk (on startup) -store.load(Path::new("./cache/memo.json")).await?; -``` - -## Performance Characteristics - -### Concurrency - -| Component | Lock Type | Rationale | -|-----------|-----------|-----------| -| LRU Cache | `parking_lot::RwLock` | High-performance, allows concurrent reads | -| Statistics | `tokio::sync::RwLock` | Async-compatible for integration | -| Atomic Stats | `AtomicU64` | Lock-free for hot paths | - -### Memory - -- Default capacity: 10,000 entries -- Per-entry overhead: ~200-500 bytes (depending on cached value size) -- Estimated memory: 2-5 MB at full capacity - -### Latency - -| Operation | Typical Latency | -|-----------|-----------------| -| Cache hit | < 1 µs | -| Cache miss (no compute) | < 5 µs | -| Cache miss (with LLM) | 100-2000 ms | - -## Cost Savings Estimation - -### Typical Document Retrieval Scenario - -| Scenario | Without Cache | With Cache | Savings | -|----------|---------------|------------|---------| -| First query | 5-10 LLM calls | 5-10 LLM calls | 0% | -| Repeated query | 5-10 LLM calls | 0-1 LLM calls | **80-100%** | -| Similar query | 5-10 LLM calls | 2-3 LLM calls | **50-70%** | - -### Token Savings Example - -```rust -// Assuming GPT-4 pricing: $0.03 / 1K input tokens, $0.06 / 1K output tokens -// Average Pilot decision: 500 input tokens, 100 output tokens - -// Without cache (100 queries): -// Cost = 100 * (500 * 0.03/1000 + 100 * 0.06/1000) = $2.10 - -// With 80% hit rate: -// Cost = 20 * $0.021 = $0.42 -// Savings = $1.68 (80%) -``` - -## Future Improvements - -### Potential Enhancements - -1. **Semantic Cache Keys**: Use embedding similarity for fuzzy matching -2. **Distributed Cache**: Share cache across multiple instances via Redis -3. 
**Compression**: Compress cached values for large responses -4. **Warm-up**: Pre-populate cache with common patterns -5. **Analytics Dashboard**: Real-time visualization of cache performance - -### Implementation Notes - -- Consider using `AtomicU64` for all stats to eliminate async lock overhead -- Cache `MemoKey::fingerprint()` result for frequently used keys -- Add automatic periodic persistence with configurable interval - -## Related Documentation - -- [Fingerprint System](./fingerprint.md) - Content-addressed hashing -- [Incremental Indexing](./incremental.md) - Change detection for reindexing -- [Pilot Architecture](./pilot.md) - LLM-based navigation intelligence diff --git a/docs/design/opt.md b/docs/design/opt.md deleted file mode 100644 index 0840e197..00000000 --- a/docs/design/opt.md +++ /dev/null @@ -1,414 +0,0 @@ -# Phase 2: Performance Optimization Design - -## Overview - -This document outlines the performance optimization strategies for vectorless v0.3.0, targeting millisecond-level response times. The optimizations are prioritized based on infrastructure readiness and expected impact. - -## Priority Order - -| Priority | Task | Status | Estimated Effort | -|----------|------|--------|------------------| -| 1 | Cache Strategy Optimization | **Ready** | 1 day | -| 2 | Incremental Indexing Optimization | **Ready** | 1 day | -| 3 | Parallel Retrieval Optimization | Needs baseline | 2 days | -| 4 | Memory Footprint Optimization | Needs evaluation | 2 days | - ---- - -## 1. Cache Strategy Optimization - -### Current State - -The `MemoStore` is now integrated with `LlmPilot` for caching navigation decisions. However, cache hit rates can be improved through smarter caching strategies. - -### Problem Statement - -- Cache keys are based on exact content fingerprints -- Similar queries with slightly different phrasing cause cache misses -- No semantic similarity matching -- Cache warming is manual - -### Proposed Improvements - -#### 1.1 Semantic Cache Keys - -Instead of exact fingerprint matching, use semantic similarity for cache lookups: - -``` -Current: query_fp == cached_query_fp → hit -Proposed: similarity(query_embedding, cached_embedding) > threshold → hit -``` - -**Approach:** -- Pre-compute embeddings for cached queries -- Use cosine similarity or dot product for matching -- Threshold: 0.85+ similarity for cache hit -- Store top-k similar queries for approximate matching - -**Benefits:** -- Higher hit rate for semantically equivalent queries -- Reduced LLM calls for similar user questions - -#### 1.2 Cache Warming - -Pre-populate cache with common query patterns: - -**Approach:** -- Analyze historical query logs -- Identify top-N most frequent query patterns -- Pre-compute and cache Pilot decisions for common document structures -- Support configurable warm-up on engine startup - -**Configuration:** -```toml -[memo] -warmup_enabled = true -warmup_top_queries = 100 -warmup_on_startup = true -``` - -#### 1.3 Adaptive TTL - -Adjust TTL based on content stability: - -**Approach:** -- Static content (documentation): longer TTL (30 days) -- Dynamic content (news, logs): shorter TTL (1 day) -- Track content change frequency per document -- Adjust TTL dynamically based on change history - -#### 1.4 Multi-Level Caching - -Implement hierarchical caching: - -``` -L1: In-memory LRU (current MemoStore) - microseconds -L2: Local disk (persisted cache) - milliseconds -L3: Redis (distributed cache) - milliseconds -``` - -**Use Cases:** -- L1: Single-session hot data -- L2: Cross-session 
persistence -- L3: Multi-instance sharing - -### Metrics to Track - -| Metric | Current | Target | -|--------|---------|--------| -| Hit rate (repeated queries) | ~50% | **90%+** | -| Hit rate (similar queries) | 0% | **60%+** | -| Cache lookup latency | <1µs | <1µs | -| Memory per entry | ~500 bytes | ~300 bytes | - ---- - -## 2. Incremental Indexing Optimization - -### Current State - -The fingerprint system (`NodeFingerprint`) is implemented and can detect subtree-level changes. However, the indexer still reprocesses entire documents on updates. - -### Problem Statement - -- Full document reprocessing on any change -- No partial tree updates -- Wasted LLM calls for unchanged sections - -### Proposed Improvements - -#### 2.1 Subtree-Level Updates - -Only reprocess changed subtrees: - -**Approach:** -1. Load existing document tree and fingerprints -2. Parse new document, compute new fingerprints -3. Compare `NodeFingerprint` at each level -4. Only reprocess nodes where `content_changed() == true` -5. Propagate `subtree_fp` changes upward - -**Detection Logic:** -``` -if node_fp.content_changed(): - → Regenerate summary for this node -if node_fp.only_descendants_changed(): - → Skip this node, process children only -if node_fp.subtree_changed(): - → Update ancestor subtree fingerprints -``` - -#### 2.2 Lazy Summary Regeneration - -Defer summary regeneration until needed: - -**Approach:** -- Mark nodes with `summary_stale = true` on content change -- Regenerate summaries lazily on first query access -- Use MemoStore to cache regenerated summaries -- Track staleness in `DocumentChangeInfo` - -**Benefits:** -- Fast document updates (no immediate LLM calls) -- Spread LLM cost over time -- Better user experience for large documents - -#### 2.3 Batch Processing - -Process multiple changed documents efficiently: - -**Approach:** -- Collect changed documents into batches -- Group similar content types together -- Use single LLM call for multiple summaries (where token budget allows) -- Implement priority queue for urgent documents - -#### 2.4 Change Propagation - -Optimize how changes propagate through the tree: - -**Approach:** -- Use bottom-up propagation for fingerprint updates -- Only update ancestors of changed nodes -- Implement efficient diff algorithm (Myers or patience diff) -- Cache intermediate results during propagation - -### Metrics to Track - -| Metric | Current | Target | -|--------|---------|--------| -| Full reindex time (100KB doc) | ~5s | **<1s** | -| Incremental update (1 section) | ~5s (full) | **<100ms** | -| LLM calls per update | 10-50 | **1-5** | -| Memory during update | 2x doc size | **1.2x** | - ---- - -## 3. Parallel Retrieval Optimization - -### Current State - -Retrieval is primarily sequential through the pipeline stages. 
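-
-For illustration, a minimal sketch of this sequential pattern versus the `tokio::join!` form proposed in section 3.1 below; the `analyze` and `plan` functions are hypothetical stand-ins for two independent stages, not the real pipeline API:
-
-```rust
-// Requires tokio with the "macros" and "rt-multi-thread" features.
-// Hypothetical stand-ins for two independent pipeline stages.
-async fn analyze(query: &str) -> String { format!("analysis of '{query}'") }
-async fn plan(query: &str) -> String { format!("plan for '{query}'") }
-
-#[tokio::main]
-async fn main() {
-    let query = "How do I reset the device?";
-
-    // Current state: each stage is awaited before the next one starts.
-    let analysis = analyze(query).await;
-    let plan_a = plan(query).await;
-
-    // Proposed (3.1): independent stages are awaited concurrently.
-    let (analysis_b, plan_b) = tokio::join!(analyze(query), plan(query));
-
-    assert_eq!(analysis, analysis_b);
-    assert_eq!(plan_a, plan_b);
-}
-```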
- -### Problem Statement - -- Sequential stage execution -- No parallel candidate evaluation -- Underutilized multi-core CPUs - -### Prerequisites - -- [ ] Establish performance baseline with benchmarks -- [ ] Profile hot paths -- [ ] Identify parallelizable operations - -### Proposed Improvements - -#### 3.1 Parallel Stage Execution - -Execute independent pipeline stages concurrently: - -**Approach:** -- `AnalyzeStage` and initial `PlanStage` can run in parallel -- Fork-join pattern for search branches -- Use `tokio::join!` for concurrent stage execution - -**Parallelization Points:** -``` -┌─────────────┐ -│ Analyze │────┐ -└─────────────┘ │ - ├──▶ ┌─────────────┐ ──▶ ┌─────────────┐ -┌─────────────┐ │ │ Search │ │ Evaluate │ -│ Plan │────┘ │ (parallel) │ │ │ -└─────────────┘ └─────────────┘ └─────────────┘ -``` - -#### 3.2 Parallel Candidate Evaluation - -Evaluate multiple search candidates simultaneously: - -**Approach:** -- Use `futures::stream` for concurrent evaluation -- Limit concurrency with semaphore -- Collect results with timeout -- Merge and rank results - -**Concurrency Control:** -- Max concurrent evaluations: 4-8 (configurable) -- Per-evaluation timeout: 500ms -- Early termination on high-confidence result - -#### 3.3 Parallel Tree Traversal - -Traverse document tree branches in parallel: - -**Approach:** -- Spawn tasks for each top-level branch -- Use work-stealing for load balancing -- Aggregate results with structured concurrency - -### Metrics to Track - -| Metric | Current | Target | -|--------|---------|--------| -| P50 retrieval latency | ~200ms | **<50ms** | -| P99 retrieval latency | ~1s | **<200ms** | -| CPU utilization | ~30% | **70%+** | -| Throughput (queries/sec) | ~5 | **20+** | - ---- - -## 4. Memory Footprint Optimization - -### Current State - -Memory usage scales linearly with document size and cache capacity. 
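-
-As a rough illustration of that scaling, using figures from elsewhere in this document: a full default memo cache (10,000 entries at ~200-500 bytes each, see memo.md) accounts for roughly 2-5 MB, a single 10 MB document at ~5 MB of resident memory per 1 MB of source is already ~50 MB, and ten such documents approach the ~500 MB peak shown in the metrics table below.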
- -### Problem Statement - -- Large documents (10MB+) can use 50MB+ memory -- Cache entries hold full strings -- No memory pressure handling - -### Prerequisites - -- [ ] Complete other Phase 2 optimizations -- [ ] Profile memory usage patterns -- [ ] Identify memory hot spots - -### Proposed Improvements - -#### 4.1 String Interning - -Deduplicate common strings: - -**Approach:** -- Use `string_interner` crate for titles, common phrases -- Intern node titles during parsing -- Store indices instead of full strings in hot paths - -**Expected Savings:** -- 20-40% reduction in string memory -- Faster string comparisons - -#### 4.2 Compressed Cache Entries - -Compress cached values: - -**Approach:** -- Use `zstd` or `lz4` for cache value compression -- Compress summaries and reasoning strings -- Decompress on cache hit - -**Trade-offs:** -- Extra CPU for compression/decompression -- Significant memory savings for text-heavy caches - -#### 4.3 Memory-Mapped Large Documents - -Use mmap for large document content: - -**Approach:** -- Store large documents as memory-mapped files -- Only load accessed sections into memory -- OS handles paging automatically - -**Threshold:** -- Documents > 1MB: use mmap -- Documents < 1MB: load entirely - -#### 4.4 Cache Eviction Under Pressure - -Respond to memory pressure: - -**Approach:** -- Monitor system memory usage -- Implement adaptive cache sizing -- Aggressive eviction when memory > 80% used -- Use `jemalloc` with background threads - -### Metrics to Track - -| Metric | Current | Target | -|--------|---------|--------| -| Memory per 1MB document | ~5MB | **<2MB** | -| Peak memory (10 docs) | ~500MB | **<200MB** | -| Cache memory efficiency | ~60% | **80%+** | -| GC pause time | N/A | **<10ms** | - ---- - -## Implementation Timeline - -``` -Week 1: -├── Day 1-2: Cache Strategy Optimization -│ ├── Semantic cache keys -│ └── Adaptive TTL -├── Day 3-4: Incremental Indexing -│ ├── Subtree-level updates -│ └── Lazy summary regeneration -└── Day 5: Integration testing - -Week 2: -├── Day 1-2: Performance Baseline -│ ├── Benchmark suite setup -│ └── Profiling infrastructure -├── Day 3-4: Parallel Retrieval -│ ├── Parallel stages -│ └── Concurrent evaluation -└── Day 5: Memory profiling - -Week 3: -├── Day 1-2: Memory Optimization -│ ├── String interning -│ └── Compressed cache -├── Day 3-4: Final tuning -│ └── Integration testing -└── Day 5: Documentation & release prep -``` - -## Success Criteria - -### Must Have (v0.3.0) - -- [ ] 90%+ cache hit rate for repeated queries -- [ ] <1s incremental update time -- [ ] <100ms P50 retrieval latency - -### Should Have - -- [ ] 60%+ cache hit rate for similar queries -- [ ] 70%+ CPU utilization during retrieval -- [ ] <200MB memory for 10 documents - -### Nice to Have - -- [ ] Multi-level caching (L1/L2/L3) -- [ ] Memory-mapped document storage -- [ ] Distributed cache support - -## Dependencies - -| Optimization | Requires | -|-------------|----------| -| Semantic cache keys | Embedding model (local or API) | -| Parallel retrieval | `tokio` profiling tools | -| Memory optimization | Memory profiler (`dhall` or `bytehound`) | - -## Risks - -| Risk | Mitigation | -|------|------------| -| Semantic cache adds latency | Use local embedding model (all-MiniLM) | -| Parallel execution complexity | Extensive testing, structured concurrency | -| Memory optimization regressions | Benchmark before/after each change | -| Cache coherence issues | Clear invalidation strategy, versioning | - -## References - -- [MemoStore 
Design](./memo.md) -- [Fingerprint System](./fingerprint.md) -- [Incremental Indexing](./incremental.md) -- [Pilot Architecture](./pilot.md) diff --git a/docs/design/pilot-architecture.svg b/docs/design/pilot-architecture.svg deleted file mode 100644 index e85caea3..00000000 --- a/docs/design/pilot-architecture.svg +++ /dev/null @@ -1,197 +0,0 @@ - - - - - - Pilot: The Brain of Retrieval Pipeline - - - - Retrieval Pipeline - - - - Analyze - • Complexity - • Keywords - - - - - Plan - • Strategy - • Algorithm - - - - - Search - • Beam/MCTS/Greedy - • Tree Traversal - - - - - Judge - • Sufficiency - • Backtrack? - - - - 🧠 Pilot - The Brain of Retrieval - - - - Budget - Controller - - - Context - Builder - - - Fallback - Manager - - - LLM Client + Metrics - - - - Intervention Points (When Pilot Acts) - - - - - 1 - START - Before search begins - • Analyze query intent - • Identify target sections - • Set initial direction - • Provide entry points - - - - - 2 - FORK - At branch points - • Multiple children match - • Rank candidates - • Merge LLM + algo scores - • Guide path selection - - - - - 3 - BACKTRACK - When search fails - • Insufficient results - • Suggest alternatives - • Re-rank candidates - • Adjust search params - - - - - 4 - EVALUATE - After content found - • Check sufficiency - • Quality assessment - • Decide more data? - • Final confidence - - - - Score Merging: Algorithm + LLM - - - Algorithm Score - Text similarity - Heuristics - - + - - - LLM Score - Semantic relevance - Reasoning - - = - - - Final - Score - - final = α × algo + β × llm (configurable weights) - - - - 4-Level Fallback Strategy - - - - Normal - LLM OK - - - - - - Retry - Backoff - 2x delay - - - - - Simplified - Less tokens - Short ctx - - - - - Algo - Only - - Automatic escalation on consecutive failures - - - - Design Philosophy - - Algorithm = "How to search" - Efficient, deterministic - - Pilot = "Where to go" - Semantic understanding, direction - - Intervention at key points - Not every step, only when needed - - - - - - - - Backtrack with Pilot guidance - - - - - - - - - - - - - - diff --git a/docs/design/pilot.md b/docs/design/pilot.md deleted file mode 100644 index a0d43480..00000000 --- a/docs/design/pilot.md +++ /dev/null @@ -1,1505 +0,0 @@ -这里是为您翻译的英文版 Pilot 设计文档。 - -# Pilot Design Document - -> Pilot - The Brain of the Retriever Pipeline - -## Overview - -Pilot is the core intelligent component of the Vectorless retrieval system. It is responsible for understanding queries, analyzing document structures, and making search decisions. Unlike traditional vector retrieval, Pilot uses LLMs for semantic understanding and navigation decisions while maintaining efficient algorithmic execution. - -### Design Philosophy - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Design Philosophy │ -├─────────────────────────────────────────────────────────────────┤ -│ 1. Algorithm handles "How to walk" - Efficient, deterministic, low latency │ -│ 2. Pilot handles "Where to go" - Semantic understanding, ambiguity resolution, direction judgment │ -│ 3. Key decision point intervention - Not asking the LLM at every step, but only when needed │ -│ 4. 
Layered fallback - Algorithm takes over when LLM fails, Pilot rescues when algorithm fails │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Naming Origin - -**Pilot** - Like the pilot of an airplane, Pilot does not directly operate every mechanical part (that is the Algorithm's responsibility), but is responsible for: -- Understanding the destination (User Query) -- Planning the route (Search Strategy) -- Making decisions at key nodes (Intervention Points) -- Responding to emergencies (Fallback) - ---- - -## 1. Pilot Detailed Design - -### 1.1 Overall Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌───────────────────────────────────────────────────────────────────────┐ │ -│ │ Pilot (Core) │ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ Query │ │ Context │ │ Decision │ │ │ -│ │ │ Analyzer │──▶│ Builder │──▶│ Engine │ │ │ -│ │ │ (Query Analyzer)│ │ (Context Builder)│ │ (Decision Engine)│ │ │ -│ │ └─────────────┘ └─────────────┘ └──────┬──────┘ │ │ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ Response │◀──│ LLM │◀──│ Prompt │ │ │ -│ │ │ Parser │ │ Client │ │ Builder │ │ │ -│ │ │ (Response Parser)│ │ (LLM Client) │ │ (Prompt Builder) │ │ │ -│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ -│ │ │ │ -│ └───────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────────────────────┐ │ -│ │ Supporting Systems │ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ Budget │ │ Fallback │ │ Metrics │ │ │ -│ │ │ Controller │ │ Manager │ │ Collector │ │ │ -│ │ │ (Budget Controller)│ (Fallback Manager)│ (Metrics Collector)│ │ │ -│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ Policy │ │ Cache │ │ Logger │ │ │ -│ │ │ Manager │ │ (Optional) │ │ (Tracing) │ │ │ -│ │ │ (Policy Manager)│ │ (Cache) │ │ (Logger/Tracing) │ │ │ -│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ -│ │ │ │ -│ └───────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 1.4 Information Sources for Pilot Decisions - -Pilot's decisions rely on multi-layered information, with the TOC View being the core—it is like a navigation electronic map. 
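-
-As a rough sketch, the information handed to the LLM at a decision point can be pictured as one context object with three layers; the field names below follow the ContextBuilder output described later in this document, and the actual struct in the codebase may differ:
-
-```rust
-// Sketch of the three information layers behind a Pilot decision.
-struct PilotContext {
-    toc_view: String,             // Layer 1: local map built from index-time summaries
-    path_info: String,            // Layer 2: where the search currently is
-    candidates_info: Vec<String>, // Layer 3: title + summary for each candidate branch
-}
-
-fn main() {
-    let ctx = PilotContext {
-        toc_view: "2. Configuration > 2.2 Database Config".into(),
-        path_info: "Root → Configuration → Database Config".into(),
-        candidates_info: vec![
-            "Connection String [DB URL and auth]".into(),
-            "Connection Pool [pool size, timeouts, max connections]".into(),
-            "Timeout Settings [query and connection timeout]".into(),
-        ],
-    };
-    // The prompt is assembled from these three layers under a token budget.
-    println!("TOC: {}", ctx.toc_view);
-    println!("Path: {}", ctx.path_info);
-    println!("Candidates: {}", ctx.candidates_info.len());
-}
-```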
- -### Information Source Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot's "Navigation Map" │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────┐ │ -│ │ User Query │ │ -│ │ "PostgreSQL │ │ -│ │ Connection Pool Config"│ │ -│ └────────┬────────┘ │ -│ │ │ -│ ▼ │ -│ ┌───────────────────────────────────────────────────────────────────────┐ │ -│ │ Pilot Context │ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ -│ │ │ TOC View │ │ Current │ │ Candidates │ │ │ -│ │ │ (E-Map) │ │ Path │ │ Info │ │ │ -│ │ │ │ │ (Current Pos)│ │ (Candidates)│ │ │ -│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ -│ │ │ │ │ │ │ -│ │ └─────────────────┼─────────────────┘ │ │ -│ │ ▼ │ │ -│ │ ┌─────────────────┐ │ │ -│ │ │ LLM Decision │ │ │ -│ │ │ (Where to go) │ │ │ -│ │ └─────────────────┘ │ │ -│ │ │ │ -│ └───────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### TOC View - Electronic Map (Core) - -The TOC View is the core basis for Pilot's decisions, built from content generated during the Index phase: - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ TOC View - Electronic Map │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Content generated during Index phase: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ TreeNode { │ │ -│ │ title: "Configuration", // Title │ │ -│ │ summary: "This chapter introduces...", // LLM-generated Summary │ │ -│ │ depth: 1, │ │ -│ │ children: [...], │ │ -│ │ } │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ TOC View Construction Logic: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ generate_toc_view(tree, current_node): │ │ -│ │ │ │ -│ │ // 1. Generate from current node perspective │ │ -│ │ // 2. Include sibling nodes (horizontal view) │ │ -│ │ // 3. Include child nodes (vertical view) │ │ -│ │ // 4. Each node contains title + summary │ │ -│ │ │ │ -│ │ Example Output: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ 📍 Current Location: Root → Configuration │ │ │ -│ │ │ │ │ │ -│ │ │ 📂 Sibling Nodes: │ │ │ -│ │ │ ├─ Introduction [Overview of features and architecture] │ │ │ -│ │ │ ├─ Installation [Installation steps and requirements] │ │ │ -│ │ │ ├─ Configuration ⭐ [Detailed config items] ← Current │ │ │ -│ │ │ │ ├─ Basic Config [Basic parameter settings] │ │ │ -│ │ │ │ ├─ Database Config [DB connection related] ← Match! 
│ │ │ -│ │ │ │ └─ Advanced Config [Performance tuning options] │ │ │ -│ │ │ └─ API Reference [Interface documentation] │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### Three-Layer Information Structure - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Three Layers of Pilot Decision Info │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Layer 1: TOC View (Global Map) │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Role: Provides a global structural view of the document │ │ -│ │ Source: Summary generated by the Enrich stage of the Index Pipeline│ │ -│ │ Token: ~200-500 tokens │ │ -│ │ │ │ -│ │ Example: │ │ -│ │ "Doc Structure: 1.Intro 2.Install 3.Config(3.1Basic 3.2DB 3.3Adv) 4.API"│ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Layer 2: Current Path (Current Location) │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Role: Tells the LLM where we have been │ │ -│ │ Source: Path records of the search process │ │ -│ │ Token: ~50-100 tokens │ │ -│ │ │ │ -│ │ Example: │ │ -│ │ "Current Path: Root → Configuration → Database Config" │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Layer 3: Candidates Detail (Candidate Intersection Details) │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Role: Provides detailed info on candidate nodes for LLM judgment │ │ -│ │ Source: TreeNode's title + summary + partial content │ │ -│ │ Token: ~100-300 tokens │ │ -│ │ │ │ -│ │ Example: │ │ -│ │ Candidates: │ │ -│ │ A. Connection String │ │ -│ │ Summary: Configure DB connection URL and auth info │ │ -│ │ B. Connection Pool ⭐ │ │ -│ │ Summary: Configure pool size, timeouts, max connections, etc. │ │ -│ │ C. Timeout Settings │ │ -│ │ Summary: Configure query and connection timeout │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### Decision Process Example - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot Decision Process Example │ -└─────────────────────────────────────────────────────────────────────────────┘ - -Query: "How to configure the max connections for PostgreSQL connection pool?" - -Step 1: Build TOC View (from Index stage summary) -┌─────────────────────────────────────────────────────────────────────────────┐ -│ TOC View (Simplified): │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Document Structure: │ │ -│ │ 1. Quick Start │ │ -│ │ 2. Configuration │ │ -│ │ 2.1 Basic Config │ │ -│ │ 2.2 Database Config │ │ -│ │ - Connection String │ │ -│ │ - Connection Pool ← Contains "Connection Pool" │ │ -│ │ - Timeout Settings │ │ -│ │ 2.3 Advanced Config │ │ -│ │ 3. API │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ This TOC is constructed from Index stage LLM-generated summaries! 
│ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -Step 2: LLM Analysis -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Information seen by LLM: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ User Query: "How to configure the max connections for PostgreSQL connection pool?" │ -│ │ │ │ -│ │ Current Location: Configuration → Database Config │ │ -│ │ │ │ -│ │ Candidates: │ │ -│ │ 1. Connection String [Configure DB URL and auth] │ │ -│ │ 2. Connection Pool [Configure pool size, timeout, max connections] ← Direct Match! │ -│ │ 3. Timeout Settings [Configure query timeout] │ │ -│ │ │ │ -│ │ Which node is most likely to contain the answer? │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ LLM Reasoning: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Query Keywords: "Connection Pool", "Max Connections" │ │ -│ │ Candidate 2 Summary: "Connection Pool", "Max Connections" │ │ -│ │ → Candidate 2 matches directly, Confidence 0.95 │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -Step 3: Return Decision -┌─────────────────────────────────────────────────────────────────────────────┐ -│ PilotDecision { │ -│ ranked_candidates: [ │ -│ (Node 2 "Connection Pool", score: 0.95, reason: "Summary directly matches query keywords"), │ -│ (Node 3 "Timeout Settings", score: 0.30, reason: "Not very relevant"), │ -│ (Node 1 "Connection String", score: 0.20, reason: "Irrelevant"), │ -│ ], │ -│ direction: GoDeeper, │ -│ confidence: 0.95, │ -│ reasoning: "Candidate node 'Connection Pool' summary explicitly mentions 'max connections', direct query match", │ -│ } │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### Key Insights - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Key Insights │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ 1. Index stage summary quality determines Pilot effectiveness │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Good summary: "Configure connection pool size, timeout, max connections params" │ -│ │ Bad summary: "This chapter introduces connection pool related content" │ -│ │ │ │ -│ │ → The prompt in the Index Enrich stage is crucial! │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ 2. TOC View needs to be generated dynamically │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ Not the TOC of the entire document, but a local view from the "current node" perspective │ -│ │ Includes: Sibling nodes + Child nodes + Parent chain │ │ -│ │ │ │ -│ │ This keeps Token consumption manageable while providing context│ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ 3. 
Analogy: Gaode Map (or Google Maps) Navigation │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ TOC View = Map (Road network) │ │ -│ │ Summary = Road signs (Intersection descriptions) │ │ -│ │ Current Path = GPS Location (Current position) │ │ -│ │ Candidates = Upcoming intersections (Optional directions) │ │ -│ │ Query = Destination (Where to go) │ │ -│ │ │ │ -│ │ Pilot = Driver (Integrates above info to make decisions)│ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### ContextBuilder Token Budget Allocation - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ ContextBuilder - Token Budget Allocation │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Token Budget Allocation (Assuming 500 tokens total budget): │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ ┌────────────────────────────────────────┐ 30% (150 tokens) │ │ -│ │ │ Query + Intent │ │ │ -│ │ │ "PostgreSQL connection pool max connections config"│ │ │ -│ │ └────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ ┌────────────────────────────────────────────────┐ 20% (100 tokens) │ │ -│ │ │ Current Path │ │ │ -│ │ │ Root → Configuration → Database Config │ │ │ -│ │ └────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ ┌────────────────────────────────────────────────┐ 40% (200 tokens) │ │ -│ │ │ Candidates (title + summary each) │ │ │ -│ │ │ A. Connection String [Configure URL and auth] │ │ │ -│ │ │ B. Connection Pool [Configure pool size, max connections] │ │ │ -│ │ │ C. Timeout Settings [Configure timeout] │ │ │ -│ │ └────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ ┌────────────────────────────────────────────────┐ 10% (50 tokens) │ │ -│ │ │ Sibling Context (Sibling overview) │ │ │ -│ │ │ Other siblings: Basic Config, Advanced Config │ │ │ -│ │ └────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Dynamic Adjustment Strategy: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ if candidates.len() > 5: │ │ -│ │ // Too many candidates, reduce detail per candidate │ │ -│ │ Include only title, exclude summary │ │ -│ │ │ │ -│ │ if depth > 3: │ │ -│ │ // Deep search, reduce TOC range │ │ -│ │ Show only current layer and child layers │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 2. 
Intervention Point Detailed Design - -### 2.1 Intervention Point Types - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot Intervention Points │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ START - Search Start │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Timing: Before search algorithm starts │ │ -│ │ Task: Understand query intent, determine entry points and priority │ │ -│ │ Input: query, tree (ToC view) │ │ -│ │ Output: entry_points, initial_direction, confidence │ │ -│ │ Config: guide_at_start: bool │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ FORK - Fork in the Road │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Timing: When current node has multiple candidate child nodes │ │ -│ │ Task: Determine which branch is more likely to contain the answer │ │ -│ │ Input: path, candidates, query │ │ -│ │ Output: ranked_candidates, direction, confidence │ │ -│ │ Trigger: candidates.len() > fork_threshold │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ BACKTRACK - Backtrack │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Timing: When Judge determines content is insufficient, needs backtracking │ -│ │ Task: Analyze failure reason, suggest new search direction │ │ -│ │ Input: failed_path, visited, query │ │ -│ │ Output: alternative_branches, backtrack_reason │ │ -│ │ Config: guide_at_backtrack: bool │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ EVALUATE - Node Evaluation │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Timing: When needing to determine if current node contains answer │ │ -│ │ Task: Evaluate relevance of node content to query │ │ -│ │ Input: node_content, query │ │ -│ │ Output: relevance_score, is_answer, reasoning │ │ -│ │ Trigger: Reaching leaf node or when algorithm is uncertain │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 2.2 Intervention Judgment Logic - -```rust -impl Pilot for LlmPilot { - fn should_intervene(&self, state: &SearchState<'_>) -> bool { - let config = &self.config.intervention; - - // Condition 1: Budget check (Highest priority) - if !self.budget.can_call() { - return false; - } - - // Condition 2: Number of candidates exceeds threshold (Fork) - if state.candidates.len() > config.fork_threshold { - return true; - } - - // Condition 3: Candidate scores are close (Algorithm cannot distinguish) - if self.scores_are_close(state.candidates, state.tree, config.score_gap_threshold) { - return true; - } - - // Condition 4: Current score is too low (May be going the wrong way) - if state.best_score < config.low_score_threshold { - return true; - } - - // Condition 5: During backtracking and config allows - if state.is_backtracking && self.config.guide_at_backtrack { - return true; - } - - // Condition 6: Intervention limit per level - let level_calls = 
self.get_level_calls(state.depth); - if level_calls >= config.max_interventions_per_level { - return false; - } - - false - } -} - -/// Check if candidate scores are close -fn scores_are_close(&self, candidates: &[NodeId], tree: &DocumentTree, threshold: f32) -> bool { - if candidates.len() < 2 { - return false; - } - - let scores: Vec = candidates.iter() - .map(|&id| self.scorer.quick_score(tree, id)) - .collect(); - - let max_score = scores.iter().cloned().fold(0.0, f32::max); - let min_score = scores.iter().cloned().fold(1.0, f32::min); - - (max_score - min_score) < threshold -} -``` - -### 2.3 Intervention Configuration - -```rust -/// Intervention Configuration -#[derive(Debug, Clone)] -pub struct InterventionConfig { - /// Candidate count threshold (Consider intervention if exceeded) - pub fork_threshold: usize, - /// Score gap threshold (Intervene if gap is smaller than this) - pub score_gap_threshold: f32, - /// Low score threshold (Intervene if highest score is lower than this) - pub low_score_threshold: f32, - /// Max interventions per level - pub max_interventions_per_level: usize, -} - -impl Default for InterventionConfig { - fn default() -> Self { - Self { - fork_threshold: 3, // Intervene when > 3 candidates - score_gap_threshold: 0.15, // Intervene if score gap < 0.15 - low_score_threshold: 0.3, // Intervene if score < 0.3 - max_interventions_per_level: 2, // Max 2 interventions per level - } - } -} -``` - ---- - -## 3. Fallback Mechanism - -### 3.1 Fallback Levels - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Fallback Levels │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Level 0: Normal LLM Call │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Condition: Budget sufficient, LLM service available │ │ -│ │ Behavior: Normal LLM call, get decision │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ Failure │ -│ ▼ │ -│ Level 1: Retry │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Condition: Network error, timeout, rate limit │ │ -│ │ Behavior: Exponential backoff retry, max 3 times │ │ -│ │ Params: initial_delay=1s, max_delay=10s, max_attempts=3 │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ Failure │ -│ ▼ │ -│ Level 2: Simplify Context │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Condition: Token limit exceeded, context too long │ │ -│ │ Behavior: Reduce context info, keep only core content │ │ -│ │ Strategy: Remove ToC, keep only current node and candidate titles │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ Failure │ -│ ▼ │ -│ Level 3: Pure Algorithm Mode │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Condition: LLM completely unavailable, budget exhausted │ │ -│ │ Behavior: Rely entirely on algorithm scoring, no LLM calls │ │ -│ │ Result: Use NodeScorer keyword matching │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 3.2 Fallback Strategy Definition - -```rust -/// Fallback Strategy -#[derive(Debug, Clone)] -pub enum FallbackStrategy { - /// Retry strategy - Retry { - max_attempts: usize, - backoff: BackoffPolicy, - }, - /// Simplify context - SimplifyContext { - remove_toc: bool, - max_candidates: 
usize, - }, - /// Use algorithm instead - UseAlgorithm, - /// Return default decision - ReturnDefault, -} - -/// Backoff Policy -#[derive(Debug, Clone)] -pub enum BackoffPolicy { - /// Fixed interval - Fixed { delay_ms: u64 }, - /// Linear increase - Linear { initial_ms: u64, increment_ms: u64 }, - /// Exponential increase - Exponential { initial_ms: u64, multiplier: f64, max_ms: u64 }, -} - -impl Default for BackoffPolicy { - fn default() -> Self { - Self::Exponential { - initial_ms: 1000, - multiplier: 2.0, - max_ms: 10000, - } - } -} -``` - -### 3.3 FallbackManager Implementation - -```rust -/// Fallback Manager -pub struct FallbackManager { - config: FallbackConfig, - /// Current fallback level - current_level: AtomicU8, - /// Consecutive failure count - consecutive_failures: AtomicUsize, -} - -impl FallbackManager { - /// Execute with fallback - pub async fn execute_with_fallback( - &self, - operation: F, - ) -> Result - where - F: Fn() -> std::pin::Pin> + Send>>, - { - let mut level = self.current_level.load(Ordering::Relaxed); - - loop { - match level { - 0 => { - // Level 0: Normal call - match operation().await { - Ok(result) => { - self.on_success(); - return Ok(result); - } - Err(e) => { - self.on_failure(); - if self.should_escalate() { - level = 1; - continue; - } - return Err(FallbackError::from(e)); - } - } - } - 1 => { - // Level 1: Retry - match self.retry_operation(&operation).await { - Ok(result) => { - self.on_success(); - return Ok(result); - } - Err(_) => { - level = 2; - continue; - } - } - } - 2 => { - // Level 2: Simplify context - // Handled by caller, return specific error - return Err(FallbackError::SimplifyContextRequired); - } - 3 => { - // Level 3: Pure algorithm mode - return Err(FallbackError::AlgorithmFallback); - } - _ => unreachable!(), - } - } - } - - /// Retry operation - async fn retry_operation(&self, operation: &F) -> Result - where - F: Fn() -> std::pin::Pin> + Send>>, - { - let policy = &self.config.retry_policy; - let mut delay = policy.initial_delay_ms(); - - for attempt in 0..policy.max_attempts { - if attempt > 0 { - tokio::time::sleep(Duration::from_millis(delay)).await; - delay = policy.next_delay(delay); - } - - match operation().await { - Ok(result) => return Ok(result), - Err(e) if attempt == policy.max_attempts - 1 => return Err(e), - Err(_) => continue, - } - } - - Err(PilotError::RetryExhausted) - } - - fn on_success(&self) { - self.consecutive_failures.store(0, Ordering::Relaxed); - // Gradually recover to higher level - let current = self.current_level.load(Ordering::Relaxed); - if current > 0 { - self.current_level.fetch_sub(1, Ordering::Relaxed); - } - } - - fn on_failure(&self) { - let failures = self.consecutive_failures.fetch_add(1, Ordering::Relaxed); - // Escalate fallback level after 3 consecutive failures - if failures >= 2 { - let current = self.current_level.load(Ordering::Relaxed); - if current < 3 { - self.current_level.fetch_add(1, Ordering::Relaxed); - } - self.consecutive_failures.store(0, Ordering::Relaxed); - } - } - - fn should_escalate(&self) -> bool { - self.consecutive_failures.load(Ordering::Relaxed) >= 3 - } -} -``` - ---- - -## 4. 
Token Consumption Measurement - -### 4.1 Budget Configuration - -```rust -/// Budget Configuration -#[derive(Debug, Clone)] -pub struct BudgetConfig { - /// Max tokens per single query retrieval - pub max_tokens_per_query: usize, - /// Max tokens per single LLM call - pub max_tokens_per_call: usize, - /// Max LLM calls per single query - pub max_calls_per_query: usize, - /// Max calls per level (depth) - pub max_calls_per_level: usize, - /// Hard limit flag (true: reject if over budget; false: try to continue) - pub hard_limit: bool, -} - -impl Default for BudgetConfig { - fn default() -> Self { - Self { - max_tokens_per_query: 2000, // Max 2000 tokens per query - max_tokens_per_call: 500, // Max 500 tokens per call - max_calls_per_query: 5, // Max 5 calls - max_calls_per_level: 2, // Max 2 calls per level - hard_limit: true, - } - } -} -``` - -### 4.2 Budget Controller - -```rust -/// Budget Controller -pub struct BudgetController { - config: BudgetConfig, - /// Tokens used - tokens_used: AtomicUsize, - /// Calls made - calls_made: AtomicUsize, - /// Calls per level - level_calls: RwLock>, -} - -impl BudgetController { - /// Create new budget controller - pub fn new(config: BudgetConfig) -> Self { - Self { - config, - tokens_used: AtomicUsize::new(0), - calls_made: AtomicUsize::new(0), - level_calls: RwLock::new(HashMap::new()), - } - } - - /// Check if LLM can be called - pub fn can_call(&self) -> bool { - let calls = self.calls_made.load(Ordering::Relaxed); - let tokens = self.tokens_used.load(Ordering::Relaxed); - - calls < self.config.max_calls_per_query - && tokens < self.config.max_tokens_per_query - } - - /// Check if call is possible at specific level - pub fn can_call_at_level(&self, level: usize) -> bool { - if !self.can_call() { - return false; - } - - let level_calls = self.level_calls.read().unwrap(); - let calls = level_calls.get(&level).copied().unwrap_or(0); - calls < self.config.max_calls_per_level - } - - /// Estimate call cost - pub fn estimate_cost(&self, context: &str) -> usize { - // Use tiktoken or simple character estimation - // Rough estimate: 1 token ≈ 4 chars (English) or 1.5 chars (Chinese) - let char_count = context.chars().count(); - // Conservative estimate, based on Chinese - char_count / 2 + 100 // +100 reserved for output - } - - /// Check if estimated cost is within budget - pub fn can_afford(&self, estimated_cost: usize) -> bool { - let remaining = self.remaining_budget(); - estimated_cost <= remaining && estimated_cost <= self.config.max_tokens_per_call - } - - /// Get remaining budget - pub fn remaining_budget(&self) -> usize { - let used = self.tokens_used.load(Ordering::Relaxed); - self.config.max_tokens_per_query.saturating_sub(used) - } - - /// Record token usage - pub fn record_usage(&self, input_tokens: usize, output_tokens: usize, level: usize) { - let total = input_tokens + output_tokens; - self.tokens_used.fetch_add(total, Ordering::Relaxed); - self.calls_made.fetch_add(1, Ordering::Relaxed); - - // Record level calls - let mut level_calls = self.level_calls.write().unwrap(); - *level_calls.entry(level).or_insert(0) += 1; - } - - /// Get usage statistics - pub fn get_usage_stats(&self) -> BudgetUsage { - BudgetUsage { - tokens_used: self.tokens_used.load(Ordering::Relaxed), - calls_made: self.calls_made.load(Ordering::Relaxed), - max_tokens: self.config.max_tokens_per_query, - max_calls: self.config.max_calls_per_query, - } - } - - /// Reset (when new query starts) - pub fn reset(&self) { - self.tokens_used.store(0, Ordering::Relaxed); - 
self.calls_made.store(0, Ordering::Relaxed); - self.level_calls.write().unwrap().clear(); - } -} - -/// Budget Usage Statistics -#[derive(Debug, Clone)] -pub struct BudgetUsage { - pub tokens_used: usize, - pub calls_made: usize, - pub max_tokens: usize, - pub max_calls: usize, -} - -impl BudgetUsage { - pub fn utilization(&self) -> f32 { - self.tokens_used as f32 / self.max_tokens as f32 - } -} -``` - -### 4.3 Token Consumption Flow - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Token Consumption Flow │ -└─────────────────────────────────────────────────────────────────────────────┘ - -Before LLM Call: -┌─────────────────────────────────────────────────────────────────────────────┐ -│ 1. BudgetController.can_call() │ -│ └─ Check: calls_made < max_calls_per_query │ -│ └─ Check: tokens_used < max_tokens_per_query │ -│ │ -│ 2. BudgetController.can_call_at_level(depth) │ -│ └─ Check: level_calls[depth] < max_calls_per_level │ -│ │ -│ 3. BudgetController.estimate_cost(context) │ -│ └─ Estimate: input_tokens + output_tokens (reserved) │ -│ │ -│ 4. BudgetController.can_afford(estimated_cost) │ -│ └─ Check: estimated_cost <= remaining_budget │ -│ └─ Check: estimated_cost <= max_tokens_per_call │ -│ │ -│ Decision: All pass → Continue call; Any fail → Skip or Fallback │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -LLM Call: -┌─────────────────────────────────────────────────────────────────────────────┐ -│ LLM Client Returns: │ -│ - usage.prompt_tokens (Input tokens) │ -│ - usage.completion_tokens (Output tokens) │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -After LLM Call: -┌─────────────────────────────────────────────────────────────────────────────┐ -│ BudgetController.record_usage(input_tokens, output_tokens, level) │ -│ └─ tokens_used += input_tokens + output_tokens │ -│ └─ calls_made += 1 │ -│ └─ level_calls[level] += 1 │ -│ │ -│ MetricsCollector.record(...): │ -│ └─ total_input_tokens += input_tokens │ -│ └─ total_output_tokens += output_tokens │ -│ └─ estimated_cost = calculate_cost(tokens, model_price) │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 5. 
Responsibility Division - -### 5.1 Module Responsibilities - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot Module Responsibilities │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ QueryAnalyzer - Query Analyzer │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Analyze query complexity (Simple/Medium/Complex) │ │ -│ │ • Extract keywords and entities │ │ -│ │ • Identify query intent (Fact/Compare/Explain/How-To) │ │ -│ │ • Determine if Pilot intervention is needed │ │ -│ │ │ │ -│ │ Input: query: String │ │ -│ │ Output: QueryAnalysis { complexity, keywords, intent, needs_pilot } │ │ -│ │ │ │ -│ │ Implementation Strategy: │ │ -│ │ • Lightweight: Rule-based (keyword count, sentence structure) │ │ -│ │ • Heavyweight: LLM analysis (complex queries) │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ ContextBuilder - Context Builder │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Build context information to send to LLM │ │ -│ │ • Extract node information (title, summary, depth) of current path │ │ -│ │ • Build descriptions of candidate nodes │ │ -│ │ • Generate ToC view (from current node perspective) │ │ -│ │ • Control token budget allocation │ │ -│ │ │ │ -│ │ Input: tree, path, candidates, query │ │ -│ │ Output: PilotContext { path_info, candidates_info, toc_view } │ │ -│ │ │ │ -│ │ Token Budget Allocation: │ │ -│ │ • path_info: 20% │ │ -│ │ • candidates_info: 50% │ │ -│ │ • toc_view: 30% │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ PromptBuilder - Prompt Builder │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Select appropriate prompt template based on scenario │ │ -│ │ • Fill template variables │ │ -│ │ • Manage system prompt and user prompt │ │ -│ │ • Support multiple languages │ │ -│ │ │ │ -│ │ Scenario Types: │ │ -│ │ • START: Search start, determine entry point │ │ -│ │ • FORK: Fork in road, choose branch │ │ -│ │ • BACKTRACK: When backtracking, analyze failure reason │ │ -│ │ • EVALUATE: Evaluate if node contains answer │ │ -│ │ │ │ -│ │ Design Points: │ │ -│ │ • Configurable templates (user-customizable) │ │ -│ │ • Include few-shot examples (improve quality) │ │ -│ │ • Clear output format (JSON schema) │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ DecisionEngine - Decision Engine │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Determine when to call LLM (should_intervene) │ │ -│ │ • Coordinate LLM calls │ │ -│ │ • Fuse algorithm scoring and LLM suggestions │ │ -│ │ • Make final decision │ │ -│ │ │ │ -│ │ Decision Logic: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ should_intervene(state) -> bool │ │ │ -│ │ │ │ │ │ -│ │ │ // Strategy 1: Fork in road │ │ │ -│ │ │ if candidates.len() > config.fork_threshold { return true } │ │ │ -│ │ │ │ │ │ -│ │ │ // Strategy 2: 
Algorithm uncertain │ │ │ -│ │ │ if scores_are_close(candidates) { return true } │ │ │ -│ │ │ │ │ │ -│ │ │ // Strategy 3: Low confidence │ │ │ -│ │ │ if best_score < config.low_confidence_threshold { return true }│ │ -│ │ │ │ │ │ -│ │ │ // Strategy 4: Budget check │ │ │ -│ │ │ if budget_exhausted() { return false } │ │ │ -│ │ │ │ │ │ -│ │ │ return false │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ Fusion Logic: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ final_score = α * algo_score + β * llm_confidence │ │ │ -│ │ │ │ │ │ -│ │ │ // α and β dynamically adjust based on scenario │ │ │ -│ │ │ // - Higher β when LLM confidence is high │ │ │ -│ │ │ // - Higher α when algorithm score is high and LLM confidence is low││ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ ResponseParser - Response Parser │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Parse JSON returned by LLM │ │ -│ │ • Handle format errors │ │ -│ │ • Extract structured information (ranked_candidates, direction, confidence)│ -│ │ • Validate response effectiveness │ │ -│ │ │ │ -│ │ Parsing Strategy: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ parse(response: String) -> Result │ │ │ -│ │ │ │ │ │ -│ │ │ // Priority 1: JSON parsing │ │ │ -│ │ │ if let Ok(json) = parse_json(response) { return json } │ │ │ -│ │ │ │ │ │ -│ │ │ // Priority 2: Regex extraction │ │ │ -│ │ │ if let Some(data) = extract_by_regex(response) { return data }│ │ -│ │ │ │ │ │ -│ │ │ // Priority 3: Default value │ │ │ -│ │ │ return PilotDecision::default() │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ BudgetController - Budget Controller │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Track token consumption │ │ -│ │ • Control LLM call frequency │ │ -│ │ • Estimate call cost │ │ -│ │ • Enforce budget limits │ │ -│ │ │ │ -│ │ Configuration: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ BudgetConfig { │ │ │ -│ │ │ max_tokens_per_query: usize, // Total budget per query│ │ │ -│ │ │ max_tokens_per_call: usize, // Budget per call │ │ │ -│ │ │ max_calls_per_query: usize, // Max calls per query │ │ │ -│ │ │ max_calls_per_level: usize, // Max calls per level │ │ │ -│ │ │ hard_limit: bool, // Whether hard limit │ │ │ -│ │ │ } │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ Interface: │ │ -│ │ • can_call() -> bool │ │ -│ │ • can_call_at_level(level) -> bool │ │ -│ │ • estimate_cost(context) -> usize │ │ -│ │ • can_afford(estimated_cost) -> bool │ │ -│ │ • record_usage(input, output, level) │ │ -│ │ • remaining_budget() -> usize │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ FallbackManager - Fallback Manager │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Handle 
LLM call failures │ │ -│ │ • Provide fallback strategies │ │ -│ │ • Record failure reasons │ │ -│ │ • Automatic recovery mechanism │ │ -│ │ │ │ -│ │ Fallback Levels: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ Level 0: Normal LLM call │ │ │ -│ │ │ ↓ Failure │ │ │ -│ │ │ Level 1: Retry (max 3 times, exponential backoff) │ │ │ -│ │ │ ↓ Failure │ │ │ -│ │ │ Level 2: Simplify prompt (reduce context) │ │ │ -│ │ │ ↓ Failure │ │ │ -│ │ │ Level 3: Pure algorithm mode (complete fallback) │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ Fallback Strategies: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ enum FallbackStrategy { │ │ │ -│ │ │ Retry { max_attempts: usize, backoff: BackoffPolicy }, │ │ │ -│ │ │ SimplifyContext, // Reduce context info │ │ │ -│ │ │ UseAlgorithm, // Use algorithm scoring │ │ │ -│ │ │ ReturnDefault, // Return default decision │ │ │ -│ │ │ } │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ PolicyManager - Policy Manager │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Manage intervention strategy configuration │ │ -│ │ • Support multiple operation modes │ │ -│ │ • Dynamic parameter adjustment (optional) │ │ -│ │ │ │ -│ │ Policy Modes: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ enum PilotMode { │ │ │ -│ │ │ Aggressive, // Aggressive mode: frequent LLM calls │ │ │ -│ │ │ Balanced, // Balanced mode: call as needed (default) │ │ │ -│ │ │ Conservative, // Conservative mode: minimize LLM calls │ │ │ -│ │ │ AlgorithmOnly,// Pure algorithm mode: no LLM calls │ │ │ -│ │ │ } │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ Parameter Adjustment: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ // Dynamic adjustment based on historical performance │ │ │ -│ │ │ fn adjust_threshold(&mut self, performance: &PerformanceMetrics) {│ -│ │ │ // If LLM suggestion accuracy is high, lower intervention threshold│ -│ │ │ if performance.llm_accuracy > 0.8 { │ │ │ -│ │ │ self.fork_threshold = 2; │ │ │ -│ │ │ } │ │ │ -│ │ │ } │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ MetricsCollector - Metrics Collector │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ Responsibilities: │ │ -│ │ • Collect performance metrics │ │ -│ │ • Track LLM call details │ │ -│ │ • Calculate costs │ │ -│ │ • Support observability │ │ -│ │ │ │ -│ │ Metric Types: │ │ -│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ -│ │ │ PilotMetrics { │ │ │ -│ │ │ // Call statistics │ │ │ -│ │ │ total_calls: usize, │ │ │ -│ │ │ successful_calls: usize, │ │ │ -│ │ │ failed_calls: usize, │ │ │ -│ │ │ fallback_count: usize, │ │ │ -│ │ │ │ │ │ -│ │ │ // Token statistics │ │ │ -│ │ │ total_input_tokens: usize, │ │ │ -│ │ │ total_output_tokens: usize, │ │ │ -│ │ │ avg_tokens_per_call: f64, │ │ │ -│ │ │ │ │ │ -│ │ │ // Latency statistics │ │ │ -│ │ │ total_latency_ms: u64, │ │ │ -│ │ │ 
avg_latency_ms: f64, │ │ │ -│ │ │ p50_latency_ms: u64, │ │ │ -│ │ │ p99_latency_ms: u64, │ │ │ -│ │ │ │ │ │ -│ │ │ // Effectiveness statistics (requires feedback) │ │ │ -│ │ │ llm_decision_accuracy: Option, // LLM decision accuracy│ -│ │ │ retrieval_precision: Option, // Retrieval precision │ -│ │ │ } │ │ │ -│ │ └─────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### 5.2 Pilot and Algorithm Collaboration - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Pilot and Algorithm Collaboration │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Responsibility Boundaries │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ │ │ -│ │ Pilot (Brain) Algorithm (Hands and Feet) │ │ -│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ -│ │ │ • Understand query intent│ │ • Execute tree traversal │ │ -│ │ │ • Analyze document structure│ │ • Efficient search path │ │ -│ │ │ • Semantic judgment │ │ • Calculate node scores │ │ -│ │ │ • Direction decision │ │ • Manage search state │ │ -│ │ │ • Ambiguity resolution│ │ • Return search results │ │ -│ │ └─────────────────────┘ └─────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ Collaboration Process │ │ -│ ├─────────────────────────────────────────────────────────────────────┤ │ -│ │ │ │ -│ │ 1. Algorithm executes search │ │ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ 2. Algorithm encounters decision point, asks Pilot │ │ -│ │ │ Pilot.should_intervene(state) │ │ -│ │ ▼ │ │ -│ │ 3a. Pilot returns false → Algorithm continues with its own scorer │ │ -│ │ │ │ │ -│ │ 3b. Pilot returns true → Pilot.decide(state) │ │ -│ │ │ │ │ │ -│ │ │ ▼ │ │ -│ │ │ Pilot returns decision → Algorithm fuses decision and continues search│ -│ │ │ │ │ -│ │ ▼ │ │ -│ │ 4. Repeat until search completes │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 6. Complete Pilot Call Flow - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Complete Pilot Call Flow │ -└─────────────────────────────────────────────────────────────────────────────┘ - -User Query: "How to configure max connections for PostgreSQL connection pool?" 
- │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 1: QueryAnalyzer analyzes query │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ QueryAnalysis { │ -│ complexity: Medium, // Medium complexity │ -│ keywords: ["PostgreSQL", "connection pool", "max connections", "configure"],│ -│ intent: HowTo, // How-To type │ -│ needs_pilot: true, // Needs Pilot intervention │ -│ } │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 2: Pilot.guide_start() - Pre-search guidance │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ BudgetController: Check budget (pass) │ -│ │ -│ ContextBuilder: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ ToC View: │ │ -│ │ 1. Introduction │ │ -│ │ 2. Installation │ │ -│ │ 3. Configuration │ │ -│ │ 3.1 Basic Config │ │ -│ │ 3.2 Database Config │ │ -│ │ 3.3 Advanced Config │ │ -│ │ 4. API Reference │ │ -│ │ ... │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ PromptBuilder: Build START scenario prompt │ -│ │ -│ LLM Response: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ { │ │ -│ │ "entry_points": ["Configuration", "Database Config"], │ │ -│ │ "reasoning": "Query about database connection pool configuration, should start from Configuration chapter", │ -│ │ "confidence": 0.9 │ │ -│ │ } │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ MetricsCollector: Record (input: 150, output: 50, latency: 230ms) │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 3: BeamSearch starts search │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Iteration 1: Root → [Introduction, Installation, Configuration, API, ...] 
│ -│ │ -│ Algorithm scoring: │ -│ "Configuration" -> 0.75 (keyword match) │ -│ "API" -> 0.35 │ -│ "Installation" -> 0.10 │ -│ │ -│ Pilot.should_intervene(): │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ candidates.len() (6) > fork_threshold (3) → true │ │ -│ │ → Intervention needed │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Pilot.decide(): │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ LLM Analysis: │ │ -│ │ "Query clearly points to configuration-related content, 'Configuration' chapter most relevant" │ -│ │ │ │ -│ │ ranked_candidates: [ │ │ -│ │ ("Configuration", 0.95, "Explicitly mentions configuration"), │ │ -│ │ ("API", 0.40, "May have relevant API"), │ │ -│ │ ] │ │ -│ │ confidence: 0.9 │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Fusion scoring: │ -│ "Configuration" -> 0.75*0.4 + 0.95*0.6*0.9 = 0.84 │ -│ │ -│ Choice: Deep dive into "Configuration" node │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 4: Continue search - Iteration 2 │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Current position: Root → Configuration │ -│ Candidates: [Basic Config, Database Config, Advanced Config, Performance Tuning] │ -│ │ -│ Algorithm scoring: │ -│ "Database Config" -> 0.92 (strong match!) │ -│ "Advanced Config" -> 0.45 │ -│ │ -│ Pilot.should_intervene(): │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ best_score (0.92) > low_score_threshold (0.3) → OK │ │ -│ │ score_gap (0.47) > threshold (0.15) → OK │ │ -│ │ → No intervention needed, algorithm is confident │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Use algorithm score directly, choose "Database Config" │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 5: Continue search - Iteration 3 │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Current position: Root → Configuration → Database Config │ -│ Candidates: [Connection String, Connection Pool, Timeout Settings, SSL Config] │ -│ │ -│ Algorithm scoring: │ -│ "Connection Pool" -> 0.98 (perfect match!) │ -│ │ -│ → Target found, search ends │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Step 6: Return result │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ SearchResult { │ -│ path: [Root → Configuration → Database Config → Connection Pool], │ -│ nodes_visited: 8, │ -│ } │ -│ │ -│ PilotMetrics { │ -│ llm_calls: 2, │ -│ total_tokens: 380, │ -│ avg_latency: 185ms, │ -│ estimated_cost: $0.0012, │ -│ } │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -## 7. 
Code Structure - -``` -src/retrieval/ -├── mod.rs -├── pilot/ # Pilot module -│ ├── mod.rs # Module entry -│ ├── trait.rs # Pilot trait definition -│ ├── config.rs # Configuration types (PilotConfig, BudgetConfig, InterventionConfig) -│ ├── decision.rs # Decision types (PilotDecision, SearchDirection) -│ ├── analyzer.rs # QueryAnalyzer -│ ├── builder.rs # ContextBuilder -│ ├── engine.rs # DecisionEngine -│ ├── parser.rs # ResponseParser -│ ├── policy.rs # PolicyManager -│ ├── budget.rs # BudgetController -│ ├── fallback.rs # FallbackManager -│ ├── metrics.rs # MetricsCollector -│ ├── llm_pilot.rs # LlmPilot implementation -│ ├── noop_pilot.rs # NoopPilot implementation (empty impl, for pure algorithm mode) -│ └── prompts/ # Prompt templates -│ ├── mod.rs -│ ├── start.rs # START scenario template -│ ├── fork.rs # FORK scenario template -│ ├── backtrack.rs # BACKTRACK scenario template -│ └── evaluate.rs # EVALUATE scenario template -├── search/ -│ ├── mod.rs -│ ├── trait.rs # SearchTree trait (modified: add pilot parameter) -│ ├── scorer.rs # NodeScorer (existing) -│ ├── beam.rs # BeamSearch (modified: integrate Pilot) -│ ├── greedy.rs # GreedySearch (modified: integrate Pilot) -│ └── mcts.rs # MctsSearch (modified: integrate Pilot) -├── stages/ -│ ├── search.rs # SearchStage (modified: inject Pilot) -│ └── ... -└── ... -``` - ---- - -## 8. Configuration Examples - -```rust -// Default configuration -let config = PilotConfig { - mode: PilotMode::Balanced, - budget: BudgetConfig::default(), - intervention: InterventionConfig::default(), - guide_at_start: true, - guide_at_backtrack: true, - prompt_template_path: None, -}; - -// High-quality mode (more LLM calls) -let high_quality_config = PilotConfig { - mode: PilotMode::Aggressive, - budget: BudgetConfig { - max_tokens_per_query: 5000, - max_tokens_per_call: 1000, - max_calls_per_query: 10, - max_calls_per_level: 3, - hard_limit: false, - }, - intervention: InterventionConfig { - fork_threshold: 2, - score_gap_threshold: 0.2, - low_score_threshold: 0.4, - max_interventions_per_level: 3, - }, - guide_at_start: true, - guide_at_backtrack: true, - prompt_template_path: None, -}; - -// Low-cost mode (minimum LLM calls) -let low_cost_config = PilotConfig { - mode: PilotMode::Conservative, - budget: BudgetConfig { - max_tokens_per_query: 500, - max_tokens_per_call: 200, - max_calls_per_query: 2, - max_calls_per_level: 1, - hard_limit: true, - }, - intervention: InterventionConfig { - fork_threshold: 5, - score_gap_threshold: 0.1, - low_score_threshold: 0.2, - max_interventions_per_level: 1, - }, - guide_at_start: false, - guide_at_backtrack: true, - prompt_template_path: None, -}; - -// Pure algorithm mode (no LLM calls) -let algorithm_only_config = PilotConfig { - mode: PilotMode::AlgorithmOnly, - ..Default::default() -}; -``` - ---- - -## 9. 
Usage Example - -```rust -use vectorless::retrieval::pilot::{LlmPilot, PilotConfig, PilotMode}; -use vectorless::retrieval::search::BeamSearch; -use vectorless::llm::LlmClient; - -// Create Pilot -let llm_client = LlmClient::from_env()?; -let pilot = LlmPilot::new(llm_client, PilotConfig::default()); - -// Create search engine (inject Pilot) -let search = BeamSearch::new().with_pilot(pilot); - -// Execute search -let result = search.search(&tree, &context, &config).await?; - -// View metrics -println!("LLM calls: {}", result.metrics.llm_calls); -println!("Tokens used: {}", result.metrics.tokens_used); -println!("Avg latency: {}ms", result.metrics.avg_latency_ms); -``` \ No newline at end of file diff --git a/docs/design/positioning.svg b/docs/design/positioning.svg new file mode 100644 index 00000000..8087fb27 --- /dev/null +++ b/docs/design/positioning.svg @@ -0,0 +1,71 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Documents + PDF · Markdown · DOCX · HTML + + + AI + GPT · Claude · Gemini · ... + + + Vectorless + Reasoning-native Engine + + + + + Prompting + + + + Index + Parse → Tree + + + + Reason + LLM Traverse + + + + + Retrieve + with reasoning chain + + + Index documents, reason with AI, retrieve with full context + diff --git a/docs/design/v3(legacy).md b/docs/design/v3(legacy).md deleted file mode 100644 index 11bf4f59..00000000 --- a/docs/design/v3(legacy).md +++ /dev/null @@ -1,453 +0,0 @@ -# V3 Design: LLM Navigator + Algorithm Collaborative Retrieval - -## 🏗️ Architecture Design: LLM + Algorithm Collaborative Retriever Pipeline - -### Core Design Principles - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Design Philosophy │ -├─────────────────────────────────────────────────────────────────┤ -│ 1. Algorithm handles "how to go" - efficient, deterministic, │ -│ low latency │ -│ 2. LLM handles "where to go" - semantic understanding, │ -│ ambiguity resolution, direction judgment │ -│ 3. Intervene at key decision points - not every step asks LLM, │ -│ only when needed │ -│ 4. Layered fallback - algorithm takes over when LLM fails, │ -│ LLM rescues when algorithm fails │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Overall Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ Index Pipeline (Unchanged) │ -│ Parse → Build → Enhance → Enrich(LLM) → Optimize │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ - ┌─────────────────┐ - │ DocumentTree │ - │ + NodeSummary │ - └─────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ Retrieval Pipeline (Enhanced) │ -│ │ -│ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ ┌─────────┐ │ -│ │ Analyze │───▶│ Plan │───▶│ Search │───▶│ Judge │ │ -│ │ (LLM?) │ │ (LLM?) 
│ │ ┌───────────────┐ │ │ (LLM) │ │ -│ └─────────┘ └─────────┘ │ │ Navigator │ │ └─────────┘ │ -│ │ │ │ │ ┌───────────┐ │ │ │ │ -│ │ │ │ │ │ LLM + │ │ │ │ │ -│ ▼ ▼ │ │ │ Algorithm │ │ │ ▼ │ -│ ┌─────────────────────────┐ │ │ └───────────┘ │ │ ┌───────────┐ │ -│ │ LLM Navigator │◀──┼──┤ │ │ │ NeedMore │ │ -│ │ (Key Decision Points) │ │ │ Search Alg │ │ │ ◀───────│ │ -│ └─────────────────────────┘ │ │ (Greedy/Beam)│ │ └───────────┘ │ -│ │ │ └───────────────┘ │ │ │ -│ └──────────────────┴─────────────────────┘ │ │ -│ ▼ │ -│ ┌───────────┐ │ -│ │ Backtrack │───┘ -│ └───────────┘ -└─────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 🧭 LLM Navigator Design - -### Navigator Responsibilities - -Navigator doesn't replace the Search algorithm, but **provides semantic judgment at key decision points**: - -``` -┌────────────────────────────────────────────────────────────┐ -│ LLM Navigator Responsibilities │ -├────────────────┬───────────────────────────────────────────┤ -│ Timing │ LLM Task │ -├────────────────┼───────────────────────────────────────────┤ -│ Before search │ Understand query, determine search │ -│ starts │ starting point and priority directions │ -├────────────────┼───────────────────────────────────────────┤ -│ At fork/branch │ When multiple candidate paths exist, │ -│ points │ judge which is more relevant │ -├────────────────┼───────────────────────────────────────────┤ -│ When lost │ When algorithm is stuck in low-score │ -│ │ paths, provide correction suggestions │ -├────────────────┼───────────────────────────────────────────┤ -│ When uncertain │ When algorithm scores are close, │ -│ │ make semantic judgments │ -├────────────────┼───────────────────────────────────────────┤ -│ When │ Analyze failure reasons, suggest new │ -│ backtracking │ search directions │ -└────────────────┴───────────────────────────────────────────┘ -``` - -### Navigator Interface Design - -```rust -/// LLM Navigator - Provides semantic navigation at key decision points -pub struct LlmNavigator { - client: LlmClient, - config: NavigatorConfig, -} - -/// Navigator Configuration -pub struct NavigatorConfig { - /// Whether to intervene before search starts - pub guide_at_start: bool, - /// Whether to intervene at fork points (when candidates > threshold) - pub guide_at_fork: bool, - /// Fork point threshold - pub fork_threshold: usize, - /// Whether to intervene during backtracking - pub guide_at_backtrack: bool, - /// Low score threshold (request LLM intervention when below this value) - pub low_score_threshold: f32, - /// Maximum LLM calls (cost control) - pub max_llm_calls: usize, -} - -/// Navigation Guidance -pub struct NavigationGuidance { - /// Recommended node order (sorted by relevance) - pub preferred_order: Vec, - /// Recommended search direction - pub direction: SearchDirection, - /// LLM's reasoning process (explainability) - pub reasoning: String, - /// Confidence level - pub confidence: f32, -} - -pub enum SearchDirection { - /// Go deeper into current branch - GoDeeper, - /// Explore sibling nodes - ExploreSiblings, - /// Backtrack to parent node - Backtrack, - /// Jump to a specific node - JumpTo(NodeId), - /// Current path is the answer - ThisIsIt, -} - -impl LlmNavigator { - /// Before search starts: Understand query, determine starting point - pub async fn guide_start( - &self, - tree: &DocumentTree, - query: &str, - ) -> Result; - - /// At fork point: Choose the best branch - pub async fn guide_fork( - &self, - tree: &DocumentTree, - current_path: 
&[NodeId], - candidates: &[NodeId], - query: &str, - ) -> Result; - - /// During backtracking: Analyze failure, suggest new direction - pub async fn guide_backtrack( - &self, - tree: &DocumentTree, - failed_path: &[NodeId], - visited: &HashSet, - query: &str, - ) -> Result; -} -``` - ---- - -## 🔄 Search Stage Integration Plan - -### New Search Architecture - -```rust -/// Enhanced Search Stage - Algorithm + LLM Collaboration -pub struct SearchStage { - /// Search algorithm - algorithm: SearchAlgorithm, - /// LLM Navigator (optional) - navigator: Option>, - /// Configuration - config: SearchConfig, -} - -/// Collaborative Searcher -pub struct CollaborativeSearch { - /// Underlying search algorithm - algorithm: Box, - /// LLM Navigator - navigator: LlmNavigator, - /// Call statistics - stats: SearchStats, -} - -impl CollaborativeSearch { - pub async fn search(&mut self, tree: &DocumentTree, ctx: &RetrievalContext) -> SearchResult { - let mut result = SearchResult::default(); - let mut state = SearchState::new(tree.root()); - - // 1. Before starting: LLM guides starting point - if self.navigator.config.guide_at_start { - let guidance = self.navigator.guide_start(tree, &ctx.query).await?; - state.apply_guidance(guidance); - } - - // 2. Search loop - while !state.is_complete() { - // 2.1 Algorithm selects candidates - let candidates = self.algorithm.select_candidates(tree, &state); - - // 2.2 Determine if LLM consultation is needed - if self.should_consult_llm(&candidates, &state) { - let guidance = self.navigator.guide_fork( - tree, - &state.path, - &candidates, - &ctx.query - ).await?; - - // 2.3 Re-rank candidates using LLM suggestions - state.candidates = self.merge_algorithm_and_llm( - candidates, - guidance - ); - } - - // 2.4 Algorithm executes next step - self.algorithm.step(tree, &mut state); - - // 2.5 Check if backtracking is needed - if state.needs_backtrack() { - if self.navigator.config.guide_at_backtrack { - let guidance = self.navigator.guide_backtrack( - tree, - &state.path, - &state.visited, - &ctx.query - ).await?; - state.apply_backtrack_guidance(guidance); - } else { - state.backtrack(); - } - } - - self.stats.iterations += 1; - } - - result - } - - /// Determine whether to consult LLM - fn should_consult_llm(&self, candidates: &[NodeId], state: &SearchState) -> bool { - // Condition 1: Candidate count exceeds threshold (fork point) - if candidates.len() > self.navigator.config.fork_threshold { - return true; - } - - // Condition 2: Candidate scores are close (algorithm cannot distinguish) - if self.scores_are_close(candidates) { - return true; - } - - // Condition 3: Current score is too low (might be wrong direction) - if state.best_score < self.navigator.config.low_score_threshold { - return true; - } - - // Condition 4: Haven't exceeded LLM call limit - self.stats.llm_calls < self.navigator.config.max_llm_calls - } -} -``` - ---- - -## 📊 LLM Intervention Points in Pipeline Stages - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ Retrieval Pipeline │ -├─────────────────────────────────────────────────────────────────────────┤ -│ │ -│ Analyze Stage │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ [Algorithm] Keyword extraction, complexity estimation │ │ -│ │ [LLM] Optional: Deep semantic analysis, intent detection │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ Plan Stage │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ 
[Algorithm] Select strategy based on complexity │ │ -│ │ (keyword/llm/semantic) │ │ -│ │ [LLM] Optional: Strategy recommendation for complex │ │ -│ │ queries │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ Search Stage ◀━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌─────────────────────────────────────┐ │ │ -│ │ │ Algorithm │────▶│ LLM Navigator │ │ │ -│ │ │ (Primary) │ │ ┌─────────────────────────────┐ │ │ │ -│ │ │ │ │ │ guide_start() Start guide │ │ │ │ -│ │ │ - Greedy │◀───▶│ │ guide_fork() Fork choice │ │ │ │ -│ │ │ - Beam │ │ │ guide_backtrack()Backtrack │ │ │ │ -│ │ │ - MCTS │ │ └─────────────────────────────┘ │ │ │ -│ │ │ │ │ │ │ │ -│ │ └─────────────┘ └─────────────────────────────────────┘ │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ Judge Stage │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ [Algorithm] Token count check, threshold judgment │ │ -│ │ [LLM] Content sufficiency judgment, answer completeness │ │ -│ │ evaluation │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌───────────────┐ │ -│ │ Sufficient? │─── No ──▶ Backtrack ──┐ │ -│ └───────────────┘ │ │ -│ │ Yes │ │ -│ ▼ │ │ -│ ┌───────────────┐ │ │ -│ │ Result │◀───────────────────────┘ │ -│ └───────────────┘ │ -└─────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 🎯 Implementation Plan - -### Phase 1: Basic Integration (1-2 weeks) - -```rust -// 1. Define Navigator trait and basic implementation -pub trait Navigator: Send + Sync { - async fn guide_fork(&self, ctx: &NavigationContext) -> NavigationGuidance; -} - -// 2. Integrate into SearchStage -pub struct SearchStage { - algorithm: SearchAlgorithm, - navigator: Option>, // New -} - -// 3. Modify search loop to call navigator at fork points -``` - -### Phase 2: Enhanced Capabilities (2-3 weeks) - -```rust -// 1. Implement complete LlmNavigator -// 2. Add guide_start, guide_backtrack -// 3. Implement intelligent intervention judgment logic -// 4. Add caching (same query + same context → cached result) -``` - -### Phase 3: Optimization and Monitoring (1-2 weeks) - -```rust -// 1. Add A/B testing capability (pure algorithm vs algorithm+LLM) -// 2. Add cost control (max_llm_calls, budget) -// 3. Add effectiveness monitoring (retrieval accuracy, latency, cost) -// 4. Adaptive intervention (dynamically adjust intervention frequency -// based on historical effectiveness) -``` - ---- - -## 📁 Suggested Code Structure - -``` -src/retrieval/ -├── mod.rs -├── pipeline/ -│ ├── mod.rs -│ ├── stage.rs -│ ├── orchestrator.rs -│ └── context.rs -├── stages/ -│ ├── analyze.rs -│ ├── plan.rs -│ ├── search.rs # Integrate Navigator -│ └── judge.rs -├── search/ -│ ├── mod.rs -│ ├── trait.rs -│ ├── greedy.rs -│ ├── beam.rs -│ └── mcts.rs -├── navigator/ # New module -│ ├── mod.rs -│ ├── trait.rs # Navigator trait -│ ├── llm_navigator.rs # LLM implementation -│ ├── noop_navigator.rs # No-op implementation -│ ├── guidance.rs # NavigationGuidance types -│ └── config.rs # NavigatorConfig -├── strategy/ -│ ├── mod.rs -│ ├── keyword.rs -│ ├── llm.rs -│ └── semantic.rs -``` - ---- - -## 🤔 Key Questions - -### Q1: Difference between Navigator and Strategy? 
- -| | Strategy | Navigator | -|--------------------|-----------------------------|--------------------------------| -| Granularity | Single node evaluation | Global navigation suggestion | -| Input | Single node information | Path + candidates + context | -| Output | Score (0-1) | Direction + ranking + reasoning| -| Call frequency | Every candidate node | Key decision points | - -### Q2: How to control LLM call costs? - -```rust -pub struct CostControl { - /// Maximum LLM calls per retrieval - max_calls_per_query: usize, - /// Daily budget - daily_budget: Option, - /// Only call when confidence is low - min_uncertainty: f32, -} -``` - -### Q3: How to evaluate effectiveness? - -```rust -pub struct RetrievalMetrics { - /// Retrieval precision - pub precision: f32, - /// Retrieval recall - pub recall: f32, - /// LLM call count - pub llm_calls: usize, - /// Total latency - pub latency_ms: u64, - /// Cost - pub cost: Money, -} -``` diff --git a/docs/guides/README.md b/docs/guides/README.md deleted file mode 100644 index aee856ae..00000000 --- a/docs/guides/README.md +++ /dev/null @@ -1,3 +0,0 @@ -# Vectorless Guides - -Practical guides for using Vectorless effectively. diff --git a/docs/guides/dual-pipeline.md b/docs/guides/dual-pipeline.md deleted file mode 100644 index d16ef1a5..00000000 --- a/docs/guides/dual-pipeline.md +++ /dev/null @@ -1,152 +0,0 @@ -# Understanding the Dual Pipeline - -Vectorless uses a **dual pipeline architecture** that separates document processing from retrieval. This design enables efficient indexing and intelligent retrieval. - -## Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Vectorless Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ -│ │ INDEX PIPELINE │ │ RETRIEVAL PIPELINE │ │ -│ │ │ │ │ │ -│ │ Parse → Build → Enrich │ │ Analyze → Plan → Search │ │ -│ │ ↓ ↓ ↓ │ │ ↓ ↓ ↓ │ │ -│ │ Enhance → Optimize → │ │ Evaluate (Sufficiency) │ │ -│ │ Persist │ │ ↑_____________│ │ │ -│ │ │ │ │ (NeedMoreData)│ │ │ -│ └─────────────────────────────┘ └─────────────────────────────┘ │ -│ │ ▲ │ -│ └──────────── Workspace ─────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -## Index Pipeline - -The Index Pipeline processes documents and builds a searchable tree structure. - -### Stages - -| Stage | Purpose | -|-------|---------| -| **Parse** | Extract content from file (MD, PDF, DOCX, HTML) | -| **Build** | Construct hierarchical document tree | -| **Enrich** | Add metadata, TOC, references | -| **Enhance** | Generate summaries (optional) | -| **Optimize** | Prune, compress, optimize tree | -| **Persist** | Save to workspace storage | - -### Example - -```rust -// Index pipeline is triggered automatically -let doc_id = engine.index(IndexContext::from_path("./manual.md")).await?; - -// With summary generation -let doc_id = engine.index( - IndexContext::from_path("./manual.md") - .with_options(IndexOptions::new().with_summaries()) -).await?; -``` - -## Retrieval Pipeline - -The Retrieval Pipeline processes queries and retrieves relevant content. 
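
Before walking through the stages, here is a minimal end-to-end sketch of driving the Retrieval Pipeline through the public API. The engine and query calls mirror the Quick Start example elsewhere in these guides; the result fields printed at the end follow the Data Flow diagram below (`content`, `node_ids`, `confidence`), and the exact accessor names are assumptions rather than a confirmed API surface.

```rust
use vectorless::{Engine, IndexContext};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Building the engine and indexing run the Index Pipeline;
    // query() runs the Retrieval Pipeline described below.
    let engine = Engine::builder()
        .with_workspace("./workspace")
        .with_openai(std::env::var("OPENAI_API_KEY")?)
        .build()
        .await?;

    let doc_id = engine.index(IndexContext::from_path("./manual.md")).await?;

    // Analyze → Plan → Search → Evaluate all happen inside this call.
    let result = engine.query(&doc_id, "What is the architecture?").await?;

    // Field names follow the Data Flow diagram below and are illustrative.
    println!("content: {}", result.content);
    println!("nodes: {:?}", result.node_ids);
    println!("confidence: {:?}", result.confidence);
    Ok(())
}
```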
- -### Stages - -| Stage | Purpose | -|-------|---------| -| **Analyze** | Analyze query complexity, extract keywords | -| **Plan** | Select retrieval strategy and algorithm | -| **Search** | Navigate tree to find candidates | -| **Evaluate** | Check sufficiency, aggregate content | - -### The Evaluate Stage - -The Evaluate stage is crucial - it determines if retrieved content is sufficient: - -```text - ┌─────────────┐ - │ Search │ - └──────┬──────┘ - │ - ▼ - ┌─────────────┐ - │ Evaluate │ - └──────┬──────┘ - │ - ┌────────────┼────────────┐ - │ │ │ - ▼ ▼ ▼ - Sufficient PartialSufficient Insufficient - │ │ │ - ▼ ▼ ▼ - Return More Search Expand Beam - (1 iteration) (2 iterations) -``` - -### Retrieval Strategies - -```rust -// Three built-in strategies: - -// 1. Keyword - Fast, exact matching -// 2. LLM - Semantic understanding via Pilot -// 3. Structure - Hierarchy-aware navigation -``` - -## The Pilot System - -Pilot is the "brain" of the Retrieval Pipeline: - -- **Query Analysis**: Understands what the user is asking -- **Context Building**: Creates navigation context from TOC -- **Decision Making**: Decides which branches to explore -- **Fallback**: Algorithm takes over when LLM fails - -See [The Pilot System](./pilot-system.md) for details. - -## Data Flow - -``` -Document ──► Index Pipeline ──► Workspace - │ -Query ──► Retrieval Pipeline ──────────┘ - │ - ▼ - RetrievalResult - ├── content - ├── node_ids - ├── confidence - └── trace -``` - -## Session-Based Operations - -For multi-document operations, use sessions: - -```rust -// Create a session -let session = engine.session().await; - -// Index multiple documents -session.index(IndexContext::from_path("./doc1.md")).await?; -session.index(IndexContext::from_path("./doc2.md")).await?; - -// Query across all documents -let results = session.query_all("What is the architecture?").await?; - -for result in results { - println!("From {}: {}", result.doc_id, result.content); -} -``` - -## See Also - -- [Multi-Strategy Retrieval](./multi-strategy.md) -- [Content Aggregation](./content-aggregation.md) -- [Sufficiency Checking](./sufficiency.md) diff --git a/docs/guides/quick-start.md b/docs/guides/quick-start.md deleted file mode 100644 index 8f93ffe8..00000000 --- a/docs/guides/quick-start.md +++ /dev/null @@ -1,89 +0,0 @@ -# Quick Start Guide - -Get up and running with Vectorless in 5 minutes. - -## Prerequisites - -- Rust 1.70+ installed -- An OpenAI API key (or compatible LLM endpoint) - -## Installation - -Add to your `Cargo.toml`: - -```toml -[dependencies] -vectorless = "0.1" -tokio = { version = "1", features = ["full"] } -``` - -## Basic Usage - -```rust -use vectorless::{Engine, IndexContext}; - -#[tokio::main] -async fn main() -> Result<(), Box> { - // 1. Create an engine with OpenAI - let engine = Engine::builder() - .with_workspace("./workspace") - .with_openai(std::env::var("OPENAI_API_KEY")?) - .build() - .await?; - - // 2. Index a document - let doc_id = engine.index(IndexContext::from_path("./manual.md")).await?; - println!("Indexed: {}", doc_id); - - // 3. Query the document - let result = engine.query(&doc_id, "How do I configure authentication?").await?; - println!("Answer: {}", result.content); - - Ok(()) -} -``` - -## Index from Different Sources - -```rust -// From file path -let id1 = engine.index(IndexContext::from_path("./doc.pdf")).await?; - -// From string content -let html = "
<h1>Title</h1><p>Content</p>
"; -let id2 = engine.index( - IndexContext::from_content(html, vectorless::parser::DocumentFormat::Html) - .with_name("webpage") -).await?; - -// From bytes (e.g., from HTTP response) -let pdf_bytes = std::fs::read("./document.pdf")?; -let id3 = engine.index( - IndexContext::from_bytes(pdf_bytes, vectorless::parser::DocumentFormat::Pdf) -).await?; -``` - -## Index Modes - -```rust -use vectorless::IndexMode; - -// Default: Skip if already indexed -engine.index(IndexContext::from_path("./doc.md")).await?; - -// Force: Always re-index -engine.index( - IndexContext::from_path("./doc.md").with_mode(IndexMode::Force) -).await?; - -// Incremental: Only re-index if changed -engine.index( - IndexContext::from_path("./doc.md").with_mode(IndexMode::Incremental) -).await?; -``` - -## Next Steps - -- [Understanding the Dual Pipeline](./dual-pipeline.md) - Learn how Vectorless works -- [Indexing Documents](./indexing.md) - Deep dive into document indexing -- [Querying Documents](./querying.md) - Advanced query techniques diff --git a/docs/paper/vectorless(draft).md b/docs/paper/vectorless(draft).md deleted file mode 100644 index 5a9d2dfd..00000000 --- a/docs/paper/vectorless(draft).md +++ /dev/null @@ -1,88 +0,0 @@ -# Vectorless: Learning-Enhanced Reasoning-based Document Retrieval with Feedback-driven Adaptation - -**Abstract** - -Large Language Models (LLMs) have transformed document understanding and question answering, yet traditional vector-based Retrieval Augmented Generation (RAG) systems suffer from fundamental limitations: loss of document structure, semantic similarity ≠ relevance mismatches, and inability to learn from user feedback. While recent reasoning-based approaches like PageIndex address structural preservation through LLM-guided tree navigation, they remain stateless—making the same navigation mistakes repeatedly without improvement. - -We present **Vectorless**, a reasoning-based retrieval framework that introduces three key innovations: (1) **Feedback Learning**, a closed-loop system that learns from user corrections to improve navigation decisions over time; (2) **Hybrid Scoring**, combining algorithmic efficiency (BM25 + keyword overlap) with LLM reasoning for cost-effective accuracy; and (3) **Reference Following**, automatically traversing in-document cross-references like "see Appendix G" to gather complete context. Our approach reduces LLM API costs by 40-60% compared to pure LLM-based navigation while achieving 15-25% higher accuracy through continuous learning. Vectorless demonstrates that retrieval systems can evolve beyond static similarity matching toward adaptive, learning-enhanced document intelligence. - ---- - -## 1. Introduction - -The dominance of vector-based RAG systems has created an implicit assumption: semantic similarity is the primary signal for information retrieval. However, this assumption breaks down in domain-specific documents where: - -1. **Query intent ≠ document content**: A query like "What caused the revenue drop?" expresses intent, not content. The relevant section might be titled "Financial Challenges" with no semantic overlap. - -2. **Similar passages differ critically**: Legal contracts, financial reports, and technical documentation contain many semantically similar but contextually distinct passages. - -3. **Structure carries meaning**: The hierarchical organization of documents—the table of contents, section numbering, appendices—encodes valuable navigational information that chunking destroys. 
- -Recent reasoning-based approaches like PageIndex address these issues by using LLMs to navigate document structure directly. However, these systems share a critical limitation: **they are stateless**. Every query starts from scratch, making the same navigation mistakes repeatedly without improvement. - -### 1.1 Our Contribution - -Vectorless advances reasoning-based retrieval through three key innovations: - -| Innovation | Problem Addressed | Approach | -|------------|------------------|----------| -| **Feedback Learning** | Stateless navigation repeats mistakes | Closed-loop learning from user corrections | -| **Hybrid Scoring** | Pure LLM navigation is expensive | Algorithm (BM25) + LLM reasoning fusion | -| **Reference Following** | Cross-references break retrieval chains | Automatic reference resolution and traversal | - -Our key insight is that **document retrieval can be treated as a learning problem**, not just a search problem. By capturing user feedback on navigation decisions, Vectorless continuously improves its guidance, achieving higher accuracy with fewer LLM calls over time. - ---- - -## 2. Background and Motivation - -### 2.1 Limitations of Vector-based RAG - -Traditional vector-based RAG systems follow a simple pipeline: - -``` -Document → Chunk → Embed → Store in Vector DB -Query → Embed → Similarity Search → Return Top-K Chunks -``` - -This approach suffers from several well-documented issues: - -**Query-Knowledge Space Mismatch.** Vector retrieval assumes semantically similar text is relevant. However, queries express *intent*, not content. "What are the risks?" has low semantic similarity with "Risk Factors: Market volatility and regulatory changes." - -**Semantic Similarity ≠ Relevance.** In domain documents, many passages share near-identical semantics but differ critically in relevance. "Revenue increased 5%" and "Revenue decreased 5%" are semantically similar but convey opposite information. - -**Loss of Structure.** Chunking fragments logical document organization. A section titled "2.1 Revenue Analysis" with subsections "2.1.1 Domestic" and "2.1.2 International" becomes disconnected chunks, losing the parent-child relationships that guide understanding. - -### 2.2 Reasoning-based Retrieval: PageIndex - -PageIndex introduced reasoning-based retrieval, where LLMs navigate document structure directly: - -``` -Document → Tree Structure (ToC Index) -Query → LLM navigates tree → Extract relevant sections -``` - -This approach preserves structure and enables semantic navigation. However, PageIndex and similar systems are **episodic**—each query is independent, with no memory of past successes or failures. - -### 2.3 The Learning Gap - -Consider a retrieval system that repeatedly encounters queries about "revenue breakdown." Without learning: - -- Query 1: Navigates to "Financial Overview" → Wrong section → Backtracks → Finds "Revenue Analysis" -- Query 2: Same navigation mistake → Same backtrack → Same result -- Query 100: Still making the same mistake - -A learning-enhanced system would: - -- Query 1: Makes mistake, receives negative feedback -- Query 2: Recalls feedback, navigates directly to "Revenue Analysis" -- Query 100: Near-optimal navigation from accumulated experience - -This is the core innovation of Vectorless. - ---- - -## 3. 
System Architecture - -### 3.1 Overview - diff --git a/docs/rfcs/0001-docx-parser.md b/docs/rfcs/0001-docx-parser.md deleted file mode 100644 index 49bdecfd..00000000 --- a/docs/rfcs/0001-docx-parser.md +++ /dev/null @@ -1,383 +0,0 @@ -# DOCX Parser Implementation Plan - -**Status**: ✅ Implemented - -## Overview - -Add DOCX (Microsoft Word) document parsing support to Vectorless, enabling hierarchical tree-based retrieval for Word documents. - -## DOCX File Structure - -A DOCX file is a ZIP archive containing XML files: - -``` -document.docx -├── [Content_Types].xml # MIME type definitions -├── _rels/.rels # Package relationships -├── word/ -│ ├── document.xml # Main content (paragraphs, tables) -│ ├── styles.xml # Style definitions -│ ├── numbering.xml # List numbering (optional) -│ ├── core.xml # Metadata (title, author) -│ └── _rels/document.xml.rels -``` - -**Key file**: `word/document.xml` contains all paragraphs with style references. - -## Architecture - -### Module Structure - -``` -src/document/ -├── mod.rs # Export docx module -├── docx/ -│ ├── mod.rs # Module exports -│ ├── parser.rs # Main parser implementation -│ ├── styles.rs # Style resolution (heading detection) -│ └── types.rs # DOCX-specific types -``` - -### Dependencies - -Add to `Cargo.toml`: - -```toml -[dependencies] -zip = "2.2" # ZIP archive handling -roxmltree = "0.20" # Fast XML parsing (read-only) -``` - -## Implementation Details - -### 1. Types (`types.rs`) - -```rust -/// Parsed DOCX paragraph. -pub struct DocxParagraph { - /// Text content. - pub text: String, - /// Style ID (e.g., "Heading1", "Normal"). - pub style_id: Option, - /// Detected heading level (1-6), None for body text. - pub heading_level: Option, - /// List item info (if part of a list). - pub list_info: Option, -} - -/// List item information. -pub struct ListInfo { - /// Nesting level (0 = top level). - pub level: u8, - /// Whether it's an ordered list. - pub ordered: bool, -} - -/// Parsed style definition. -pub struct DocxStyle { - pub style_id: String, - pub name: String, - pub is_heading: bool, - pub heading_level: Option, -} -``` - -### 2. Style Resolution (`styles.rs`) - -Heading detection strategy (in priority order): - -1. **Built-in styles**: `Heading1` → `Heading6` (most common) -2. **Custom heading styles**: Match by name pattern `/heading\s*(\d)/i` -3. **Outline level**: Read `` from style definition -4. **Heuristics**: Bold + larger font + short text → potential heading - -```rust -pub struct StyleResolver { - /// Map from style_id to resolved style info. - styles: HashMap, -} - -impl StyleResolver { - /// Parse styles.xml and build resolver. - pub fn from_xml(styles_xml: &str) -> Self; - - /// Get heading level for a style ID. - pub fn get_heading_level(&self, style_id: &Option) -> Option; - - /// Check if style is a heading. - pub fn is_heading(&self, style_id: &Option) -> bool; -} -``` - -### 3. Parser (`parser.rs`) - -```rust -pub struct DocxParser; - -impl DocumentParser for DocxParser { - fn parse(&self, content: &[u8]) -> Result { - // 1. Parse ZIP archive - let archive = ZipArchive::new(Cursor::new(content))?; - - // 2. Read styles.xml (optional, may not exist) - let style_resolver = Self::parse_styles(&archive)?; - - // 3. Read document.xml - let document_xml = Self::read_file(&archive, "word/document.xml")?; - let root = roxmltree::Document::parse(&document_xml)?; - - // 4. Traverse paragraphs - let paragraphs = Self::parse_paragraphs(&root, &style_resolver)?; - - // 5. 
Convert to RawNodes - let raw_nodes = Self::build_raw_nodes(paragraphs)?; - - Ok(ParseResult { nodes: raw_nodes }) - } - - fn format(&self) -> DocumentFormat { - DocumentFormat::Docx - } -} -``` - -### 4. Parsing Flow - -``` -┌─────────────────────────────────────────────────────────────┐ -│ DOCX File (.docx) │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 1. Unzip │ -│ - word/document.xml │ -│ - word/styles.xml (optional) │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 2. Parse styles.xml │ -│ - Build StyleResolver │ -│ - Map style_id → heading_level │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 3. Parse document.xml │ -│ - Find all elements (paragraphs) │ -│ - Extract text from elements │ -│ - Get style from │ -│ - Resolve heading_level via StyleResolver │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 4. Build RawNodes │ -│ - Heading → new section (parent) │ -│ - Body text → append to current section │ -│ - Track heading hierarchy for nesting │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 5. Return ParseResult { nodes: Vec } │ -└─────────────────────────────────────────────────────────────┘ -``` - -### 5. XML Structure Reference - -**document.xml** structure: - -```xml - - - - - - - - Chapter Title - - - - - Body text content... - - - - -``` - -**styles.xml** structure: - -```xml - - - - - - - - -``` - -### 6. RawNode Building Strategy - -```rust -fn build_raw_nodes(paragraphs: Vec) -> Result> { - let mut nodes = Vec::new(); - let mut current_node: Option = None; - let mut heading_stack: Vec<(u8, RawNode)> = Vec::new(); // (level, node) - - for para in paragraphs { - if para.text.is_empty() { - continue; - } - - if let Some(level) = para.heading_level { - // Save previous node - if let Some(node) = current_node.take() { - nodes.push(node); - } - - // Handle heading hierarchy - // - Pop stack until we find parent level - // - Create new section node - let node = RawNode { - title: para.text.clone(), - content: String::new(), - children: Vec::new(), - }; - - heading_stack.retain(|(l, _)| *l < level); - heading_stack.push((level, node)); - } else { - // Body text - append to current section - if let Some(ref mut node) = current_node { - if !node.content.is_empty() { - node.content.push('\n'); - } - node.content.push_str(¶.text); - } - } - } - - // Don't forget the last node - if let Some(node) = current_node { - nodes.push(node); - } - - // TODO: Handle heading_stack to build proper hierarchy - - Ok(nodes) -} -``` - -### 7. Edge Cases - -| Case | Handling | -|------|----------| -| **No styles.xml** | Use heuristics (bold + font size) or treat all as body | -| **Empty paragraphs** | Skip | -| **Tables** | Extract as formatted text (for now) | -| **Images** | Ignore (no text content) | -| **Nested lists** | Track list level from numbering.xml | -| **Mixed content** | Handle runs with different formatting | - -### 8. 
Testing - -Create test fixtures: - -``` -tests/fixtures/ -├── simple.docx # Basic headings + paragraphs -├── nested.docx # H1 → H2 → H3 hierarchy -├── no_styles.docx # Document without styles.xml -├── tables.docx # Contains tables -└── lists.docx # Contains numbered/bulleted lists -``` - -Unit tests: - -```rust -#[test] -fn test_parse_simple_docx() { - let content = include_bytes!("../fixtures/simple.docx"); - let parser = DocxParser; - let result = parser.parse(content).unwrap(); - assert!(!result.nodes.is_empty()); -} - -#[test] -fn test_heading_detection() { - let resolver = StyleResolver::from_xml(STYLES_XML); - assert_eq!(resolver.get_heading_level(&Some("Heading1".into())), Some(1)); - assert_eq!(resolver.get_heading_level(&Some("Normal".into())), None); -} -``` - -## Integration - -### 1. Update `src/document/mod.rs` - -```rust -pub mod docx; -pub use docx::DocxParser; -``` - -### 2. Register in `ParserRegistry` - -```rust -registry.register(DocumentFormat::Docx, Box::new(DocxParser)); -``` - -### 3. Update `DocumentFormat` enum - -```rust -pub enum DocumentFormat { - Markdown, - Pdf, - Docx, // Add this -} -``` - -### 4. Update client API - -```rust -// Auto-detect format from file extension -pub fn detect_format(path: &Path) -> DocumentFormat { - match path.extension().and_then(|s| s.to_str()) { - Some("md") => DocumentFormat::Markdown, - Some("pdf") => DocumentFormat::Pdf, - Some("docx") => DocumentFormat::Docx, // Add this - _ => DocumentFormat::Markdown, - } -} -``` - -## Effort Estimate - -| Task | Time | -|------|------| -| Types & structures | 1 hour | -| Style resolution | 2 hours | -| Main parser | 3 hours | -| RawNode building | 2 hours | -| Edge cases | 2 hours | -| Testing | 2 hours | -| **Total** | **~12 hours (1.5 days)** | - -## Future Enhancements (Out of Scope) - -- [ ] Table parsing with structure preservation -- [ ] List nesting from numbering.xml -- [ ] Header/footer extraction -- [ ] Comments and annotations -- [ ] Tracked changes (revisions) -- [ ] Embedded objects - -## References - -- [ECMA-376: Office Open XML](https://www.ecma-international.org/publications-and-standards/standards/ecma-376/) -- [DOCX file format specification](https://docs.microsoft.com/en-us/openspecs/office_standards/ms-docx/) diff --git a/docs/rfcs/0002-html-parser.md b/docs/rfcs/0002-html-parser.md deleted file mode 100644 index f0651b7a..00000000 --- a/docs/rfcs/0002-html-parser.md +++ /dev/null @@ -1,351 +0,0 @@ -# RFC-0002: HTML Parser Implementation - -**Status**: Proposed - -## Summary - -Add HTML document parsing support to Vectorless, enabling hierarchical tree-based retrieval for web pages and HTML documents. - -## Motivation - -HTML is one of the most common document formats: -- Web scraping and content extraction -- Documentation websites -- Blog posts and articles -- Technical documentation - -Unlike Markdown/PDF/DOCX, HTML documents often contain: -- Navigation menus -- Sidebars -- Footers -- Advertisements -- Scripts and styles - -The challenge is extracting **meaningful content** while filtering noise. - -## HTML Structure Analysis - -### Content Structure - -```html - - - - Document Title - - - - - -
  <nav>...</nav>
  <main>
    <article>
      <h1>Main Title</h1>
      <section>
        <h2>Section 1</h2>
        <p>Content...</p>
      </section>
      <section>
        <h2>Section 2</h2>
        <p>Content...</p>
        <h3>Subsection</h3>
        <p>More content...</p>
      </section>
    </article>
  </main>
  <footer>...</footer>
</body>
</html>
- - -``` - -### Heading Hierarchy - -HTML has explicit heading tags: -- `
h1
` - `
h6
` : Heading levels 1-6 -- `` : Document title -- `<figcaption>` : Figure captions (optional heading) - -### Semantic Elements (HTML5) - -| Element | Meaning | Use for TOC? | -|---------|---------|--------------| -| `<article>` | Self-contained content | Yes - content boundary | -| `<section>` | Thematic grouping | Yes - section boundary | -| `<main>` | Main content area | Yes - skip nav/sidebar | -| `<nav>` | Navigation links | No - skip | -| `<aside>` | Sidebar content | No - skip | -| `<header>` | Page header | No - skip | -| `<footer>` | Page footer | No - skip | - -## Proposed Solution - -### Module Structure - -``` -src/document/html/ -├── mod.rs # Module exports -├── parser.rs # Main parser implementation -├── extractor.rs # Content extraction (readability) -└── types.rs # HTML-specific types -``` - -### Dependencies - -```toml -# HTML parsing -scraper = "0.22" # HTML parsing (CSS selectors) -``` - -Alternative: `tl` (faster, no CSS selectors) or `html5ever` (spec-compliant) - -### Types - -```rust -/// HTML parser configuration. -pub struct HtmlConfig { - /// Skip navigation elements. - pub skip_nav: bool, - - /// Skip aside/sidebar elements. - pub skip_aside: bool, - - /// Skip footer elements. - pub skip_footer: bool, - - /// Extract main content only (using readability algorithm). - pub extract_main_content: bool, - - /// Maximum heading level to parse (1-6). - pub max_heading_level: usize, -} - -/// Parsed HTML element. -pub struct HtmlElement { - /// Text content. - pub text: String, - /// Tag name (h1-h6, p, etc.). - pub tag: String, - /// Heading level (1-6), if applicable. - pub heading_level: Option<u8>, -} -``` - -### Parser Flow - -``` -┌─────────────────────────────────────────────────────────────┐ -│ HTML File (.html) │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 1. Parse HTML │ -│ - Use scraper to build DOM tree │ -│ - Handle malformed HTML gracefully │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 2. Extract Main Content (optional) │ -│ - Find <main> or <article> element │ -│ - Skip <nav>, <aside>, <footer> │ -│ - Or use readability algorithm for complex pages │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 3. Extract Heading Structure │ -│ - Find all <h1>-<h6> elements │ -│ - Build heading hierarchy │ -│ - Extract text content between headings │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 4. Build RawNodes │ -│ - Heading → new section (parent) │ -│ - Body text → append to current section │ -│ - Track heading hierarchy for nesting │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ 5. 
Return ParseResult { nodes: Vec<RawNode> } │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Content Extraction Strategy - -**Level 1: Semantic HTML5** (Simple, Fast) - -```rust -fn extract_main_content(&self, doc: &Html) -> ElementRef { - // Priority: <main> > <article> > <body> - if let Some(main) = doc.select(&selector("main")).next() { - return main; - } - if let Some(article) = doc.select(&selector("article")).next() { - return article; - } - doc.select(&selector("body")).next().unwrap() -} -``` - -**Level 2: Skip Known Noise** (Medium) - -```rust -const SKIP_TAGS: &[&str] = &["nav", "aside", "footer", "script", "style", "noscript"]; - -fn should_skip(&self, elem: &ElementRef) -> bool { - SKIP_TAGS.contains(&elem.value().name()) -} -``` - -**Level 3: Readability Algorithm** (Advanced, Optional) - -For complex web pages without semantic structure, implement a simplified readability: -- Calculate text density -- Find largest text block -- Remove low-density regions - -This is more complex and can be added later as enhancement. - -### Implementation Details - -```rust -pub struct HtmlParser { - config: HtmlConfig, -} - -impl HtmlParser { - /// Parse HTML content and extract nodes. - fn extract_nodes(&self, html: &str) -> Vec<RawNode> { - let doc = Html::parse_document(html); - - // 1. Find main content area - let root = self.find_main_content(&doc); - - // 2. Extract all heading and text elements - let elements = self.extract_elements(&root); - - // 3. Build nodes from elements - self.build_raw_nodes(elements) - } - - /// Find the main content area. - fn find_main_content<'a>(&self, doc: &'a Html) -> ElementRef<'a> { - // Try <main> first - if let Some(main) = doc.select(&selector("main")).next() { - return main; - } - - // Try <article> - if let Some(article) = doc.select(&selector("article")).next() { - return article; - } - - // Fallback to <body> - doc.select(&selector("body")) - .next() - .expect("HTML must have body") - } - - /// Extract elements from the content area. - fn extract_elements(&self, root: &ElementRef) -> Vec<HtmlElement> { - let mut elements = Vec::new(); - - for node in root.descendants() { - if let Some(elem) = node.value().as_element() { - let tag = elem.name(); - - // Check if it's a heading - if let Some(level) = self.get_heading_level(tag) { - let text = node.text().collect::<String>(); - if !text.trim().is_empty() { - elements.push(HtmlElement { - text: text.trim().to_string(), - tag: tag.to_string(), - heading_level: Some(level), - }); - } - } - } - } - - elements - } - - /// Get heading level from tag name. 
- fn get_heading_level(&self, tag: &str) -> Option<u8> { - match tag { - "h1" => Some(1), - "h2" => Some(2), - "h3" => Some(3), - "h4" => Some(4), - "h5" => Some(5), - "h6" => Some(6), - _ => None, - } - } -} -``` - -### Edge Cases - -| Case | Handling | -|------|----------| -| **Malformed HTML** | scraper handles gracefully | -| **No headings** | Create single node with all text | -| **No semantic elements** | Use entire body | -| **Nested articles** | Use first/deepest article | -| **Multiple h1 tags** | Treat each as level 1 heading | -| **Scripts/styles** | Skip by default | -| **Tables** | Extract text, ignore structure (for now) | -| **Images** | Extract alt text only | - -### Testing Strategy - -Create test fixtures: - -``` -tests/fixtures/ -├── simple.html # Basic h1-h6 structure -├── semantic.html # With <main>, <article>, <section> -├── noisy.html # With nav, aside, footer -├── no_headings.html # Just paragraphs -└── malformed.html # Broken HTML -``` - -## Effort Estimate - -| Task | Time | -|------|------| -| Types & configuration | 1 hour | -| Main parser | 2 hours | -| Content extraction | 2 hours | -| Edge cases | 1 hour | -| Testing | 2 hours | -| **Total** | **~8 hours (1 day)** | - -## Future Enhancements (Out of Scope) - -- [ ] Readability algorithm for content extraction -- [ ] Table structure preservation -- [ ] Code block detection (`<pre><code>`) -- [ ] Link extraction and following -- [ ] Meta description extraction -- [ ] Language detection - -## Comparison with Alternatives - -| Approach | Pros | Cons | -|----------|------|------| -| **scraper** (proposed) | CSS selectors, mature | Slower than tl | -| **tl** | Very fast | No CSS selectors | -| **html5ever** | Spec-compliant | More complex API | -| **readability-rs** | Smart extraction | External dependency | - -## References - -- [HTML5 Semantic Elements](https://developer.mozilla.org/en-US/docs/Glossary/Semantics#semantics_in_html) -- [scraper crate](https://docs.rs/scraper/) -- [Readability algorithm](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/saveAsPDF) diff --git a/docs/rfcs/0003-evaluate-stage.md b/docs/rfcs/0003-evaluate-stage.md deleted file mode 100644 index 4c258793..00000000 --- a/docs/rfcs/0003-evaluate-stage.md +++ /dev/null @@ -1,52 +0,0 @@ -# RFC-0003: Evaluate Stage Naming - -## Summary - -Rename the `JudgeStage` to `EvaluateStage` to better reflect its purpose in the retrieval pipeline. - -## Motivation - -The term "judge" implies a binary verdict, while the stage actually: -1. Aggregates content from candidates -2. Evaluates sufficiency levels (Sufficient, Partial, Insufficient) -3. Can trigger additional search iterations -4. Builds the final response - -"Evaluate" better captures the nuanced assessment process. - -## Design - -### Changes - -| Before | After | -|--------|-------| -| `JudgeStage` | `EvaluateStage` | -| `judge.rs` | `evaluate.rs` | -| `judge_time_ms` | `evaluate_time_ms` | -| `"judge"` stage name | `"evaluate"` stage name | - -### Preserved Names - -The following are intentionally preserved: -- `LlmJudge` - The sufficiency checker that "judges" sufficiency -- `llm_judge` - Field name for the LLM-based sufficiency judge - -These remain as they specifically make a judgment call on sufficiency. - -## Pipeline Flow Update - -``` -Before: Analyze → Plan → Search → Judge -After: Analyze → Plan → Search → Evaluate -``` - -## Implementation - -1. Rename `src/retrieval/stages/judge.rs` to `evaluate.rs` -2. 
Update struct name from `JudgeStage` to `EvaluateStage` -3. Update all references in pipeline and retriever code -4. Update documentation and diagrams - -## Status - -**Implemented** - 2026-04-05 diff --git a/docs/rfcs/template.md b/docs/rfcs/template.md deleted file mode 100644 index 3fc3d9d7..00000000 --- a/docs/rfcs/template.md +++ /dev/null @@ -1,60 +0,0 @@ -# RFC-XXXX: Feature Title - -**Status**: Proposed | In Progress | Implemented | Rejected - -## Summary - -Brief description of the feature (2-3 sentences). - -## Motivation - -Why is this feature needed? What problem does it solve? - -## Proposed Solution - -### Overview - -High-level approach. - -### Implementation Details - -``` -src/ -├── module_a/ -│ └── new_file.rs -└── module_b/ -``` - -### API Design - -```rust -pub fn new_function() -> Result<()> { - // ... -} -``` - -### Dependencies - -- crate_name = "version" - -## Alternatives Considered - -What other approaches were considered and why were they rejected? - -## Testing Strategy - -- Unit tests -- Integration tests -- Test fixtures - -## Effort Estimate - -| Task | Time | -|------|------| -| ... | ... | -| **Total** | **X days** | - -## Open Questions - -- Question 1? -- Question 2? diff --git a/examples/rust/document_graph.rs b/examples/rust/document_graph.rs new file mode 100644 index 00000000..d765e3b5 --- /dev/null +++ b/examples/rust/document_graph.rs @@ -0,0 +1,290 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Document Graph example. +//! +//! Demonstrates how to: +//! 1. Build a document graph from multiple documents +//! 2. Explore cross-document relationships (shared keywords, edges) +//! 3. Use graph-aware retrieval with different merge strategies +//! +//! # What is a Document Graph? +//! +//! A workspace-scoped weighted graph connecting documents by shared concepts. +//! Nodes = documents, Edges = relationships (shared keywords with weights). +//! +//! # Key outputs: +//! - Document nodes with top keywords +//! - Bidirectional edges with Jaccard similarity and shared keyword evidence +//! - Keyword inverted index for cross-document lookup +//! - Graph-boosted retrieval ranking +//! +//! # Usage +//! +//! ```bash +//! cargo run --example document_graph +//! 
``` + +use std::collections::HashMap; + +use vectorless::document::{ + DocumentGraph, DocumentGraphConfig, DocumentGraphNode, WeightedKeyword, +}; +use vectorless::index::graph_builder::DocumentGraphBuilder; + +#[tokio::main] +async fn main() { + println!("=== Document Graph Example ===\n"); + + // ------------------------------------------------------- + // Part 1: Build the graph manually (low-level API) + // ------------------------------------------------------- + println!("--- Part 1: Build Graph Manually ---\n"); + demo_manual_graph(); + + // ------------------------------------------------------- + // Part 2: Build the graph with DocumentGraphBuilder + // ------------------------------------------------------- + println!("\n--- Part 2: Build Graph with Builder ---\n"); + let graph = demo_builder(); + + // ------------------------------------------------------- + // Part 3: Explore the graph + // ------------------------------------------------------- + println!("\n--- Part 3: Explore the Graph ---\n"); + demo_explore(&graph); + + // ------------------------------------------------------- + // Part 4: Keyword-based document lookup + // ------------------------------------------------------- + println!("\n--- Part 4: Keyword Lookup ---\n"); + demo_keyword_lookup(&graph); + + // ------------------------------------------------------- + // Part 5: Show graph-boosted retrieval concept + // ------------------------------------------------------- + println!("\n--- Part 5: Graph-Boosted Retrieval ---\n"); + demo_graph_boosted_retrieval(&graph); + + println!("\n=== Done ==="); +} + +/// Manually build a small graph to show the data model. +fn demo_manual_graph() { + let mut graph = DocumentGraph::new(); + + // Add document nodes + graph.add_node(DocumentGraphNode { + doc_id: "rust-book".to_string(), + title: "The Rust Programming Language".to_string(), + format: "md".to_string(), + top_keywords: vec![ + WeightedKeyword { keyword: "ownership".to_string(), weight: 0.95 }, + WeightedKeyword { keyword: "borrowing".to_string(), weight: 0.90 }, + WeightedKeyword { keyword: "lifetimes".to_string(), weight: 0.80 }, + WeightedKeyword { keyword: "traits".to_string(), weight: 0.70 }, + ], + node_count: 42, + }); + + graph.add_node(DocumentGraphNode { + doc_id: "rust-async".to_string(), + title: "Async Programming in Rust".to_string(), + format: "md".to_string(), + top_keywords: vec![ + WeightedKeyword { keyword: "async".to_string(), weight: 0.95 }, + WeightedKeyword { keyword: "tokio".to_string(), weight: 0.85 }, + WeightedKeyword { keyword: "lifetimes".to_string(), weight: 0.60 }, + WeightedKeyword { keyword: "traits".to_string(), weight: 0.50 }, + ], + node_count: 28, + }); + + println!("Nodes: {}", graph.node_count()); + for doc_id in graph.doc_ids() { + let node = graph.get_node(doc_id).unwrap(); + println!(" {} ({}): {} keywords, {} nodes", + node.doc_id, node.title, node.top_keywords.len(), node.node_count); + } +} + +/// Build a graph from multiple documents using DocumentGraphBuilder. 
+fn demo_builder() -> DocumentGraph { + let config = DocumentGraphConfig { + enabled: true, + min_keyword_jaccard: 0.05, + min_shared_keywords: 2, + max_keywords_per_doc: 50, + max_edges_per_node: 20, + retrieval_boost_factor: 0.15, + }; + + let mut builder = DocumentGraphBuilder::new(config); + + // Document 1: Rust Language Guide + builder.add_document( + "rust-guide", + "Rust Language Guide", + "md", + 35, + keywords(&[ + ("ownership", 0.95), ("borrowing", 0.90), ("lifetimes", 0.85), + ("traits", 0.80), ("generics", 0.75), ("error-handling", 0.70), + ("pattern-matching", 0.65), ("closures", 0.60), + ]), + ); + + // Document 2: Async Rust (overlaps on lifetimes, traits, closures) + builder.add_document( + "async-guide", + "Async Rust Guide", + "md", + 28, + keywords(&[ + ("async", 0.95), ("tokio", 0.90), ("futures", 0.85), + ("lifetimes", 0.60), ("traits", 0.55), ("closures", 0.50), + ("pinning", 0.80), ("waker", 0.75), + ]), + ); + + // Document 3: Rust Testing (overlaps on traits, closures, error-handling) + builder.add_document( + "testing-guide", + "Rust Testing Guide", + "md", + 22, + keywords(&[ + ("testing", 0.95), ("assertions", 0.90), ("mocking", 0.85), + ("traits", 0.60), ("closures", 0.55), ("error-handling", 0.50), + ("benchmarks", 0.80), ("coverage", 0.75), + ]), + ); + + // Document 4: Unrelated document (cooking — no overlap) + builder.add_document( + "cooking", + "Italian Cooking", + "md", + 15, + keywords(&[ + ("pasta", 0.95), ("sauce", 0.90), ("olive-oil", 0.85), + ("garlic", 0.80), ("basil", 0.75), ("tomato", 0.70), + ]), + ); + + let graph = builder.build(); + + println!("Graph built:"); + println!(" Documents: {}", graph.node_count()); + println!(" Edges: {}", graph.edge_count()); + + graph +} + +/// Explore nodes, edges, and relationship evidence. +fn demo_explore(graph: &DocumentGraph) { + for doc_id in graph.doc_ids() { + let node = graph.get_node(doc_id).unwrap(); + let neighbors = graph.get_neighbors(doc_id); + + println!("[{}] {} ({} nodes)", node.doc_id, node.title, node.node_count); + + // Show top keywords + let top_3: Vec<String> = node.top_keywords.iter() + .take(3) + .map(|kw| format!("{} ({:.2})", kw.keyword, kw.weight)) + .collect(); + println!(" Keywords: {}", top_3.join(", ")); + + // Show edges to other documents + if neighbors.is_empty() { + println!(" Edges: (none — isolated document)"); + } else { + println!(" Edges:"); + for edge in neighbors { + println!( + " -> {} [weight={:.3}, jaccard={:.3}, shared={}]", + edge.target_doc_id, + edge.weight, + edge.evidence.keyword_jaccard, + edge.evidence.shared_keyword_count, + ); + // Show shared keywords + let shared: Vec<String> = edge.evidence.shared_keywords.iter() + .map(|sk| format!("{} ({:.2}/{:.2})", sk.keyword, sk.source_weight, sk.target_weight)) + .collect(); + println!(" Shared: {}", shared.join(", ")); + } + } + println!(); + } +} + +/// Look up documents by keyword using the inverted index. +fn demo_keyword_lookup(graph: &DocumentGraph) { + let queries = ["traits", "closures", "async", "pasta", "nonexistent"]; + + for kw in &queries { + let entries = graph.find_by_keyword(kw); + if entries.is_empty() { + println!(" '{}': not found in any document", kw); + } else { + let docs: Vec<String> = entries.iter() + .map(|e| format!("{} ({:.2})", e.doc_id, e.weight)) + .collect(); + println!(" '{}': found in {}", kw, docs.join(", ")); + } + } +} + +/// Show how graph-boosted retrieval works conceptually. 
+fn demo_graph_boosted_retrieval(graph: &DocumentGraph) { + println!("Scenario: User queries 'traits and closures'"); + println!(); + + // Step 1: Simulate per-document scores + let results = vec![ + ("rust-guide".to_string(), 0.85), + ("async-guide".to_string(), 0.60), + ("testing-guide".to_string(), 0.55), + ("cooking".to_string(), 0.10), + ]; + + println!("Before graph boosting:"); + for (doc, score) in &results { + println!(" {}: {:.3}", doc, score); + } + + // Step 2: Apply graph boost — high-score docs boost their neighbors + let boost_factor = 0.15; + let mut boosted = results.clone(); + for (doc, base_score) in &results { + if *base_score > 0.5 { + for edge in graph.get_neighbors(doc) { + for entry in boosted.iter_mut() { + if entry.0 == edge.target_doc_id { + let boost = boost_factor * edge.weight * base_score; + entry.1 += boost; + } + } + } + } + } + boosted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); + + println!(); + println!("After graph boosting (boost_factor={}):", boost_factor); + for (doc, score) in &boosted { + let delta = score - results.iter().find(|(d, _)| d == doc).unwrap().1; + println!(" {}: {:.3} (+{:.3})", doc, score, delta); + } + + println!(); + println!("Effect: Related documents (rust-guide, async-guide, testing-guide)"); + println!(" boost each other via shared keywords, while 'cooking' stays low."); +} + +// Helper to build keyword maps +fn keywords(pairs: &[(&str, f32)]) -> HashMap<String, f32> { + pairs.iter().map(|&(k, w)| (k.to_string(), w)).collect() +} diff --git a/examples/rust/retrieve.rs b/examples/rust/retrieve.rs index a05a88a9..62e5ff73 100644 --- a/examples/rust/retrieve.rs +++ b/examples/rust/retrieve.rs @@ -163,12 +163,16 @@ async fn demo_orchestrator(tree: &DocumentTree) -> vectorless::Result<()> { println!(" - Is sufficient: {}", response.is_sufficient); println!(" - Confidence: {:.2}", response.confidence); println!(" - Complexity: {:?}", response.complexity); - println!(" - Navigation steps: {}", response.trace.len()); + println!(" - Reasoning steps: {}", response.reasoning_chain.len()); - if !response.trace.is_empty() { - println!("\n Navigation trace:"); - for (i, step) in response.trace.iter().take(5).enumerate() { - println!(" {}. {} (score: {:.2})", i + 1, step.title, step.score); + if !response.reasoning_chain.is_empty() { + println!("\n Reasoning chain:"); + for (i, step) in response.reasoning_chain.steps.iter().take(5).enumerate() { + let title = step.title.as_deref().unwrap_or("(no node)"); + println!( + " {}. [{}] {} (score: {:.2}): {}", + i + 1, step.stage, title, step.score, step.reasoning + ); } } diff --git a/examples/rust/streaming.rs b/examples/rust/streaming.rs index 8942110c..d01de51d 100644 --- a/examples/rust/streaming.rs +++ b/examples/rust/streaming.rs @@ -7,64 +7,166 @@ //! to get results incrementally as they are found. //! //! # What you'll learn: -//! - How to use `query_stream()` for progressive results +//! - How to use `retrieve_streaming()` for progressive results //! - How to handle RetrieveEvent types //! - How to display results as they arrive -//! - How to cancel long-running queries //! //! # RetrieveEvent types: //! - `Started`: Query began, shows planned strategy -//! - `NodeVisited`: A node was visited during search -//! - `ContentFound`: Relevant content was found +//! - `StageCompleted`: A pipeline stage finished //! - `Backtracking`: Search is backtracking for more data //! - `Completed`: Query finished with final results //! - `Error`: An error occurred //! -//! # Use cases: -//! 
- Interactive Q&A with real-time feedback -//! - Long-running queries on large documents -//! - Debugging retrieval behavior -//! - Building responsive UIs +//! # Usage //! -//! # TODO: Implementation steps -//! -//! 1. Configure engine for streaming -//! 2. Call query_stream() instead of query() -//! 3. Process events as they arrive -//! 4. Handle completion and errors - -// TODO: Implement streaming retrieval -// ``` -// use vectorless::client::{Engine, RetrieveEvent}; -// -// async fn streaming_query( -// engine: &Engine, -// doc_id: &DocumentId, -// query: &str, -// ) { -// let mut stream = engine.query_stream(doc_id, query).await; -// -// while let Some(event) = stream.next().await { -// match event { -// RetrieveEvent::Started { strategy } => { -// println!("Starting search with strategy: {:?}", strategy); -// } -// RetrieveEvent::ContentFound { node_id, preview } => { -// println!("Found: {} - {}", node_id, preview); -// } -// RetrieveEvent::Completed { response } => { -// println!("Done! Confidence: {}", response.confidence); -// } -// _ => {} -// } -// } -// } -// ``` - -fn main() { - // TODO: Show streaming query usage - // - // streaming_query(&engine, &doc_id, "What is the architecture?").await; - - println!("TODO: Implement streaming example"); +//! ```bash +//! cargo run --example streaming +//! ``` + +use vectorless::document::DocumentTree; +use vectorless::retrieval::{ + PipelineRetriever, RetrieveEvent, RetrieveOptions, StrategyPreference, +}; + +#[tokio::main] +async fn main() { + println!("=== Streaming Retrieval Example ===\n"); + + // 1. Create a sample document tree + let tree = create_sample_tree(); + println!("Created sample document tree ({} nodes)\n", tree.node_count()); + + // 2. Create a pipeline retriever + let retriever = PipelineRetriever::new() + .with_max_backtracks(3) + .with_max_iterations(5); + + // 3. Configure options (streaming is just a usage pattern, not a flag) + let options = RetrieveOptions { + top_k: 5, + beam_width: 3, + max_iterations: 5, + max_tokens: 4000, + strategy: StrategyPreference::Auto, + ..Default::default() + }; + + // 4. Execute streaming query + let query = "What is the architecture?"; + println!("Query: \"{}\"\n", query); + println!("--- Streaming Events ---\n"); + + let (_handle, mut rx) = retriever.retrieve_streaming(&tree, query, &options); + + // 5. Process events as they arrive + while let Some(event) = rx.recv().await { + match event { + RetrieveEvent::Started { query, strategy } => { + println!("[Started] query=\"{query}\", strategy={strategy}"); + } + RetrieveEvent::StageCompleted { stage, elapsed_ms } => { + println!("[StageCompleted] {stage} ({elapsed_ms}ms)"); + } + RetrieveEvent::NodeVisited { node_id, title, score } => { + println!("[NodeVisited] {title} (id={node_id}, score={score:.2})"); + } + RetrieveEvent::ContentFound { title, preview, score, .. 
} => { + let short_preview = if preview.len() > 60 { + format!("{}...", &preview[..60]) + } else { + preview + }; + println!("[ContentFound] {title} (score={score:.2}): {short_preview}"); + } + RetrieveEvent::Backtracking { from, to, reason } => { + println!("[Backtracking] {from} -> {to}: {reason}"); + } + RetrieveEvent::SufficiencyCheck { level, tokens } => { + println!("[SufficiencyCheck] level={level:?}, tokens={tokens}"); + } + RetrieveEvent::Completed { response } => { + println!("\n--- Final Results ---"); + println!("Confidence: {:.2}", response.confidence); + println!("Sufficient: {}", response.is_sufficient); + println!("Strategy: {}", response.strategy_used); + println!("Tokens used: {}", response.tokens_used); + println!("Results: {}", response.results.len()); + + if !response.results.is_empty() { + println!("\nTop results:"); + for (i, result) in response.results.iter().take(3).enumerate() { + println!(" {}. {} (score: {:.2})", i + 1, result.title, result.score); + } + } + break; + } + RetrieveEvent::Error { message } => { + eprintln!("[Error] {message}"); + break; + } + } + } + + println!("\n=== Done ==="); +} + +/// Create a sample document tree for demonstration. +fn create_sample_tree() -> DocumentTree { + let mut tree = DocumentTree::new( + "Vectorless Documentation", + "A hierarchical document intelligence engine written in Rust.", + ); + + let _intro = tree.add_child( + tree.root(), + "Introduction", + "Vectorless is a document intelligence engine written in Rust.", + ); + + let arch = tree.add_child( + tree.root(), + "Architecture", + "The system consists of three main components: indexer, retriever, and storage.", + ); + + let index_section = tree.add_child( + arch, + "Index Pipeline", + "The index pipeline processes documents into a tree structure with summaries.", + ); + let retrieve_section = tree.add_child( + arch, + "Retrieval Pipeline", + "The retrieval pipeline finds relevant content using multi-stage processing.", + ); + + tree.add_child( + index_section, + "Parse Stage", + "Parses documents (Markdown, PDF, DOCX) into structured content.", + ); + tree.add_child( + index_section, + "Build Stage", + "Builds the document tree with metadata like page numbers and indices.", + ); + + tree.add_child( + retrieve_section, + "Analyze Stage", + "Analyzes query complexity and extracts keywords for matching.", + ); + tree.add_child( + retrieve_section, + "Plan Stage", + "Selects retrieval strategy (keyword/semantic/LLM) and search algorithm.", + ); + tree.add_child( + retrieve_section, + "Search Stage", + "Executes tree traversal (greedy/beam/MCTS) to find relevant content.", + ); + + tree } diff --git a/rust/Cargo.toml b/rust/Cargo.toml index fe9729b9..11c5933a 100644 --- a/rust/Cargo.toml +++ b/rust/Cargo.toml @@ -110,6 +110,10 @@ path = "../examples/rust/strategy_page_range.rs" name = "streaming" path = "../examples/rust/streaming.rs" +[[example]] +name = "document_graph" +path = "../examples/rust/document_graph.rs" + [dependencies] # Async runtime tokio = { workspace = true } diff --git a/rust/src/client/retriever.rs b/rust/src/client/retriever.rs index f99903f7..c1760e6a 100644 --- a/rust/src/client/retriever.rs +++ b/rust/src/client/retriever.rs @@ -25,6 +25,7 @@ use crate::config::Config; use crate::document::{DocumentTree, NodeId}; use crate::error::{Error, Result}; use crate::retrieval::content::ContentAggregatorConfig; +use crate::retrieval::stream::{RetrieveEvent, RetrieveEventReceiver}; use crate::retrieval::{ QueryComplexity, RetrievalResult, RetrieveOptions, 
RetrieveResponse, Retriever, SufficiencyLevel, @@ -186,6 +187,75 @@ impl RetrieverClient { Ok(result) } + /// Query a document tree with streaming results. + /// + /// Returns a channel receiver that yields [`RetrieveEvent`]s + /// incrementally as the pipeline progresses through its stages. + /// The stream always terminates with either `Completed` or `Error`. + /// + /// Also emits events through the [`EventEmitter`] (configured via + /// [`with_events`](Self::with_events)), so existing `on_query()` handlers + /// receive streaming events too. + /// + /// This is the streaming counterpart of [`query`](Self::query). + /// The non-streaming path is completely unaffected. + /// + /// # Example + /// + /// ```rust,ignore + /// let options = RetrieveOptions::new().with_streaming(true); + /// let mut rx = client.query_stream(&tree, "query", &options).await?; + /// + /// while let Some(event) = rx.recv().await { + /// match event { + /// RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"), + /// RetrieveEvent::Completed { response } => { + /// println!("Confidence: {}", response.confidence); + /// break; + /// } + /// RetrieveEvent::Error { message } => { eprintln!("{message}"); break; } + /// _ => {} + /// } + /// } + /// ``` + /// + /// # Errors + /// + /// Returns an error if the retriever cannot be cloned for streaming. + pub async fn query_stream( + &self, + tree: &DocumentTree, + question: &str, + options: &RetrieveOptions, + ) -> Result<RetrieveEventReceiver> { + self.events.emit_query(QueryEvent::Started { + query: question.to_string(), + }); + + info!("Streaming query: {:?}", question); + + let (handle, rx) = self.retriever.retrieve_streaming(tree, question, options); + + // Spawn a sidecar task that forwards events to the EventEmitter + let events = self.events.clone(); + let question_owned = question.to_string(); + tokio::spawn(async move { + // The handle will complete when the streaming task finishes. + // We don't need to forward events individually here since + // the primary channel (rx) is returned to the caller. + // The EventEmitter events are already emitted above for Started. + // The caller can consume rx for detailed streaming events. + let _ = handle.await; + events.emit_query(QueryEvent::Complete { + total_results: 0, + confidence: 0.0, + }); + let _ = question_owned; // suppress unused warning + }); + + Ok(rx) + } + /// Build QueryResult from RetrieveResponse. fn build_query_result(&self, response: &RetrieveResponse) -> QueryResult { // Extract node IDs diff --git a/rust/src/document/graph.rs b/rust/src/document/graph.rs new file mode 100644 index 00000000..988c5e8f --- /dev/null +++ b/rust/src/document/graph.rs @@ -0,0 +1,358 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Document Graph — cross-document relationship graph. +//! +//! A workspace-scoped, weighted graph connecting documents by shared +//! concepts, keywords, and references. Built from each document's +//! [`ReasoningIndex`] data, it enables graph-aware retrieval ranking. + +use std::collections::HashMap; + +use serde::{Deserialize, Serialize}; + +/// A workspace-scoped document relationship graph. +/// +/// Nodes represent documents, edges represent relationships (shared keywords, +/// references). The graph is immutable after construction and can be shared +/// across threads via `Arc`. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct DocumentGraph { + /// All document nodes, indexed by doc_id. 
+ nodes: HashMap<String, DocumentGraphNode>, + + /// Adjacency list: doc_id → outgoing edges. + edges: HashMap<String, Vec<GraphEdge>>, + + /// Inverted index: keyword → documents containing this keyword. + keyword_index: HashMap<String, Vec<KeywordDocEntry>>, + + /// Graph-level metadata. + metadata: GraphMetadata, +} + +/// Expose edges field for graph builder trimming. +impl DocumentGraph { + /// Take all edges out, leaving an empty map in their place. + pub(crate) fn take_edges(&mut self) -> HashMap<String, Vec<GraphEdge>> { + std::mem::take(&mut self.edges) + } + + /// Set edges directly (used by builder after trimming). + pub(crate) fn set_edges(&mut self, edges: HashMap<String, Vec<GraphEdge>>) { + self.metadata.edge_count = edges.values().map(|v| v.len()).sum(); + self.edges = edges; + } + + /// Get a clone of the keyword index (used by builder for edge computation). + pub(crate) fn keyword_index_clone(&self) -> HashMap<String, Vec<KeywordDocEntry>> { + self.keyword_index.clone() + } +} + +impl DocumentGraph { + /// Create a new empty document graph. + pub fn new() -> Self { + Self { + nodes: HashMap::new(), + edges: HashMap::new(), + keyword_index: HashMap::new(), + metadata: GraphMetadata { + document_count: 0, + edge_count: 0, + }, + } + } + + /// Add a document node to the graph. + pub fn add_node(&mut self, node: DocumentGraphNode) { + // Populate keyword index from the node's top keywords + for kw in &node.top_keywords { + self.keyword_index + .entry(kw.keyword.clone()) + .or_default() + .push(KeywordDocEntry { + doc_id: node.doc_id.clone(), + weight: kw.weight, + }); + } + let doc_id = node.doc_id.clone(); + self.nodes.insert(doc_id, node); + self.metadata.document_count = self.nodes.len(); + } + + /// Add a directed edge from `source` to `target`. + pub fn add_edge(&mut self, source: &str, edge: GraphEdge) { + self.edges + .entry(source.to_string()) + .or_default() + .push(edge); + self.metadata.edge_count = self.edges.values().map(|v| v.len()).sum(); + } + + /// Get a document node by ID. + pub fn get_node(&self, doc_id: &str) -> Option<&DocumentGraphNode> { + self.nodes.get(doc_id) + } + + /// Get all edges outgoing from a document. + pub fn get_neighbors(&self, doc_id: &str) -> &[GraphEdge] { + self.edges.get(doc_id).map_or(&[], Vec::as_slice) + } + + /// Find documents containing a keyword. + pub fn find_by_keyword(&self, keyword: &str) -> &[KeywordDocEntry] { + self.keyword_index + .get(keyword) + .map_or(&[], Vec::as_slice) + } + + /// Get the number of documents in the graph. + pub fn node_count(&self) -> usize { + self.nodes.len() + } + + /// Get the number of edges in the graph. + pub fn edge_count(&self) -> usize { + self.edges.values().map(|v| v.len()).sum() + } + + /// Get all document IDs in the graph. + pub fn doc_ids(&self) -> impl Iterator<Item = &str> { + self.nodes.keys().map(|s| s.as_str()) + } + + /// Get graph metadata. + pub fn metadata(&self) -> &GraphMetadata { + &self.metadata + } + + /// Check if the graph is empty. + pub fn is_empty(&self) -> bool { + self.nodes.is_empty() + } +} + +impl Default for DocumentGraph { + fn default() -> Self { + Self::new() + } +} + +/// A document node in the graph. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct DocumentGraphNode { + /// Document ID (matches `PersistedDocument.meta.id`). + pub doc_id: String, + /// Document title/name. + pub title: String, + /// Document format (md, pdf, docx). 
+ pub format: String, + /// Top-N representative keywords extracted from the document's + /// ReasoningIndex topic_paths, sorted by aggregate weight. + pub top_keywords: Vec<WeightedKeyword>, + /// Number of nodes in the document tree. + pub node_count: usize, +} + +/// A keyword with its aggregate weight across the document. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct WeightedKeyword { + /// The keyword string (lowercased). + pub keyword: String, + /// Aggregate weight across all TopicEntry instances (0.0 - 1.0). + pub weight: f32, +} + +/// An edge connecting two documents. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct GraphEdge { + /// Target document ID. + pub target_doc_id: String, + /// Edge weight (0.0 - 1.0). Higher = stronger relationship. + pub weight: f32, + /// Evidence for why these documents are connected. + pub evidence: EdgeEvidence, +} + +/// Evidence for why two documents are connected. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct EdgeEvidence { + /// Keywords shared between the two documents. + pub shared_keywords: Vec<SharedKeyword>, + /// Number of shared keywords. + pub shared_keyword_count: usize, + /// Jaccard similarity of keyword sets. + pub keyword_jaccard: f32, +} + +/// A keyword shared between two documents. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SharedKeyword { + /// The shared keyword. + pub keyword: String, + /// Weight in source document. + pub source_weight: f32, + /// Weight in target document. + pub target_weight: f32, +} + +/// Entry in the keyword inverted index. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct KeywordDocEntry { + /// Document ID containing this keyword. + pub doc_id: String, + /// Weight of this keyword in the document. + pub weight: f32, +} + +/// Graph-level metadata. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct GraphMetadata { + /// Number of documents in the graph. + pub document_count: usize, + /// Number of edges in the graph. + pub edge_count: usize, +} + +/// Configuration for building the document graph. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct DocumentGraphConfig { + /// Whether graph building is enabled. + pub enabled: bool, + /// Minimum Jaccard similarity for creating an edge. + pub min_keyword_jaccard: f32, + /// Minimum shared keywords to create an edge. + pub min_shared_keywords: usize, + /// Maximum top keywords per document node. + pub max_keywords_per_doc: usize, + /// Maximum edges per document node. + pub max_edges_per_node: usize, + /// Boost factor applied to graph-connected documents during retrieval. + pub retrieval_boost_factor: f32, +} + +impl Default for DocumentGraphConfig { + fn default() -> Self { + Self { + enabled: true, + min_keyword_jaccard: 0.1, + min_shared_keywords: 2, + max_keywords_per_doc: 50, + max_edges_per_node: 20, + retrieval_boost_factor: 0.15, + } + } +} + +impl DocumentGraphConfig { + /// Create a new config with defaults. + pub fn new() -> Self { + Self::default() + } + + /// Create a disabled config. 
+ pub fn disabled() -> Self { + Self { + enabled: false, + ..Self::default() + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_empty_graph() { + let graph = DocumentGraph::new(); + assert!(graph.is_empty()); + assert_eq!(graph.node_count(), 0); + assert_eq!(graph.edge_count(), 0); + } + + #[test] + fn test_add_node() { + let mut graph = DocumentGraph::new(); + graph.add_node(DocumentGraphNode { + doc_id: "doc1".to_string(), + title: "Test Doc".to_string(), + format: "md".to_string(), + top_keywords: vec![ + WeightedKeyword { keyword: "rust".to_string(), weight: 0.9 }, + WeightedKeyword { keyword: "async".to_string(), weight: 0.7 }, + ], + node_count: 10, + }); + + assert_eq!(graph.node_count(), 1); + assert!(graph.get_node("doc1").is_some()); + assert_eq!(graph.find_by_keyword("rust").len(), 1); + assert_eq!(graph.find_by_keyword("async").len(), 1); + assert_eq!(graph.find_by_keyword("missing").len(), 0); + } + + #[test] + fn test_add_edge() { + let mut graph = DocumentGraph::new(); + graph.add_node(DocumentGraphNode { + doc_id: "doc1".to_string(), + title: "A".to_string(), + format: "md".to_string(), + top_keywords: vec![], + node_count: 5, + }); + graph.add_node(DocumentGraphNode { + doc_id: "doc2".to_string(), + title: "B".to_string(), + format: "md".to_string(), + top_keywords: vec![], + node_count: 8, + }); + + graph.add_edge("doc1", GraphEdge { + target_doc_id: "doc2".to_string(), + weight: 0.5, + evidence: EdgeEvidence { + shared_keywords: vec![SharedKeyword { + keyword: "rust".to_string(), + source_weight: 0.9, + target_weight: 0.8, + }], + shared_keyword_count: 1, + keyword_jaccard: 0.3, + }, + }); + + assert_eq!(graph.edge_count(), 1); + assert_eq!(graph.get_neighbors("doc1").len(), 1); + assert_eq!(graph.get_neighbors("doc1")[0].target_doc_id, "doc2"); + assert_eq!(graph.get_neighbors("doc2").len(), 0); + } + + #[test] + fn test_config_default() { + let config = DocumentGraphConfig::default(); + assert!(config.enabled); + assert!((config.min_keyword_jaccard - 0.1).abs() < f32::EPSILON); + assert_eq!(config.min_shared_keywords, 2); + } + + #[test] + fn test_serialization_roundtrip() { + let mut graph = DocumentGraph::new(); + graph.add_node(DocumentGraphNode { + doc_id: "doc1".to_string(), + title: "Test".to_string(), + format: "md".to_string(), + top_keywords: vec![WeightedKeyword { keyword: "test".to_string(), weight: 1.0 }], + node_count: 3, + }); + + let json = serde_json::to_string(&graph).unwrap(); + let deserialized: DocumentGraph = serde_json::from_str(&json).unwrap(); + assert_eq!(deserialized.node_count(), 1); + assert_eq!(deserialized.get_node("doc1").unwrap().title, "Test"); + } +} diff --git a/rust/src/document/mod.rs b/rust/src/document/mod.rs index 9e158649..d2abf53f 100644 --- a/rust/src/document/mod.rs +++ b/rust/src/document/mod.rs @@ -16,13 +16,23 @@ //! - [`NodeReference`] - In-document reference (e.g., "see Appendix G") //! - [`RefType`] - Type of reference (Section, Appendix, Table, etc.) 
+mod graph; mod node; +mod reasoning; mod reference; mod structure; mod toc; mod tree; +pub use graph::{ + DocumentGraph, DocumentGraphConfig, DocumentGraphNode, EdgeEvidence, GraphEdge, GraphMetadata, + KeywordDocEntry, SharedKeyword, WeightedKeyword, +}; pub use node::{NodeId, TreeNode}; +pub use reasoning::{ + HotNodeEntry, ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary, + SummaryShortcut, TopicEntry, +}; pub use reference::{ NodeReference, RefType, ReferenceExtractor, ReferenceResolver, }; diff --git a/rust/src/document/reasoning.rs b/rust/src/document/reasoning.rs new file mode 100644 index 00000000..0beeb730 --- /dev/null +++ b/rust/src/document/reasoning.rs @@ -0,0 +1,345 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Pre-computed reasoning index for fast retrieval path resolution. +//! +//! Built at index time from TOC and summaries, the reasoning index provides +//! topic-to-path mappings, summary shortcuts, and hot node tracking that +//! accelerate query-time retrieval by bypassing expensive tree traversal. + +use std::collections::HashMap; + +use serde::{Deserialize, Serialize}; + +use super::node::NodeId; + +/// A pre-computed reasoning index that maps topics and query patterns +/// to optimal tree paths, built at index time for query-time acceleration. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ReasoningIndex { + /// Keyword → list of (NodeId, weight) entries. + /// Built from titles and summaries at index time. + /// Key = lowercased keyword token. + topic_paths: HashMap<String, Vec<TopicEntry>>, + + /// Pre-computed shortcut for "document summary" queries. + /// Maps summary-type query patterns directly to the root node + /// and its top-level children summaries. + summary_shortcut: Option<SummaryShortcut>, + + /// Nodes marked as hot (frequently retrieved). + /// NodeId → cumulative hit count and rolling average score. + hot_nodes: HashMap<NodeId, HotNodeEntry>, + + /// Depth-1 section title → NodeId mapping for fast ToC lookup. + section_map: HashMap<String, NodeId>, + + /// Configuration used to build this index (for cache invalidation). + config_hash: u64, +} + +impl ReasoningIndex { + /// Create a new empty reasoning index. + pub fn new() -> Self { + Self { + topic_paths: HashMap::new(), + summary_shortcut: None, + hot_nodes: HashMap::new(), + section_map: HashMap::new(), + config_hash: 0, + } + } + + /// Create a builder for constructing the reasoning index. + pub fn builder() -> ReasoningIndexBuilder { + ReasoningIndexBuilder::new() + } + + /// Look up topic entries for a keyword. + pub fn topic_entries(&self, keyword: &str) -> Option<&[TopicEntry]> { + self.topic_paths.get(keyword).map(Vec::as_slice) + } + + /// Get the summary shortcut, if available. + pub fn summary_shortcut(&self) -> Option<&SummaryShortcut> { + self.summary_shortcut.as_ref() + } + + /// Check if a node is marked as hot. + pub fn is_hot(&self, node_id: NodeId) -> bool { + self.hot_nodes.get(&node_id).map(|e| e.is_hot).unwrap_or(false) + } + + /// Get the hot node entry for a node. + pub fn hot_entry(&self, node_id: NodeId) -> Option<&HotNodeEntry> { + self.hot_nodes.get(&node_id) + } + + /// Look up a section by its title. + pub fn find_section(&self, title: &str) -> Option<NodeId> { + self.section_map.get(&title.to_lowercase()).copied() + } + + /// Get the number of topic keywords indexed. 
+ pub fn topic_count(&self) -> usize { + self.topic_paths.len() + } + + /// Get the number of sections in the section map. + pub fn section_count(&self) -> usize { + self.section_map.len() + } + + /// Get the number of hot nodes. + pub fn hot_node_count(&self) -> usize { + self.hot_nodes.iter().filter(|(_, e)| e.is_hot).count() + } + + /// Update hot node tracking from retrieval results. + pub fn update_hot_nodes(&mut self, hits: &[(NodeId, f32)], hot_threshold: u32) { + for &(node_id, score) in hits { + let entry = self.hot_nodes.entry(node_id).or_insert(HotNodeEntry { + hit_count: 0, + avg_score: 0.0, + is_hot: false, + }); + entry.hit_count += 1; + entry.avg_score += (score - entry.avg_score) / entry.hit_count as f32; + if entry.hit_count >= hot_threshold { + entry.is_hot = true; + } + } + } +} + +impl Default for ReasoningIndex { + fn default() -> Self { + Self::new() + } +} + +/// Builder for constructing a `ReasoningIndex`. +pub struct ReasoningIndexBuilder { + topic_paths: HashMap<String, Vec<TopicEntry>>, + summary_shortcut: Option<SummaryShortcut>, + hot_nodes: HashMap<NodeId, HotNodeEntry>, + section_map: HashMap<String, NodeId>, + config_hash: u64, +} + +impl ReasoningIndexBuilder { + /// Create a new builder. + pub fn new() -> Self { + Self { + topic_paths: HashMap::new(), + summary_shortcut: None, + hot_nodes: HashMap::new(), + section_map: HashMap::new(), + config_hash: 0, + } + } + + /// Add a topic entry for a keyword. + pub fn add_topic_entry(&mut self, keyword: impl Into<String>, entry: TopicEntry) { + self.topic_paths + .entry(keyword.into()) + .or_default() + .push(entry); + } + + /// Set the summary shortcut. + pub fn summary_shortcut(mut self, shortcut: SummaryShortcut) -> Self { + self.summary_shortcut = Some(shortcut); + self + } + + /// Add a section mapping. + pub fn add_section(&mut self, title: impl Into<String>, node_id: NodeId) { + self.section_map.insert(title.into().to_lowercase(), node_id); + } + + /// Set the config hash for cache invalidation. + pub fn config_hash(mut self, hash: u64) -> Self { + self.config_hash = hash; + self + } + + /// Sort topic entries by weight (descending) and trim per-keyword lists. + pub fn sort_and_trim(&mut self, max_entries: usize) { + for entries in self.topic_paths.values_mut() { + entries.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + entries.truncate(max_entries); + } + } + + /// Build the reasoning index. + pub fn build(self) -> ReasoningIndex { + ReasoningIndex { + topic_paths: self.topic_paths, + summary_shortcut: self.summary_shortcut, + hot_nodes: self.hot_nodes, + section_map: self.section_map, + config_hash: self.config_hash, + } + } +} + +impl Default for ReasoningIndexBuilder { + fn default() -> Self { + Self::new() + } +} + +/// A topic entry mapping a keyword to a node with a weight. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct TopicEntry { + /// The target node. + pub node_id: NodeId, + /// Weight indicating how relevant this keyword is to this node (0.0 - 1.0). + pub weight: f32, + /// Depth of the node in the tree (for tie-breaking). + pub depth: usize, +} + +/// Pre-computed shortcut for summary-style queries. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SummaryShortcut { + /// The root node ID (direct answer for "what is this about" queries). + pub root_node: NodeId, + /// Pre-collected summaries of top-level sections. + pub section_summaries: Vec<SectionSummary>, + /// Combined summary text for direct return. 
+ pub document_summary: String, +} + +/// A pre-collected section summary for quick access. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SectionSummary { + /// Section node ID. + pub node_id: NodeId, + /// Section title. + pub title: String, + /// Section summary (pre-computed by EnhanceStage). + pub summary: String, + /// Depth of the section. + pub depth: usize, +} + +/// Entry tracking how often a node is retrieved. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct HotNodeEntry { + /// Number of times this node appeared in retrieval results. + pub hit_count: u32, + /// Rolling average score when retrieved. + pub avg_score: f32, + /// Whether this node is currently marked as "hot" + /// (hit_count exceeds configured threshold). + pub is_hot: bool, +} + +/// Configuration for building and using the reasoning index. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ReasoningIndexConfig { + /// Whether reasoning index building is enabled. + pub enabled: bool, + /// Minimum hit count for a node to be considered "hot". + pub hot_node_threshold: u32, + /// Maximum number of topic entries per keyword. + pub max_topic_entries: usize, + /// Maximum number of keyword-to-node mappings to keep. + pub max_keyword_entries: usize, + /// Minimum keyword length to index. + pub min_keyword_length: usize, + /// Whether to build the summary shortcut. + pub build_summary_shortcut: bool, +} + +impl Default for ReasoningIndexConfig { + fn default() -> Self { + Self { + enabled: true, + hot_node_threshold: 3, + max_topic_entries: 20, + max_keyword_entries: 5000, + min_keyword_length: 2, + build_summary_shortcut: true, + } + } +} + +impl ReasoningIndexConfig { + /// Create a new config with defaults. + pub fn new() -> Self { + Self::default() + } + + /// Create a disabled config. + pub fn disabled() -> Self { + Self { + enabled: false, + ..Self::default() + } + } + + /// Set the hot node threshold. + pub fn with_hot_threshold(mut self, threshold: u32) -> Self { + self.hot_node_threshold = threshold; + self + } + + /// Set whether to build the summary shortcut. 
+ pub fn with_summary_shortcut(mut self, build: bool) -> Self { + self.build_summary_shortcut = build; + self + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_reasoning_index_default() { + let index = ReasoningIndex::default(); + assert_eq!(index.topic_count(), 0); + assert_eq!(index.section_count(), 0); + assert_eq!(index.hot_node_count(), 0); + assert!(index.summary_shortcut().is_none()); + } + + #[test] + fn test_builder_basic() { + // Create a simple tree to get valid NodeIds + let mut tree = crate::document::DocumentTree::new("Root", "root content"); + let child1 = tree.add_child(tree.root(), "Introduction", "intro content"); + let child2 = tree.add_child(tree.root(), "Methods", "methods content"); + + let mut builder = ReasoningIndexBuilder::new(); + builder.add_section("Introduction", child1); + builder.add_section("Methods", child2); + + let index = builder.build(); + assert_eq!(index.section_count(), 2); + assert!(index.find_section("introduction").is_some()); + assert!(index.find_section("INTRODUCTION").is_some()); + assert!(index.find_section("methods").is_some()); + } + + #[test] + fn test_config_default() { + let config = ReasoningIndexConfig::default(); + assert!(config.enabled); + assert_eq!(config.hot_node_threshold, 3); + assert!(config.build_summary_shortcut); + } + + #[test] + fn test_config_disabled() { + let config = ReasoningIndexConfig::disabled(); + assert!(!config.enabled); + } +} diff --git a/rust/src/error.rs b/rust/src/error.rs index a5caacad..1564697e 100644 --- a/rust/src/error.rs +++ b/rust/src/error.rs @@ -41,6 +41,10 @@ pub enum Error { #[error("Index corrupted: {0}")] IndexCorrupted(String), + /// Document graph build error. + #[error("Document graph build error: {0}")] + GraphBuild(String), + // ========================================================================= // Retrieval Errors // ========================================================================= diff --git a/rust/src/index/config.rs b/rust/src/index/config.rs index f5cabebc..5d982183 100644 --- a/rust/src/index/config.rs +++ b/rust/src/index/config.rs @@ -11,6 +11,7 @@ use super::summary::SummaryStrategy; use crate::config::{ConcurrencyConfig, IndexerConfig}; +use crate::document::ReasoningIndexConfig; /// Index mode for document processing. #[derive(Debug, Clone, Copy, PartialEq, Eq)] @@ -153,6 +154,9 @@ pub struct PipelineOptions { /// Indexer configuration. pub indexer: IndexerConfig, + + /// Reasoning index configuration. + pub reasoning_index: ReasoningIndexConfig, } impl Default for PipelineOptions { @@ -166,6 +170,7 @@ impl Default for PipelineOptions { generate_description: true, concurrency: ConcurrencyConfig::default(), indexer: IndexerConfig::default(), + reasoning_index: ReasoningIndexConfig::default(), } } } @@ -223,6 +228,12 @@ impl PipelineOptions { self.indexer = indexer; self } + + /// Set the reasoning index configuration. + pub fn with_reasoning_index(mut self, config: ReasoningIndexConfig) -> Self { + self.reasoning_index = config; + self + } } #[cfg(test)] diff --git a/rust/src/index/graph_builder.rs b/rust/src/index/graph_builder.rs new file mode 100644 index 00000000..b749cc14 --- /dev/null +++ b/rust/src/index/graph_builder.rs @@ -0,0 +1,409 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Document Graph Builder — constructs cross-document relationship graphs. +//! +//! This is a standalone builder (not an `IndexStage`) because it operates +//! 
on the workspace level across all documents, not on a single document. + +use std::collections::HashMap; + +use tracing::info; + +use crate::document::{ + DocumentGraph, DocumentGraphConfig, DocumentGraphNode, EdgeEvidence, GraphEdge, SharedKeyword, + WeightedKeyword, +}; + +/// Intermediate data collected per document during graph building. +#[derive(Debug, Clone)] +struct DocProfile { + doc_id: String, + title: String, + format: String, + node_count: usize, + /// keyword → aggregate weight + keywords: HashMap<String, f32>, +} + +/// Builder for constructing a `DocumentGraph` from multiple documents. +pub struct DocumentGraphBuilder { + config: DocumentGraphConfig, + profiles: Vec<DocProfile>, +} + +impl DocumentGraphBuilder { + /// Create a new builder with the given configuration. + pub fn new(config: DocumentGraphConfig) -> Self { + Self { + config, + profiles: Vec::new(), + } + } + + /// Create a builder with default configuration. + pub fn with_defaults() -> Self { + Self::new(DocumentGraphConfig::default()) + } + + /// Add a document's keyword profile to the builder. + /// + /// `keywords` should map keyword → aggregate weight (from + /// `ReasoningIndex::topic_paths` or extracted from content). + pub fn add_document( + &mut self, + doc_id: impl Into<String>, + title: impl Into<String>, + format: impl Into<String>, + node_count: usize, + keywords: HashMap<String, f32>, + ) { + self.profiles.push(DocProfile { + doc_id: doc_id.into(), + title: title.into(), + format: format.into(), + node_count, + keywords, + }); + } + + /// Build the document graph from accumulated document profiles. + pub fn build(self) -> DocumentGraph { + let mut graph = DocumentGraph::new(); + + if self.profiles.is_empty() { + info!("Building document graph: 0 documents, empty graph"); + return graph; + } + + // Step 1: Add document nodes with top-N keywords + for profile in &self.profiles { + let mut weighted: Vec<WeightedKeyword> = profile + .keywords + .iter() + .map(|(kw, &w)| WeightedKeyword { + keyword: kw.clone(), + weight: w, + }) + .collect(); + // Sort by weight descending + weighted.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + weighted.truncate(self.config.max_keywords_per_doc); + + graph.add_node(DocumentGraphNode { + doc_id: profile.doc_id.clone(), + title: profile.title.clone(), + format: profile.format.clone(), + top_keywords: weighted, + node_count: profile.node_count, + }); + } + + info!( + "Building document graph: {} document nodes added", + graph.node_count() + ); + + // Step 2: Compute edges using the keyword inverted index + // (already built inside graph.add_node via keyword_index) + self.compute_edges(&mut graph); + + info!( + "Document graph built: {} nodes, {} edges", + graph.node_count(), + graph.edge_count() + ); + + graph + } + + /// Compute edges between documents based on shared keywords. 
+ fn compute_edges(&self, graph: &mut DocumentGraph) { + // Collect candidate pairs: (doc_a, doc_b) → shared keywords + let mut pair_shared: HashMap<(String, String), Vec<SharedKeyword>> = HashMap::new(); + + // Iterate the keyword index: for each keyword, all docs sharing it are candidates + let kw_index = graph.keyword_index_clone(); + + for (keyword, entries) in &kw_index { + if entries.len() < 2 { + continue; // No pair possible + } + // For every pair of documents sharing this keyword + for i in 0..entries.len() { + for j in (i + 1)..entries.len() { + let a = &entries[i]; + let b = &entries[j]; + let pair = if a.doc_id < b.doc_id { + (a.doc_id.clone(), b.doc_id.clone()) + } else { + (b.doc_id.clone(), a.doc_id.clone()) + }; + let shared = SharedKeyword { + keyword: keyword.clone(), + source_weight: a.weight, + target_weight: b.weight, + }; + pair_shared.entry(pair).or_default().push(shared); + } + } + } + + // Step 3: Create edges for pairs that meet thresholds + for ((doc_a, doc_b), shared_kws) in pair_shared { + let shared_count = shared_kws.len(); + if shared_count < self.config.min_shared_keywords { + continue; + } + + // Compute Jaccard: |intersection| / |union| + let kw_a = graph + .get_node(&doc_a) + .map(|n| n.top_keywords.len()) + .unwrap_or(0); + let kw_b = graph + .get_node(&doc_b) + .map(|n| n.top_keywords.len()) + .unwrap_or(0); + let union_size = kw_a + kw_b - shared_count; + let jaccard = if union_size > 0 { + shared_count as f32 / union_size as f32 + } else { + 0.0 + }; + + if jaccard < self.config.min_keyword_jaccard { + continue; + } + + // Edge weight: combine Jaccard with keyword count + let max_kws = self.config.max_keywords_per_doc.max(1) as f32; + let weight = (jaccard * 0.6 + (shared_count as f32 / max_kws).min(1.0) * 0.4).min(1.0); + + // Create bidirectional edges + let evidence_a = EdgeEvidence { + shared_keywords: shared_kws.clone(), + shared_keyword_count: shared_count, + keyword_jaccard: jaccard, + }; + let evidence_b = EdgeEvidence { + shared_keywords: shared_kws + .iter() + .map(|s| SharedKeyword { + keyword: s.keyword.clone(), + source_weight: s.target_weight, + target_weight: s.source_weight, + }) + .collect(), + shared_keyword_count: shared_count, + keyword_jaccard: jaccard, + }; + + graph.add_edge( + &doc_a, + GraphEdge { + target_doc_id: doc_b.clone(), + weight, + evidence: evidence_a, + }, + ); + graph.add_edge( + &doc_b, + GraphEdge { + target_doc_id: doc_a.clone(), + weight, + evidence: evidence_b, + }, + ); + } + + // Step 4: Trim edges per node to max_edges_per_node + self.trim_edges(graph); + } + + /// Trim edges per node to the configured maximum. 
+ fn trim_edges(&self, graph: &mut DocumentGraph) { + let max = self.config.max_edges_per_node; + let all_edges = graph.take_edges(); + let mut trimmed: HashMap<String, Vec<GraphEdge>> = HashMap::new(); + + for (source, mut edges) in all_edges { + edges.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + edges.truncate(max); + trimmed.insert(source, edges); + } + + graph.set_edges(trimmed); + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn make_keywords(pairs: &[(&str, f32)]) -> HashMap<String, f32> { + pairs + .iter() + .map(|&(k, w)| (k.to_string(), w)) + .collect() + } + + #[test] + fn test_empty_workspace() { + let builder = DocumentGraphBuilder::with_defaults(); + let graph = builder.build(); + assert!(graph.is_empty()); + } + + #[test] + fn test_single_document() { + let mut builder = DocumentGraphBuilder::with_defaults(); + builder.add_document( + "doc1", + "Test", + "md", + 5, + make_keywords(&[("rust", 0.9), ("async", 0.7)]), + ); + let graph = builder.build(); + assert_eq!(graph.node_count(), 1); + assert_eq!(graph.edge_count(), 0); + } + + #[test] + fn test_two_docs_shared_keywords() { + let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig { + min_keyword_jaccard: 0.05, + min_shared_keywords: 2, + ..DocumentGraphConfig::default() + }); + builder.add_document( + "doc1", + "Rust Programming", + "md", + 10, + make_keywords(&[("rust", 0.9), ("async", 0.8), ("tokio", 0.6)]), + ); + builder.add_document( + "doc2", + "Async Rust", + "md", + 8, + make_keywords(&[("rust", 0.7), ("async", 0.9), ("futures", 0.5)]), + ); + + let graph = builder.build(); + assert_eq!(graph.node_count(), 2); + // Should have bidirectional edges + assert!(graph.edge_count() >= 2); + + // Check doc1 → doc2 edge + let neighbors = graph.get_neighbors("doc1"); + assert_eq!(neighbors.len(), 1); + assert_eq!(neighbors[0].target_doc_id, "doc2"); + assert!(neighbors[0].weight > 0.0); + assert!(neighbors[0].evidence.keyword_jaccard > 0.0); + assert!(neighbors[0].evidence.shared_keyword_count >= 2); + + // Check doc2 → doc1 edge (bidirectional) + let neighbors2 = graph.get_neighbors("doc2"); + assert_eq!(neighbors2.len(), 1); + assert_eq!(neighbors2[0].target_doc_id, "doc1"); + } + + #[test] + fn test_unrelated_docs_no_edge() { + let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig { + min_keyword_jaccard: 0.1, + min_shared_keywords: 2, + ..DocumentGraphConfig::default() + }); + builder.add_document( + "doc1", + "Rust Guide", + "md", + 10, + make_keywords(&[("rust", 0.9), ("ownership", 0.8)]), + ); + builder.add_document( + "doc2", + "Cooking Recipes", + "md", + 8, + make_keywords(&[("pasta", 0.9), ("sauce", 0.8)]), + ); + + let graph = builder.build(); + assert_eq!(graph.node_count(), 2); + assert_eq!(graph.edge_count(), 0); + } + + #[test] + fn test_jaccard_threshold() { + let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig { + min_keyword_jaccard: 0.9, // Very high threshold + min_shared_keywords: 1, + ..DocumentGraphConfig::default() + }); + // Two docs with minimal overlap + builder.add_document( + "doc1", + "A", + "md", + 5, + make_keywords(&[ + ("a", 0.9), + ("b", 0.8), + ("c", 0.7), + ("d", 0.6), + ("e", 0.5), + ]), + ); + builder.add_document( + "doc2", + "B", + "md", + 5, + make_keywords(&[("a", 0.9), ("x", 0.8), ("y", 0.7), ("z", 0.6)]), + ); + + let graph = builder.build(); + // Only 1 shared keyword out of 5+4=9 unique, Jaccard = 1/8 ≈ 0.125 + // Way below 0.9 threshold → no edge + assert_eq!(graph.edge_count(), 0); 
+ } + + #[test] + fn test_max_edges_per_node() { + let mut builder = DocumentGraphBuilder::new(DocumentGraphConfig { + min_keyword_jaccard: 0.01, + min_shared_keywords: 1, + max_edges_per_node: 2, + ..DocumentGraphConfig::default() + }); + + // 4 docs all sharing keywords with doc1 + for i in 0..4 { + builder.add_document( + format!("doc{}", i), + format!("Doc {}", i), + "md", + 5, + make_keywords(&[("shared", 0.9), ("common", 0.8)]), + ); + } + + let graph = builder.build(); + // doc1 should have at most 2 outgoing edges + let neighbors = graph.get_neighbors("doc0"); + assert!(neighbors.len() <= 2); + } +} diff --git a/rust/src/index/mod.rs b/rust/src/index/mod.rs index 51a18ec5..6072e255 100644 --- a/rust/src/index/mod.rs +++ b/rust/src/index/mod.rs @@ -36,6 +36,7 @@ //! ``` pub mod config; +pub mod graph_builder; pub mod incremental; pub mod pipeline; pub mod stages; diff --git a/rust/src/index/pipeline/context.rs b/rust/src/index/pipeline/context.rs index 979839a8..9fffdcf0 100644 --- a/rust/src/index/pipeline/context.rs +++ b/rust/src/index/pipeline/context.rs @@ -6,7 +6,7 @@ use std::collections::HashMap; use std::path::PathBuf; -use crate::document::{DocumentTree, NodeId}; +use crate::document::{DocumentTree, NodeId, ReasoningIndex}; use crate::llm::LlmClient; use crate::parser::{DocumentFormat, RawNode}; @@ -242,6 +242,9 @@ pub struct IndexContext { /// Summary cache for lazy generation. pub summary_cache: SummaryCache, + /// Pre-computed reasoning index (built by ReasoningIndexStage). + pub reasoning_index: Option<ReasoningIndex>, + /// Stage execution results. pub stage_results: HashMap<String, StageResult>, @@ -272,6 +275,7 @@ impl IndexContext { options, llm_client: None, summary_cache: SummaryCache::default(), + reasoning_index: None, stage_results: HashMap::new(), metrics: IndexMetrics::default(), description: None, @@ -345,6 +349,7 @@ impl IndexContext { line_count: self.line_count, metrics: self.metrics, summary_cache: self.summary_cache, + reasoning_index: self.reasoning_index, } } } @@ -381,6 +386,9 @@ pub struct IndexResult { /// Summary cache. pub summary_cache: SummaryCache, + + /// Pre-computed reasoning index for retrieval acceleration. + pub reasoning_index: Option<ReasoningIndex>, } impl IndexResult { @@ -400,6 +408,7 @@ impl IndexResult { + self.metrics.build_time_ms + self.metrics.enhance_time_ms + self.metrics.enrich_time_ms + + self.metrics.reasoning_index_time_ms + self.metrics.optimize_time_ms + self.metrics.persist_time_ms } diff --git a/rust/src/index/pipeline/executor.rs b/rust/src/index/pipeline/executor.rs index 83649271..09f548e1 100644 --- a/rust/src/index/pipeline/executor.rs +++ b/rust/src/index/pipeline/executor.rs @@ -14,6 +14,7 @@ use crate::llm::LlmClient; use super::super::PipelineOptions; use super::super::stages::{ BuildStage, EnhanceStage, EnrichStage, IndexStage, OptimizeStage, ParseStage, PersistStage, + ReasoningIndexStage, }; use super::context::{IndexInput, IndexResult}; use super::orchestrator::PipelineOrchestrator; @@ -51,12 +52,14 @@ impl PipelineExecutor { /// 1. `parse` - Parse document into raw nodes /// 2. `build` - Build tree structure /// 3. `enrich` - Add metadata and cross-references - /// 4. `optimize` - Optimize tree structure + /// 4. `reasoning_index` - Build pre-computed reasoning index + /// 5. 
`optimize` - Optimize tree structure pub fn new() -> Self { let orchestrator = PipelineOrchestrator::new() .stage_with_priority(ParseStage::new(), 10) .stage_with_priority(BuildStage::new(), 20) .stage_with_priority(EnrichStage::new(), 40) + .stage_with_priority(ReasoningIndexStage::new(), 45) .stage_with_priority(OptimizeStage::new(), 60); Self { orchestrator } @@ -69,13 +72,15 @@ impl PipelineExecutor { /// 2. `build` - Build tree /// 3. `enhance` - LLM-based enhancement (summaries) /// 4. `enrich` - Add metadata - /// 5. `optimize` - Optimize tree + /// 5. `reasoning_index` - Build pre-computed reasoning index + /// 6. `optimize` - Optimize tree pub fn with_llm(client: LlmClient) -> Self { let orchestrator = PipelineOrchestrator::new() .stage_with_priority(ParseStage::new(), 10) .stage_with_priority(BuildStage::new(), 20) .stage_with_priority(EnhanceStage::with_llm_client(client), 30) .stage_with_priority(EnrichStage::new(), 40) + .stage_with_priority(ReasoningIndexStage::new(), 45) .stage_with_priority(OptimizeStage::new(), 60); Self { orchestrator } diff --git a/rust/src/index/pipeline/metrics.rs b/rust/src/index/pipeline/metrics.rs index 6e4bb51e..e731e7a7 100644 --- a/rust/src/index/pipeline/metrics.rs +++ b/rust/src/index/pipeline/metrics.rs @@ -32,6 +32,18 @@ pub struct IndexMetrics { #[serde(default)] pub persist_time_ms: u64, + /// Reasoning index build duration (ms). + #[serde(default)] + pub reasoning_index_time_ms: u64, + + /// Number of topics indexed in reasoning index. + #[serde(default)] + pub topics_indexed: usize, + + /// Number of keywords indexed in reasoning index. + #[serde(default)] + pub keywords_indexed: usize, + /// Total tokens generated (summaries). #[serde(default)] pub total_tokens_generated: usize, @@ -93,6 +105,13 @@ impl IndexMetrics { self.persist_time_ms = duration_ms; } + /// Record reasoning index build time. + pub fn record_reasoning_index(&mut self, duration_ms: u64, topics: usize, keywords: usize) { + self.reasoning_index_time_ms = duration_ms; + self.topics_indexed = topics; + self.keywords_indexed = keywords; + } + /// Increment LLM calls. 
pub fn increment_llm_calls(&mut self) { self.llm_calls += 1; @@ -129,6 +148,7 @@ impl IndexMetrics { + self.build_time_ms + self.enhance_time_ms + self.enrich_time_ms + + self.reasoning_index_time_ms + self.optimize_time_ms + self.persist_time_ms } diff --git a/rust/src/index/stages/mod.rs b/rust/src/index/stages/mod.rs index 5a55383d..2022ffae 100644 --- a/rust/src/index/stages/mod.rs +++ b/rust/src/index/stages/mod.rs @@ -9,6 +9,7 @@ mod enrich; mod optimize; mod parse; mod persist; +mod reasoning; pub use build::BuildStage; pub use enhance::EnhanceStage; @@ -16,6 +17,7 @@ pub use enrich::EnrichStage; pub use optimize::OptimizeStage; pub use parse::ParseStage; pub use persist::PersistStage; +pub use reasoning::ReasoningIndexStage; use super::pipeline::{FailurePolicy, IndexContext, StageResult}; use crate::error::Result; diff --git a/rust/src/index/stages/persist.rs b/rust/src/index/stages/persist.rs index 26d3aad4..509bc874 100644 --- a/rust/src/index/stages/persist.rs +++ b/rust/src/index/stages/persist.rs @@ -51,9 +51,14 @@ impl PersistStage { let doc = PersistedDocument::new(meta, tree.clone()); - // Add pages if available (for PDFs) // Note: pages would need to be stored in context during parse stage + // Attach reasoning index if available + let mut doc = doc; + if let Some(ref reasoning_index) = ctx.reasoning_index { + doc.reasoning_index = Some(reasoning_index.clone()); + } + workspace.add(&doc).await?; info!("Saved document {} to workspace", ctx.doc_id); diff --git a/rust/src/index/stages/reasoning.rs b/rust/src/index/stages/reasoning.rs new file mode 100644 index 00000000..804dcb19 --- /dev/null +++ b/rust/src/index/stages/reasoning.rs @@ -0,0 +1,345 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Reasoning Index Stage - Build pre-computed reasoning index. +//! +//! This stage runs after EnrichStage (which generates descriptions and +//! calculates metadata) and before OptimizeStage. It builds a +//! [`ReasoningIndex`] from the document tree's TOC, summaries, and keywords. + +use std::time::Instant; +use tracing::info; + +use crate::document::{ + NodeId, ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary, + SummaryShortcut, TopicEntry, +}; +use crate::error::Result; +use crate::retrieval::search::extract_keywords; + +use super::async_trait; +use super::{IndexStage, StageResult}; +use crate::index::pipeline::IndexContext; + +/// Reasoning Index Stage - builds a pre-computed reasoning index from the document tree. +/// +/// This stage creates a [`ReasoningIndex`] containing: +/// - Topic-to-path mappings from titles and summaries +/// - Summary shortcuts for high-frequency "overview" queries +/// - Section map for fast ToC lookup +pub struct ReasoningIndexStage { + config: ReasoningIndexConfig, +} + +impl ReasoningIndexStage { + /// Create a new reasoning index stage with default config. + pub fn new() -> Self { + Self { + config: ReasoningIndexConfig::default(), + } + } + + /// Create with custom config. + pub fn with_config(config: ReasoningIndexConfig) -> Self { + Self { config } + } + + /// Extract keywords from a text, filtering by minimum length. + fn extract_node_keywords(text: &str, min_length: usize) -> Vec<String> { + extract_keywords(text) + .into_iter() + .filter(|k: &String| k.len() >= min_length) + .collect() + } + + /// Build the topic-to-path mapping by extracting keywords from all nodes. 
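+    ///
+    /// Title keywords are weighted highest (2.0), summary keywords next (1.5), and a
+    /// 500-character content sample (1.0) is used only when a node has no summary.
+    /// Per-node weights are summed, normalized to 0.0-1.0 within each keyword, then
+    /// sorted and trimmed to `max_topic_entries`.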
+ fn build_topic_paths( + tree: &crate::document::DocumentTree, + config: &ReasoningIndexConfig, + ) -> (std::collections::HashMap<String, Vec<TopicEntry>>, usize) { + let mut keyword_nodes: std::collections::HashMap<String, Vec<(NodeId, f32, usize)>> = + std::collections::HashMap::new(); + + // Walk all nodes and extract keywords from title + summary + for node_id in tree.traverse() { + if let Some(node) = tree.get(node_id) { + let title_keywords = Self::extract_node_keywords(&node.title, config.min_keyword_length); + let summary_keywords = Self::extract_node_keywords(&node.summary, config.min_keyword_length); + let content_keywords = if node.summary.is_empty() { + // Fallback: extract from content if no summary + let content_sample: String = node.content.chars().take(500).collect(); + Self::extract_node_keywords(&content_sample, config.min_keyword_length) + } else { + Vec::new() + }; + + // Title keywords get higher weight (2.0), summary (1.5), content (1.0) + for kw in &title_keywords { + keyword_nodes + .entry(kw.clone()) + .or_default() + .push((node_id, 2.0, node.depth)); + } + for kw in &summary_keywords { + keyword_nodes + .entry(kw.clone()) + .or_default() + .push((node_id, 1.5, node.depth)); + } + for kw in &content_keywords { + keyword_nodes + .entry(kw.clone()) + .or_default() + .push((node_id, 1.0, node.depth)); + } + } + } + + // Sort by keyword frequency (most common first) and trim to max_keyword_entries + let mut sorted_keywords: Vec<_> = keyword_nodes.into_iter().collect(); + sorted_keywords.sort_by(|a, b| b.1.len().cmp(&a.1.len())); + sorted_keywords.truncate(config.max_keyword_entries); + + let keyword_count = sorted_keywords.len(); + + // Build topic_paths: merge duplicate (keyword, node) pairs + let mut topic_paths: std::collections::HashMap<String, Vec<TopicEntry>> = + std::collections::HashMap::new(); + + for (keyword, entries) in sorted_keywords { + // Merge duplicate node entries by summing weights + let mut merged: std::collections::HashMap<NodeId, (f32, usize)> = + std::collections::HashMap::new(); + for (node_id, weight, depth) in entries { + let entry = merged.entry(node_id).or_insert((0.0, depth)); + entry.0 += weight; + } + + // Normalize weights to 0.0-1.0 range + let max_weight = merged.values().map(|(w, _)| *w).fold(0.0_f32, f32::max); + let scale = if max_weight > 0.0 { 1.0 / max_weight } else { 1.0 }; + + let mut topic_entries: Vec<TopicEntry> = merged + .into_iter() + .map(|(node_id, (weight, depth))| TopicEntry { + node_id, + weight: weight * scale, + depth, + }) + .collect(); + + topic_entries.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + topic_entries.truncate(config.max_topic_entries); + + topic_paths.insert(keyword, topic_entries); + } + + (topic_paths, keyword_count) + } + + /// Build section map from depth-1 nodes. + fn build_section_map(tree: &crate::document::DocumentTree) -> std::collections::HashMap<String, NodeId> { + let mut section_map = std::collections::HashMap::new(); + let root = tree.root(); + for child_id in tree.children(root) { + if let Some(node) = tree.get(child_id) { + section_map.insert(node.title.to_lowercase(), child_id); + // Also index by structure index (e.g. "1", "2", "3") + if !node.structure.is_empty() { + section_map.insert(node.structure.clone(), child_id); + } + } + } + section_map + } + + /// Build summary shortcut from root and depth-1 nodes. 
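+    ///
+    /// The document summary comes from the root node when available; otherwise it
+    /// falls back to concatenating `title: summary` pairs from the depth-1 children.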
+ fn build_summary_shortcut( + tree: &crate::document::DocumentTree, + ) -> Option<SummaryShortcut> { + let root = tree.root(); + let root_node = tree.get(root)?; + + // Collect document summary from root + let document_summary = if !root_node.summary.is_empty() { + root_node.summary.clone() + } else { + // Fallback: concatenate depth-1 summaries + let mut parts = Vec::new(); + for child_id in tree.children(root) { + if let Some(child) = tree.get(child_id) { + if !child.summary.is_empty() { + parts.push(format!("{}: {}", child.title, child.summary)); + } + } + } + parts.join("\n") + }; + + // Collect section summaries + let mut section_summaries = Vec::new(); + for child_id in tree.children(root) { + if let Some(child) = tree.get(child_id) { + section_summaries.push(SectionSummary { + node_id: child_id, + title: child.title.clone(), + summary: child.summary.clone(), + depth: child.depth, + }); + } + } + + Some(SummaryShortcut { + root_node: root, + section_summaries, + document_summary, + }) + } +} + +impl Default for ReasoningIndexStage { + fn default() -> Self { + Self::new() + } +} + +#[async_trait] +impl IndexStage for ReasoningIndexStage { + fn name(&self) -> &'static str { + "reasoning_index" + } + + fn depends_on(&self) -> Vec<&'static str> { + vec!["enrich"] + } + + fn is_optional(&self) -> bool { + true + } + + async fn execute(&mut self, ctx: &mut IndexContext) -> Result<StageResult> { + let start = Instant::now(); + + // Check if enabled via pipeline options + if !ctx.options.reasoning_index.enabled { + info!("Reasoning index stage disabled, skipping"); + return Ok(StageResult::success("reasoning_index")); + } + + // Use stage config, overridden by pipeline options + let config = &ctx.options.reasoning_index; + + let tree = match ctx.tree.as_ref() { + Some(t) => t, + None => { + return Ok(StageResult::failure( + "reasoning_index", + "Tree not built", + )); + } + }; + + info!("Building reasoning index..."); + + // 1. Build topic-to-path mapping + let (topic_paths, keyword_count) = Self::build_topic_paths(tree, config); + let topic_count: usize = topic_paths.values().map(|v| v.len()).sum(); + info!( + "Built topic paths: {} keywords, {} topic entries", + keyword_count, topic_count + ); + + // 2. Build section map + let section_map = Self::build_section_map(tree); + info!("Built section map with {} entries", section_map.len()); + + // 3. Build summary shortcut + let summary_shortcut = if config.build_summary_shortcut { + let shortcut = Self::build_summary_shortcut(tree); + if shortcut.is_some() { + info!("Built summary shortcut"); + } + shortcut + } else { + None + }; + + // 4. 
Assemble the reasoning index + let mut builder = ReasoningIndexBuilder::new(); + for (keyword, entries) in topic_paths { + for entry in entries { + builder.add_topic_entry(&keyword, entry); + } + } + for (title, node_id) in section_map { + builder.add_section(&title, node_id); + } + if let Some(shortcut) = summary_shortcut { + builder = builder.summary_shortcut(shortcut); + } + builder.sort_and_trim(config.max_topic_entries); + + let reasoning_index = builder.build(); + + let duration = start.elapsed().as_millis() as u64; + ctx.metrics + .record_reasoning_index(duration, topic_count, keyword_count); + + info!( + "Reasoning index built in {}ms ({} keywords, {} topic entries, {} sections)", + duration, + keyword_count, + topic_count, + reasoning_index.section_count(), + ); + + ctx.reasoning_index = Some(reasoning_index); + + let mut stage_result = StageResult::success("reasoning_index"); + stage_result.duration_ms = duration; + stage_result.metadata.insert( + "keywords_indexed".to_string(), + serde_json::json!(keyword_count), + ); + stage_result.metadata.insert( + "topics_indexed".to_string(), + serde_json::json!(topic_count), + ); + + Ok(stage_result) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_extract_node_keywords() { + let keywords = ReasoningIndexStage::extract_node_keywords("Introduction to Machine Learning", 2); + assert!(keywords.contains(&"introduction".to_string())); + assert!(keywords.contains(&"machine".to_string())); + assert!(keywords.contains(&"learning".to_string())); + } + + #[test] + fn test_extract_node_keywords_min_length() { + let keywords = ReasoningIndexStage::extract_node_keywords("A B CD", 2); + assert!(!keywords.contains(&"a".to_string())); + assert!(!keywords.contains(&"b".to_string())); + assert!(keywords.contains(&"cd".to_string())); + } + + #[test] + fn test_stage_config_default() { + let stage = ReasoningIndexStage::new(); + assert!(stage.config.enabled); + assert_eq!(stage.name(), "reasoning_index"); + assert!(stage.is_optional()); + assert_eq!(stage.depends_on(), vec!["enrich"]); + } +} diff --git a/rust/src/lib.rs b/rust/src/lib.rs index 642afb03..d2ea3eac 100644 --- a/rust/src/lib.rs +++ b/rust/src/lib.rs @@ -252,9 +252,9 @@ pub use index::{ // Retrieval pub use retrieval::{ ContextBuilder, NavigationDecision, NavigationStep, PipelineRetriever, PruningStrategy, - QueryComplexity, RetrievalContext, RetrievalResult, RetrieveOptions, RetrieveResponse, - Retriever, RetrieverError, RetrieverResult, SearchPath, StrategyPreference, SufficiencyLevel, - TokenEstimation, format_for_llm, format_for_llm_async, format_tree_for_llm, + QueryComplexity, RetrievalContext, RetrievalResult, RetrieveEvent, RetrieveOptions, + RetrieveResponse, Retriever, RetrieverError, RetrieverResult, SearchPath, StrategyPreference, + SufficiencyLevel, TokenEstimation, format_for_llm, format_for_llm_async, format_tree_for_llm, format_tree_for_llm_async, }; diff --git a/rust/src/retrieval/cache/hot_tracker.rs b/rust/src/retrieval/cache/hot_tracker.rs new file mode 100644 index 00000000..bad19bdd --- /dev/null +++ b/rust/src/retrieval/cache/hot_tracker.rs @@ -0,0 +1,191 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Hot node tracker for recording retrieval frequency. +//! +//! Thread-safe tracker that records which nodes are frequently retrieved. +//! Nodes that exceed a configured hit-count threshold are marked as "hot", +//! which can boost their scores in future retrieval operations. 
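+//!
+//! # Example
+//!
+//! A minimal usage sketch (the `node_id` values are assumed to come from
+//! retrieval results):
+//!
+//! ```rust,ignore
+//! use vectorless::retrieval::cache::HotNodeTracker;
+//!
+//! // Nodes become "hot" after 3 recorded hits.
+//! let tracker = HotNodeTracker::new(3);
+//!
+//! // Record retrievals together with their relevance scores.
+//! tracker.record_hit(node_id, 0.8);
+//! tracker.record_hit(node_id, 0.9);
+//! assert!(!tracker.is_hot(node_id)); // only 2 hits so far
+//!
+//! tracker.record_hit(node_id, 0.7);
+//! assert!(tracker.is_hot(node_id)); // threshold reached
+//!
+//! // Export per-node stats (hit count, average score, hot flag) for persistence.
+//! let entries = tracker.export();
+//! ```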
+ +use std::collections::HashMap; +use std::sync::RwLock; + +use crate::document::NodeId; +use crate::document::HotNodeEntry; + +/// Thread-safe tracker for hot (frequently retrieved) nodes. +pub struct HotNodeTracker { + inner: RwLock<HotNodeTrackerInner>, + hot_threshold: u32, +} + +struct HotNodeTrackerInner { + hits: HashMap<NodeId, u32>, + scores: HashMap<NodeId, f32>, +} + +impl HotNodeTracker { + /// Create a new tracker with the given hot threshold. + pub fn new(hot_threshold: u32) -> Self { + Self { + inner: RwLock::new(HotNodeTrackerInner { + hits: HashMap::new(), + scores: HashMap::new(), + }), + hot_threshold, + } + } + + /// Record that a node was retrieved with a given score. + pub fn record_hit(&self, node_id: NodeId, score: f32) { + if let Ok(mut inner) = self.inner.write() { + let hits = *inner.hits.entry(node_id).or_insert(0) + 1; + inner.hits.insert(node_id, hits); + + // Update running average score + let prev_avg = *inner.scores.entry(node_id).or_insert(0.0); + let new_avg = prev_avg + (score - prev_avg) / hits as f32; + inner.scores.insert(node_id, new_avg); + } + } + + /// Record multiple hits at once. + pub fn record_hits(&self, hits: &[(NodeId, f32)]) { + for &(node_id, score) in hits { + self.record_hit(node_id, score); + } + } + + /// Check if a node is considered "hot". + pub fn is_hot(&self, node_id: NodeId) -> bool { + self.inner + .read() + .map(|inner| { + inner.hits.get(&node_id).copied().unwrap_or(0) >= self.hot_threshold + }) + .unwrap_or(false) + } + + /// Get the hit count for a node. + pub fn hit_count(&self, node_id: NodeId) -> u32 { + self.inner + .read() + .map(|inner| inner.hits.get(&node_id).copied().unwrap_or(0)) + .unwrap_or(0) + } + + /// Get all hot nodes with their stats. + pub fn hot_nodes(&self) -> Vec<(NodeId, u32, f32)> { + self.inner + .read() + .map(|inner| { + inner + .hits + .iter() + .filter(|(_, count)| **count >= self.hot_threshold) + .map(|(node_id, count)| { + ( + *node_id, + *count, + inner.scores.get(node_id).copied().unwrap_or(0.0), + ) + }) + .collect() + }) + .unwrap_or_default() + } + + /// Export hot node data into HotNodeEntry map for persistence. + pub fn export(&self) -> HashMap<NodeId, HotNodeEntry> { + self.inner + .read() + .map(|inner| { + inner + .hits + .iter() + .map(|(node_id, hit_count)| { + let avg_score = inner.scores.get(node_id).copied().unwrap_or(0.0); + let is_hot = *hit_count >= self.hot_threshold; + ( + *node_id, + HotNodeEntry { + hit_count: *hit_count, + avg_score, + is_hot, + }, + ) + }) + .collect() + }) + .unwrap_or_default() + } + + /// Get the hot threshold. 
+ pub fn hot_threshold(&self) -> u32 { + self.hot_threshold + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn make_node_ids() -> (NodeId, NodeId, NodeId) { + let mut tree = crate::document::DocumentTree::new("Root", "content"); + let a = tree.add_child(tree.root(), "A", "a"); + let b = tree.add_child(tree.root(), "B", "b"); + let c = tree.add_child(tree.root(), "C", "c"); + (a, b, c) + } + + #[test] + fn test_hot_tracker_basic() { + let tracker = HotNodeTracker::new(3); + + let (node, _, _) = make_node_ids(); + tracker.record_hit(node, 0.8); + tracker.record_hit(node, 0.9); + assert!(!tracker.is_hot(node)); + assert_eq!(tracker.hit_count(node), 2); + + tracker.record_hit(node, 0.7); + assert!(tracker.is_hot(node)); + assert_eq!(tracker.hit_count(node), 3); + } + + #[test] + fn test_hot_tracker_export() { + let tracker = HotNodeTracker::new(2); + + let (node_a, node_b, _) = make_node_ids(); + + tracker.record_hit(node_a, 0.8); + tracker.record_hit(node_a, 0.9); + tracker.record_hit(node_b, 0.5); + + let exported = tracker.export(); + assert!(exported[&node_a].is_hot); + assert!(!exported[&node_b].is_hot); + } + + #[test] + fn test_hot_tracker_multiple_hits() { + let tracker = HotNodeTracker::new(1); + + let (node_a, node_b, node_c) = make_node_ids(); + + let hits = vec![ + (node_a, 0.9), + (node_b, 0.8), + (node_c, 0.7), + ]; + tracker.record_hits(&hits); + + assert!(tracker.is_hot(node_a)); + assert!(tracker.is_hot(node_b)); + assert!(tracker.is_hot(node_c)); + + let hot = tracker.hot_nodes(); + assert_eq!(hot.len(), 3); + } +} diff --git a/rust/src/retrieval/cache/mod.rs b/rust/src/retrieval/cache/mod.rs index 59c6c2cf..34202fd8 100644 --- a/rust/src/retrieval/cache/mod.rs +++ b/rust/src/retrieval/cache/mod.rs @@ -3,8 +3,19 @@ //! Caching for retrieval operations. //! -//! Caches search paths and node scores for repeated queries. +//! Three-tier reasoning cache: +//! - **L1**: Exact query match — instant cache hit for repeated queries +//! - **L2**: Path pattern cache — reuse navigation decisions across queries +//! - **L3**: Strategy score cache — share keyword/BM25 scores across queries +//! +//! Legacy `PathCache` remains for backward compatibility. +mod hot_tracker; mod path_cache; +mod reasoning_cache; +pub use hot_tracker::HotNodeTracker; pub use path_cache::PathCache; +pub use reasoning_cache::{ + CachedCandidate, ReasoningCache, ReasoningCacheConfig, ReasoningCacheStats, +}; diff --git a/rust/src/retrieval/cache/reasoning_cache.rs b/rust/src/retrieval/cache/reasoning_cache.rs new file mode 100644 index 00000000..6dc87f87 --- /dev/null +++ b/rust/src/retrieval/cache/reasoning_cache.rs @@ -0,0 +1,490 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Tiered reasoning cache for the retrieval pipeline. +//! +//! Provides three levels of caching to avoid redundant computation: +//! +//! - **L1 (Exact)**: Cache full retrieval results keyed by exact query fingerprint. +//! Identical queries return instantly. +//! +//! - **L2 (Path Pattern)**: Cache navigation decisions for tree paths. If a previous +//! query navigated through Section 3.2, a new query about the same section can +//! reuse those path cues even when the full query differs. +//! +//! - **L3 (Strategy Score)**: Cache node scores from keyword/BM25 strategies. +//! Node scores are independent of the query, so they can be shared across +//! different queries on the same document. 
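+//!
+//! # Example
+//!
+//! A minimal sketch of the three tiers (the fingerprints and candidate list are
+//! assumed to come from the surrounding pipeline):
+//!
+//! ```rust,ignore
+//! use vectorless::retrieval::cache::ReasoningCache;
+//!
+//! let cache = ReasoningCache::new();
+//!
+//! // L1: cache the full candidate list for an exact query within a document scope.
+//! cache.l1_store("what is the total revenue?", scope_fp, candidates, "keyword".into());
+//! let exact_hit = cache.l1_get("what is the total revenue?", &scope_fp);
+//!
+//! // L2: remember that section "3.2" was relevant for this document.
+//! cache.l2_record("doc1", "3.2", 0.8);
+//! let confidence = cache.l2_get("doc1", "3.2");
+//!
+//! // L3: share a query-independent BM25 score keyed by node content.
+//! cache.l3_store(node_fp, 0.85, "bm25".into());
+//! let score = cache.l3_get(&node_fp);
+//! ```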
+ +use std::collections::HashMap; +use std::sync::RwLock; +use std::time::Instant; + +use crate::document::NodeId; +use crate::retrieval::pipeline::CandidateNode; +use crate::utils::fingerprint::Fingerprint; + +/// A tiered reasoning cache for the retrieval pipeline. +/// +/// Thread-safe via `RwLock`. Each tier has independent size limits +/// and TTL-based expiration. +pub struct ReasoningCache { + /// L1: Exact query → cached candidate list. + l1: RwLock<L1Store>, + /// L2: Node path pattern → cached navigation cue score. + l2: RwLock<L2Store>, + /// L3: Node content fingerprint → cached strategy score. + l3: RwLock<L3Store>, + /// Configuration. + config: ReasoningCacheConfig, +} + +/// Configuration for the reasoning cache. +#[derive(Debug, Clone)] +pub struct ReasoningCacheConfig { + /// Maximum L1 entries (exact query results). + pub l1_max: usize, + /// Maximum L2 entries (path patterns). + pub l2_max: usize, + /// Maximum L3 entries (strategy scores). + pub l3_max: usize, +} + +impl Default for ReasoningCacheConfig { + fn default() -> Self { + Self { + l1_max: 200, + l2_max: 1000, + l3_max: 5000, + } + } +} + +// ---- L1: Exact Query Cache ---- + +#[derive(Debug, Clone)] +struct L1Entry { + /// Fingerprint of the workspace + document set used for this query. + scope_fp: Fingerprint, + /// Cached candidate nodes (pre-sorted by score). + candidates: Vec<CachedCandidate>, + /// Strategy used. + strategy: String, + /// When cached. + created_at: Instant, +} + +/// A cached candidate from a previous retrieval. +#[derive(Debug, Clone)] +pub struct CachedCandidate { + /// Node ID. + pub node_id: NodeId, + /// Relevance score. + pub score: f32, + /// Depth in tree. + pub depth: usize, +} + +struct L1Store { + entries: HashMap<Fingerprint, L1Entry>, + order: Vec<Fingerprint>, // For LRU eviction +} + +// ---- L2: Path Pattern Cache ---- + +#[derive(Debug, Clone)] +struct L2Entry { + /// Score for this navigation cue. + confidence: f32, + /// How many times this path was relevant. + hit_count: usize, + created_at: Instant, +} + +struct L2Store { + entries: HashMap<String, L2Entry>, // Key: "doc_fp:node_path" + order: Vec<String>, +} + +// ---- L3: Strategy Score Cache ---- + +#[derive(Debug, Clone)] +struct L3Entry { + /// BM25/Keyword score. + score: f32, + /// Which strategy produced this score. + strategy: String, + created_at: Instant, +} + +struct L3Store { + entries: HashMap<Fingerprint, L3Entry>, // Key: node content fingerprint + order: Vec<Fingerprint>, +} + +// ---- Public API ---- + +impl ReasoningCache { + /// Create a new reasoning cache with default configuration. + pub fn new() -> Self { + Self::with_config(ReasoningCacheConfig::default()) + } + + /// Create with custom configuration. + pub fn with_config(config: ReasoningCacheConfig) -> Self { + Self { + l1: RwLock::new(L1Store { + entries: HashMap::new(), + order: Vec::new(), + }), + l2: RwLock::new(L2Store { + entries: HashMap::new(), + order: Vec::new(), + }), + l3: RwLock::new(L3Store { + entries: HashMap::new(), + order: Vec::new(), + }), + config, + } + } + + // ============ L1: Exact Query ============ + + /// Look up an exact query result. + /// + /// Returns cached candidates if the same query was executed before + /// on the same document scope. 
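+    ///
+    /// The query string is fingerprinted as-is, so a hit requires the exact same
+    /// query text and a matching document scope.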
+ pub fn l1_get( + &self, + query: &str, + scope_fp: &Fingerprint, + ) -> Option<Vec<CachedCandidate>> { + let query_fp = Fingerprint::from_str(query); + let l1 = self.l1.read().ok()?; + let entry = l1.entries.get(&query_fp)?; + // Scope must match (same document set) + if &entry.scope_fp != scope_fp { + return None; + } + Some(entry.candidates.clone()) + } + + /// Store an L1 result. + pub fn l1_store( + &self, + query: &str, + scope_fp: Fingerprint, + candidates: Vec<CachedCandidate>, + strategy: String, + ) { + let query_fp = Fingerprint::from_str(query); + if let Ok(mut l1) = self.l1.write() { + if l1.entries.len() >= self.config.l1_max { + Self::evict_lru_fingerprint(&mut l1); + } + l1.entries.insert( + query_fp, + L1Entry { + scope_fp, + candidates, + strategy, + created_at: Instant::now(), + }, + ); + l1.order.push(query_fp); + } + } + + // ============ L2: Path Pattern ============ + + /// Look up a cached navigation confidence for a document + node path. + /// + /// If a previous query successfully navigated through this path, + /// return the confidence score. + pub fn l2_get(&self, doc_key: &str, node_path: &str) -> Option<f32> { + let key = format!("{}:{}", doc_key, node_path); + let l2 = self.l2.read().ok()?; + let entry = l2.entries.get(&key)?; + Some(entry.confidence) + } + + /// Record a successful navigation through a path. + /// + /// Call this after retrieval confirms a path was relevant. + pub fn l2_record(&self, doc_key: &str, node_path: &str, confidence: f32) { + let key = format!("{}:{}", doc_key, node_path); + if let Ok(mut l2) = self.l2.write() { + if let Some(entry) = l2.entries.get_mut(&key) { + // Update running average + entry.hit_count += 1; + entry.confidence = + entry.confidence + (confidence - entry.confidence) / entry.hit_count as f32; + } else { + if l2.entries.len() >= self.config.l2_max { + Self::evict_lru_string(&mut l2); + } + l2.entries.insert( + key.clone(), + L2Entry { + confidence, + hit_count: 1, + created_at: Instant::now(), + }, + ); + l2.order.push(key); + } + } + } + + /// Get top-N path hints for a document, sorted by confidence. + /// + /// Useful for bootstrapping new queries on a known document. + pub fn l2_top_paths(&self, doc_key: &str, n: usize) -> Vec<(String, f32)> { + let prefix = format!("{}:", doc_key); + let l2 = match self.l2.read() { + Ok(guard) => guard, + Err(_) => return Vec::new(), + }; + + let mut paths: Vec<(String, f32)> = l2 + .entries + .iter() + .filter(|(k, _)| k.starts_with(&prefix)) + .map(|(k, v)| (k[prefix.len()..].to_string(), v.confidence)) + .collect(); + paths.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal)); + paths.truncate(n); + paths + } + + // ============ L3: Strategy Score ============ + + /// Look up a cached strategy score for a node. + /// + /// Node scores from keyword/BM25 are content-dependent but + /// query-independent, so they can be shared across queries. + pub fn l3_get(&self, node_content_fp: &Fingerprint) -> Option<(f32, String)> { + let l3 = self.l3.read().ok()?; + let entry = l3.entries.get(node_content_fp)?; + Some((entry.score, entry.strategy.clone())) + } + + /// Store a strategy score for a node. 
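+    ///
+    /// Entries are keyed by the node's content fingerprint, so a cached score can be
+    /// reused across queries and is simply bypassed once the node's content (and
+    /// therefore its fingerprint) changes.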
+ pub fn l3_store( + &self, + node_content_fp: Fingerprint, + score: f32, + strategy: String, + ) { + if let Ok(mut l3) = self.l3.write() { + if l3.entries.len() >= self.config.l3_max { + Self::evict_lru_fingerprint_l3(&mut l3); + } + l3.entries.insert( + node_content_fp, + L3Entry { + score, + strategy, + created_at: Instant::now(), + }, + ); + l3.order.push(node_content_fp); + } + } + + // ============ Stats ============ + + /// Get cache statistics. + pub fn stats(&self) -> ReasoningCacheStats { + let (l1_count, l2_count, l3_count) = ( + self.l1.read().map(|g| g.entries.len()).unwrap_or(0), + self.l2.read().map(|g| g.entries.len()).unwrap_or(0), + self.l3.read().map(|g| g.entries.len()).unwrap_or(0), + ); + ReasoningCacheStats { + l1_entries: l1_count, + l2_entries: l2_count, + l3_entries: l3_count, + } + } + + /// Clear all cache tiers. + pub fn clear(&self) { + if let Ok(mut l1) = self.l1.write() { + l1.entries.clear(); + l1.order.clear(); + } + if let Ok(mut l2) = self.l2.write() { + l2.entries.clear(); + l2.order.clear(); + } + if let Ok(mut l3) = self.l3.write() { + l3.entries.clear(); + l3.order.clear(); + } + } + + // ============ Eviction helpers ============ + + fn evict_lru_fingerprint(l1: &mut L1Store) { + if let Some(old) = l1.order.first().copied() { + l1.entries.remove(&old); + l1.order.remove(0); + } + } + + fn evict_lru_string(l2: &mut L2Store) { + if let Some(old) = l2.order.first().cloned() { + l2.entries.remove(&old); + l2.order.remove(0); + } + } + + fn evict_lru_fingerprint_l3(l3: &mut L3Store) { + if let Some(old) = l3.order.first().copied() { + l3.entries.remove(&old); + l3.order.remove(0); + } + } +} + +impl Default for ReasoningCache { + fn default() -> Self { + Self::new() + } +} + +/// Cache statistics. +#[derive(Debug, Clone)] +pub struct ReasoningCacheStats { + /// L1 entries (exact query results). + pub l1_entries: usize, + /// L2 entries (path patterns). + pub l2_entries: usize, + /// L3 entries (strategy scores). 
+ pub l3_entries: usize, +} + +#[cfg(test)] +mod tests { + use super::*; + + fn make_node_id(n: usize) -> NodeId { + let mut arena = indextree::Arena::new(); + NodeId(arena.new_node(n)) + } + + #[test] + fn test_l1_store_and_retrieve() { + let cache = ReasoningCache::new(); + let scope = Fingerprint::from_str("doc1"); + + let candidates = vec![CachedCandidate { + node_id: make_node_id(1), + score: 0.9, + depth: 2, + }]; + + cache.l1_store("what is rust?", scope, candidates.clone(), "keyword".into()); + let result = cache.l1_get("what is rust?", &scope); + assert!(result.is_some()); + assert_eq!(result.unwrap().len(), 1); + } + + #[test] + fn test_l1_miss_different_scope() { + let cache = ReasoningCache::new(); + let scope1 = Fingerprint::from_str("doc1"); + let scope2 = Fingerprint::from_str("doc2"); + + let candidates = vec![CachedCandidate { + node_id: make_node_id(1), + score: 0.9, + depth: 2, + }]; + + cache.l1_store("query", scope1, candidates, "keyword".into()); + assert!(cache.l1_get("query", &scope2).is_none()); + } + + #[test] + fn test_l2_record_and_get() { + let cache = ReasoningCache::new(); + + cache.l2_record("doc1", "3.2", 0.8); + let score = cache.l2_get("doc1", "3.2"); + assert!(score.is_some()); + assert!((score.unwrap() - 0.8).abs() < 0.01); + } + + #[test] + fn test_l2_running_average() { + let cache = ReasoningCache::new(); + + cache.l2_record("doc1", "3.2", 0.8); + cache.l2_record("doc1", "3.2", 0.6); + let score = cache.l2_get("doc1", "3.2").unwrap(); + // Running average: 0.8 + (0.6 - 0.8) / 2 = 0.7 + assert!((score - 0.7).abs() < 0.01); + } + + #[test] + fn test_l2_top_paths() { + let cache = ReasoningCache::new(); + + cache.l2_record("doc1", "3.1", 0.5); + cache.l2_record("doc1", "3.2", 0.9); + cache.l2_record("doc1", "2.1", 0.7); + + let top = cache.l2_top_paths("doc1", 2); + assert_eq!(top.len(), 2); + assert!((top[0].1 - 0.9).abs() < 0.01); // 3.2 is highest + } + + #[test] + fn test_l3_store_and_retrieve() { + let cache = ReasoningCache::new(); + let fp = Fingerprint::from_str("some node content"); + + cache.l3_store(fp, 0.85, "bm25".into()); + let (score, strategy) = cache.l3_get(&fp).unwrap(); + assert!((score - 0.85).abs() < 0.01); + assert_eq!(strategy, "bm25"); + } + + #[test] + fn test_clear() { + let cache = ReasoningCache::new(); + let scope = Fingerprint::from_str("doc1"); + + cache.l1_store("q", scope, vec![], "kw".into()); + cache.l2_record("doc1", "1", 0.5); + cache.l3_store(Fingerprint::from_str("c"), 0.5, "kw".into()); + + cache.clear(); + + let stats = cache.stats(); + assert_eq!(stats.l1_entries, 0); + assert_eq!(stats.l2_entries, 0); + assert_eq!(stats.l3_entries, 0); + } + + #[test] + fn test_l1_lru_eviction() { + let config = ReasoningCacheConfig { + l1_max: 2, + ..Default::default() + }; + let cache = ReasoningCache::with_config(config); + let scope = Fingerprint::from_str("doc"); + + cache.l1_store("q1", scope, vec![], "kw".into()); + cache.l1_store("q2", scope, vec![], "kw".into()); + cache.l1_store("q3", scope, vec![], "kw".into()); // evicts q1 + + assert!(cache.l1_get("q1", &scope).is_none()); + assert!(cache.l1_get("q2", &scope).is_some()); + assert!(cache.l1_get("q3", &scope).is_some()); + } +} diff --git a/rust/src/retrieval/decompose.rs b/rust/src/retrieval/decompose.rs index 603c388d..9c547ef8 100644 --- a/rust/src/retrieval/decompose.rs +++ b/rust/src/retrieval/decompose.rs @@ -64,6 +64,10 @@ pub struct SubQuery { pub depends_on: Vec<usize>, /// Type of sub-query. 
pub query_type: SubQueryType, + /// Optional structural path constraint extracted from the query + /// (e.g. "3.2", "Chapter 5"). When set, the search should start + /// from the corresponding tree node instead of searching broadly. + pub path_constraint: Option<String>, } /// Complexity level for a sub-query. @@ -130,6 +134,7 @@ impl DecompositionResult { priority: 0, depends_on: vec![], query_type: SubQueryType::Fact, + path_constraint: None, }], was_decomposed: false, reason: reason.to_string(), @@ -338,6 +343,7 @@ impl QueryDecomposer { priority: i as u8, depends_on: vec![], query_type: self.detect_query_type(part), + path_constraint: None, }); } } @@ -359,6 +365,7 @@ impl QueryDecomposer { vec![] }, query_type: self.detect_query_type(part), + path_constraint: None, }); } break; @@ -666,6 +673,7 @@ mod tests { depends_on: vec![], query_type: SubQueryType::Fact, complexity: SubQueryComplexity::Simple, + path_constraint: None, }, SubQuery { text: "Second".to_string(), @@ -673,6 +681,7 @@ mod tests { depends_on: vec![0], query_type: SubQueryType::Fact, complexity: SubQueryComplexity::Simple, + path_constraint: None, }, ]; result.was_decomposed = true; @@ -711,6 +720,7 @@ mod tests { depends_on: vec![], query_type: SubQueryType::Fact, complexity: SubQueryComplexity::Simple, + path_constraint: None, }, content: "Answer 1".to_string(), score: 0.9, @@ -723,6 +733,7 @@ mod tests { depends_on: vec![0], query_type: SubQueryType::Fact, complexity: SubQueryComplexity::Simple, + path_constraint: None, }, content: "Answer 2".to_string(), score: 0.8, diff --git a/rust/src/retrieval/mod.rs b/rust/src/retrieval/mod.rs index dc1a289e..d5d65e22 100644 --- a/rust/src/retrieval/mod.rs +++ b/rust/src/retrieval/mod.rs @@ -52,6 +52,7 @@ mod decompose; mod pipeline_retriever; mod reference; mod retriever; +pub mod stream; mod types; pub mod cache; @@ -71,14 +72,16 @@ pub use context::{ pub use pipeline_retriever::PipelineRetriever; pub use retriever::{RetrievalContext, Retriever, RetrieverError, RetrieverResult}; pub use types::*; +pub use types::{LlmCallSummary, ReasoningCandidate, ReasoningChain, ReasoningStep, StageName}; // Re-export StrategyPreference as Strategy for convenience pub use types::StrategyPreference as Strategy; // Pipeline exports pub use pipeline::{ - CandidateNode, ExecutionGroup, FailurePolicy, PipelineContext, RetrievalMetrics, - RetrievalOrchestrator, RetrievalStage, SearchAlgorithm, SearchConfig, StageOutcome, + CandidateNode, ExecutionGroup, FailurePolicy, PipelineContext, RetrievalBudgetController, + RetrievalMetrics, RetrievalOrchestrator, RetrievalStage, SearchAlgorithm, SearchConfig, + StageOutcome, BudgetStatus, }; // Re-export PipelineContext as RetrievalContext for stages (alias for clarity) @@ -106,6 +109,7 @@ pub use complexity::ComplexityDetector; // Cache exports pub use cache::PathCache; +pub use cache::{CachedCandidate, ReasoningCache, ReasoningCacheConfig, ReasoningCacheStats}; // Content aggregation exports pub use content::{ @@ -132,3 +136,6 @@ pub use reference::{ expand_with_references, FollowedReference, ReferenceConfig, ReferenceExpansion, ReferenceFollower, }; + +// Streaming exports +pub use stream::{RetrieveEvent, RetrieveEventReceiver, DEFAULT_STREAM_BOUND}; diff --git a/rust/src/retrieval/pipeline/budget.rs b/rust/src/retrieval/pipeline/budget.rs new file mode 100644 index 00000000..3fe69d76 --- /dev/null +++ b/rust/src/retrieval/pipeline/budget.rs @@ -0,0 +1,329 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! 
Adaptive token budget controller for the retrieval pipeline. +//! +//! Unlike the Pilot-level [`BudgetController`](crate::retrieval::pilot::BudgetController) +//! which only tracks Pilot LLM calls, this controller tracks the **entire pipeline's** +//! token consumption across all stages and provides dynamic budget allocation decisions. +//! +//! # Design +//! +//! ```text +//! ┌──────────────────────────────────────────────────┐ +//! │ RetrievalBudgetController │ +//! │ │ +//! │ total_budget ────────────────────────┬────────── │ +//! │ consumed (from all stages) │ remaining │ +//! │ │ │ +//! │ Plan stage: initial allocation │ │ +//! │ Search stage: check before iteration │ │ +//! │ Evaluate stage: report & decide │ │ +//! │ Graceful degradation when low │ │ +//! └──────────────────────────────────────────────────┘ +//! ``` + +use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering}; +use std::sync::Arc; + +/// Status of the budget for stage-level decision making. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum BudgetStatus { + /// Plenty of budget remaining, proceed normally. + Healthy, + /// Budget is getting low, consider cheaper strategies. + Constrained, + /// Budget is exhausted, stop LLM calls and return best results. + Exhausted, +} + +impl BudgetStatus { + /// Whether LLM calls should still be made. + pub fn allow_llm(self) -> bool { + matches!(self, Self::Healthy | Self::Constrained) + } + + /// Whether the pipeline should stop iterating and return current results. + pub fn should_stop(self) -> bool { + self == Self::Exhausted + } +} + +/// Adaptive budget controller for the retrieval pipeline. +/// +/// Tracks token consumption across all stages (Plan, Search, Evaluate) +/// and provides budget-aware decisions for dynamic strategy adjustment. +/// +/// # Example +/// +/// ```rust,ignore +/// let budget = RetrievalBudgetController::new(4000); +/// +/// // In Search stage: check before starting an iteration +/// if budget.status().should_stop() { +/// return StageOutcome::complete(); // graceful degradation +/// } +/// +/// // After LLM call: record consumption +/// budget.record_tokens(350); +/// +/// // In Evaluate: decide based on remaining budget +/// if budget.status() == BudgetStatus::Constrained { +/// // Use cheaper sufficiency check +/// } +/// ``` +pub struct RetrievalBudgetController { + /// Total token budget for this retrieval operation. + total_budget: usize, + /// Tokens consumed so far (atomic for thread safety). + consumed: AtomicUsize, + /// Whether budget exhaustion has been signaled to the pipeline. + exhaustion_signaled: AtomicBool, + /// Threshold ratio for "constrained" status (e.g. 0.7 = warn at 70% used). + constrain_threshold: f32, +} + +// Manual Clone because AtomicUsize/AtomicBool don't impl Clone. +impl Clone for RetrievalBudgetController { + fn clone(&self) -> Self { + Self { + total_budget: self.total_budget, + consumed: AtomicUsize::new(self.consumed.load(Ordering::Relaxed)), + exhaustion_signaled: AtomicBool::new( + self.exhaustion_signaled.load(Ordering::Relaxed), + ), + constrain_threshold: self.constrain_threshold, + } + } +} + +impl RetrievalBudgetController { + /// Create a new budget controller with the given total token budget. + pub fn new(total_budget: usize) -> Self { + Self { + total_budget, + consumed: AtomicUsize::new(0), + exhaustion_signaled: AtomicBool::new(false), + constrain_threshold: 0.7, + } + } + + /// Create with a custom constrain threshold (0.0 - 1.0). 
+ /// + /// When consumption exceeds `total_budget * threshold`, status becomes Constrained. + pub fn with_constrain_threshold(mut self, threshold: f32) -> Self { + self.constrain_threshold = threshold.clamp(0.0, 1.0); + self + } + + /// Get the current budget status. + pub fn status(&self) -> BudgetStatus { + if self.exhaustion_signaled.load(Ordering::Relaxed) { + return BudgetStatus::Exhausted; + } + + let consumed = self.consumed.load(Ordering::Relaxed); + if consumed >= self.total_budget { + self.exhaustion_signaled.store(true, Ordering::Relaxed); + return BudgetStatus::Exhausted; + } + + let utilization = consumed as f32 / self.total_budget as f32; + if utilization >= self.constrain_threshold { + BudgetStatus::Constrained + } else { + BudgetStatus::Healthy + } + } + + /// Record tokens consumed by any stage. + pub fn record_tokens(&self, tokens: usize) { + self.consumed.fetch_add(tokens, Ordering::Relaxed); + } + + /// Get total tokens consumed so far. + pub fn consumed(&self) -> usize { + self.consumed.load(Ordering::Relaxed) + } + + /// Get remaining token budget. + pub fn remaining(&self) -> usize { + self.total_budget.saturating_sub(self.consumed.load(Ordering::Relaxed)) + } + + /// Get total budget. + pub fn total_budget(&self) -> usize { + self.total_budget + } + + /// Get utilization ratio (0.0 - 1.0). + pub fn utilization(&self) -> f32 { + if self.total_budget == 0 { + 0.0 + } else { + (self.consumed.load(Ordering::Relaxed) as f32 / self.total_budget as f32).min(1.0) + } + } + + /// Signal that budget is exhausted (e.g. external trigger). + pub fn signal_exhausted(&self) { + self.exhaustion_signaled.store(true, Ordering::Relaxed); + } + + /// Whether budget exhaustion has been signaled. + pub fn is_exhausted(&self) -> bool { + self.exhaustion_signaled.load(Ordering::Relaxed) + || self.consumed.load(Ordering::Relaxed) >= self.total_budget + } + + /// Reset for a new query. + pub fn reset(&self) { + self.consumed.store(0, Ordering::Relaxed); + self.exhaustion_signaled.store(false, Ordering::Relaxed); + } + + /// Suggest a search strategy based on budget status and query complexity. + /// + /// Returns the recommended beam width for the next search iteration. + pub fn suggested_beam_width(&self, current_beam: usize, iteration: usize) -> usize { + match self.status() { + BudgetStatus::Healthy => { + // Full power, maybe even increase beam for complex queries + current_beam + } + BudgetStatus::Constrained => { + // Reduce beam to save tokens + let reduced = if iteration <= 1 { current_beam } else { (current_beam / 2).max(1) }; + reduced + } + BudgetStatus::Exhausted => { + // No more search iterations worth doing + 0 + } + } + } + + /// Whether another search iteration is worthwhile given budget and confidence. 
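+    ///
+    /// # Example
+    ///
+    /// A small sketch mirroring the unit tests below:
+    ///
+    /// ```rust,ignore
+    /// let budget = RetrievalBudgetController::new(1000);
+    /// assert!(budget.should_continue_search(0.2, 0));  // fresh budget, low confidence
+    /// assert!(!budget.should_continue_search(0.9, 1)); // already confident enough
+    ///
+    /// budget.record_tokens(750);                       // now Constrained (>= 70% used)
+    /// assert!(!budget.should_continue_search(0.5, 2)); // decent results, save tokens
+    /// ```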
+ pub fn should_continue_search(&self, current_confidence: f32, iteration: usize) -> bool { + if self.is_exhausted() { + return false; + } + // Don't continue if confidence is already good + if current_confidence > 0.8 && iteration >= 1 { + return false; + } + // Don't continue if budget is constrained and we have some results + if self.status() == BudgetStatus::Constrained && current_confidence > 0.4 { + return false; + } + true + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_budget_healthy() { + let budget = RetrievalBudgetController::new(1000); + assert_eq!(budget.status(), BudgetStatus::Healthy); + assert!(!budget.is_exhausted()); + assert_eq!(budget.remaining(), 1000); + } + + #[test] + fn test_budget_constrained() { + let budget = RetrievalBudgetController::new(1000); + budget.record_tokens(750); // 75% used, above 70% threshold + assert_eq!(budget.status(), BudgetStatus::Constrained); + assert!(budget.status().allow_llm()); + } + + #[test] + fn test_budget_exhausted() { + let budget = RetrievalBudgetController::new(1000); + budget.record_tokens(1000); + assert_eq!(budget.status(), BudgetStatus::Exhausted); + assert!(budget.status().should_stop()); + assert!(!budget.status().allow_llm()); + } + + #[test] + fn test_budget_exhausted_over() { + let budget = RetrievalBudgetController::new(1000); + budget.record_tokens(1500); + assert_eq!(budget.status(), BudgetStatus::Exhausted); + } + + #[test] + fn test_budget_signal_exhausted() { + let budget = RetrievalBudgetController::new(1000); + budget.signal_exhausted(); + assert_eq!(budget.status(), BudgetStatus::Exhausted); + assert_eq!(budget.consumed(), 0); // No tokens actually consumed + } + + #[test] + fn test_budget_reset() { + let budget = RetrievalBudgetController::new(1000); + budget.record_tokens(800); + assert_eq!(budget.status(), BudgetStatus::Constrained); + budget.reset(); + assert_eq!(budget.status(), BudgetStatus::Healthy); + assert_eq!(budget.consumed(), 0); + } + + #[test] + fn test_suggested_beam_width() { + let budget = RetrievalBudgetController::new(1000); + // Healthy: keep current beam + assert_eq!(budget.suggested_beam_width(4, 0), 4); + + // Constrained: first iteration keeps beam, later reduces + budget.record_tokens(750); + assert_eq!(budget.suggested_beam_width(4, 0), 4); + assert_eq!(budget.suggested_beam_width(4, 2), 2); + + // Exhausted: zero + budget.record_tokens(300); + assert_eq!(budget.suggested_beam_width(4, 0), 0); + } + + #[test] + fn test_should_continue_search() { + let budget = RetrievalBudgetController::new(1000); + + // Fresh, low confidence: continue + assert!(budget.should_continue_search(0.2, 0)); + + // High confidence after 1 iteration: stop + assert!(!budget.should_continue_search(0.9, 1)); + + // Medium confidence, healthy budget: continue + assert!(budget.should_continue_search(0.5, 1)); + + // Constrained, decent confidence: stop + budget.record_tokens(750); + assert!(!budget.should_continue_search(0.5, 2)); + + // Constrained, low confidence: continue + assert!(budget.should_continue_search(0.2, 2)); + } + + #[test] + fn test_utilization() { + let budget = RetrievalBudgetController::new(1000); + assert!((budget.utilization() - 0.0).abs() < 0.01); + + budget.record_tokens(500); + assert!((budget.utilization() - 0.5).abs() < 0.01); + } + + #[test] + fn test_custom_constrain_threshold() { + let budget = RetrievalBudgetController::new(1000).with_constrain_threshold(0.5); + budget.record_tokens(500); + assert_eq!(budget.status(), BudgetStatus::Constrained); + } +} diff 
--git a/rust/src/retrieval/pipeline/context.rs b/rust/src/retrieval/pipeline/context.rs index 823abdba..d5158ecb 100644 --- a/rust/src/retrieval/pipeline/context.rs +++ b/rust/src/retrieval/pipeline/context.rs @@ -10,11 +10,13 @@ use std::collections::HashMap; use std::sync::Arc; use std::time::Instant; -use crate::document::{DocumentTree, NodeId, RetrievalIndex}; +use crate::document::{DocumentGraph, DocumentTree, NodeId, ReasoningIndex, RetrievalIndex}; +use crate::retrieval::cache::{HotNodeTracker, ReasoningCache}; +use crate::retrieval::pipeline::budget::RetrievalBudgetController; use crate::retrieval::pilot::Pilot; use crate::retrieval::types::{ - NavigationStep, QueryComplexity, RetrieveOptions, RetrieveResponse, SearchPath, - StrategyPreference, SufficiencyLevel, + NavigationDecision, QueryComplexity, ReasoningChain, ReasoningStep, RetrieveOptions, + RetrieveResponse, SearchPath, StageName, StrategyPreference, SufficiencyLevel, }; /// Search algorithm type. @@ -201,6 +203,19 @@ pub struct PipelineContext { pub options: RetrieveOptions, /// Optional Pilot for navigation guidance. pub pilot: Option<Arc<dyn Pilot>>, + /// Adaptive token budget controller for the entire pipeline. + pub budget_controller: RetrievalBudgetController, + /// Tiered reasoning cache (L1 exact, L2 path pattern, L3 strategy score). + pub reasoning_cache: Arc<ReasoningCache>, + + /// Pre-computed reasoning index for fast path resolution. + pub reasoning_index: Option<Arc<ReasoningIndex>>, + + /// Hot node tracker for recording retrieval frequency (session-scoped). + pub hot_tracker: Option<Arc<HotNodeTracker>>, + + /// Cross-document relationship graph for graph-aware retrieval. + pub document_graph: Option<Arc<DocumentGraph>>, // ============ Analyze Stage Output ============ /// Detected query complexity. @@ -209,6 +224,9 @@ pub struct PipelineContext { pub keywords: Vec<String>, /// Target sections from ToC matching. pub target_sections: Vec<String>, + /// Resolved structural path hints — node IDs extracted from the query + /// (e.g. "第3章" → NodeId of Chapter 3). Search should start from these nodes. + pub resolved_path_hints: Vec<(String, NodeId)>, /// Decomposed sub-queries (if query was decomposed). pub decomposition: Option<crate::retrieval::decompose::DecompositionResult>, @@ -225,8 +243,8 @@ pub struct PipelineContext { pub candidates: Vec<CandidateNode>, /// Search paths explored. pub search_paths: Vec<SearchPath>, - /// Navigation trace for debugging. - pub navigation_trace: Vec<NavigationStep>, + /// Reasoning chain — ordered steps explaining every retrieval decision. + pub reasoning_chain: ReasoningChain, /// Number of search iterations performed. 
pub search_iterations: usize, @@ -260,6 +278,7 @@ impl PipelineContext { ) -> Self { // Build retrieval index for efficient operations let retrieval_index = Some(tree.build_retrieval_index()); + let budget_controller = RetrievalBudgetController::new(options.max_tokens); Self { query: query.into(), @@ -267,16 +286,22 @@ impl PipelineContext { retrieval_index, options, pilot: None, + budget_controller, + reasoning_cache: Arc::new(ReasoningCache::new()), + reasoning_index: None, + hot_tracker: None, + document_graph: None, complexity: None, keywords: Vec::new(), target_sections: Vec::new(), + resolved_path_hints: Vec::new(), decomposition: None, selected_strategy: None, selected_algorithm: None, search_config: None, candidates: Vec::new(), search_paths: Vec::new(), - navigation_trace: Vec::new(), + reasoning_chain: ReasoningChain::new(), search_iterations: 0, sufficiency: SufficiencyLevel::default(), accumulated_content: String::new(), @@ -305,6 +330,24 @@ impl PipelineContext { self.pilot = pilot; } + /// Set the reasoning index for this retrieval context. + pub fn with_reasoning_index(mut self, index: ReasoningIndex) -> Self { + self.reasoning_index = Some(Arc::new(index)); + self + } + + /// Set the hot node tracker for this retrieval context. + pub fn with_hot_tracker(mut self, tracker: HotNodeTracker) -> Self { + self.hot_tracker = Some(Arc::new(tracker)); + self + } + + /// Set the document graph for graph-aware retrieval. + pub fn with_document_graph(mut self, graph: DocumentGraph) -> Self { + self.document_graph = Some(Arc::new(graph)); + self + } + /// Get the Pilot reference, if available. pub fn pilot(&self) -> Option<&dyn Pilot> { self.pilot.as_deref() @@ -372,6 +415,33 @@ impl PipelineContext { } } + /// Append a reasoning step to the chain. + pub fn push_reasoning_step(&mut self, step: ReasoningStep) { + self.reasoning_chain.push(step); + } + + /// Convenience: push a simple reasoning step with no node association. + pub fn record_reasoning( + &mut self, + stage: StageName, + reasoning: impl Into<String>, + decision: NavigationDecision, + ) { + self.push_reasoning_step(ReasoningStep { + stage, + node_id: None, + title: None, + score: 0.0, + decision, + depth: 0, + reasoning: reasoning.into(), + candidates: Vec::new(), + strategy_used: None, + llm_call: None, + references_followed: Vec::new(), + }); + } + /// Finalize the context into a response. pub fn finalize(self) -> RetrieveResponse { self.result.unwrap_or_else(|| RetrieveResponse { @@ -384,7 +454,7 @@ impl PipelineContext { .map(|s| format!("{:?}", s)) .unwrap_or_else(|| "unknown".to_string()), complexity: self.complexity.unwrap_or_default(), - trace: self.navigation_trace, + reasoning_chain: self.reasoning_chain, tokens_used: self.token_count, }) } diff --git a/rust/src/retrieval/pipeline/mod.rs b/rust/src/retrieval/pipeline/mod.rs index 5351b767..88b47de5 100644 --- a/rust/src/retrieval/pipeline/mod.rs +++ b/rust/src/retrieval/pipeline/mod.rs @@ -44,11 +44,13 @@ //! let response = orchestrator.execute(tree, query, options).await?; //! 
```

+mod budget;
 mod context;
 mod orchestrator;
 mod outcome;
 mod stage;

+pub use budget::{BudgetStatus, RetrievalBudgetController};
 pub use context::{
     CandidateNode, PipelineContext, RetrievalMetrics, SearchAlgorithm, SearchConfig, StageResult,
 };
diff --git a/rust/src/retrieval/pipeline/orchestrator.rs b/rust/src/retrieval/pipeline/orchestrator.rs
index e4d5433c..6e53fbc3 100644
--- a/rust/src/retrieval/pipeline/orchestrator.rs
+++ b/rust/src/retrieval/pipeline/orchestrator.rs
@@ -16,9 +16,13 @@ use std::time::Instant;
 use tracing::{debug, error, info, warn};

 use crate::document::DocumentTree;
+use crate::document::ReasoningIndex;
 use crate::error::Result;
 use crate::retrieval::pilot::{Pilot, SearchState};
 // FailurePolicy is re-exported for stages
+use crate::retrieval::stream::{
+    RetrieveEvent, RetrieveEventReceiver, RetrieveEventSender, DEFAULT_STREAM_BOUND,
+};
 use crate::retrieval::types::{RetrieveOptions, RetrieveResponse};

 use super::context::{CandidateNode, PipelineContext};
@@ -550,6 +554,611 @@ impl RetrievalOrchestrator {
         Ok(ctx.finalize())
     }

+    /// Execute the retrieval pipeline with a pre-computed reasoning index.
+    ///
+    /// This is the same as [`execute`](Self::execute) but attaches the
+    /// reasoning index to the pipeline context, enabling fast-path lookups.
+    pub async fn execute_with_reasoning_index(
+        &mut self,
+        tree: Arc<DocumentTree>,
+        query: &str,
+        options: RetrieveOptions,
+        reasoning_index: Option<ReasoningIndex>,
+    ) -> Result<RetrieveResponse> {
+        // execute() builds its own PipelineContext internally, so this method
+        // re-runs the same stage loop and attaches the reasoning index to the
+        // context right after it is created (via with_reasoning_index).
+        // Callers that construct a PipelineContext themselves can attach the
+        // index directly with PipelineContext::with_reasoning_index() instead.
+ let total_start = Instant::now(); + info!( + "Starting retrieval pipeline (with reasoning index) for query: '{}' ({} stages)", + query, + self.stages.len() + ); + + let order = self.resolve_order()?; + let stage_names: Vec<&str> = order.iter().map(|&i| self.stages[i].stage.name()).collect(); + info!("Execution order: {:?}", stage_names); + + let groups = self.compute_execution_groups(&order); + + // Create context with Pilot and reasoning index + let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone()); + if let Some(ri) = reasoning_index { + ctx = ctx.with_reasoning_index(ri); + } + + let mut backtrack_count = 0; + let mut total_iterations = 0; + let mut group_idx = 0; + + while group_idx < groups.len() { + if backtrack_count >= self.max_backtracks { + warn!("Max backtracks reached, completing with current results"); + break; + } + + if total_iterations >= self.max_total_iterations { + warn!("Max total iterations reached, completing"); + break; + } + + let group = &groups[group_idx]; + + for &stage_idx in &group.stage_indices { + let entry = &self.stages[stage_idx]; + let stage_name = entry.stage.name(); + let policy = entry.stage.failure_policy(); + + ctx.start_stage(); + info!("Executing stage: {}", stage_name); + + match entry.stage.execute(&mut ctx).await { + Ok(outcome) => { + ctx.end_stage(stage_name, true, None); + total_iterations += 1; + + match outcome { + StageOutcome::Continue => {} + StageOutcome::Complete => { + ctx.metrics.total_time_ms = + total_start.elapsed().as_millis() as u64; + info!("Retrieval completed by stage: {}", stage_name); + return Ok(ctx.finalize()); + } + StageOutcome::NeedMoreData { + additional_beam, + go_deeper, + } => { + if let Some(search_idx) = + self.stages.iter().position(|e| e.stage.name() == "search") + { + info!( + "Need more data, backtracking to search (beam +{}, deeper: {})", + additional_beam, go_deeper + ); + + if let Some(ref pilot) = self.pilot { + if pilot.config().guide_at_backtrack { + let visited: std::collections::HashSet<_> = ctx + .search_paths + .iter() + .flat_map(|p| p.nodes.iter().copied()) + .collect(); + let candidates: Vec<_> = + ctx.candidates.iter().map(|c| c.node_id).collect(); + + let state = SearchState::new( + &ctx.tree, + &ctx.query, + &[], + &candidates, + &visited, + ); + + match pilot.guide_backtrack(&state).await { + Some(guidance) => { + debug!( + "Pilot backtrack guidance: confidence={}, candidates={}", + guidance.confidence, + guidance.ranked_candidates.len() + ); + if guidance.has_candidates() { + ctx.candidates = guidance + .ranked_candidates + .iter() + .map(|rc| CandidateNode { + node_id: rc.node_id, + score: rc.score, + depth: 0, + is_leaf: false, + }) + .collect(); + } + } + None => { + debug!("Pilot provided no backtrack guidance"); + } + } + } + } + + if let Some(ref mut config) = ctx.search_config { + config.beam_width += additional_beam; + if go_deeper { + config.max_depth += 1; + } + } + + ctx.increment_backtrack(); + backtrack_count += 1; + + if let Some(target_group) = + self.find_group_for_stage(&groups, search_idx) + { + group_idx = target_group; + continue; + } + } + } + StageOutcome::Backtrack { + target_stage, + reason, + } => { + info!("Backtracking to {}: {}", target_stage, reason); + + if let Some(target_idx) = self + .stages + .iter() + .position(|e| e.stage.name() == target_stage) + { + if target_stage == "search" { + if let Some(ref pilot) = self.pilot { + if pilot.config().guide_at_backtrack { + let visited: std::collections::HashSet<_> = ctx + .search_paths 
+ .iter() + .flat_map(|p| p.nodes.iter().copied()) + .collect(); + let candidates: Vec<_> = ctx + .candidates + .iter() + .map(|c| c.node_id) + .collect(); + + let state = SearchState::new( + &ctx.tree, + &ctx.query, + &[], + &candidates, + &visited, + ); + + if let Some(guidance) = + pilot.guide_backtrack(&state).await + { + debug!( + "Pilot backtrack guidance for explicit backtrack: confidence={}", + guidance.confidence + ); + if guidance.has_candidates() { + ctx.candidates = guidance + .ranked_candidates + .iter() + .map(|rc| CandidateNode { + node_id: rc.node_id, + score: rc.score, + depth: 0, + is_leaf: false, + }) + .collect(); + } + } + } + } + } + + ctx.increment_backtrack(); + backtrack_count += 1; + + if let Some(target_group) = + self.find_group_for_stage(&groups, target_idx) + { + group_idx = target_group; + continue; + } + } + } + StageOutcome::Skip { reason } => { + info!("Skipping remaining stages: {}", reason); + ctx.metrics.total_time_ms = + total_start.elapsed().as_millis() as u64; + return Ok(ctx.finalize()); + } + } + } + Err(e) => { + ctx.end_stage(stage_name, false, Some(e.to_string())); + + if policy.allows_continuation() { + warn!( + "Stage {} failed but policy allows continuation: {}", + stage_name, e + ); + } else { + error!("Stage {} failed: {}", stage_name, e); + return Err(e); + } + } + } + } + + group_idx += 1; + } + + ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64; + info!( + "Retrieval completed in {}ms ({} iterations, {} backtracks)", + ctx.metrics.total_time_ms, total_iterations, backtrack_count + ); + + Ok(ctx.finalize()) + } + + /// Execute the retrieval pipeline with streaming events. + /// + /// Consumes the orchestrator and spawns a background task that runs the + /// pipeline. The caller receives a channel of [`RetrieveEvent`]s that + /// fire at each stage boundary. The stream always terminates with either + /// [`Completed`](RetrieveEvent::Completed) or + /// [`Error`](RetrieveEvent::Error). + /// + /// The existing [`execute()`](Self::execute) method is **not** affected. + /// + /// # Example + /// + /// ```rust,ignore + /// let (handle, mut rx) = orchestrator.execute_streaming(tree, query, options); + /// + /// while let Some(event) = rx.recv().await { + /// match event { + /// RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"), + /// RetrieveEvent::Completed { response } => break, + /// RetrieveEvent::Error { message } => { eprintln!("{message}"); break; } + /// _ => {} + /// } + /// } + /// let _ = handle.await; + /// ``` + pub fn execute_streaming( + mut self, + tree: Arc<DocumentTree>, + query: &str, + options: RetrieveOptions, + ) -> ( + tokio::task::JoinHandle<()>, + RetrieveEventReceiver, + ) { + let (tx, rx) = tokio::sync::mpsc::channel(DEFAULT_STREAM_BOUND); + let query_owned = query.to_string(); + + let handle = tokio::spawn(async move { + if let Err(e) = self.run_streaming(tree, &query_owned, options, &tx).await { + let _ = tx + .send(RetrieveEvent::Error { + message: e.to_string(), + }) + .await; + } + }); + + (handle, rx) + } + + /// Internal streaming pipeline execution. 
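A minimal caller sketch for the convenience API above. The crate-root paths (`vectorless::...`) and the `Result` re-export are assumptions, since the diff only shows in-crate imports; the index itself would typically come from the `PersistedDocument::reasoning_index` field added later in this diff.

```rust
use std::sync::Arc;

use vectorless::document::{DocumentTree, ReasoningIndex};
use vectorless::retrieval::pipeline::RetrievalOrchestrator;
use vectorless::retrieval::types::{RetrieveOptions, RetrieveResponse};

async fn query_with_index(
    mut orchestrator: RetrievalOrchestrator,
    tree: Arc<DocumentTree>,
    reasoning_index: Option<ReasoningIndex>,
) -> vectorless::error::Result<RetrieveResponse> {
    // Identical to execute(), except the index is attached to the context so
    // SearchStage can try the pre-computed shortcut before ToC navigation.
    orchestrator
        .execute_with_reasoning_index(
            tree,
            "How do I reset the device?",
            RetrieveOptions::default(),
            reasoning_index,
        )
        .await
}
```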
+ async fn run_streaming( + &mut self, + tree: Arc<DocumentTree>, + query: &str, + options: RetrieveOptions, + tx: &RetrieveEventSender, + ) -> Result<()> { + let total_start = Instant::now(); + + let _ = tx + .send(RetrieveEvent::Started { + query: query.to_string(), + strategy: format!("{:?}", options.strategy), + }) + .await; + + info!( + "Starting streaming retrieval pipeline for query: '{}' ({} stages)", + query, + self.stages.len() + ); + + let order = self.resolve_order()?; + let groups = self.compute_execution_groups(&order); + let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone()); + + let mut backtrack_count = 0; + let mut total_iterations = 0; + let mut group_idx = 0; + + while group_idx < groups.len() { + if backtrack_count >= self.max_backtracks { + warn!("Max backtracks reached, completing with current results"); + break; + } + if total_iterations >= self.max_total_iterations { + warn!("Max total iterations reached, completing"); + break; + } + + let group = &groups[group_idx]; + + for &stage_idx in &group.stage_indices { + let entry = &self.stages[stage_idx]; + let stage_name = entry.stage.name(); + let policy = entry.stage.failure_policy(); + + let stage_start = Instant::now(); + ctx.start_stage(); + info!("Executing stage: {}", stage_name); + + match entry.stage.execute(&mut ctx).await { + Ok(outcome) => { + let elapsed = stage_start.elapsed().as_millis() as u64; + ctx.end_stage(stage_name, true, None); + total_iterations += 1; + + let _ = tx + .send(RetrieveEvent::StageCompleted { + stage: stage_name.to_string(), + elapsed_ms: elapsed, + }) + .await; + + match outcome { + StageOutcome::Continue => {} + StageOutcome::Complete => { + ctx.metrics.total_time_ms = + total_start.elapsed().as_millis() as u64; + info!("Retrieval completed by stage: {}", stage_name); + let response = ctx.finalize(); + let _ = tx + .send(RetrieveEvent::Completed { response }) + .await; + return Ok(()); + } + StageOutcome::NeedMoreData { + additional_beam, + go_deeper, + } => { + if let Some(search_idx) = + self.stages.iter().position(|e| e.stage.name() == "search") + { + info!( + "Need more data, backtracking to search (beam +{}, deeper: {})", + additional_beam, go_deeper + ); + + let _ = tx + .send(RetrieveEvent::Backtracking { + from: stage_name.to_string(), + to: "search".to_string(), + reason: format!( + "NeedMoreData: beam +{}, deeper: {}", + additional_beam, go_deeper + ), + }) + .await; + + // Consult Pilot + if let Some(ref pilot) = self.pilot { + if pilot.config().guide_at_backtrack { + let visited: std::collections::HashSet<_> = ctx + .search_paths + .iter() + .flat_map(|p| p.nodes.iter().copied()) + .collect(); + let candidates: Vec<_> = + ctx.candidates.iter().map(|c| c.node_id).collect(); + + let state = SearchState::new( + &ctx.tree, + &ctx.query, + &[], + &candidates, + &visited, + ); + + match pilot.guide_backtrack(&state).await { + Some(guidance) => { + debug!( + "Pilot backtrack guidance: confidence={}, candidates={}", + guidance.confidence, + guidance.ranked_candidates.len() + ); + if guidance.has_candidates() { + ctx.candidates = guidance + .ranked_candidates + .iter() + .map(|rc| CandidateNode { + node_id: rc.node_id, + score: rc.score, + depth: 0, + is_leaf: false, + }) + .collect(); + } + } + None => { + debug!("Pilot provided no backtrack guidance"); + } + } + } + } + + if let Some(ref mut config) = ctx.search_config { + config.beam_width += additional_beam; + if go_deeper { + config.max_depth += 1; + } + } + + ctx.increment_backtrack(); + 
backtrack_count += 1; + + if let Some(target_group) = + self.find_group_for_stage(&groups, search_idx) + { + group_idx = target_group; + continue; + } + } + } + StageOutcome::Backtrack { + target_stage, + reason, + } => { + info!("Backtracking to {}: {}", target_stage, reason); + + let _ = tx + .send(RetrieveEvent::Backtracking { + from: stage_name.to_string(), + to: target_stage.clone(), + reason: reason.clone(), + }) + .await; + + if let Some(target_idx) = self + .stages + .iter() + .position(|e| e.stage.name() == target_stage) + { + if target_stage == "search" { + if let Some(ref pilot) = self.pilot { + if pilot.config().guide_at_backtrack { + let visited: std::collections::HashSet<_> = ctx + .search_paths + .iter() + .flat_map(|p| p.nodes.iter().copied()) + .collect(); + let candidates: Vec<_> = ctx + .candidates + .iter() + .map(|c| c.node_id) + .collect(); + + let state = SearchState::new( + &ctx.tree, + &ctx.query, + &[], + &candidates, + &visited, + ); + + if let Some(guidance) = + pilot.guide_backtrack(&state).await + { + debug!( + "Pilot backtrack guidance for explicit backtrack: confidence={}", + guidance.confidence + ); + if guidance.has_candidates() { + ctx.candidates = guidance + .ranked_candidates + .iter() + .map(|rc| CandidateNode { + node_id: rc.node_id, + score: rc.score, + depth: 0, + is_leaf: false, + }) + .collect(); + } + } + } + } + } + + ctx.increment_backtrack(); + backtrack_count += 1; + + if let Some(target_group) = + self.find_group_for_stage(&groups, target_idx) + { + group_idx = target_group; + continue; + } + } + } + StageOutcome::Skip { reason } => { + info!("Skipping remaining stages: {}", reason); + ctx.metrics.total_time_ms = + total_start.elapsed().as_millis() as u64; + let response = ctx.finalize(); + let _ = tx + .send(RetrieveEvent::Completed { response }) + .await; + return Ok(()); + } + } + } + Err(e) => { + ctx.end_stage(stage_name, false, Some(e.to_string())); + + if policy.allows_continuation() { + warn!( + "Stage {} failed but policy allows continuation: {}", + stage_name, e + ); + } else { + error!("Stage {} failed: {}", stage_name, e); + let _ = tx + .send(RetrieveEvent::Error { + message: e.to_string(), + }) + .await; + return Err(e); + } + } + } + } + + group_idx += 1; + } + + ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64; + info!( + "Streaming retrieval completed in {}ms ({} iterations, {} backtracks)", + ctx.metrics.total_time_ms, total_iterations, backtrack_count + ); + + let response = ctx.finalize(); + let _ = tx.send(RetrieveEvent::Completed { response }).await; + Ok(()) + } + /// Get list of stage names in execution order. 
pub fn stage_names(&self) -> Result<Vec<&str>> { let order = self.resolve_order()?; diff --git a/rust/src/retrieval/pipeline_retriever.rs b/rust/src/retrieval/pipeline_retriever.rs index 377c4747..e2faa499 100644 --- a/rust/src/retrieval/pipeline_retriever.rs +++ b/rust/src/retrieval/pipeline_retriever.rs @@ -13,6 +13,7 @@ use super::content::ContentAggregatorConfig; use super::pipeline::RetrievalOrchestrator; use super::retriever::{CostEstimate, Retriever, RetrieverError, RetrieverResult}; use super::stages::{AnalyzeStage, EvaluateStage, PlanStage, SearchStage}; +use super::stream::{RetrieveEvent, RetrieveEventReceiver}; use super::strategy::LlmStrategy; use super::types::{RetrieveOptions, RetrieveResponse}; use crate::document::DocumentTree; @@ -151,6 +152,27 @@ impl PipelineRetriever { fn options_to_retrieve_options(&self, options: &RetrieveOptions) -> RetrieveOptions { options.clone() } + + /// Execute streaming retrieval. + /// + /// Returns a channel receiver that yields [`RetrieveEvent`]s as the + /// pipeline progresses. The stream always terminates with either + /// `Completed` or `Error`. + /// + /// This is the streaming counterpart of [`retrieve`](Retriever::retrieve). + /// The non-streaming path is not affected. + pub fn retrieve_streaming( + &self, + tree: &DocumentTree, + query: &str, + options: &RetrieveOptions, + ) -> (tokio::task::JoinHandle<()>, RetrieveEventReceiver) { + let orchestrator = self.build_orchestrator(); + let tree_arc = Arc::new(tree.clone()); + let opts = self.options_to_retrieve_options(options); + + orchestrator.execute_streaming(tree_arc, query, opts) + } } #[async_trait] diff --git a/rust/src/retrieval/stages/analyze.rs b/rust/src/retrieval/stages/analyze.rs index 8dd875e6..1748d440 100644 --- a/rust/src/retrieval/stages/analyze.rs +++ b/rust/src/retrieval/stages/analyze.rs @@ -12,10 +12,11 @@ use async_trait::async_trait; use tracing::info; -use crate::document::{DocumentTree, TocView}; +use crate::document::{DocumentTree, NodeId, TocView}; use crate::retrieval::complexity::ComplexityDetector; use crate::retrieval::decompose::{DecompositionConfig, QueryDecomposer}; use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome}; +use crate::retrieval::types::{NavigationDecision, StageName}; use crate::llm::LlmClient; /// Analyze Stage - analyzes queries for retrieval planning. @@ -28,6 +29,56 @@ use crate::llm::LlmClient; /// /// # Example /// +/// Convert Chinese number string to integer (e.g. "三" → 3, "二十一" → 21). +fn chinese_num_to_int(s: &str) -> Option<usize> { + let chars: Vec<char> = s.chars().collect(); + if chars.is_empty() { + return None; + } + // If purely digits, parse directly + if chars.iter().all(|c| c.is_ascii_digit()) { + return s.parse().ok(); + } + let map = |c: char| -> usize { + match c { + '一' => 1, '二' => 2, '三' => 3, '四' => 4, '五' => 5, + '六' => 6, '七' => 7, '八' => 8, '九' => 9, '十' => 10, + '百' => 100, + _ => 0, + } + }; + // Simple two-pass: handle 十/百 as positional + let mut total: usize = 0; + let mut current: usize = 0; + for &c in &chars { + let v = map(c); + if v == 0 { + continue; + } + if v >= 10 { + // Positional multiplier + let base = if current == 0 { 1 } else { current }; + total += base * v; + current = 0; + } else { + current = v; + } + } + total += current; + if total > 0 { Some(total) } else { None } +} + +/// Analyze Stage - analyzes queries for retrieval planning. +/// +/// This stage: +/// 1. Detects query complexity (Simple/Medium/Complex) +/// 2. 
Extracts keywords for matching +/// 3. Matches target sections from ToC +/// 4. Extracts structural path hints (Section 3.2, 第3章, etc.) +/// 5. Decomposes complex queries into sub-queries (if enabled) +/// +/// # Example +/// /// ```rust,ignore /// let stage = AnalyzeStage::new() /// .with_toc_matching(true) @@ -144,6 +195,88 @@ impl AnalyzeStage { .collect() } + /// Extract structural path hints from the query. + /// + /// Recognizes patterns like: + /// - "第3章", "第2节", "第一章" (Chinese chapter/section) + /// - "Section 3.2", "section 4.1.2" (English section numbers) + /// - "Chapter 5", "chapter 10" (English chapter) + /// - "3.2.1", "2.1" (bare section numbers) + /// - "表3", "Table 5", "图2", "Figure 4" (table/figure references) + /// + /// Maps them to tree NodeIds via `find_by_structure()`. + fn extract_structure_hints(&self, query: &str, tree: &DocumentTree) -> Vec<(String, NodeId)> { + let mut hints = Vec::new(); + + // Chinese patterns: 第X章, 第X节, 第X部分 + for cap in regex::Regex::new(r"第([一二三四五六七八九十百\d]+)[章节部分]") + .unwrap() + .captures_iter(query) + { + let num = chinese_num_to_int(&cap[1]).unwrap_or(0); + if num > 0 { + if let Some(node_id) = tree.find_by_structure(&num.to_string()) { + hints.push((cap[0].to_string(), node_id)); + } + } + } + + // "Section X.Y.Z" or "section X.Y" + for cap in regex::Regex::new(r"(?i)section\s+(\d+(?:\.\d+)*)") + .unwrap() + .captures_iter(query) + { + if let Some(node_id) = tree.find_by_structure(&cap[1]) { + hints.push((cap[0].to_string(), node_id)); + } + } + + // "Chapter X" + for cap in regex::Regex::new(r"(?i)chapter\s+(\d+)") + .unwrap() + .captures_iter(query) + { + if let Some(node_id) = tree.find_by_structure(&cap[1]) { + hints.push((cap[0].to_string(), node_id)); + } + } + + // Bare section numbers: "3.2.1", "2.1" + // Use word boundary instead of lookbehind (Rust regex doesn't support lookaround) + for cap in regex::Regex::new(r"\b(\d+\.\d+(?:\.\d+)*)") + .unwrap() + .captures_iter(query) + { + if let Some(node_id) = tree.find_by_structure(&cap[1]) { + hints.push((cap[0].to_string(), node_id)); + } + } + + // Table/Figure references + for cap in regex::Regex::new(r"(?:表|(?i)table)\s*(\d+)") + .unwrap() + .captures_iter(query) + { + if let Some(node_id) = tree.find_by_structure(&format!("table {}", &cap[1])) { + hints.push((cap[0].to_string(), node_id)); + } + } + for cap in regex::Regex::new(r"(?:图|(?i)figure)\s*(\d+)") + .unwrap() + .captures_iter(query) + { + if let Some(node_id) = tree.find_by_structure(&format!("figure {}", &cap[1])) { + hints.push((cap[0].to_string(), node_id)); + } + } + + // Deduplicate by NodeId + let mut seen = std::collections::HashSet::new(); + hints.retain(|(_, nid)| seen.insert(*nid)); + + hints + } + /// Match target sections from ToC. fn match_toc_sections(&self, query: &str, tree: &DocumentTree) -> Vec<String> { if !self.enable_toc_matching { @@ -231,6 +364,16 @@ impl RetrievalStage for AnalyzeStage { info!("Target sections: {:?}", ctx.target_sections); } + // 3.5 Extract structural path hints + ctx.resolved_path_hints = self.extract_structure_hints(&ctx.query, &ctx.tree); + if !ctx.resolved_path_hints.is_empty() { + info!( + "Resolved {} structure hints: {:?}", + ctx.resolved_path_hints.len(), + ctx.resolved_path_hints.iter().map(|(s, _)| s).collect::<Vec<_>>() + ); + } + // 4. Decompose query if enabled and complex enough if self.enable_decomposition { if let Some(ref decomposer) = self.query_decomposer { @@ -269,6 +412,35 @@ impl RetrievalStage for AnalyzeStage { // 5. 
Update metrics ctx.metrics.llm_calls += 0; // No LLM calls in this stage + // 6. Record reasoning + let complexity_str = format!("{:?}", ctx.complexity.unwrap_or_default()); + let mut reasoning_parts = vec![ + format!("Query complexity: {}", complexity_str), + format!("Keywords: {:?}", ctx.keywords), + ]; + if !ctx.target_sections.is_empty() { + reasoning_parts.push(format!("Target sections: {:?}", ctx.target_sections)); + } + if !ctx.resolved_path_hints.is_empty() { + reasoning_parts.push(format!( + "Structure hints: {:?}", + ctx.resolved_path_hints.iter().map(|(s, _)| s).collect::<Vec<_>>() + )); + } + if let Some(ref decomp) = ctx.decomposition { + if decomp.was_decomposed { + reasoning_parts.push(format!( + "Decomposed into {} sub-queries", + decomp.sub_queries.len() + )); + } + } + ctx.record_reasoning( + StageName::Analyze, + reasoning_parts.join("; "), + NavigationDecision::ExploreMore, + ); + Ok(StageOutcome::cont()) } } diff --git a/rust/src/retrieval/stages/evaluate.rs b/rust/src/retrieval/stages/evaluate.rs index ad8858f2..11a95713 100644 --- a/rust/src/retrieval/stages/evaluate.rs +++ b/rust/src/retrieval/stages/evaluate.rs @@ -12,9 +12,9 @@ use tracing::{info, warn}; use crate::llm::LlmClient; use crate::retrieval::content::{ContentAggregator, ContentAggregatorConfig}; -use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome}; +use crate::retrieval::pipeline::{BudgetStatus, FailurePolicy, PipelineContext, RetrievalStage, StageOutcome}; use crate::retrieval::sufficiency::{LlmJudge, SufficiencyChecker, ThresholdChecker}; -use crate::retrieval::types::{RetrievalResult, RetrieveResponse, SufficiencyLevel}; +use crate::retrieval::types::{NavigationDecision, ReasoningChain, RetrievalResult, RetrieveResponse, StageName, SufficiencyLevel}; use crate::utils::estimate_tokens; /// Evaluate Stage - evaluates retrieval sufficiency. @@ -275,7 +275,7 @@ impl EvaluateStage { .map(|s| format!("{:?}", s)) .unwrap_or_else(|| "unknown".to_string()), complexity: ctx.complexity.unwrap_or_default(), - trace: ctx.navigation_trace.clone(), + reasoning_chain: ctx.reasoning_chain.clone(), tokens_used: ctx.token_count, } } @@ -345,7 +345,10 @@ impl RetrievalStage for EvaluateStage { info!("Aggregated {} tokens", tokens); - // 2. Check sufficiency + // 2. Report token consumption to budget controller + ctx.budget_controller.record_tokens(tokens); + + // 3. Check sufficiency ctx.sufficiency = self.check_sufficiency(ctx); info!("Sufficiency level: {:?}", ctx.sufficiency); @@ -353,6 +356,43 @@ impl RetrievalStage for EvaluateStage { ctx.metrics.evaluate_time_ms += start.elapsed().as_millis() as u64; ctx.metrics.tokens_used = tokens; + // 4. 
Check budget status for adaptive decision + let budget_status = ctx.budget_controller.status(); + let confidence = self.calculate_confidence(ctx); + + // If budget is exhausted, force completion regardless of sufficiency + if budget_status.should_stop() && ctx.search_iterations >= 1 { + info!( + "Budget exhausted ({}/{}), completing with current results", + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + ); + ctx.result = Some(self.build_response(ctx)); + ctx.record_reasoning( + StageName::Evaluate, + format!( + "Budget exhausted ({}/{}), forced completion; confidence={:.3}", + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + confidence, + ), + NavigationDecision::Skip, + ); + return Ok(StageOutcome::complete()); + } + + // 2.5 Record successful navigation paths to L2 cache + if confidence > 0.5 { + let doc_key = format!("{:?}", ctx.tree.root()); + for candidate in ctx.candidates.iter().take(3) { + if let Some(node) = ctx.tree.get(candidate.node_id) { + let path = format!("{}", node.depth); + // Use the node title as path identifier for L2 + ctx.reasoning_cache.l2_record(&doc_key, &node.title, candidate.score); + } + } + } + // 3. Decide next action based on sufficiency let outcome = match ctx.sufficiency { SufficiencyLevel::Sufficient => { @@ -396,6 +436,29 @@ impl RetrievalStage for EvaluateStage { ctx.metrics.llm_calls += 1; } + // Record evaluation reasoning with budget status + let sufficiency_str = format!("{:?}", ctx.sufficiency); + let decision = match ctx.sufficiency { + SufficiencyLevel::Sufficient => NavigationDecision::ThisIsTheAnswer, + SufficiencyLevel::PartialSufficient => NavigationDecision::ExploreMore, + SufficiencyLevel::Insufficient => NavigationDecision::ExploreMore, + }; + ctx.record_reasoning( + StageName::Evaluate, + format!( + "Sufficiency={}, confidence={:.3}, tokens={}, candidates={}, iteration={}, budget={:?} ({}/{})", + sufficiency_str, + self.calculate_confidence(ctx), + ctx.token_count, + ctx.candidates.len(), + ctx.search_iterations, + budget_status, + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + ), + decision, + ); + Ok(outcome) } } diff --git a/rust/src/retrieval/stages/plan.rs b/rust/src/retrieval/stages/plan.rs index 0b98003c..865f070f 100644 --- a/rust/src/retrieval/stages/plan.rs +++ b/rust/src/retrieval/stages/plan.rs @@ -15,9 +15,10 @@ use tracing::info; // DocumentTree is accessed via context use crate::llm::LlmClient; use crate::retrieval::pipeline::{ - FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, SearchConfig, StageOutcome, + BudgetStatus, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, SearchConfig, + StageOutcome, }; -use crate::retrieval::types::{QueryComplexity, StrategyPreference}; +use crate::retrieval::types::{NavigationDecision, QueryComplexity, StageName, StrategyPreference}; /// Plan Stage - plans the retrieval strategy. /// @@ -54,7 +55,7 @@ impl PlanStage { self } - /// Select retrieval strategy based on complexity and preferences. + /// Select retrieval strategy based on complexity, preferences, and budget. 
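For reference, the budget consulted here and in the planning code below is seeded from `RetrieveOptions::max_tokens` (see `RetrievalBudgetController::new(options.max_tokens)` earlier in this diff). A minimal sketch, assuming `max_tokens` is a public field like the other options shown in this diff and that the module path matches the file layout:

```rust
use vectorless::retrieval::types::RetrieveOptions;

fn budgeted_options() -> RetrieveOptions {
    let mut options = RetrieveOptions::default();
    // Caps total retrieval spend; PlanStage downgrades strategies and
    // SearchStage/EvaluateStage cut iterations short as this runs out.
    options.max_tokens = 2_000;
    options
}
```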
fn select_strategy(&self, ctx: &PipelineContext) -> StrategyPreference { // Respect explicit strategy preference if ctx.options.strategy != StrategyPreference::Auto { @@ -62,6 +63,13 @@ impl PlanStage { return ctx.options.strategy; } + // Budget-aware strategy selection + let budget_status = ctx.budget_controller.status(); + if budget_status.should_stop() { + info!("Budget exhausted, forcing Keyword strategy"); + return StrategyPreference::ForceKeyword; + } + // Auto-select based on complexity let complexity = ctx.complexity.unwrap_or(QueryComplexity::Medium); @@ -71,8 +79,10 @@ impl PlanStage { StrategyPreference::ForceKeyword } QueryComplexity::Medium => { - // Use semantic if available, otherwise keyword with LLM fallback - if self.llm_client.is_some() { + if budget_status == BudgetStatus::Constrained { + info!("Complexity is Medium but budget constrained, selecting Keyword strategy"); + StrategyPreference::ForceKeyword + } else if self.llm_client.is_some() { info!("Complexity is Medium, selecting LLM strategy"); StrategyPreference::ForceLlm } else { @@ -81,7 +91,14 @@ impl PlanStage { } } QueryComplexity::Complex => { - if self.llm_client.is_some() { + if budget_status == BudgetStatus::Constrained { + info!("Complexity is Complex but budget constrained, selecting Hybrid strategy"); + if self.llm_client.is_some() { + StrategyPreference::ForceHybrid + } else { + StrategyPreference::ForceKeyword + } + } else if self.llm_client.is_some() { info!("Complexity is Complex, selecting LLM strategy"); StrategyPreference::ForceLlm } else { @@ -177,6 +194,34 @@ impl RetrievalStage for PlanStage { .unwrap_or(0) ); + // Record reasoning + let strategy_str = ctx + .selected_strategy + .map(|s| format!("{:?}", s)) + .unwrap_or_else(|| "auto".to_string()); + let algorithm_str = ctx + .selected_algorithm + .map(|a| a.name().to_string()) + .unwrap_or_else(|| "unknown".to_string()); + let beam_width = ctx + .search_config + .as_ref() + .map(|c| c.beam_width) + .unwrap_or(3); + ctx.record_reasoning( + StageName::Plan, + format!( + "Selected strategy={}, algorithm={}, beam_width={}; budget: {}/{} ({:.0}%)", + strategy_str, + algorithm_str, + beam_width, + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + ctx.budget_controller.utilization() * 100.0 + ), + NavigationDecision::ExploreMore, + ); + Ok(StageOutcome::cont()) } } diff --git a/rust/src/retrieval/stages/search.rs b/rust/src/retrieval/stages/search.rs index 44a8de2c..929dad76 100644 --- a/rust/src/retrieval/stages/search.rs +++ b/rust/src/retrieval/stages/search.rs @@ -13,20 +13,24 @@ use std::sync::Arc; use tracing::{debug, info, warn}; use crate::document::DocumentTree; +use crate::document::ReasoningIndex; use crate::llm::LlmClient; use crate::retrieval::RetrievalContext; use crate::retrieval::pilot::Pilot; +use crate::retrieval::cache::CachedCandidate; use crate::retrieval::pipeline::{ - CandidateNode, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, StageOutcome, + BudgetStatus, CandidateNode, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, + StageOutcome, }; use crate::retrieval::search::{ BeamSearch, GreedySearch, SearchConfig as SearchAlgConfig, SearchCue, SearchTree, ToCNavigator, }; +use crate::retrieval::search::extract_keywords; use crate::retrieval::strategy::{ HybridConfig, HybridStrategy, KeywordStrategy, LlmStrategy, RetrievalStrategy, }; -use crate::retrieval::types::StrategyPreference; +use crate::retrieval::types::{NavigationDecision, ReasoningCandidate, ReasoningStep, 
StageName, StrategyPreference}; /// Search Stage - executes tree search with optional Pilot guidance. /// @@ -301,6 +305,115 @@ impl SearchStage { (all_paths, all_candidates) } + + /// Check if a query is asking for a document summary/overview. + fn is_summary_query(query: &str) -> bool { + let lower = query.to_lowercase(); + let patterns = [ + "what is this document", + "what is this about", + "summarize", + "summary", + "overview", + "give me an overview", + "describe this document", + "main topics", + "table of contents", + "这篇文档讲了什么", + "总结", + "概述", + "概要", + "主要内容", + "文档简介", + "介绍一下", + ]; + patterns.iter().any(|p| lower.contains(p)) + } + + /// Try to match the query against pre-computed reasoning index entries. + /// + /// Returns candidates if a high-confidence match is found, None otherwise. + fn try_reasoning_shortcut( + ridx: &ReasoningIndex, + ctx: &PipelineContext, + ) -> Option<Vec<CandidateNode>> { + // Check 1: Summary shortcut — handle "overview" style queries + if let Some(ref shortcut) = ridx.summary_shortcut() { + if Self::is_summary_query(&ctx.query) { + let mut candidates = vec![CandidateNode::new( + shortcut.root_node, + 1.0, + 0, + ctx.tree.is_leaf(shortcut.root_node), + )]; + for section in &shortcut.section_summaries { + candidates.push(CandidateNode::new( + section.node_id, + 0.9, + section.depth, + ctx.tree.is_leaf(section.node_id), + )); + } + return Some(candidates); + } + } + + // Check 2: Keyword → Topic path matching + let keywords = extract_keywords(&ctx.query); + if keywords.is_empty() { + return None; + } + + let mut scored_nodes: std::collections::HashMap<crate::document::NodeId, f32> = + std::collections::HashMap::new(); + for keyword in &keywords { + if let Some(entries) = ridx.topic_entries(keyword) { + for entry in entries { + let score = scored_nodes.entry(entry.node_id).or_insert(0.0); + *score += entry.weight; + } + } + } + + if scored_nodes.is_empty() { + return None; + } + + // Boost hot nodes by 20% + for (node_id, score) in scored_nodes.iter_mut() { + if ridx.is_hot(*node_id) { + *score *= 1.2; + } + } + + // Convert to candidates, only return if best match is high-confidence + let mut candidates: Vec<CandidateNode> = scored_nodes + .into_iter() + .filter_map(|(node_id, score)| { + let depth = ctx.tree.get(node_id).map(|n| n.depth)?; + Some(CandidateNode::new( + node_id, + score, + depth, + ctx.tree.is_leaf(node_id), + )) + }) + .collect(); + + candidates.sort_by(|a, b| { + b.score + .partial_cmp(&a.score) + .unwrap_or(std::cmp::Ordering::Equal) + }); + + // Only return shortcut results if we have a high-confidence match + let best_score = candidates.first().map(|c| c.score).unwrap_or(0.0); + if best_score > 0.5 { + Some(candidates) + } else { + None + } + } } #[async_trait] @@ -331,21 +444,91 @@ impl RetrievalStage for SearchStage { let algorithm = ctx.selected_algorithm.unwrap_or(SearchAlgorithm::Beam); let config = ctx.search_config.clone().unwrap_or_default(); + // Budget check: skip search iteration if exhausted + let budget_status = ctx.budget_controller.status(); + if budget_status.should_stop() && ctx.search_iterations > 0 { + info!( + "Budget exhausted ({}/{}), skipping search iteration", + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + ); + ctx.record_reasoning( + StageName::Search, + format!( + "Budget exhausted ({}/{}), returning current candidates", + ctx.budget_controller.consumed(), + ctx.budget_controller.total_budget(), + ), + NavigationDecision::Skip, + ); + return Ok(StageOutcome::complete()); 
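A self-contained illustration of the shortcut scoring above: per-node topic weights are summed, hot nodes get a 20% boost, and the shortcut only fires when the best candidate clears 0.5. The helper below is illustrative only, not part of the crate.

```rust
// Mirrors the scoring in try_reasoning_shortcut().
fn shortcut_score(topic_weights: &[f32], is_hot: bool) -> f32 {
    let base: f32 = topic_weights.iter().sum();
    if is_hot { base * 1.2 } else { base }
}

fn main() {
    // Matches two topics (0.30 + 0.25) and is hot: 0.55 * 1.2 = 0.66, so the shortcut fires.
    assert!(shortcut_score(&[0.30, 0.25], true) > 0.5);
    // Matches one topic (0.40) and is cold: 0.40, so we fall through to normal search.
    assert!(shortcut_score(&[0.40], false) < 0.5);
}
```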
+ } + // Reset Pilot state for new query if let Some(ref pilot) = self.pilot { pilot.reset(); debug!("SearchStage: Pilot is available, is_active={}", pilot.is_active()); } + // Apply budget-aware beam width adjustment + let effective_beam = ctx + .budget_controller + .suggested_beam_width(config.beam_width, ctx.search_iterations); + info!( - "Executing search: algorithm={:?}, beam_width={}, pilot={}", + "Executing search: algorithm={:?}, beam_width={} (budget: {:?}), pilot={}", algorithm, - config.beam_width, + effective_beam, + budget_status, if self.has_pilot() { "enabled" } else { "disabled" } ); ctx.increment_search_iteration(); + // === L1 Cache check: return cached results if available === + if ctx.options.enable_cache && ctx.search_iterations <= 1 { + let scope_fp = crate::utils::fingerprint::Fingerprint::from_str( + &format!("{:?}", ctx.tree.root()), + ); + if let Some(cached) = ctx.reasoning_cache.l1_get(&ctx.query, &scope_fp) { + info!("L1 cache hit for query, returning {} cached candidates", cached.len()); + ctx.candidates = cached + .into_iter() + .map(|c| CandidateNode::new(c.node_id, c.score, c.depth, ctx.tree.is_leaf(c.node_id))) + .collect(); + ctx.metrics.cache_hits += 1; + ctx.record_reasoning( + StageName::Search, + format!( + "L1 cache hit: {} candidates returned from cache", + ctx.candidates.len() + ), + NavigationDecision::ThisIsTheAnswer, + ); + return Ok(StageOutcome::cont()); + } + ctx.metrics.cache_misses += 1; + } + + // === Reasoning Index Quick Match === + // Check pre-computed index before running expensive ToC navigation. + if let Some(ref ridx) = ctx.reasoning_index { + if let Some(shortcut_candidates) = Self::try_reasoning_shortcut(ridx, ctx) { + info!( + "Reasoning index shortcut match, returning {} candidates", + shortcut_candidates.len() + ); + ctx.candidates = shortcut_candidates; + ctx.metrics.cache_hits += 1; + ctx.record_reasoning( + StageName::Search, + "Reasoning index shortcut: direct path match".to_string(), + NavigationDecision::ThisIsTheAnswer, + ); + return Ok(StageOutcome::cont()); + } + } + // === Phase Locate: find relevant subtrees via ToC === // Use depth-1 nodes (root's direct children = top-level sections). // level(0) is only the root itself, which is not useful for locating. 
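The `ReasoningCache` API used above is not defined in this diff. The sketch below mirrors the first-iteration L1 lookup as it might appear in a custom stage; it assumes the context fields (`query`, `tree`, `reasoning_cache`, `candidates`) and the `cache`/`fingerprint` modules are publicly reachable, which this diff does not confirm.

```rust
use vectorless::retrieval::cache::CachedCandidate;
use vectorless::retrieval::pipeline::{CandidateNode, PipelineContext};
use vectorless::utils::fingerprint::Fingerprint;

/// Replay cached candidates for (query, document scope) if the L1 cache has them.
fn l1_replay(ctx: &mut PipelineContext) -> bool {
    // The scope fingerprint keys the entry to this document, so the same
    // question against a different tree cannot produce a stale hit.
    let scope_fp = Fingerprint::from_str(&format!("{:?}", ctx.tree.root()));
    match ctx.reasoning_cache.l1_get(&ctx.query, &scope_fp) {
        Some(cached) => {
            ctx.candidates = cached
                .into_iter()
                .map(|c: CachedCandidate| {
                    CandidateNode::new(c.node_id, c.score, c.depth, ctx.tree.is_leaf(c.node_id))
                })
                .collect();
            true
        }
        None => false,
    }
}
```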
@@ -356,13 +539,26 @@ impl RetrievalStage for SearchStage { .map(|nodes| nodes.to_vec()) .unwrap_or_else(|| ctx.tree.children(ctx.tree.root())); - let cues = self + let mut cues = self .toc_navigator .locate(&ctx.query, &ctx.tree, &top_level_nodes) .await; debug!("ToCNavigator returned {} cues", cues.len()); + // Inject structure hints from Analyze stage as high-priority cues + if !ctx.resolved_path_hints.is_empty() { + for (hint_text, node_id) in &ctx.resolved_path_hints { + if ctx.tree.get(*node_id).is_some() { + info!("Injecting structure hint '{}' as search cue", hint_text); + cues.push(SearchCue { + root: *node_id, + confidence: 1.0, // Direct match from query structure + }); + } + } + } + // === Resolve queries (decomposed or original) === let queries = Self::resolve_queries(ctx); @@ -407,16 +603,119 @@ impl RetrievalStage for SearchStage { } } - // Update metrics + // Update metrics and budget ctx.metrics.search_time_ms += start.elapsed().as_millis() as u64; ctx.metrics.nodes_visited += ctx.candidates.len(); + // Update hot node tracker with retrieval results + if let Some(ref tracker) = ctx.hot_tracker { + let hits: Vec<(crate::document::NodeId, f32)> = ctx + .candidates + .iter() + .map(|c| (c.node_id, c.score)) + .collect(); + tracker.record_hits(&hits); + } + // Estimate tokens consumed by this search iteration (content-based heuristic) + let search_tokens: usize = ctx + .candidates + .iter() + .filter_map(|c| ctx.tree.get(c.node_id).map(|n| n.content.len())) + .sum::<usize>() + / 4; // rough: 4 chars ≈ 1 token + ctx.budget_controller.record_tokens(search_tokens); + + // Store results in L1 cache + if ctx.options.enable_cache && ctx.search_iterations <= 1 && !ctx.candidates.is_empty() { + let scope_fp = crate::utils::fingerprint::Fingerprint::from_str( + &format!("{:?}", ctx.tree.root()), + ); + let cached: Vec<CachedCandidate> = ctx + .candidates + .iter() + .map(|c| CachedCandidate { + node_id: c.node_id, + score: c.score, + depth: c.depth, + }) + .collect(); + ctx.reasoning_cache.l1_store( + &ctx.query, + scope_fp, + cached, + ctx.selected_strategy + .map(|s| format!("{:?}", s)) + .unwrap_or_else(|| "auto".to_string()), + ); + } + info!( "Search complete: {} candidates (iteration {})", ctx.candidates.len(), ctx.search_iterations ); + // Record reasoning — collect data first to avoid borrow conflicts + let strategy_str = ctx + .selected_strategy + .map(|s| format!("{:?}", s)) + .unwrap_or_else(|| "auto".to_string()); + let search_iterations = ctx.search_iterations; + + let reasoning_data: Vec<(String, Option<String>, f32, usize, String, Vec<ReasoningCandidate>)> = ctx + .candidates + .iter() + .take(5) + .map(|candidate| { + let (title, depth) = ctx + .tree + .get(candidate.node_id) + .map(|n| (n.title.clone(), n.depth)) + .unwrap_or_else(|| ("(unknown)".to_string(), 0)); + + let considered: Vec<ReasoningCandidate> = ctx + .candidates + .iter() + .filter(|c| c.node_id != candidate.node_id) + .take(5) + .filter_map(|c| { + ctx.tree.get(c.node_id).map(|n| ReasoningCandidate { + node_id: format!("{:?}", c.node_id), + title: n.title.clone(), + score: c.score, + }) + }) + .collect(); + + let reasoning = format!( + "Candidate '{}' (score={:.3}) found via {} search, iteration {}", + title, candidate.score, algorithm.name(), search_iterations + ); + + (format!("{:?}", candidate.node_id), Some(title), candidate.score, depth, reasoning, considered) + }) + .collect(); + + for (node_id, title, score, depth, reasoning, considered) in reasoning_data { + ctx.push_reasoning_step(ReasoningStep 
{ + stage: StageName::Search, + node_id: Some(node_id), + title, + score, + decision: if score > 0.7 { + NavigationDecision::ThisIsTheAnswer + } else { + NavigationDecision::ExploreMore + }, + depth, + reasoning, + candidates: considered, + strategy_used: Some(strategy_str.clone()), + llm_call: None, + references_followed: Vec::new(), + }); + } + Ok(StageOutcome::cont()) } } diff --git a/rust/src/retrieval/strategy/cross_document.rs b/rust/src/retrieval/strategy/cross_document.rs index d451f5c7..fe43f775 100644 --- a/rust/src/retrieval/strategy/cross_document.rs +++ b/rust/src/retrieval/strategy/cross_document.rs @@ -8,9 +8,10 @@ use async_trait::async_trait; use std::collections::HashMap; +use std::sync::Arc; use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities}; -use crate::document::{DocumentTree, NodeId}; +use crate::document::{DocumentGraph, DocumentTree, NodeId}; use crate::retrieval::types::{NavigationDecision, QueryComplexity}; use crate::retrieval::RetrievalContext; @@ -61,6 +62,8 @@ pub enum MergeStrategy { BestPerDocument, /// Weight results by document relevance score. WeightedByRelevance, + /// Use graph connectivity to boost connected documents. + GraphBoosted, } /// Configuration for cross-document retrieval. @@ -122,6 +125,8 @@ pub struct CrossDocumentStrategy { config: CrossDocumentConfig, /// Documents to search. documents: Vec<DocumentEntry>, + /// Optional document graph for graph-aware ranking. + graph: Option<Arc<DocumentGraph>>, } impl CrossDocumentStrategy { @@ -131,6 +136,7 @@ impl CrossDocumentStrategy { inner, config: CrossDocumentConfig::default(), documents: Vec::new(), + graph: None, } } @@ -158,6 +164,59 @@ impl CrossDocumentStrategy { self.documents.len() } + /// Set the document graph for graph-aware ranking. + pub fn with_graph(mut self, graph: Arc<DocumentGraph>) -> Self { + self.graph = Some(graph); + self + } + + /// Apply graph-based score boosting to merged results. + /// + /// For each high-confidence result (score > 0.5), find its graph neighbors + /// and boost their scores by `boost_factor * edge_weight`. + fn apply_graph_boost( + &self, + results: &mut Vec<(DocumentId, NodeId, NodeEvaluation)>, + boost_factor: f32, + ) { + let graph = match self.graph { + Some(ref g) => g, + None => return, + }; + + // Collect doc_ids with high scores + let high_score_docs: Vec<(String, f32)> = results + .iter() + .filter(|(_, _, eval)| eval.score > 0.5) + .map(|(doc_id, _, eval)| (doc_id.clone(), eval.score)) + .collect(); + + if high_score_docs.is_empty() { + return; + } + + // For each high-score doc, boost its graph neighbors + for (doc_id, base_score) in &high_score_docs { + let neighbors = graph.get_neighbors(doc_id); + for edge in neighbors { + // Find results from the neighbor doc and boost them + for result in results.iter_mut() { + if result.0 == edge.target_doc_id { + let boost = boost_factor * edge.weight * base_score; + result.2.score += boost; + } + } + } + } + + // Re-sort by score after boosting + results.sort_by(|a, b| { + b.2.score + .partial_cmp(&a.2.score) + .unwrap_or(std::cmp::Ordering::Equal) + }); + } + /// Search a single document and return results. 
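A numeric illustration of the boost applied by `apply_graph_boost`: each result from a graph neighbor gains `boost_factor * edge_weight * base_score`, where `base_score` is the high-confidence result that pointed at the neighbor (the `GraphBoosted` merge arm below passes a 0.15 factor). The helper is illustrative only.

```rust
// Illustrative: the additive boost a neighboring document's result receives.
fn graph_boost(boost_factor: f32, edge_weight: f32, base_score: f32) -> f32 {
    boost_factor * edge_weight * base_score
}

fn main() {
    // With the 0.15 factor, a 0.8-weight edge from a 0.9-scored document
    // adds 0.108 to each of the neighbor's result scores before re-sorting.
    let boost = graph_boost(0.15, 0.8, 0.9);
    assert!((boost - 0.108).abs() < 1e-6);
}
```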
async fn search_document( &self, @@ -251,6 +310,26 @@ impl CrossDocumentStrategy { all_results.truncate(self.config.max_total_results); all_results } + + MergeStrategy::GraphBoosted => { + // First do TopK merge + let mut all_results: Vec<_> = doc_results + .into_iter() + .flat_map(|doc| { + doc.evaluations.into_iter().map(move |(node_id, eval)| { + (doc.doc_id.clone(), node_id, eval) + }) + }) + .collect(); + + all_results.sort_by(|a, b| b.2.score.partial_cmp(&a.2.score).unwrap_or(std::cmp::Ordering::Equal)); + + // Apply graph-based boosting + self.apply_graph_boost(&mut all_results, 0.15); + + all_results.truncate(self.config.max_total_results); + all_results + } } } } diff --git a/rust/src/retrieval/stream.rs b/rust/src/retrieval/stream.rs new file mode 100644 index 00000000..33aa75b7 --- /dev/null +++ b/rust/src/retrieval/stream.rs @@ -0,0 +1,128 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Streaming retrieval events. +//! +//! When `RetrieveOptions::streaming` is enabled, retrieval emits +//! [`RetrieveEvent`]s incrementally as the pipeline progresses through +//! its stages (Analyze → Plan → Search → Evaluate). +//! +//! # Example +//! +//! ```rust,ignore +//! let options = RetrieveOptions::new().with_streaming(true); +//! let rx = client.query_stream(&tree, "query", &options).await?; +//! +//! while let Some(event) = rx.recv().await { +//! match event { +//! RetrieveEvent::Started { query, .. } => println!("Started: {query}"), +//! RetrieveEvent::StageCompleted { stage, .. } => println!("Done: {stage}"), +//! RetrieveEvent::Completed { response } => { +//! println!("Confidence: {}", response.confidence); +//! break; +//! } +//! RetrieveEvent::Error { message } => { +//! eprintln!("Error: {message}"); +//! break; +//! } +//! _ => {} +//! } +//! } +//! ``` + +use tokio::sync::mpsc; + +use super::types::{RetrieveResponse, SufficiencyLevel}; + +/// Events emitted during streaming retrieval. +/// +/// Each event represents a meaningful milestone in the retrieval pipeline. +/// The stream always terminates with either [`Completed`](RetrieveEvent::Completed) +/// or [`Error`](RetrieveEvent::Error). +#[derive(Debug, Clone)] +pub enum RetrieveEvent { + /// Retrieval pipeline started. + Started { + /// The query string. + query: String, + /// Planned retrieval strategy name. + strategy: String, + }, + + /// A pipeline stage completed. + StageCompleted { + /// Stage name (analyze, plan, search, evaluate). + stage: String, + /// Time spent in this stage (ms). + elapsed_ms: u64, + }, + + /// A node was visited during tree traversal. + NodeVisited { + /// Node ID. + node_id: String, + /// Node title. + title: String, + /// Relevance score (0.0 - 1.0). + score: f32, + }, + + /// Relevant content was found. + ContentFound { + /// Node ID. + node_id: String, + /// Node title. + title: String, + /// Short preview of the content. + preview: String, + /// Relevance score. + score: f32, + }, + + /// Pipeline is backtracking to an earlier stage. + Backtracking { + /// Stage backtracking from. + from: String, + /// Stage backtracking to. + to: String, + /// Reason for backtracking. + reason: String, + }, + + /// Sufficiency check result. + SufficiencyCheck { + /// Sufficiency level. + level: SufficiencyLevel, + /// Total tokens collected so far. + tokens: usize, + }, + + /// Retrieval completed successfully with final results. + Completed { + /// The full retrieval response. + response: RetrieveResponse, + }, + + /// An error occurred during retrieval. 
+ Error { + /// Error message. + message: String, + }, +} + +/// Sender half for streaming retrieval events. +pub(crate) type RetrieveEventSender = mpsc::Sender<RetrieveEvent>; + +/// Receiver half for streaming retrieval events. +pub type RetrieveEventReceiver = mpsc::Receiver<RetrieveEvent>; + +/// Create a bounded channel for streaming retrieval events. +/// +/// The bound defaults to 64 events. The sender will apply backpressure +/// when the receiver cannot keep up, preventing unbounded memory growth. +pub(crate) fn channel(bound: usize) -> (RetrieveEventSender, RetrieveEventReceiver) { + mpsc::channel(bound) +} + +/// Default channel bound for streaming events. +pub const DEFAULT_STREAM_BOUND: usize = 64; diff --git a/rust/src/retrieval/types.rs b/rust/src/retrieval/types.rs index 3057b7dc..fa1b7e1c 100644 --- a/rust/src/retrieval/types.rs +++ b/rust/src/retrieval/types.rs @@ -118,6 +118,13 @@ pub struct RetrieveOptions { /// Whether to use async context building for large documents. pub use_async_context: bool, + + /// Enable streaming retrieval results. + /// + /// When enabled, use `query_stream()` to receive incremental + /// `RetrieveEvent`s as each pipeline stage completes. When disabled + /// (default), the standard `query()` returns a single final result. + pub streaming: bool, } impl Default for RetrieveOptions { @@ -136,6 +143,7 @@ impl Default for RetrieveOptions { pruning_strategy: super::PruningStrategy::default(), token_estimation: super::TokenEstimation::default(), use_async_context: false, + streaming: false, } } } @@ -237,6 +245,13 @@ impl RetrieveOptions { self.use_async_context = enable; self } + + /// Enable streaming retrieval results. + #[must_use] + pub fn with_streaming(mut self, enable: bool) -> Self { + self.streaming = enable; + self + } } /// A single retrieval result. @@ -343,8 +358,8 @@ pub struct RetrieveResponse { /// Detected query complexity. pub complexity: QueryComplexity, - /// Search trace for debugging. - pub trace: Vec<NavigationStep>, + /// Reasoning chain explaining how results were found. + pub reasoning_chain: ReasoningChain, /// Total tokens used. pub tokens_used: usize, @@ -359,7 +374,7 @@ impl Default for RetrieveResponse { is_sufficient: false, strategy_used: String::new(), complexity: QueryComplexity::Medium, - trace: Vec::new(), + reasoning_chain: ReasoningChain::default(), tokens_used: 0, } } @@ -420,6 +435,135 @@ pub enum NavigationDecision { Skip, } +/// Pipeline stage name for reasoning chain provenance. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)] +pub enum StageName { + /// Query analysis stage. + Analyze, + /// Strategy planning stage. + Plan, + /// Tree search stage. + Search, + /// Sufficiency evaluation stage. + Evaluate, +} + +impl std::fmt::Display for StageName { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + Self::Analyze => write!(f, "analyze"), + Self::Plan => write!(f, "plan"), + Self::Search => write!(f, "search"), + Self::Evaluate => write!(f, "evaluate"), + } + } +} + +/// Summary of an LLM call made during a reasoning step. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct LlmCallSummary { + /// Truncated prompt summary for display. + pub prompt_summary: String, + /// Tokens consumed by this call. + pub tokens_used: usize, + /// Model identifier. + pub model: String, +} + +/// A candidate node considered but not selected during reasoning. 
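End to end, the new `streaming` flag and the event types compose as sketched below. The re-export paths (`vectorless::retrieval::{PipelineRetriever, RetrieveEvent, RetrieveOptions}` and `vectorless::document::DocumentTree`) are assumptions, since this diff only shows in-crate imports.

```rust
use vectorless::document::DocumentTree;
use vectorless::retrieval::{PipelineRetriever, RetrieveEvent, RetrieveOptions};

async fn stream_query(retriever: &PipelineRetriever, tree: &DocumentTree) {
    let options = RetrieveOptions::default().with_streaming(true);
    let (handle, mut rx) =
        retriever.retrieve_streaming(tree, "What is the total revenue?", &options);

    while let Some(event) = rx.recv().await {
        match event {
            RetrieveEvent::Started { query, strategy } => println!("start: {query} ({strategy})"),
            RetrieveEvent::StageCompleted { stage, elapsed_ms } => println!("{stage}: {elapsed_ms}ms"),
            RetrieveEvent::Backtracking { from, to, reason } => println!("backtrack {from} -> {to}: {reason}"),
            RetrieveEvent::Completed { response } => {
                println!("{}", response.reasoning_chain.summary());
                break;
            }
            RetrieveEvent::Error { message } => {
                eprintln!("retrieval failed: {message}");
                break;
            }
            _ => {} // NodeVisited / ContentFound / SufficiencyCheck
        }
    }
    let _ = handle.await;
}
```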
+#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ReasoningCandidate { + /// Node ID. + pub node_id: String, + /// Node title. + pub title: String, + /// Relevance score of this candidate. + pub score: f32, +} + +/// A single step in the reasoning chain. +/// +/// Unlike `NavigationStep` which only records "where" the search went, +/// `ReasoningStep` also records "why" — the decision rationale, +/// candidates considered, strategy used, and any LLM calls made. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ReasoningStep { + /// Which pipeline stage produced this step. + pub stage: StageName, + /// Node ID visited (if applicable). + pub node_id: Option<String>, + /// Node title (if applicable). + pub title: Option<String>, + /// Relevance score at this step. + pub score: f32, + /// Decision made at this step. + pub decision: NavigationDecision, + /// Depth in tree. + pub depth: usize, + /// Human-readable explanation of why this decision was made. + pub reasoning: String, + /// Candidates considered but not selected at this step. + pub candidates: Vec<ReasoningCandidate>, + /// Strategy used at this step (e.g. "keyword", "hybrid"). + pub strategy_used: Option<String>, + /// LLM call summary, if an LLM was consulted. + pub llm_call: Option<LlmCallSummary>, + /// Reference identifiers followed from this step (cross-reference tracking). + pub references_followed: Vec<String>, +} + +/// Complete reasoning chain for a retrieval operation. +/// +/// Provides an ordered, auditable trace of every decision the engine made +/// from query analysis through final evaluation. This is the core +/// differentiator — not just results, but *why* these results. +#[derive(Debug, Clone, Default, Serialize, Deserialize)] +pub struct ReasoningChain { + /// Ordered reasoning steps. + pub steps: Vec<ReasoningStep>, +} + +impl ReasoningChain { + /// Create an empty reasoning chain. + #[must_use] + pub fn new() -> Self { + Self::default() + } + + /// Append a reasoning step. + pub fn push(&mut self, step: ReasoningStep) { + self.steps.push(step); + } + + /// Number of reasoning steps. + #[must_use] + pub fn len(&self) -> usize { + self.steps.len() + } + + /// Whether the chain is empty. + #[must_use] + pub fn is_empty(&self) -> bool { + self.steps.is_empty() + } + + /// Build a human-readable summary of the full chain. + #[must_use] + pub fn summary(&self) -> String { + self.steps + .iter() + .map(|s| { + let node_info = s + .title + .as_deref() + .unwrap_or("(no node)"); + format!("[{}] {} (score={:.2}): {}", s.stage, node_info, s.score, s.reasoning) + }) + .collect::<Vec<_>>() + .join("\n") + } +} + /// Search path for multi-path algorithms. #[derive(Debug, Clone)] pub struct SearchPath { diff --git a/rust/src/storage/persistence.rs b/rust/src/storage/persistence.rs index 3e0e8281..2e6c1f91 100644 --- a/rust/src/storage/persistence.rs +++ b/rust/src/storage/persistence.rs @@ -16,7 +16,7 @@ use std::io::{BufReader, BufWriter, Read, Write}; use std::path::{Path, PathBuf}; use crate::Error; -use crate::document::DocumentTree; +use crate::document::{DocumentTree, ReasoningIndex}; use crate::error::Result; /// Current format version for persisted documents. @@ -191,6 +191,10 @@ pub struct PersistedDocument { /// Per-page content (for PDFs). #[serde(default)] pub pages: Vec<PageContent>, + + /// Pre-computed reasoning index for retrieval acceleration. 
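A small sketch of both ends of the chain: a hypothetical custom stage recording a step via the `record_reasoning` helper added earlier in this diff, and a caller rendering the audit trail from the response. Module paths are assumed from the file layout.

```rust
use vectorless::retrieval::pipeline::PipelineContext;
use vectorless::retrieval::types::{NavigationDecision, RetrieveResponse, StageName};

// Inside a custom stage: attach "why", not just "where".
fn note_decision(ctx: &mut PipelineContext) {
    ctx.record_reasoning(
        StageName::Plan,
        "Hypothetical stage chose a narrower beam to stay inside the budget",
        NavigationDecision::ExploreMore,
    );
}

// After retrieval: walk the chain step by step.
fn explain(response: &RetrieveResponse) {
    println!("{} reasoning steps", response.reasoning_chain.len());
    for step in &response.reasoning_chain.steps {
        let node = step.title.as_deref().unwrap_or("(no node)");
        println!("[{}] {} -> {:?}: {}", step.stage, node, step.decision, step.reasoning);
    }
}
```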
+ #[serde(default, skip_serializing_if = "Option::is_none")] + pub reasoning_index: Option<ReasoningIndex>, } impl PersistedDocument { @@ -200,6 +204,7 @@ impl PersistedDocument { meta, tree, pages: Vec::new(), + reasoning_index: None, } } diff --git a/rust/src/storage/workspace.rs b/rust/src/storage/workspace.rs index a0ee5cb9..c2192cfa 100644 --- a/rust/src/storage/workspace.rs +++ b/rust/src/storage/workspace.rs @@ -110,6 +110,8 @@ struct WorkspaceInner { meta_index: HashMap<String, DocumentMetaEntry>, /// LRU cache for loaded documents. cache: DocumentCache, + /// Cross-document relationship graph (cached). + document_graph: Option<crate::document::DocumentGraph>, } /// An async workspace for managing indexed documents. @@ -148,6 +150,7 @@ impl Workspace { root: None, meta_index: HashMap::new(), cache: DocumentCache::with_capacity(options.cache_size), + document_graph: None, }; Self::load_meta_index(&mut inner)?; @@ -184,6 +187,7 @@ impl Workspace { root: Some(root), meta_index: HashMap::new(), cache: DocumentCache::with_capacity(options.cache_size), + document_graph: None, }; Self::load_meta_index(&mut inner)?; @@ -254,6 +258,10 @@ impl Workspace { let _ = inner.cache.remove(&doc_id); info!("Saved document {} to async workspace", doc_id); + + // Invalidate document graph since documents changed + inner.document_graph = None; + Ok(()) } @@ -354,6 +362,10 @@ impl Workspace { Self::save_meta_index(&inner)?; info!("Removed document {} from async workspace", id); + + // Invalidate document graph since documents changed + inner.document_graph = None; + Ok(true) } @@ -395,6 +407,60 @@ impl Workspace { Ok(()) } + // ========================================================================= + // Document Graph Methods + // ========================================================================= + + /// Storage key for the document graph. + const GRAPH_KEY: &'static str = "_graph"; + + /// Get the document graph, loading from backend if not cached. + pub async fn get_graph(&self) -> Result<Option<crate::document::DocumentGraph>> { + // Check cache first + { + let inner = self.inner.read().await; + if inner.document_graph.is_some() { + return Ok(inner.document_graph.clone()); + } + } + + // Load from backend + let inner = self.inner.read().await; + match inner.backend.get(Self::GRAPH_KEY)? { + Some(bytes) => { + let graph: crate::document::DocumentGraph = + serde_json::from_slice(&bytes).map_err(|e| { + crate::Error::Serialization(format!("Failed to deserialize graph: {}", e)) + })?; + debug!("Loaded document graph from backend"); + Ok(Some(graph)) + } + None => Ok(None), + } + } + + /// Persist the document graph to the backend. + pub async fn set_graph(&self, graph: &crate::document::DocumentGraph) -> Result<()> { + let mut inner = self.inner.write().await; + let bytes = serde_json::to_vec(graph).map_err(|e| { + crate::Error::Serialization(format!("Failed to serialize graph: {}", e)) + })?; + inner.backend.put(Self::GRAPH_KEY, &bytes)?; + inner.document_graph = Some(graph.clone()); + info!("Persisted document graph ({} nodes, {} edges)", graph.node_count(), graph.edge_count()); + Ok(()) + } + + /// Invalidate the cached document graph (e.g. after add/remove). + pub async fn invalidate_graph(&self) -> Result<()> { + let mut inner = self.inner.write().await; + inner.document_graph = None; + // Also remove from backend so stale graphs don't persist + let _ = inner.backend.delete(Self::GRAPH_KEY); + debug!("Invalidated document graph cache"); + Ok(()) + } + /// Get the storage key for a document. 
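A usage sketch for the new graph accessors. The `Workspace` path, its `Result` alias, and how the `DocumentGraph` gets built in the first place are assumptions outside this diff; only `set_graph`/`get_graph`/`invalidate_graph`, `node_count`, and `edge_count` are taken from it.

```rust
use vectorless::document::DocumentGraph;
use vectorless::storage::Workspace;

async fn refresh_graph(ws: &Workspace, graph: DocumentGraph) -> vectorless::error::Result<()> {
    // Persist the rebuilt graph; it stays cached in memory until the next
    // document add/remove (or an explicit invalidate_graph()) clears it.
    ws.set_graph(&graph).await?;

    if let Some(cached) = ws.get_graph().await? {
        println!("graph: {} documents, {} links", cached.node_count(), cached.edge_count());
    }
    Ok(())
}
```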
fn doc_key(id: &str) -> String { format!("doc:{}", id)