diff --git a/docs/design/opt.md b/docs/design/opt.md
new file mode 100644
index 00000000..0840e197
--- /dev/null
+++ b/docs/design/opt.md
@@ -0,0 +1,414 @@
+# Phase 2: Performance Optimization Design
+
+## Overview
+
+This document outlines the performance optimization strategies for vectorless v0.3.0, targeting millisecond-level response times. The optimizations are prioritized based on infrastructure readiness and expected impact.
+
+## Priority Order
+
+| Priority | Task | Status | Estimated Effort |
+|----------|------|--------|------------------|
+| 1 | Cache Strategy Optimization | **Ready** | 1 day |
+| 2 | Incremental Indexing Optimization | **Ready** | 1 day |
+| 3 | Parallel Retrieval Optimization | Needs baseline | 2 days |
+| 4 | Memory Footprint Optimization | Needs evaluation | 2 days |
+
+---
+
+## 1. Cache Strategy Optimization
+
+### Current State
+
+The `MemoStore` is now integrated with `LlmPilot` for caching navigation decisions. However, cache hit rates can be improved through smarter caching strategies.
+
+### Problem Statement
+
+- Cache keys are based on exact content fingerprints
+- Similar queries with slightly different phrasing cause cache misses
+- No semantic similarity matching
+- Cache warming is manual
+
+### Proposed Improvements
+
+#### 1.1 Semantic Cache Keys
+
+Instead of exact fingerprint matching, use semantic similarity for cache lookups:
+
+```
+Current:  query_fp == cached_query_fp → hit
+Proposed: similarity(query_embedding, cached_embedding) > threshold → hit
+```
+
+**Approach:**
+- Pre-compute embeddings for cached queries
+- Use cosine similarity or dot product for matching
+- Threshold: 0.85+ similarity for cache hit
+- Store top-k similar queries for approximate matching
+
+**Benefits:**
+- Higher hit rate for semantically equivalent queries
+- Reduced LLM calls for similar user questions
+
+#### 1.2 Cache Warming
+
+Pre-populate cache with common query patterns:
+
+**Approach:**
+- Analyze historical query logs
+- Identify top-N most frequent query patterns
+- Pre-compute and cache Pilot decisions for common document structures
+- Support configurable warm-up on engine startup
+
+**Configuration:**
+```toml
+[memo]
+warmup_enabled = true
+warmup_top_queries = 100
+warmup_on_startup = true
+```
+
+#### 1.3 Adaptive TTL
+
+Adjust TTL based on content stability:
+
+**Approach:**
+- Static content (documentation): longer TTL (30 days)
+- Dynamic content (news, logs): shorter TTL (1 day)
+- Track content change frequency per document
+- Adjust TTL dynamically based on change history
+
+#### 1.4 Multi-Level Caching
+
+Implement hierarchical caching:
+
+```
+L1: In-memory LRU (current MemoStore) - microseconds
+L2: Local disk (persisted cache) - milliseconds
+L3: Redis (distributed cache) - milliseconds
+```
+
+**Use Cases:**
+- L1: Single-session hot data
+- L2: Cross-session persistence
+- L3: Multi-instance sharing
+
+### Metrics to Track
+
+| Metric | Current | Target |
+|--------|---------|--------|
+| Hit rate (repeated queries) | ~50% | **90%+** |
+| Hit rate (similar queries) | 0% | **60%+** |
+| Cache lookup latency | <1µs | <1µs |
+| Memory per entry | ~500 bytes | ~300 bytes |
+
+---
+
+## 2. Incremental Indexing Optimization
+
+### Current State
+
+The fingerprint system (`NodeFingerprint`) is implemented and can detect subtree-level changes. However, the indexer still reprocesses entire documents on updates.
+
+### Problem Statement
+
+- Full document reprocessing on any change
+- No partial tree updates
+- Wasted LLM calls for unchanged sections
+
+### Proposed Improvements
+
+#### 2.1 Subtree-Level Updates
+
+Only reprocess changed subtrees:
+
+**Approach:**
+1. Load existing document tree and fingerprints
+2. Parse new document, compute new fingerprints
+3. Compare `NodeFingerprint` at each level
+4. Only reprocess nodes where `content_changed() == true`
+5. Propagate `subtree_fp` changes upward
+
+**Detection Logic:**
+```
+if node_fp.content_changed():
+    → Regenerate summary for this node
+if node_fp.only_descendants_changed():
+    → Skip this node, process children only
+if node_fp.subtree_changed():
+    → Update ancestor subtree fingerprints
+```
+
+#### 2.2 Lazy Summary Regeneration
+
+Defer summary regeneration until needed:
+
+**Approach:**
+- Mark nodes with `summary_stale = true` on content change
+- Regenerate summaries lazily on first query access
+- Use MemoStore to cache regenerated summaries
+- Track staleness in `DocumentChangeInfo`
+
+**Benefits:**
+- Fast document updates (no immediate LLM calls)
+- Spread LLM cost over time
+- Better user experience for large documents
+
+#### 2.3 Batch Processing
+
+Process multiple changed documents efficiently:
+
+**Approach:**
+- Collect changed documents into batches
+- Group similar content types together
+- Use single LLM call for multiple summaries (where token budget allows)
+- Implement priority queue for urgent documents
+
+#### 2.4 Change Propagation
+
+Optimize how changes propagate through the tree:
+
+**Approach:**
+- Use bottom-up propagation for fingerprint updates
+- Only update ancestors of changed nodes
+- Implement efficient diff algorithm (Myers or patience diff)
+- Cache intermediate results during propagation
+
+### Metrics to Track
+
+| Metric | Current | Target |
+|--------|---------|--------|
+| Full reindex time (100KB doc) | ~5s | **<1s** |
+| Incremental update (1 section) | ~5s (full) | **<100ms** |
+| LLM calls per update | 10-50 | **1-5** |
+| Memory during update | 2x doc size | **1.2x** |
+
+---
+
+## 3. Parallel Retrieval Optimization
+
+### Current State
+
+Retrieval is primarily sequential through the pipeline stages.
+
+### Problem Statement
+
+- Sequential stage execution
+- No parallel candidate evaluation
+- Underutilized multi-core CPUs
+
+### Prerequisites
+
+- [ ] Establish performance baseline with benchmarks
+- [ ] Profile hot paths
+- [ ] Identify parallelizable operations
+
+### Proposed Improvements
+
+#### 3.1 Parallel Stage Execution
+
+Execute independent pipeline stages concurrently:
+
+**Approach:**
+- `AnalyzeStage` and initial `PlanStage` can run in parallel
+- Fork-join pattern for search branches
+- Use `tokio::join!` for concurrent stage execution
+
+**Parallelization Points:**
+```
+┌─────────────┐
+│   Analyze   │────┐
+└─────────────┘    │
+                   ├──▶ ┌─────────────┐ ──▶ ┌─────────────┐
+┌─────────────┐    │    │   Search    │     │  Evaluate   │
+│    Plan     │────┘    │  (parallel) │     │             │
+└─────────────┘         └─────────────┘     └─────────────┘
+```
+
+#### 3.2 Parallel Candidate Evaluation
+
+Evaluate multiple search candidates simultaneously:
+
+**Approach:**
+- Use `futures::stream` for concurrent evaluation
+- Limit concurrency with semaphore
+- Collect results with timeout
+- Merge and rank results
+
+**Concurrency Control:**
+- Max concurrent evaluations: 4-8 (configurable)
+- Per-evaluation timeout: 500ms
+- Early termination on high-confidence result
+
+#### 3.3 Parallel Tree Traversal
+
+Traverse document tree branches in parallel:
+
+**Approach:**
+- Spawn tasks for each top-level branch
+- Use work-stealing for load balancing
+- Aggregate results with structured concurrency
+
+### Metrics to Track
+
+| Metric | Current | Target |
+|--------|---------|--------|
+| P50 retrieval latency | ~200ms | **<50ms** |
+| P99 retrieval latency | ~1s | **<200ms** |
+| CPU utilization | ~30% | **70%+** |
+| Throughput (queries/sec) | ~5 | **20+** |
+
+---
+
+## 4. Memory Footprint Optimization
+
+### Current State
+
+Memory usage scales linearly with document size and cache capacity.
+
+### Problem Statement
+
+- Large documents (10MB+) can use 50MB+ memory
+- Cache entries hold full strings
+- No memory pressure handling
+
+### Prerequisites
+
+- [ ] Complete other Phase 2 optimizations
+- [ ] Profile memory usage patterns
+- [ ] Identify memory hot spots
+
+### Proposed Improvements
+
+#### 4.1 String Interning
+
+Deduplicate common strings:
+
+**Approach:**
+- Use `string_interner` crate for titles, common phrases
+- Intern node titles during parsing
+- Store indices instead of full strings in hot paths
+
+**Expected Savings:**
+- 20-40% reduction in string memory
+- Faster string comparisons
+
+#### 4.2 Compressed Cache Entries
+
+Compress cached values:
+
+**Approach:**
+- Use `zstd` or `lz4` for cache value compression
+- Compress summaries and reasoning strings
+- Decompress on cache hit
+
+**Trade-offs:**
+- Extra CPU for compression/decompression
+- Significant memory savings for text-heavy caches
+
+#### 4.3 Memory-Mapped Large Documents
+
+Use mmap for large document content:
+
+**Approach:**
+- Store large documents as memory-mapped files
+- Only load accessed sections into memory
+- OS handles paging automatically
+
+**Threshold:**
+- Documents > 1MB: use mmap
+- Documents < 1MB: load entirely
+
+#### 4.4 Cache Eviction Under Pressure
+
+Respond to memory pressure:
+
+**Approach:**
+- Monitor system memory usage
+- Implement adaptive cache sizing
+- Aggressive eviction when memory > 80% used
+- Use `jemalloc` with background threads
+
+### Metrics to Track
+
+| Metric | Current | Target |
+|--------|---------|--------|
+| Memory per 1MB document | ~5MB | **<2MB** |
+| Peak memory (10 docs) | ~500MB | **<200MB** |
+| Cache memory efficiency | ~60% | **80%+** |
+| GC pause time | N/A | **<10ms** |
+
+---
+
+## Implementation Timeline
+
+```
+Week 1:
+├── Day 1-2: Cache Strategy Optimization
+│   ├── Semantic cache keys
+│   └── Adaptive TTL
+├── Day 3-4: Incremental Indexing
+│   ├── Subtree-level updates
+│   └── Lazy summary regeneration
+└── Day 5: Integration testing
+
+Week 2:
+├── Day 1-2: Performance Baseline
+│   ├── Benchmark suite setup
+│   └── Profiling infrastructure
+├── Day 3-4: Parallel Retrieval
+│   ├── Parallel stages
+│   └── Concurrent evaluation
+└── Day 5: Memory profiling
+
+Week 3:
+├── Day 1-2: Memory Optimization
+│   ├── String interning
+│   └── Compressed cache
+├── Day 3-4: Final tuning
+│   └── Integration testing
+└── Day 5: Documentation & release prep
+```
+
+## Success Criteria
+
+### Must Have (v0.3.0)
+
+- [ ] 90%+ cache hit rate for repeated queries
+- [ ] <1s incremental update time
+- [ ] <100ms P50 retrieval latency
+
+### Should Have
+
+- [ ] 60%+ cache hit rate for similar queries
+- [ ] 70%+ CPU utilization during retrieval
+- [ ] <200MB memory for 10 documents
+
+### Nice to Have
+
+- [ ] Multi-level caching (L1/L2/L3)
+- [ ] Memory-mapped document storage
+- [ ] Distributed cache support
+
+## Dependencies
+
+| Optimization | Requires |
+|-------------|----------|
+| Semantic cache keys | Embedding model (local or API) |
+| Parallel retrieval | `tokio` profiling tools |
+| Memory optimization | Memory profiler (`dhall` or `bytehound`) |
+
+## Risks
+
+| Risk | Mitigation |
+|------|------------|
+| Semantic cache adds latency | Use local embedding model (all-MiniLM) |
+| Parallel execution complexity | Extensive testing, structured concurrency |
+| Memory optimization regressions | Benchmark before/after each change |
+| Cache coherence issues | Clear invalidation strategy, versioning |
+
+## References
+
+- [MemoStore Design](./memo.md)
+- [Fingerprint System](./fingerprint.md)
+- [Incremental Indexing](./incremental.md)
+- [Pilot Architecture](./pilot.md)
diff --git a/examples/memo_cache.rs b/examples/memo_cache.rs
new file mode 100644
index 00000000..d4655189
--- /dev/null
+++ b/examples/memo_cache.rs
@@ -0,0 +1,264 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! MemoStore verification example.
+//!
+//! This example demonstrates the LLM memoization system working in a real scenario,
+//! showing cache hits/misses and cost savings.
+//!
+//! # Usage
+//!
+//! ```bash
+//! cargo run --example memo_cache
+//! ```
+//!
+//! # Environment
+//!
+//! Set OPENAI_API_KEY or ANTHROPIC_API_KEY for full functionality.
+//! The example will still run without API keys (using fallback mode).
+
+use chrono::Duration;
+use vectorless::memo::{MemoKey, MemoOpType, MemoStore, MemoValue};
+
+fn print_separator(title: &str) {
+    println!("\n{}", "=".repeat(60));
+    println!("  {}", title);
+    println!("{}", "=".repeat(60));
+}
+
+fn main() -> vectorless::Result<()> {
+    println!("=== MemoStore Verification Example ===\n");
+
+    // ============================================================
+    // Part 1: Basic MemoStore Operations
+    // ============================================================
+    print_separator("Part 1: Basic Operations");
+
+    let store = MemoStore::new()
+        .with_ttl(Duration::days(7))
+        .with_model("gpt-4o")
+        .with_version(1);
+
+    println!("Created MemoStore with:");
+    println!("  - TTL: 7 days");
+    println!("  - Model: gpt-4o");
+    println!("  - Version: 1");
+
+    // Create a summary cache key
+    let content = "This is a long document about machine learning...";
+    let content_fp = vectorless::utils::fingerprint::Fingerprint::from_str(content);
+    let key = MemoKey::summary(&content_fp).with_model("gpt-4o").with_version(1);
+
+    println!("\nCache key created:");
+    println!("  - Op type: {:?}", key.op_type);
+    println!("  - Input FP: {}", key.input_fp);
+
+    // Check cache (should miss)
+    println!("\nChecking cache (first time)...");
+    let cached = store.get(&key);
+    println!("  Cache hit: {}", cached.is_some());
+
+    // Store a value
+    println!("\nStoring summary...");
+    let summary = "Machine learning is a subset of AI that enables systems to learn from data.";
+    store.put_with_tokens(key.clone(), MemoValue::Summary(summary.to_string()), 500);
+    println!("  Stored: \"{}\"", summary);
+    println!("  Tokens saved estimate: 500");
+
+    // Check cache again (should hit)
+    println!("\nChecking cache (second time)...");
+    let cached = store.get(&key);
+    println!("  Cache hit: {}", cached.is_some());
+    if let Some(value) = cached {
+        println!("  Value: \"{}\"", value.as_summary().unwrap_or("(not a summary)"));
+    }
+
+    // ============================================================
+    // Part 2: Statistics Tracking
+    // ============================================================
+    print_separator("Part 2: Statistics Tracking");
+
+    // Create a new store for this demo
+    let store = MemoStore::with_capacity(100)
+        .with_model("gpt-4o-mini");
+
+    println!("Simulating cache usage...\n");
+
+    // Simulate 10 operations
+    let operations = [
+        ("doc1", "Content about Rust programming"),
+        ("doc2", "Introduction to machine learning"),
+        ("doc1", "Content about Rust programming"), // Repeat - should hit
+        ("doc3", "Deep learning fundamentals"),
+        ("doc2", "Introduction to machine learning"), // Repeat - should hit
+        ("doc1", "Content about Rust programming"), // Repeat - should hit
+        ("doc4", "Natural language processing"),
+        ("doc3", "Deep learning fundamentals"), // Repeat - should hit
+        ("doc5", "Computer vision basics"),
+        ("doc2", "Introduction to machine learning"), // Repeat - should hit
+    ];
+
+    let mut hits = 0u64;
+    let mut misses = 0u64;
+
+    for (i, (doc_id, content)) in operations.iter().enumerate() {
+        let content_fp = vectorless::utils::fingerprint::Fingerprint::from_str(content);
+        let key = MemoKey::summary(&content_fp);
+
+        if let Some(_value) = store.get(&key) {
+            hits += 1;
+            println!("  [{:2}] {} - CACHE HIT", i + 1, doc_id);
+        } else {
+            misses += 1;
+            println!("  [{:2}] {} - cache miss (storing...)", i + 1, doc_id);
+            store.put_with_tokens(key, MemoValue::Summary(format!("Summary of {}", content)), 100);
+        }
+    }
+
+    println!("\nStatistics:");
+    println!("  - Hits: {}", hits);
+    println!("  - Misses: {}", misses);
+    println!("  - Hit rate: {:.1}%", (hits as f64 / (hits + misses) as f64) * 100.0);
+
+    // ============================================================
+    // Part 3: Cache Invalidation
+    // ============================================================
+    print_separator("Part 3: Cache Invalidation");
+
+    let store = MemoStore::new().with_model("gpt-4o");
+
+    // Store different operation types
+    let fp1 = vectorless::utils::fingerprint::Fingerprint::from_str("content1");
+    let fp2 = vectorless::utils::fingerprint::Fingerprint::from_str("content2");
+
+    store.put(MemoKey::summary(&fp1), MemoValue::Summary("Summary 1".to_string()));
+    store.put(MemoKey::summary(&fp2), MemoValue::Summary("Summary 2".to_string()));
+    store.put(
+        MemoKey::pilot_decision(&fp1, &fp2),
+        MemoValue::PilotDecision(vectorless::memo::PilotDecisionValue {
+            selected_idx: 0,
+            confidence: 0.9,
+            reasoning: "Test decision".to_string(),
+        }),
+    );
+
+    println!("Stored 3 entries:");
+    println!("  - 2 Summary entries");
+    println!("  - 1 PilotDecision entry");
+    println!("  - Total: {} entries", store.len());
+
+    // Invalidate by operation type
+    println!("\nInvalidating all Summary entries...");
+    let removed = store.invalidate_by_op_type(MemoOpType::Summary);
+    println!("  Removed: {} entries", removed);
+    println!("  Remaining: {} entries", store.len());
+
+    // ============================================================
+    // Part 4: Persistence
+    // ============================================================
+    print_separator("Part 4: Persistence");
+
+    let temp_dir = tempfile::TempDir::new().expect("Failed to create temp dir");
+    let cache_path = temp_dir.path().join("memo_cache.json");
+
+    println!("Cache path: {:?}", cache_path);
+
+    // Create and populate store
+    let store = MemoStore::new().with_model("gpt-4o");
+
+    for i in 0..5 {
+        let content = format!("Document content {}", i);
+        let fp = vectorless::utils::fingerprint::Fingerprint::from_str(&content);
+        store.put(
+            MemoKey::summary(&fp),
+            MemoValue::Summary(format!("Summary {}", i)),
+        );
+    }
+    println!("Created store with {} entries", store.len());
+
+    // Note: save/load are async, skip for this sync example
+    println!("\n(Async save/load skipped in sync example)");
+    println!("Use store.save(&path).await and store.load(&path).await in async context");
+
+    // ============================================================
+    // Part 5: Real-World Scenario Simulation
+    // ============================================================
+    print_separator("Part 5: Real-World Scenario");
+
+    println!("Simulating a document query session...\n");
+
+    let store = MemoStore::new()
+        .with_ttl(Duration::hours(24))
+        .with_model("gpt-4o-mini");
+
+    // Simulate multiple queries to the same document
+    let document_content = r#"
+        # Vectorless Documentation
+
+        Vectorless is a hierarchical, reasoning-native document intelligence engine.
+        It provides tree-based document understanding without vector databases.
+
+        ## Features
+        - Multi-format parsing (Markdown, PDF, DOCX)
+        - LLM-powered summarization
+        - Adaptive retrieval strategies
+    "#;
+
+    let doc_fp = vectorless::utils::fingerprint::Fingerprint::from_str(document_content);
+
+    // Simulate query context fingerprints
+    let queries = [
+        ("What is Vectorless?", 0.85),
+        ("How does it work?", 0.72),
+        ("What formats are supported?", 0.91),
+        ("What is Vectorless?", 0.85),  // Repeat
+        ("How does it work?", 0.72),    // Repeat
+    ];
+
+    println!("Processing {} queries...\n", queries.len());
+
+    for (i, (query, confidence)) in queries.iter().enumerate() {
+        let query_fp = vectorless::utils::fingerprint::Fingerprint::from_str(query);
+        let key = MemoKey::pilot_decision(&doc_fp, &query_fp);
+
+        if let Some(_value) = store.get(&key) {
+            println!("  [{:2}] \"{}\" - CACHED (confidence: {:.2})", i + 1, query, confidence);
+        } else {
+            println!("  [{:2}] \"{}\" - Computing... (confidence: {:.2})", i + 1, query, confidence);
+            store.put_with_tokens(
+                key,
+                MemoValue::PilotDecision(vectorless::memo::PilotDecisionValue {
+                    selected_idx: 0,
+                    confidence: *confidence as f32,
+                    reasoning: format!("Reasoning for: {}", query),
+                }),
+                150, // ~150 tokens per pilot decision
+            );
+        }
+    }
+
+    // Final statistics
+    // Note: get() updates entry-level hits, but global stats are only
+    // updated by get_or_compute(). For accurate global stats, use get_or_compute.
+    println!("\n=== Final Statistics ===");
+    println!("  Cache entries: {}", store.len());
+    println!("\nNote: Global stats (hits/misses/tokens_saved) are tracked by");
+    println!("get_or_compute(), not by direct get() calls. For accurate tracking,");
+    println!("use get_or_compute() in production code.");
+
+    // Cost estimation (based on manual tracking above)
+    let manual_hits = 2u64; // Queries 4 and 5 were cache hits
+    let tokens_per_decision = 150u64;
+    let tokens_saved = manual_hits * tokens_per_decision;
+    let cost_per_1k_tokens = 0.0015; // GPT-4o-mini input
+    let saved_cost = (tokens_saved as f64 / 1000.0) * cost_per_1k_tokens;
+    println!("\n  Manual calculation:");
+    println!("    Cache hits: {}", manual_hits);
+    println!("    Tokens saved: {}", tokens_saved);
+    println!("    Estimated cost saved: ${:.4}", saved_cost);
+
+    println!("\n=== Verification Complete ===");
+    println!("MemoStore is working correctly!");
+
+    Ok(())
+}
diff --git a/src/index/incremental/detector.rs b/src/index/incremental/detector.rs
index 748d0f2b..73c018b2 100644
--- a/src/index/incremental/detector.rs
+++ b/src/index/incremental/detector.rs
@@ -232,18 +232,12 @@ impl ChangeDetector {
         current_mtime > *recorded_mtime
     }
 
-    /// Check if content needs reindexing based on simple hash.
+    /// Check if content needs reindexing based on fingerprint.
     pub fn needs_reindex_by_hash(&self, doc_id: &str, content: &str) -> bool {
-        let current_hash = Self::hash_content(content);
+        let current_fp = Fingerprint::from_str(content);
 
         match self.content_fps.get(doc_id) {
-            Some(recorded_fp) => {
-                // Compare first 8 bytes of fingerprint to hash
-                let recorded_hash = u64::from_le_bytes(
-                    recorded_fp.as_bytes()[..8].try_into().unwrap_or([0u8; 8]),
-                );
-                recorded_hash != current_hash
-            }
+            Some(recorded_fp) => recorded_fp != &current_fp,
             None => true,
         }
     }