Merged

Dev #33

251 changes: 33 additions & 218 deletions README.md
@@ -1,15 +1,20 @@
<div align="center">

<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/logo-horizontal.svg" alt="Vectorless" width="400">

<h1>Document intelligence engine for AI</h1>
<div align="center">
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/lovable-vectorless.png" alt="Vectorless" width="100" style="vertical-align:middle;">
&nbsp;
<span style="font-size:48px; font-weight:800; vertical-align:middle; color:#AF788B;">
Vectorless
</span>
</div>

<h1>Reasoning-native Document Intelligence Engine</h1>

[![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/)
[![Python](https://img.shields.io/pypi/pyversions/vectorless.svg)](https://pypi.org/project/vectorless/)
[![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless)
[![Crates.io](https://img.shields.io/crates/v/vectorless.svg)](https://crates.io/crates/vectorless)
[![Crates.io Downloads](https://img.shields.io/crates/d/vectorless.svg)](https://crates.io/crates/vectorless)
[![Crates.io Downloads](https://img.shields.io/crates/v/vectorless.svg)](https://crates.io/crates/vectorless)
[![Docs](https://docs.rs/vectorless/badge.svg)](https://docs.rs/vectorless)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org/)
@@ -18,102 +23,40 @@

**Vectorless** is an ultra-performant reasoning-native document intelligence engine for AI, with the core written in Rust. It transforms documents into rich semantic trees and uses LLMs to intelligently traverse the hierarchy — retrieving the most relevant content through structural reasoning and deep contextual understanding.

⭐ Drop a star to help us grow!

## How It Works

<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/how-it-works.svg" alt="How it works">

### 1. Index: Build a Navigable Tree

```
Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│ ├── 2.1 System Design
│ └── 2.2 Implementation
└── Chapter 3: API Reference
```

Each node gets an AI-generated summary, enabling fast navigation.
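
As a rough illustration, the indexed tree can be pictured as a small recursive data structure. This is a sketch only: the `Node` class, its fields, the `outline` helper, and the placeholder summaries are assumptions made for this example, not Vectorless's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a section title, a summary, and child sections."""
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Return one indented line per node, visiting the tree depth-first."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Placeholder summaries; in a real index these would be generated by an LLM.
manual = Node("Technical Manual (root)", "Overview of the whole manual", [
    Node("Chapter 1: Introduction", "Purpose and scope"),
    Node("Chapter 2: Architecture", "How the system is built", [
        Node("2.1 System Design", "Components and data flow"),
        Node("2.2 Implementation", "Implementation details"),
    ]),
    Node("Chapter 3: API Reference", "Endpoints and parameters"),
])

print("\n".join(outline(manual)))
```

Because every node carries its own summary, a navigator can decide which branch to follow by reading a few short summaries instead of the full text of each section.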

### 2. Query: Navigate with LLM

When you ask "How do I reset the device?":

1. **Analyze** — Understand query intent and complexity
2. **Navigate** — LLM guides tree traversal
3. **Retrieve** — Return the exact section with context
4. **Verify** — Check if more information is needed
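
The Navigate and Retrieve steps above can be sketched as a greedy descent through the tree. This is a simplified illustration under stated assumptions: `Section`, `tokenize`, `overlap_score`, and `navigate` are hypothetical names, and a plain word-overlap count stands in for the LLM's judgment at each level. The real engine presumably does something considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical indexed section: title, summary, and subsections."""
    title: str
    summary: str
    children: list["Section"] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip("?.,!:").lower() for w in text.split()} - {""}


def overlap_score(query: str, section: Section) -> int:
    """Stand-in for the LLM: count query words that appear in title or summary."""
    return len(tokenize(query) & tokenize(f"{section.title} {section.summary}"))


def navigate(root: Section, query: str) -> list[str]:
    """Greedy descent: follow the best-scoring child at each level until a leaf."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap_score(query, child))
        path.append(node.title)
    return path


manual = Section("Technical Manual", "Complete device manual", [
    Section("Chapter 1: Introduction", "Purpose and scope of the manual"),
    Section("Chapter 4: Maintenance", "Cleaning, reset, and troubleshooting the device", [
        Section("4.1 Cleaning", "How to clean the housing"),
        Section("4.2 Reset Procedure", "How to reset the device to factory settings"),
    ]),
])

print(" > ".join(navigate(manual, "How do I reset the device?")))
# → Technical Manual > Chapter 4: Maintenance > 4.2 Reset Procedure
```

The returned path doubles as a trace of the navigation, which is what makes the result attributable to a specific section.
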
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/positioning.svg" alt="Vectorless" width="550">

## Traditional RAG vs Vectorless

<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/comparison.svg" alt="Traditional RAG vs Vectorless">

| Aspect | Traditional RAG | Vectorless |
|--------|----------------|------------|
| **Infrastructure** | Vector DB + Embedding Model | Just an LLM API |
| **Document Structure** | Lost in chunking | Preserved |
| **Context** | Fragment only | Section + surrounding context |
| **Setup Time** | Hours to Days | Minutes |
| **Best For** | Unstructured text | Structured documents |

## Example

**Input:**
```
Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"
```
## Quick Start

**Output:**
```
Answer: "To reset the device, hold the power button for 10 seconds
until the LED flashes blue, then release..."
### Install

Source: Chapter 4 > Section 4.2 > Reset Procedure
```bash
pip install vectorless
```

## When to Use

✅ **Good fit:**
- Technical documentation
- Manuals and guides
- Structured reports
- Policy documents
- Any document with clear hierarchy

❌ **Not ideal:**
- Unstructured text (tweets, chat logs)
- Very short documents (< 1 page)
- Pure Q&A datasets without structure

## Quick Start

<details open>
<summary><b>Python</b></summary>
### Set your API key

```bash
pip install vectorless
export OPENAI_API_KEY="sk-..."
```

### Index and Query

```python
from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
# Create engine with a workspace directory
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)
# Index a document (PDF, Markdown, DOCX, HTML)
doc_id = engine.index(IndexContext.from_file("./report.pdf"))

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
print(result.content)
print(f"Score: {result.score}")
```

</details>

<details>
<summary><b>Rust</b></summary>

@@ -122,158 +65,30 @@ print(f"Answer: {result.content}")
vectorless = "0.1"
```

```bash
cp vectorless.example.toml ./vectorless.toml
```

```rust
use vectorless::Engine;
use vectorless::client::{Engine, EngineBuilder, IndexContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()?;

    let doc_id = client.index("./document.pdf").await?;
    let engine = EngineBuilder::new()
        .with_workspace("./data")
        .build()
        .await?;

    let result = client.query(&doc_id,
        "What are the system requirements?").await?;
    // Index
    let doc_id = engine.index(IndexContext::from_path("./report.pdf")).await?;

    // Query
    let result = engine.query(&doc_id, "What is the total revenue?").await?;
    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);

    Ok(())
}
```

</details>

## Features

| Feature | Description |
|---------|-------------|
| **Zero Infrastructure** | No vector DB, no embedding model — just an LLM API |
| **Multi-format Support** | PDF, Markdown, DOCX, HTML out of the box |
| **Incremental Updates** | Add/remove documents without full re-index |
| **Traceable Results** | See the exact navigation path taken |
| **Feedback Learning** | Improves from user feedback over time |
| **Multi-turn Queries** | Handles complex questions with decomposition |

## Configuration

### Zero Configuration (Recommended)

Just set `OPENAI_API_KEY` and you're ready to go:

```bash
export OPENAI_API_KEY="sk-..."
```

<details>
<summary><b>Python</b></summary>

```python
from vectorless import Engine

# Uses OPENAI_API_KEY from environment
engine = Engine(workspace="./data")
```

</details>

<details>
<summary><b>Rust</b></summary>

```rust
use vectorless::Engine;

let client = Engine::builder()
    .with_workspace("./workspace")
    .build().await?;
```

</details>

### Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | LLM API key |
| `VECTORLESS_MODEL` | Default model (e.g., `gpt-4o-mini`) |
| `VECTORLESS_ENDPOINT` | API endpoint URL |
| `VECTORLESS_WORKSPACE` | Workspace directory |

### Advanced Configuration

For fine-grained control, use a config file:

```bash
cp config.toml ./vectorless.toml
```

<details>
<summary><b>Python</b></summary>

```python
from vectorless import Engine

# Use full configuration file
engine = Engine(config_path="./vectorless.toml")

# Or override specific settings
engine = Engine(
    config_path="./vectorless.toml",
    model="gpt-4o",  # Override model from config
)
```

</details>

<details>
<summary><b>Rust</b></summary>

```rust
use vectorless::Engine;

// Use full configuration file
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .build().await?;

// Or override specific settings
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .with_model("gpt-4o", None) // Override model
    .build().await?;
```

</details>

### Configuration Priority

Sources later in this list override earlier ones:

1. Default configuration
2. Auto-detected config file (`vectorless.toml`, `config.toml`, `.vectorless.toml`)
3. Explicit config file (`config_path` / `with_config_path`)
4. Environment variables
5. Constructor/builder parameters (highest priority)
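
The priority order above amounts to a layered merge in which each later source overwrites keys set by earlier ones. The following is a minimal sketch, not the library's implementation: `resolve_config`, the `DEFAULTS` values, and the use of plain dictionaries in place of parsed TOML files are assumptions for illustration. Only the environment variable names come from the table above.

```python
import os

# Hypothetical built-in defaults (lowest priority).
DEFAULTS = {"model": "gpt-4o-mini", "workspace": "./data"}

# Environment variables from the table above, mapped to assumed config keys.
ENV_KEYS = {
    "VECTORLESS_MODEL": "model",
    "VECTORLESS_ENDPOINT": "endpoint",
    "VECTORLESS_WORKSPACE": "workspace",
}


def resolve_config(auto_file=None, explicit_file=None, env=None, overrides=None):
    """Merge config layers in priority order; later layers win on conflicts."""
    env = os.environ if env is None else env
    env_layer = {key: env[var] for var, key in ENV_KEYS.items() if var in env}
    merged = {}
    for layer in (DEFAULTS, auto_file or {}, explicit_file or {}, env_layer, overrides or {}):
        merged.update(layer)
    return merged


config = resolve_config(
    explicit_file={"model": "gpt-4o", "endpoint": "https://llm.example.com/v1"},
    env={"VECTORLESS_MODEL": "gpt-4o-mini", "VECTORLESS_WORKSPACE": "./workspace"},
    overrides={"model": "gpt-4o"},
)
print(config)
```

Here the constructor override decides `model`, the environment decides `workspace`, and `endpoint` survives from the explicit file because no higher-priority layer sets it.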

## Architecture

<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/architecture.svg" alt="Architecture">

### Core Components

- **Index Pipeline** — Parses documents, builds tree, generates summaries
- **Retrieval Pipeline** — Analyzes query, navigates tree, returns results
- **Pilot** — LLM-powered navigator that guides retrieval decisions
- **Metrics Hub** — Unified observability for LLM calls, retrieval, and feedback

## Examples

See the [examples/](examples/) directory for more usage patterns.
See [examples/](examples/) for more Rust patterns: streaming, document graph, custom pilot, cross-document retrieval, and more.

## Contributing

54 changes: 0 additions & 54 deletions docs/README.md

This file was deleted.
