140 changes: 102 additions & 38 deletions README.md
@@ -1,8 +1,8 @@
<div align="center">

<img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400" style="vertical-align:middle;">
<img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400">

<h1>Reasoning-native Document Intelligence Engine</h1>
<h1>Document Engine for AI</h1>

[![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/)
[![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless)
@@ -13,45 +13,27 @@

</div>

**Vectorless** is a reasoning-native document intelligence engine written in Rust — **no vector database, no embeddings, no similarity search**. It transforms documents into hierarchical semantic trees and uses LLMs to navigate the structure, retrieving the most relevant content through deep contextual understanding instead of vector math.
**Vectorless** is a reasoning-native document engine, with its core written in Rust, designed as the foundational layer for AI applications that need structured access to documents. It does not use vector databases, embeddings, or similarity search. Instead, it transforms documents into hierarchical semantic trees and uses the LLM itself to navigate and retrieve, so the pipeline is LLM-guided from indexing to querying.

---

## Quick Start

### Install
## Why Vectorless

```bash
pip install vectorless
```

### Index and Query

```python
import asyncio
from vectorless import Engine, IndexContext, QueryContext
Most document retrieval solutions rely on vector similarity — splitting documents into chunks, embedding them, and searching by cosine distance. This works for rough topic matching, but breaks down when you need **precision**: specific numbers, cross-section references, or multi-step reasoning across a document.

async def main():
    # Create engine — api_key and model are required
    engine = Engine(
        api_key="sk-...",
        model="gpt-4o",
    )
Vectorless takes a different approach. No vectors at all. It builds a **semantic tree index** of each document — preserving the original hierarchy — and uses the LLM itself to navigate that structure. The LLM generates the tree during indexing and reasons through it during retrieval. Pure LLM guidance, end to end.

    # Index a document (PDF or Markdown)
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id
<div align="center">
<img src="https://vectorless.dev/img/workflow.svg" alt="Vectorless Workflow" width="720">
</div>

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)
<div align="center">
<img src="https://vectorless.dev/img/demo.gif" alt="Vectorless Demo" width="720">
</div>

asyncio.run(main())
```
## Quick Start

<details>
<summary><b>Rust</b></summary>
### Rust

```toml
[dependencies]
@@ -69,24 +51,106 @@ async fn main() -> vectorless::Result<()> {
        .build()
        .await?;

    // Index
    // Index a document
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    // Query
    let result = engine.query(
        QueryContext::new("What is the total revenue?").with_doc_ids(vec![doc_id.to_string()])
        QueryContext::new("What is the total revenue?")
            .with_doc_ids(vec![doc_id.to_string()])
    ).await?;
    println!("Answer: {}", result.content);
    println!("{}", result.content);

    Ok(())
}
```
</details>

### Python

```bash
pip install vectorless
```

```python
import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())
```

## Core Concepts

### Semantic Tree Index

When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:

```
Annual Report 2024
├── Executive Summary
│   ├── Financial Highlights
│   └── Strategic Outlook
├── Financial Statements
│   ├── Revenue Analysis        ← "What is the total revenue?" lands here
│   ├── Operating Expenses
│   └── Net Income
└── Risk Factors
    ├── Market Risks
    └── Regulatory Risks
```
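To make the idea of a hierarchy-preserving tree concrete, here is a minimal sketch that nests Markdown headings by depth. It is an illustration only, not the Vectorless implementation: the `Node` class and the `build_tree` and `render` helpers are hypothetical names, and the sketch ignores body text and fenced code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node type for illustration; not part of the vectorless API.
    title: str
    level: int
    children: list["Node"] = field(default_factory=list)


def build_tree(markdown: str) -> Node:
    """Nest Markdown headings into a tree by heading depth."""
    root = Node(title="(document)", level=0)
    stack = [root]  # path from the root to the most recent heading
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue  # body text is ignored in this sketch
        level = len(stripped) - len(stripped.lstrip("#"))
        title = stripped[level:].strip()
        # Pop until the top of the stack is a strictly shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(node: Node, indent: int = 0) -> list[str]:
    """Return the tree as indented lines, one per node."""
    lines = ["  " * indent + node.title]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


doc = """# Annual Report 2024
## Financial Statements
### Revenue Analysis
### Net Income
## Risk Factors
"""
tree = build_tree(doc)
print("\n".join(render(tree)))
```

A real indexer would also attach content and an LLM-generated summary to each node; the stack-based nesting is only the structural part.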

Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents.
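The navigation step can be sketched as a greedy descent that picks the most promising child at each level. This is an assumption-laden illustration rather than the engine's actual algorithm: `Node`, `navigate`, and `keyword_overlap` are hypothetical names, and the keyword-overlap scorer is a stand-in for the LLM's judgment over node summaries.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    # Hypothetical node type for illustration; not part of the vectorless API.
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(query: str, node: Node) -> int:
    """Stand-in for an LLM relevance judgment: count shared tokens."""
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def navigate(
    root: Node,
    query: str,
    score: Callable[[str, Node], int] = keyword_overlap,
) -> list[str]:
    """Descend from the root, following the best-scoring child each time."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.title)
    return path


report = Node("Annual Report 2024", "company annual report", [
    Node("Executive Summary", "highlights and outlook for 2024"),
    Node("Financial Statements", "revenue, expenses, and net income figures", [
        Node("Revenue Analysis", "total revenue by segment and region"),
        Node("Operating Expenses", "costs by department"),
        Node("Net Income", "profit after tax"),
    ]),
    Node("Risk Factors", "market and regulatory risks"),
])

path = navigate(report, "What is the total revenue?")
print(" > ".join(path))
```

Swapping `score` for a function that asks an LLM to rate each child's summary against the query would keep the same control flow while replacing the scoring heuristic.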

### Cross-Document Graph

When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.

```python
# Query across all indexed documents
result = await engine.query(
QueryContext("Compare revenue trends across all reports")
)
```
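One way to picture a graph built from shared keywords is to link any two documents whose keyword sets intersect. The sketch below assumes keywords have already been extracted per document; `build_graph`, the `min_shared` threshold, and the sample document names are all hypothetical and do not reflect how Vectorless stores or builds its graph.

```python
from itertools import combinations


def build_graph(
    doc_keywords: dict[str, set[str]],
    min_shared: int = 1,
) -> dict[tuple[str, str], set[str]]:
    """Link each pair of documents that share at least `min_shared` keywords."""
    edges: dict[tuple[str, str], set[str]] = {}
    # Sorting the ids gives each pair a stable (a, b) ordering.
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = doc_keywords[a] & doc_keywords[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical, hand-written keyword sets for three documents.
docs = {
    "report-2023": {"revenue", "margin", "europe"},
    "report-2024": {"revenue", "margin", "asia"},
    "hr-handbook": {"leave", "benefits"},
}
graph = build_graph(docs)
for (a, b), shared in graph.items():
    print(f"{a} <-> {b}: {sorted(shared)}")
```

A query such as the revenue comparison above could then start in one report and follow edges to related reports, while unrelated documents stay out of the search.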

### Workspace Persistence

Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:

```python
engine = Engine(api_key="sk-...", model="gpt-4o")

# List all indexed documents
docs = await engine.list()
for doc in docs:
    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")
```

## What It's For

Vectorless is designed for applications that need **precise** document retrieval:

- **Financial analysis** — Extract specific figures from reports, compare across filings
- **Legal research** — Find relevant clauses, trace definitions across documents
- **Technical documentation** — Navigate large manuals, locate specific procedures
- **Academic research** — Cross-reference findings across papers
- **Compliance** — Audit trails with source references for every answer

## Examples

See [examples](examples/) for more and stay tuned.
See [examples/](examples/) for complete usage patterns.

## Contributing

Binary file added docs/static/img/demo.gif