Merged

Dev #46

6 changes: 4 additions & 2 deletions .gitignore
@@ -35,8 +35,10 @@ workspace/
 test_output/
 temp/

-# Documentation
-docs/_build/
+# Documentation (Docusaurus)
+docs/node_modules/
+docs/build/
+docs/.docusaurus/

 # Benchmarks
 .criterion/
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -84,7 +84,8 @@ rand = "0.8"
 bm25 = { version = "2.3.2", features = ["parallelism"] }

 # Python bindings
-pyo3 = { version = "0.22", features = ["extension-module"] }
+pyo3 = { version = "0.28", features = ["extension-module"] }
+pyo3-async-runtimes = { version = "0.28", features = ["tokio-runtime"] }

 # Dev dependencies
 tempfile = "3.10"
35 changes: 19 additions & 16 deletions README.md
@@ -1,6 +1,6 @@
 <div align="center">

-<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/with-title.png" alt="Vectorless" width="400" style="vertical-align:middle;">
+<img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400" style="vertical-align:middle;">

 <h1>Reasoning-native Document Intelligence Engine</h1>

@@ -27,23 +27,26 @@ pip install vectorless
 ### Index and Query

 ```python
+import asyncio
 from vectorless import Engine, IndexContext

-# Create engine — api_key and model are required
-engine = Engine(
-    workspace="./data",
-    api_key="sk-...",
-    model="gpt-4o",
-)
-
-# Index a document (PDF or Markdown)
-result = engine.index(IndexContext.from_file("./report.pdf"))
-doc_id = result.doc_id
-
-# Query
-result = engine.query(doc_id, "What is the total revenue?")
-print(result.content)
-print(f"Score: {result.score}")
+async def main():
+    # Create engine — api_key and model are required
+    engine = Engine(
+        workspace="./data",
+        api_key="sk-...",
+        model="gpt-4o",
+    )
+
+    # Index a document (PDF or Markdown)
+    result = await engine.index(IndexContext.from_file("./report.pdf"))
+    doc_id = result.doc_id
+
+    # Query
+    result = await engine.query(doc_id, "What is the total revenue?")
+    print(result.single().content)
+
+asyncio.run(main())
 ```

 <details>
20 changes: 20 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
41 changes: 41 additions & 0 deletions docs/README.md
@@ -0,0 +1,41 @@
# Website

This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.

## Installation

```bash
yarn
```

## Local Development

```bash
yarn start
```

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

## Build

```bash
yarn build
```

This command generates static content into the `build` directory, which can be served by any static hosting service.

## Deployment

Using SSH:

```bash
USE_SSH=true yarn deploy
```

Not using SSH:

```bash
GIT_USER=<Your GitHub username> yarn deploy
```

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
84 changes: 84 additions & 0 deletions docs/blog/2026-04-12-welcome/index.mdx
@@ -0,0 +1,84 @@
---
slug: welcome
title: Welcome to Vectorless
authors: [zTgx]
tags: [vectorless, rag, llm, announcement]
---

Vectorless is a reasoning-native document intelligence engine written in Rust — **no vector database, no embeddings, no similarity search**.

{/* truncate */}

## Why Vectorless?

Traditional RAG systems rely on vector embeddings and similarity search. This approach loses document structure, requires a vector database, and often returns chunks that lack context.

Vectorless takes a different path:

- **Hierarchical Semantic Trees** — Documents are parsed into a tree of sections, preserving structure and relationships.
- **LLM Navigation** — Queries are resolved by intelligently traversing the tree, not by comparing vectors.
- **Zero Infrastructure** — No vector DB, no embedding models, no similarity search. Just an LLM API key.
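
To make the navigation idea concrete, here is a minimal sketch in Python. It is not Vectorless's implementation: `Node`, `navigate`, and `keyword_chooser` are hypothetical names, and the keyword-overlap chooser only stands in for the LLM call that would decide which child section to open at each level.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One section of a hypothetical document tree."""
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# A chooser gets the query and the candidate children and returns the index
# of the child to descend into, or None to stop at the current node.
Chooser = Callable[[str, List[Node]], Optional[int]]


def words(s: str) -> set:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", s.lower()))


def navigate(root: Node, query: str, choose: Chooser) -> List[str]:
    """Descend from the root one level at a time; return the titles along the path."""
    path = [root.title]
    node = root
    while node.children:
        idx = choose(query, node.children)
        if idx is None:
            break
        node = node.children[idx]
        path.append(node.title)
    return path


def keyword_chooser(query: str, children: List[Node]) -> Optional[int]:
    """Toy stand-in for an LLM: pick the child sharing the most words with the query."""
    q = words(query)
    scores = [len(q & words(c.title + " " + c.text)) for c in children]
    best = max(range(len(children)), key=lambda i: scores[i])
    return best if scores[best] > 0 else None


doc = Node("Annual Report", children=[
    Node("Overview", "company history and mission"),
    Node("Financials", "revenue and expenses", children=[
        Node("Revenue", "total revenue grew this year"),
        Node("Expenses", "operating costs"),
    ]),
])
print(navigate(doc, "What is the total revenue?", keyword_chooser))
# ['Annual Report', 'Financials', 'Revenue']
```

Replacing `keyword_chooser` with a function that prompts a model with the query and the child titles would leave `navigate` unchanged; only the chooser would need to call an LLM.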

## Quick Start

### Python

```python
import asyncio
from vectorless import Engine, IndexContext

async def main():
    engine = Engine(
        workspace="./data",
        api_key="sk-...",
        model="gpt-4o",
    )

    # Index a document
    result = await engine.index(IndexContext.from_file("./report.pdf"))
    doc_id = result.doc_id

    # Query
    answer = await engine.query(doc_id, "What is the total revenue?")
    print(answer.single().content)

asyncio.run(main())
```

### Rust

```rust
use vectorless::{EngineBuilder, IndexContext, QueryContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_workspace("./data")
        .with_key("sk-...")
        .with_model("gpt-4o")
        .build()
        .await?;

    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    let result = engine.query(
        QueryContext::new("What is the total revenue?").with_doc_id(doc_id)
    ).await?;
    println!("{}", result.content);

    Ok(())
}
```

## What's Next?

- Cross-document relationship graph
- Incremental indexing with content fingerprinting
- DOCX support, alongside Markdown and PDF

The project is open source under Apache-2.0. Contributions welcome!

- [GitHub](https://github.com/vectorlessflow/vectorless)
- [PyPI](https://pypi.org/project/vectorless/)
- [crates.io](https://crates.io/crates/vectorless)
8 changes: 8 additions & 0 deletions docs/blog/authors.yml
@@ -0,0 +1,8 @@
zTgx:
name: zTgx
title: Rust Developer
url: https://beautifularea.com
page: true
socials:
x: pendingcode
github: zTgx
4 changes: 4 additions & 0 deletions docs/blog/tags.yml
@@ -0,0 +1,4 @@
hello:
label: Hello
permalink: /hello
description: Hello tag description
84 changes: 84 additions & 0 deletions docs/docs/intro.mdx
@@ -0,0 +1,84 @@
---
sidebar_position: 1
---

# Introduction

**Vectorless** is a reasoning-native document intelligence engine written in Rust — **no vector database, no embeddings, no similarity search**.

It transforms documents into hierarchical semantic trees and uses LLMs to navigate the structure, retrieving the most relevant content through deep contextual understanding instead of vector math.

## How It Works

1. **Parse** — Documents (Markdown, PDF) are parsed into hierarchical semantic trees, preserving structure and relationships between sections.
2. **Index** — Trees are stored with metadata, keywords, and optional summaries. Incremental indexing skips unchanged files via content fingerprinting.
3. **Query** — An LLM navigates the tree to find the most relevant sections. No embeddings, no similarity search — just structural reasoning.
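
As a rough illustration of the Parse step, the sketch below builds a section tree from Markdown headings. It assumes ATX headings (`#` to `######`) define the hierarchy and ignores PDF input entirely; `Section` and `parse_markdown` are hypothetical names, and Vectorless's own parser may work quite differently.

```python
import re
from dataclasses import dataclass, field
from typing import List

# An ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Section:
    title: str
    level: int  # 0 for the synthetic root, 1 for '#', 2 for '##', ...
    body: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def parse_markdown(text: str, title: str = "root") -> Section:
    """Nest sections by heading level; other non-empty lines go to the nearest open section."""
    root = Section(title, 0)
    stack = [root]  # the chain of open sections, root first
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close sections at the same or a deeper level; the new top is the parent.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(match.group(2), level)
            stack[-1].children.append(section)
            stack.append(section)
        elif line.strip():
            stack[-1].body.append(line)
    return root


doc = """# Report
Intro text.
## Revenue
Total revenue grew.
## Expenses
Costs fell.
# Appendix
Notes."""
tree = parse_markdown(doc)
print([s.title for s in tree.children])              # ['Report', 'Appendix']
print([s.title for s in tree.children[0].children])  # ['Revenue', 'Expenses']
```

The stack keeps the chain of currently open sections, so a skipped level (for example `#` followed directly by `###`) still nests under the nearest shallower heading. A real parser would also need to skip `#` lines inside fenced code blocks.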

## Quick Start

### Python

```bash
pip install vectorless
```

```python
import asyncio
from vectorless import Engine, IndexContext

async def main():
    engine = Engine(
        workspace="./data",
        api_key="sk-...",
        model="gpt-4o",
    )

    result = await engine.index(IndexContext.from_file("./report.pdf"))
    doc_id = result.doc_id

    answer = await engine.query(doc_id, "What is the total revenue?")
    print(answer.single().content)

asyncio.run(main())
```

### Rust

```toml
[dependencies]
vectorless = "0.1"
tokio = { version = "1", features = ["full"] }
```

```rust
use vectorless::{EngineBuilder, IndexContext, QueryContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_workspace("./data")
        .with_key("sk-...")
        .with_model("gpt-4o")
        .build()
        .await?;

    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    let result = engine.query(
        QueryContext::new("What is the total revenue?").with_doc_id(doc_id)
    ).await?;
    println!("{}", result.content);

    Ok(())
}
```

## Features

- **Hierarchical Semantic Trees** — Preserves document structure, not flat chunks
- **LLM-Powered Retrieval** — Structural reasoning over the tree, not vector similarity
- **Incremental Indexing** — Content fingerprinting skips unchanged files
- **Cross-Document Graph** — Automatic relationship discovery between documents
- **Multi-Format** — Markdown and PDF support
- **Zero Infrastructure** — No vector DB, no embedding models, just an LLM API key
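
The features list mentions content fingerprinting for incremental indexing. This page does not say which fingerprint Vectorless uses, so the sketch below simply assumes a SHA-256 hash of each file's bytes and a plain dict of previously recorded hashes; `fingerprint` and `files_to_reindex` are hypothetical helpers, not part of the library.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def fingerprint(path: Path) -> str:
    """SHA-256 of the file's bytes, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_reindex(paths: Iterable[Path], seen: Dict[str, str]) -> List[Path]:
    """Return new or changed files, updating `seen` (resolved path -> digest)."""
    changed = []
    for path in paths:
        key = str(path.resolve())
        digest = fingerprint(path)
        if seen.get(key) != digest:
            seen[key] = digest
            changed.append(path)
    return changed
```

On a second pass over the same files, `files_to_reindex` returns an empty list unless some file's bytes changed. Because entries are keyed by path, a renamed file would be treated as new under this sketch.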