⚡ TurboQuant

8× vector database compression with 99% search accuracy. Zero training required.

Python implementation of TurboQuant (ICLR 2026) — near-optimal vector quantization for vector database compression.

What is this?

TurboQuant compresses high-dimensional embedding vectors (like OpenAI's text-embedding-3-large) from 32-bit to 4-bit per coordinate, cutting storage by 8× while maintaining 99% search accuracy.

Unlike traditional methods (Product Quantization, etc.), TurboQuant:

Requires zero training — no k-means, no codebook fitting
Works on any data — same quantizer for any dataset, any embedding model
Near-optimal — within 2.7× of the information-theoretic lower bound
Online — quantize each vector independently, no batch processing needed

Benchmark Results (1M Real OpenAI Embeddings)

Tested on 1M vectors from DBpedia (OpenAI text-embedding-3-large, d=1536):

Metric	Raw float32	TurboQuant 4-bit
Storage	6,143 MB	772 MB (8×)
Recall@1	100%	99.0%
Recall@10	100%	95.5%
MSE	0	0.009
Quantize time	—	47s (one-time)

The Dimension Tradeoff is Dead

Many teams truncate embeddings from 1536-d → 768-d to save storage. TurboQuant makes this obsolete:

Config	Storage per 1M	Quality
❌ float32 at 768-d	3.1 GB	Mediocre
✅ TurboQuant 4-bit at 1536-d	0.77 GB	Excellent

4× less storage AND better quality. Keep your full dimensions.

Quick Start

pip install numpy scipy
git clone https://github.com/Anrahya/turbo-quant.git
cd turbo-quant

Compress and search

import numpy as np
from turboquant import TurboQuantMSE, QuantizerConfig

# Your embedding vectors (n, 1536)
vectors = np.load("my_embeddings.npy")

# Quantize — zero training, just works
config = QuantizerConfig(bit_width=4, dimension=1536, seed=42)
quantizer = TurboQuantMSE(config)
compressed = quantizer.quantize(vectors)

print(f"Compressed: {compressed.size_bytes() / 1e6:.0f} MB")
# Compressed: 772 MB (was 6,143 MB)

# Search
query = np.random.randn(1, 1536).astype(np.float32)
indices, scores = quantizer.search(query, compressed, k=10)

Run the benchmark

# Quick test with synthetic data
python benchmark.py --n 50000 --d 1536 --bits 4

# With your own vectors
python benchmark.py --dataset path/to/vectors.npy --bits 4

# Full sweep
python benchmark.py --dataset vectors.npy --bits 1 2 3 4 --k 1 10 100

How It Works

TurboQuant is a 3-step process:

1. ROTATE      — Multiply by a random orthogonal matrix
                 → Every coordinate now follows the same bell curve

2. QUANTIZE    — Snap each coordinate to the nearest of 16 precomputed
                 optimal bin centers (Lloyd-Max quantizer)
                 → Store as 4-bit index

3. DEQUANTIZE  — Look up bin centers, reverse the rotation
                 → Approximate original vector (MSE ≈ 0.009)

Why does random rotation help? It converts any input distribution into a known one (Gaussian), letting us use the same optimal quantizer for all data — no training needed.

For details, see the technical deep-dive and the paper.

Project Structure

turbo-quant/
├── turboquant/
│   ├── types.py       — Data classes and interfaces (BaseQuantizer ABC)
│   ├── codebook.py    — Lloyd-Max optimal scalar quantizer
│   ├── rotation.py    — Random orthogonal rotation (QR of Gaussian)
│   ├── qjl.py         — QJL 1-bit sign projection (for inner-product variant)
│   ├── quantizer.py   — TurboQuantMSE + TurboQuantProd (Algorithms 1 & 2)
│   ├── storage.py     — Bit-packed save/load (.tqdb format)
│   └── search.py      — Brute-force search + recall@k metrics
├── docs/
│   ├── HOW_IT_WORKS.md   — Technical deep-dive (the math explained simply)
│   ├── USE_CASES.md      — Real-world applications with code examples
│   └── REPRODUCING.md    — Steps to reproduce our benchmark results
├── benchmark.py       — End-to-end benchmark script
├── FINDINGS.md        — Infrastructure cost analysis & results
└── pyproject.toml

Two Quantizer Variants

Variant	Use case	Bit budget
`TurboQuantMSE`	Vector search (nearest neighbor)	All b bits for MSE
`TurboQuantProd`	KV cache compression (unbiased attention)	(b-1) bits MSE + 1 bit QJL

For vector databases, use TurboQuantMSE — it achieves lower reconstruction error, which means better search ranking.

Infrastructure Impact

Scale	Raw float32	TurboQuant 4-bit	Cloud savings
1M vectors	6.1 GB	0.77 GB	70%
10M vectors	61 GB	7.7 GB	87%
50M vectors	307 GB	38 GB	88%
100M vectors	614 GB	77 GB	90%

At 10M+ vectors, databases that required distributed clusters now fit on a single server.

Citation

@inproceedings{shen2025turboquant,
  title={TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate},
  author={Shen, Hao and Silwal, Sandeep and Zakynthinou, Lydia and Silvestri, Francesco and Fanti, Giulia and Indyk, Piotr},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}

License

MIT — see LICENSE.

Acknowledgments

This is an independent implementation of the TurboQuant algorithm. The original paper is by researchers at Google Research and Carnegie Mellon University.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

⚡ TurboQuant

What is this?

Benchmark Results (1M Real OpenAI Embeddings)

The Dimension Tradeoff is Dead

Quick Start

Compress and search

Run the benchmark

How It Works

Project Structure

Two Quantizer Variants

Infrastructure Impact

Citation

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
docs		docs
turboquant		turboquant
.gitignore		.gitignore
FINDINGS.md		FINDINGS.md
LICENSE		LICENSE
README.md		README.md
benchmark.py		benchmark.py
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

⚡ TurboQuant

What is this?

Benchmark Results (1M Real OpenAI Embeddings)

The Dimension Tradeoff is Dead

Quick Start

Compress and search

Run the benchmark

How It Works

Project Structure

Two Quantizer Variants

Infrastructure Impact

Citation

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages