zvec-ai · richyreachy · Apr 24, 2026 · Apr 24, 2026 · Apr 28, 2026 · feihongxu0824
diff --git a/content/docs/en/db/concepts/vector-index/diskann-index.mdx b/content/docs/en/db/concepts/vector-index/diskann-index.mdx
@@ -0,0 +1,77 @@
+---
+title: DiskANN Index
+description: Disk-based Approximate Nearest Neighbor
+---
+
+
+import { PythonLinkButton } from '@/components/LinkButton';
+
+A disk-based graph index designed for **billion-scale** vector search. It keeps compressed vectors in memory and full-precision vectors on disk, enabling high-recall approximate nearest neighbor search with a **much smaller memory footprint** than in-memory indexes.
+
+## How It Works
+
+DiskANN builds a [Vamana graph](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf) over the full dataset and stores it on disk along with the original vectors. At search time, the compressed PQ (Product Quantization) codes and a cache of frequently accessed nodes live in memory, while the rest of the graph and the full-precision vectors are read from disk on demand.
+
+- **Vamana graph for navigation** 🪜
+  - A single-layer graph where each node is connected to up to `max_degree` neighbors.
+  - The graph is built using a greedy search-and-prune strategy with an **alpha parameter** that encourages long-range edges, providing fast convergence to the query's neighborhood.
+  - A **medoid** (the point closest to the dataset's centroid) serves as the fixed entry point for every search.
+- **Product Quantization (PQ) for distance estimation** 🔍
+  - The vector space is split into `pq_chunk_num` sub-spaces. Each sub-space has its own 256-centroid codebook, and each sub-vector is encoded as the index of its nearest centroid (an 8-bit PQ code).
+  - At query time, a **PQ distance lookup table** is precomputed for the query, allowing approximate distances to all candidates to be computed via fast table lookups instead of full-precision arithmetic.
+- **Cached beam search** 🔎
+  - Search starts from the medoid and explores the graph using a **beam search** strategy — multiple frontier nodes are expanded concurrently via batched disk I/O.
+  - Frequently accessed nodes near the entry point are cached in memory (**BFS-level caching**), reducing disk reads for hot regions of the graph.
+  - During traversal, approximate distances are computed from PQ codes. Full-precision vectors are then read from disk to compute exact distances for the top-k result candidates.
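The PQ bullets above can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the hand-written codebooks hold 2 centroids per sub-space instead of 256, and the helper names (`encode`, `build_lookup_table`, `approx_dist`) are illustrative, not Zvec's API.

```python
# Toy Product Quantization: 4-dim vectors, 2 chunks, 2 centroids per chunk.
# Real PQ as described above uses 256 centroids per chunk, learned with k-means.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, num_chunks):
    """Split a vector into num_chunks equal-width sub-vectors."""
    width = len(vec) // num_chunks
    return [vec[i * width:(i + 1) * width] for i in range(num_chunks)]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(subs, codebooks)]

def build_lookup_table(query, codebooks):
    """table[chunk][code] = squared distance from the query sub-vector to that centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, centroid) for centroid in cb]
            for sub, cb in zip(subs, codebooks)]

def approx_dist(code, table):
    """Approximate squared distance: one table lookup per chunk, no float math on the vector."""
    return sum(table[chunk][c] for chunk, c in enumerate(code))

# Hypothetical codebooks: codebooks[chunk][code] is a centroid.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]

code = encode([0.9, 1.1, 1.8, 2.1], codebooks)               # stored in memory
table = build_lookup_table([1.0, 1.0, 0.0, 0.0], codebooks)  # built once per query
approx = approx_dist(code, table)
print(code, table, approx)
```

Here the stored vector is reduced to the code `[1, 1]`, and the approximate squared distance (8.0) is close to the exact one (7.67). The table is built once per query and reused for every candidate.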
+
+## When to Use a DiskANN Index?
+
+- ✅ Billion-scale datasets that **cannot fit entirely in memory**
+- ✅ Cost-sensitive deployments where minimizing RAM is critical
+- ✅ Batch workloads and offline analytics that can tolerate slightly higher per-query latency than in-memory indexes deliver
+
+<Callout className="text-base" type="idea">
+  **Best Practice**: Use DiskANN when your dataset far exceeds available RAM. It provides strong recall while memory consumption scales with the size of the PQ codes rather than the full vectors. For datasets that fit in memory, prefer [HNSW](../hnsw-index/) or [HNSW-RaBitQ](../hnsw-rabitq-index/) for lower latency.
+</Callout>
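A back-of-the-envelope calculation shows the scale of the saving. The dataset size, dimensionality, and `pq_chunk_num` value below are illustrative assumptions, not Zvec defaults, and the estimate ignores the codebooks and the node cache.

```python
# Rough memory estimate: full-precision float32 vectors vs. 8-bit PQ codes.
num_vectors = 1_000_000_000
dim = 768
pq_chunk_num = 96            # hypothetical setting

full_bytes = num_vectors * dim * 4          # float32: 4 bytes per dimension
pq_bytes = num_vectors * pq_chunk_num * 1   # 1 byte per chunk per vector

print(f"full-precision vectors: {full_bytes / 1e9:.0f} GB")   # 3072 GB
print(f"PQ codes in memory:     {pq_bytes / 1e9:.0f} GB")     # 96 GB
print(f"compression ratio:      {full_bytes // pq_bytes}x")   # 32x
```

Under these assumptions the in-memory part shrinks from about 3 TB to under 100 GB, while the full vectors stay on disk.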
+
+## Advantages
+
+1. ✨ **Extremely low memory footprint** — Only PQ-compressed codes (1 byte per chunk per vector) reside in memory, making billion-scale search feasible on commodity hardware
+1. ✨ **High recall** — The Vamana graph preserves connectivity and diversity through its alpha-based pruning, and exact distances are recomputed for final candidates
+1. ✨ **Scalable graph construction** — Built with a simple greedy insert-and-prune algorithm that can be parallelized across threads
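The alpha-based pruning mentioned above follows the RobustPrune idea from the Vamana paper. The sketch below is a simplified in-memory version for illustration; the function name and the toy points are assumptions, and Zvec's build code may differ in detail.

```python
import math

def robust_prune(point, candidates, alpha, max_degree):
    """Pick up to max_degree diverse neighbors for `point` from `candidates`."""
    remaining = sorted((c for c in candidates if c != point),
                       key=lambda c: math.dist(point, c))
    neighbors = []
    while remaining and len(neighbors) < max_degree:
        closest = remaining.pop(0)
        neighbors.append(closest)
        # Drop candidates already "covered" by `closest`: those that are
        # much nearer to `closest` than to `point`. A larger alpha makes
        # this test harder to satisfy, so fewer candidates are dropped.
        remaining = [c for c in remaining
                     if alpha * math.dist(closest, c) > math.dist(point, c)]
    return neighbors

candidates = [(1, 0), (2, 0), (0, 3), (10, 0)]
strict = robust_prune((0, 0), candidates, alpha=1.0, max_degree=3)
relaxed = robust_prune((0, 0), candidates, alpha=1.2, max_degree=3)
print(strict)
print(relaxed)
```

With `alpha=1.0` the distant point `(10, 0)` is pruned because `(1, 0)` already covers that direction. With `alpha=1.2` it survives as a long-range edge, which is the effect the alpha parameter is meant to have.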
+
+## Trade-offs
+
+1. ⚠️ **Higher query latency** — Each search requires disk I/O for graph traversal, making it slower than pure in-memory indexes like HNSW
+1. ⚠️ **Build-time PQ training** — Requires a KMeans-based PQ training step before index construction, adding to the total build time
+1. ⚠️ **Not suited for real-time workloads** — Disk access latency means DiskANN is better for latency-tolerant use cases or scenarios with relatively low QPS requirements
+
+## Key Parameters
+
+<Callout className="text-base" type="idea">
+  **Tuning Tip**:
+  Start with defaults. Adjust the query-time `list_size` first to trade recall against latency. Increase `max_degree` only if you still need better recall, and expect higher disk usage and longer build times. Reduce `pq_chunk_num` if you need to cut memory further and can tolerate lower recall.
+</Callout>
+
+### Index-Time Parameters
+
+<div className="flex flex-row flex-wrap gap-3 items-center">
+  <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnIndexParam" label="Python API Reference" />
+</div>
+
+| Parameter | Description | Tuning Guidance |
+| --------- | ----------- | --------------- |
+| `metric_type` | **Similarity metric** used to compare vectors | Choose based on how your embeddings were trained |
+| `max_degree` | **Max neighbors per node** — The maximum number of edges per node in the Vamana graph | • Higher `max_degree` → <br /> ✨ better recall and graph connectivity <br /> ⚠️ more disk usage and longer build time |
+| `list_size` | **Build-time candidate list size** — Number of candidates considered during graph construction when inserting a new vector | • Higher `list_size` → <br /> ✨ better graph quality and higher recall <br /> ⚠️ longer index build time |
+| `pq_chunk_num` | **Number of PQ sub-spaces** — Controls how the vector dimensions are partitioned for Product Quantization | • More chunks → <br /> ✨ finer-grained distance approximation and better recall <br /> ⚠️ more memory for PQ codes (1 byte per chunk per vector) |
+
+### Query-Time Parameters
+
+<div className="flex flex-row flex-wrap gap-3 items-center">
+  <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnQueryParam" label="Python API Reference" />
+</div>
+
+| Parameter | Description | Tuning Guidance |
+| --------- | ----------- | --------------- |
+| `list_size` | **Query-time candidate list size** — Determines how many candidates are maintained during beam search graph traversal | • Higher `list_size` → <br /> ✨ higher recall <br /> ⚠️ more disk I/O and higher query latency |
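To make the query-time `list_size` trade-off concrete, here is a toy in-memory beam search that starts from the medoid. It is a sketch under stated assumptions: the three-point dataset, the hand-built graph, and the `beam_width` argument are hypothetical, exact distances stand in for PQ estimates, and counting visited nodes stands in for counting disk reads.

```python
import math

def medoid(points):
    """Index of the point closest to the centroid of the dataset."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return min(range(len(points)), key=lambda i: math.dist(points[i], centroid))

def beam_search(graph, points, query, entry, list_size, top_k, beam_width=1):
    """Best-first graph search keeping at most list_size candidates."""
    def key(i):
        return (math.dist(points[i], query), i)   # node index breaks ties

    candidates = [entry]      # kept sorted by distance to the query
    visited = set()
    while True:
        # Expand the closest unvisited candidates, beam_width at a time.
        frontier = [i for i in candidates if i not in visited][:beam_width]
        if not frontier:
            break
        merged = set(candidates)
        for node in frontier:             # one batched "disk read" per round
            visited.add(node)
            merged.update(graph[node])
        candidates = sorted(merged, key=key)[:list_size]
    return candidates[:top_k], len(visited)

points = [(3.0,), (1.0,), (5.0,)]
graph = {0: [1], 1: [2], 2: [1]}          # hypothetical adjacency lists
entry = medoid(points)
query = (5.0,)

narrow = beam_search(graph, points, query, entry, list_size=1, top_k=1)
wide = beam_search(graph, points, query, entry, list_size=2, top_k=1)
print(narrow)
print(wide)
```

With `list_size=1` the search stops at the entry point after one visit and misses the true nearest neighbor. With `list_size=2` it keeps a worse candidate around long enough to reach node 2, at the cost of three visits instead of one.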
diff --git a/content/docs/en/db/concepts/vector-index/index.mdx b/content/docs/en/db/concepts/vector-index/index.mdx
@@ -49,6 +49,7 @@ Zvec supports the following vector index types, each suited to different use cas
 1. [Flat (Brute-Force) index](./flat-index/)
 1. [HNSW (Hierarchical Navigable Small World)](./hnsw-index/)
 1. [HNSW-RaBitQ (HNSW with RaBitQ Quantization)](./hnsw-rabitq-index/)
+1. [DiskANN (Disk-based Approximate Nearest Neighbor)](./diskann-index/)
 1. [IVF (Inverted File Index)](./ivf-index/)
 
 <Callout className="text-base" type="idea">

diff --git a/content/docs/en/db/concepts/vector-index/meta.json b/content/docs/en/db/concepts/vector-index/meta.json
@@ -4,6 +4,7 @@
     "flat-index",
     "hnsw-index",
     "hnsw-rabitq-index",
+    "diskann-index",
     "ivf-index",
     "quantization"
   ],

diff --git a/content/docs/zh/db/concepts/vector-index/diskann-index.mdx b/content/docs/zh/db/concepts/vector-index/diskann-index.mdx
@@ -0,0 +1,77 @@
+---
+title: DiskANN 索引
+description: 基于磁盘的近似最近邻
+---
+
+
+import { PythonLinkButton } from '@/components/LinkButton';
+
+一种基于磁盘的图索引，专为**十亿级**向量搜索设计 — 将压缩向量保存在内存中，全精度向量存储在磁盘上，实现高 Recall 的近似最近邻搜索，同时**大幅降低内存占用**。
+
+## 工作原理
+
+DiskANN 在完整数据集上构建 [Vamana 图](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)，并将其与原始向量一起存储在磁盘上。搜索时，仅 PQ（乘积量化）压缩编码驻留在内存中，图和全精度向量按需从磁盘读取。
+
+- **Vamana 图用于导航** 🪜
+  - 单层图结构，每个节点最多连接 `max_degree` 个邻居。
+  - 图的构建采用贪心搜索-剪枝策略，通过 **alpha 参数**鼓励长距离边，实现快速收敛到查询点的邻域。
+  - **中心点**（medoid，距数据集质心最近的点）作为每次搜索的固定入口。
+- **乘积量化（PQ）用于距离估计** 🔍
+  - 向量空间被划分为 `pq_chunk_num` 个子空间，每个子向量被量化为 256 个质心的码本（8-bit PQ 编码）。
+  - 查询时，预先计算查询向量的 **PQ 距离查找表**，通过快速表查找而非全精度运算来计算候选点的近似距离。
+- **带缓存的束搜索** 🔎
+  - 搜索从中心点开始，使用**束搜索**（beam search）策略探索图 — 多个前沿节点通过批量磁盘 I/O 并发扩展。
+  - 入口点附近的高频访问节点被缓存在内存中（**BFS 层级缓存**），减少热点区域的磁盘读取。
+  - 对于每个访问的节点，使用 PQ 编码计算近似距离，并从磁盘读取全精度向量为 top-k 候选结果计算精确距离。
+
+## 何时使用 DiskANN 索引？
+
+- ✅ **无法完全装入内存**的十亿级数据集
+- ✅ 对成本敏感、需要最小化内存消耗的部署场景
+- ✅ 可以容忍相比内存索引稍高延迟的批处理工作负载和离线分析
+
+<Callout className="text-base" type="idea">
+  **最佳实践**：当数据集远超可用内存时使用 DiskANN。它以仅与 PQ 编码成比例的内存消耗提供出色的 Recall。如果数据集可以装入内存，建议使用 [HNSW](../hnsw-index/) 或 [HNSW-RaBitQ](../hnsw-rabitq-index/) 以获得更低延迟。
+</Callout>
+
+## 优势
+
+1. ✨ **极低的内存占用** — 仅 PQ 压缩编码（每向量每 chunk 1 字节）驻留内存，使十亿级搜索在普通硬件上成为可能
+1. ✨ **高 Recall** — Vamana 图通过基于 alpha 的剪枝保持连通性和多样性，最终候选结果重新计算精确距离
+1. ✨ **可扩展的图构建** — 采用简单的贪心插入-剪枝算法，可跨线程并行化
+
+## 权衡
+
+1. ⚠️ **较高的查询延迟** — 每次搜索需要磁盘 I/O 进行图遍历，比纯内存索引（如 HNSW）慢
+1. ⚠️ **构建时 PQ 训练开销** — 索引构建前需要基于 KMeans 的 PQ 训练步骤，增加总构建时间
+1. ⚠️ **不适合实时工作负载** — 磁盘访问延迟意味着 DiskANN 更适合面向可容忍延迟或QPS要求相对低的场景
+
+## 关键参数
+
+<Callout className="text-base" type="idea">
+  **调参建议**：
+  从默认值开始。先调整查询时的 `list_size` 来权衡 Recall 和延迟。仅在需要更好 Recall 时增加 `max_degree` — 但预期会增加磁盘占用和构建时间。如果需要进一步降低内存且可接受稍低 Recall，可减少 `pq_chunk_num`。
+</Callout>
+
+### 索引构建参数
+
+<div className="flex flex-row flex-wrap gap-3 items-center">
+  <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnIndexParam" label="Python API Reference" />
+</div>
+
+| 参数 | 描述 | 调参指南 |
+| --------- | ----------- | --------------- |
+| `metric_type` | 用于比较向量的**相似度度量** | 根据 Embedding 模型的训练方式选择 |
+| `max_degree` | **每个节点的最大邻居数** — Vamana 图中每个节点的最大边数 | • 更大的 `max_degree` → <br /> ✨ 更好的 Recall 和图连通性 <br /> ⚠️ 更多磁盘占用和更长的构建时间 |
+| `list_size` | **构建时候选列表大小** — 插入新向量时图构建过程中考虑的候选数量 | • 更大的 `list_size` → <br /> ✨ 更好的图质量和更高的 Recall <br /> ⚠️ 更长的索引构建时间 |
+| `pq_chunk_num` | **PQ 子空间数量** — 控制向量维度如何划分以进行乘积量化 | • 更多 chunk → <br /> ✨ 更精细的距离近似和更好的 Recall <br /> ⚠️ PQ 编码占用更多内存（每向量每 chunk 1 字节） |
+
+### 索引查询参数
+
+<div className="flex flex-row flex-wrap gap-3 items-center">
+  <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnQueryParam" label="Python API Reference" />
+</div>
+
+| 参数 | 描述 | 调参指南 |
+| --------- | ----------- | --------------- |
+| `list_size` | **查询时候选列表大小** — 束搜索图遍历时维护的候选数量 | • 更大的 `list_size` → <br /> ✨ 更高的 Recall <br /> ⚠️ 更多磁盘 I/O 和更高的查询延迟 |
diff --git a/content/docs/zh/db/concepts/vector-index/index.mdx b/content/docs/zh/db/concepts/vector-index/index.mdx
@@ -49,6 +49,7 @@ Zvec 支持以下向量索引类型，各自适用于不同的使用场景、数
 1. [Flat（暴力搜索）索引](./flat-index/)
 1. [HNSW（分层可导航小世界）](./hnsw-index/)
 1. [HNSW-RaBitQ（HNSW + RaBitQ 量化）](./hnsw-rabitq-index/)
+1. [DiskANN（基于磁盘的近似最近邻）](./diskann-index/)
 1. [IVF（倒排文件索引）](./ivf-index/)
 
 <Callout className="text-base" type="idea">

diff --git a/content/docs/zh/db/concepts/vector-index/meta.json b/content/docs/zh/db/concepts/vector-index/meta.json
@@ -4,6 +4,7 @@
     "flat-index",
     "hnsw-index",
     "hnsw-rabitq-index",
+    "diskann-index",
     "ivf-index",
     "quantization"
   ],