feat/diskann pages #33
Open: richyreachy wants to merge 3 commits into main from feat/diskann_index_pages
77 changes: 77 additions & 0 deletions
content/docs/en/db/concepts/vector-index/diskann-index.mdx
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| --- | ||
| title: DiskANN Index | ||
| description: Disk-based Approximate Nearest Neighbor | ||
| --- | ||
|
|
||
|
|
||
| import { CodeExampleLinkButton, PythonLinkButton, NodeJSLinkButton } from '@/components/LinkButton'; | ||
|
|
||
| A disk-based graph index designed for **billion-scale** vector search — keeping compressed vectors in memory and full-precision vectors on disk, enabling high-recall approximate nearest neighbor search with a **dramatically smaller memory footprint**. | ||
|
|
||
| ## How It Works | ||
|
|
||
| DiskANN builds a [Vamana graph](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf) over the full dataset and stores it on disk along with the original vectors. At search time, only compressed PQ (Product Quantization) codes live in memory, while the graph and full-precision vectors are read from disk on demand. | ||
|
|
||
| - **Vamana graph for navigation** 🪜 | ||
| - A single-layer graph where each node is connected to up to `max_degree` neighbors. | ||
| - The graph is built using a greedy search-and-prune strategy with an **alpha parameter** that encourages long-range edges, providing fast convergence to the query's neighborhood. | ||
| - A **medoid** (the point closest to the dataset's centroid) serves as the fixed entry point for every search. | ||
| - **Product Quantization (PQ) for distance estimation** 🔍 | ||
| - The vector space is split into `pq_chunk_num` sub-spaces, and each sub-vector is quantized into a 256-centroid codebook (8-bit PQ codes). | ||
| - At query time, a **PQ distance lookup table** is precomputed for the query, allowing approximate distances to all candidates to be computed via fast table lookups instead of full-precision arithmetic. | ||
| - **Cached beam search** 🔎 | ||
| - Search starts from the medoid and explores the graph using a **beam search** strategy — multiple frontier nodes are expanded concurrently via batched disk I/O. | ||
| - Frequently accessed nodes near the entry point are cached in memory (**BFS-level caching**), reducing disk reads for hot regions of the graph. | ||
| - For each visited node, approximate distances are computed using PQ codes, and the full-precision vector is read from disk to compute the exact distance for the top-k result candidates. | ||
|
|
||
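The PQ lookup-table step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the index's actual implementation: the helper names (`split`, `encode`, `build_lookup_table`, `approx_distance`) are hypothetical, the codebooks are random rather than KMeans-trained, and each codebook holds 4 centroids instead of 256 to keep the example small.

```python
import random

def split(vec, num_chunks):
    """Split a vector into num_chunks equal-length sub-vectors."""
    step = len(vec) // num_chunks
    return [vec[i * step:(i + 1) * step] for i in range(num_chunks)]

def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebooks):
    """PQ-encode a vector: one nearest-centroid index per chunk."""
    return [
        min(range(len(book)), key=lambda k: sq_l2(sub, book[k]))
        for sub, book in zip(split(vec, len(codebooks)), codebooks)
    ]

def build_lookup_table(query, codebooks):
    """table[c][k] = squared distance from query chunk c to centroid k."""
    return [
        [sq_l2(sub, centroid) for centroid in book]
        for sub, book in zip(split(query, len(codebooks)), codebooks)
    ]

def approx_distance(code, table):
    """Sum one table entry per chunk instead of reading the raw vector."""
    return sum(table[c][k] for c, k in enumerate(code))

# Toy sizes (assumptions for illustration only).
random.seed(0)
DIM, PQ_CHUNK_NUM, NUM_CENTROIDS = 8, 4, 4
SUB_DIM = DIM // PQ_CHUNK_NUM

# Random codebooks stand in for the KMeans-trained ones.
codebooks = [
    [[random.random() for _ in range(SUB_DIM)] for _ in range(NUM_CENTROIDS)]
    for _ in range(PQ_CHUNK_NUM)
]
vectors = [[random.random() for _ in range(DIM)] for _ in range(5)]
codes = [encode(v, codebooks) for v in vectors]

# The table is built once per query, then reused for every candidate.
query = [random.random() for _ in range(DIM)]
table = build_lookup_table(query, codebooks)
approx = [approx_distance(code, table) for code in codes]
print(approx)
```

Each approximate distance costs `pq_chunk_num` table lookups and additions, regardless of the vector dimension, which is why the in-memory part of the search stays cheap.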
| ## When to Use a DiskANN Index? | ||
|
|
||
| - ✅ Billion-scale datasets that **cannot fit entirely in memory** | ||
| - ✅ Cost-sensitive deployments where minimizing RAM is critical | ||
| - ✅ Batch workloads and offline analytics that can tolerate slightly higher latency per query compared to in-memory indexes | ||
|
|
||
| <Callout className="text-base" type="idea"> | ||
| **Best Practice**: Use DiskANN when your dataset far exceeds available RAM. It provides strong recall with memory consumption proportional only to PQ codes, not full vectors. For datasets that fit in memory, prefer [HNSW](../hnsw-index/) or [HNSW-RaBitQ](../hnsw-rabitq-index/) for lower latency. | ||
| </Callout> | ||
|
|
||
| ## Advantages | ||
|
|
||
| 1. ✨ **Extremely low memory footprint** — Only PQ-compressed codes (1 byte per chunk per vector) reside in memory, making billion-scale search feasible on commodity hardware | ||
| 1. ✨ **High recall** — The Vamana graph preserves connectivity and diversity through its alpha-based pruning, and exact distances are recomputed for final candidates | ||
| 1. ✨ **Scalable graph construction** — Built with a simple greedy insert-and-prune algorithm that can be parallelized across threads | ||
|
|
||
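To make the memory claim concrete, the following back-of-the-envelope sketch compares the size of the PQ codes (1 byte per chunk per vector) with the size of the full-precision vectors. The dataset size, dimension, and chunk count are hypothetical, and 4-byte float32 values are assumed for the raw vectors; codebooks and the node cache are not included.

```python
def pq_code_bytes(num_vectors, pq_chunk_num):
    """In-memory PQ codes: one byte per chunk per vector."""
    return num_vectors * pq_chunk_num

def full_precision_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vectors, assuming float32 (4 bytes per value)."""
    return num_vectors * dim * bytes_per_value

# Hypothetical workload: one billion 768-dimensional vectors, 96 PQ chunks.
num_vectors, dim, pq_chunk_num = 1_000_000_000, 768, 96

pq_bytes = pq_code_bytes(num_vectors, pq_chunk_num)
raw_bytes = full_precision_bytes(num_vectors, dim)

print(f"PQ codes in memory:    {pq_bytes / 1e9:.0f} GB")
print(f"Full vectors on disk:  {raw_bytes / 1e9:.0f} GB")
print(f"Reduction factor:      {raw_bytes / pq_bytes:.0f}x")
```

Under these assumptions the codes take 96 GB against 3072 GB for the raw vectors, a 32x reduction; halving `pq_chunk_num` would halve the in-memory figure at the cost of coarser distance estimates.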
| ## Trade-offs | ||
|
|
||
| 1. ⚠️ **Higher query latency** — Each search requires disk I/O for graph traversal, making it slower than pure in-memory indexes like HNSW | ||
| 1. ⚠️ **Build-time PQ training** — Requires a KMeans-based PQ training step before index construction, adding to the total build time | ||
| 1. ⚠️ **Not suited for real-time workloads** — Disk access latency means DiskANN is better for latency-tolerant use cases or scenarios with relatively low QPS requirements | ||
|
|
||
| ## Key Parameters | ||
|
|
||
| <Callout className="text-base" type="idea"> | ||
| **Tuning Tip**: | ||
| Start with the defaults. Adjust the query-time `list_size` first to trade recall against latency. Increase `max_degree` only if you need better recall, and expect higher disk usage and longer build times. Reduce `pq_chunk_num` if you need to cut memory further and can tolerate lower recall. | ||
| </Callout> | ||
|
|
||
| ### Index-Time Parameters | ||
|
|
||
|
Contributor: Missing code example.
||
| <div className="flex flex-row flex-wrap gap-3 items-center"> | ||
| <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnIndexParam" label="Python API Reference" /> | ||
| </div> | ||
|
|
||
| | Parameter | Description | Tuning Guidance | | ||
| | --------- | ----------- | --------------- | | ||
| | `metric_type` | **Similarity metric** used to compare vectors | Choose based on how your embeddings were trained | | ||
| | `max_degree` | **Max neighbors per node** — The maximum number of edges per node in the Vamana graph | • Higher `max_degree` → <br /> ✨ better recall and graph connectivity <br /> ⚠️ more disk usage and longer build time | | ||
| | `list_size` | **Build-time candidate list size** — Number of candidates considered during graph construction when inserting a new vector | • Higher `list_size` → <br /> ✨ better graph quality and higher recall <br /> ⚠️ longer index build time | | ||
| | `pq_chunk_num` | **Number of PQ sub-spaces** — Controls how the vector dimensions are partitioned for Product Quantization | • More chunks → <br /> ✨ finer-grained distance approximation and better recall <br /> ⚠️ more memory for PQ codes (1 byte per chunk per vector) | | ||
|
|
||
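As a rough illustration of how the four index-time parameters fit together, here is a small stand-alone sketch. `DiskAnnIndexSettings` is a hypothetical container written for this example; it is not the `DiskAnnIndexParam` class from the API reference, whose constructor signature is not shown here. The values and the sanity checks are assumptions as well, not documented defaults or constraints.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiskAnnIndexSettings:
    """Hypothetical holder mirroring the parameter names in the table above."""
    metric_type: str     # similarity metric, e.g. "l2" (illustrative value)
    max_degree: int      # max edges per node in the Vamana graph
    list_size: int       # build-time candidate list size
    pq_chunk_num: int    # number of PQ sub-spaces

    def validate(self, dim: int) -> None:
        """Basic sanity checks (assumed, not taken from the library)."""
        if self.max_degree <= 0:
            raise ValueError("max_degree must be positive")
        if self.list_size <= 0:
            raise ValueError("list_size must be positive")
        # Every PQ sub-space needs at least one dimension.
        if not 1 <= self.pq_chunk_num <= dim:
            raise ValueError("pq_chunk_num must be between 1 and dim")

    def pq_bytes_per_vector(self) -> int:
        """In-memory cost of the PQ code: 1 byte per chunk."""
        return self.pq_chunk_num

# Illustrative values only.
settings = DiskAnnIndexSettings(
    metric_type="l2", max_degree=64, list_size=100, pq_chunk_num=32
)
settings.validate(dim=768)
print(settings, settings.pq_bytes_per_vector())
```

For the actual constructor and its defaults, follow the Python API Reference link above.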
| ### Query-Time Parameters | ||
|
|
||
| <div className="flex flex-row flex-wrap gap-3 items-center"> | ||
| <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnQueryParam" label="Python API Reference" /> | ||
| </div> | ||
|
|
||
| | Parameter | Description | Tuning Guidance | | ||
| | --------- | ----------- | --------------- | | ||
| | `list_size` | **Query-time candidate list size** — Determines how many candidates are maintained during beam search graph traversal | • Higher `list_size` → <br /> ✨ higher recall <br /> ⚠️ more disk I/O and higher query latency | | ||
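The effect of the query-time `list_size` can be seen in a toy beam search. The sketch below is a simplified in-memory model under stated assumptions: exact distances stand in for PQ estimates, each node expansion stands in for one disk read, and the graph, the `beam_width` and `top_k` arguments, and the fixed entry node are all made up for illustration.

```python
def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def beam_search(graph, vectors, query, entry, list_size, beam_width=2, top_k=3):
    """Greedy beam search that keeps at most list_size candidates.

    Returns the top_k closest nodes found and the number of expanded nodes
    (a stand-in for disk reads in a disk-based index).
    """
    candidates = [(sq_l2(vectors[entry], query), entry)]
    seen = {entry}
    expanded = set()
    while True:
        # Pick the closest candidates that have not been expanded yet.
        frontier = [n for _, n in candidates if n not in expanded][:beam_width]
        if not frontier:
            break
        for node in frontier:
            expanded.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    candidates.append((sq_l2(vectors[neighbor], query), neighbor))
        # Truncate to the best list_size candidates.
        candidates = sorted(candidates)[:list_size]
    return [n for _, n in candidates[:top_k]], len(expanded)

# Toy graph: 20 points on a line, each linked to its +/-1 and +/-5 neighbors.
N = 20
vectors = {i: [float(i)] for i in range(N)}
graph = {
    i: [j for j in (i - 1, i + 1, i - 5, i + 5) if 0 <= j < N]
    for i in range(N)
}
query = [13.2]

small_result, small_reads = beam_search(graph, vectors, query, entry=0, list_size=1)
large_result, large_reads = beam_search(graph, vectors, query, entry=0, list_size=8)
print(small_result, small_reads)
print(large_result, large_reads)
```

With `list_size=1` the search keeps only its single best candidate, so it can return just one result; with `list_size=8` it returns a full top-3 but expands more nodes along the way, which mirrors the recall versus I/O trade-off in the table.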
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ | |
| "flat-index", | ||
| "hnsw-index", | ||
| "hnsw-rabitq-index", | ||
| "diskann-index", | ||
| "ivf-index", | ||
| "quantization" | ||
| ], | ||
|
|
||
77 changes: 77 additions & 0 deletions
content/docs/zh/db/concepts/vector-index/diskann-index.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| --- | ||
| title: DiskANN Index | ||
| description: Disk-based approximate nearest neighbor | ||
| --- | ||
|
|
||
|
|
||
| import { CodeExampleLinkButton, PythonLinkButton, NodeJSLinkButton } from '@/components/LinkButton'; | ||
|
|
||
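| A disk-based graph index designed for **billion-scale** vector search: compressed vectors are kept in memory and full-precision vectors are stored on disk, enabling high-recall approximate nearest neighbor search while **substantially reducing memory usage**. | ||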
|
Collaborator: This should also point out DiskANN's drawback: its QPS is very low.
||
|
|
||
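| ## How It Works | ||
|
|
||
| DiskANN builds a [Vamana graph](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf) over the full dataset and stores it on disk together with the original vectors. At search time, only the PQ (Product Quantization) codes reside in memory; the graph and the full-precision vectors are read from disk on demand. | ||
|
|
||
| - **Vamana graph for navigation** 🪜 | ||
| - A single-layer graph in which each node connects to at most `max_degree` neighbors. | ||
| - The graph is built with a greedy search-and-prune strategy; an **alpha parameter** encourages long-range edges, so searches converge quickly to the query's neighborhood. | ||
| - The **medoid** (the point closest to the dataset's centroid) serves as the fixed entry point for every search. | ||
| - **Product Quantization (PQ) for distance estimation** 🔍 | ||
| - The vector space is divided into `pq_chunk_num` sub-spaces, and each sub-vector is quantized against a 256-centroid codebook (8-bit PQ codes). | ||
| - At query time, a **PQ distance lookup table** is precomputed for the query vector, so approximate distances to candidates are computed with fast table lookups instead of full-precision arithmetic. | ||
| - **Cached beam search** 🔎 | ||
| - Search starts from the medoid and explores the graph with a **beam search** strategy: multiple frontier nodes are expanded concurrently via batched disk I/O. | ||
| - Frequently accessed nodes near the entry point are cached in memory (**BFS-level caching**), reducing disk reads in hot regions. | ||
| - For each visited node, approximate distances are computed from PQ codes, and full-precision vectors are read from disk to compute exact distances for the top-k result candidates. | ||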
|
|
||
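| ## When to Use a DiskANN Index? | ||
|
|
||
| - ✅ Billion-scale datasets that **cannot fit entirely in memory** | ||
| - ✅ Cost-sensitive deployments that need to minimize memory consumption | ||
| - ✅ Batch workloads and offline analytics that can tolerate slightly higher latency than in-memory indexes | ||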
|
|
||
| <Callout className="text-base" type="idea"> | ||
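| **Best Practice**: Use DiskANN when the dataset far exceeds available memory. It delivers strong recall with memory consumption proportional only to the PQ codes. If the dataset fits in memory, prefer [HNSW](../hnsw-index/) or [HNSW-RaBitQ](../hnsw-rabitq-index/) for lower latency. | ||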
| </Callout> | ||
|
|
||
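| ## Advantages | ||
|
|
||
| 1. ✨ **Extremely low memory footprint**: only PQ-compressed codes (1 byte per chunk per vector) reside in memory, making billion-scale search feasible on commodity hardware | ||
| 1. ✨ **High recall**: the Vamana graph preserves connectivity and diversity through alpha-based pruning, and exact distances are recomputed for the final candidates | ||
| 1. ✨ **Scalable graph construction**: uses a simple greedy insert-and-prune algorithm that can be parallelized across threads | ||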
|
|
||
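| ## Trade-offs | ||
|
|
||
| 1. ⚠️ **Higher query latency**: every search requires disk I/O for graph traversal, so it is slower than pure in-memory indexes such as HNSW | ||
| 1. ⚠️ **Build-time PQ training overhead**: a KMeans-based PQ training step is required before index construction, which adds to the total build time | ||
| 1. ⚠️ **Not suited for real-time workloads**: because of disk access latency, DiskANN is better suited to latency-tolerant scenarios or those with relatively low QPS requirements | ||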
|
|
||
| ## Key Parameters | ||
|
|
||
| <Callout className="text-base" type="idea"> | ||
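| **Tuning Tip**: | ||
| Start with the defaults. Adjust the query-time `list_size` first to trade recall against latency. Increase `max_degree` only when you need better recall, and expect higher disk usage and longer build times. If you need to reduce memory further and can accept slightly lower recall, reduce `pq_chunk_num`. | ||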
| </Callout> | ||
|
|
||
| ### Index-Time Parameters | ||
|
|
||
| <div className="flex flex-row flex-wrap gap-3 items-center"> | ||
| <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnIndexParam" label="Python API Reference" /> | ||
| </div> | ||
|
|
||
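| | Parameter | Description | Tuning Guidance | | ||
| | --------- | ----------- | --------------- | | ||
| | `metric_type` | **Similarity metric** used to compare vectors | Choose based on how the embedding model was trained | | ||
| | `max_degree` | **Max neighbors per node**: the maximum number of edges per node in the Vamana graph | • Higher `max_degree` → <br /> ✨ better recall and graph connectivity <br /> ⚠️ more disk usage and longer build time | | ||
| | `list_size` | **Build-time candidate list size**: the number of candidates considered during graph construction when inserting a new vector | • Higher `list_size` → <br /> ✨ better graph quality and higher recall <br /> ⚠️ longer index build time | | ||
| | `pq_chunk_num` | **Number of PQ sub-spaces**: controls how the vector dimensions are partitioned for Product Quantization | • More chunks → <br /> ✨ finer-grained distance approximation and better recall <br /> ⚠️ more memory for PQ codes (1 byte per chunk per vector) | | ||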
|
|
||
| ### Query-Time Parameters | ||
|
|
||
| <div className="flex flex-row flex-wrap gap-3 items-center"> | ||
| <PythonLinkButton url="/api-reference/python/params/#zvec.model.param.DiskAnnQueryParam" label="Python API Reference" /> | ||
| </div> | ||
|
|
||
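| | Parameter | Description | Tuning Guidance | | ||
| | --------- | ----------- | --------------- | | ||
| | `list_size` | **Query-time candidate list size**: the number of candidates maintained during beam search graph traversal | • Higher `list_size` → <br /> ✨ higher recall <br /> ⚠️ more disk I/O and higher query latency | | ||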
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ | |
| "flat-index", | ||
| "hnsw-index", | ||
| "hnsw-rabitq-index", | ||
| "diskann-index", | ||
| "ivf-index", | ||
| "quantization" | ||
| ], | ||
|
|
||
Review comment: The citation source for DiskANN should be stated.