Merged
2 changes: 1 addition & 1 deletion docs/source/api_basics.rst
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ cuVS API Basics
Memory management
-----------------

Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the [RMM](https://github.com/rapidsai/rmm) library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.
Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the `RMM <https://github.com/rapidsai/rmm>`_ library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.

RMM currently has APIs for C++ and Python.
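
The allocation pattern that RMM centralizes can be illustrated with a toy, CPU-only sketch (plain Python; this is not RMM's API, just the idea of multiple libraries drawing from one shared pool):

```python
# Toy sketch of centralized pool allocation: two "libraries" draw from
# one shared arena instead of each allocating independently.
class MemoryPool:
    """Hands out offsets from one preallocated arena (bump allocator)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise MemoryError("pool exhausted")
        offset = self.used
        self.used += nbytes
        return offset

shared_pool = MemoryPool(capacity=1 << 20)  # 1 MiB arena

def library_a_alloc(n):  # both libraries share the same pool
    return shared_pool.allocate(n)

def library_b_alloc(n):
    return shared_pool.allocate(n)

a = library_a_alloc(1024)
b = library_b_alloc(4096)
print(a, b, shared_pool.used)  # -> 0 1024 5120
```

In a real deployment, RMM plays the role of `MemoryPool` here, so every GPU-accelerated library in the process draws from the same CUDA memory pool.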

2 changes: 1 addition & 1 deletion docs/source/api_interoperability.rst
@@ -4,7 +4,7 @@ Interoperability
DLPack (C)
^^^^^^^^^^

Approximate nearest neighbor (ANN) indexes provide an interface to build and search an index via a C API. [DLPack v0.8](https://github.com/dmlc/dlpack/blob/main/README.md), a tensor interface framework, is used as the standard to interact with our C API.
Approximate nearest neighbor (ANN) indexes provide an interface to build and search an index via a C API. `DLPack v0.8 <https://github.com/dmlc/dlpack/blob/main/README.md>`_, a tensor interface framework, is used as the standard to interact with our C API.

Representing a tensor with DLPack is simple, as it is a POD struct that stores information about the tensor at runtime. At the moment, `DLManagedTensor` from DLPack v0.8 is compatible with our C API; however, we will soon upgrade to `DLManagedTensorVersioned` from DLPack v1.0, as it will help us maintain ABI and API compatibility.
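
Because `DLTensor` is a plain POD struct, its layout can be mirrored directly. The sketch below transcribes the DLPack v0.8 field layout from `dlpack.h` into Python `ctypes` for illustration (the enum values `kDLCPU = 1` and `kDLFloat = 2` come from the DLPack header; this is not cuVS code):

```python
import ctypes

# Field layout of the DLPack v0.8 POD structs, following dlpack.h.
class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int),     # DLDeviceType enum
                ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8),          # DLDataTypeCode enum
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("device", DLDevice),
                ("ndim", ctypes.c_int32),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),
                ("byte_offset", ctypes.c_uint64)]

class DLManagedTensor(ctypes.Structure):
    pass  # forward declaration: deleter takes a pointer to this struct

DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),
    ("manager_ctx", ctypes.c_void_p),
    ("deleter", ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor)))]

# Describe a 2 x 3 float32 tensor on the CPU (kDLCPU == 1, kDLFloat == 2).
shape = (ctypes.c_int64 * 2)(2, 3)
t = DLTensor(data=None,
             device=DLDevice(1, 0),
             ndim=2,
             dtype=DLDataType(2, 32, 1),
             shape=ctypes.cast(shape, ctypes.POINTER(ctypes.c_int64)),
             strides=None,          # NULL strides: compact row-major
             byte_offset=0)
print(t.ndim, t.dtype.bits, t.shape[:t.ndim])  # -> 2 32 [2, 3]
```

At runtime, a consumer of the C API fills exactly these fields to describe its buffers; no copy is made, only the metadata is exchanged.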

4 changes: 2 additions & 2 deletions docs/source/cuvs_bench/index.rst
@@ -54,7 +54,7 @@ Installing the benchmarks
There are two main ways pre-compiled benchmarks are distributed:

- `Conda`_ For users who are not using containers but want an easy-to-install Python package. Pip wheels are planned as an alternative for users who cannot use conda and prefer not to use containers.
- `Docker`_ Only needs docker and [NVIDIA docker](https://github.com/NVIDIA/nvidia-docker) to use. Provides a single docker run command for basic dataset benchmarking, as well as all the functionality of the conda solution inside the containers.
- `Docker`_ Only needs docker and `NVIDIA docker <https://github.com/NVIDIA/nvidia-docker>`_ to use. Provides a single docker run command for basic dataset benchmarking, as well as all the functionality of the conda solution inside the containers.

Conda
-----
@@ -297,7 +297,7 @@ All of the `cuvs-bench` images contain the Conda packages, so they can be used d
-v $DATA_FOLDER:/data/benchmarks \
rapidsai/cuvs-bench:26.04-cuda12.9-py3.13

This will drop you into a command line in the container, with the `cuvs-bench` python package ready to use, as described in the [Running the benchmarks](#running-the-benchmarks) section above:
This will drop you into a command line in the container, with the `cuvs-bench` python package ready to use, as described in the `Running the benchmarks`_ section above:

.. code-block:: bash

2 changes: 1 addition & 1 deletion docs/source/cuvs_bench/param_tuning.rst
@@ -533,7 +533,7 @@ IVF-pq is an inverted-file index, which partitions the vectors into a series of
- Y
- Positive integer. Power of 2 [8-64]
-
- Ratio of numbeer of chunks or subquantizers for each vector. Computed by `dims` / `M_ratio`
- Ratio of number of chunks or subquantizers for each vector. Computed by `dims` / `M_ratio`

* - `usePrecomputed`
- `build`
4 changes: 2 additions & 2 deletions docs/source/developer_guide.md
@@ -12,7 +12,7 @@ Please start by reading the [Contributor Guide](contributing.md).

Developing features and fixing bugs for the RAFT library itself is straightforward and only requires building and installing the relevant RAFT artifacts.

The process for working on a CUDA/C++ feature which might span RAFT and one or more consuming libraries can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the changed is merged to the RAFT project and before it is merged to the dependent project(s) downstream.
The process for working on a CUDA/C++ feature which might span RAFT and one or more consuming libraries can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the change is merged to the RAFT project and before it is merged to the dependent project(s) downstream.

If building a feature which spans projects and not using the source build in cmake, the RAFT changes (both C++ and Python) will need to be installed into the environment of the consuming project before they can be used. Ideally, the integration of RAFT into consuming projects enables the source build in the consuming project only for this case, while otherwise relying on more stable packaging (such as conda packaging).

@@ -390,7 +390,7 @@ int main(int argc, char * argv[])
```

A RAFT developer can assume the following:
* A instance of `raft::comms::comms_t` was correctly initialized.
* An instance of `raft::comms::comms_t` was correctly initialized.
* All processes that are part of `raft::comms::comms_t` call into the RAFT algorithm cooperatively.

The initialized instance of `raft::comms::comms_t` can be accessed from the `raft::resources` instance:
2 changes: 1 addition & 1 deletion docs/source/filtering.rst
@@ -5,7 +5,7 @@ Filtering vector indexes
~~~~~~~~~~~~~~~~~~~~~~~~

cuVS supports different types of filtering depending on the vector index being used. The main method used in all of the vector indexes
is pre-filtering, which is a technique that will into account the filtering of the vectors before computing it's closest neighbors, saving
is pre-filtering, which is a technique that will take into account the filtering of the vectors before computing its closest neighbors, saving
some computation from calculating distances.
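
The saving can be seen in a toy sketch (plain Python, not the cuVS API): with pre-filtering, a distance is only ever computed for vectors whose bit is set.

```python
# Toy pre-filtering sketch: skip distance computations for filtered-out
# vectors instead of discarding them after the fact (post-filtering).
def prefiltered_knn(query, vectors, bitset, k):
    """Return indices of the k nearest allowed vectors (squared L2)."""
    scored = []
    for i, v in enumerate(vectors):
        if not bitset[i]:        # filtered out: no distance is computed
            continue
        dist = sum((q - x) ** 2 for q, x in zip(query, v))
        scored.append((dist, i))
    scored.sort()
    return [i for _, i in scored[:k]]

vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [5.0, 5.0]]
bitset = [True, True, False, True]   # vector 2 is excluded
print(prefiltered_knn([0.0, 0.0], vectors, bitset, k=2))  # -> [0, 1]
```

Without the filter, vector 2 would be the nearest neighbor; with pre-filtering its distance is never even calculated.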

Bitset
6 changes: 3 additions & 3 deletions docs/source/getting_started.rst
@@ -43,9 +43,9 @@ Getting Started
New to vector search?
=====================

If you are unfamiliar with the basics of vector search or how vector search differs from vector databases, then :doc:`this primer on vector search guide <choosing_and_configuring_indexes>` should provide some good insight. Another good resource for the uninitiated is our :doc:`vector databases vs vector search <vector_databases_vs_vector_search>` guide. As outlined in the primer, vector search as used in vector databases is often closer to machine learning than to traditional databases. This means that while traditional databases can often be slow without any performance tuning, they will usually still yield the correct results. Unfortunately, vector search indexes, like other machine learning models, can yield garbage results of not tuned correctly.
If you are unfamiliar with the basics of vector search or how vector search differs from vector databases, then :doc:`this primer on vector search guide <choosing_and_configuring_indexes>` should provide some good insight. Another good resource for the uninitiated is our :doc:`vector databases vs vector search <vector_databases_vs_vector_search>` guide. As outlined in the primer, vector search as used in vector databases is often closer to machine learning than to traditional databases. This means that while traditional databases can often be slow without any performance tuning, they will usually still yield the correct results. Unfortunately, vector search indexes, like other machine learning models, can yield garbage results if not tuned correctly.

Fortunately, this opens up the whole world of hyperparamer optimization to improve vector search performance and quality. Please see our :doc:`index tuning guide <tuning_guide>` for more information.
Fortunately, this opens up the whole world of hyperparameter optimization to improve vector search performance and quality. Please see our :doc:`index tuning guide <tuning_guide>` for more information.

When comparing the performance of vector search indexes, it is important to consider three main dimensions:

@@ -58,7 +58,7 @@ Please see the :doc:`primer on comparing vector search index performance <compar
Supported indexes
=================

cuVS supports many of the standard index types with the list continuing to grow and stay current with the state-of-the-art. Please refer to our :doc:`vector search index guide <neighbors/neighbors>` for to learn more about each individual index type, when they can be useful on the GPU, the tuning knobs they offer to trade off performance and quality.
cuVS supports many of the standard index types with the list continuing to grow and stay current with the state-of-the-art. Please refer to our :doc:`vector search index guide <neighbors/neighbors>` to learn more about each individual index type, when it can be useful on the GPU, and the tuning knobs it offers to trade off performance and quality.

The primary goal of cuVS is to enable speed, scale, and flexibility (in that order), and one of its important value propositions is to enhance existing software deployments with extensible GPU capabilities, improving pain points without interrupting the parts of the system that already work well on CPU.

4 changes: 2 additions & 2 deletions docs/source/neighbors/cagra.rst
@@ -26,7 +26,7 @@ Filtering considerations

CAGRA supports filtered search and has an improved multi-CTA algorithm in branch-25.02 that provides reasonable recall and performance at filtering rates as high as 90% or more.

To obtain an appropriate recall in filtered search, it is necessary to set search parameters according to the filtering rate, but since it is difficult for users to to this, CAGRA automatically adjusts `itopk_size` internally according to the filtering rate on a heuristic basis. If you want to disable this automatic adjustment, set `filtering_rate`, one of the search parameters, to `0.0`, and `itopk_size` will not be adjusted automatically.
To obtain an appropriate recall in filtered search, it is necessary to set search parameters according to the filtering rate, but since it is difficult for users to do this, CAGRA automatically adjusts `itopk_size` internally according to the filtering rate on a heuristic basis. If you want to disable this automatic adjustment, set `filtering_rate`, one of the search parameters, to `0.0`, and `itopk_size` will not be adjusted automatically.

Configuration parameters
------------------------
@@ -127,7 +127,7 @@ Optimize: formula for peak memory usage (device): :math:`n\_vectors * (4 + (size
Build with out-of-core IVF-PQ peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Out-of-core CAGA build consists of IVF-PQ build, IVF-PQ search, CAGRA optimization. Note that these steps are performed sequentially, so they are not additive.
Out-of-core CAGRA build consists of IVF-PQ build, IVF-PQ search, and CAGRA optimization. Note that these steps are performed sequentially, so their peak memory usages are not additive.

IVF-PQ Build:

6 changes: 3 additions & 3 deletions docs/source/tuning_guide.rst
@@ -23,14 +23,14 @@ But how would this work when we have an index that's massively large- like 1TB?

One benefit of locally indexed vector databases is that they often scale by breaking the larger set of vectors down into smaller sets by uniformly random subsampling and training smaller vector search index models on the sub-samples. Most often, the same set of tuning parameters is applied to all of the smaller sub-index models, rather than being set individually for each one. During search, the query vectors are often sent to all of the sub-indexes and the resulting neighbor lists are reduced down to `k` based on the closest distances (or similarities).
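
The search-side reduction described above can be sketched as follows (toy Python; it assumes each sub-index already returns its candidates sorted ascending by distance):

```python
import heapq

# Each sub-index returns its own (distance, vector_id) candidates;
# the lists are merged by distance and reduced down to k.
def merge_subindex_results(per_subindex_results, k):
    """per_subindex_results: list of distance-sorted (distance, id) lists.
    Returns the ids of the global top-k."""
    merged = heapq.merge(*per_subindex_results)  # streams by distance
    return [vid for _, vid in list(merged)[:k]]

sub_a = [(0.1, 17), (0.4, 3)]
sub_b = [(0.2, 42), (0.9, 8)]
sub_c = [(0.3, 5)]
print(merge_subindex_results([sub_a, sub_b, sub_c], k=3))  # -> [17, 42, 5]
```

The merge itself is cheap; the cost of this scheme is that every query fans out to every sub-index.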

Because many databases use this sub-sampling trick, it's possible to perform an automated parameter tuning on the larger index just by randomly samplnig some number of vectors from it, splitting them into disjoint train/test/eval datasets, computing ground truth with brute-force, and then performing a hyper-parameter optimization on it. This procedure can also be repeated multiple times to simulate a monte-carlo cross validation.
Because many databases use this sub-sampling trick, it's possible to perform an automated parameter tuning on the larger index just by randomly sampling some number of vectors from it, splitting them into disjoint train/test/eval datasets, computing ground truth with brute-force, and then performing a hyper-parameter optimization on it. This procedure can also be repeated multiple times to simulate a monte-carlo cross validation.

GPUs are naturally great at performing massively parallel tasks, especially when they are largely independent tasks, such as training and evaluating models with different hyper-parameter settings in parallel. Hyper-parameter optimization also lends itself well to distributed processing, such as multi-node multi-GPU operation.

Steps to achieve automated tuning
=================================

More formally, an automated parameter tuning workflow with monte-carlo cross-validaton looks likes something like this:
More formally, an automated parameter tuning workflow with monte-carlo cross-validation looks something like this:

#. Ingest a large dataset into the vector database of your choice

@@ -42,7 +42,7 @@ More formally, an automated parameter tuning workflow with monte-carlo cross-val

#. Use the test set to compute ground truth on the vectors from the prior step against all vectors in the training set.

#. Start the HPO tuning process for the training set, using the test vectors for the query set. It's important to make sure your HPO is multi-objective and optimizes for: a) low build time, b) high throughput or low latency sarch (depending on needs), and c) acceptable recall.
#. Start the HPO tuning process for the training set, using the test vectors for the query set. It's important to make sure your HPO is multi-objective and optimizes for: a) low build time, b) high throughput or low latency search (depending on needs), and c) acceptable recall.

#. Use the evaluation dataset to test that the optimal hyper-parameters generalize to unseen points that were not used in the optimization process.
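
The subsample/split/ground-truth/recall portion of the workflow above can be sketched end-to-end with toy data (plain Python; `brute_force_knn` and the recall measure are illustrative stand-ins, not cuVS functions):

```python
import random

# Sketch: subsample, split into disjoint train/test/eval sets, compute
# brute-force ground truth, and score a candidate configuration by recall.
random.seed(0)
dataset = [[random.random() for _ in range(4)] for _ in range(200)]

sample = random.sample(dataset, 100)            # uniform random subsample
train, test, evalset = sample[:80], sample[80:90], sample[90:]
# evalset is held out to check that tuned parameters generalize.

def brute_force_knn(query, pool, k):
    """Exact k nearest neighbors in pool by squared L2 distance."""
    order = sorted(range(len(pool)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(query, pool[i])))
    return set(order[:k])

k = 10
# Ground truth: exact neighbors of each test vector within the train set.
ground_truth = [brute_force_knn(q, train, k) for q in test]

def recall(approx_results, truth):
    hits = sum(len(a & t) for a, t in zip(approx_results, truth))
    return hits / (len(truth) * k)

# An exact "candidate" trivially reaches recall 1.0; a real HPO loop would
# evaluate approximate index configurations here instead and optimize
# build time, throughput/latency, and recall jointly.
candidate = [brute_force_knn(q, train, k) for q in test]
print(recall(candidate, ground_truth))  # -> 1.0
```

Repeating the subsample-and-split step with fresh random draws gives the monte-carlo cross-validation described earlier.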
