Merged
2 changes: 1 addition & 1 deletion docs/source/api_basics.rst
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ cuVS API Basics
Memory management
-----------------

Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the [RMM](https://github.com/rapidsai/rmm) library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.
Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the `RMM <https://github.com/rapidsai/rmm>`_ library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.

RMM currently has APIs for C++ and Python.
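
The allocation pattern that RMM centralizes can be illustrated with a toy, CPU-only sketch (plain Python; this is not RMM's API, just the idea of multiple libraries drawing from one shared pool):

```python
# Toy sketch of centralized pool allocation: two "libraries" draw from
# one shared arena instead of each allocating independently.
class MemoryPool:
    """Hands out offsets from one preallocated arena (bump allocator)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise MemoryError("pool exhausted")
        offset = self.used
        self.used += nbytes
        return offset

shared_pool = MemoryPool(capacity=1 << 20)  # 1 MiB arena

def library_a_alloc(n):  # both libraries share the same pool
    return shared_pool.allocate(n)

def library_b_alloc(n):
    return shared_pool.allocate(n)

a = library_a_alloc(1024)
b = library_b_alloc(4096)
print(a, b, shared_pool.used)  # -> 0 1024 5120
```

In a real deployment, RMM plays the role of `MemoryPool` here, so every GPU-accelerated library in the process draws from the same CUDA memory pool.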

2 changes: 1 addition & 1 deletion docs/source/api_interoperability.rst
@@ -4,7 +4,7 @@ Interoperability
DLPack (C)
^^^^^^^^^^

Approximate nearest neighbor (ANN) indexes provide an interface to build and search an index via a C API. [DLPack v0.8](https://github.com/dmlc/dlpack/blob/main/README.md), a tensor interface framework, is used as the standard to interact with our C API.
Approximate nearest neighbor (ANN) indexes provide an interface to build and search an index via a C API. `DLPack v0.8 <https://github.com/dmlc/dlpack/blob/main/README.md>`_, a tensor interface framework, is used as the standard to interact with our C API.

Representing a tensor with DLPack is simple, as it is a POD struct that stores information about the tensor at runtime. At the moment, `DLManagedTensor` from DLPack v0.8 is compatible with our C API; however, we will soon upgrade to `DLManagedTensorVersioned` from DLPack v1.0, as it will help us maintain ABI and API compatibility.
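
Because `DLTensor` is a plain POD struct, its layout can be mirrored directly. The sketch below transcribes the DLPack v0.8 field layout from `dlpack.h` into Python `ctypes` for illustration (the enum values `kDLCPU = 1` and `kDLFloat = 2` come from the DLPack header; this is not cuVS code):

```python
import ctypes

# Field layout of the DLPack v0.8 POD structs, following dlpack.h.
class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int),     # DLDeviceType enum
                ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8),          # DLDataTypeCode enum
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("device", DLDevice),
                ("ndim", ctypes.c_int32),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),
                ("byte_offset", ctypes.c_uint64)]

class DLManagedTensor(ctypes.Structure):
    pass  # forward declaration: deleter takes a pointer to this struct

DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),
    ("manager_ctx", ctypes.c_void_p),
    ("deleter", ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor)))]

# Describe a 2 x 3 float32 tensor on the CPU (kDLCPU == 1, kDLFloat == 2).
shape = (ctypes.c_int64 * 2)(2, 3)
t = DLTensor(data=None,
             device=DLDevice(1, 0),
             ndim=2,
             dtype=DLDataType(2, 32, 1),
             shape=ctypes.cast(shape, ctypes.POINTER(ctypes.c_int64)),
             strides=None,          # NULL strides: compact row-major
             byte_offset=0)
print(t.ndim, t.dtype.bits, t.shape[:t.ndim])  # -> 2 32 [2, 3]
```

At runtime, a consumer of the C API fills exactly these fields to describe its buffers; no copy is made, only the metadata is exchanged.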

4 changes: 2 additions & 2 deletions docs/source/cuvs_bench/index.rst
@@ -54,7 +54,7 @@ Installing the benchmarks
There are two main ways pre-compiled benchmarks are distributed:

- `Conda`_ For users who are not using containers but want an easy-to-install Python package. Pip wheels are planned as an alternative for users who cannot use conda and prefer not to use containers.
- `Docker`_ Only needs docker and [NVIDIA docker](https://github.com/NVIDIA/nvidia-docker) to use. Provides a single docker run command for basic dataset benchmarking, as well as all the functionality of the conda solution inside the containers.
- `Docker`_ Only needs docker and `NVIDIA docker <https://github.com/NVIDIA/nvidia-docker>`_ to use. Provides a single docker run command for basic dataset benchmarking, as well as all the functionality of the conda solution inside the containers.

Conda
-----
@@ -297,7 +297,7 @@ All of the `cuvs-bench` images contain the Conda packages, so they can be used d
-v $DATA_FOLDER:/data/benchmarks \
rapidsai/cuvs-bench:26.04-cuda12.9-py3.13

This will drop you into a command line in the container, with the `cuvs-bench` python package ready to use, as described in the [Running the benchmarks](#running-the-benchmarks) section above:
This will drop you into a command line in the container, with the `cuvs-bench` python package ready to use, as described in the `Running the benchmarks`_ section above:

.. code-block:: bash

2 changes: 1 addition & 1 deletion docs/source/cuvs_bench/param_tuning.rst
@@ -533,7 +533,7 @@ IVF-pq is an inverted-file index, which partitions the vectors into a series of
- Y
- Positive integer. Power of 2 [8-64]
-
- Ratio of numbeer of chunks or subquantizers for each vector. Computed by `dims` / `M_ratio`
- Ratio of number of chunks or subquantizers for each vector. Computed by `dims` / `M_ratio`

* - `usePrecomputed`
- `build`
4 changes: 2 additions & 2 deletions docs/source/developer_guide.md
@@ -12,7 +12,7 @@ Please start by reading the [Contributor Guide](contributing.md).

Developing features and fixing bugs for the RAFT library itself is straightforward and only requires building and installing the relevant RAFT artifacts.

The process for working on a CUDA/C++ feature which might span RAFT and one or more consuming libraries can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the changed is merged to the RAFT project and before it is merged to the dependent project(s) downstream.
The process for working on a CUDA/C++ feature which might span RAFT and one or more consuming libraries can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the change is merged to the RAFT project and before it is merged to the dependent project(s) downstream.

If building a feature which spans projects and not using the source build in cmake, the RAFT changes (both C++ and Python) will need to be installed into the environment of the consuming project before they can be used. Ideally, the integration of RAFT into consuming projects enables the source build in the consuming project only for this case, while otherwise relying on more stable packaging (such as conda packaging).

@@ -390,7 +390,7 @@ int main(int argc, char * argv[])
```

A RAFT developer can assume the following:
* A instance of `raft::comms::comms_t` was correctly initialized.
* An instance of `raft::comms::comms_t` was correctly initialized.
* All processes that are part of `raft::comms::comms_t` call into the RAFT algorithm cooperatively.

The initialized instance of `raft::comms::comms_t` can be accessed from the `raft::resources` instance:
2 changes: 1 addition & 1 deletion docs/source/filtering.rst
@@ -5,7 +5,7 @@ Filtering vector indexes
~~~~~~~~~~~~~~~~~~~~~~~~

cuVS supports different types of filtering depending on the vector index being used. The main method used in all of the vector indexes
is pre-filtering, which is a technique that will into account the filtering of the vectors before computing it's closest neighbors, saving
is pre-filtering, which is a technique that will take into account the filtering of the vectors before computing its closest neighbors, saving
some computation from calculating distances.
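
The saving can be seen in a toy sketch (plain Python, not the cuVS API): with pre-filtering, a distance is only ever computed for vectors whose bit is set.

```python
# Toy pre-filtering sketch: skip distance computations for filtered-out
# vectors instead of discarding them after the fact (post-filtering).
def prefiltered_knn(query, vectors, bitset, k):
    """Return indices of the k nearest allowed vectors (squared L2)."""
    scored = []
    for i, v in enumerate(vectors):
        if not bitset[i]:        # filtered out: no distance is computed
            continue
        dist = sum((q - x) ** 2 for q, x in zip(query, v))
        scored.append((dist, i))
    scored.sort()
    return [i for _, i in scored[:k]]

vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [5.0, 5.0]]
bitset = [True, True, False, True]   # vector 2 is excluded
print(prefiltered_knn([0.0, 0.0], vectors, bitset, k=2))  # -> [0, 1]
```

Without the filter, vector 2 would be the nearest neighbor; with pre-filtering its distance is never even calculated.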

Bitset
6 changes: 3 additions & 3 deletions docs/source/getting_started.rst
@@ -43,9 +43,9 @@ Getting Started
New to vector search?
=====================

If you are unfamiliar with the basics of vector search or how vector search differs from vector databases, then :doc:`this primer on vector search guide <choosing_and_configuring_indexes>` should provide some good insight. Another good resource for the uninitiated is our :doc:`vector databases vs vector search <vector_databases_vs_vector_search>` guide. As outlined in the primer, vector search as used in vector databases is often closer to machine learning than to traditional databases. This means that while traditional databases can often be slow without any performance tuning, they will usually still yield the correct results. Unfortunately, vector search indexes, like other machine learning models, can yield garbage results of not tuned correctly.
If you are unfamiliar with the basics of vector search or how vector search differs from vector databases, then :doc:`this primer on vector search guide <choosing_and_configuring_indexes>` should provide some good insight. Another good resource for the uninitiated is our :doc:`vector databases vs vector search <vector_databases_vs_vector_search>` guide. As outlined in the primer, vector search as used in vector databases is often closer to machine learning than to traditional databases. This means that while traditional databases can often be slow without any performance tuning, they will usually still yield the correct results. Unfortunately, vector search indexes, like other machine learning models, can yield garbage results if not tuned correctly.

Fortunately, this opens up the whole world of hyperparamer optimization to improve vector search performance and quality. Please see our :doc:`index tuning guide <tuning_guide>` for more information.
Fortunately, this opens up the whole world of hyperparameter optimization to improve vector search performance and quality. Please see our :doc:`index tuning guide <tuning_guide>` for more information.

When comparing the performance of vector search indexes, it is important to consider three main dimensions:

@@ -58,7 +58,7 @@ Please see the :doc:`primer on comparing vector search index performance <compar
Supported indexes
=================

cuVS supports many of the standard index types with the list continuing to grow and stay current with the state-of-the-art. Please refer to our :doc:`vector search index guide <neighbors/neighbors>` for to learn more about each individual index type, when they can be useful on the GPU, the tuning knobs they offer to trade off performance and quality.
cuVS supports many of the standard index types with the list continuing to grow and stay current with the state-of-the-art. Please refer to our :doc:`vector search index guide <neighbors/neighbors>` to learn more about each individual index type, when it can be useful on the GPU, and the tuning knobs it offers to trade off performance and quality.

The primary goal of cuVS is to enable speed, scale, and flexibility (in that order), and one of its important value propositions is to enhance existing software deployments with extensible GPU capabilities, improving pain points without interrupting the parts of the system that already work well on CPU.

4 changes: 2 additions & 2 deletions docs/source/neighbors/cagra.rst
@@ -26,7 +26,7 @@ Filtering considerations

CAGRA supports filtered search and has an improved multi-CTA algorithm in branch-25.02 that provides reasonable recall and performance at filtering rates as high as 90% or more.

To obtain an appropriate recall in filtered search, it is necessary to set search parameters according to the filtering rate, but since it is difficult for users to to this, CAGRA automatically adjusts `itopk_size` internally according to the filtering rate on a heuristic basis. If you want to disable this automatic adjustment, set `filtering_rate`, one of the search parameters, to `0.0`, and `itopk_size` will not be adjusted automatically.
To obtain an appropriate recall in filtered search, it is necessary to set search parameters according to the filtering rate, but since it is difficult for users to do this, CAGRA automatically adjusts `itopk_size` internally according to the filtering rate on a heuristic basis. If you want to disable this automatic adjustment, set `filtering_rate`, one of the search parameters, to `0.0`, and `itopk_size` will not be adjusted automatically.

Configuration parameters
------------------------
@@ -127,7 +127,7 @@ Optimize: formula for peak memory usage (device): :math:`n\_vectors * (4 + (size
Build with out-of-core IVF-PQ peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Out-of-core CAGA build consists of IVF-PQ build, IVF-PQ search, CAGRA optimization. Note that these steps are performed sequentially, so they are not additive.
Out-of-core CAGRA build consists of IVF-PQ build, IVF-PQ search, and CAGRA optimization. Note that these steps are performed sequentially, so their peak memory usages are not additive.

IVF-PQ Build:

6 changes: 3 additions & 3 deletions docs/source/tuning_guide.rst
@@ -23,14 +23,14 @@ But how would this work when we have an index that's massively large- like 1TB?

One benefit of locally indexed vector databases is that they often scale by breaking the larger set of vectors down into smaller sets by uniformly random subsampling and training smaller vector search index models on the sub-samples. Most often, the same set of tuning parameters is applied to all of the smaller sub-index models, rather than being set individually for each one. During search, the query vectors are often sent to all of the sub-indexes and the resulting neighbor lists are reduced down to `k` based on the closest distances (or similarities).
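
The search-side reduction described above can be sketched as follows (toy Python; it assumes each sub-index already returns its candidates sorted ascending by distance):

```python
import heapq

# Each sub-index returns its own (distance, vector_id) candidates;
# the lists are merged by distance and reduced down to k.
def merge_subindex_results(per_subindex_results, k):
    """per_subindex_results: list of distance-sorted (distance, id) lists.
    Returns the ids of the global top-k."""
    merged = heapq.merge(*per_subindex_results)  # streams by distance
    return [vid for _, vid in list(merged)[:k]]

sub_a = [(0.1, 17), (0.4, 3)]
sub_b = [(0.2, 42), (0.9, 8)]
sub_c = [(0.3, 5)]
print(merge_subindex_results([sub_a, sub_b, sub_c], k=3))  # -> [17, 42, 5]
```

The merge itself is cheap; the cost of this scheme is that every query fans out to every sub-index.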

Because many databases use this sub-sampling trick, it's possible to perform an automated parameter tuning on the larger index just by randomly samplnig some number of vectors from it, splitting them into disjoint train/test/eval datasets, computing ground truth with brute-force, and then performing a hyper-parameter optimization on it. This procedure can also be repeated multiple times to simulate a monte-carlo cross validation.
Because many databases use this sub-sampling trick, it's possible to perform an automated parameter tuning on the larger index just by randomly sampling some number of vectors from it, splitting them into disjoint train/test/eval datasets, computing ground truth with brute-force, and then performing a hyper-parameter optimization on it. This procedure can also be repeated multiple times to simulate a monte-carlo cross validation.

GPUs are naturally great at performing massively parallel tasks, especially when they are largely independent tasks, such as training and evaluating models with different hyper-parameter settings in parallel. Hyper-parameter optimization also lends itself well to distributed processing, such as multi-node multi-GPU operation.

Steps to achieve automated tuning
=================================

More formally, an automated parameter tuning workflow with monte-carlo cross-validaton looks likes something like this:
More formally, an automated parameter tuning workflow with monte-carlo cross-validation looks something like this:

#. Ingest a large dataset into the vector database of your choice

@@ -42,7 +42,7 @@ More formally, an automated parameter tuning workflow with monte-carlo cross-val

#. Use the test set to compute ground truth on the vectors from the prior step against all vectors in the training set.

#. Start the HPO tuning process for the training set, using the test vectors for the query set. It's important to make sure your HPO is multi-objective and optimizes for: a) low build time, b) high throughput or low latency sarch (depending on needs), and c) acceptable recall.
#. Start the HPO tuning process for the training set, using the test vectors for the query set. It's important to make sure your HPO is multi-objective and optimizes for: a) low build time, b) high throughput or low latency search (depending on needs), and c) acceptable recall.

#. Use the evaluation dataset to test that the optimal hyper-parameters generalize to unseen points that were not used in the optimization process.
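
The subsample/split/ground-truth/recall portion of the workflow above can be sketched end-to-end with toy data (plain Python; `brute_force_knn` and the recall measure are illustrative stand-ins, not cuVS functions):

```python
import random

# Sketch: subsample, split into disjoint train/test/eval sets, compute
# brute-force ground truth, and score a candidate configuration by recall.
random.seed(0)
dataset = [[random.random() for _ in range(4)] for _ in range(200)]

sample = random.sample(dataset, 100)            # uniform random subsample
train, test, evalset = sample[:80], sample[80:90], sample[90:]
# evalset is held out to check that tuned parameters generalize.

def brute_force_knn(query, pool, k):
    """Exact k nearest neighbors in pool by squared L2 distance."""
    order = sorted(range(len(pool)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(query, pool[i])))
    return set(order[:k])

k = 10
# Ground truth: exact neighbors of each test vector within the train set.
ground_truth = [brute_force_knn(q, train, k) for q in test]

def recall(approx_results, truth):
    hits = sum(len(a & t) for a, t in zip(approx_results, truth))
    return hits / (len(truth) * k)

# An exact "candidate" trivially reaches recall 1.0; a real HPO loop would
# evaluate approximate index configurations here instead and optimize
# build time, throughput/latency, and recall jointly.
candidate = [brute_force_knn(q, train, k) for q in test]
print(recall(candidate, ground_truth))  # -> 1.0
```

Repeating the subsample-and-split step with fresh random draws gives the monte-carlo cross-validation described earlier.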
