vecstore

The goal of vecstore is to provide a high performance data structure for vector similarity search. It introduces VectorStore, allowing for the insertion of data paired with corresponding vectors enabling efficient retrieval of elements based on similarity to a query vector.

Ideal for machine learning applications that require fast look ups over embeddings.

Additionally, vecstore exposes two vector index implementations:

HNSWIndex: Utilizes the Hierarchical Navigable Small World algorithm, implemented through the hnswlib C++ library, for highly efficient nearest neighbor searches.
KNNIndex: Utilizes naive exhaustive searches to retrieve the exact K nearest neighbors to a query.

Installation

You can install the development version of vecstore like so:

devtools::install_github("arvind-balaji/vecstore")

Example

Basic example of utilizing VectorStore with dim = 3.

library(vecstore)

dim <- 3
max_size <- 1024
index <- HNSWIndex$new(dim = dim, size = max_size)
# KNNIndex does not require a max size.
# index <- KNNIndex$new(dim = dim)
store <- VectorStore$new(index)

store$add(c(10, 10, 10), list(label = "A"))
store$add(c(38, 18, 40), list(label = "B"))
store$add(c(10, 20, 30), list(label = "C"))
store$add(c(90, 70, 15), list(label = "D"))

query <- c(91, 68, 10)
k <- 3
store$find(query, k)
#>   label Index
#> 1     D     4
#> 2     B     2
#> 3     C     3

Utilizing vector indexes directly to retrieve indices of similar items. We expect the sample_id to be the top result.

library(vecstore)

data <- as.matrix(iris[, -5])

dim = ncol(data)
n = nrow(data)

knn <- KNNIndex$new(dim)
hnsw <- HNSWIndex$new(dim, n)

for (i in 1:nrow(data)) {
  knn$add(data[i,])
  hnsw$add(data[i,])
}

sample_id <- 8
k <- 3
knn$find(data[sample_id,], k)
#> [1]  8 40 50
hnsw$find(data[sample_id,], k)
#> [1]  8 40 50

sample_id <- 9
knn$find(data[sample_id,], k)
#> [1]  9 39  4
hnsw$find(data[sample_id,], k)
#> [1]  9 39  4

Tunable Paramaters

The HNSWIndex constructor exposes two tuning parameters from hnswlib. Adjust based on utilization and data dimensionality to optimize performance.

M: This parameter determines the maximum number of connections a point can have in the graph. A higher M value leads to a more connected graph, which generally results in better recall during search but may increase index build time and memory usage.
ef_construction: This parameter defines the size of the dynamic list used in building the graph during index construction. A larger ef_construction value typically results in a more accurate index but requires more time and memory during the index construction phase.

Benchmarks

The HNSW based index offer dramatic speed up over the brute force approach.

library(vecstore)
library(microbenchmark)

data <- as.matrix(iris[, -5])
dim = ncol(data)
n = nrow(data)

knn <- KNNIndex$new(dim)
hnsw <- HNSWIndex$new(dim, n)

for (i in 1:nrow(data)) {
  knn$add(data[i,])
  hnsw$add(data[i,])
}

microbenchmark(knn$find(data[10,], 5), hnsw$find(data[10,], 5), times=100L)
#> Unit: microseconds
#>                      expr     min       lq      mean  median       uq      max
#>   knn$find(data[10, ], 5) 180.359 190.0965 216.10526 200.121 212.4620 1359.396
#>  hnsw$find(data[10, ], 5)   3.526   4.0180   5.12705   4.551   4.8585   18.696
#>  neval
#>    100
#>    100

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github		.github
R		R
inst		inst
man		man
src		src
tests		tests
.Rbuildignore		.Rbuildignore
.Rhistory		.Rhistory
.covrignore		.covrignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
codecov.yml		codecov.yml
hnsw.bin		hnsw.bin
vecstore.Rproj		vecstore.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vecstore

Installation

Example

Tunable Paramaters

Benchmarks

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

vecstore

Installation

Example

Tunable Paramaters

Benchmarks

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages