Using lvec with stringdist #16

@loukesio

Dear djvanderlaan,

Congratulations on lvec. I learned about it only today, and I have a question:
how can I efficiently combine lvec with stringdist?

In another comment I saw this neat function:

library(lvec)
library(stringdist)

a <- sample(c("jan", "pier", "tjorres", "korneel"), 1E3, replace = TRUE)
b <- sample(c("jan", "pier", "joris", "corneel"), 1E2, replace = TRUE)

chunks <- lvec::chunk(a, chunk_size = 1E1)

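dist <- lapply(chunks, function(chunk, a, b, threshold, ...) {
  # chunk holds the first and last index of this slice of a
  i <- seq(chunk[1], chunk[2])
  j <- seq_along(b)
  # all (i, j) index pairs between this slice of a and the whole of b
  res <- expand.grid(i=i, j=j)
  res$dist <- stringdist(a[res$i], b[res$j])
  # keep only the pairs within the distance threshold
  res <- res[res$dist <= threshold, ]
  res
}, a=a, b=b, threshold = 2)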

dist <- do.call(rbind, dist)

This is pretty neat, @djvanderlaan. How can your function be made to work when I have only one vector, for example:

library(tidyverse)
library(stringdist)
#> 
#> Attaching package: 'stringdist'
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

vec <- c("apple","aple","banan","bananan")
stringdistmatrix(vec, useNames = "strings")
#>         apple aple banan
#> aple        1           
#> banan       5    4      
#> bananan     6    6     2

Created on 2022-03-01 by the reprex package (v2.0.1)

I want to compare all elements of the vector pairwise.
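
One way the chunked function above might be adapted to a single vector is sketched below. It is only a rough sketch, not the package author's method. It assumes that lvec::chunk() accepts a plain character vector and returns first/last index pairs, as the snippet above uses it. The names x and pairs, and the chunk size and threshold of 2, are illustrative choices. The vector is compared with itself chunk by chunk, and the filter i < j keeps each unordered pair once and drops self-comparisons.

```r
library(lvec)
library(stringdist)

vec <- c("apple", "aple", "banan", "bananan")

# Index ranges over vec; each chunk is assumed to be c(first, last),
# as in the two-vector snippet above.
chunks <- lvec::chunk(vec, chunk_size = 2)

pairs <- lapply(chunks, function(chunk, x, threshold) {
  i <- seq(chunk[1], chunk[2])
  j <- seq_along(x)
  res <- expand.grid(i = i, j = j)
  # Keep each unordered pair once and skip comparing an element with itself.
  res <- res[res$i < res$j, ]
  res$dist <- stringdist(x[res$i], x[res$j])
  # Keep only the pairs within the distance threshold.
  res[res$dist <= threshold, ]
}, x = vec, threshold = 2)

pairs <- do.call(rbind, pairs)
```

With a threshold of 2, the rows kept should correspond to apple/aple and banan/bananan, matching the distances 1 and 2 in the stringdistmatrix output above. Each chunk still builds a grid against the whole vector before filtering, so the chunk size bounds memory use per step rather than the total number of comparisons.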
