- [ ] Workflow (functions, tests, docs) for using LSH and Jaccard similarity for near-duplicate identification and removal
- [ ] Update the near_duplicates.Rmd vignette
- [ ] Benchmark with spam_grams