A K-medoid priority-tree approach for the Fast Library for Approximate Nearest Neighbors (FLANN)
Typical FLANN implementations use one of two index types:
- Kd-trees: require a dimensionally additive distance measure, i.e., one that decomposes into a sum of per-dimension contributions
- K-means trees: require constructing a cluster center of mass in some n-dimensional space
Both fail when more complex distance measures are needed, or when a "center of mass" is difficult to define (as in graph clustering).
Here we use the K-medoids algorithm (https://en.wikipedia.org/wiki/K-medoids) to construct a K-medoid priority tree akin to the K-means priority-tree version of FLANN.
- Templated to work on arbitrary data objects with user-defined distance functions
- Sparse distance matrix storage, minimizing repetitive distance computations
- Radius search and top-M nearest-neighbor (NN) search
The tree is constructed as follows:
1. Take N samples {x_i}, a distance function f(x_i, x_j), and a branching factor k
2. Construct the tree root as the full ensemble of data points
3. Select k samples at random from x to serve as medoids, y
4. Assign every sample to its nearest medoid: C_m = { x_i : f(x_i, y_m) < f(x_i, y_j) for all j != m }
5. In each cluster C_m, find the sample with minimum total internal distance: argmin_i sum_{j=1..|C_m|} f(C_{m,i}, C_{m,j})
6. Update the medoids to the samples found in step 5
7. Go to step 4 until converged or an iteration limit is reached
8. For each cluster C_i, construct a new node in the tree as a descendant of the root, containing only the data points in C_i
9. Apply steps 3-8 recursively to each new node; a node becomes a leaf when it contains fewer than k samples
Planned work:
- Refactor the monolithic header into individual files for easier reading
- Write simple example code
- Refactor to remove the Armadillo dependency
- Add OpenMP and/or MPI support for parallelized search through the tree
  - Parallelize both batch and individual NN searches
Built using the Armadillo C++ Linear Algebra Library (http://arma.sourceforge.net/)
[1] M. Muja and D. G. Lowe, "Scalable Nearest Neighbor Algorithms for High Dimensional Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227-2240, Nov. 2014.
[2] M. Muja, "Scalable nearest neighbour methods for high dimensional data," Doctoral Thesis, University of British Columbia, 2013.