Tree#
K-Nearest Neighbor Search#
Spatial tree structures can dramatically speed up nearest neighbor searching compared to brute-force approaches that require distances between all points to be calculated, however, there is an added up-front cost of constructing the tree. If only a single search needs to be performed, the brute-force approach may be the most efficient option, but if performing multiple searches, or other operations that can utilize the tree structure, performance can be enhanced significantly.
Here the performance of multiple tree structures are compared to a brute-force search.
A quadtree (QuadtreeNode), KD tree (KDtreeNode), and SciPy’s implementation of a scipy.spatial.KDTree are compared to a brute force search for 2D point data.
For 3D point data, an octree (OctreeNode), is used in place of the quadtree.
Note that the KD tree can be used for any k-dimensional data.
2D Point Data#
d = 2 # Dimensions
k = 3 # Number of nearest neighbors
points = np.random.rand(n,d)
x = np.random.rand(n)
# quadtree build
quadtree = tree.Points2Quadtree(points)
# quadtree search
qtree_out = quadtree.query_knn(x, k=k)
# kdtree build
kdtree = tree.Points2KDtree(points)
# kdtree search
kdtree_out = kdtree.query_knn(x, k=k)
# scipy kdtree build
sci_kdtree = scipy.spatial.KDTree(points)
# scipy kdtree search
sci_kdtree_out = sci_kdtree.query(x, k=k)
sci_kdtree_out = (np.atleast_1d(sci_kdtree_out[0]), np.atleast_2d(points[sci_kdtree_out[1]]))
# brute-force
d = scipy.spatial.distance.cdist(points, np.atleast_2d(x)).flatten()
indices = np.argpartition(d, k)[:k]
distances = d[indices]
brute_out = (distances, points[indices])
3D Point Data#
d = 3 # Dimensions
k = 3 # Number of nearest neighbors
points = np.random.rand(n,d)
x = np.random.rand(n)
# octree build
octree = tree.Points2Octree(points)
# octree search
otree_out = octree.query_knn(x, k=k)
# kdtree build
kdtree = tree.Points2KDtree(points)
# kdtree search
kdtree_out = kdtree.query_knn(x, k=k)
# scipy kdtree build
sci_kdtree = scipy.spatial.KDTree(points)
# scipy kdtree search
sci_kdtree_out = sci_kdtree.query(x, k=k)
sci_kdtree_out = (np.atleast_1d(sci_kdtree_out[0]), np.atleast_2d(points[sci_kdtree_out[1]]))
# brute-force
d = scipy.spatial.distance.cdist(points, np.atleast_2d(x)).flatten()
indices = np.argpartition(d, k)[:k]
distances = d[indices]
brute_out = (distances, points[indices])