Range tree

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of (in Big O notation) $$O(\log^dn+k)$$ but worse storage of $$O(n\log^{d-1} n)$$, where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

Bernard Chazelle improved this to query time $$O(\log^{d-1} n + k)$$ and space complexity $$O\left(n\left(\frac{\log n}{\log\log n}\right)^{d-1}\right)$$.

Data structure


A range tree on a set of 1-dimensional points is a balanced binary search tree on those points. The points stored in the tree are stored in the leaves of the tree; each internal node stores the largest value of its left subtree. A range tree on a set of points in d-dimensions is a recursively defined multi-level binary search tree. Each level of the data structure is a binary search tree on one of the d-dimensions. The first level is a binary search tree on the first of the d-coordinates. Each vertex v of this tree contains an associated structure that is a (d−1)-dimensional range tree on the last (d−1)-coordinates of the points stored in the subtree of v.

Construction
A 1-dimensional range tree on a set of n points is a binary search tree, which can be constructed in $$O(n \log n)$$ time. Range trees in higher dimensions are constructed recursively by constructing a balanced binary search tree on the first coordinate of the points, and then, for each vertex v in this tree, constructing a (d−1)-dimensional range tree on the points contained in the subtree of v. Constructing a range tree this way would require $$O(n \log ^d n)$$ time.

This construction time can be improved for 2-dimensional range trees to $$O(n \log n)$$. Let S be a set of n 2-dimensional points. If S contains only one point, return a leaf containing that point. Otherwise, construct the associated structure of S, a 1-dimensional range tree on the y-coordinates of the points in S. Let xm be the median x-coordinate of the points. Let SL be the set of points with x-coordinate less than or equal to xm and let SR be the set of points with x-coordinate greater than xm. Recursively construct vL, a 2-dimensional range tree on SL, and vR, a 2-dimensional range tree on SR. Create a vertex v with left-child vL and right-child vR. If we sort the points by their y-coordinates at the start of the algorithm, and maintain this ordering when splitting the points by their x-coordinate, we can construct the associated structures of each subtree in linear time. This reduces the time to construct a 2-dimensional range tree to $$O(n \log n)$$, and also reduces the time to construct a d-dimensional range tree to $$O(n \log^{d - 1} n)$$.

Range queries


A range query on a range tree reports the set of points that lie inside a given interval. To report the points that lie in the interval [x1, x2], we start by searching for x1 and x2. At some vertex in the tree, the search paths to x1 and x2 will diverge. Let vsplit be the last vertex that these two search paths have in common. For every vertex v in the search path from vsplit to x1, if the value stored at v is greater than x1, report every point in the right-subtree of v. If v is a leaf, report the value stored at v if it is inside the query interval. Similarly, reporting all of the points stored in the left-subtrees of the vertices with values less than x2 along the search path from vsplit to x2, and report the leaf of this path if it lies within the query interval.

Since the range tree is a balanced binary tree, the search paths to x1 and x2 have length $$O(\log n)$$. Reporting all of the points stored in the subtree of a vertex can be done in linear time using any tree traversal algorithm. It follows that the time to perform a range query is $$O(\log n + k)$$, where k is the number of points in the query interval.

Range queries in d-dimensions are similar. Instead of reporting all of the points stored in the subtrees of the search paths, perform a (d−1)-dimensional range query on the associated structure of each subtree. Eventually, a 1-dimensional range query will be performed and the correct points will be reported. Since a d-dimensional query consists of $$O(\log n)$$ (d−1)-dimensional range queries, it follows that the time required to perform a d-dimensional range query is $$O(\log^{d} n + k)$$, where k is the number of points in the query interval. This can be reduced to $$O(\log^{d - 1} n + k)$$ using a variant of fractional cascading.