Such a ν must exist because the query algorithm starts at the root of the tree. Note that ν cannot be the root of a subtree whose points have been completely reported, because otherwise x_j would also have been reported. Hence, ν must be on the path to x, on the path to x′, or on both. All three cases can be dealt with in a similar manner, so we will investigate only the first. If ν is on the way to x then the search path has to go to the right at ν, because otherwise x ≤ v(ν) would hold, so that all points in the right subtree of ν, including x_j, would have been reported. But if the search path turns to the right then this means that x > v(ν). Moreover, since ν is the lowest ancestor of μ on the search path, μ must lie in the left subtree of ν in this situation, which shows that x_j ≤ v(ν) < x. This contradicts x_j ∈ [x, x′].

Let us analyze the computational complexity. Because our data structure is a balanced binary search tree, it needs O(N) space and O(N log N) time to be built. Assume that k points are contained in [x, x′]. The number of non-leaf nodes in a balanced binary tree is bounded by the number of leaves, so each report call in the while-loop can be done in time proportional to the number of points it reports. Since altogether we report k points, all report calls together take O(k) time.

The remaining time is spent in nodes on the two search paths. Since we have a balanced binary tree, each search path has length O(log N). Hence, Algorithm 10 needs O(k + log N) time to answer a one-dimensional range query. Note also that the number of subtrees that are reported is bounded by the number of nodes on the search paths and hence by O(log N).

Proposition 14.12 Given N points x_1, . . . , x_N ∈ R, a balanced binary search tree can be built in O(N) space and O(N log N) time, so that every range query yielding k points can be answered in O(log N + k) time.
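The bound of Proposition 14.12 can be illustrated in a few lines of Python. This is a sketch rather than the tree-based implementation: the sorted list plays the role of the leaf level of the balanced binary search tree, and `bisect` locates the two search-path endpoints in O(log N) before the k points are reported in O(k).

```python
from bisect import bisect_left, bisect_right

def range_query_1d(sorted_points, lo, hi):
    """Report all points in [lo, hi] from a presorted list.

    bisect_left/bisect_right find the boundary positions of the
    query interval in O(log N); the slice then reports the k
    points inside it in O(k)."""
    i = bisect_left(sorted_points, lo)
    j = bisect_right(sorted_points, hi)
    return sorted_points[i:j]

pts = sorted([5.0, 1.0, 3.0, 9.0, 7.0, 2.0])
print(range_query_1d(pts, 2.0, 7.0))  # [2.0, 3.0, 5.0, 7.0]
```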

How can we use these ideas in the multivariate setting? We have already encountered a possible generalization, in the form of the kd-tree, which did not lead to the desired result (cf. Proposition 14.9).

Now we want to trade space for time. If we allow the data structure to use more space than O(N), it is possible to achieve better results even in the worst case. The range tree construction requires the additional assumption that all points are pairwise different in every coordinate. One possibility for meeting this assumption is cyclic concatenation: for every coordinate j we introduce a new order that allows us to distinguish pairwise-different points x ≠ y ∈ R^d having the same jth coordinate.
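One way to read this cyclic concatenation (a hypothetical sketch, not necessarily the exact order the text has in mind) is as a comparison key that looks at the jth coordinate first and breaks ties by the remaining coordinates in cyclic order, so that distinct points never compare as equal:

```python
def key_for_coordinate(point, j):
    """Comparison key for coordinate j: compare the jth coordinate
    first, then break ties by the remaining coordinates in cyclic
    order (j+1, j+2, ..., wrapping around).  Pairwise-different
    points are thereby totally ordered for every j, even if their
    jth coordinates coincide."""
    d = len(point)
    return tuple(point[(j + i) % d] for i in range(d))

p, q = (1.0, 2.0), (1.0, 5.0)   # same first coordinate
assert key_for_coordinate(p, 0) < key_for_coordinate(q, 0)
```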

Definition 14.13 A d-dimensional range tree is defined recursively. For d = 1, we have a one-dimensional range tree, which is simply a balanced binary search tree. For d ≥ 2, a d-dimensional range tree is a balanced binary search tree with respect to the first coordinate. The leaves of this tree contain the d-variate points. The points stored in the leaves of the subtree rooted at node ν are called the canonical subset of ν. We denote this set by P(ν). Each node contains an additional pointer to a (d − 1)-dimensional range tree, the associated tree Tassoc(ν). This associated range tree Tassoc(ν) is built on the points in P(ν), restricted to the last d − 1 coordinates.

Algorithm 11 describes how a d-dimensional range tree can be built. However, the algorithm has to be modified to achieve the optimal construction time, because the construction of a balanced binary search tree from unsorted data costs O(N log N) time. In the one-dimensional setting this is optimal, but using it in the multivariate setting leads to a


complexity that is too high. Instead, we sort the data points with respect to each coordinate, which costs an additional O(dN) space and O(dN log N) time. With presorted points a balanced binary search tree can be constructed bottom-up in linear time. We will use this in the following analysis.
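A sketch of such a linear-time construction from presorted points (a top-down variant achieving the same linear bound; passing indices instead of copying sublists keeps the work per node constant):

```python
class Node:
    """Inner node: value is the splitting value v(ν), the largest
    value in the left subtree.  Leaf: value is the point itself."""
    __slots__ = ("value", "left", "right")
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build_balanced(pts, lo=0, hi=None):
    """Build a balanced binary search tree on the presorted slice
    pts[lo:hi] in linear time: split at the median and recurse.
    The points end up in the leaves, in sorted order."""
    if hi is None:
        hi = len(pts)
    if hi - lo == 1:
        return Node(pts[lo])                 # leaf stores the point
    mid = lo + (hi - lo + 1) // 2            # left half keeps the median
    return Node(pts[mid - 1],                # splitting value v(ν)
                build_balanced(pts, lo, mid),
                build_balanced(pts, mid, hi))

tree = build_balanced([1, 2, 3, 5, 7, 9])
```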

Algorithm 11 Build range tree
Input: Data points X = {x_1, . . . , x_N} ⊆ R^d and the current working dimension j.
Output: The root of a d-dimensional range tree.
if j = d then
    Create a balanced binary search tree on the last coordinate.
    Store pointers to the points at the leaves.
    Return the root.
else
    Apply this algorithm to the current set of points and dimension j + 1. This yields a
    pointer νassoc to the associated data structure.
    if X contains only one point then
        Create a leaf ν, store a pointer to this point at ν, and make νassoc the pointer
        to the associated data structure.
    else
        Split X into two subsets: Pl shall contain the points whose jth coordinate is
        less than or equal to the median of X with respect to this coordinate; Pr shall
        contain the rest.
        Apply this algorithm to Pl for dimension j. This gives a pointer νl.
        Apply this algorithm to Pr for dimension j. This gives a pointer νr.
        Create a node ν storing the splitting value, with νl as its left child and νr as
        its right child. Let νassoc be the pointer to the associated data structure of ν.
    Return ν.
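The recursion of Algorithm 11 can be sketched directly in Python (helper names are hypothetical; dimensions are 0-based here instead of the text's 1-based j, and the points are re-sorted at every call for brevity, so this sketch does not attain the presorted linear-time construction discussed above):

```python
class RTNode:
    """Range tree node: value/left/right for the first-level tree,
    point for a leaf, assoc for the associated tree Tassoc(ν)."""
    def __init__(self, value=None, point=None, left=None, right=None, assoc=None):
        self.value, self.point = value, point
        self.left, self.right, self.assoc = left, right, assoc

def build_range_tree(points, j=0):
    """Build a d-dimensional range tree on a list of d-tuples,
    following Algorithm 11: build the associated structure on the
    same points in dimension j+1, then split at the median of the
    jth coordinate and recurse on both halves."""
    d = len(points[0])
    pts = sorted(points, key=lambda p: p[j])
    assoc = build_range_tree(pts, j + 1) if j < d - 1 else None
    if len(pts) == 1:                        # leaf: store the point
        return RTNode(point=pts[0], assoc=assoc)
    mid = (len(pts) + 1) // 2                # left half keeps the median
    return RTNode(value=pts[mid - 1][j], assoc=assoc,
                  left=build_range_tree(pts[:mid], j),
                  right=build_range_tree(pts[mid:], j))
```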

Let us discuss the time and space necessary for building a d-dimensional range tree. As mentioned before, the preprocessing step of sorting takes O(dN) space and O(dN log N) time. Let us denote the time and space necessary after the preprocessing step by T_d(N) and S_d(N), respectively. The construction consists of building a binary search tree, which takes O(N) time and space. For each node of this first-level tree we have to build the associated tree. At any depth of the first-level tree, every point x ∈ X belongs to exactly one associated tree. Hence, the time and space needed to build the associated trees for all nodes at a given depth of the first-level tree is at most O(T_{d−1}(N)) and O(S_{d−1}(N)), respectively. Since the depth of the first-level tree is O(log N), we derive the recurrence relations

T_d(N) = O(N) + O(T_{d−1}(N) log N),
S_d(N) = O(N) + O(S_{d−1}(N) log N).


With the initial values T_1(N) = S_1(N) = O(N), this resolves to T_d(N) = S_d(N) = O(N (log N)^{d−1}), which dominates the time and space necessary for presorting the data.
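Unrolling the recurrence down to the initial values makes the stated bound explicit (same symbols as above; the identical computation applies to S_d(N)):

```latex
\begin{align*}
T_d(N) &= O(N) + O\bigl(T_{d-1}(N)\,\log N\bigr)\\
       &= O(N) + O(N\log N) + O\bigl(T_{d-2}(N)\,(\log N)^2\bigr)\\
       &= O\Bigl(N\sum_{i=0}^{d-1}(\log N)^i\Bigr)
        = O\bigl(N\,(\log N)^{d-1}\bigr),
\end{align*}
```

since the last term of the geometric-type sum dominates all earlier ones.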

Proposition 14.14 A d-dimensional range tree for N data points in R^d can be built in O(N (log N)^{d−1}) space and time.

This is significantly higher than for building kd-trees. Hence, the price we have to pay for a better worst-case range query algorithm, which we will discuss now, is a higher requirement on space and time for building the data structure.

Algorithm 12 Range query
Input: A d-dimensional range tree T, a query range B = [x, x′], and the current depth j.
Output: All points from T that lie inside B.
Apply Algorithm 9 to the jth coordinate to determine the splitting node ν.
if ν is a leaf then
    Report the point stored there if necessary.
else
    Set ν to be its left child.
    while ν is not a leaf do
        if x_j ≤ v(ν) then
            if j < d then
                Apply this algorithm to B, the associated data structure of the right
                child of ν, and dimension j + 1.
            else
                Report all points in the subtree of the right child of ν.
            Set ν to be its left child.
        else
            Set ν to be its right child.
    Report the point stored at the leaf ν if necessary.
    Similarly, follow the path to x′, applying this algorithm to B, the associated
    structures of the subtrees to the left of the path, and dimension j + 1. Check
    whether the point stored at the leaf at which the path ends has to be reported.

The idea of the range query algorithm is quite simple. It first selects the O(log N) canonical subsets that together contain exactly those points of X whose first coordinate lies within the interval of the first coordinate of the query box. This can be done with the one-dimensional range query algorithm, if it is modified in such a way that it can distinguish different points in


R^d with the same first coordinate. These subsets are queried further by performing a range query on their second-level data structure, which is in each case a range tree on the last d − 1 coordinates, built on the canonical subset. Hence the algorithm again uses the current depth j, which ensures termination in the dimension; it is initially called with j = 1. Furthermore, it uses Algorithm 9 in an appropriate way. It obviously resembles Algorithm 10, except that, where all points in a subtree were reported, we now call the algorithm again for the associated data structure as long as we have not reached the final depth. Algorithm 12 gives the details.
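To make the control flow of Algorithm 12 concrete, here is a compact, self-contained Python sketch of a range tree together with its query procedure. The helper names are hypothetical, dimensions are 0-based rather than the text's 1-based j, the points are re-sorted at every build call for brevity, and no tie-breaking order is implemented, so the sketch assumes pairwise-different coordinates as in the text.

```python
class Node:
    """value: splitting value v(ν) (largest value in the left subtree);
    point: the stored point if this node is a leaf, else None;
    assoc: associated tree on the next coordinate (None in the last)."""
    def __init__(self, value=None, point=None, left=None, right=None, assoc=None):
        self.value, self.point = value, point
        self.left, self.right, self.assoc = left, right, assoc

def build(points, j=0):
    # Algorithm 11, in brief: associated tree first, then split at the median.
    d = len(points[0])
    pts = sorted(points, key=lambda p: p[j])
    assoc = build(pts, j + 1) if j < d - 1 else None
    if len(pts) == 1:
        return Node(point=pts[0], assoc=assoc)
    mid = (len(pts) + 1) // 2
    return Node(value=pts[mid - 1][j], assoc=assoc,
                left=build(pts[:mid], j), right=build(pts[mid:], j))

def report(node, out):
    # Collect every point stored below node (a whole canonical subset).
    if node.point is not None:
        out.append(node.point)
    else:
        report(node.left, out)
        report(node.right, out)

def query(tree, lo, hi, j=0):
    # Algorithm 12: find the splitting node for [lo[j], hi[j]], then walk
    # both search paths; subtrees hanging off the inside of the paths are
    # queried in the next coordinate, or reported wholesale in the last.
    d, out = len(lo), []

    def handle(n):            # canonical subset entirely inside [lo[j], hi[j]]
        if j < d - 1:
            out.extend(query(n.assoc, lo, hi, j + 1))
        else:
            report(n, out)

    def check_leaf(n):        # leaf: test the remaining coordinates directly
        if all(lo[i] <= n.point[i] <= hi[i] for i in range(j, d)):
            out.append(n.point)

    nu = tree                 # find the splitting node (Algorithm 9)
    while nu.point is None and (hi[j] <= nu.value or lo[j] > nu.value):
        nu = nu.left if hi[j] <= nu.value else nu.right
    if nu.point is not None:
        check_leaf(nu)
        return out

    n = nu.left               # follow the path towards lo[j]
    while n.point is None:
        if lo[j] <= n.value:
            handle(n.right)   # right subtree lies inside the range
            n = n.left
        else:
            n = n.right
    check_leaf(n)

    n = nu.right              # follow the path towards hi[j]
    while n.point is None:
        if hi[j] > n.value:
            handle(n.left)    # left subtree lies inside the range
            n = n.right
        else:
            n = n.left
    check_leaf(n)
    return out

pts = [(1, 1), (2, 5), (3, 3), (4, 2), (5, 4), (6, 6)]
print(sorted(query(build(pts), (2, 2), (5, 5))))  # [(2, 5), (3, 3), (4, 2), (5, 4)]
```

Querying the six planar points with the box [2, 5] × [2, 5] reports exactly the four points inside it; the recursion into `assoc` at each inside subtree is what turns the O(log N) canonical subsets of the first coordinate into O((log N)^d) second-level queries overall.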

Finally, let us analyze the computational complexity of this algorithm. To this end let