spelling out these changes, however, we want to point out another possible modi¬cation.

For an ef¬cient implementation of the moving least squares method, we need for a given

x all points from X that are in the ball B(x, δ) around x with radius δ. This is clearly a

range query problem. But since the weight function has compact support, additional points

from X not contained in the ball B(x, δ) would do no harm. Thus we could also relax our

request in the following sense, hoping for faster answers.

r Approximate range search Given a region R and > 0, de¬ne the region R = ∪x∈R B(x, ). A

legal answer is an answer that reports all points X © R together with some (not necessarily all)

points from X © (R \ R).

r Approximate nearest neighbor search Given x ∈ and > 0, report a point x j ∈ X whose

distance from x is at most 1 + times that of a true nearest neighbor. Obviously, an approximate

nearest neighbors query can be introduced in a similar fashion.

The algorithms corresponding to these modi¬ed problems in the setting of bd-trees are

presented in Algorithms 7 and 8, where the latter is based upon a priority queue.

The analysis of these algorithms is beyond the scope of this book. We will collect the

necessary results in the next theorem, where midpoint splitting and the centroid shrinking

14.3 bd-Trees 245

Algorithm 7 Range query

Query region R ⊆ Rd , an outer range RO and the root ν of the bd-tree.

Input:

Output: All points that lie in R, no point outside RO .

if B(ν) ⊆ R O then Report all points in B(ν).

if B(ν) © R = … then Return ….

if ν is a leaf then

Return all points assigned to ν that lie in R.

else

Apply this algorithm to both children of ν.

Algorithm 8 Approximate nearest neighbor search

Query point x ∈ Rd , > 0, and the root ν of the bd-tree.

Input:

Output: A point q ∈ X with distance from x that is at most 1 + times the distance of a

nearest neighbor from x.

Initialize the distance dist as in¬nity. Initialize the priority queue by the root node ν.

while the priority queue is not empty do

Take the ¬rst node from the priority queue.

if the distance of its associated cell from x exceeds dist/(1 + )

then

Stop and report the current q.

if ν is a leaf then

Compute the distances from the points stored in this node to x and update dist and

q.

else

Compute the distance from both children of ν and enqueue ¬rst the further and then

the closer one.

rule are employed to build the data structure. For a proof, we refer the interested reader to

the articles of Arya and Mount [3“4] and Arya et al. [5].

Theorem 14.10 For N given points in Rd the bd-tree can be built in O(d N log N ) time

and O(d N ) space. The tree has depth O(log N ).

(1) For > 0 there is a constant cd, ¤ d 1 + 6d/ d such that for every x ∈ Rd a (1 + )-nearest

neighbor of x can be reported in O(cd, log N ) time.

√

(2) Given > 0 and a range R, the m points in R can be reported in O(m + 2d log N + (3 d/ )d )

time.

246 Data structures

It remains to remark that the data structure itself is independent of . Hence different

accuracies can be tested without rebuilding the data structure. Moreover, the constants

are worst-case constants. We know already that in the case of quasi-uniform data sets the

behavior is much better. This is also true if an average time analysis based on a uniform

distribution is performed.

14.4 Range trees

We come ¬nally to a data structure that has been designed for answering rectangular range

queries in particular. It is based on ideas from the univariate case, which we will hence

investigate ¬rst. If X = {x1 , . . . , x N } ⊆ R consists of N pairwise distinct real numbers and

[x, x ] is a given interval then we want to report all points from X inside [x, x ]. This can

be done by employing either a sorted array or a balanced binary search tree. Let us use the

latter. We will store the points of X in the leaves of our tree T . The internal nodes of T

store certain splitting values to guide the search; let us denote the splitting value stored at

node ν by v(ν). Now, that T is a search tree means that for every internal node ν its left

subtree contains all the points smaller than or equal to v(ν) and its right subtree contains all

the points greater than v(ν). It is well known that a balanced search tree can be built with

O(N ) space and O(N log N ) time and having O(log N ) depth.

To report all points in [x, x ] we can do the following. First, we search in T for x and x .

This gives us two leaves μ and μ at which the searches end. Then, all points in [x, x ] are

those points stored in leaves between μ and μ plus, possibly, the points stored at μ and μ

themselves. The leaves between μ and μ are the leaves of certain subtrees, rooted at nodes

ν between the search paths for x and x , whose parents are on one of the search paths. To

¬nd these nodes we ¬rst ¬nd the node νsplit at which the two search paths for x and x split.

This is achieved using Algorithm 9.

Algorithm 9 Find split node

A binary search tree T and a query region [x, x ].

Input:

Output: The node ν where the search paths to x and x split, or the leaf where both end.

Set ν = r oot(T ).

while ν is not a leaf and (x ¤ v(ν) or x > v(ν)) do

if x ¤ v(ν) then

Apply this algorithm to the left child of ν.

else Apply it to the right child.

Starting from this splitting node we follow the search path for x. At each node where the

search path goes left, we report all leaves in the right subtree. Then we follow the path to

x and report the leaves in the left subtree of nodes where the path goes right. Finally, we

14.4 Range trees 247

have to check the points stored at the leaves where the paths end. Details may be found in

Algorithm 10.

Algorithm 10 One-dimensional range query

A binary search tree T and a query region [x, x ].

Input:

Output: All points from X in [x, x ].

Use Algorithm 9 to determine the splitting node ν.

if ν is a leaf then

Report the point stored there if necessary.

else

Set ν to be its left child.

while ν is not a leaf do

if x ¤ v(ν) then

Report all points in the subtree of the right child of ν.

Set ν to be its left child.

else

Set ν to be its right child.

Report the point stored at the leaf ν if necessary.

Similarly, follow the path to x and report the points in the subtrees left of the path.

Check whether the point stored at the leaf where the path ends has to be reported.

Lemma 14.11 Algorithm 10 reports exactly those points x j that lie in [x, x ].

Proof First we show that each point reported by the algorithm is contained in [x, x ].

Suppose that x j is one of the reported points. If x j is stored at one of the leaves where

the search paths end then we check explicitly whether x j ∈ [x, x ]. Otherwise it must have

been reported inside the ˜while™ loop. Assume that it was reported on the way to x. Let ν

denote the node on the path such that, on the one hand, x j is in the subtree of ν™s right child.

Since ν and hence its right child and hence x j are in the left subtree of the splitting node,

we have x j ¤ v(νsplit ) ¤ x . On the other hand, the search path to x goes left at ν, giving

x ¤ v(ν) ¤ x j . If x j is reported while we are on our way to x then a similar argument

holds.

Finally, we have to check whether all points x j ∈ [x, x ] have been reported. Let x j ∈ X

be any point in [x, x ] and assume that x j has not been reported. Let μ be the leaf where x j