set X. The lower bound is slightly smaller than expected, to ensure that in what follows all indices are between 1 and N^{1/d}.

To find out in which box a point x = (x_1, . . . , x_d)^T ∈ R^d is contained, we compute the index vector (I(x_1), . . . , I(x_d))^T with

I(x_k) := ⌈ N^{1/d} (x_k − α_k) / (β_k − α_k) ⌉.

234 Data structures

Then this index vector has to be mapped to a unique index, which can be done by defining

I(x) := Σ_{k=1}^{d} [I(x_k) − 1] N^{(k−1)/d}.
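As a sketch, the two formulas above can be combined into a single index computation. The function name, the zero-based result, and the clamping of boundary points are illustrative choices, not part of the text:

```python
import math

def box_index(x, alpha, beta, m):
    """Map a point x inside the bounding box [alpha_1, beta_1] x ... x
    [alpha_d, beta_d] to a unique cell index (zero-based here, in contrast
    to the one-based convention of the text). m is N^(1/d), the number of
    cells per coordinate direction."""
    idx = 0
    for k, (xk, ak, bk) in enumerate(zip(x, alpha, beta)):
        i_k = math.ceil(m * (xk - ak) / (bk - ak))  # I(x_k) in {1, ..., m}
        i_k = min(max(i_k, 1), m)                   # guard x_k = alpha_k
        idx += (i_k - 1) * m ** k                   # [I(x_k) - 1] * N^((k-1)/d)
    return idx
```

With m = 4 cells per direction on [0, 1]^2, the point (0.3, 0.9) lands in per-coordinate cells (2, 4) and therefore gets index 1 + 3·4 = 13.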

Let us analyze the resources needed by this algorithm. We need O(dN) space to store the N points from R^d. The list of cells needs another O(N) space for the pointers to the lists of indices. Finally, since every point is reported only once in the lists of indices, we need O(N) space to store all indices. Thus, summing up, we need O(dN) space to build the entire data structure.

Next, let us turn to the computational complexity. In the first step of the algorithm we have to compute the maximum and minimum of each coordinate. This can be done in O(dN) time. In the second step we have to build the data structure for the cells. This means we have to define an array of size (N^{1/d})^d = N whose entries are lists of indices. Hence this can be done in O(N) time. Finally, in the third step we have to find for each center x_j its box. The computation of its index costs O(d) time, so that we need O(dN) time for the third step. Note that we have not made use of the fact that X is quasi-uniform at all. We have derived the following result.
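A minimal sketch of the whole construction under the assumptions above; the slight widening of the lower bound, the dict-of-lists cell representation, and the helper names are illustrative choices:

```python
import math
from collections import defaultdict

def build_grid(points):
    """Sketch of the grid construction for N points in R^d. Returns the
    (slightly widened) bounding box, the number m = N^(1/d) of cells per
    direction, and a dict mapping each cell index to the indices of the
    points it contains."""
    n = len(points)
    d = len(points[0])
    m = max(1, round(n ** (1.0 / d)))  # N^(1/d) cells per direction
    # step 1: bounding box of the centers, O(dN)
    alpha = [min(p[k] for p in points) for k in range(d)]
    beta = [max(p[k] for p in points) for k in range(d)]
    # widen the lower bound slightly so every index lies between 1 and m
    alpha = [a - 1e-9 * (b - a + 1.0) for a, b in zip(alpha, beta)]
    # steps 2 and 3: create the cell lists and insert every point, O(dN)
    cells = defaultdict(list)
    for j, p in enumerate(points):
        idx = 0
        for k in range(d):
            i_k = math.ceil(m * (p[k] - alpha[k]) / (beta[k] - alpha[k]))
            i_k = min(max(i_k, 1), m)
            idx += (i_k - 1) * m ** k
        cells[idx].append(j)
    return alpha, beta, m, cells
```

For a 4 × 4 lattice of 16 points in the plane this produces m = 4 and one point per cell, in line with Theorem 14.3's O(dN) bounds.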

Theorem 14.3 Algorithm 1 needs O(dN) time and O(dN) space to build a grid data structure for N points in R^d.

The left-hand part of Figure 14.1 shows a typical example of a gridded set of centers.

With this data structure given, it is now time to see how it helps to answer our requests.

Let us start with the range query. In Figure 14.1 the algorithm for an ellipsoidal region is

demonstrated. The precise formulation of the procedure is given in Algorithm 2.

Fig. 14.1 Grid structure for points (on the left) and for a range query (on the right).

Algorithm 2 Range query
Input: Query region R ⊆ R^d.
Output: All points that lie in R.
Compute the indices of all cells W with R ∩ W ≠ ∅.
If this is too expensive then compute first the bounding box B of R and then all indices of cells W with B ∩ W ≠ ∅.
Traverse all cells found and report those points contained in R.

Any analysis of the computational complexity has to take the size of the query region into account. In general, bounds on the necessary time are given in terms of N and of the number of points in the query region. Here, we want to take a different point of view. We are mainly interested in query regions R that have size O(N^{−1/d}). If this is the case then there exists a cube W = W(y, c_1 N^{−1/d}) containing R. Since our cells are cubes with side length

O(N^{−1/d}), the number of cells that have common points with W, and hence with R, is bounded by O(c^d), i.e. it is constant in terms of the number N of points but exponential in terms of the space dimension. Moreover, if X is quasi-uniform then the number of points in each cell is also bounded by O(c^d), by Corollary 14.2. Hence we have

Proposition 14.4 Let X = {x_1, . . . , x_N} ⊆ Ω be quasi-uniform. Suppose that the query region R is of size O(N^{−1/d}). If its bounding box can be computed in constant time and if it is also possible to decide in constant time whether a point belongs to R, then Algorithm 2 needs constant time in terms of N and exponential time in terms of d to report all points contained in R.
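A sketch of Algorithm 2 under these assumptions; the grid representation (a dict from cell index to point indices) and all helper names are illustrative:

```python
import itertools
import math

def range_query(points, cells, alpha, beta, m, bbox, contains):
    """Sketch of Algorithm 2. `cells` maps a cell index to the indices of the
    points it stores, m = N^(1/d) is the number of cells per direction,
    `bbox` = (lo, hi) is the bounding box of the query region R, and
    `contains(p)` decides whether a point p lies in R."""
    d = len(alpha)

    def coord_index(v, k):
        # per-coordinate index I(v) in {1, ..., m}, clamped at the boundary
        i = math.ceil(m * (v - alpha[k]) / (beta[k] - alpha[k]))
        return min(max(i, 1), m)

    lo, hi = bbox
    ranges = [range(coord_index(lo[k], k), coord_index(hi[k], k) + 1)
              for k in range(d)]
    found = []
    # traverse all cells meeting the bounding box of R
    for combo in itertools.product(*ranges):
        idx = sum((i - 1) * m ** k for k, i in enumerate(combo))
        for j in cells.get(idx, []):
            if contains(points[j]):
                found.append(j)
    return found
```

If R has size O(N^{−1/d}), the outer product runs over a constant number of cells, which is exactly the situation of Proposition 14.4.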

We finish this section by investigating the nearest neighbor problem. To look for the nearest neighbor of x, we simply locate the box that contains x and compute the distance of all points x_j in that box to x. Moreover, we have to check the points in the neighboring boxes. To make this procedure more precise, we will introduce the concept of surrounding boxes. For a given box

B = [γ_1, δ_1] × · · · × [γ_d, δ_d]

we define the kth surrounding box to be

S_k(B) = { [γ_1 + h_1 α_1, δ_1 + h_1 α_1] × · · · × [γ_d + h_d α_d, δ_d + h_d α_d] : ‖α‖_∞ = k },

where h_j = δ_j − γ_j and α is in Z^d. In particular, S_0(B) contains only B itself. See Figure 14.2 for an illustration.
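The offsets α with ‖α‖_∞ = k can be enumerated directly; a small illustrative sketch:

```python
import itertools

def surrounding_offsets(k, d):
    """Offsets alpha in Z^d with ||alpha||_inf = k. Translating a box B by
    (h_1*alpha_1, ..., h_d*alpha_d) for all such alpha yields exactly the
    k-th surrounding boxes S_k(B)."""
    return [a for a in itertools.product(range(-k, k + 1), repeat=d)
            if max(abs(c) for c in a) == k]
```

In the plane, S_0(B) is B itself, S_1(B) consists of the 8 neighbors of B, and S_2(B) of the 16 boxes of the next ring, matching Figure 14.2.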

With this definition we can formulate the algorithm for the nearest neighbor search as in Algorithm 3. Remember that we already have the bounding box BB of X.

If X is quasi-uniform then we know that each box of the grid contains only a constant number of centers. Moreover, the size of the grid boxes is proportional to the fill distance h_{X,Ω}. Hence the algorithm terminates after a constant number of steps.

Proposition 14.5 Let X = {x_1, . . . , x_N} ⊆ Ω be quasi-uniform. For every x ∈ R^d, Algorithm 3 reports the nearest neighbor from X to x in constant time in terms of N.


Fig. 14.2 Typical surroundings of a two-dimensional box.

Algorithm 3 Nearest neighbor search
Input: Query point x ∈ R^d.
Output: A point x_j ∈ X closest to x.
Find the box B with x ∈ B.
Initialize dist with infinity.
for k = 0, 1, 2, . . . do
    for every W ∈ S_k(B) do
        Compute the distance of each x_j ∈ W ∩ BB to x, store the closest point so far, and update dist.
    Stop if dist < k min_{1≤j≤d} h_j.
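A sketch of Algorithm 3 on the grid structure from before; the cell representation and the use of the cell side lengths as the h_j in the stopping criterion are illustrative assumptions:

```python
import itertools
import math

def nearest_neighbor(points, cells, alpha, beta, m, x):
    """Sketch of Algorithm 3 on the fixed grid. `cells` maps a cell index to
    the indices of the points it stores and m = N^(1/d) is the number of
    cells per direction; the cell side lengths play the role of the h_j in
    the stopping criterion."""
    d = len(x)
    h = [(beta[k] - alpha[k]) / m for k in range(d)]  # grid box side lengths

    def coord_index(v, k):
        i = math.ceil(m * (v - alpha[k]) / (beta[k] - alpha[k]))
        return min(max(i, 1), m)

    center = [coord_index(x[k], k) for k in range(d)]  # the box B with x in B
    best, dist = None, float("inf")
    for k in range(m):  # visit the surrounding boxes S_k(B) shell by shell
        for off in itertools.product(range(-k, k + 1), repeat=d):
            if max(abs(o) for o in off) != k:      # keep only the k-th shell
                continue
            cell = [center[j] + off[j] for j in range(d)]
            if any(c < 1 or c > m for c in cell):  # outside the bounding box
                continue
            idx = sum((c - 1) * m ** j for j, c in enumerate(cell))
            for p in cells.get(idx, []):
                dp = math.dist(points[p], x)
                if dp < dist:
                    best, dist = p, dp
        if dist < k * min(h):  # stopping criterion of Algorithm 3
            break
    return best
```

Note that once dist < k min_j h_j, no point in a shell S_k or beyond can be closer, so the early exit is safe.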

A generalization to the ℓ-nearest-neighbors problem is obvious. The time taken to report those neighbors then also depends on the number ℓ.

The fixed-grid method can also be seen in a different light. It turns out to be an efficient data structure for the containment query. Let us describe this in more detail. Suppose that Ω is an axis-parallel cube. Then we take the covering {Ω_j} to consist of cubes of side length c_s h, centered at the points of a grid hZ^d, that have common points with Ω. On the boundary of Ω we might not need all the small cubes. Hence, an offset 1 ≤ k ≤ c_s determines how much our covering overlaps Ω. Figure 14.3 demonstrates the idea. It shows the fine hZ^d grid on Ω and some of the macro-boxes Ω_j.

Obviously, for every x ∈ Ω those boxes Ω_j that contain x can be determined in constant time. Moreover, only a constant number of these boxes overlap, i.e. each x is contained in only a constant number of boxes, and the constant is independent of x. Finally, if X is a quasi-uniform data set with fill distance h then the number of points from X in one of the boxes Ω_j is bounded by a uniform constant. If Ω is not a box, then the general idea still holds; only at the boundary do the boxes have to be adjusted.

Fig. 14.3 Covering a region with boxes.
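For such a covering by cubes of side length c_s h centered at grid points of hZ^d, the boxes containing a given x can be sketched as follows; the function name and the representation of boxes by their centers are illustrative choices:

```python
import itertools
import math

def covering_boxes(x, h, cs):
    """Containment query sketch: the covering consists of cubes of side
    length cs*h centered at the points of the grid h*Z^d; return the centers
    of all covering boxes that contain x. Only O(cs^d) candidates exist per
    point, so this runs in constant time in terms of N."""
    half = cs * h / 2.0
    ranges = []
    for xk in x:
        # integer grid coordinates g with |x_k - g*h| <= cs*h/2
        lo = math.ceil((xk - half) / h)
        hi = math.floor((xk + half) / h)
        ranges.append(range(lo, hi + 1))
    return [tuple(g * h for g in combo) for combo in itertools.product(*ranges)]
```

Each coordinate contributes at most about c_s admissible grid positions, which is the constant overlap number mentioned above.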

14.2 kd-Trees

In the last section we divided the bounding box of the centers X into equally sized subboxes and collected all points for each of these subboxes. If the centers fail to be quasi-uniform (which means numerically that the quasi-uniformity constant c_qu is huge), we have a large number of empty boxes, or boxes that contain a large number of points, or both. In this