To determine the usefulness of an algorithm $A$ we measure its error by
$$E(A) := \sup_{x \in K} \|A(Jx) - Tx\|_W,$$
and the entire problem has an intrinsic error defined by
$$E^* = \inf_A E(A).$$

Now it should be clear what an optimal algorithm has to do.

Definition 13.4 A mapping $A : J(K) \to W$ is called an optimal algorithm for the problem just described if $E(A) = E^*$.

Next, we derive a sufficient condition for an optimal algorithm. To this end we assume that $K$ is symmetric, i.e. $x \in K$ implies $-x \in K$.

Theorem 13.5 Suppose that $K$ is symmetric. If there exists a mapping $F : J(K) \to U$ such that for all $x \in K$

(1) $x - FJx \in K$,
(2) $J(x - FJx) = 0$,

then $TF : J(K) \to W$ is optimal.

Proof First of all, the symmetry of $K$ and the linearity of $J$ show that for $x \in K$ with $Jx = 0$ we also have $J(-x) = -J(x) = 0$. This gives, for an arbitrary algorithm $A$ and such an $x$,
$$\begin{aligned}
\|Tx\| &= \left\|\tfrac{1}{2}\bigl(Tx - A(0)\bigr) + \tfrac{1}{2}\bigl(Tx + A(0)\bigr)\right\| \\
&\le \tfrac{1}{2}\|Tx - A(0)\| + \tfrac{1}{2}\|Tx + A(0)\| \\
&\le \max\{\|A(0) - Tx\|,\ \|A(0) + Tx\|\} \\
&= \max\{\|A(Jx) - Tx\|,\ \|A(J(-x)) - T(-x)\|\} \\
&\le E(A).
\end{aligned}$$


Hence we have established on the one hand the inequality
$$\sup\{\|Tx\| : x \in K,\ Jx = 0\} \le E^*.$$

On the other hand, the existence of an $F$ with the stated properties now shows that
$$E(TF) = \sup_{x \in K} \|T(FJx - x)\| \le \sup\{\|Tz\| : z \in K,\ Jz = 0\} \le E^*,$$
which means that $TF$ is indeed an optimal algorithm.

At first sight the previous result seems to be of only limited use, since the optimal algorithm $TF$ includes the target operator. The following two examples show that this is in fact not the case.

In the first example, we choose $U = W = \mathcal{N}_\Phi(\Omega)$ as the native space of a (conditionally) positive definite kernel $\Phi$ and $T : \mathcal{N}_\Phi(\Omega) \to \mathcal{N}_\Phi(\Omega)$ as the identity mapping $T = \mathrm{id}$. Moreover, we set $V = \mathbb{R}^N$ and define $J : \mathcal{N}_\Phi(\Omega) \to \mathbb{R}^N$ by $Jf = (f(x_1), \ldots, f(x_N))^T$ for a given fixed set $X = \{x_1, \ldots, x_N\} \subseteq \Omega$. Finally, $K$ is chosen to be the unit ball in $\mathcal{N}_\Phi(\Omega)$, i.e. $K = \{f \in \mathcal{N}_\Phi(\Omega) : |f|_{\mathcal{N}_\Phi(\Omega)} \le 1\}$. In this setting an algorithm $A : J(K) \to \mathcal{N}_\Phi(\Omega)$ is optimal if it minimizes
$$E(A) = \sup_{|f|_{\mathcal{N}_\Phi(\Omega)} \le 1} |A(f|_X) - f|_{\mathcal{N}_\Phi(\Omega)}. \tag{13.3}$$

To apply Theorem 13.5, we define $F : \mathbb{R}^N \to \mathcal{N}_\Phi(\Omega)$ to be the interpolation mapping, i.e. $F(f_1, \ldots, f_N) = s_{\{f_j\},X}$. Then the interpolation condition is equivalent to $JFJf = Jf$ for all $f \in \mathcal{N}_\Phi(\Omega)$, which is the second condition in Theorem 13.5. The first condition can be concluded from Theorem 13.1. Since $s_{f,X}$ is the best approximation to $f$, we have in particular
$$|f - FJf|_{\mathcal{N}_\Phi(\Omega)} \le |f|_{\mathcal{N}_\Phi(\Omega)} \le 1.$$
Hence, by Theorem 13.5, interpolation is in this sense optimal.

Corollary 13.6 Among all mappings $A : \mathcal{N}_\Phi(\Omega)|_X \to \mathcal{N}_\Phi(\Omega)$, interpolation in $X$ is optimal in the sense that it minimizes (13.3).
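The optimal recovery map $F$ of Corollary 13.6 can be sketched numerically. The following is a minimal illustration, assuming a Gaussian kernel and hypothetical centers and data values (none of these specific choices come from the text); it builds the interpolant $s_{f,X}$ by solving the linear system with the kernel matrix and checks the interpolation condition $JFJf = Jf$.

```python
import numpy as np

def gauss(x, y, eps=2.0):
    # Gaussian kernel, positive definite on R^d (an assumed example kernel)
    return float(np.exp(-eps * np.sum((x - y) ** 2)))

# Hypothetical centers X and data values f|_X (illustration only)
X = np.array([[0.0], [0.5], [1.0]])
fX = np.array([1.0, 0.2, -0.5])

# Interpolation matrix A_{jk} = Phi(x_j, x_k); positive definite, hence invertible
A = np.array([[gauss(xj, xk) for xk in X] for xj in X])
coef = np.linalg.solve(A, fX)

def s(x):
    # Interpolant s_{f,X}(x) = sum_k c_k Phi(x, x_k), i.e. (F J f)(x)
    return sum(c * gauss(x, xk) for c, xk in zip(coef, X))

# Interpolation condition J F J f = J f: s reproduces the data at the centers
recovered = [s(xj) for xj in X]
```

The point of the corollary is that no other algorithm working on the data $f|_X$ can have a smaller worst-case error over the unit ball than this interpolant.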

The result obviously remains true in the following more general setting. Whenever an

interpolatory operator provides a best-approximation property, it gives an optimal algorithm

to the associated problem.

Our second example goes in the same direction. This time we choose $U$, $K \subseteq U$, $V$, and $J$ as in the last example, but we let $W$ be $\mathbb{R}$ and define $T : \mathcal{N}_\Phi(\Omega) \to \mathbb{R}$ by $Tf = f(x)$ for a fixed $x \in \Omega$. This means that we now want to minimize
$$E(A) = \sup_{|f|_{\mathcal{N}_\Phi(\Omega)} \le 1} |A(f|_X) - f(x)|. \tag{13.4}$$


Since the setting is almost the same as in the first example, we have in particular again that with $F(f|_X) = s_{f,X}$ the conditions of Theorem 13.5 are satisfied. Hence, interpolation is again optimal.

Corollary 13.7 Among all mappings $A : \mathcal{N}_\Phi(\Omega)|_X \to \mathbb{R}$, the one that evaluates the interpolant $s_{f,X}$ at $x$ is optimal in the sense that it minimizes (13.4).
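Corollary 13.7 says that optimal point evaluation is a fixed linear functional of the data. A brief sketch, again under the assumption of a Gaussian kernel with hypothetical centers, data, and evaluation point: the value $s_{f,X}(x)$ can be written as a weight vector applied to $f|_X$, where the weights depend only on $x$ and $X$.

```python
import numpy as np

def phi(x, y):
    # Gaussian kernel (an assumed choice of positive definite kernel)
    return float(np.exp(-np.sum((x - y) ** 2)))

# Hypothetical centers and data values (illustration only)
X = np.array([[0.0], [0.4], [1.0]])
fX = np.array([0.0, 1.0, 0.5])

A = np.array([[phi(a, b) for b in X] for a in X])   # kernel matrix

x = np.array([0.7])                                  # the fixed evaluation point
kx = np.array([phi(x, xj) for xj in X])              # (Phi(x, x_1), ..., Phi(x, x_N))

# Optimal algorithm: A(f|_X) = s_{f,X}(x) = k(x)^T A^{-1} f|_X.
# The weight vector A^{-1} k(x) is independent of the data values.
weights = np.linalg.solve(A, kx)
value = float(weights @ fX)
```

Precomputing the weight vector once makes repeated evaluation at the same point $x$ for many data sets a single dot product.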

Obviously, one can easily think of several other examples and we encourage the reader to

do so.

For simplicity, we have restricted ourselves here to a situation of exact information. It is possible to include the fact that $Jx$ is known only up to an error of magnitude $\epsilon > 0$ by redefining the error of an algorithm as
$$E(A, \epsilon) := \sup_{x \in K}\ \sup_{y \in V:\ \|Jx - y\| \le \epsilon} \|Ay - Tx\|,$$
but this would lead us beyond the scope of this book.

13.3 Notes and comments

In this chapter we have recovered many of the striking properties of univariate splines in the context of multivariate approximation by (conditionally) positive definite kernels.

Deeper insights into the theory of optimal recovery can be gained from the review articles

[73] by Golomb and Weinberger and [134] by Micchelli and Rivlin and the literature therein.

14 Data structures

So far we have dealt mainly with the theoretical background of scattered data approximation and interpolation. Except for the moving least squares approximation we have not discussed an efficient implementation of our theory. But from the investigation of the condition number of straightforward interpolation matrices we know that special techniques have to be developed to get an efficient but still accurate approximation method. Several approaches will be discussed in the next chapter.

Crucial to all these methods is the choice of the underlying data structure. It is worth reflecting for a while on how the centers can be stored in a computer most successfully. Thus, in this chapter we will discuss different ways of representing the centers $X = \{x_1, \ldots, x_N\} \subseteq \Omega \subseteq \mathbb{R}^d$.

Let us start by collecting possible queries about the set of points $X$ that the data structure should be able to answer efficiently.

The first question every user has to answer is whether all points should be kept in the main memory of the computer or whether the number of points is so large that they have to be stored on a hard disk or another external device. In the latter case a reasonable ordering of the data points reduces the number of disk accesses, resulting in a dramatic reduction in run time. A reasonable ordering also helps in the first situation, because of the computer's cache, but the improvement is less dramatic. We will concentrate on the situation where all points can be kept in the main memory of the computer.
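One concrete instance of such a "reasonable ordering" is a Morton (Z-order) sort, which tends to keep spatially nearby points close together in memory. The following is a minimal sketch under illustrative assumptions not taken from the text: points in $[0,1]^2$, a fixed grid resolution, and example coordinates.

```python
def morton_key(ix, iy, bits=16):
    # Interleave the bits of the integer grid coordinates (ix, iy)
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def z_order(points, bits=16):
    # Quantize points in [0,1]^2 onto a grid, then sort by Morton key
    scale = (1 << bits) - 1
    return sorted(points,
                  key=lambda p: morton_key(int(p[0] * scale),
                                           int(p[1] * scale), bits))

# Two clusters of nearby points; after sorting, each cluster is contiguous
pts = [(0.9, 0.1), (0.1, 0.1), (0.12, 0.15), (0.88, 0.12)]
ordered = z_order(pts)
```

Traversing the points in this order improves cache (or disk-block) locality for the neighborhood queries discussed in this chapter.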

The second question the user has to answer is whether all points are given in advance,

which would allow us to build the data structure using the knowledge of all the points, or

whether the points are given one by one, which would force us to build the data structure

point by point. We will concentrate here on the former case. Nevertheless, each of our

data structures allows us to add points after the initial build-up. But adding too many

points in this way might lead to a degenerate data structure that in turn results in bad