As pointed out earlier, even though the multilevel algorithm resembles the alternating-projection algorithm, we cannot use the ideas developed in Section 15.3 to prove convergence. But we can use the fact that $P_j f$ denotes the best approximation to $f$ from $V_{X_j} = \operatorname{span}\{\Phi_{\delta_j}(\cdot - x) : x \in X_j\}$ with respect to the native space norm $\|\cdot\|_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)}$. This gives for level $j$

$$\|f_{j-1} - P_j f_{j-1}\|^2_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)} + \|P_j f_{j-1}\|^2_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)} = \|f_{j-1}\|^2_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)}$$

or

$$\|f_j\|_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)} \le \|f_{j-1}\|_{\mathcal{N}_{\Phi_{\delta_j}}(\mathbb{R}^d)},$$

and we have to bound this expression by a constant, smaller than one, times $\|f_{j-1}\|_{\mathcal{N}_{\Phi_{\delta_{j-1}}}(\mathbb{R}^d)}$.

Hence everything reduces to the problem of finding a constant $c < 1$ such that

$$\|f\|_{\mathcal{N}_{\Phi_\delta}(\mathbb{R}^d)} \le c \, \|f\|_{\mathcal{N}_{\Phi}(\mathbb{R}^d)} \qquad (15.12)$$

for all $f \in \mathcal{N}_\Phi(\mathbb{R}^d)$ with a given $0 < \delta < 1$. Note that in most cases scaling does not change the native space, only the norm. Things are different if different basis functions are used.

The constant $c$ in equation (15.12) is easily determined if $\Phi$ is a thin-plate spline $\Phi(x) = \|x\|_2^{2k} \log \|x\|_2$ or if $\Phi(x) = \|x\|_2^{\beta}$. For example, in the first case we know from Theorem 8.17 that $\Phi$ has a generalized Fourier transform $\widehat{\Phi}(\omega) = c_{d,k} \|\omega\|_2^{-d-2k}$, with a constant $c_{d,k}$ that does not play a role in what we intend to do now. Next, Theorem 10.21 tells us that the native space of $\Phi$ on $\mathbb{R}^d$ has the semi-norm

$$|f|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)} = \frac{1}{(2\pi)^{d/2} c_{d,k}} \int_{\mathbb{R}^d} |\widehat{f}(\omega)|^2 \, \|\omega\|_2^{d+2k} \, d\omega,$$

so that the semi-norm for the scaled basis function becomes

$$|f|^2_{\mathcal{N}_{\Phi_\delta}(\mathbb{R}^d)} = \frac{\delta^{-d}}{(2\pi)^{d/2} c_{d,k}} \int_{\mathbb{R}^d} |\widehat{f}(\omega)|^2 \, \|\delta\omega\|_2^{d+2k} \, d\omega = \delta^{2k} \, |f|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)}.$$
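The change of variables behind this identity is easy to verify numerically. The following sketch works in $d = 1$ with $k = 1$ and uses a Gaussian profile standing in for $|\widehat{f}(\omega)|^2$; both are arbitrary illustrative choices, and the constant $(2\pi)^{d/2} c_{d,k}$ is omitted because it cancels in the ratio of the two integrals:

```python
import numpy as np

# Compare the scaled and unscaled semi-norm integrals for d = 1, k = 1.
d, k, delta = 1, 1, 0.5
omega = np.linspace(-20.0, 20.0, 200001)
dw = omega[1] - omega[0]
fhat_sq = np.exp(-omega**2)   # |f^(omega)|^2, an arbitrary test profile

# delta^{-d} * integral of |f^|^2 * ||delta*omega||^{d+2k}
scaled = delta**(-d) * np.sum(fhat_sq * np.abs(delta * omega)**(d + 2 * k)) * dw
# integral of |f^|^2 * ||omega||^{d+2k}
unscaled = np.sum(fhat_sq * np.abs(omega)**(d + 2 * k)) * dw

print(scaled / unscaled, delta**(2 * k))   # both equal 0.25: the factor is delta^{2k}
```

Since $\delta^{-d}\|\delta\omega\|^{d+2k} = \delta^{2k}\|\omega\|^{d+2k}$ holds pointwise, the two integrands agree exactly, so the ratio is $\delta^{2k}$ to machine precision.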

Since $\delta < 1$, the constant $c$ can be chosen to be $\delta^k$. Using thin-plate splines in the multilevel algorithm is of course possible, and we have just seen that it results in a convergent method. But the multilevel algorithm was tailored for compactly supported functions, and hence it is not very surprising that in fact the numerical benefit of using thin-plate splines here is limited. Nonetheless, the success of proving convergence for thin-plate splines might encourage the reader to establish convergence results for other basis functions.

15.6 A greedy algorithm

All the numerical methods we have investigated up to now try to use all the data points.

This is of course reasonable since the data often result from expensive experiments and it

is hard to explain to the user why some of these measured values should not be used.

Nonetheless, building the approximant on only a subset of the centers might lead to a more efficient way of representing the unknown function. In that situation one has to decide


which centers are to be used and to think about strategies to employ all the information

given.

In this context one often faces so-called adaptive greedy algorithms. We do not want to discuss the general idea of an adaptive greedy method, but we want to explain how it can work in the context of compactly supported radial basis functions. As for the alternating-projection and multilevel methods, the idea is to interpolate on subsets $X_j$ of the original data set $X$ and to form residuals. But this time the data sets $X_j$ are not chosen in advance. Instead, the choice of the data set $X_j$ depends on the residual of the previous level. As well as the data set $X_j$, it is also possible to choose the basis function, or at least its support radius, differently on each level.

In what follows we will use the notation of the previous sections. In particular, we will denote the interpolation operator based on $X_j$ and on a general positive definite kernel $\Phi_j$ by $P_j$. The residuals are denoted by $f_j$ and the accumulated approximants by $s_j$.
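To make the level structure concrete, here is a minimal one-dimensional sketch of such a greedy residual iteration. The kernel is Wendland's compactly supported $C^2$ function; the data, the support radius, the batch size, and the separation rule for the selected points are illustrative assumptions, not prescriptions from the text:

```python
import numpy as np

def phi(r):
    # Wendland's compactly supported C^2 function, positive definite for d <= 3
    return np.maximum(1.0 - r, 0.0)**4 * (4.0 * r + 1.0)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 200))   # the large data set X (here d = 1)
f_vals = np.sin(2.0 * np.pi * X)          # data values to be approximated

support = 0.3                             # support radius of the scaled kernel
m = 5                                     # points selected per level
residual = f_vals.copy()                  # f_0 = f (since s_0 = 0)
s_vals = np.zeros_like(f_vals)

for level in range(30):
    # Choose X_j greedily: the m largest residuals, kept pairwise separated
    # so that the small interpolation matrix stays well conditioned.
    order = np.argsort(-np.abs(residual))
    idx = []
    for i in order:
        if all(abs(X[i] - X[t]) > 0.05 for t in idx):
            idx.append(i)
        if len(idx) == m:
            break
    Xj = X[idx]
    # Interpolate the current residual on X_j ...
    A = phi(np.abs(Xj[:, None] - Xj[None, :]) / support)
    coef = np.linalg.solve(A, residual[idx])
    # ... and accumulate: s_j = s_{j-1} + P_j f_{j-1}, f_j = f_{j-1} - P_j f_{j-1}
    update = phi(np.abs(X[:, None] - Xj[None, :]) / support) @ coef
    s_vals += update
    residual -= update

print(np.max(np.abs(residual)))           # residual norm on X after 30 levels
```

Each level drives the residual to zero on the selected set $X_j$; the points of the next level are then drawn to wherever the residual is currently largest.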

The following proposition is a rephrasing of the thoughts expressed at the end of the last

section, but it is the crucial point in what we want to do in this section.

Proposition 15.24  Suppose that the positive definite kernel $\Phi$ is used for all levels $j$, i.e. $\Phi_j = \Phi$. If there exists a constant $\gamma \in (0, 1)$ such that
$$\|P_j f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)} \ge \gamma \, \|f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)}$$
for all $j \ge 1$, then $s_j$ converges linearly to $f$ in the native space.

Proof  The proof simply follows from $f - s_j = f_j$ and
$$\|f_j\|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)} = \|f_{j-1} - P_j f_{j-1}\|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)} = \|f_{j-1}\|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)} - \|P_j f_{j-1}\|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)} \le (1 - \gamma^2) \, \|f_{j-1}\|^2_{\mathcal{N}_\Phi(\mathbb{R}^d)}.$$

Since we want to choose the sets $X_j$ at each step, we have to do this in such a way that the condition $\|P_j f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)} \ge \gamma \|f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)}$ is satisfied. This might seem to be problematic at first sight, but things become easier if we concentrate on approximating $s_{f,X}$ instead of $f$ itself. This means that we want to compute an approximation to an interpolant based on a large data set, and this is actually what we have already done several times before. Thus, we will work in a finite-dimensional space, and we can replace the native space norm by a suitable norm.

Corollary 15.25  Suppose that the subset $X_j$ of $X$ is chosen such that
$$\|f_{j-1}\|_{L_p(X_j)} \ge \gamma \, \|f_{j-1}\|_{L_p(X)} \qquad (15.13)$$
for all $j \ge 1$ and for a fixed $\gamma \in (0, 1)$. Then the sequence $s_j$ converges to $s_{f,X}$ linearly in the native space and in the $L_p(X)$-norm.


Proof  Since we are working in the finite-dimensional space $V_X$, all norms are equivalent. In particular, there exist two constants $0 < c_1 \le C_1$ depending on $\Phi$, $X$, and $p$ such that
$$c_1 \, \|g\|_{L_p(X)} \le \|g\|_{\mathcal{N}_\Phi(\mathbb{R}^d)} \le C_1 \, \|g\|_{L_p(X)}$$
holds for all $g \in V_X$. This relation gives us the bound

$$\|P_j f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)} \ge c_1 \, \|P_j f_{j-1}\|_{L_p(X)} \ge c_1 \, \|P_j f_{j-1}\|_{L_p(X_j)} = c_1 \, \|f_{j-1}\|_{L_p(X_j)} \ge c_1 \gamma \, \|f_{j-1}\|_{L_p(X)} \ge \frac{c_1 \gamma}{C_1} \, \|f_{j-1}\|_{\mathcal{N}_\Phi(\mathbb{R}^d)}.$$
Since $c_1 \gamma / C_1 \in (0, 1)$ we can apply Proposition 15.24 to get linear convergence in the