$$
\begin{aligned}
&= h^{\lambda}\Bigl(\Phi(x,y) \;-\; \sum_{k=1}^{Q} p_k(x)\,\Phi(\xi_k,y) \;-\; \sum_{\ell=1}^{Q} p_\ell(y)\,\Phi(x,\xi_\ell) \;+\; \sum_{k,\ell=1}^{Q} p_k(x)\,p_\ell(y)\,\Phi(\xi_k,\xi_\ell)\Bigr) \\
&\quad + q_h(x-y) \;-\; \sum_{k=1}^{Q} p_k(x)\,q_h(\xi_k-y) \;-\; \sum_{\ell=1}^{Q} p_\ell(y)\,q_h(x-\xi_\ell) \;+\; \sum_{k,\ell=1}^{Q} p_k(x)\,p_\ell(y)\,q_h(\xi_k-\xi_\ell) \\
&= h^{\lambda}\,\kappa(x,y),
\end{aligned}
$$

since the terms involving q_h vanish: every monomial of q_h(x − y) ∈ π_{2m−1}(R^d) has degree at most m − 1 in x or in y, and the map u ↦ u − Σ_k p_k(·) u(ξ_k), applied in that variable, annihilates polynomials of degree at most m − 1.

This theorem gives the reason why the condition number of the matrix C is independent of the scaling parameter. To see this we only have to show that the conditions of the theorem are met by thin-plate splines. But before that let us record

Corollary 12.14 Suppose that Φ ∈ C(R^d × R^d) is a conditionally positive definite kernel with respect to π_{m−1}(R^d) and satisfies the assumptions of the last theorem. Suppose further that X = {x_1, ..., x_N} is disjoint from Ξ. Let C = (κ(x_j, x_k)) and C_h = (κ_h(hx_j, hx_k)); then both matrices have the same condition number with respect to the ℓ_2-norm.

Proof Since both C and C_h are positive definite matrices, we can compute their minimal and maximal eigenvalues:

$$
\lambda_{\min}(C_h) \;=\; \min_{\|\alpha\|_2 = 1} \sum_{j,k=1}^{N} \alpha_j \alpha_k \,\kappa_h(hx_j, hx_k)
\;=\; h^{\lambda} \min_{\|\alpha\|_2 = 1} \sum_{j,k=1}^{N} \alpha_j \alpha_k \,\kappa(x_j, x_k)
\;=\; h^{\lambda}\,\lambda_{\min}(C).
$$

Replacing "min" by "max" shows that λ_max(C_h) = h^λ λ_max(C). Hence both matrices have the same condition number.
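As a quick numerical illustration of the corollary (a sketch, not from the text): take Φ(x, y) = −|x − y|, i.e. the case β = 1 of Proposition 12.15 below, so that m = 1, λ = 1, P consists of the constants, and the illustrative π_0-unisolvent set Ξ = {0} gives the reduced kernel κ(x, y) = Φ(x, y) − Φ(0, y) − Φ(x, 0) + Φ(0, 0). The kernel, centers, and scaling parameter are all arbitrary choices.

```python
import numpy as np

def kappa(x, y, xi=0.0):
    """Reduced kernel kappa(x,y) = Phi(x,y) - Phi(xi,y) - Phi(x,xi) + Phi(xi,xi)
    for the illustrative choice Phi(u,v) = -|u - v|."""
    phi = lambda u, v: -abs(u - v)
    return phi(x, y) - phi(xi, y) - phi(x, xi) + phi(xi, xi)

X = np.array([0.3, 0.7, 1.1, 1.9, 2.4])  # arbitrary distinct centers
h = 0.01                                  # scaling parameter

C = np.array([[kappa(xj, xk) for xk in X] for xj in X])
# scaled problem: centers hX and scaled point set h*Xi = {0}
Ch = np.array([[kappa(h * xj, h * xk) for xk in X] for xj in X])

# lambda = 1, hence C_h = h * C and both condition numbers agree
print(np.allclose(Ch, h * C))
print(np.isclose(np.linalg.cond(Ch), np.linalg.cond(C)))
```

Both checks print True: scaling multiplies every eigenvalue by h^λ, which cancels in the condition number.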

As mentioned before we will conclude this section by giving two examples.

Proposition 12.15 The kernel Φ(x, y) = (−1)^⌈β/2⌉ ‖x − y‖_2^β, β > 0, β ∉ 2N, is a function that is conditionally positive definite of order m = ⌈β/2⌉ and satisfies the condition of Theorem 12.13 with λ = β and q_h ≡ 0. The kernel Φ(x, y) = (−1)^{k+1} ‖x − y‖_2^{2k} log ‖x − y‖_2 is conditionally positive definite of order m = k + 1 and satisfies the condition of Theorem 12.13 with λ = 2k and a polynomial q_h ∈ π_{2k}(R^d) ⊆ π_{2m−1}(R^d).


Proof Everything said about the first kernel is obviously true, so let us turn to the second kernel, the thin-plate splines. There we have

$$
\Phi(hx, hy) \;=\; (-1)^{k+1} h^{2k} \|x - y\|_2^{2k} \bigl(\log h + \log \|x - y\|_2\bigr)
\;=\; h^{2k}\,\Phi(x, y) + (-1)^{k+1} h^{2k} \log h \,\|x - y\|_2^{2k}.
$$
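The scaling identity above can be checked numerically; a sketch with illustrative choices of k, h, and sample points:

```python
import numpy as np

k = 1  # thin-plate spline case m = k + 1 = 2; k is an illustrative choice

def tps(x, y, k=1):
    """Phi(x, y) = (-1)^(k+1) * ||x - y||_2^(2k) * log ||x - y||_2."""
    r = np.linalg.norm(np.asarray(x) - np.asarray(y))
    return (-1.0) ** (k + 1) * r ** (2 * k) * np.log(r)

x, y = np.array([0.2, 1.3]), np.array([1.7, 0.4])  # arbitrary points
h = 0.05                                            # arbitrary scaling
r2k = np.linalg.norm(x - y) ** (2 * k)

lhs = tps(h * x, h * y)
# h^(2k) Phi(x,y) plus the polynomial correction q_h(x - y) of degree 2k
rhs = h ** (2 * k) * tps(x, y) + (-1.0) ** (k + 1) * h ** (2 * k) * np.log(h) * r2k
print(np.isclose(lhs, rhs))  # True
```

The correction term is exactly the polynomial q_h ∈ π_{2k}(R^d) named in the proposition.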

12.4 Notes and comments

The idea of expressing the condition number in terms of the so-called separation distance was initiated by Ball [8] and extended by Narcowich and Ward in [143–145, 147]. Their approach is based on the Schoenberg and Bernstein–Hausdorff–Widder theory on the one hand and on Micchelli's theorem on the other, so that it is restricted to radial functions that are (conditionally) positive definite on every R^d. But their idea was so powerful that Schaback [163] could easily extend it to the general case of conditionally positive definite functions by using representations of Bochner's type.

The trade-off principle, also sometimes called the uncertainty relation, was first discovered in this field by Schaback [163]. The third section of the chapter was based on ideas of Beatson et al. [15].

13 Optimal recovery

So far, we have dealt with the following simple interpolation or approximation problem. A function f, in general unknown, is specified only at certain points X = {x_1, ..., x_N}, and we are interested in recovering f on a region that is well covered by the centers X. In a later chapter we will concentrate on more general problems. But let us stick to this particular one a little longer. Why should we use (conditionally) positive definite kernels for recovering f?

We have already learnt that recovering f is a difficult task and that radial basis functions are a powerful tool for doing this. In particular, they can be used (at least theoretically; we come back to the numerical treatment in a later chapter) with truly scattered data and in every dimension. Moreover, positive definite functions appeared quite naturally in the context of reproducing-kernel Hilbert spaces.

But this is not the end of the story. Interpolants based on (conditionally) positive definite kernels are optimal in several other ways, and the present chapter is devoted to this subject.

13.1 Minimal properties of radial basis functions

Let us start with best approximation. We have seen that the native space N_Φ(Ω) corresponding to a (conditionally) positive definite kernel Φ is an adequate function space. The interpolant s_{f,X} is one candidate that uses the given information about f on X, but of course not the only one. More precisely, any function s from the space

$$
V_X \;=\; \Bigl\{\, s = \sum_{j=1}^{N} \alpha_j \Phi(\cdot, x_j) + p \;:\; p \in P \text{ and } \sum_{j=1}^{N} \alpha_j q(x_j) = 0 \text{ for all } q \in P \,\Bigr\} \qquad (13.1)
$$

can be considered.
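To make the side condition in (13.1) concrete, here is a sketch of building one member of V_X for a kernel that is conditionally positive definite of order 1, namely Φ(x, y) = −|x − y| with P the constants; then the constraint reads Σ_j α_j = 0. The centers, coefficients, and the name s are all illustrative choices.

```python
import numpy as np
rng = np.random.default_rng(1)

X = np.array([0.0, 0.5, 1.0, 2.0])      # arbitrary centers
alpha = rng.standard_normal(len(X))
alpha -= alpha.mean()                    # enforce sum_j alpha_j * q(x_j) = 0 for constant q
beta = 0.7                               # the polynomial part p (here a constant)

def s(x):
    """One element of V_X: s(x) = sum_j alpha_j * Phi(x, x_j) + p(x)."""
    return np.sum(alpha * (-np.abs(x - X))) + beta

print(abs(alpha.sum()) < 1e-12)  # True: the moment condition of (13.1) holds
```

Subtracting the mean is the simplest projection of arbitrary coefficients onto the constraint set; for larger P one would project onto the null space of the matrix (q_i(x_j)).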

Theorem 13.1 Suppose that Φ ∈ C(Ω × Ω) is a conditionally positive definite kernel with respect to the finite-dimensional space P ⊆ C(Ω). Suppose further that X is P-unisolvent and that f ∈ N_Φ(Ω) is known only at X = {x_1, ..., x_N} ⊆ Ω. Then the interpolant s_{f,X} is the best approximation to f from (13.1) with respect to the native space (semi-)norm, i.e.

$$
| f - s_{f,X} |_{N_\Phi(\Omega)} \;\le\; | f - s |_{N_\Phi(\Omega)}
$$

for all s ∈ V_X. Hence, s_{f,X} is the orthogonal projection of f onto V_X.

Proof By Lemma 10.24 we know that (f − s_{f,X}, s)_{N_Φ(Ω)} = 0 for all s ∈ V_X. But this is the characterization of the best approximation in Hilbert spaces. The argument also holds in the case of a symmetric bilinear form with nonzero kernel.
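The best-approximation property can be observed numerically. The sketch below uses a positive definite kernel (a Gaussian, so P = {0} and V_X is just span{Φ(·, x_j)}) and a function f that is itself a kernel expansion, so that native-space norms of differences are computable from Gram matrices; the kernel, f, X, and Y are all illustrative choices, not from the text.

```python
import numpy as np
rng = np.random.default_rng(0)

phi = lambda u, v: np.exp(-np.subtract.outer(u, v) ** 2)  # Gaussian kernel

Y = np.linspace(0.0, 1.0, 9)          # f = sum_j c_j Phi(., y_j) lies in the native space
c = rng.standard_normal(len(Y))
f_at = lambda pts: phi(pts, Y) @ c

X = Y[::2]                             # data sites, a subset of Y
alpha = np.linalg.solve(phi(X, X), f_at(X))   # interpolant s_{f,X} = sum alpha_j Phi(., x_j)

def native_norm2(pts1, c1, pts2, c2):
    """|u - v|^2 in the native space for kernel expansions u, v."""
    p = np.concatenate([pts1, pts2])
    a = np.concatenate([c1, -c2])
    return a @ phi(p, p) @ a

err_interp = native_norm2(Y, c, X, alpha)
# randomly perturbed elements of V_X never beat the interpolant
worse = all(native_norm2(Y, c, X, alpha + rng.standard_normal(len(X)))
            >= err_interp - 1e-12 for _ in range(50))
print(worse)  # True
```

By the orthogonality used in the proof, |f − s|^2 = |f − s_{f,X}|^2 + |s_{f,X} − s|^2 for every s ∈ V_X, which is exactly what the random trials confirm.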

Note that in the case of a conditionally positive definite function the best approximation is uniquely determined only up to an element from P. We can avoid this nonuniqueness by going over to the inner product

$$
(f, g) \;=\; \sum_{k=1}^{Q} f(\xi_k)\, g(\xi_k) + (f, g)_{N_\Phi(\Omega)}
$$

with a P-unisolvent subset Ξ = {ξ_1, ..., ξ_Q} ⊆ Ω; see Theorem 10.20.

But this is not yet the whole story. Recalling the minimal properties of splines, we know that they minimize an energy functional among all interpolatory functions from a Sobolev space. The same is true here.

Theorem 13.2 Suppose that Φ ∈ C(Ω × Ω) is a conditionally positive definite kernel with respect to the finite-dimensional space P ⊆ C(Ω). Suppose further that X is P-unisolvent and that values f_1, ..., f_N are given. Then the interpolant s_{f,X} has minimal (semi-)norm |·|_{N_Φ(Ω)} under all functions s ∈ N_Φ(Ω) that interpolate the data {f_j} at the