itself (solid curve), with a largest deviation of about 2%. Its first derivatives at x = 0 and x = π differ somewhat from the ideal cosine values. The cubic interpolation using exact derivatives (1, 0, −1, 0, 1) gives a somewhat better fit (dashed curve), but with slight discontinuities of the second derivatives at x = 0 and x = π.

19.3 Fitting splines

There is one good reason not to draw spline functions exactly through a number of given points: when the points represent inaccurate data. Let us assume that the inaccuracy is a result of statistical random fluctuations.5 The data points yi then are random deviations from values fi = f(xi) of a function f(x) that we wish to discover. The value of di = yi − fi is a random sample from a distribution function pi(di) of the random variable. This holds if the data points are statistically independent; if they are not, the whole set of deviations is a sample from a multivariate probability distribution. We need at least some knowledge of these distribution functions, best obtained from separate observations or simulations:

4 This is the rationale for the name spline, borrowed from the name of the thin elastic rods used by construction engineers to fit between pairs of nails on a board, in order to be able to draw smooth outlines for shaping construction parts. The elastic deformation energy in the rod is proportional to the integral of the square of the second derivative, at least for small deviations from linearity. The rod will assume the shape that minimizes its elastic energy. If the ends of the rods are left free, natural splines result. Lasers and automated cutting machines have made hardware splines obsolete.

5 Be aware of, and check for, experimental errors or programming errors, and (using simulations) for insufficient sampling and inadequate equilibration. An observable may appear to be randomly fluctuating, but still only sample a limited domain. This is a problem of ergodicity that cannot be solved by statistical methods alone.



Figure 19.2 A cubic periodic spline (dotted) fitted through five points sampling a sine wave (solid curve). Dashed curve: cubic interpolation using function values and first derivatives at the sample points. The inset shows the differences with the sine function.

(i) their expectation (or "expectation value"), defined as the average over the distribution function, must be assumed to be zero; if not, there is a bias in the data that, if known, can be removed:

\int_{-\infty}^{+\infty} x\, p_i(x)\, dx = 0,   (19.30)

(ii) their variances \sigma_i^2, defined as the expectation of the square of the variable over the unbiased distribution function:

\sigma_i^2 = \int_{-\infty}^{+\infty} x^2\, p_i(x)\, dx.   (19.31)
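In practice \sigma_i must be estimated. A minimal sketch, assuming repeated independent observations of each data point are available (the function name is hypothetical, not from the text):

```python
import math

def estimate_sigma(samples):
    """Estimate the standard deviation sigma_i of a data point from
    repeated observations, using the unbiased sample variance."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return math.sqrt(var)
```

If the observations are correlated or too few, this underestimates the true uncertainty; block averaging over longer stretches of a simulation is then preferable.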

For our purposes it is sufficient to have an estimate of \sigma_i. It enables us to determine the sum of weighted residuals, usually indicated by chi-square:

\chi^2 = \sum_{i=0}^{n} \frac{(y_i - f_i)^2}{\sigma_i^2}.   (19.32)
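As a short illustration, (19.32) translates directly into code (a sketch; the function name is our own):

```python
import numpy as np

def chi_square(y, f, sigma):
    """Weighted sum of squared residuals chi^2, Eq. (19.32)."""
    y, f, sigma = (np.asarray(a, dtype=float) for a in (y, f, sigma))
    return float(np.sum(((y - f) / sigma) ** 2))

# Two points, each deviating by one standard deviation: chi^2 = 2.
example = chi_square([1.1, 1.9], [1.0, 2.0], [0.1, 0.1])
```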

532 Splines for everything

[Plot of f(χ²) against χ² for ν = 5, 10, 20, 30 and 50.]

Figure 19.3 The chi-square probability distribution for various values of ν, the

number of degrees of freedom.

If the weighted deviations (y_i − f_i)/\sigma_i are samples from normal distributions, \chi^2 will be distributed according to the chi-square distribution

f(\chi^2|\nu)\, d\chi^2 = \left[2^{\nu/2}\, \Gamma(\nu/2)\right]^{-1} (\chi^2)^{\nu/2 - 1} \exp(-\chi^2/2)\, d\chi^2,   (19.33)

as depicted in Fig. 19.3. The parameter ν is the number of degrees of freedom; if the deviations are uncorrelated, ν = n for our purposes. The expectation (average value over the distribution) of χ² is equal to ν, and for large ν the χ²-distribution tends to a Gaussian distribution with variance 2ν. From the cumulative χ²-distribution confidence intervals can be computed: these indicate the range in which, say, 90% of the samples are expected to occur, or, say, the value that will be exceeded in 1% of the cases. Note that the median value for χ², expected to be exceeded in 50% of the cases, is close to ν.
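The density (19.33) is easy to evaluate with the standard library alone. The sketch below (grid parameters are arbitrary choices of ours) checks numerically that the density is normalized and that its mean equals ν:

```python
import math

def chi2_pdf(x, nu):
    """Chi-square probability density f(chi^2 | nu) of Eq. (19.33)."""
    if x <= 0.0:
        return 0.0
    return (x ** (nu / 2 - 1) * math.exp(-x / 2)
            / (2 ** (nu / 2) * math.gamma(nu / 2)))

# Crude Riemann sum over (0, 200] for nu = 10: the density should
# integrate to ~1 and have mean ~nu.
nu, h = 10, 0.01
grid = [i * h for i in range(1, 20001)]
norm = h * sum(chi2_pdf(x, nu) for x in grid)
mean = h * sum(x * chi2_pdf(x, nu) for x in grid)
```

Percentiles of the cumulative distribution, needed for confidence intervals, can be read from tables or obtained from a statistics library.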

Chi-square tables can be used to decide what deviation from a fitted curve is considered acceptable. The logical choice is to take the value at the expectation of χ², which is the 50% value, for which χ² = ν is a sufficient approximation. If a small value is set, the fitted curve may follow the noisy data too closely; if a large value is allowed, real details may be missed.

Extensive tables and equations can be found in Beyer (1991) and in Abramowitz and Stegun (1965), as well as in most textbooks on statistics. In Table 19.1 an excerpt is given of values that are not exceeded by χ²/n in a given fraction of the cases. The 50% value is the most neutral expectation, which statistically will be exceeded in half of the cases. The


Table 19.1 Values that are not exceeded by χ²/n in the fraction of cases F mentioned in the header row. Vertical is the number of degrees of freedom