counterparts displayed in Figure 11.2b. The random perturbations observed at large lags are due to sampling variability.

Despite the similarity between the theoretical and estimated functions, the generating processes can not be unequivocally identified as AR(2) processes. However, since the estimated partial auto-correlation functions (lower panels) quickly fall to zero after lag-2, the AR(2) model would be a good first guess in both cases.

Figure 12.3: Estimated full (upper panels) and partial (lower panels) auto-correlation functions computed from time series of length 240 generated from two AR(2) processes.

The second example we consider is a time series of length 240 generated from an MA(10) process with parameters β1 = · · · = β10 = 1. The full and partial auto-correlation function estimates computed from this time series are displayed in the lower panels of Figures 12.1 and 12.2.

This time series presents a greater challenge than the examples discussed above. The full auto-correlation function decays to zero more or less exponentially, suggesting that the process may be a low-order AR process. The partial auto-correlation function decays to zero after two lags, suggesting that the process is AR(2). However, there are also partial auto-correlation estimates that are significantly different from zero at lags 6, 10, 11, and 14. A skilled practitioner may suspect the process to be a pure MA process because the full auto-correlation function goes to zero quickly and the partial auto-correlation function has rather complex behaviour. On the other hand, our perception of Figures 12.1 and 12.2 is coloured by our knowledge of the model that generated the data. Making a correct first guess of the type and order of model to fit is very difficult in this case.

12.2.2 Fitting AR Processes: the Yule–Walker Method. We now restrict ourselves to AR processes, both because the class of AR models is as rich as the class of ARMA models³ and because we do not wish to consider models whose dynamics are forced by noise processes with memory (cf. [10.5.4]).

The Yule–Walker estimates of the parameters of an AR(p) process are obtained simply by plugging values of the estimated auto-covariance function c(τ) or auto-correlation function r(τ) into

³ Note, however, that the AR approximation of a given process may not be as parsimonious as an ARMA approximation. See [10.5.4].
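The identification heuristic described above, a decaying full auto-correlation function together with a partial auto-correlation function that cuts off after lag p, is easy to try numerically. The sketch below is an illustration written for this discussion, not code from the text: `sample_acf` uses the biased sample covariance estimator, `sample_pacf` applies the standard Durbin-Levinson recursion, and the AR(2) parameters (0.9, -0.8) match the first process of Figure 12.3.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample auto-correlations r(0), ..., r(max_lag) from the biased
    auto-covariance estimator c(k) = (1/n) * sum_t x_t x_{t+k}."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c = np.array([np.sum(x[: n - k] * x[k:]) / n for k in range(max_lag + 1)])
    return c / c[0]

def sample_pacf(x, max_lag):
    """Sample partial auto-correlations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi = np.zeros((max_lag + 1, max_lag + 1))
    for k in range(1, max_lag + 1):
        if k == 1:
            phi[1, 1] = r[1]
        else:
            num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
            den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
            phi[k, k] = num / den
            phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
        pacf[k] = phi[k, k]
    return pacf

# Generate an AR(2) series x_t = a1*x_{t-1} + a2*x_{t-2} + z_t with a
# burn-in so the retained segment is effectively stationary.
rng = np.random.default_rng(1)
a1, a2, T = 0.9, -0.8, 240
x = np.zeros(T + 100)
z = rng.standard_normal(T + 100)
for t in range(2, T + 100):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + z[t]
x = x[100:]

pacf = sample_pacf(x, 20)
band = 2.0 / np.sqrt(T)   # rough two-standard-error band for lags > p
# For an AR(2) process the PACF beyond lag 2 should mostly fall inside the band.
print(pacf[1:3], (np.abs(pacf[3:]) < band).mean())
```

For the MA(10) example in the text, the same diagnostics are far more ambiguous, which is exactly the difficulty described above.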

12.2: Identifying and Fitting Auto-regressive Models 257

the Yule–Walker equations (11.2) (see [11.1.6]) and solving the system for the unknown process parameters. Thus the Yule–Walker estimates are given by

α̂_p = R⁻¹ (r(1), . . . , r(p))^T

where R is the p × p matrix with R_ij = r(|i − j|) and α̂_p is the vector of parameter estimates (α̂_1, . . . , α̂_p)^T.

These parameter estimates can be used to estimate the variance of the forcing noise, say σ_Z², by computing

σ̂_Z² = (1/T) Σ_{t=p+1}^T (x_t − α̂_1 x_{t−1} − · · · − α̂_p x_{t−p})².   (12.7)

12.2.3 Example: Yule–Walker Estimates. The Yule–Walker estimates of (α_1, α_2) computed from the auto-correlation functions displayed in Figure 12.3 and the corresponding estimates of the noise variance are given in the following table.

Yule–Walker parameter estimates
and noise variance estimates

          (0.9, −0.8)   (0.3, 0.3)
α̂_1         0.868        0.358
α̂_2        −0.784        0.308
σ̂_Z²        1.077        1.101

Both fitted models are close to those that generated the data. The noise variance is only slightly overestimated in both cases and the errors in the parameter estimates are modest. The spectra of the fitted processes also closely match those of the generating processes. The first model has a spectral maximum at a slightly higher frequency (ω = 0.202; see [11.2.6] and (11.24)) than the generating process (ω = 0.166). The second has a spectral minimum at ω = 0.282, which compares well with ω = 0.278 for the generating process.

We conducted a small Monte Carlo experiment to obtain an impression of how the bias varies with sample size. One hundred samples of length T = 15, 60, and 240 were generated from AR(2) processes with parameters (α_1, α_2) = (0.9, −0.8) and (0.3, 0.3). Each sample was used to compute Yule–Walker parameter estimates.

The results, given in the table below, show that the bias is substantial when samples are very small. The bias becomes modest for samples of moderate length and, for these examples, becomes quite small for samples of length 240.

The mean of 100
Yule–Walker parameter estimates

  T     (0.9, −0.8)     (0.3, 0.3)
  15    (0.72, −0.63)   (0.16, 0.04)
  60    (0.85, −0.75)   (0.27, 0.24)
 240    (0.88, −0.78)   (0.30, 0.29)

Note that these results do not fully reflect the actual properties of the Yule–Walker estimator in practice because prior knowledge was used to choose the order of AR process to fit.

12.2.4 Fitting AR Processes: Maximum Likelihood. Most statistical and scientific subroutine packages include routines that compute maximum likelihood estimates of AR (and ARMA) parameters. We therefore give a general description of how these estimates are obtained in this subsection, and describe the estimation of their uncertainty in [12.2.6]. In most climatological research contexts, however, the Yule–Walker estimates provide close approximations to the exact maximum likelihood estimates (MLEs). Maximum likelihood estimation should be used when samples are 'small' or when the AR parameters are thought to be close to the boundaries of the admissible region. Even so, parameter estimates will be somewhat biased, as discussed at the end of this subsection.

To simplify the discussion below we assume that the observed process has zero mean. To keep our notation fairly compact, we let x_T be the T-dimensional vector (x_1, . . . , x_T)^T that contains the observed time series. Also, we let X_T be the vector that contains the corresponding segment of the stochastic process {X_t : t ∈ Z}. Vectors x_p and X_p are defined similarly.

Let {X_t : t ∈ Z} be a stationary, normally distributed AR(p) process that is forced by noise with variance σ_Z² and has parameters α_p = (α_1, . . . , α_p)^T. Then the joint density function of X_T is given by

f_{X_T}(x_T | α_p, σ_Z²) = |M_p|^{1/2} e^{−S(α_p)/(2σ_Z²)} / (2π σ_Z²)^{T/2}   (12.8)

where

S(α_p) = x_p^T M_p x_p + Σ_{t=p+1}^T (x_t − α_1 x_{t−1} − · · · − α_p x_{t−p})²

M_p = Σ_p⁻¹

12: Estimating Covariance Functions and Spectra  258

and where Σ_p is the p × p matrix whose (i, j)th element is γ_xx(|i − j|)/σ_Z². Note that the elements of M_p are independent of σ_Z² since they describe

so that the minimization can be done efficiently (see Box and Jenkins [60, Chapter 7]; Ansley
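The Yule–Walker recipe of [12.2.2] and [12.2.3] can be reproduced in a few lines. The sketch below is illustrative rather than code from the text: the helper name `yule_walker`, the simulated series, and the random seed are choices made here, but the estimator itself solves R α̂ = (r(1), . . . , r(p))^T and applies equation (12.7).

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates for an AR(p) model fitted to a zero-mean series.

    Solves R alpha = (r(1), ..., r(p))^T, where R[i, j] = r(|i - j|), and
    then estimates the forcing-noise variance with equation (12.7).
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    c = np.array([np.sum(x[: T - k] * x[k:]) / T for k in range(p + 1)])
    r = c / c[0]                              # sample auto-correlations
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    alpha = np.linalg.solve(R, r[1 : p + 1])  # Yule-Walker parameter estimates
    # Equation (12.7): mean squared one-step prediction error, summed over
    # t = p+1, ..., T and divided by T.
    resid = x[p:] - sum(alpha[k] * x[p - 1 - k : T - 1 - k] for k in range(p))
    sigma2 = np.sum(resid**2) / T
    return alpha, sigma2

# Fit an AR(2) model to a series generated from the second process of the
# example, x_t = 0.3 x_{t-1} + 0.3 x_{t-2} + z_t, with a burn-in period.
rng = np.random.default_rng(2)
T = 240
x = np.zeros(T + 100)
z = rng.standard_normal(T + 100)
for t in range(2, T + 100):
    x[t] = 0.3 * x[t - 1] + 0.3 * x[t - 2] + z[t]
x = x[100:]

alpha, sigma2 = yule_walker(x, 2)
print(alpha, sigma2)   # estimates should lie near (0.3, 0.3) and 1.0
```

Rerunning the fit with T = 15 instead of T = 240 reproduces the substantial small-sample bias found in the Monte Carlo experiment described in the text.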