the auto-covariance function of an AR(p) process forced with unit variance white noise.

The likelihood function (see [5.3.8]) is the probability density (12.8)
$$L(\vec{\alpha}_p, \sigma_Z \mid \vec{x}_T) = f(\vec{x}_T \mid \vec{\alpha}_p, \sigma_Z)$$
re-expressed as a function of the unknown parameters for a fixed realization $\vec{x}_T$ of $\vec{X}_T$. With normal random variables it is easier to work with the log-likelihood function because the latter is essentially quadratic in the parameters. Here, the log-likelihood is
$$l(\vec{\alpha}_p, \sigma_Z \mid \vec{x}_T) = -\frac{T}{2}\big(\ln(2\pi) + \ln \sigma_Z^2\big) + \frac{1}{2}\ln|M_p| - \frac{S(\vec{\alpha}_p)}{2\sigma_Z^2}.$$

The constant $\ln(2\pi)$ is irrelevant to the derivation of MLEs, so the log-likelihood function is usually given as
$$l(\vec{\alpha}_p, \sigma_Z \mid \vec{x}_T) = -\frac{T \ln \sigma_Z^2}{2} + \frac{\ln|M_p|}{2} - \frac{S(\vec{\alpha}_p)}{2\sigma_Z^2}. \qquad (12.9)$$

Maximum likelihood estimates are found by setting the partial derivatives of (12.9) to zero. Differentiating, we obtain
$$\frac{\partial l}{\partial \sigma_Z} = -\frac{T}{\sigma_Z} + \frac{S(\vec{\alpha}_p)}{\sigma_Z^3} \qquad (12.10)$$
$$\frac{\partial l}{\partial \alpha_k} = M_k + \frac{1}{\sigma_Z^2}\Big(D_{1,k+1} - \sum_{j=1}^{p} \alpha_j D_{j+1,k+1}\Big) \quad \text{for } k = 1, \ldots, p \qquad (12.11)$$
where $D_{ij}$ is the sum
$$D_{ij} = x_i x_j + \cdots + x_{T+1-j} x_{T+1-i}$$
and $M_k$ is the partial derivative
$$M_k = \frac{\partial \ln|M_p|}{2\,\partial \alpha_k}. \qquad (12.12)$$

Equations (12.10) and (12.11) are not generally used to compute maximum likelihood estimates of the AR parameters because partial derivative (12.12) is difficult to evaluate. Instead, maximum likelihood estimates (MLEs) are obtained by using nonlinear numerical minimization techniques to find the minimum of $-2l(\vec{\alpha}_p, \sigma_Z \mid \vec{x}_T)$. Ingenious methods for evaluating the log-likelihood function have been developed. However, it is difficult to constrain numerical minimization methods to the admissible parameter region for weakly stationary AR(p) processes. Thus transformations are used to map the admissible region onto the real p-dimensional vector space (see, e.g., Jones [205]). These transformations enforce stationarity in the fitted model by mapping the boundaries of the admissible region to infinity. Consequently, MLEs of AR parameters tend to be negatively biased, particularly when the time series comes from a process with parameters that are close to the edge of the admissible region. The next subsection shows, however, that the bias of ML estimates is less than that of Yule–Walker estimates.

12.2.5 Example: Maximum Likelihood Estimates. The MLEs corresponding to those displayed in [12.2.3] are

                        (0.9, -0.8)   (0.3, 0.3)
  $\hat{\alpha}_1$         0.871        0.260
  $\hat{\alpha}_2$        -0.785        0.322
  $\hat{\sigma}_Z^2$       0.967        1.103

Because samples are large, these estimates appear to be only slightly different from the Yule–Walker estimates. However, MLEs are more than worth the effort when samples are small. To illustrate, we repeated the Monte Carlo experiment described in [12.2.3], making ML estimates instead of Yule–Walker estimates.

         The mean of 100 ML parameter estimates
   T      (0.9, -0.8)      (0.3, 0.3)
   15     (0.83, -0.73)    (0.29, 0.16)
   60     (0.88, -0.78)    (0.30, 0.27)
  240     (0.90, -0.80)    (0.30, 0.29)

Comparing the results in the above table with those in [12.2.3], we see that the negative bias of Yule–Walker estimates is reduced in all cases. The reduction in bias is particularly dramatic when samples are very small. In this case, the reduction of bias does not come at the cost of increased variability: the ML estimates have variance that is comparable to that of the Yule–Walker estimates. Again, be aware that these results do not fully reflect the practical properties of the ML estimator because we used prior knowledge to choose the order of AR process to fit.
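To make the numerical minimization concrete, here is a minimal Python sketch of the approach just described. It is illustrative code, not from the original text: in place of the full likelihood it minimizes $-2$ times a conditional approximation to (12.9) (the $\ln|M_p|$ term is dropped and $\sigma_Z$ is concentrated out), and it reparameterizes the partial auto-correlations through tanh and the Levinson–Durbin recursion, in the spirit of the transformations attributed to Jones [205], so the minimizer can roam over all of $\mathbb{R}^p$ while the fitted model stays stationary.

```python
import numpy as np
from scipy.optimize import minimize

def pacf_to_ar(u):
    """Map unconstrained u in R^p to stationary AR coefficients:
    tanh keeps each partial auto-correlation in (-1, 1), and the
    Levinson-Durbin recursion converts the PACF to AR coefficients."""
    r = np.tanh(u)
    a = np.array([r[0]])
    for k in range(1, len(r)):
        a = np.append(a - r[k] * a[::-1], r[k])
    return a

def neg2_cond_loglik(u, x, p):
    """-2 * conditional log-likelihood with sigma_Z concentrated out
    (a common approximation to (12.9); the ln|M_p| term is omitted)."""
    a = pacf_to_ar(u)
    T = len(x)
    resid = x[p:] - sum(a[k - 1] * x[p - k:T - k] for k in range(1, p + 1))
    return (T - p) * np.log(np.sum(resid ** 2) / (T - p))

def fit_ar_ml(x, p=2):
    """Approximate ML fit by nonlinear minimization in the
    transformed, unbounded parameter space."""
    res = minimize(neg2_cond_loglik, np.zeros(p), args=(x, p),
                   method="Nelder-Mead")
    return pacf_to_ar(res.x)

# Simulate an AR(2) process forced with unit-variance white noise.
rng = np.random.default_rng(1)
alpha_true = np.array([0.9, -0.8])
T, burn = 240, 200
x = np.zeros(T + burn)
z = rng.normal(size=T + burn)
for t in range(2, T + burn):
    x[t] = alpha_true[0] * x[t - 1] + alpha_true[1] * x[t - 2] + z[t]
x = x[burn:]

alpha_hat = fit_ar_ml(x, p=2)
```

Because the optimization runs in the transformed space, the returned coefficients always correspond to a weakly stationary model, no matter where the minimizer wanders.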

12.2: Identifying and Fitting Auto-regressive Models 259
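The small-sample comparison discussed in [12.2.5] can be reproduced in outline with a short simulation. The sketch below is hypothetical code, not from the original text: it uses conditional least squares (a closed-form approximation to the ML fit) alongside Yule–Walker estimates computed from the biased sample auto-covariances, so the bias gap can be examined without any numerical optimizer.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ar2(alpha, T, burn=200):
    """AR(2) series forced with unit-variance white noise (zero mean)."""
    x = np.zeros(T + burn)
    z = rng.normal(size=T + burn)
    for t in range(2, T + burn):
        x[t] = alpha[0] * x[t - 1] + alpha[1] * x[t - 2] + z[t]
    return x[burn:]

def yule_walker_ar2(x):
    """Yule-Walker estimates from biased sample auto-covariances."""
    T = len(x)
    c = np.array([np.sum(x[: T - k] * x[k:]) / T for k in range(3)])
    R = np.array([[c[0], c[1]], [c[1], c[0]]])
    return np.linalg.solve(R, c[1:])

def conditional_ls_ar2(x):
    """Conditional least squares: regress x_t on (x_{t-1}, x_{t-2})."""
    X = np.column_stack([x[1:-1], x[:-2]])
    return np.linalg.lstsq(X, x[2:], rcond=None)[0]

alpha_true = np.array([0.9, -0.8])   # parameters near the boundary
T, nrep = 15, 300                    # very small samples, many replicates
yw = np.mean([yule_walker_ar2(simulate_ar2(alpha_true, T))
              for _ in range(nrep)], axis=0)
ls = np.mean([conditional_ls_ar2(simulate_ar2(alpha_true, T))
              for _ in range(nrep)], axis=0)
```

With parameters this close to the edge of the admissible region, the mean Yule–Walker estimates typically fall noticeably below the true values while the least-squares means stay closer to them, mirroring the pattern reported in [12.2.5].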

12.2.6 Uncertainty of Maximum Likelihood Parameter Estimates. Software that computes MLEs also usually provides an estimate of their uncertainty. These uncertainty estimates are obtained through the use of large sample theory that approximates distributions of the AR parameter estimates (see, e.g., Box and Jenkins [60, Appendix A7.5]). The final result, an estimate of the variance-covariance matrix of the MLEs, is
$$\hat{\Sigma}_{\vec{\alpha}} = \frac{1}{T}\,\frac{\hat{\sigma}_Z^2}{c(0)}\,\mathcal{R}^{-1} \qquad (12.13)$$
where $\hat{\sigma}_Z^2$ is an estimate of the variance of the noise process,⁴ $c(0)$ is the sample variance (12.2) of the time series, $\mathcal{R}$ is the $p \times p$ matrix that has $r(|i-j|)$ as its $(i,j)$th element, and $r(\tau)$ is the estimated auto-correlation function (12.1).

[Figure 12.4: Approximate 95% confidence regions for $(\alpha_1, \alpha_2)$ computed from samples of length 240 generated from AR(2) processes with $(\alpha_1, \alpha_2) = (0.9, -0.8)$ (solid ellipse) and $(\alpha_1, \alpha_2) = (0.3, 0.3)$ (dashed ellipse). The triangle depicts the right half of the admissible region for the parameters of a stationary AR(2) process.]

12.2.7 Example: Uncertainty of MLEs. We used (12.13) to estimate the standard errors and correlation of the ML parameter estimates given in [12.2.5]. We obtained
$$\hat{\Sigma}_{\vec{\alpha}} = 0.040^2 \begin{pmatrix} 1 & -0.488 \\ -0.488 & 1 \end{pmatrix}.$$
The confidence intervals are approximately correct in this case, since the samples are fairly large and both ellipsoids lie well within the admissible region.
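Formula (12.13) is straightforward to evaluate directly. The following Python sketch is illustrative code, not from the original text: it builds $\hat{\Sigma}_{\vec{\alpha}}$ for a simulated series of length 240 from the process with $(\alpha_1, \alpha_2) = (0.9, -0.8)$, estimating $\sigma_Z^2$ from the Yule–Walker relations.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(2) series with (alpha1, alpha2) = (0.9, -0.8),
# unit-variance white noise, length T = 240 (the zero-mean series
# is used directly, so no centering is applied below).
alpha = (0.9, -0.8)
T, burn = 240, 200
x = np.zeros(T + burn)
z = rng.normal(size=T + burn)
for t in range(2, T + burn):
    x[t] = alpha[0] * x[t - 1] + alpha[1] * x[t - 2] + z[t]
x = x[burn:]

p = 2
# Sample auto-covariances c(k) and auto-correlations r(k).
c = np.array([np.sum(x[: T - k] * x[k:]) / T for k in range(p + 1)])
r = c / c[0]

# Yule-Walker estimates of the AR coefficients and of sigma_Z^2.
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
alpha_hat = np.linalg.solve(R, r[1:p + 1])
sigma2_hat = c[0] - alpha_hat @ c[1:p + 1]

# Equation (12.13): covariance matrix of the estimated AR coefficients.
Sigma_alpha = (sigma2_hat / (T * c[0])) * np.linalg.inv(R)

se = np.sqrt(np.diag(Sigma_alpha))           # standard errors
corr = Sigma_alpha[0, 1] / (se[0] * se[1])   # correlation of the estimates
```

For this process $r(1) \approx 0.5$, so the standard errors should come out near 0.04 and the correlation of the two parameter estimates near $-0.5$, consistent with the values quoted in [12.2.7].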