A 'rough-and-ready' approach that can be used when the samples are large is based on the observation that d = 2(1 − ρ̂(1)), where ρ̂(1) is the estimated lag-1 correlation coefficient of the residuals. An approximate test can therefore be performed by comparing √n ρ̂(1) with critical values from the standard normal distribution (Appendix D). If the null hypothesis can not be rejected with this test, then it will also not be rejected with d. On the other hand, if H0 is rejected with this test, Durbin and Watson's approximation [108, 109] should be used to confirm that this decision will stand when the details of the independent variable (i.e., the values xi) are taken into account.

The value of the Durbin–Watson statistic in our SOI example is 2.057, which means that ρ̂(1) = −0.0285. This value is not significantly different from zero.

Another approach to testing for serial correlation in the residuals is to perform a runs test (see, e.g., Draper and Smith [104] or Lehmann and D'Abrera [249]) to determine whether the residuals change sign less frequently than would be expected of a sequence of independent random variables.

8.3.17 Are Least Squares Estimators Robust? To understand the influence outliers have on least squares estimates, think about the sample mean. A positive outlier will increase the sample mean in direct proportion to the size of the outlier. In fact, there is no upper limit on the effect that can be induced on the sample mean by an outlier. On the other hand, the effect of an outlier on the sample median is bounded; once the outlier becomes the largest observation in the sample it has no further influence on the median. Thus the sample median and mean are examples of estimators that are robust and not robust, respectively.

Least squares estimators are not robust to the effects of outlying observations. Other fitting methods (see [8.3.18]), such as robust M-estimation (see, e.g., [154]), can be used, but at the expense of computer time (perhaps not such an issue these days), some loss of the rich body of inferential methods available for least squares estimators, and some loss of efficiency when the errors are actually iid normally distributed.
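The shortcut relation between d and the lag-1 residual correlation is easy to check numerically. The sketch below uses synthetic iid residuals (not the SOI data) and assumes the usual definitions d = Σ(e_t − e_{t−1})²/Σe_t² and ρ̂(1) = Σe_t e_{t+1}/Σe_t²:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(500)  # synthetic residuals, iid by construction

# Durbin-Watson statistic: squared successive differences over the sum of squares
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# estimated lag-1 correlation coefficient of the residuals
r1 = np.sum(e[:-1] * e[1:]) / np.sum(e ** 2)

print(d, 2.0 * (1.0 - r1))        # nearly equal when n is large
print(np.sqrt(len(e)) * abs(r1))  # compare with the N(0,1) critical value 1.96
```

The two printed quantities in the first line differ only by boundary terms of order 1/n, which is why the approximate test on ρ̂(1) can stand in for the test on d.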

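The bounded-versus-unbounded influence contrast described above (sample median versus sample mean) can be demonstrated in a few lines; the numbers here are made up for illustration:

```python
import numpy as np

sample = np.array([1.2, 0.8, 1.0, 1.1, 0.9])

for outlier in (10.0, 100.0, 1000.0):
    contaminated = np.append(sample, outlier)
    # the mean grows in direct proportion to the outlier, while the
    # median stays fixed once the outlier is the largest observation
    print(outlier, contaminated.mean(), np.median(contaminated))
```

However large the outlier becomes, the median of the contaminated sample never moves, whereas the mean can be driven arbitrarily far.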

8.3.18 Influence and Leverage: the Effects of Outliers. In regression analysis, the effect of an outlying realization of Y is also influenced by the value of X. One can think of the regression line as a bar balanced on a pivot point at (x̄, ȳ). An outlier directly above (or below) the pivot point pulls the bar up (or down) and has a relatively small effect on the fitted conditional mean. An outlier near the end of the bar has a very large influence on the fitted line.

Suppose an outlying point (x, y) is located above the fitted line and that the line passes through (x̄, ȳ). Then a physical analogy for the outlier's effect is that it exerts an upwards force of (y − ŷ)² units on the line at a distance x − x̄ units from the pivot point of the bar. The farther from the pivot point, the greater the ability of the outlier to affect the fit, that is, the greater its ability to use the line as a lever. Hence the term leverage.

We can now understand why the relatively small outlier in Figure 8.10 at x = 0.5 is easy to detect while the relatively large outlier at x = 0.95 is not. The outlier at x = 0.5 exerts little influence on the fitted line. Thus the line has little opportunity to 'adapt' to this outlier, leaving the outlier plainly visible above the fitted line. The large outlier at x = 0.95 has much greater influence on the fitted line, which 'adapts' well to this outlier, hiding its presence.

Statisticians have devised a number of sophisticated techniques for estimating the influence of an individual observation. Without going into detail, the idea behind these methods is that the influence of an individual observation can be estimated by fitting the model with, and without, that observation. The change in the fit, measured in some objective manner, determines the influence of that observation. See [41], [78], and [90] for details and methods.

Bounded influence regression (M-estimation, see [154]), of which median absolute deviation regression is a special case, has become a popular way to protect against the effects of influential outliers. Such techniques are now generally available in statistical packages and subroutine libraries. Two kinds of action are taken to control the effects of outliers. First, the errors ei (8.12) are weighted (see Section 8.6) so that observations corresponding to outlying values of the factor X receive less weight. Second, rather than substituting the weighted errors into normal equations (8.14) and (8.15) to obtain parameter estimators, bounded errors are substituted into the equations. That is, the M-estimates are obtained by solving equations of the form

    Σ_{i=1}^{n} ψ(e_i) = 0
    Σ_{i=1}^{n} ψ(e_i) x_i = 0,

where ψ(·) is a function that preserves the sign of its argument but limits its magnitude. For example, Huber [190] uses

    ψ(t) = −c,   t < −c
           t,    |t| ≤ c
           c,    t > c.

8.3.19 Matrix-vector Formulation of Least Squares Estimators. We have formulated the least squares estimators for simple linear regression by basic brute force, but it is easier to form estimators and derive distributional results for multiple linear regression problems when matrix-vector notation is used.

Let Y denote the n-dimensional random vector whose ith element is Yi. Let X be the n × 2 matrix that has units in the first column and xi as the ith element of the second column. That is,

    X = | 1  x_1 |
        | 1  x_2 |
        | .   .  |
        | 1  x_n |.

Matrix X is called the design matrix. Let E denote the n-dimensional random vector whose ith element is Ei, and let a be the two-dimensional vector whose elements are a0 and a1. Then the matrix-vector representation of (8.10) is

    Y = Xa + E.                              (8.25)

The least squares estimates are obtained by choosing a so that the squared length of E, given by

    SSE = EᵀE = (Y − Xa)ᵀ(Y − Xa),           (8.26)

is minimized. Differentiating with respect to a (see, e.g., [148, pp. 350–360]), we obtain the normal equations

    2Xᵀ(Y − Xa) = 0,

where 0 is a two-dimensional vector of zeros. The solutions of the normal equations are given by

    â = (XᵀX)⁻¹XᵀY.                          (8.27)

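The M-estimation equations in [8.3.18] are typically solved by iteratively reweighted least squares, using weights w_i = ψ(e_i)/e_i. The sketch below is a minimal illustration, assuming Huber's ψ with a fixed tuning constant c = 1.345 and no scale estimation; the data are made up, with one large outlier:

```python
import numpy as np

def huber_psi(t, c=1.345):
    # preserves the sign of t but limits its magnitude to c
    return np.clip(t, -c, c)

# made-up data on the line y = 2x, with one large outlier at the end
x = np.arange(10.0)
y = 2.0 * x
y[-1] += 50.0

X = np.column_stack([np.ones_like(x), x])

# ordinary least squares fit for comparison
a_ols = np.linalg.solve(X.T @ X, X.T @ y)

# solve sum psi(e_i) = 0 and sum psi(e_i) x_i = 0 by iteratively
# reweighted least squares with weights w_i = psi(e_i) / e_i
a = a_ols.copy()
for _ in range(50):
    e = y - X @ a
    w = np.ones_like(e)
    big = np.abs(e) > 1e-12
    w[big] = huber_psi(e[big]) / e[big]
    a = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

print("OLS slope:", a_ols[1])     # pulled far from 2 by the outlier
print("M-estimate slope:", a[1])  # stays close to the true slope 2
```

At a fixed point of the iteration, Xᵀdiag(w)(y − Xa) = 0, which is exactly the pair of bounded-error equations; the outlier's residual is capped at c, so its pull on the fitted line is limited.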

Some simple algebra reveals that estimator (8.27) is identical to estimators (8.16) and (8.17) derived previously.

The sums of squares appearing in (8.19) are also easily re-expressed in matrix-vector form. Substituting (8.27) into (8.26) we obtain

    SSE = Yᵀ(I − X(XᵀX)⁻¹Xᵀ)Y,               (8.28)

where I denotes the n × n identity matrix. By noting that

    ȳ = Yᵀ(1/n, 1/n, …, 1/n)ᵀ,

we obtain that the sum of squares due to regression, given by Σ_{i=1}^{n} (μ̂_{Y|X=x_i} − ȳ)², can be expressed as

    SSR = Yᵀ(X(XᵀX)⁻¹Xᵀ − U)Y,               (8.29)

where X₁* is a nonzero 2 × 2 matrix and X₂* is the (n − 2) × 2 matrix of zeros. Therefore SSE is of the form

    SSE = (Z₁ − X₁*a)ᵀ(Z₁ − X₁*a) + Z₂ᵀZ₂,

where Z₁ consists of the first two elements of Z and Z₂ consists of the remaining (n − 2) elements. Upon minimization we see that

    SSE = Z₂ᵀZ₂ = Σ_{i=3}^{n} Z_i².

Now from the matrix-vector form of the regression model we see that the elements of Z are independent and have common variance σ_E² (the covariance matrix of both Y and KY is σ_E² I). Therefore SSE/σ_E² is χ²(n − 2) distributed. Note that n − 2 is the dimension of the sub-space not spanned by the columns of the design matrix. Moreover, because SSE depends only upon Z₂
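Estimator (8.27) and the matrix form of SSE in (8.28) are easy to verify numerically against the summation formulas for the slope and intercept; the sketch below uses made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 3.0 * x + rng.standard_normal(n)

# design matrix: a column of ones and the column of x values
X = np.column_stack([np.ones(n), x])

# (8.27): a_hat = (X^T X)^{-1} X^T y
a_hat = np.linalg.solve(X.T @ X, X.T @ y)

# classical summation formulas for slope and intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(np.allclose(a_hat, [intercept, slope]))  # True

# (8.28): SSE = y^T (I - H) y, with hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.solve(X.T @ X, X.T)
sse_matrix = y @ (np.eye(n) - H) @ y
sse_direct = np.sum((y - X @ a_hat) ** 2)
print(np.allclose(sse_matrix, sse_direct))  # True
```

The same design-matrix recipe carries over unchanged to multiple regression: only the number of columns of X grows, while (8.25) through (8.28) keep exactly the same form.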