with quantiles of F(p, df_E) obtained from Appendix G.

Let us consider the problem of constructing a joint p̃ confidence region for a subset of two parameters, (a1, a2), in our Landsat example. Proceeding as above, we have

\[
\big(V^T(X^TX)^{-1}V\big)^{-1}
  = \left(
      \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
      (X^TX)^{-1}
      \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
    \right)^{-1}
  = \begin{pmatrix} 25.47 & 8.13 \\ 8.13 & 4.86 \end{pmatrix}.
\]

Expanding (8.36), we find that the points (ã1, ã2) in the joint p̃ confidence region for (a1, a2) satisfy

\[
25.47(\tilde a_1 - \hat a_1)^2
  + 2 \times 8.13\,(\tilde a_1 - \hat a_1)(\tilde a_2 - \hat a_2)
  + 4.86(\tilde a_2 - \hat a_2)^2
  < 2 F_{\tilde p}\,\hat\sigma_E^2,
\]

where F_p̃ is the p̃-quantile of F(2, 42).

The 95% confidence region computed in this way is displayed in Figure 8.13. The tilt of the ellipse reflects the correlation between â1 and â2. The point estimate is shown in the middle of the ellipse. The dashed lines indicate the 95% confidence intervals for a1 and a2 computed with (8.33). Note that the rectangular region defined by their intersection is substantially larger than the region enclosed by the ellipse.

The test statistic is

\[
F = \frac{\hat{a}^T V \big(V^T(X^TX)^{-1}V\big)^{-1} V^T \hat{a}}
         {df_R\,\hat\sigma_E^2},
\tag{8.37}
\]

which is distributed F(df_R, df_E) under H0. However, in this case there is an easier way. It can be shown that (8.37) is also given by

\[
F = \frac{SSR/df_R}{SSE/df_E},
\]

which is easily computed as a byproduct of the least squares fitting procedure. Large values of F are evidence contrary to H0, so the test is conducted at the (1 − p̃) × 100% significance level by rejecting H0 when f > F_p̃, the p̃-quantile of F(df_R, df_E). We find f = 318.6 in our Landsat example, a value that is significant at much less than the 0.1% level.

8.4.9 Are all Parameters in a Subset Zero? We could answer this question as well by constructing a suitable kernel V^T(X^TX)^{-1}V and computing F as in (8.37). Again, there is an easier and more intuitively appealing answer. Consider the following possible approach for testing H0: a_{l_1} = · · · = a_{l_p} = 0.

8: Regression 164

• Fit the full regression model including the p factors X_{l_1}, ..., X_{l_p}. Denote the resulting regression sum of squares and sum of squared errors as SSR_F and SSE_F, respectively, where the subscript F indicates that these variance components were obtained by fitting the full model.

• Fit the restricted regression model specified by the null hypothesis by excluding factors X_{l_1}, ..., X_{l_p} from the design matrix. Denote the resulting regression sum of squares as SSR_R.

• The increase in the regression sum of squares that is obtained by adding factors X_{l_1}, ..., X_{l_p} to the restricted model is given by SSR_F − SSR_R. Under H0, [(SSR_F − SSR_R)/σ_E²] ∼ χ²(p) and is independent of SSE_F. Thus, using property 5 of [8.4.2], we obtain a test statistic

\[
F = \frac{(SSR_F - SSR_R)/p}{SSE_F/df_{E_F}}
  = \frac{(SSR_F - SSR_R)/p}{\hat\sigma_E^2}
\]

that is distributed F(p, df_{E_F}) under H0. Here df_{E_F} is the degrees of freedom of the sum of squared errors for the full regression.

The test is conducted at the (1 − p̃) × 100% significance level by rejecting H0 when f > F_p̃, the p̃-quantile of F(p, df_{E_F}).

8.4.10 Diagnostics. We have two things in mind when we think about the fit of the model. The first is, how well does the model specify values of Y from the factors X_l? The coefficient of multiple determination R² = SSR/SST (see [8.3.12]) gives a quick but somewhat optimistic answer. Use cross-validation (see Section 18.5) if it is important to obtain a good estimate of future model performance [18.5.2].

The second worry is whether or not inferences are made reliably. Implicit in the discussion to this point are the assumptions that the errors in (8.25) are iid normally distributed and that the full model adequately represents the conditional mean of Y. Therefore the diagnostic procedures discussed in Section 8.3 should be applied to confirm that the distributional assumptions are as close to being satisfied as possible and that the inferences can be properly qualified. Scatter plots (see [8.3.13]) of residuals should be examined for evidence of outliers, heteroscedasticity, and lack-of-fit. For multiple regression, residuals should be plotted against the estimated conditional mean (i.e., the fitted model) and against the values of individual factors. Bear in mind that outliers (see [8.3.13]) will be more difficult to detect than in the case of simple linear regression. Use objective methods for detecting influential observations (see [8.3.18]) if at all possible. Use probability plots (see [8.3.14]) to detect departures from the assumption of a normal distribution. The general considerations of [8.3.15] apply, so we can proceed cautiously if the normal distribution assumption is in doubt. When appropriate, use the Durbin–Watson statistic (8.24) or runs test to check for dependence amongst the errors (see [8.3.16]).

We now briefly examine the fit of model (8.32) to the Landsat data set described in [8.1.4]. Figure 8.14 shows studentized residuals plotted against ln τ (left) and the corresponding probability plot (right). The left hand panel shows one outlier with undue influence on the fit. One effect of this outlier, the extreme point in the lower left corner of the right hand panel, is to shift the other quantiles in the probability plot upwards, thereby giving the impression that the upper tail of the error distribution may be narrower than that of the normal distribution.

Figure 8.15 shows the same diagnostics for the fit that is obtained after removing the outlier from the data set. The left hand panel shows that there may still be one or two observations that need investigation. Other diagnostics also indicate that these observations, corresponding to the two largest remaining studentized residuals, are somewhat more influential than we might like. The right hand panel shows improvement in the distributional characteristics of the residuals after removal of the outlier.

Removing the single outlier results in fairly large changes to the fitted model. There is little change in the estimated intercept (the new value of â0 is −0.0748), but there are substantial changes in the coefficients of τ (â1 = 0.866) and Ac (â2 = 0.866). Also, σ̂_E is reduced to 0.208 and R² increases slightly to 95.2%, a further indication that the fit is improved.7

7 The outlying observation comes from a Landsat image identified as scene C4 by Barker et al. (see [18, Table 2]). The image contains scattered cumulus clouds and appears to have large mean optical depth relative to its fractional cloud coverage. However, the image was taken when the solar zenith angle was 68°. Optical depth is difficult to estimate accurately in this scene because of the oblique trajectory of light incident on the clouds.
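The full-versus-restricted procedure of [8.4.9] is straightforward to carry out numerically. The sketch below is a minimal illustration with synthetic data (the Landsat values are not reproduced here, and the helper `regression_sums` is ours, not from the text); it computes SSR_F, SSR_R, SSE_F, and the F statistic for a single excluded factor (p = 1).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: Y depends on X1 but, by construction, not on X2,
# so the null hypothesis (the coefficient of X2 is zero) is true here.
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + rng.normal(scale=0.5, size=n)

def regression_sums(design, y):
    """Least squares fit of y on the columns of design (intercept included);
    returns the regression sum of squares SSR and the sum of squared errors SSE."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fit = design @ coef
    return np.sum((fit - y.mean()) ** 2), np.sum((y - fit) ** 2)

full = np.column_stack([np.ones(n), X1, X2])     # full model
restricted = np.column_stack([np.ones(n), X1])   # model under H0

SSR_F, SSE_F = regression_sums(full, Y)
SSR_R, _ = regression_sums(restricted, Y)

p = 1                        # number of factors excluded under H0
df_EF = n - full.shape[1]    # error degrees of freedom of the full model
F = ((SSR_F - SSR_R) / p) / (SSE_F / df_EF)

# Reject H0 at the (1 - p~) * 100% significance level when F exceeds the
# p~-quantile of F(p, df_EF), taken from tables such as Appendix G.
print(round(F, 3))
```

Because H0 holds for these synthetic data, the computed F will typically be small compared with the 95% quantile of F(1, 47), which is roughly 4.0.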

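Studentized residuals such as those plotted in Figures 8.14 and 8.15 can be obtained from the hat matrix of the fit. A minimal sketch with synthetic data follows (this computes internally studentized residuals; the data and variable names are illustrative assumptions, not the Landsat values):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic simple regression data (illustrative only).
n = 40
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=n)

# Hat matrix H = X (X^T X)^{-1} X^T; fitted values are H @ y.
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = y - H @ y                 # ordinary residuals
df_E = n - X.shape[1]         # error degrees of freedom
sigma2 = e @ e / df_E         # estimated error variance

# Internally studentized residuals: e_i / sqrt(sigma2 * (1 - h_ii)).
# Values much larger than about 2 in absolute value flag candidate outliers.
r = e / np.sqrt(sigma2 * (1.0 - np.diag(H)))

print(np.round(r[:5], 2))
```

The diagonal entries h_ii of the hat matrix also measure leverage, one common objective measure of influence of the kind referred to in [8.3.18].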

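Returning to the joint confidence region for (a1, a2): deciding whether a point lies inside the ellipse is a single quadratic-form comparison. In this sketch the kernel entries (25.47, 8.13, 4.86) and the F(2, 42) degrees of freedom come from the text, while `a_hat`, `sigma2_E`, and the rounded quantile `F_95` are illustrative assumptions.

```python
import numpy as np

# Kernel (V^T (X^T X)^{-1} V)^{-1} for the Landsat example (values from the text).
K = np.array([[25.47, 8.13],
              [8.13,  4.86]])

df_R = 2
F_95 = 3.22       # approximate 95% quantile of F(2, 42), e.g. from Appendix G

# Hypothetical point estimate (a1_hat, a2_hat) and error variance (illustrative only).
a_hat = np.array([0.9, 0.5])
sigma2_E = 0.05

def in_joint_region(a_tilde):
    """True if a_tilde lies inside the joint 95% confidence region, i.e.
    (a_tilde - a_hat)^T K (a_tilde - a_hat) < df_R * F_95 * sigma2_E."""
    d = np.asarray(a_tilde, dtype=float) - a_hat
    return float(d @ K @ d) < df_R * F_95 * sigma2_E

print(in_joint_region(a_hat))               # centre of the ellipse: inside
print(in_joint_region(a_hat + [1.0, 1.0]))  # far from the estimate: outside
```

The off-diagonal entry 8.13 is what tilts the ellipse, reflecting the correlation between the two parameter estimates noted in the discussion of Figure 8.13.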