where U is an n × n matrix with each entry equal to 1/n. The total sum of squares is given by

    SST = Y^T (I − U) Y.   (8.30)

8.3.20 Distributional Results. Here we briefly demonstrate how Properties 1–5 stated in [8.3.5] are obtained and provide a geometrical interpretation of the concept of degrees of freedom. These ideas generalize easily to include regression models that contain more than one factor.

Now suppose again that the errors E_i are iid normally distributed with mean zero. Then Y has a multivariate normal distribution with mean Xa and covariance matrix σ_E^2 I. It follows that â is normally distributed with mean a and covariance matrix σ_E^2 (X^T X)^{−1} (see Section 2.8).

Next we demonstrate that SSE/σ_E^2 is independent of â and distributed χ^2(n − 2). Let k_1 and k_2 be orthonormal vectors spanning the column space of the design matrix X. Choose k_3, ..., k_n so that k_1, k_2, ..., k_n form a complete orthonormal basis for R^n. Let Z = K^T Y, where K is the n × n matrix that has k_i as its ith column. Then, since K^T K = K K^T = I, we have Y = KZ. Substituting for Y in expression (8.26), we obtain

    SSE = (Y − Xâ)^T (Y − Xâ)
        = (KZ − Xâ)^T (KZ − Xâ)
        = (Z − K^T Xâ)^T (Z − K^T Xâ).

Because the first two columns of K span the columns of X, K^T X is of the form

            ( X*_1 )
    K^T X = ( X*_2 )
            (  0   ),

where X*_i = k_i^T X for i = 1, 2, and 0 denotes the (n − 2) × 2 matrix of zeros that arises because k_3, ..., k_n are orthogonal to the columns of X. Thus the last n − 2 components of Z − K^T Xâ are simply Z_3, ..., Z_n, while the residual vector Y − Xâ is orthogonal to the column space of X, so the first two components vanish and

    SSE = Z_3^2 + ... + Z_n^2.

The Z_i are independent normal random variables with variance σ_E^2, and those with i = 3, ..., n have mean zero; hence SSE/σ_E^2 ∼ χ^2(n − 2), with one degree of freedom for each of the n − 2 basis directions orthogonal to the column space of X. Since SSE depends only upon Z_3, ..., Z_n and â depends only upon Z_1 and Z_2, we see that SSE is independent of â.


8.4 Multiple Regression

The simple linear regression model we have examined up to this point, while enormously useful in climatology and meteorology, has severely limited flexibility. Many methods, such as the MOS (model output statistics) and perfect prog statistical forecast improvement procedures (see, for example, Klein and Glahn [226], Klein [224], Klein and Bloom [225], Brunet, Verret, and Yacowar [71]), require the use of regression models with more than one explanatory factor. The working example we develop as we progress through the section is the cloud parameterization example introduced in [8.1.4].

8.4.1 The Multiple Regression Model. A multiple linear regression model expresses a response variable as an error term plus a mean that is conditional upon several factors. Suppose we observe a response variable Y and k factors denoted by X_1, ..., X_k that are thought to affect the expected value of Y. These random variables are all observed n times. The result is a sample of n (k + 1)-tuples represented by random variables (Y_i, X_{1,i}, ..., X_{k,i}) whose actual observed, or realized, values are represented by (y_i, x_{1,i}, ..., x_{k,i}), for i = 1, ..., n. The multivariate version of (8.10) is given by

    Y_i = a_0 + Σ_{l=1}^{k} a_l x_{l,i} + E_i,   (8.31)


where the E_i, for i = 1, ..., n, are iid random variables with mean zero. We usually assume that these errors are normally distributed.

This model states that the mean of Y, conditional upon the realized values of the factors X_j, can be expressed as a linear combination of the factors. Thus the model is linear in its parameters. However, the factors themselves can be nonlinear functions of other variables. For example, the model specifies a polynomial of order k in X if X_{l,i} = (X_i)^l.

The model we will fit to the Landsat data (cf. [8.1.4]) has the form

    ln τ = a_0 + a_1 ln(τ̄) + a_2 A_c + E.   (8.32)

The ln(τ̄) term is used to account for the curvilinear relationship between τ̄ and ln τ that is apparent in Figure 8.3 (left). See also [8.6.2].

8.4.2 Matrix-vector Representation of the Multiple Linear Regression Model. The development of least squares estimators and inferential methods for multiple regression parallels that for the simple linear regression model once the model has been expressed in matrix-vector form.

As in [8.3.19], let Y represent the n-dimensional random vector whose ith element is Y_i. Define E similarly. Let the design matrix X be the n × (k + 1) matrix given by

        ( 1  x_{1,1}  ...  x_{k,1} )
    X = ( 1  x_{1,2}  ...  x_{k,2} )
        ( .     .      .      .    )
        ( 1  x_{1,n}  ...  x_{k,n} )

Let a be the (k + 1)-dimensional vector consisting of the model parameters a_0, a_1, ..., a_k. With this notation, the matrix-vector representation of (8.31) is identical to that of the simple linear regression case given in (8.25): Y = Xa + E. The least squares estimator of a and the variance components SST, SSR, and SSE are computed as in (8.27)–(8.30).

The degrees of freedom for the variance components are as follows:

    Source       Sum of Sq.   df
    Regression   SSR          df_R = k
    Error        SSE          df_E = n − k − 1
    Total        SST          df_T = n − 1

When model (8.32) is fitted to the Landsat data described in [8.1.4], we obtain parameter estimates â_0 = −0.747, â_1 = 0.794, â_2 = 1.039, and σ̂_E = 0.233. The coefficient of multiple determination, R^2, is equal to 0.938, indicating that τ̄ and A_c jointly represent about 94% of the variability in ln τ in the data set. The total variability in the 45 ln τ values of the Landsat data set is partitioned by the fitted model as follows:

    Source       Sum of Sq.   df
    Regression   34.705        2
    Error         2.287       42
    Total        36.992       44

The methods of [8.3.20] can be used to prove the following properties, which form the basis of the inference procedures used in multiple regression:

1. â is an unbiased estimate of a;
2. σ̂_E^2 = SSE/df_E is an unbiased estimate of σ_E^2;
3. â ∼ N(a, σ_E^2 (X^T X)^{−1});
4. â is independent of SSE;
5. SSE/σ_E^2 ∼ χ^2(df_E).

8.4.3 Multiple Regression Model Without an Intercept. Sometimes it may be desirable to force the fitted regression surface to pass through the origin. In this case coefficient a_0 in (8.31) is set to zero and the column of 1s in the design matrix is deleted. The least squares estimator is computed as before by substituting the modified design matrix into (8.27). The variance components are computed using

    SSR = Y^T (X (X^T X)^{−1} X^T) Y
    SSE = Y^T (I − X (X^T X)^{−1} X^T) Y
    SST = Y^T Y.

The corresponding degrees of freedom are

    Source       Sum of Sq.   df
    Regression   SSR          df_R = k
    Error        SSE          df_E = n − k
    Total        SST          df_T = n

In particular, notice that there is one additional degree of freedom for error because it was not necessary to fit the intercept parameter.
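The computations of [8.4.2] can be sketched numerically: build the design matrix with a leading column of 1s, solve the normal equations for â, and form the variance components from the projection ("hat") matrix H = X(X^T X)^{−1}X^T and the averaging matrix U with every entry 1/n. The data below are synthetic stand-ins generated for illustration only; the sample size, coefficients, and variable names are assumptions, not the Landsat values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample: n cases, k = 2 factors (illustrative values only).
n, k = 45, 2
x1 = rng.uniform(1.0, 20.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = -0.7 + 0.8 * np.log(x1) + 1.0 * x2 + rng.normal(0.0, 0.25, n)

# Design matrix: a column of 1s followed by the k factors.
X = np.column_stack([np.ones(n), np.log(x1), x2])

# Least squares estimate a_hat = (X^T X)^{-1} X^T y, via the normal equations.
a_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Variance components from the hat matrix H and the matrix U whose
# entries are all 1/n (cf. SST = Y^T (I - U) Y).
H = X @ np.linalg.solve(X.T @ X, X.T)
U = np.full((n, n), 1.0 / n)
I = np.eye(n)
SSE = y @ (I - H) @ y
SSR = y @ (H - U) @ y
SST = y @ (I - U) @ y

print(a_hat)                 # estimates of (a_0, a_1, a_2)
print(SST, SSR + SSE)        # the partition SST = SSR + SSE
print(k, n - k - 1, n - 1)   # df_R, df_E, df_T
```

From these quantities, R^2 follows as SSR/SST and σ̂_E^2 as SSE/df_E.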

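For the no-intercept model of [8.4.3], the only changes are the design matrix (no column of 1s) and the total sum of squares, which becomes SST = Y^T Y with n rather than n − 1 degrees of freedom. A minimal sketch, again with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data for a regression surface forced through the origin.
n, k = 30, 2
X = rng.normal(size=(n, k))            # design matrix WITHOUT a 1s column
y = X @ np.array([1.5, -0.5]) + rng.normal(0.0, 0.3, n)

a_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Variance components for the no-intercept model:
H = X @ np.linalg.solve(X.T @ X, X.T)
SSR = y @ H @ y
SSE = y @ (np.eye(n) - H) @ y
SST = y @ y                            # Y^T Y, not Y^T (I - U) Y

print(SST, SSR + SSE)                  # the partition still holds
print(k, n - k, n)                     # df_R = k, df_E = n - k, df_T = n
```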
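Distributional properties such as E(SSE/σ_E^2) = df_E and the independence of â and SSE (see [8.3.20]) can be illustrated by Monte Carlo: hold the design fixed, regenerate the errors many times, and examine the resulting SSE values. The settings below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed design: intercept plus k = 2 factors (arbitrary illustration).
n, k, sigma, reps = 20, 2, 0.5, 4000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
a_true = np.array([1.0, 2.0, -1.0])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # I - H

sse = np.empty(reps)
a1 = np.empty(reps)
for r in range(reps):
    y = X @ a_true + rng.normal(0.0, sigma, n)
    a1[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]    # one element of a_hat
    sse[r] = y @ M @ y

# SSE/sigma^2 should average df_E = n - k - 1, and since a_hat is
# independent of SSE the sample correlation should be near zero.
print(np.mean(sse) / sigma**2, n - k - 1)
print(np.corrcoef(a1, sse)[0, 1])
```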

    Parameter   95% Confidence Interval
    a_0         (−0.936, −0.557)
    a_1         (0.661, 0.927)
    a_2         (0.735, 1.343)

8.4.4 A Confidence Interval for the Mean of the Response Variable. Let X represent the (k + 1)-dimensional vector X = (1, X_1, ..., X_k)^T. The rows of the design matrix can be thought of as a collection of n realizations of X. From (8.31) we