8.5.5 All Subsets Regression. Another screening approach that has become feasible with increased computing power is all subsets regression. As the name suggests, the procedure fits all 2^k possible subsets of factors to the response variable. The screening statistic C_p,

    C_p{l_1,...,l_p} = SSE{l_1,...,l_p} / σ̂_E^2 − (n − 2p),

is computed for every model and a plot of points (p, C_p{l_1,...,l_p}) is produced. Note that the error variance estimate is generally obtained from the full model. A model that fits well will have a computed C_p that lies close to the C_p = p line. This is therefore used as a guide for selecting models that require more careful examination (see [104] or [420] for details).

Alternatively, Akaike's information criterion (AIC) [6] could be used as the screening statistic.

…and observed variables at specific points on the surface, such as precipitation and temperature. The primary tool used is multiple linear regression.⁸ … further change can be made to the model.

⁸ Many other techniques, such as cluster analysis [163, 115], multiple discriminant analysis [267], and classification and regression trees [63] are also used. See, for example, Yacowar [435].

The advantage of MOS over perfect prog is that it inherently corrects for forecast model biases in both the mean and variance. A disadvantage of MOS is that the specification equations need to adapt constantly to the changing characteristics of the numerical forecast model and its associated data assimilation systems.

Perfect prog procedures (see Klein, Lewis, and Enger [227], Brunet et al. [71]) are similar to MOS procedures except that the specification equations describe simultaneous relationships between the analysed (as opposed to forecast) free atmosphere and observed variables at specific points on the surface. The resulting specification equations are more stable than the MOS equations because the

8: Regression 168

data used to fit the equations are less affected by periodic model changes. However, perfect prog specification equations do not account for forecast model biases. Statistical downscaling procedures (see [97, 152, 252, 403]) that link regional and local aspects of simulated climate change are a variation of perfect prog.

Screening regression is strongly affected by the artificial skill phenomenon discussed in [8.3.12] and also [18.4.7] (see, e.g., Ross [332] or Unger [377]) because these methods select a model from a set of possible models that adapts most closely to the data. Ross [332], citing Copas [91] and Miller [278], points out that using the same sample to select the model and estimate its coefficients is 'overfitting' and can lead to models that perform very poorly on independent data. It may therefore be wise to use three data sets in conjunction with screening techniques: one with which to identify the model, one with which to estimate coefficients, and one for validation.

Small data sets often make this strategy impossible to use. An alternative method for estimating the skill of the model is cross-validation, but Unger [377] demonstrates that cross-validation does not provide reliable skill estimates because of the way in which it interacts with the screening methods. He proposes the use of a method called bi-directional retroactive real-time (BRRT) validation instead. The idea is that a substantial subset of recent data is withheld. A screening technique is used to fit a model to the earlier data (called the base data set). This model is used to forecast the first observation in the withheld set. That observation is then added to the base data set and the process is repeated, thereby collecting a set of verification statistics of the same size as the withheld data set. More verification data are collected by running the same process in reverse (hence the term 'bi-directional'). Unger finds that BRRT gives reliable estimates of skill 'when the number of candidate predictors is low.'

8.6 Some Other Topics

8.6.1 Weighted Regression. The working assumption to this point has been that the errors E_i are normally distributed, independent, and identically distributed. That is, the vector of errors E is jointly distributed N(0, σ_E^2 I), where I denotes the n × n identity matrix. We noted in [8.3.16] that departures from the independence assumption lead to difficulties. If the errors are not independent, E ∼ N(0, Σ_E), where Σ_E is a non-diagonal covariance matrix. If there are departures from the constant variance assumption (heteroscedasticity; see [8.3.13]), then although Σ_E may be diagonal, the elements on the diagonal are not constant. In general, ordinary least squares estimates are less than optimal (they are no longer maximum likelihood estimates) whenever Σ_E ≠ σ_E^2 I.

When Σ_E is known, the optimality properties of ordinary least squares estimators are restored by solving the generalized normal equations. Instead of minimizing (Y − Xa)^T (Y − Xa), we choose a to minimize

    (Y − Xa)^T Σ_E^{−1} (Y − Xa).   (8.38)

The generalized least squares estimators are therefore given by

    â = (X^T Σ_E^{−1} X)^{−1} X^T Σ_E^{−1} Y.

Weighted regression is the special case in which Σ_E is diagonal. Then quadratic form (8.38) reduces to

    ∑_{i=1}^{n} w_i^2 ( Y_i − ∑_{l=1}^{k} a_l x_{li} )^2,

where weight w_i is proportional to 1/σ_{E_i}.

Weighted regression is an option to consider when errors are heteroscedastic and transformation of the response variable [8.6.2] does not result in a model with a reasonable physical interpretation. Note that in order to perform weighted regression it is only necessary to know the relative sizes of the error variances, not the variances themselves. Very good prior information about the relative variances may be available from sampling or physical considerations.

8.6.2 Transformations. Transformation of variables can be used in several ways in regression analysis. First, many models that appear to be nonlinear in their parameters can easily be made linear.

• Multiplicative models, such as

    Y = a_0 x_1^{a_1} x_2^{a_2} x_3^{a_3} E,

can be made linear by taking logarithms to obtain

    ln Y = a′_0 + a_1 ln x_1 + a_2 ln x_2 + a_3 ln x_3 + E′,

where a′_0 = ln a_0. Fitting can now proceed provided appropriate assumptions can be made about E′ = ln E.
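Both devices above lend themselves to a compact numerical illustration. The sketch below is not from the book: all constants, noise levels, and variable names are invented. It first fits a heteroscedastic straight line with the weighted (generalized) estimator of [8.6.1], then recovers the parameters of a multiplicative model by ordinary least squares after the log transform of [8.6.2].

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Weighted regression (cf. [8.6.1]) ------------------------------
# Straight-line data whose error standard deviation grows with x,
# so ordinary least squares is no longer optimal.
n = 400
x = rng.uniform(0.0, 1.0, n)
sigma_i = 0.1 + 0.9 * x                      # error std dev, known up to a constant
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma_i)

X = np.column_stack([np.ones(n), x])
W = np.diag(1.0 / sigma_i**2)                # diagonal Sigma_E^{-1}; w_i = 1/sigma_i
# Generalized normal equations: a = (X^T W X)^{-1} X^T W y
a_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# --- Log-linearizing a multiplicative model (cf. [8.6.2]) -----------
# Y = a0 * x1**a1 * x2**a2 * E with log-normal multiplicative error E,
# so ln Y = ln a0 + a1 ln x1 + a2 ln x2 + E' with E' = ln E.
x1 = rng.uniform(0.5, 2.0, n)
x2 = rng.uniform(0.5, 2.0, n)
E = np.exp(rng.normal(0.0, 0.05, n))
Y = 2.0 * x1**1.5 * x2**-0.7 * E

Xlog = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(Xlog, np.log(Y), rcond=None)
a0_hat = np.exp(coef[0])                     # back-transform the intercept

print(a_gls, a0_hat, coef[1], coef[2])
```

Note that only the relative error variances enter W: multiplying sigma_i by any constant rescales W but leaves a_gls unchanged, which is exactly the point made above about needing only relative variances.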

• Reciprocal models, such as

    Y = 1 / (a_0 + a_1 x_1 + E),

can be made linear by inverting the dependent variable to obtain

    1/Y = a_0 + a_1 x_1 + E.

• Bilinear models, such as

    Y = a_0 x_1 / (a_1 + a_2 x_2 + E),

can be made linear by cross-multiplying to obtain

    x_1/Y = a_1/a_0 + (a_2/a_0) x_2 + E′,

or by inversion to obtain

    1/Y = (a_1/a_0)(1/x_1) + (a_2/a_0)(x_2/x_1) + E′.

…where r̄_H is relative humidity, q_e is the large-scale condensate (cloud water plus ice) mixing ratio, and

    α = α_0 (1 − r̄_H) q*^{−γ},   (8.40)

where q* is the water vapour mixing ratio. Constants p, α_0, and γ are scalar parameters that are estimated by fitting model (8.39, 8.40) to the output from a high resolution cloud ensemble model (CEM); see, for example, Xu and Krueger [433]. CEMs are used in the development of cloud parameterizations because detailed observational data on cloud fields are scarce.

A second reason for using transformations in regression is to change the model so that it better satisfies the assumptions necessary to make inferences about the estimated parameters and about unobserved values of the dependent variable. For example, the heteroscedasticity displayed in Figure 8.8 can be removed by fitting the model

    Y / (x(1 − x)) = a_0 + a_1 (1/(1 − x)) + E

instead of