Y a 0 x1 a0 x1 x1 Y = a0 + a1 x + E.

where $E' = E/a_0$. Least squares estimators can then be obtained for the ratios $a_1/a_0$ and $a_2/a_0$. The particular form that is chosen depends upon whether or not $x_1$ can become zero, and whether $E/x_1$ better satisfies the distributional assumptions needed to make statistical inferences about the model than $E$ itself.

• Many models can be made linear in their parameters through a combination of transformations. For example, a model of the form

$$Y = \frac{1}{1 + a_0 x_1^{a_1} E}$$

can be re-expressed as

$$\ln\left(\frac{1}{Y} - 1\right) = a_0' + a_1 \ln x_1 + E',$$

where $a_0' = \ln a_0$ and $E' = \ln E$.

Some models are intrinsically nonlinear and cannot be re-expressed in a way that is linear in the parameters. For example, Xu and Randall [434] propose the following parameterization for the fraction $C_S$ of the sky in a GCM grid box that is covered by stratiform clouds:

$$C_S = r_H^{p}\left(a - e^{-\alpha \bar{q}_e}\right), \qquad (8.39)$$

Suitable variance stabilizing transforms are found by physical reasoning and by plotting residuals against the independent variables.

8.6.3 Nonlinear Regression. Many of the ideas discussed in this chapter can be extended to the fitting and analysis of intrinsically nonlinear models such as (8.39, 8.40) provided it is possible to assume that errors are iid and normally distributed. Then a reasonable nonlinear regression model for the conditional mean of the response variable has the form

$$Y_i = h(x_{1,i}, \ldots, x_{k,i} \mid a_1, \ldots, a_p) + E_i.$$

That is, the conditional mean of the response variable is a function $h(\cdot \mid \cdot)$ of $k$ factors that is known up to the value of $p$ coefficients. Function $h$ is nonlinear in at least some of the unknown coefficients. Parameters are estimated by using function minimization techniques (such as the method of steepest descent, see [322]) to minimize the sum of squared errors

$$SSE = \sum_{i=1}^{n} \left(y_i - h(x_{1,i}, \ldots, x_{k,i} \mid a_1, \ldots, a_p)\right)^2.$$

Approximate inferences are possible by linearizing $h$ about $a$. See Bates and Watts [35] or Draper and Smith [104] for more details.
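The minimization of the sum of squared errors can be sketched numerically. The Python example below is a minimal illustration under stated assumptions, not the procedure of [322]: it assumes a hypothetical two-coefficient mean function h(x | a1, a2) = a1 exp(a2 x), noise-free synthetic data generated with a1 = 2 and a2 = -1, and a steepest descent iteration with a simple backtracking rule for the step size. The function names, the starting guess, and the tolerances are choices made only for this sketch.

```python
import math


def h(x, a):
    """Hypothetical nonlinear mean function h(x | a1, a2) = a1 * exp(a2 * x)."""
    return a[0] * math.exp(a[1] * x)


def sse(a, xs, ys):
    """Sum of squared errors between the responses ys and the model h."""
    return sum((y - h(x, a)) ** 2 for x, y in zip(xs, ys))


def sse_gradient(a, xs, ys):
    """Analytic gradient of the SSE with respect to (a1, a2)."""
    g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        e = math.exp(a[1] * x)
        r = y - a[0] * e                 # residual at this data point
        g1 += -2.0 * r * e               # d(SSE)/d(a1)
        g2 += -2.0 * r * a[0] * x * e    # d(SSE)/d(a2)
    return [g1, g2]


def steepest_descent(a_start, xs, ys, max_iter=5000, tol=1e-12):
    """Minimize the SSE by steepest descent with backtracking step halving."""
    a = list(a_start)
    for _ in range(max_iter):
        g = sse_gradient(a, xs, ys)
        g_norm2 = g[0] ** 2 + g[1] ** 2
        if g_norm2 < tol:                # gradient is essentially zero
            break
        current = sse(a, xs, ys)
        step = 1.0
        while step > 1e-12:
            trial = [a[0] - step * g[0], a[1] - step * g[1]]
            # Accept the step only if it reduces the SSE sufficiently.
            if sse(trial, xs, ys) <= current - 1e-4 * step * g_norm2:
                a = trial
                break
            step *= 0.5
        else:
            break                        # no acceptable step was found
    return a


# Noise-free synthetic data (an assumption made so the minimizer is known).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
a_true = [2.0, -1.0]
ys = [h(x, a_true) for x in xs]

a_start = [1.0, 0.0]
a_hat = steepest_descent(a_start, xs, ys)
print("estimated coefficients:", a_hat)
print("SSE at start:", sse(a_start, xs, ys), "SSE at estimate:", sse(a_hat, xs, ys))
```

With real, noisy responses the minimum SSE is not zero, and steepest descent can converge slowly when the coefficients are strongly correlated, which is one reason other minimization techniques are often preferred in practice.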


9 Analysis of Variance

9.1 Introduction

In this chapter we describe some methods that can be used to diagnose qualitative relationships between a quantitative response variable, that is, a variable measured on a continuous scale, and one or more factors that are classified, perhaps according to level, or perhaps only according to their presence or absence.

Our purpose is to introduce only some of the concepts of experimental design and analysis of variance (ANOVA). We illustrate the general patterns of analysis and thought with these methods using a couple of examples from the climate literature. Our coverage of the subject is necessarily far from complete. A more complete treatment of the topic can be found in Box, Hunter, and Hunter [59]. Cochran and Cox [87] provide a classical treatment. Anderson and McLean [13] provide a good description of ANOVA for non-specialists.

9.1.1 Terminology and Purpose of Experimental Design. The classical setting for ANOVA and experimental design methods is agricultural experiments, so much of the associated terminology has its roots in agriculture.

For example, a typical agricultural experiment might be designed to determine the effect of two factors, say, fertilizer (applied at one of three different levels) and tillage (the land is either tilled, or not tilled before seeding) on crop yield. The experiment might be conducted as a factorial experiment in which each possible treatment combination is applied to a separate plot of land according to an experimental design.

The simplest experimental design is a completely randomized design in which treatment combinations are randomly assigned to plots of land (or more generally, experimental units: anything to which treatments are applied). In experiments without replication, each treatment combination is applied exactly once. Thus in the simple agricultural example introduced here, six plots of land would be used. In experiments with replication, at least some of the treatment combinations are applied more than once.

9.1.2 Experimental Designs in Climatology. The experimental units are simulations in designed experiments conducted with General Circulation Models. Treatments applied to the simulations could be various combinations of parameterizations of sub-grid scale processes, parameter values for a given set of parameterizations (as in Gough and Welch [145]), conditions imposed at the top of the atmosphere (e.g., a rigid lid as opposed to a sponge layer) or at the lower boundary (e.g., to examine the model's systematic response to an imposed sea-surface temperature anomaly such as the standard Rasmusson and Carpenter El Niño anomaly [330], as in Boer [51]), vertical resolutions for a model, and so on.

Unfortunately, developers of GCMs have not generally relied upon designed experiments to differentiate objectively between treatments because GCM experimentation is quite expensive. However, developers of models that are cheaper to run (such as basin scale ocean models and sea-ice models) have started to study their models objectively through the use of designed experiments. Gough and Welch [145], Chapman et al. [79], and Bowman, Sacks, and Chang [58] are examples. The Gough and Welch example is discussed in Section 9.5.

9.1.3 Isolating External Sources of Variability. A deficiency of the completely randomized design is that variation in the response variable is induced both by the treatments and by variations between experimental units. In agricultural experiments, variations might occur because the fertility is not uniform from one plot to the next. In GCM experiments, simulations might be conducted with different computers, which, owing to the peculiarities of a particular machine, leads to small differences amongst simulated climates. In the language of statisticians, the treatment effects are confounded with the plot effects in the completely randomized design.
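The random assignment that defines a completely randomized design can be sketched in a few lines of Python. The example below is only an illustration: it assumes the fertilizer and tillage factors of the agricultural example with hypothetical level labels, and a hypothetical helper named `completely_randomized_design`. The seed is fixed only so that the layout can be reproduced.

```python
import itertools
import random


def completely_randomized_design(factors, replicates=1, seed=None):
    """Randomly assign every treatment combination to experimental units.

    factors    : dict mapping a factor name to its list of levels
    replicates : number of times each treatment combination is applied
    seed       : optional seed so the random layout can be reproduced
    Returns a dict mapping unit (plot) number to a treatment combination.
    """
    # Full factorial: every combination of one level from each factor.
    combinations = list(itertools.product(*factors.values()))
    treatments = combinations * replicates
    rng = random.Random(seed)
    rng.shuffle(treatments)              # the randomization step
    return {plot: t for plot, t in enumerate(treatments, start=1)}


# Hypothetical level labels for the two factors in the text.
factors = {
    "fertilizer": ["low", "medium", "high"],
    "tillage": ["tilled", "not tilled"],
}

design = completely_randomized_design(factors, seed=0)
for plot, treatment in design.items():
    print(plot, dict(zip(factors, treatment)))
```

Without replication the three fertilizer levels and two tillage levels give the six plots mentioned above; calling the helper with `replicates=2` would instead lay out twelve plots, with each treatment combination applied twice.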


The ability to detect treatment effects can be enhanced if experimental designs are constructed that reduce or eliminate external sources of variation. One such design is the randomized complete block design. In our pedagogical agricultural example, we could split each plot into six sub-plots, then randomly assign treatments to sub-plots with the constraint that every treatment combination appears once within every plot. Presumably fertility is relatively uniform within each plot, so all responses within a plot are subject to the same variations induced by differences in plot fertility. An extra factor, the block (or plot) effect, is effectively introduced into the experiment. When the results of the experiment are subsequently analysed using the methods of ANOVA, we will be able to isolate variation in the data induced by the blocks from variation induced by the treatments, and therefore make better inferences about the effect of the treatments.

9.1.4 Randomized Complete Block Climate Experiments. Designed climate experiments, because of their huge cost, might have to be run on several computers, perhaps not all of the same type. Different types of machines have different schemes for representing real numbers, slightly different implementations of intrinsic functions, different numerical precisions, etc., resulting in simulated climates that are slightly, but sometimes detectably, different.

However, complete block experiments may not be feasible as there may not be sufficient computing resources available on a given machine to replicate every treatment combination. It may therefore be necessary to use another design, such as a fractional factorial design (see

ANOVA are regressions in which the factors on the right hand side of the equation are indicator variables. The choice of model is not very flexible because the indicator variables are used to identify the specific treatment and block combination that resulted in each realization of the response variable. Some terms in ANOVA models may be of little direct interest to the analyst because they are only present to account for the variation, such as between block variation, that the experiment was designed to isolate from the effects of interest.

Perhaps because of the limited flexibility in the choice of model, the estimated values of model coefficients are generally of less interest than the partitioning of variability according to its source and determining which sources contribute significantly to the variation in the data obtained from the experiment. The examples discussed in this chapter show that this is also largely true in climatological applications of ANOVA methodology. The model coefficients or, at least, the relationships between model coefficients, are only of interest after it has been determined that a factor has a significant effect on the response variable. The specific value of the coefficient is irrelevant in many problems because the factor level may not have been measured quantitatively. Even when the levels are known, values of the response variable might only be available for a few levels of a factor, making it inappropriate to attempt to diagnose systematic relationships between the factor and the mean of the response variable.

9.1.6 Applications to Climatology. In the past, it was relatively uncommon to apply ANOVA to climatological and meteorological problems. This