noise.

Before using statistical tests, we must account for several methodical considerations (see Chapter 6). Straightforward statistical assessments that compare the mean states of two simulated climates generally use simple statistical tests that are performed locally at grid points. More complex field tests, often called field significance tests in the climate literature, are used less frequently.

Grid point tests, while popular because of their simplicity, may have interpretation problems. The result of a set of statistical tests, one conducted at each grid point, is a field of decisions denoting where differences are, and are not, statistically significant. However, statistical tests cannot be conducted with absolute certainty. Rather, they are conducted in such a way that there is an a priori specified risk $1 - \tilde{p}$ of rejecting the null hypothesis: 'no difference' when it is true.¹³

The specified risk $(1 - \tilde{p}) \times 100\%$ is often referred to as the significance level of the test.¹⁴

A consequence of setting the risk of false rejection to $1 - \tilde{p}$, $0 < \tilde{p} < 1$, is that we can expect approximately $(1 - \tilde{p}) \times 100\%$ of the decisions to be reject decisions when the null hypothesis is valid. However, many fields of interest in climate experiments exhibit substantial

second is that the spatial coherence of the studied fields also leads to fields of decisions that are spatially coherent: if the difference between two mean 500 hPa height fields is large at a particular point, it is also likely to be large at neighbouring points because of the spatial continuity of 500 hPa height. A decision made at one location is generally not statistically independent of decisions made at other locations. This makes regions of significant change difficult to identify. Methods that can be used to assess the field significance of a field of reject/retain decisions are discussed in Section 6.8. Local, or univariate, significance tests are discussed in Sections 6.6 and 6.7.

Another approach to the comparison of observed and simulated mean fields involves the use of classical multivariate statistical tests (Sections 6.6 and 6.7). The word multivariate is used somewhat differently in the statistical lexicon than it is in climatology: it describes tests and other inference procedures that operate on vector objects, such as the difference between two mean fields, rather than scalar objects, such as a difference of means at a grid point. Thus a multivariate test is a field significance test; it is used to make a single inference about a field of differences between the observed and simulated climate.

Classical multivariate inference methods cannot generally be applied directly to difference of means or variance problems in climatology. These methods are usually unable to cope with fields under study, such as seasonal geopotential means, that are generally 'observed' at numbers of grid points one to three orders of magnitude greater than the number of realizations available.¹⁵

¹³ The standard, rather mundane statistical nomenclature for this kind of error is Type I error; failure to reject the null hypothesis when it is false is termed a Type II error. Specifying a smaller risk reduces the chance of making a Type I error but also reduces the sensitivity of the test and hence increases the likelihood of a Type II error. More or less standard practice is to set the risk of a Type I error to $(1 - \tilde{p}) \times 100\% = 5\%$ in tests of the mean and to $(1 - \tilde{p}) \times 100\% = 10\%$ in tests of variability. A higher level of risk is usually felt to be acceptable in variance tests because they are generally less powerful than tests concerning the mean state. The reasons for specifying the risk in the form $1 - \tilde{p}$, where $\tilde{p}$ is a large probability near 1, will become apparent later.

¹⁴ There is some ambiguity in the climate literature about how to specify a 'significance level.' Many climatologists use the expression 'significant at the 95% level,' although standard statistical convention is to use the expression 'significant at the 5% level.' With the latter convention, which we use throughout this book, rejection at the 1% significance level indicates the presence of stronger evidence against the null hypothesis than rejection at the 10% significance level.

¹⁵ A typical climate model validation problem involves the comparison of simulated monthly mean fields obtained from a 5–100 year simulation, with corresponding observed mean fields from a 20–50 year climatology. Such a problem therefore uses a combined total of n = 25 to 150 realizations of mean January 500 hPa height, for example. On the other hand, the horizontal resolution of typical present day climate models is such that these mean fields are represented on global grids with m = 2000 to 8000 points. Except on relatively small regional scales, the dimension of (or number of points in) the difference field is greater than the combined number of realizations from the simulated and observed climates.
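The behaviour of grid point tests described above can be explored with a small simulation. The following Python sketch is only an illustration under stated assumptions, not a method taken from the text: it assumes a one-dimensional chain of grid points, Gaussian data with known unit variance (so a simple z test stands in for the local test), and a first-order autoregressive dependence between neighbouring points with a hypothetical correlation `rho`. All function names and parameter values are invented for the example. Both "climates" are drawn from the same distribution, so every rejection is a false rejection.

```python
import math
import random
import statistics


def simulate_field(m, rho, rng):
    """One realization of a unit-variance field on m grid points.

    Neighbouring points have correlation rho (a first-order autoregressive
    chain along the grid); rho = 0 gives spatially independent points.
    """
    innovation_sd = math.sqrt(1.0 - rho * rho)
    field = [rng.gauss(0.0, 1.0)]
    for _ in range(m - 1):
        field.append(rho * field[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return field


def mean_field(n, m, rho, rng):
    """Grid-point mean of n independent realizations of the field."""
    total = [0.0] * m
    for _ in range(n):
        for j, value in enumerate(simulate_field(m, rho, rng)):
            total[j] += value
    return [t / n for t in total]


def rejection_fraction(n, m, rho, rng, z_crit=1.96):
    """Fraction of grid points at which 'no difference' is rejected.

    The two mean fields come from the same distribution, so the null
    hypothesis is true everywhere and each rejection is a Type I error.
    z_crit = 1.96 corresponds to a two-sided test at the 5% level.
    """
    mean_a = mean_field(n, m, rho, rng)
    mean_b = mean_field(n, m, rho, rng)
    se = math.sqrt(2.0 / n)  # standard error of the difference, unit variance assumed
    rejected = sum(abs(a - b) / se > z_crit for a, b in zip(mean_a, mean_b))
    return rejected / m


def summarize(rho, trials=100, n=8, m=200, seed=1):
    """Mean and standard deviation of the rejection fraction over many experiments."""
    rng = random.Random(seed)
    fractions = [rejection_fraction(n, m, rho, rng) for _ in range(trials)]
    return statistics.mean(fractions), statistics.stdev(fractions)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        avg, spread = summarize(rho)
        print(f"rho={rho}: mean rejection rate {avg:.3f}, spread {spread:.3f}")
```

Under these assumptions the average rejection rate should stay close to the nominal 5% whether or not the field is spatially correlated, while the experiment-to-experiment spread of the rejection fraction should be clearly larger for the correlated field. That larger spread is one way to see why a map of local reject decisions is hard to interpret without a field significance test.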


One solution to this difficulty is to reduce the dimension of the observed and simulated fields to less than the number of realizations before using any inference procedure. This can be done using pattern analysis techniques, such as EOF analysis, that try to identify the climate's principal modes of variation empirically. Another solution is to abandon classical inference techniques and replace them with ad hoc methods, such as the 'PPP' test (Preisendorfer and Barnett [320]).

Both grid point and field significance tests are plagued with at least two other problems that result in interpretation difficulties. The first of these is that the word significance does not have a specific physical interpretation. The statistical significance of the difference between a simulated and observed climate depends upon both location and sample size. Location is a factor that affects interpretation because variability is not uniform in space. A 5 m difference between an observed and a simulated mean January 500 hPa height field may be statistically very significant in the tropics, but such a difference is not likely to be statistically, or physically, significant at midlatitudes where interannual variability is large. Sample size is a factor because the sensitivity of statistical tests is affected by the amount of information about the mean state contained in the observed and simulated realizations. Larger samples have greater information content and consequently result in more powerful tests. Thus, even though a 5 m difference at midlatitudes may not be physically important, it will be found to be significant given large enough simulated and observed climatologies. The statistical strength of the signal (or model error) may be quantified by a parameter called the level of recurrence, which is the probability that the signal's signature will not be masked by the noise in another identical but statistically independent run with the GCM (Sections 6.9–6.10).

The second problem is that objective statistical validation techniques are more honest than modellers would like them to be. GCMs and analysis systems have various biases that ensure that objective tests of their differences will reject the null hypothesis of no difference with certainty, given large enough samples. Modellers seem to have an intuitive grasp of the size and spatial structure of biases and seem to be able to discount their effects when making climate comparisons. If these biases can be quantified, statistical inference procedures can be adjusted to account for them (see Chapter 6).
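The dependence of statistical significance on location and sample size discussed above can be made concrete with a little arithmetic. The Python sketch below is a simplified illustration, not a procedure from the text: it assumes a known, common interannual standard deviation, independent years, equal sample sizes for the simulated and observed climatologies, and a two-sided z test at the 5% level. The standard deviations used (5 m for a tropical point, 50 m for a midlatitude point) are hypothetical round numbers chosen only for the example, as are the function names.

```python
import math


def z_statistic(diff, sigma, n):
    """z statistic for a difference of two means, each computed from n independent years.

    Assumes a known, common interannual standard deviation sigma.
    """
    return diff / (sigma * math.sqrt(2.0 / n))


def years_needed(diff, sigma, z_crit=1.96, max_years=100_000):
    """Smallest n per climatology at which diff is significant at the 5% level.

    Returns None if no n up to max_years is sufficient.
    """
    for n in range(2, max_years + 1):
        if abs(z_statistic(diff, sigma, n)) > z_crit:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical interannual standard deviations of mean January 500 hPa height.
    for label, sigma in (("tropics", 5.0), ("midlatitudes", 50.0)):
        n = years_needed(5.0, sigma)
        print(f"{label}: a 5 m difference becomes significant with n = {n} years per sample")
```

With these assumed values, a 5 m difference is significant with 8 years in each climatology at the tropical point but needs 769 years in each at the midlatitude point. The same physical difference is therefore "significant" or not depending only on local variability and on how much data happens to be available.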

Part I

Fundamentals

2 Probability Theory

2.1 Introduction

2.1.1 The General Idea. The basic ideas behind probability theory are as simple as those associated with making lists; the prospect of computing probabilities or thinking in a 'probabilistic' manner should not be intimidating.

Conceptually, the steps required to compute the chance of any particular event are as follows.

• Define an experiment and construct an exhaustive description of its possible outcomes.

• Determine the relative likelihood of each outcome.

• Determine the probability of each outcome by comparing its likelihood with that of every other possible outcome.

We demonstrate these steps with two simple examples. In the first we consider three tosses of an honest coin. The second example deals with the

are only able to describe compound events, such as the outcomes that the daily rainfall is more, or less, than a threshold of, say, 0.1 inch. While we are able to describe these compound events in terms of some of their characteristics, we do not know enough about the atmosphere's sample space or the processes that produce precipitation to describe precisely the proportion of the atmosphere's sample space that represents one of these two compound events.

2.1.3 Relative Likelihood and Probability. In the coin tossing experiment we use the physical characteristics of the coin to determine the relative likelihood of each outcome in S. The chance of a head is the same as that of a tail on any toss, if we have no reason to doubt the fairness of the coin, so each of the eight outcomes is as likely to occur as any other.

The West Glacier rainfall outcomes are less obvious, as we do not have an explicit character-