Figure 6.4: The solid curves display the distribution of the mean of samples of size n = 1, 10, and 40 taken from a N(−0.5, 1) population. The dashed curves show the same distributions for the N(+0.5, 1) population. Note that the overlap is very large when n = 1, and virtually nonexistent when n = 40.

Figure 6.5: Zonal distribution of the meridionally averaged (30°N–60°N) eddy component of January mean 500 hPa height in decametres. Shaded: the observed univariate 95% confidence band at each longitude. Curves: 10 individual states simulated with a General Circulation Model [397].
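The shrinking overlap described in the caption of Figure 6.4 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that "overlap" is measured as the area common to the two sampling densities of the mean, which for two normal densities with equal variance has a closed form. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_mean_overlap(n: int,
                        mu_control: float = -0.5,
                        mu_experiment: float = 0.5,
                        sigma: float = 1.0) -> float:
    """Area shared by the densities of the two sample means.

    The mean of n iid N(mu, sigma^2) variables is N(mu, sigma^2 / n).
    Two normal densities with equal standard deviation s and means a
    distance d apart cross midway between the means, so the common
    area is 2 * Phi(-d / (2 * s)).
    """
    standard_error = sigma / math.sqrt(n)
    separation = abs(mu_experiment - mu_control)
    return 2.0 * normal_cdf(-separation / (2.0 * standard_error))


# Sample sizes shown in Figure 6.4.
overlaps = {n: sample_mean_overlap(n) for n in (1, 10, 40)}
for n, overlap in overlaps.items():
    print(f"n = {n:2d}: overlap area = {overlap:.4f}")
```

Under this measure the overlap is about 0.62 for n = 1, about 0.11 for n = 10, and below 0.002 for n = 40, consistent with the qualitative statement in the caption.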

large portion of experimental states can occur under control conditions and vice versa. However, as the sample size increases, the spread of the density functions of the sample means decreases, and eventually there is virtually no overlap. Under these circumstances the control and the experimental random variables can be distinguished with almost perfect reliability. Thus, given a large enough sample, it will be possible to state with confidence that the experimental and control random variables cluster around different means.

Thus the likelihood of rejection of the null hypothesis depends not only on the strength of the signal but also on the amount of available data. We must therefore be careful to distinguish between statistical and physical significance. We return to this point when we introduce recurrence analysis in Sections 6.9–6.10.

6.2.6 Example: AGCM Validation. One application of statistical tests occurs in the validation of the climate simulated by an Atmospheric General Circulation Model (AGCM). The assessment is performed by comparing individual fields y generated by the AGCM with a statistical model X that is fitted to an ensemble of fields obtained from the observed climate.

In the following example (see [397]) the observed random vector of interest is X = {meridionally averaged (30°N–60°N) eddy component of January mean 500 hPa height} and we let Y be the corresponding random vector that is simulated by the AGCM. The null hypothesis is that X and Y have the same distributions. In the absence of prior knowledge about the AGCM's biases, we take the alternative hypothesis to be the complement of the null and use the non-rejection region $\Theta(95\%) = \{x : f(x) \ge \alpha_{95\%}\}$. We find that 6 of the 10 AGCM realizations y lie outside $\Theta(95\%)$, so we reject the null hypothesis that the model simulates the observed climate.

The 10 y curves are displayed in Figure 6.5 together with the univariate 95% confidence band (i.e., the univariate $\Theta(\tilde{p})$ at each longitude; shaded). Some of the simulated fields are fairly realistic but most have severe distortions. We return to this example in Section 7.1.

6.2.7 Example: Sign Test. Suppose $X_1, \ldots, X_m$ are iid random variables that represent a sample from a population X, and that we want to decide whether or not E(X) has a particular value a. That is, we want to test

$$H_0 : E(X) = a. \tag{6.3}$$

The following is a simple non-parametric solution (see [4.2.2]).

Assume that X has a symmetrical distribution, that is, that there exists a constant b such that $f(b - x) = f(b + x)$ for all x. Then (6.3) is equivalent to $H_0 : b = a$.

Now consider the test statistic

$$n(X_1, \ldots, X_m) = \text{number of } X_j \ge a. \tag{6.4}$$
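The statistic (6.4) and the one-sided probability used to judge it can be sketched in a few lines. This is an illustrative sketch only: it assumes that, when H0 holds, the count behaves like the number of heads in m tosses of a fair coin (binomial with success probability 0.5), and the function names are hypothetical rather than taken from the text.

```python
from math import comb
from typing import Iterable


def sign_test_statistic(sample: Iterable[float], a: float) -> int:
    """Count the observations that are greater than or equal to a (statistic 6.4)."""
    return sum(1 for x in sample if x >= a)


def sign_test_tail_probability(n: int, m: int) -> float:
    """P(N >= n) when N is binomial with m trials and success probability 0.5.

    Sums the binomial coefficients m! / (k! (m - k)!) for k = n, ..., m
    and multiplies by 0.5 ** m.
    """
    return sum(comb(m, k) for k in range(n, m + 1)) * 0.5 ** m


# Made-up sample, for illustration only: three of four values are >= 0.
count = sign_test_statistic([1.2, -0.3, 0.0, 2.5], a=0.0)

# Tail probabilities for counts of 5 and 8 out of m = 9, the counts
# that appear in the AMIP example of this section.
p_five_of_nine = sign_test_tail_probability(5, 9)
p_eight_of_nine = sign_test_tail_probability(8, 9)
print(count, p_five_of_nine, round(p_eight_of_nine, 4))
```

A small tail probability (for example below 5% or 1%) indicates a count that is unusually large under H0. The test is one-sided as written; a two-sided variant would also have to consider unusually small counts.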

6: The Statistical Test of a Hypothesis


Since we have assumed independence, we can think of $N = n(X_1, \ldots, X_m)$ as the number of heads in m tosses of a coin where the probability of a head on the jth toss is $p_j = P(X_j \ge a)$. When $H_0$ is correct, $p_j = 0.5$ and thus N has the binomial distribution: $N \sim B(m, 0.5)$. If n is the actual number of observations $x_j$ for which $x_j \ge a$, then the probability of observing $N \ge n$ is given by

$$P(N \ge n \mid H_0) = \sum_{n' \ge n} \frac{m!}{n'!\,(m - n')!}\, 0.5^m. \tag{6.5}$$

We reject $H_0$ when N is unusually large in the context of $H_0$, i.e., when $P(N \ge n \mid H_0)$ is small (e.g., 5% or 1%).

We illustrate the sign test with an example from AMIP, the Atmospheric Model Intercomparison Project (see Gates [137]).

AMIP established a benchmark 10-year climate simulation experiment that was performed by a large number of modelling groups. One feature of these experiments is that the monthly mean SSTs and sea-ice extents observed between January 1979 and December 1988 were prescribed as time-varying lower boundary conditions. Therefore, since AMIP simulations experience the same 'forcing' at the lower boundary as the real atmosphere, it is natural to compare the variability in the AMIP simulations with that in observations.

In particular, suppose that we want to test the null hypothesis, $H_0$, that the spatial variability of the December, January, February (DJF) mean 500 hPa height ($\phi_{500}$) that is simulated by model X is the same as that contained in the US National Meteorological Center (NMC) global $\phi_{500}$ analyses. The table below gives measures

The analysed observations contain more spatial variability than does Model A in n = 5 of nine DJF seasons. Using (6.5) we find that the probability of observing $n \ge 5$ under $H_0$ is $(126 + 84 + 36 + 9 + 1)(0.5)^9 = 0.5$. Thus we cannot conclude that the spatial variability of the DJF climate simulated by Model A is significantly different from that which is observed. On the other hand, n = 8 for Model B, and $P(N \ge 8) = (9 + 1)(0.5)^9 = 0.0195$. Thus the null hypothesis can be rejected for Model B at about the 2% significance level.

Not all of the assumptions required by the sign test are satisfied in this example. The measure of spatial variability we used, $\langle (\phi_{500} - \langle \phi_{500} \rangle)^2 \rangle$, where $\langle \cdot \rangle$ denotes global average, is not likely to be exactly symmetrically distributed, although a Central Limit Theorem [2.7.5] type of argument can be used to show that its distribution is close to the normal distribution. Also, the spatial variability is not likely to be identically distributed in all years since it is strongly affected by ENSO (see [1.2.3]). Both of these departures from the assumptions will have some effect on the significance level and power of the test.

6.2.8 Sufficient Statistics. The decisions in the previous example [6.2.7] were made on the basis of a statistic that is a function of the pairs of variance differences, not the variances themselves. It is obvious that such reductions of data are necessary, but how do statisticians choose the statistic that results in the most effective test? In this example the hypothesis concerns the value of a parameter of the binomial distribution. The nine random variables that represent the variance differences may be transformed into nine other random variables such that distribution of