that obtained from the 1000 trial sample.

The observed set of 51 storms is distributed on the eight classes as follows: (f1, ..., f8) = (3, 9, 16, 6, 3, 4, 2, 8), which results in s = 19. The corresponding critical value is κ(5%) = 14 (derived from 100 000 trials; see the distribution function F_S in Figure 6.6). Hence we reject the null hypothesis that the occurrence of tropical cyclones in the Southwest Pacific is independent of the phase of the MJO.

6.4 On Establishing Statistical Significance

6.4.1 Independence of the Null Hypothesis. A rock formation called the Mexican Hat (Figure 6.7), near the border between Arizona and Utah, consists of a very large boulder perched precariously on a rocky outcrop. It is instructive to think briefly about whether we can use statistical methods to test the null hypothesis that this rock formation has natural origins. To gather

Figure 6.8: Creation of the Mexican Hat: Null hypothesis correctly rejected!

6.4.2 More on the Role of Statistical Inference. The Mexican Hat is a pretty obvious example, but there are many similar examples in climate research journals. There are even instances in which peer reviewers have requested that authors perform statistical tests as outlined above. One example concerns the Labitzke and van Loon hypothesis [238] about the relationship between the 11-year solar cycle and the atmospheric circulation in the stratosphere and the troposphere.³ They found, using about 30 years of data, that the North Pole winter mean 30 hPa temperature is only weakly correlated with solar activity. The observed correlation was 0.14 (Figure 6.9, top). The apparent strength of the relationship was much stronger when the data were stratified according to the phase of the Quasi-Biennial Oscillation (QBO; Veryard and Ebdon [382], Dunkerton [106]): A high positive correlation of 0.76 was obtained for the winters in which the QBO was in its west phase (Figure 6.9, middle), and a negative correlation of -0.45 when the QBO was in its east phase (Figure 6.9, bottom). The similarity of the middle and bottom curves in Figure 6.9 is certainly as remarkable as the Mexican Hat.

³ The original draft of [238] did not contain statistical inferences about the relationship between atmospheric circulation and solar activity. However, reviewers of that article demanded a statistical test even though there are really only two ways to verify the Labitzke and van Loon hypothesis. These are a) develop a physical hypothesis that can be verified by numerical experimentation, and b) wait a few decades so that additional independent data can be collected for a confirmatory statistical test of the hypothesis (cf. [4.1.2]).
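To make concrete what a conventional test of the correlations quoted in [6.4.2] would compute, the following is a minimal sketch of the usual t statistic for the null hypothesis of zero correlation. It assumes independent, approximately Gaussian pairs, which winter means need not satisfy. The function name is illustrative, and the stratum sizes of 15 are hypothetical: the text only says that about 30 years of data were used.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0 'true correlation is zero', from a sample
    correlation r based on n independent pairs (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least three pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Correlations are those quoted in the text. The sample sizes are
# HYPOTHETICAL (about 30 winters, split evenly between the QBO phases).
cases = {
    "all winters": (0.14, 30),
    "QBO west": (0.76, 15),
    "QBO east": (-0.45, 15),
}
for label, (r, n) in cases.items():
    t = correlation_t_statistic(r, n)
    print(f"{label}: r = {r:+.2f}, n = {n}, t = {t:+.2f}")
```

The statistic is easy to compute, but its nominal significance level applies only when the hypothesis, including the stratification, was formulated independently of the data used in the test.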

[Figure 6.9 panels: 10.7 cm solar flux (70 to 300) and temperature (-78 to -54 °C) against time (year, 1956 to 1990); lower panels labelled WEST and EAST.]

Figure 6.9: Time series of January/February mean solar activity (solid curve) and 30 hPa temperature at the North Pole (broken curve). Top: all winters. Middle: winters when the QBO is in its west phase. Bottom: winters when the QBO is in its east phase. From Labitzke and van Loon [238].

6.4.3 What if Confirmatory Analysis is not Possible? Although it is frequently not possible to make confirmatory statistical inferences once an exploratory analysis has suggested questions, methods of statistical inference, such as testing, are valuable. They serve to underline the unusual quantitatively and thus help us to focus on unusual aspects of the data. But the statistical test cannot be viewed as an objective and unbiased judge of the null hypothesis under these circumstances.
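Why a test loses its objectivity once the data have suggested the hypothesis can be illustrated with a small simulation. The sketch below is an illustration of the general principle only, not a model of the analysis in [238]. It assumes two independent Gaussian series of 30 values, so the null hypothesis of no relationship is true by construction, and compares a test on one stratum fixed in advance with a test on the most striking of several candidate strata. The number of candidates, the function names and the random strata are all hypothetical choices.

```python
import math
import random

N_YEARS = 30        # about 30 years of data, as in the text
STRATUM = 15        # hypothetical stratum size (half of the record)
# Two-sided 5% critical value of |r| for 15 independent Gaussian pairs
# (13 degrees of freedom); valid only for STRATUM = 15.
R_CRIT = 0.514


def pearson(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_trials=2000, n_candidates=10, seed=0):
    """Return (rejection rate for a pre-specified stratum,
    rejection rate when the best of n_candidates strata is tested)."""
    rng = random.Random(seed)
    fixed_hits = 0
    posthoc_hits = 0
    for _ in range(n_trials):
        # Independent series: the null hypothesis is true in every trial.
        x = [rng.gauss(0.0, 1.0) for _ in range(N_YEARS)]
        y = [rng.gauss(0.0, 1.0) for _ in range(N_YEARS)]
        best = 0.0
        for k in range(n_candidates):
            idx = list(range(N_YEARS))
            rng.shuffle(idx)
            years = idx[:STRATUM]          # one candidate stratification
            r = abs(pearson([x[i] for i in years], [y[i] for i in years]))
            if k == 0 and r > R_CRIT:      # stratum chosen before looking
                fixed_hits += 1
            best = max(best, r)            # stratum chosen after looking
        if best > R_CRIT:
            posthoc_hits += 1
    return fixed_hits / n_trials, posthoc_hits / n_trials


fixed_rate, posthoc_rate = simulate()
print(f"pre-specified stratum: {fixed_rate:.3f}")
print(f"best of 10 strata:     {posthoc_rate:.3f}")
```

Under these assumptions the pre-specified test should reject at roughly its nominal 5% rate, whereas selecting the stratum after inspecting the data rejects the true null hypothesis more often. The test statistic remains useful as a measure of how unusual a feature is, but not as a verdict on the null hypothesis.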

6.4.4 What Constitutes Independent Data? Confirmatory analysis, as discussed in [6.4.1], requires additional independent data. Independence is the essential point here; it is generally not sufficient to have additional data from independent sources. For example, workers sometimes claim that they use independent data when they use station data to derive a hypothesis and grid point data from the same or a similar period to confirm the hypothesis. While it is certainly valuable to analyse both data sets to make sure that the hypothesis does not come about as a result of, for example, systematic biases in an ensemble of analyses fields, the two data sets are strongly correlated.

This observation limits any confirmatory statistical analysis with observed (atmospheric or other geophysical) data. Truly independent confirmatory analyses can only be performed with observations in the future because we can only collect the necessary independent information in the future. One alternative is to carefully construct a sensitivity experiment with a GCM to test the question. This avoids waiting, and often gives the experimenter

6: The Statistical Test of a Hypothesis


opportunities to control or eliminate extraneous sources of variability that obscure the effects of interest in observations. Another alternative is to divide the observations into learning and validation data sets. The latter is set aside and reserved for confirmatory analysis of questions that arise

for example, the anomalous boundary conditions or a modified parameterization of a sub-grid scale physical process. Statistical tests are often used to determine whether the changes affect the distribution of climatic states simulated by the model. Since distributional changes alter the