independence of all observations.

    t = \frac{\mu_X - \mu_Y}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}}.        (6.19)

The square of the denominator can be shown to be statistically independent of the numerator, but it does not have a distribution proportional to the χ² distribution. Therefore the test statistic does not have a t distribution under the null hypothesis. The accepted solution to this problem, which is known in the statistical literature as the Behrens–Fisher problem, is to approximate the distribution of this statistic with a t distribution whose degrees of freedom are estimated from the data. The formula used to determine the approximating t distribution is obtained by comparing the first and second moments of S_X²/n_X + S_Y²/n_Y with those of the χ² distribution. The resulting formula for the approximating number of degrees of freedom is

    df = \frac{(S_X^2/n_X + S_Y^2/n_Y)^2}{(S_X^2/n_X)^2/(n_X - 1) + (S_Y^2/n_Y)^2/(n_Y - 1)}.        (6.20)

The solution to this problem is to compute the difference fields and test the null hypothesis that the mean difference is zero using a one-sample t test. It is reasonable to assume that the observed differences are independent of one another because the initial conditions were chosen randomly. The distributional assumptions are that the differences have a normal distribution and that all the differences come from the same distribution. The former may not be true, even approximately, because moisture-related variables, such as total liquid water, often exhibit strongly skewed distributions. However, let us continue to assume that the differences are normally distributed for the purposes of this discussion. The second distributional assumption, that the differences are identically distributed, may not hold if we failed to account for other sources of variation, such as the annual cycle, in our experimental design. To avoid such problems, the choice of initial conditions should be constrained to one season or calendar month, and one time of day.
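The statistic (6.19) and the approximating degrees of freedom (6.20) are straightforward to evaluate. The following is a minimal sketch in plain Python; the function name and the sample data in the usage note are illustrative, not from the text:

```python
import math

def welch_t_df(x, y):
    """Compute the two-sample t statistic (6.19) and the approximating
    degrees of freedom (6.20) for the Behrens-Fisher problem.
    Illustrative sketch only."""
    nx, ny = len(x), len(y)
    mx = sum(v for v in x) / nx                       # sample means
    my = sum(v for v in y) / ny
    sx2 = sum((v - mx) ** 2 for v in x) / (nx - 1)    # sample variances
    sy2 = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = sx2 / nx + sy2 / ny          # square of the denominator of (6.19)
    t = (mx - my) / math.sqrt(se2)     # eq. (6.19)
    df = se2 ** 2 / ((sx2 / nx) ** 2 / (nx - 1)
                     + (sy2 / ny) ** 2 / (ny - 1))    # eq. (6.20)
    return t, df
```

Note that df in (6.20) equals n_X + n_Y − 2 when the two samples have equal sizes and equal sample variances, and is smaller otherwise, so the approximation never claims more degrees of freedom than the equal-variance test of [6.6.1].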

⁸ When the equality of the two variances is uncertain, one might resort to an F test for the equality of variances (Section 6.7).

Let d_i represent the ith realization of the change in total liquid water D. The null hypothesis to be


tested is H0: µ_D = 0. The optimal test statistic for this problem is

    t = \frac{\mu_D}{S_D/\sqrt{n}},        (6.21)

where n is the size of the sample of differences, µ_D = \sum_{i=1}^{n} d_i/n is the mean difference, and S_D² = \sum_{i=1}^{n} (d_i - \mu_D)²/(n - 1) is the sample variance of the observed differences. This statistic has a t distribution with n − 1 degrees of freedom under the null hypothesis.⁹ Thus the paired difference test is conducted by computing the differences, then computing (6.21) with the sample moments, obtaining the appropriate critical value from Appendix F, and finally comparing t with the critical value to make a decision.

⁹ There are n − 1 degrees of freedom because the deviations d_i − µ_D are elements of an n-dimensional random vector that is constrained to vary within an (n − 1)-dimensional vector space.

The paired difference test is an example of a one-sample t test. One-sample tests are used to test hypotheses of the form H0: µ_X = c, where c is a constant that is chosen a priori. These tests are performed by computing

    t = \frac{\mu_X - c}{S_X/\sqrt{n}}        (6.22)

and comparing with critical values for t(n − 1).

6.6.7 Auto-Correlation. As noted in [6.6.1], the t test is not robust against departures from the independence assumption. In particular, meteorological time series are generally auto-correlated if the time increment between observations is not too large. Under these circumstances, a t test such as that based on (6.16) becomes liberal, that is, it rejects the null hypothesis when it is true more frequently than indicated by the significance level.

Intuitively, observations taken in an auto-correlated sequence vary less quickly than observations obtained completely at random. An auto-correlated series therefore contains less information about the population mean than a completely random sequence of the same length. Consequently, the standard error of µ_X − µ_Y is larger for auto-correlated data than for independent observations. However, the denominators of t statistics, such as (6.16), estimate the standard deviation of µ_X − µ_Y under the independence assumption. Therefore the denominator in (6.16) underestimates the variability of µ_X − µ_Y, with the consequence that the absolute value of t tends to be too large.

Resolution of this problem is non-trivial [454]. Heuristic arguments, such as that given above, lead to a t test in which the denominator of the t statistic is inflated by a factor related to the time scales at which the time series varies. The resulting statistic, detailed below, is compared with critical values from a t distribution with an estimated number of degrees of freedom. This approach, while not exact, has the advantages that it is easy to use, easy to understand, and asymptotically optimal (i.e., it becomes optimal as the sample size becomes infinitely large). It can be used safely when samples are relatively large, as defined below. When samples are not large the 'Table-Look-Up' test [6.6.9] should be employed.

The large sample difference of means test is developed heuristically as follows. We assume that the memory of the observed time series is finite, so that the full samples {X_1, ..., X_{n_X}} and {Y_1, ..., Y_{n_Y}} contain subsets of independent observations. For example, suppose that {x_1, ..., x_{100}} is a time series of 100 daily surface temperature anomalies. Consecutive observations are certainly highly correlated, but any two observations separated by 10 days or more are nearly independent. Thus the sample contains a subset of at least 11 roughly independent observations. However, we do not throw away the other 89 observations. Instead, we attempt to estimate the information content of the entire sample by deriving an equivalent sample size.

The measure of information used in the difference of means problem is one over the variance of the sample mean. Thus the smaller the variance of the sample mean, the more information the sample contains about the unknown population mean. The equivalent sample size n′_X is defined as the number of independent random variables that are needed to provide the same amount of information about µ_X as the sample of dependent random variables {X_1, ..., X_{n_X}}. Equivalent sample size n′_Y is defined analogously.¹⁰ We anticipate that n′_X < n_X and n′_Y < n_Y when observations are auto-correlated.¹¹

¹⁰ Note that the definition of the equivalent sample size depends upon the parameter that is being tested and the way in which information is measured. The equivalent sample sizes for an equality of variance test, for example, are different from those for the equality of means tests. The measure of information used here, the inverse of the variance of the sample mean, is called Fisher's information (see [92]).

¹¹ Strictly speaking, this happens when time series are persistent, that is, when adjacent anomalies have the same sign. It is possible to have n′_X > n_X and n′_Y > n_Y when adjacent anomalies tend to have opposite sign.

This paradigm leads us to estimators n̂′_X and n̂′_Y, which replace n_X and n_Y in the ordinary difference of means tests with equal (see [6.6.1]) or unequal


(see [6.6.5]) variances. When the samples are large enough, t statistics computed in this way with (6.16) as

    t = \frac{\mu_X - \mu_Y}{S_p\sqrt{1/\hat{n}'_X + 1/\hat{n}'_Y}}        (6.23)

or with (6.19) as

    t = \frac{\mu_X - \mu_Y}{\sqrt{S_X^2/\hat{n}'_X + S_Y^2/\hat{n}'_Y}}        (6.24)

can be compared with critical values from the t(n̂′_X + n̂′_Y − 2) or t(df) distribution respectively, where, in the latter case, df is computed with (6.20) by substituting the equivalent sample size estimates for the sample sizes themselves.
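The adjusted statistic (6.24) can be sketched as follows. This is a minimal illustration in plain Python; the function and argument names are not from the text, and the equivalent sample sizes are taken as given, since their estimation is the subject of [6.6.8]:

```python
import math

def adjusted_t(x, y, nx_eq, ny_eq):
    """t statistic (6.24): the unequal-variance statistic (6.19) with the
    actual sample sizes replaced by (estimated) equivalent sample sizes
    nx_eq and ny_eq. Illustrative sketch only; nx_eq and ny_eq must come
    from an equivalent sample size estimator such as that of [6.6.8]."""
    mx = sum(v for v in x) / len(x)                       # sample means
    my = sum(v for v in y) / len(y)
    sx2 = sum((v - mx) ** 2 for v in x) / (len(x) - 1)    # sample variances
    sy2 = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(sx2 / nx_eq + sy2 / ny_eq)
```

Because n̂′_X < n_X and n̂′_Y < n_Y for persistent series, the denominator is inflated relative to (6.19) and |t| shrinks, which counteracts the liberal behaviour of the unadjusted test described above.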

There are two problems left:

• estimating the equivalent sample size (see [6.6.8]), and