applied to S instead of S. One should exercise some caution with this expedient. For example, the multivariate test that is obtained is not always invariant [4.3.3] under linear transformation of the m-dimensional field.

One drawback of the permutation test is that it is not supported by a rich statistical theory. We do know that permutation tests are asymptotically as efficient as their parametric counterparts [274], but we must rely on Monte Carlo methods to obtain information about the small sample properties of the test in specific situations.

vector of all X and Y observations: z = (x1, ..., xnX, y1, ..., ynY)^T. Under the null hypothesis, the distributions of X and Y are identical and thus any statistic S of Z has a distribution that is independent of the ordering of the components of Z. That is, if π is a random permutation of {1, ..., nX + nY}, then S(Z) has the same distribution as S(Zπ). Consequently, any arrangement zπ of the observed z is as likely under the null hypothesis as any other. Hence the probability that the observed test statistic S(z) takes a value in the upper fifth percentile of values that can be taken by S(zπ) is exactly 5% under the null hypothesis.
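In practice the permutation distribution is sampled by Monte Carlo rather than enumerated. The sketch below is our own illustration (Python with NumPy), using the difference of sample means as the statistic S; it is not the book's prescribed implementation.

```python
import numpy as np

def permutation_test(x, y, n_perm=9999, seed=0):
    """Monte Carlo permutation test of H0: X and Y identically distributed.

    Uses S = mean(X) - mean(Y). Under H0 every reordering of the pooled
    vector z is equally likely, so the p value is estimated by the fraction
    of permuted statistics at least as large as the observed S(z).
    """
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y])
    n_x = len(x)
    s_obs = np.mean(x) - np.mean(y)
    exceed = 0
    for _ in range(n_perm):
        zp = rng.permutation(z)
        if np.mean(zp[:n_x]) - np.mean(zp[n_x:]) >= s_obs:
            exceed += 1
    # Count the observed arrangement itself so the p value is never zero.
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(1.5, 1.0, 20)  # hypothetical sample with a shifted mean
y = rng.normal(0.0, 1.0, 25)
p_value = permutation_test(x, y)
```

A 5% level test against the one-sided alternative rejects when `p_value` falls below 0.05, mirroring the upper-fifth-percentile argument above.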

In contrast, ordering becomes important under the alternative hypothesis, where possible values of S(zπ) obtained via permutation are not equally likely. The unpermuted vector precisely divides the observations according to their population of origin.17

16 The efficiency of two tests is measured by comparing the sample sizes needed to achieve the same power at the same significance level against the same alternative. The sample size ratio often becomes independent of power, significance level, and the particular alternative as one of the sample sizes tends to infinity. When this happens, the limiting sample size ratio is called the asymptotic relative efficiency (ARE). See Conover [88] for more details.

17 At least, this should be true if S efficiently estimates a monotone function of the difference between the two populations.

6.7 Test of Variances

6.7.1 Overview. Until now our focus has been on tests about the first moments (i.e., means) of scalar and vector random variables. We briefly describe a few ways in which to test hypotheses about the second central moments (i.e., variances) of scalar random variables in this section. Tests
about the second central moments of random vectors (i.e., covariance matrices) are beyond the scope of this book.18

18 Interested readers can find entry points to literature on this

6.7.2 The χ² Test. Suppose X1, ..., Xn are iid random variables that represent a sample of size n from the normal distribution. Then C² = (n − 1)SX²/σX² has the χ²(n − 1) distribution (cf. [2.7.8]).

The null hypothesis H0: σX² = σo² can then be tested at the (1 − p̃) significance level by computing C² = (n − 1)SX²/σo² and making decisions as follows.

• Ha: σX² < σo²: reject H0 when C² is less than the (1 − p̃)-quantile of the χ²(n − 1) distribution. The χ² distribution is partially tabulated in Appendix E. For example, when n = 10, we would reject H0 at the 5% significance level when C² is less than 3.33. The non-rejection region is [3.33, ∞).

• Ha: σX² ≠ σo²: reject H0 when C² is less than the ((1 − p̃)/2)-quantile of the χ²(n − 1) distribution, or greater than its ((1 + p̃)/2)-quantile. When n = 10, the non-rejection region for the 5% significance level test is [2.70, 19.0].

• Ha: σX² > σo²: reject H0 when C² is greater than the p̃-quantile of the χ²(n − 1) distribution. When n = 10, the non-rejection region for the 5% significance level is [0, 16.9].

The χ² test is more sensitive to departures from the normal distribution assumption than the tests of the mean discussed in the previous section. This sensitivity arises because C² is a sum of squared deviations. Data that are not completely normal tend to have at least some deviations from the sample mean that are larger than would be observed in a completely normal sample. Because these deviations are squared, they have a very large effect on the value of C². Inferences are consequently unreliable.

6.7.3 The F Test. The one sample χ² test of the previous subsection has relatively limited applications. On the other hand, there are many problems in which it is necessary to decide whether two samples came from populations with equal variances. For example, this is needed when selecting a test for the equality of means (see [6.6.1] and [6.6.5]). There are also a myriad of climate analysis problems in which we want to compare variances. For example, we may want to compare the variability of two simulated climates on some time scale, the variability of the observed climate with that of a simulated climate, or the variability under different climatic regimes (e.g., warm versus cold ENSO events).

The standard procedure for testing H0: σX² = σY² is the F test. It can be applied when we have two independent samples X1, ..., XnX and Y1, ..., YnY, each consisting of iid normal random variables. Then

F = SX²/SY²    (6.40)

has the F(nX − 1, nY − 1) distribution under the null hypothesis [2.7.10]. Critical values of the F distribution are tabulated in Appendix G. The test is performed at the (1 − p̃) × 100% significance level as follows.

• Ha: σX² > σY²: reject H0 when f is greater than the p̃-quantile of the F(nX − 1, nY − 1) distribution. For example, when nX = 9 and nY = 10, the non-rejection region for a test conducted at the 10% significance level is [0, 2.47].

• Ha: σX² ≠ σY²: reject H0 when f is less than the ((1 − p̃)/2)-quantile of the F(nX − 1, nY − 1) distribution, or greater than its ((1 + p̃)/2)-quantile. Note that most tables do not list the lower tail quantiles of the F distribution, because when F ∼ F(nX − 1, nY − 1), then 1/F ∼ F(nY − 1, nX − 1). Thus the ((1 − p̃)/2)-quantile of F(nX − 1, nY − 1) is 1 over the ((1 + p̃)/2)-quantile of F(nY − 1, nX − 1). When nX = 9 and nY = 10, the non-rejection region for a 10% significance level test is [0.295, 3.23].

Just as for the χ² test, the F test is sensitive to departures from the normal distribution. Also, it is not robust against outlying observations caused by, for example, observational or data management errors. It is therefore useful to have a non-parametric alternative even if the relative efficiency of the test is low when data are normal. A non-parametric test is discussed in the next subsection.
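Both tests in this section are easy to carry out numerically. The sketch below is our own illustration using Python with scipy.stats (the text itself prescribes table lookup in Appendices E and G); it computes the two-sided non-rejection regions and reproduces the worked quantiles for n = 10 and for nX = 9, nY = 10.

```python
import numpy as np
from scipy import stats

def chi2_variance_test(x, sigma0_sq, p=0.95):
    """Two-sided chi-square test of H0: var(X) = sigma0_sq (iid normal data).

    Returns C2 = (n - 1) * S_X^2 / sigma0_sq and the non-rejection interval
    at the (1 - p) significance level, i.e. the ((1 - p)/2)- and
    ((1 + p)/2)-quantiles of the chi2(n - 1) distribution.
    """
    n = len(x)
    c2 = (n - 1) * np.var(x, ddof=1) / sigma0_sq
    lo = stats.chi2.ppf((1 - p) / 2, df=n - 1)
    hi = stats.chi2.ppf((1 + p) / 2, df=n - 1)
    return c2, (lo, hi)

def f_variance_test(x, y, p=0.95):
    """Two-sided F test of H0: var(X) = var(Y) (independent normal samples)."""
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    # The lower quantile equals 1 over the matching upper quantile of
    # F(dfy, dfx), which is why tables list only the upper tail.
    lo = stats.f.ppf((1 - p) / 2, dfx, dfy)
    hi = stats.f.ppf((1 + p) / 2, dfx, dfy)
    return f, (lo, hi)

# Worked examples from the text: chi2(9) two-sided 5% region ~ [2.70, 19.0],
# F(8, 9) two-sided 10% region ~ [0.295, 3.23].
print(stats.chi2.ppf([0.025, 0.975], df=9))
print(stats.f.ppf([0.05, 0.95], 8, 9))
```

H0 is rejected whenever the computed statistic falls outside the returned interval; setting p = 0.90 in `f_variance_test` gives the 10% level test of the example.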