subject in, for example, Graybill [147], Johnson and Wichern [197], Morrison [281], or Seber [342].

6: The Statistical Test of a Hypothesis

when there is dependence within the samples. If the samples are time series, spectral analysis methods (see Section 12.3) can be used to describe the variability in the samples as functions of time scale. F tests can then be used to compare variability within the samples at various time scales.

Finally, the F test is not particularly powerful. For example, to reject $H_0$ reliably when $\sigma_X^2 = 2\sigma_Y^2$, say with power 95% in a 5% significance level test, requires samples of size $n_X = n_Y \approx 100$. Since power can always be increased somewhat at the cost of greater risk of false rejection, F tests are often performed at the 10% significance level, whereas t tests are usually performed at the 5% or 1% significance levels.

6.7.4 A Non-parametric Test of Dispersion. There are several simple non-parametric tests of equality of variance.¹⁹ We will describe two of them here. In both cases, the standard sampling assumptions are required. That is, it must be possible to represent the samples by iid random variables, and the samples must be independent of each other. It is also necessary to assume that the two populations have the same distribution when they are standardized by subtracting the mean and dividing by the standard deviation.

The first test is performed by converting both samples into absolute deviations from the respective sample means:

$$u_i = |x_i - \bar{x}|, \quad i = 1, \ldots, n_X \qquad \text{and} \qquad v_j = |y_j - \bar{y}|, \quad j = 1, \ldots, n_Y.$$

The combined samples of absolute deviations $u_1, \ldots, u_{n_X}, v_1, \ldots, v_{n_Y}$ are then assigned ranks, as in the Mann–Whitney test [6.6.11]. The sum of the ranks

$$S = \sum_{i=1}^{n_X} R_i \tag{6.41}$$

is used as the test statistic. Critical values are the same as for the Mann–Whitney test (see Appendix I). This is an approximate test when samples are small because the ranked entities, the absolute deviations, are not quite independent of one another.²⁰

The idea behind this simple test is that the deviations in one sample will tend to be smaller than the deviations in the other when $H_0$ is false, resulting in either unusually small or unusually large rank sums (6.41). It is clear that this test can never be as powerful as the Mann–Whitney test because the two samples of absolute deviations can never be completely separated. Regardless of the variance, both samples are likely to have some small deviations near zero.

One way to improve the power of this test is to focus more attention on the largest absolute deviations. The second test, the squared-ranks test, does this by using

$$T = \sum_{i=1}^{n_X} R_i^2 \tag{6.42}$$

as the test statistic instead of (6.41). Decisions are made at the $(1-\tilde{p}) \times 100\%$ significance level by using the critical values in Appendix J as follows.

• $H_a: \sigma_X^2 < \sigma_Y^2$: reject when $T$ is unusually small, that is, when $T$ is less than the $(1-\tilde{p})$-quantile of $T$. When $n_X = 7$, $n_Y = 8$, and $(1-\tilde{p}) = 0.05$, we would reject when $T < 426$.

• $H_a: \sigma_X^2 \neq \sigma_Y^2$: reject when $T$ is less than the $((1-\tilde{p})/2)$-quantile of $T$, or greater than the $((1+\tilde{p})/2)$-quantile. When $n_X = 7$, $n_Y = 8$, and $(1-\tilde{p}) = 0.05$, reject when $T < 384$ or $T > 935$.

• $H_a: \sigma_X^2 > \sigma_Y^2$: reject when $T$ is greater than the $\tilde{p}$-quantile of $T$. When $n_X = 7$, $n_Y = 8$, and $(1-\tilde{p}) = 0.05$, reject when $T > 896$.

When $n_X$ or $n_Y$ is greater than 10, the $(1-\tilde{p})$-quantile of $T$ can be approximated by

$$T_{(1-\tilde{p})} = \frac{n_Y (N+1)(2N+1)}{6} - Z_{\tilde{p}} \sqrt{\frac{n_X n_Y (N+1)(2N+1)(8N+11)}{180}}, \tag{6.43}$$

where $N = n_X + n_Y$ and $Z_{\tilde{p}}$ is the $\tilde{p}$-quantile of the standard normal distribution (Appendix D). Note that, as with the first non-parametric test of variance, this test is also an approximate test when samples are small.

Even with the improved power, the squared-ranks test is inefficient when the data are really normal. Conover [88] notes that the test has asymptotic relative efficiency 0.76 in this case (i.e., the F test with samples of size 760 will be as efficient as the squared-ranks test is with samples of size 1000). On the other hand, when the data are actually distributed as the double exponential distribution (a wide-tailed symmetric distribution that peaks sharply at the mean), the asymptotic relative efficiency is 1.08.

¹⁹ Strictly speaking, these are tests of dispersion because they are designed to look for differences in the spread of the samples.

²⁰ The deviations within a sample are dependent because they sum to zero.
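The two rank statistics of the non-parametric dispersion tests, together with the large-sample approximation (6.43), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's procedure: the helper names `average_ranks`, `dispersion_statistics`, and `approx_lower_quantile` are hypothetical, tied values are given average ranks (the text does not say how ties are handled), and (6.43) is coded with $n_Y$ in the leading term as the text gives it and with the factor $8N+11$ in the variance (the exact variance of a sum of squared ranks). The sample data in the demonstration are made up for illustration. Small-sample decisions should still use the tabulated critical values of Appendices I and J.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j carry ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def dispersion_statistics(x, y):
    """Return (S, T): the rank sum (6.41) and the squared-rank sum (6.42) over the u_i."""
    xbar, ybar = mean(x), mean(y)
    u = [abs(xi - xbar) for xi in x]  # absolute deviations of sample X
    v = [abs(yj - ybar) for yj in y]  # absolute deviations of sample Y
    ranks = average_ranks(u + v)      # rank the combined deviations
    r_x = ranks[: len(x)]             # ranks R_i belonging to u_1..u_nX
    s = sum(r_x)
    t = sum(r * r for r in r_x)
    return s, t


def approx_lower_quantile(n_x, n_y, p_tilde):
    """Normal approximation (6.43) to the (1 - p_tilde)-quantile of T."""
    n = n_x + n_y
    z = NormalDist().inv_cdf(p_tilde)  # Z_p~, the p~-quantile of N(0, 1)
    centre = n_y * (n + 1) * (2 * n + 1) / 6
    spread = math.sqrt(n_x * n_y * (n + 1) * (2 * n + 1) * (8 * n + 11) / 180)
    return centre - z * spread


if __name__ == "__main__":
    # illustrative (made-up) samples: X is clearly more spread out than Y
    x = [2.1, -0.4, 3.9, -2.7, 0.8, 1.5, -3.3]
    y = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.3, -0.2]
    s, t = dispersion_statistics(x, y)
    print(f"S = {s}, T = {t}")
    print(f"approx. lower 5% critical value of T: {approx_lower_quantile(7, 8, 0.95):.1f}")
```

In the made-up example every $u_i$ exceeds every $v_j$, so the $u_i$ take ranks 9 through 15 and $T = 1036$, which exceeds the tabulated value 896 for $n_X = 7$, $n_Y = 8$ and would lead to rejecting $H_0$ in favour of $H_a: \sigma_X^2 > \sigma_Y^2$ at the 5% level. For the same sample sizes with $\tilde{p} = 0.95$, `approx_lower_quantile` returns about 427, close to the tabulated 426 even though both samples are smaller than the size of 10 at which the approximation is recommended. `NormalDist.inv_cdf` from the standard library stands in for the table of normal quantiles in Appendix D.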


6.8 Field Significance Tests

6.8.1 Constructing Field Significance Tests from Local Tests. We discussed the use of a field of local test decisions for making a global decision about a global null hypothesis in [6.5.2]. We reconsider this problem here in more generality.

The global null hypothesis is $H_0^G$: 'all local null hypotheses are correct.' We assume that all local tests are conducted at the $(1-\tilde{p})$ significance level. The alternative hypothesis is that 'at least one local null hypothesis is incorrect.' Note that we must specify two significance levels: $(1-\tilde{p})$, the significance level of the local test; and $(1-\tilde{p}_G)$, the significance level of the global test. We will see that $\tilde{p}_G$ can be chosen independently of $\tilde{p}$. However, the power of the global test is not independent of the power of the local tests.

Let $\mathbf{D}$ be an $m$-dimensional random vector of $m$ binary random variables $D_i$ that take values 0 or 1. Each of these random variables represents the result of a local test. These binary random variables are identically distributed with $P(D_i = 1) = 1-\tilde{p}$ and $P(D_i = 0) = \tilde{p}$ under the global null hypothesis.

Now let test statistic $S$ be the number of local

all decisions and let $\mathbf{D}^*$ be the vector of decisions at the subset of points. The relative frequency of rejections of local null hypotheses will, on average, be about the same in $\mathbf{D}$ and in $\mathbf{D}^*$. That is,

$$\mathbf{D}^{*\mathrm{T}}\mathbf{D}^*/m^* \approx \mathbf{D}^{\mathrm{T}}\mathbf{D}/m.$$

Independence ensures that the number of local rejections in the subset is binomially distributed:

$$\mathbf{D}^{*\mathrm{T}}\mathbf{D}^* \sim \mathcal{B}(m^*, 1-\tilde{p}).$$

Thus the challenge is to select $m^*$ in such a way that the distribution of $\mathbf{D}^{\mathrm{T}}\mathbf{D}/m$ is approximately that of $\mathbf{D}^{*\mathrm{T}}\mathbf{D}^*/m^*$.

One way to determine $m^*$ is to use physical reasoning. Usually this approach will lead to only vague estimates, but often it does yield upper limits on $m^*$. Another approach is to compute the minimum $m^*$ for which $H_0^G$ can be rejected at the $(1-\tilde{p})$ significance level. Clearly, if $m^* > m$, the global null hypothesis $H_0^G$ cannot be rejected. See [6.8.4].

6.8.3 Livezey and Chen's Example. Livezey and Chen [257] describe an analysis of the relationship between the Southern Oscillation, as represented by an SO index, and the Northern Hemisphere extratropical circulation, given by

—¦