from auto-regressive processes of order 1 (Chapter 10). Departures from this assumption will compromise the test. Wilks [423] suggests an alternative approach for situations when the assumption does not hold.

6.6: Tests of the Mean                                                      117

ensuring that both the Hotelling T^2 test and the t test will make the same decision. Also note that T^2 is a scaled version of the Mahalanobis distance D^2 (6.8) that is computed with an estimate of the covariance matrix. T^2 has the F distribution with (m, n_X + n_Y - m - 1) degrees of freedom [280] when H0 is true.13 Thus the Hotelling test is conducted by comparing T^2 with critical values from this distribution. Critical F values may be found in Appendix G.

When the covariance matrix Σ is known, the Hotelling test reduces to the χ^2 test (see [6.7.2]). The test statistic is then given by

    C^2 = \frac{n_X n_Y}{n_X + n_Y} (\mu_X - \mu_Y)^T \Sigma^{-1} (\mu_X - \mu_Y),    (6.34)

and is compared with the critical values of the χ^2 distribution with m degrees of freedom.14 Again, note the scalar case analogy. When m = 1, C^2 reduces to Z^2 with Z ∼ N(0, 1). Also note that C^2 is a scaled version of the Mahalanobis distance D^2 (6.8). Critical χ^2 values may be found in Appendix E.

6.6.11 The Mann–Whitney Test. Sometimes it is not possible to make all the assumptions required for a parametric test, so it may be desirable to use a non-parametric test that can be applied under a less restrictive set of assumptions. The Mann–Whitney test (cf. [4.1.8]) is an example of a non-parametric test of H0: µ_X = µ_Y. The same sampling assumption is required as in the t test and it is also necessary to assume that all observations in a sample come from the same distribution, but the distributional assumption itself is relaxed. Rather than specifying a particular functional form (e.g., the normal distribution), the Mann–Whitney test requires that the density functions of X − E(X) and Y − E(Y) be identical.

With these assumptions, the distribution of any function of the n_X + n_Y observations x_1, ..., x_{n_X}, y_1, ..., y_{n_Y} is independent of the ordering of the samples under H0. The Mann–Whitney test exploits this fact by examining the positions of the X observations when the combined sample is sorted in increasing order.

The samples are fully separated when X_k > Y_j, or vice versa, for all k = 1, ..., n_X and j = 1, ..., n_Y. Combinatorial arguments show that the combined sample can be partitioned into two groups of size n_X and n_Y in \frac{(n_X + n_Y)!}{n_X! n_Y!} ways. Thus the probability of observing fully separated samples under H0 such that all observations in the X sample are greater than all observations in the Y sample is \frac{n_X! n_Y!}{(n_X + n_Y)!}. Similarly, the probability that x_k > y_j for all j and all but one k = 1, ..., n_X is n_Y \frac{n_X! n_Y!}{(n_X + n_Y)!}.

These examples indicate that it makes sense to define a test statistic based on the ordering of the combined sample. To do so we introduce the concept of ranks in the joint sample

    z = (x_1, ..., x_{n_X}, y_1, ..., y_{n_Y})^T.    (6.35)

Now let R_1 be the rank of x_1 in z; that is, if x_1 is the ith smallest observation in z, then we set R_1 = i. Define R_2, ..., R_{n_X + n_Y} similarly.15 The test statistic is then defined to be the rank sum of all X observations,

    S = \sum_{i=1}^{n_X} R_i.    (6.36)

The distribution of S, under H0, is obtained through combinatorial arguments [88]. Critical values κ_{1−p̃} are tabulated in Appendix I. For large sample sizes, approximate critical values for tests at the (1 − p̃) × 100% significance level are given by [88] as

    κ_{p̃} = \frac{n_X (n_X + n_Y + 1)}{2} − Z_{p̃} \sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}},    (6.37)

where Z_{p̃} is the p̃-quantile of the standard normal distribution (Appendix D). A two-sided test of H0: µ_x = µ_y versus Ha: µ_x ≠ µ_y is performed at the (1 − p̃) × 100% significance level by rejecting H0 when S < κ_{(1−p̃)/2} or S > S_max − κ_{(1−p̃)/2}, where S_max = n_x(n_x + 2n_y + 1)/2 is the largest possible value that S can take. A one-sided test of H0: µ_x ≥ µ_y versus Ha: µ_x < µ_y is performed by rejecting H0 when S < κ_{1−p̃}.

13 The derivation of the distribution of T^2 follows that of t closely. The statistic can be written as the ratio of two independent quadratic forms that each have the χ^2 distribution under H0. It follows that T^2 has an F distribution because the latter is characterized as a ratio of χ^2 random variables [2.7.10].
14 Note the analogy with T^2. Here the statistic consists of a single quadratic form.
15 Of course, the ranks can be defined equally well in descending order so that the largest value receives the rank 1, etc.

The added flexibility of the Mann–Whitney test compared with its conventional parametric counterpart, the t test, comes at the cost of slightly reduced efficiency when the observations are normally distributed. The asymptotic relative

118                                    6: The Statistical Test of a Hypothesis

efficiency16 of the Mann–Whitney test is 0.955 when the data are normally distributed. That means that, asymptotically, the t test is able to achieve the same power as the Mann–Whitney test using only 95.5% of the observations needed by the latter. However, this disadvantage disappears for some distributions other than the normal distribution. The asymptotic relative efficiency is 1.0 when the data come from the uniform distribution and it is 1.5 if the data have the double exponential distribution, indicating that the t test requires 1.5 times as many observations.

6.6.12 A Permutation Test. The following test of H0: µ_X = µ_Y, first proposed by Pitman [314, 315, 316], can be applied to univariate as well as multivariate problems. It also allows us to relax the distributional assumption somewhat further than the Mann–Whitney test allows. We will need the standard sampling assumption (i.e., independence), the assumption that observations are identically distributed within samples, and a third assumption that distributions differ only with respect to their expectations, if they differ at all. Note that the sampling assumption is crucial. In particular, the permutation test performs very poorly when observations are serially correlated [442].

Let us first consider the univariate case. As in the Mann–Whitney test, let z be the

origin, and consequently S(z) should lie at the extremes of the collection of S(z_π) values.17

A test is therefore constructed by comparing S(z) with the ensemble of values obtained by evaluating S(z_π) for all permutations π. If the collection of permutations is very large, the distribution of S(z_π) may be estimated by randomly selecting a subset of permutations. For most applications a subset containing 1000 permutations will do.

To express the test mathematically, let Π be the set of all permutations π. Then compute (or estimate if Π is large)

    H = \frac{|\{π ∈ Π : S(z_π) > S(z)\}|}{|Π|},    (6.38)

where |A| denotes the number of entries in a set A. Since H is an estimate of the probability of observing a more extreme value of the test statistic under the null hypothesis, we may reject H0 if H is less than the specified significance level.

The permutation test approach is easily extended to multivariate problems [397]. One approach is to define a multivariate test statistic S in terms of univariate test statistics S_j, j = 1, ..., m, as

    S = \sum_{j=1}^{m} |S_j|.    (6.39)