(6.13)

where $\nu$ is a Lagrange multiplier used to enforce the constraint (6.11). Note that any solution of (6.13) satisfies (see, e.g., Graybill [148])

$$2\Sigma p^o = \nu p. \tag{6.14}$$

Thus the only solution $p^o$ of (6.13) is

$$p^o = \tfrac{1}{2}\,\nu\,\Sigma^{-1} p,$$

with $\nu = 2\,(p^{\mathrm{T}} \Sigma^{-1} p)^{-1}$. When $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$, that is, $\Sigma$ is diagonal, the $i$th component of $p^o$ is expressed in terms of the $i$th component of $p$ as $p_i^o = \tfrac{1}{2}\,\nu\,p_i/\sigma_i^2$. That is, the original guess pattern is rotated towards directions with small values of $\sigma_i$, directions that have little 'noise' relative to the signal.

approach to testing would involve conducting a test in the space spanned by the problem-specific patterns, and then, if a signal is detected, conducting a second test in the full space. Of course, this approach is not limited to two levels; a hierarchy of nested vector spaces could be constructed by scaling arguments, for example. A sequence of tests could then be conducted [22], either in order of increasing or decreasing dimension, to isolate the region on the supposed response space (the space spanned by the full set of guess patterns) that contains the signal (see Section 7.3).

6.6 Tests of the Mean

6.6.1 The Difference of Means Test. The t test, also known as Student's t test, is a parametric test

6: The Statistical Test of a Hypothesis


of the null hypothesis that two univariate random variables X and Y have equal means, that is,

$$H_0\colon\ E(X) = E(Y) \quad\text{or}\quad \mu_X = \mu_Y. \tag{6.15}$$

The statistical model required to conduct the test is built by making three assumptions [454]. The first is a sampling assumption that every realization of X or Y occurs independently of all other realizations. The second and third are distributional assumptions: first, that the distribution that generates realizations of X (or Y) is the same for each observation in the X (or Y) sample and, second, that the distributions are normal⁵ and have equal variance $\sigma^2$. The t test is moderately robust against departures from the normal distribution, particularly if relatively large samples of both random variables are available. However, the test is not robust against departures from the sampling assumption (see [454] and [6.6.6]) or against large departures from the assumption that all realizations in a sample come from the same distribution.

The optimal test statistic, within the constraints of the statistical model implied by the three assumptions, is conceptually different from that used for the sign test.⁶ The difference of means is estimated and then scaled by an estimate of its own standard deviation, making it dimensionless. The optimal test statistic is given by

$$t = \frac{\hat\mu_X - \hat\mu_Y}{S_p \sqrt{\dfrac{1}{n_X} + \dfrac{1}{n_Y}}}, \tag{6.16}$$

where $n_X$ and $n_Y$ indicate the size of the X and Y samples respectively, $\hat\mu_X$ and $\hat\mu_Y$ are the sample means of $\{x_1, \ldots, x_{n_X}\}$ and $\{y_1, \ldots, y_{n_Y}\}$, and $S_p$ is the pooled estimate of the common standard deviation

$$S_p^2 = \frac{\sum_{i=1}^{n_X} (x_i - \hat\mu_X)^2 + \sum_{i=1}^{n_Y} (y_i - \hat\mu_Y)^2}{n_X + n_Y - 2}. \tag{6.17}$$

Under the null hypothesis (6.16) has a t distribution with $n_X + n_Y - 2$ degrees of freedom [2.7.9].⁷ This is fortunate because it means that the reference distribution under the null hypothesis does not depend upon either the unknown common population mean $\mu = \mu_X = \mu_Y$ or standard deviation $\sigma = \sigma_X = \sigma_Y$. Consequently, only a small number of reference distributions, indexed by $n_X + n_Y - 2$, are required. Critical values for this family of distributions are tabulated in Appendix F.

6.6.2 Components of the t Statistic. It is useful to take a slight diversion to dissect (6.16) and better understand why it has the t distribution under the null hypothesis.

A random variable T has the t distribution with m degrees of freedom, written $T \sim t(m)$ [2.7.9], when

$$T = \frac{A}{\sqrt{B/m}}, \tag{6.18}$$

where A is a standard normal random variable, $A \sim N(0, 1)$, and B is a $\chi^2$ random variable with m degrees of freedom, $B \sim \chi^2(m)$, that is independent of A. Under the null hypothesis of equality of means we find that

$$A = \frac{\hat\mu_X - \hat\mu_Y}{\sigma \sqrt{1/n_X + 1/n_Y}} \sim N(0, 1),$$

$$B = \frac{n_X + n_Y - 2}{\sigma^2}\, S_p^2 \sim \chi^2(n_X + n_Y - 2),$$

and that A and B are independent. By substituting these quantities into (6.18) we see that the test statistic for the difference of means test (6.16) is $T \sim t(n_X + n_Y - 2)$.

6.6.3 When the Variance is Known. The t test discussed above has been derived assuming that the variance is unknown. When the variance is known, its square root may be substituted directly for $S_p$ in (6.16). The resulting z-statistic has the standard normal distribution $N(0, 1)$ under the null hypothesis. Critical values may be obtained from Appendix D.
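The statistics in (6.16) and (6.17), together with the known-variance variant of [6.6.3], can be sketched in a few lines of Python. This is a minimal illustration only: the function names `pooled_t_statistic` and `z_statistic` and the sample values are hypothetical and do not come from the text, and the critical values still have to be looked up separately (Appendix F for t, Appendix D for z).

```python
import math


def pooled_t_statistic(x, y):
    """Return (t, df) for the difference of means test, following (6.16)-(6.17).

    Assumes independent samples from normal distributions with equal variance.
    """
    n_x, n_y = len(x), len(y)
    mean_x = sum(x) / n_x
    mean_y = sum(y) / n_y
    # Sums of squared deviations about each sample mean (numerator of 6.17).
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    df = n_x + n_y - 2
    s_p = math.sqrt((ss_x + ss_y) / df)  # pooled standard deviation
    t = (mean_x - mean_y) / (s_p * math.sqrt(1.0 / n_x + 1.0 / n_y))
    return t, df


def z_statistic(x, y, sigma):
    """Return z when the common standard deviation sigma is known [6.6.3]."""
    n_x, n_y = len(x), len(y)
    mean_x = sum(x) / n_x
    mean_y = sum(y) / n_y
    return (mean_x - mean_y) / (sigma * math.sqrt(1.0 / n_x + 1.0 / n_y))


# Made-up samples, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = pooled_t_statistic(x, y)  # t is -1.0 up to rounding, with df = 8
z = z_statistic(x, y, sigma=2.0)
```

The only difference between the two functions is the scale: the t statistic estimates it from the data through $S_p$, while the z statistic uses the known value, which is why the two are compared against different reference distributions.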

5 The test is said to be parametric because it concerns parameters (the means $\mu_X$ and $\mu_Y$) of a specific distribution (the normal distribution). A non-parametric version of the test (see [6.6.11]) would focus on the expected values of X and Y and would use less specific information about the distribution of these random variables to construct the statistical model needed to conduct the test.

6 The sign test is an example of a non-parametric test. The Mann–Whitney test [6.6.11] is another example of a non-parametric test.

7 The term degrees of freedom has geometrical roots. The random variable T, of which t is a realization, is a function of deviations $x_i - \hat\mu_X$, $i = 1, \ldots, n_X$ and $y_j - \hat\mu_Y$, $j = 1, \ldots, n_Y$. When these $n_X + n_Y$ random deviations are organized into an $(n_X + n_Y)$-dimensional random vector, we find that the random vector is confined to an $(n_X + n_Y - 2)$-dimensional vector space. This happens because the $n_X$ X deviations must sum to zero as must the $n_Y$ Y deviations. A derivation of the distribution of (6.16) may be found in, among others, [280] or [272].
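The substitution described in [6.6.2] can be written out explicitly. The following is a worked sketch that uses only the quantities A and B defined there, with $m = n_X + n_Y - 2$ and sample means written with hats (a notational assumption). It shows that the unknown $\sigma$ cancels, which is why the reference distribution does not depend on it.

```latex
\begin{align*}
T = \frac{A}{\sqrt{B/m}}
  &= \frac{\hat\mu_X - \hat\mu_Y}{\sigma\sqrt{1/n_X + 1/n_Y}}
     \cdot \frac{1}{\sqrt{\dfrac{(n_X + n_Y - 2)\,S_p^2}{\sigma^2\,(n_X + n_Y - 2)}}} \\
  &= \frac{\hat\mu_X - \hat\mu_Y}{\sigma\sqrt{1/n_X + 1/n_Y}}
     \cdot \frac{\sigma}{S_p} \\
  &= \frac{\hat\mu_X - \hat\mu_Y}{S_p\sqrt{1/n_X + 1/n_Y}}
\end{align*}
```

The last line is exactly the statistic in (6.16), so it inherits the $t(n_X + n_Y - 2)$ distribution of T.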


6.6.4 Relaxing the Assumptions. The difference of means test described above operates as expected (e.g., the risk of false rejection is equal to that specified) only if the assumptions are fulfilled. In the following subsections we discuss methods that can be used when:

• the variances of X and Y are unequal, $\sigma_X \neq \sigma_Y$ (see [6.6.5]),

• the observations are paired in such a manner that pairs $(x_i, y_i)$ are independent realizations of a random vector $(X, Y)^{\mathrm{T}}$ that has dependent components (see [6.6.6]),

• the observations are auto-correlated (see [6.6.7,8]).

6.6.5 Unequal Variances. We suppose now that the sampling and distributional assumptions of [6.6.1] continue to hold except that $\mathrm{Var}(X) \neq \mathrm{Var}(Y)$.⁸ Under these circumstances only some of the ingredients that lead to the t distribution as reference distribution are obtainable. The natural estimator of the true difference of means is still $\hat\mu_X - \hat\mu_Y$. This is a normal random variable with mean $\mu_X - \mu_Y$ and variance $\sigma_X^2/n_X + \sigma_Y^2/n_Y$. The variance is estimated by $S_X^2/n_X + S_Y^2/n_Y$ with $S_X^2$ and $S_Y^2$ defined as usual by $S_X^2 = \frac{1}{n_X - 1} \sum_{i=1}^{n_X} (x_i - \hat\mu_X)^2$. Thus the difference of means is expressed in

Hypothesis (6.15) is tested by comparing the t-value computed using (6.19) with the critical values of the t distribution with df degrees of freedom, where df is computed with (6.20). This recipe constitutes a test that operates at an actual significance level close, but not exactly equal, to the level specified by the user.

6.6.6 The Paired Difference Test. Not all experimental designs lead to pairs of samples that are independent of each other. For example, one may conduct an experiment consisting of a series of five-day simulations with an AGCM to study the effects of a particular cloud parameterization. Suppose that two parameterizations are chosen, and that pairs of five-day runs are conducted from the same initial conditions. The initial conditions are selected randomly from a much longer run of the same AGCM, and the total liquid water content of the atmosphere is computed at the end of each five-day integration.

Because the integrations are short, one can imagine that the pairs of liquid water fields obtained from each set of initial conditions are not independent of each other. Thus the difference of means tests discussed above are not appropriate for testing the null hypothesis that the change in parameterization has not affected the total liquid water content of the atmosphere. The statistical model used with these tests relies upon the