correct word.

6.2: The Structure and Terminology of a Test 101

Figure 6.1: Schematic diagrams illustrating the domains for which the null hypothesis 'x̃ is drawn from X' is accepted. The shaded area represents the 95% non-rejection region {x : f(x) ≥ α95%}: (a) a univariate distribution; (b) a bivariate distribution with covariance matrix diag(1, 2). The maximum of f is located in the centre of the diagram, and the region bounded by the 95% ellipsoid is shaded. In both cases the observation x″ leads to the rejection of the null hypothesis H0, whereas x′ leads to the conclusion that the observations are consistent with the null hypothesis. The points x′ and x″ are thus examples of realizations of the sampling process that are consistent with, and that provide evidence contrary to, the null hypothesis, respectively [396].

6.2.3 The Non-rejection Region When Ha is Specified. The choice of the non-rejection region may be constrained in various ways when an alternative hypothesis is specified. The region must satisfy (6.1) to ensure that the test operates at the selected significance level, but it need not necessarily satisfy (6.2), which was derived under the assumption that the alternative hypothesis is the complement of the null hypothesis, that is, Ha = ¬H0. This particular choice of alternative hypothesis dictates that all 'unusual' values of X represent evidence contrary to H0. However, we often have prior knowledge about the expected kind of departure from the null hypothesis. An example: if we summarize the response of the climate system to a doubling of CO2 with the global mean (near-surface) temperature and the global mean precipitation, then we anticipate an increase in temperature, but we might be uncertain about the sign of the change in precipitation. This prior knowledge, which is expressed as the alternative hypothesis, results in a non-rejection region that is constrained in some way.

Consider again the simple examples of the previous subsection. Figure 6.1 illustrates non-rejection regions when Ha is the complement of H0. However, suppose that we anticipate, as in the climate change example above, that the mean of X1 will be greater than zero if H0 is false (we use the subscript '1' to indicate the first element of X). Then a reasonable non-rejection region that accounts for Ha is given by {x : f(x) ≥ αp̃ and x1 ≥ 0} ∪ {x : f(0, x2) ≥ αp̃ and x1 ≤ 0}, where αp̃ is chosen to satisfy (6.1). The alternative hypothesis has modified the 'rules of evidence' by instructing the test not to treat unusually large negative values of x1 as evidence inconsistent with H0. The change in the non-rejection region is illustrated in Figure 6.2. This change reduces the magnitude of the X realizations needed on the right-hand side of the x1 = 0 plane to reject H0. Hence the power of the test is increased against alternatives for which E(X1) is positive.

6.2.4 Efficiency. A test may not be efficient even if it operates at the selected significance level, that is, even if the constraint (6.1) is satisfied. For example, one might choose the non-rejection region {x : f(x) ≤ αp̃}. This would lead to the rejection of the null hypothesis for realizations of X that are close to 'normal' and hence nearest the null hypothesis. Although this is a test of H0, it is clearly an absurd one. One could also choose to ignore the data by tossing a coin that comes up heads (1 − p̃) × 100% of the time. Generally speaking, inefficient low-power tests are avoided if the non-rejection region satisfies (6.1) and contains the outcomes x that are most likely to occur under H0. Technical details of the construction of optimal tests can be found in standard texts on mathematical statistics such as [335] or [92].
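The gain in power from specifying a one-sided alternative can be sketched numerically. This is a minimal illustration, not an example from the book: take a single observation X ∼ N(δ, 1), test H0: E(X) = 0 at the 5% level, and compare the two-sided rule (reject when |X| > 1.960) with the one-sided rule appropriate when Ha states E(X) > 0 (reject when X > 1.645). The helper names are ours; 1.645 and 1.960 are the usual standard normal quantiles.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_two_sided(delta):
    """Rejection probability of the two-sided 5% test (Ha: E(X) != 0)
    for a single observation X ~ N(delta, 1); rejects when |X| > 1.960."""
    return phi(-1.960 - delta) + 1.0 - phi(1.960 - delta)

def power_one_sided(delta):
    """Rejection probability of the one-sided 5% test (Ha: E(X) > 0)
    for a single observation X ~ N(delta, 1); rejects when X > 1.645."""
    return 1.0 - phi(1.645 - delta)

# Under H0 (delta = 0) both tests reject with probability about 0.05,
# but for positive delta the one-sided test is uniformly more powerful.
for delta in (0.0, 0.5, 1.0, 2.0):
    print(delta, round(power_two_sided(delta), 3), round(power_one_sided(delta), 3))
```

Both rules satisfy the significance constraint (6.1), but for every δ > 0 the one-sided rule rejects more often, which is the increase in power against alternatives with positive mean noted above.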

6: The Statistical Test of a Hypothesis 102

Figure 6.2: Same as Figure 6.1 but for a one-sided test. The non-rejection region is described in the text.

Figure 6.3: Signal strength δ = (µY − µX)/σ for which H0: µY = µX is rejected with probability 50% or 90% at the 5% significance level, shown as a function of n, the number of realizations of each of X and Y. It is assumed that X ∼ N(µX, σ) and Y ∼ N(µY, σ). [404]

6.2.5 Statistical and Physical Significance. Suppose we wish to test the null hypothesis H0: µX = µY, that the means of two random variables are equal. This can be accomplished by collecting a sample from both populations and computing a confidence interval for the difference of means, µY − µX, similar to (5.45). The null hypothesis is rejected at the 5% significance level when the hypothesized value for µY − µX, namely zero, is not covered by the 95% confidence interval.
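The duality between the confidence interval and the test can be sketched in code. This is a minimal sketch, assuming a large-sample normal interval with quantile 1.96 rather than the exact t interval of (5.45); the function name `reject_h0` and the simulated samples are illustrative only.

```python
import math
import random
import statistics

def reject_h0(x, y):
    """Test H0: mu_X = mu_Y at the 5% level by checking whether zero is
    covered by an approximate 95% confidence interval for mu_Y - mu_X.
    Uses the large-sample normal quantile 1.96; the exact test would use
    Student's t quantiles for small samples."""
    d = statistics.mean(y) - statistics.mean(x)
    se = math.sqrt(statistics.variance(x) / len(x) +
                   statistics.variance(y) / len(y))
    lo, hi = d - 1.96 * se, d + 1.96 * se
    return not (lo <= 0.0 <= hi)  # reject H0 iff zero lies outside the interval

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(30)]
y = [random.gauss(3.0, 1.0) for _ in range(30)]  # well separated: 3 sigma apart
print(reject_h0(x, y))  # → True
```

With populations this well separated, zero lies outside essentially every realization of the interval; with heavily overlapping populations and small samples, zero usually stays inside it, matching the behaviour described in the text.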

Zero will lie outside just about every realization of the confidence interval when the two populations are well separated, regardless of the size of the sample, since there is probably a large, physically significant difference between the populations. On the other hand, suppose that the true difference of means is small and of little physical consequence, and that the populations have heavy overlap. Zero will often be inside the confidence intervals when the sample size is small. However, the width of the confidence interval decreases with increasing sample size. Given large enough samples, zero will again lie outside most realizations of the confidence interval. Thus, even though the difference between µX and µY is physically insignificant, we will judge it to be statistically significant given large enough samples (i.e., resources).

This is illustrated in Figure 6.3, which shows the minimum strength of the difference-of-means signal µX − µY for which an ordinary t test (see [6.6.1]) will reject H0: µX = µY with probability 50% or 90%. These power curves are shown as a function of sample size under the assumptions that both populations have the same variance σ² and that each sample has size n. The figure shows, for example, that if µX − µY = 0.5σ, then samples of approximately n = 24 observations are needed to detect the signal with a probability of 50%. Eighty-eight observations are needed in each sample to increase the power to 90%. The size of signal that can be detected with a given level of reliability tends to zero as O(1/√n).

Another way to illustrate these ideas is shown in Figure 6.4, where we see the density functions of a control and an experimental random variable (solid and dashed curves labelled n = 1) and the corresponding sampling distributions of the means for samples of size 10 and 40. The population means differ by one standard deviation. The two density functions overlap considerably; a
