discussed in Sections 6.6 and 6.7. Tests designed to provide a global interpretation for a field of local decisions, called field significance tests, are presented in Section 6.8. Univariate and multivariate recurrence analysis are discussed in Sections 6.9 and 6.10.

6.1 The Concept of Statistical Tests

6.1.1 Introduction. Since we should now be somewhat comfortable with the ideas underlying hypothesis testing (see [1.2.7], [4.1.7–11], and the preamble to this part of the book), we only briefly characterize the testing paradigm here.

Statistical hypothesis testing is a formalized process that uses the information in a sample to decide whether or not to reject $H_0$, the null hypothesis. The evidence is judged in the context of a statistical model in such a way that the risk of falsely rejecting $H_0$ is known. A second proposition, the alternative hypothesis $H_a$, generally describes the range of possibilities that may be true when $H_0$ is false. The alternative hypothesis affects the decision making process by

reject decision. The rule is defined in three steps.

First, we regard the set of observations $\mathbf{x}$ as a realization of a random vector $\mathbf{X}$. The latter represents the ensemble of values that $\mathbf{x}$ is able to take, when $H_0$ is true, under infinite replication of the 'experiment' that produced the set of observations. A statistical model is built for the experiment by representing the likelihood of observing a particular realization in this ensemble with a probability distribution $f_{\mathbf{X}}$.

Second, we specify the significance level, the probability of rejecting the null hypothesis when it is true, at which the test is to be conducted. The choice of the significance level affects the power, or sensitivity, of the test. Thus the consequences of falsely rejecting $H_0$ should be balanced against the consequences of failing to reject $H_0$ when $H_0$ is false. In Section 6.2 we present this idea in more concrete terms.

Finally, the chosen significance level, the alternative hypothesis, and the statistical model are used jointly to derive the decision making criterion for the test. This is usually expressed in terms of a test statistic and a range of values of that statistic,

or non-rejection region,1 that is consistent with the null hypothesis.

6.2 The Structure and Terminology of a Test

6.2.1 Risk and Power. The general mathematical setup is derived from the three components described above. A statistical model is developed to describe the stochastic characteristics of the observations and the way in which they were obtained, provided that $H_0$ is true. This model is expressed in terms of a random vector $\mathbf{X}$ and its probability distribution. Then a probability $\tilde p \in [0; 1]$ and a domain $\Theta(\tilde p)$ are chosen so that $\tilde p \times 100\%$ of all realizations of $\mathbf{X}$ fall inside $\Theta(\tilde p)$, that is,

$$P\left(\mathbf{X} \in \Theta(\tilde p)\right) = \tilde p. \tag{6.1}$$

The null hypothesis $H_0$ is rejected if $\mathbf{x} \notin \Theta(\tilde p)$. The probability of rejecting $H_0$ when it is actually true is $1 - \tilde p$. This probability, the risk of false rejection, is called the significance level of the statistical test.

The probability $\tilde p$ is chosen to be large, typically 95% or 99%, so that the non-rejection region $\Theta(\tilde p)$ contains the realizations of $\mathbf{X}$ most likely to occur when $H_0$ is true. Only the $(1 - \tilde p) \times 100\%$ of realizations that are unusual, and therefore constitute evidence contrary to $H_0$, are excluded from $\Theta(\tilde p)$.

The probability of rejecting $H_0$ when $H_0$ is false is the power of the test. While we would like the power to be large, it is sometimes small, often when the alternative hypothesis describes a probability distribution similar to that described by $H_0$. Then $P\left(\mathbf{X} \in \Theta(\tilde p)\right)$ under $H_a$ will be close to that under $H_0$.

Two types of decision making errors can occur in the testing process. First, $H_0$ can be rejected when it is true. This is referred to as a type I error. The probability of a type I error, $1 - \tilde p$, is equal to the significance level.

The significance level is chosen by the user of the test. However, reducing the likelihood of a type I error comes at the cost of increasing the likelihood of the type II error: the failure to reject $H_0$ when it is false. The probability of a type II error is 1 − power. Thus, reduced significance level comes at the cost of decreased power. Ultimately, the user must choose $\tilde p$ to balance the risk of a type I error with the costs of a type II error.

6.2.2 The Non-rejection Region When an Alternative Hypothesis is not Specified. To conduct a test it is necessary to derive the non-rejection region $\Theta(\tilde p)$. Intuitively, it should contain all events except those that are unusual under the null hypothesis and consistent with the alternative hypothesis. We will assume for now that $H_a = \neg H_0$. In this context the non-rejection region contains all events except those that are unusual under $H_0$.

In particular, if the observations are realizations of continuous random variables, then the non-rejection region will cover all possible realizations $\mathbf{x}$ for which the density function $f(\mathbf{x})$ under the null hypothesis is larger than some threshold $\alpha_{\tilde p}$, that is,

$$\Theta(\tilde p) = \{\mathbf{x} : f(\mathbf{x}) \ge \alpha_{\tilde p}\}. \tag{6.2}$$

In many applications the derivation of $\Theta(\tilde p)$ is facilitated by assuming that the sampling procedure and stochastic characteristics of the observations are such that $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Then the outer surface of $\Theta(\tilde p)$ is given by $f(\mathbf{x}) = \alpha_{\tilde p}$, an ellipsoidal surface defined by

$$D^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \kappa_{\tilde p}.$$

The domain $\Theta(\tilde p) = \{\mathbf{x} : D^2(\mathbf{x}) \le \kappa_{\tilde p}\}$ is the interior of the ellipsoid. Thus the statement $\mathbf{x} \notin \Theta(\tilde p)$ is equivalent to $D^2(\mathbf{x}) > \kappa_{\tilde p}$, and the test statistic is $D^2$.

When $H_0$ is true, the random variable $D^2(\mathbf{X})$ has a $\chi^2$ distribution with $m$ degrees of freedom [2.7.8], where $m$ is the dimension of $\mathbf{X}$. Therefore it is easy to determine $\kappa_{\tilde p}$ so that the test operates at the appropriate significance level. The non-rejection region is sketched in Figure 6.1 for $m = 1$ and $m = 2$.

In the univariate case, $\mathbf{X} = X$ and the matrix $\boldsymbol{\Sigma}$ degenerates to the scalar $\sigma^2$. The surface of the ellipsoid $(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \kappa_{\tilde p}$ is given by the equation $(x - \mu)^2 / \sigma^2 = \kappa_{\tilde p}$. Only two points satisfy this equation, so the ellipsoid $\Theta(\tilde p)$ degenerates to an interval that has two points as its 'surface' (Figure 6.1a). The null hypothesis is rejected whenever an observation $x$ lies outside the interval; it is not rejected when an observation $x$ falls inside the interval.

The isolines of a bivariate normal density

1 This is admittedly an awkward expression. The term 'acceptance region' is sometimes used instead, but this expression is imprecise as it implies that we might be able to actively support the validity of the null hypothesis. Instead we just do not reject the null hypothesis, so 'non-rejection' is the
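The normal-theory test of [6.2.2] can be sketched numerically. The following is a minimal illustration, not part of the original text: it assumes NumPy and SciPy are available, the function names (`d_squared`, `kappa`, `reject_h0`) are hypothetical, and the values of the mean vector and covariance matrix are made up for the example. It computes the statistic D^2, takes the threshold kappa from the chi-square distribution with m degrees of freedom, and checks by simulation that the rejection rate under H0 is close to 1 minus p-tilde.

```python
import numpy as np
from scipy.stats import chi2


def d_squared(x, mu, sigma):
    """Return D^2(x) = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    # Solve Sigma z = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(sigma, d))


def kappa(p_tilde, m):
    """Threshold with P(D^2 <= kappa) = p_tilde when D^2 is chi-square with m d.o.f."""
    return float(chi2.ppf(p_tilde, df=m))


def reject_h0(x, mu, sigma, p_tilde=0.95):
    """True if x lies outside the non-rejection region {x : D^2(x) <= kappa}."""
    m = np.atleast_1d(mu).size
    return bool(d_squared(x, mu, sigma) > kappa(p_tilde, m))


# Illustrative (made-up) null-hypothesis parameters for m = 2.
mu0 = np.array([0.0, 0.0])
sigma0 = np.array([[1.0, 0.5],
                   [0.5, 2.0]])

# Simulate realizations under H0 and count how often H0 is (falsely) rejected.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mu0, sigma0, size=5000)
k2 = kappa(0.95, 2)
type_one_rate = float(np.mean([d_squared(x, mu0, sigma0) > k2 for x in draws]))
print(f"empirical type I error rate: {type_one_rate:.3f} (nominal 0.05)")

# Univariate case (m = 1): the region is the interval mu +/- sqrt(kappa) * sigma.
half_width = float(np.sqrt(kappa(0.95, 1)))
print(f"univariate half-width in units of sigma: {half_width:.3f}")
```

In the univariate case the half-width printed at the end is the familiar two-sided normal quantile, which is one way to see that the interval test is the m = 1 instance of the ellipsoidal region.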