If the target is a random variable, a model is needed to predict the value of the unknown random variable from the observed random variables $X_1, \ldots, X_n$. Often, the model has the form $X_i = \alpha + E_i$, $i = 1, \ldots, n+1$, where $\alpha$ is a location parameter and the errors $E_i$ are iid with mean zero. The approach is to estimate the location parameter and then predict $X_{n+1}$ as $\widehat{X}_{n+1} = \widehat{\alpha} + \widehat{E}_{n+1}$. Because the errors are iid we can only predict $\widehat{E}_{n+1} = 0$. Thus, the prediction error is $A_{pred} = X_{n+1} - \widehat{\alpha} = (\alpha - \widehat{\alpha}) + E_{n+1}$. The next step is to find the distribution of the prediction error, and then to find critical values $A_L$ and $A_U$ such that $P(A_{pred} \le A_L) = (1 - \tilde{p})/2$ and $P(A_{pred} \le A_U) = (1 + \tilde{p})/2$. We expect that in repeated sampling

$$\tilde{p} = P(A_L < A_{pred} < A_U)
           = P(A_L < X_{n+1} - \widehat{\alpha} < A_U)
           = P(\widehat{\alpha} + A_L < X_{n+1} < \widehat{\alpha} + A_U).$$

The confidence interval has structure similar to that of $\alpha$, but is substantially wider because the critical values $A_L$ and $A_U$ account for sampling variation in both $\widehat{\alpha}$ and $X_{n+1}$.

These confidence intervals may depend upon yet more parameters. For example, the limits of a confidence interval for a location parameter may depend upon the value of a scale parameter. Such parameters are called nuisance parameters (see also [4.1.7]). The only solution is to estimate the nuisance parameter and then reformulate the confidence interval to account for the sampling variability of the nuisance parameter estimator. Examples of confidence intervals for location and scale parameters are described below.

$$\tilde{p} = P\Bigl(\bar{X} - z_U \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_U \frac{\sigma}{\sqrt{n}}\Bigr).$$

Thus, when $\sigma^2$ is known, the $\tilde{p} \times 100\%$ confidence interval10 for $\mu$ is

$$\Bigl(\bar{X} - z_U \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_U \frac{\sigma}{\sqrt{n}}\Bigr). \quad (5.44)$$

We still express the distance between $\bar{X}$ and $\mu$ in dimensionless units as in (5.43), but we replace $\sigma$ with the estimator $S$. The resulting t statistic,

$$T = \sqrt{n}(\bar{X} - \mu)/S,$$

has a t distribution with $n - 1$ degrees of freedom (see [2.7.9] and [4.3.3]). Proceeding as above, we find that, when $\sigma$ is unknown, the $\tilde{p} \times 100\%$ confidence interval for $\mu$ is

$$\Bigl(\bar{X} - t_U \frac{S}{\sqrt{n}},\ \bar{X} + t_U \frac{S}{\sqrt{n}}\Bigr), \quad (5.45)$$

where $t_U$ is the $0.5 + \tilde{p}/2$ quantile of the $t(n-1)$ distribution (see Appendix F).11

Be aware that the coverage of intervals (5.44) and (5.45) deviates from the nominal $\tilde{p} \times 100\%$ level when one or more of the assumptions we have made is violated. For example, serial correlations within the sample will tend to reduce the coverage of these intervals (see Chapter 4, [5.1.2] and [6.6.7-9]).

10 For example, $z_U = 1.96$ (1.645) for $\tilde{p} = 0.95$ (0.90).
11 For example, $t_U = 2.776$ (2.132) for $\tilde{p} = 0.95$ (0.90) when $T$ has 4 degrees of freedom.
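The intervals (5.44) and (5.45) are straightforward to compute. The following Python sketch uses only the standard library; the sample values and the "known" $\sigma$ are invented for illustration, and the t critical value is the tabulated $t_U = 2.776$ from footnote 11 rather than a computed quantile.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_interval(x, sigma, p=0.95):
    """Interval (5.44): CI for the mean when sigma is known."""
    z_u = NormalDist().inv_cdf(0.5 + p / 2)   # 0.5 + p/2 quantile; 1.96 for p = 0.95
    half = z_u * sigma / math.sqrt(len(x))
    xbar = fmean(x)
    return (xbar - half, xbar + half)

def t_interval(x, t_u):
    """Interval (5.45): sigma unknown and replaced by the estimator S.
    t_u is the 0.5 + p/2 quantile of t(n-1) read from a table,
    e.g. t_u = 2.776 for p = 0.95 and n = 5 (footnote 11)."""
    s = stdev(x)                               # S, computed with the n - 1 divisor
    half = t_u * s / math.sqrt(len(x))
    xbar = fmean(x)
    return (xbar - half, xbar + half)

x = [1.2, 0.8, 1.5, 0.9, 1.1]                  # hypothetical sample, n = 5
ci_z = z_interval(x, sigma=0.3)                # (5.44), sigma assumed known
ci_t = t_interval(x, t_u=2.776)                # (5.45), 4 degrees of freedom
print(ci_z, ci_t)
```

Both intervals are centred on $\bar{X}$; with $\sigma$ replaced by $S$ and $z_U$ by the larger $t_U$, the t interval is typically the wider of the two.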


5.4.5 Confidence Intervals for the Variance. Again, let $X_1, \ldots, X_n$ represent a sample of iid $N(\mu, \sigma^2)$ random variables. As described in [5.4.3], confidence intervals for scale parameters, such as $\sigma^2$, are constructed by first expressing an estimator of $\sigma^2$ in dimensionless units. Here we use

$$\mathcal{X} = (n-1)S^2/\sigma^2,$$

which is $\chi^2(n-1)$ distributed (see [2.7.8]). Upper and lower tail critical values, $\mathcal{X}_U$ and $\mathcal{X}_L$, of the $\chi^2$ distribution are tabulated in Appendix E. These values are chosen so that $P(\mathcal{X} < \mathcal{X}_L) = 0.5 - \tilde{p}/2$ and $P(\mathcal{X} < \mathcal{X}_U) = 0.5 + \tilde{p}/2$.12 Following the derivation in (5.42), we see that the $\tilde{p} \times 100\%$ confidence interval for $\sigma^2$ is

$$\Bigl(\frac{(n-1)S^2}{\mathcal{X}_U},\ \frac{(n-1)S^2}{\mathcal{X}_L}\Bigr). \quad (5.46)$$

This interval contains the point estimator $S^2$, but unlike the confidence interval for the mean, it is not located at its centre. As with the mean, the coverage of (5.46) is sensitive to departures from the assumptions.

5.5 Bootstrapping

5.5.1 Concept. The interval estimation methods of the previous section use a fully parametric model to express the uncertainty of the corresponding point estimator. That is, all elements of the assumed statistical model are required to derive the confidence interval. However, it is often not possible to make a distributional assumption, or a distributional assumption can be made but derivation of a confidence interval is mathematically intractable. The bootstrap [111] provides a solution in both instances.

Suppose we assume only that the sample can be represented by iid random variables $X_1, \ldots, X_n$. Each has the same distribution function, $F_X(x)$, but its form is not known. If we did know the distribution we could easily write down the joint density function of the random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$, and with luck derive the distribution of parameter estimator $\widehat{\alpha}(\mathbf{X})$.13 To keep the discussion simple, assume that $\alpha$ is a location parameter and that the distribution of $\widehat{\alpha}(\mathbf{X}) - \alpha$ is known (see [4.1.7], [5.4.3]). Then we can find a confidence interval for $\alpha$ simply by finding the lower and upper tail critical values of $\widehat{\alpha}(\mathbf{X}) - \alpha$. The bootstrap procedure solves the problem of the missing distribution function $F_X(x)$ by replacing it with a consistent estimator, the empirical distribution function $\widehat{F}_X(x)$ (see [5.2.2]). Then, following the same steps outlined above, we arrive at an estimate of the distribution of $\widehat{\alpha}(\mathbf{X}) - \alpha$ that converges to the true distribution as the sample size increases. The estimated distribution can be used to obtain an approximate confidence interval for $\alpha$ or an estimate of the variance of $\widehat{\alpha}(\mathbf{X})$.

The steps that produce bootstrapped confidence intervals or variance estimates can sometimes be performed analytically (see, e.g., Efron [111]). In general, though, the mathematics are intractable, and Monte Carlo simulation is used instead. The steps are as follows.

1 Generate a random sample $y_1, \ldots, y_n$ from the population that has distribution function $\widehat{F}_X(x)$.14 This can be done by using a random number generator to simulate a sample $u_1, \ldots, u_n$ from the U(0, 1) distribution and then solving $\widehat{F}_X(y_j) = u_j$ for each $j = 1, \ldots, n$.

2 Evaluate $\widehat{\alpha}$ for the realized sample.

3 Repeat steps 1 and 2 a large number of times.

The resulting sample of realizations of $\widehat{\alpha}$ can be used to estimate properties of the distribution of $\widehat{\alpha}$ such as its variance or its quantiles. The $(1-\tilde{p})/2$ and $(1+\tilde{p})/2$ quantiles are the lower and upper bounds of the bootstrapped $\tilde{p} \times 100\%$ confidence interval for $\alpha$. The inferences made with bootstrapping procedures are approximate because the distribution of the parameter estimate is derived from an estimated distribution function. There may also be additional uncertainty if only a small number of bootstrap samples are generated. Inferences made with the bootstrap are asymptotically exact15 provided that $\widehat{F}_X(x)$ is a consistent estimator of $F_X(x)$.

14 In the ordinary bootstrap, $\widehat{F}_X(x)$ is the empirical distribution function. However, other estimators of the distribution function can also be used. For example, $\widehat{F}_X(x)$
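The three Monte Carlo steps above can be sketched in a few lines of Python. This is a minimal percentile bootstrap under invented assumptions: the sample values are hypothetical, $\widehat{\alpha}$ is taken to be the sample mean (a location estimator), and step 1's inverse-transform draw from the empirical distribution function is implemented as the equivalent resampling of the data with replacement.

```python
import random
from statistics import fmean

def bootstrap_ci(x, alpha_hat=fmean, p=0.95, n_boot=2000, seed=1):
    """Percentile bootstrap interval following steps 1-3.
    Inverting the empirical distribution function at U(0, 1) variates
    is equivalent to resampling the data with replacement, which is
    what random.choices does here."""
    rng = random.Random(seed)
    estimates = sorted(
        alpha_hat(rng.choices(x, k=len(x)))    # steps 1 and 2
        for _ in range(n_boot)                 # step 3
    )
    lo = estimates[int((1 - p) / 2 * n_boot)]      # (1 - p)/2 quantile
    hi = estimates[int((1 + p) / 2 * n_boot) - 1]  # (1 + p)/2 quantile
    return (lo, hi)

x = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.0, 1.3, 1.6]  # hypothetical sample
ci_b = bootstrap_ci(x)
print(ci_b)
```

The sorted bootstrap realizations stand in for the unknown distribution of $\widehat{\alpha}$; their empirical variance would likewise serve as a variance estimate for $\widehat{\alpha}(\mathbf{X})$.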