[8.3.13]. random variables. Points are expected to lie approximately on the y = x line if the SOI is normal. The lines parallel to y = x are thresholds that, if crossed, indicate that H0: 'sample is normal' should be rejected at the 5% significance level (see [4.1.10]). The test may not be reliable because the sampling assumptions are not satisfied by the SOI.

Stephens [356] and Pearson and Hartley [307] describe how to adjust several goodness-of-fit tests, including the Kolmogorov–Smirnov test, when sample sizes are small and when it is necessary to estimate the parameters of the distribution specified in H0.

5.2.5 Estimating the First Moment. The first moment \mu^{(1)} = \mu of a real-valued random variable X with probability density function f_X is the expected value of X, E(X), given by

  \mu = \int_{-\infty}^{\infty} x f_X(x)\, dx.   (5.5)

In [4.3.2] we identified the sample mean (4.2),

  \hat\mu = \bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k.   (5.6)
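As an added illustration (not part of the original text), the following Python sketch checks by simulation that the sample mean (5.6) estimates the first moment (5.5): estimates from increasingly large iid samples cluster ever more tightly around the true mean. The distribution and seed are arbitrary choices.

```python
import random

def sample_mean(xs):
    """Estimator (5.6): the arithmetic mean of the sample {X_1, ..., X_n}."""
    return sum(xs) / len(xs)

rng = random.Random(42)
true_mu = 3.0  # first moment of the sampled N(3, 2^2) distribution

# The variance of (5.6) is sigma^2 / n, so the spread of the estimates
# around true_mu should shrink as the sample size n grows.
for n in (10, 100, 10000):
    xs = [rng.gauss(true_mu, 2.0) for _ in range(n)]
    print(n, sample_mean(xs))
```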

5.2: Examples of Estimators 83

The sample mean is a reasonable estimator of \mu because its expectation is \mu and its variance goes to zero as the sample size n increases. However, the relationship between (5.5) and (5.6) is not immediately obvious.

A heuristic argument that links the two expressions is as follows. First, let X_{(i|n)}, i = 1, ..., n, be the order statistics of sample {X_1, ..., X_n} (see [2.6.9]). Then equation (5.5) can be rewritten as

  \mu = \int_{-\infty}^{(X_{(1|n)}+X_{(2|n)})/2} x f_X(x)\, dx
      + \sum_{i=2}^{n-1} \int_{(X_{(i-1|n)}+X_{(i|n)})/2}^{(X_{(i|n)}+X_{(i+1|n)})/2} x f_X(x)\, dx
      + \int_{(X_{(n-1|n)}+X_{(n|n)})/2}^{\infty} x f_X(x)\, dx.   (5.7)

Now, in the ith sub-integral, we approximate the integrand x f_X(x) with X_{(i|n)} f_X(x). Thus, the ith sub-integral, for i = 2, ..., n-1, is approximated as

  \int_{(X_{(i-1|n)}+X_{(i|n)})/2}^{(X_{(i|n)}+X_{(i+1|n)})/2} X_{(i|n)} f_X(x)\, dx
    = X_{(i|n)} \left[ F_X\!\left( \frac{X_{(i|n)}+X_{(i+1|n)}}{2} \right)
      - F_X\!\left( \frac{X_{(i-1|n)}+X_{(i|n)}}{2} \right) \right].   (5.8)

Similarly, the first sub-integral is approximated as

  X_{(1|n)} \left[ F_X\!\left( \frac{X_{(1|n)}+X_{(2|n)}}{2} \right) - 0 \right]   (5.9)

and the nth sub-integral is approximated as

  X_{(n|n)} \left[ 1 - F_X\!\left( \frac{X_{(n-1|n)}+X_{(n|n)}}{2} \right) \right].   (5.10)

The next step is to approximate the true distribution function F_X with its estimator (5.3). Note that each of the cumulative distribution function differences in (5.8)–(5.10) straddles one of the 'steps' in (5.3). Thus, each of these differences is equal to 1/n, and the ith sub-integral in (5.7) is further approximated as \frac{1}{n} X_{(i|n)}. Finally we obtain

  \mu \approx \frac{1}{n} \sum_{i=1}^{n} X_{(i|n)} = \frac{1}{n} \sum_{i=1}^{n} X_i = \hat\mu.

5.2.6 Estimating the Second and Higher Moments. Useful estimators for the jth moment \mu^{(j)} = \int_{-\infty}^{\infty} x^j f_X(x)\, dx can be defined, in a manner similar to that of the first moment, as

  \hat\mu^{(j)} = \frac{1}{n} \sum_{k=1}^{n} X_k^j.

For the second central moment, the variance, we have

  \hat\sigma^2 = \frac{1}{n} \sum_{k=1}^{n} (X_k - \hat\mu)^2.   (5.11)

Note that the estimator (5.11) differs from the sample variance (4.5) by a factor of n/(n-1). We return to this point in [5.3.7].

The same rules that apply to moments apply to the estimated moments as well. For example, for \hat\sigma^2 as given in (5.11),

  \hat\sigma^2 = \hat\mu^{(2)} - \left( \hat\mu^{(1)} \right)^2.

5.2.7 Mean Vectors, Covariances, and Correlations. The univariate estimators of the mean and variance defined above are easily extended to apply to samples of n iid random vectors {X_1, ..., X_n} distributed as the random vector X. The mean vector is estimated as

  \hat\mu = \frac{1}{n} \sum_{i=1}^{n} X_i   (5.12)

and, in analogy to the sample variance, the covariance matrix \Sigma (see [2.8.7] and (2.32)) may be estimated with the sample covariance matrix as

  C = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat\mu)(X_i - \hat\mu)^T.   (5.13)

As with the variance, we can also define an estimator \hat\Sigma, expressed in terms of the moments of the sample, and obtained by dividing the sum of products in (5.13) by n rather than n-1:

  \hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\mu)(X_i - \hat\mu)^T.

When we want to clarify that the estimated covariance matrix refers to the random vector X, we add subscripts to matrices C or \hat\Sigma. The elements of the estimated covariance matrix \hat\Sigma, denoted \hat\sigma_{jk}, are given by

  \hat\sigma_{jk} = \frac{1}{n} \sum_{i=1}^{n} (X_{i;j} - \hat\mu_j)(X_{i;k} - \hat\mu_k).   (5.14)
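As an added illustration (not from the original text), the NumPy sketch below computes both covariance estimators, C with divisor n-1 as in (5.13) and \hat\Sigma with divisor n, for simulated data; the sample size and dimension are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))  # n iid random vectors in R^3

mu_hat = X.mean(axis=0)          # estimated mean vector, eq. (5.12)
D = X - mu_hat                   # deviations from the estimated mean

# D.T @ D is the sum of outer products (X_i - mu_hat)(X_i - mu_hat)^T.
C = D.T @ D / (n - 1)            # sample covariance matrix, eq. (5.13)
Sigma_hat = D.T @ D / n          # moment estimator: divisor n

# The two estimators differ by exactly the factor n/(n-1), and C matches
# np.cov, which also uses the divisor n-1.
print(np.allclose(C, Sigma_hat * n / (n - 1)))
print(np.allclose(C, np.cov(X, rowvar=False)))
```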

5: Estimation   84

Here X_{i;j} represents the jth component of the ith random vector X_i. Similarly, \hat\mu_j is the jth component of the estimated mean vector \hat\mu.

It may happen, in practice, that there are missing values in some of the n sample vectors x_1, ..., x_n. Then the summations in (5.12) and (5.14) are taken only over the non-missing values, and the sums are divided not by n but by the number of terms in the sum. Theoretical results concerning properties of the estimators may not extend smoothly when there are gaps in the data.

5.3 Properties of Estimators

5.3.1 Estimator Selection Criterion. Chapter 4 mentions that a good estimator will produce estimates \hat\alpha in the neighbourhood of the true parameter value \alpha. A mathematically concise definition of 'in the neighbourhood' is obtained by defining a 'distance' such as the mean squared error

  M(\hat\alpha; \alpha) = E\left[ (\hat\alpha - \alpha)^2 \right].   (5.17)
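The mean squared error (5.17) of an estimator can be approximated by Monte Carlo simulation. The sketch below is an added illustration under assumed settings (standard normal data, sample size 10, and the helper names are inventions for this example): it compares the MSE of the divisor-n variance estimator (5.11) with that of the divisor-(n-1) sample variance.

```python
import random

def mse(estimator, true_value, n, trials, seed=0):
    """Monte Carlo approximation of M(alpha_hat; alpha) in (5.17)."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        errs.append((estimator(xs) - true_value) ** 2)
    return sum(errs) / trials

def var_n(xs):
    """Variance estimator (5.11): divisor n (biased)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_n1(xs):
    """Sample variance: divisor n - 1 (unbiased)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# For normal data the biased divisor-n estimator has the smaller mean
# squared error, the trade-off behind the n/(n-1) factor noted above.
print(mse(var_n, 1.0, n=10, trials=20000))
print(mse(var_n1, 1.0, n=10, trials=20000))
```

Because both calls reuse the same seed, the comparison is made on identical simulated samples.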