The correlation between the jth and kth elements of X is

\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\sigma_{kk}}},

where σ_jk is the covariance between X_j and X_k, and σ_jj and σ_kk are the corresponding variances (see [2.8.7] and (2.33)). This correlation is estimated with the sample correlation

\hat\rho_{jk} = \frac{\hat\sigma_{jk}}{\sqrt{\hat\sigma_{jj}\hat\sigma_{kk}}}.    (5.15)

5.2.8 Estimating L-Moments. Recall that L-moments (see [2.6.9] and (2.20)–(2.24)) are the expected values of linear combinations of order statistics of samples that are the same size as the order of the L-moment. For example, the third L-moment is the expected value of a linear combination of the order statistics of a sample of size three. The natural way to estimate an L-moment [183] is with a U statistic (first described by Hoeffding [178]). That is, if the third L-moment is to be estimated, then, at least conceptually, all possible sub-samples of size three are selected from the full sample, the linear combination is computed, as for the expected order statistics, from the order statistics of each sub-sample, and these linear combinations are averaged. Hosking [183] uses combinatorial arguments to show that the jth L-moment can be estimated as

\hat\lambda^{(j)} = \sum_{l=0}^{j-1} (-1)^{j-l-1} \binom{j-1}{l} \binom{j+l-1}{l} b_l,    (5.16)

where

b_l = \frac{1}{n} \sum_{i=1}^{n} \frac{(i-1)(i-2)\cdots(i-l)}{(n-1)(n-2)\cdots(n-l)} X_{(i|n)}.

…two estimators. In particular, we have the following definition about the relative efficiency of estimators:

Let θ̂₁ and θ̂₂ be two competing estimators of a parameter θ. Then θ̂₁ is said to be a more efficient estimator of θ than θ̂₂ if M(θ̂₁; θ) < M(θ̂₂; θ) for all possible values of θ.

Estimators that have mean squared error less than or equal to that of all other estimators of θ are obviously desirable. However, other properties, such as unbiasedness (defined in [5.3.3]), are also desirable. In [5.3.7] we show that the mean squared error may be written as the sum of the squared bias and the variance of the estimator. Because lack of bias is often very desirable, the search for efficient estimators is often restricted to unbiased estimators. Thus, statisticians often search for minimum variance unbiased estimators. The search is often further restricted to estimators that can be expressed as linear combinations of the random variables that make up the sample.

We will continue to discuss the bias and variance of a variety of estimators after formally defining bias.

5.3.2 Definition: Bias. Let θ be a parameter of the distribution of random variable X and let θ̂ be an estimator of this parameter. Then the bias of estimator θ̂ is its expected, or mean, error, which is given by

B(θ̂) = E(θ̂) − θ.

Positive bias indicates that θ̂ overestimates θ, on average, when the experiment that generates the sample is repeated several times. Similarly, negative bias indicates that θ̂ underestimates θ, on average. An estimator that has no bias is said to be unbiased.
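Hosking's formula (5.16), together with the b_l above, translates directly into code. The following is a minimal Python sketch (the function name `sample_lmoment` is ours, not from the text); it sorts the sample to obtain the order statistics X_(i|n) and builds each weight (i−1)⋯(i−l)/((n−1)⋯(n−l)) incrementally:

```python
from math import comb

def sample_lmoment(x, j):
    """Estimate the jth L-moment of a sample via Hosking's formula (5.16)."""
    xs = sorted(x)                      # order statistics X(1|n) <= ... <= X(n|n)
    n = len(xs)

    def b(l):
        # b_l = (1/n) * sum_i [(i-1)...(i-l)] / [(n-1)...(n-l)] * X(i|n)
        total = 0.0
        for i in range(1, n + 1):
            w = 1.0
            for m in range(1, l + 1):
                w *= (i - m) / (n - m)
            total += w * xs[i - 1]
        return total / n

    return sum((-1) ** (j - l - 1) * comb(j - 1, l) * comb(j + l - 1, l) * b(l)
               for l in range(j))
```

For j = 1 the estimate reduces to b₀, the sample mean; for j = 2 it gives 2b₁ − b₀, and for a symmetric sample the third L-moment estimate is zero.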


Positive bias does not imply that all realizations of θ̂ are greater than θ, although that could be true if B(θ̂) is large compared with the variability of θ̂. Also, unless we know something about the distribution of θ̂, we cannot say what proportion of realizations of θ̂ will be greater than θ. For example, if θ̂ is positively biased and distributed symmetrically about E(θ̂), then we can say that more than 50% of all estimates will be larger than θ. However, if the distribution of θ̂ is skewed, then we can make this statement only if we know that the median⁴ value of θ̂ is greater than θ. Similar comments apply if θ̂ is negatively biased.

It is highly desirable to have estimators with little or no bias, but, as we will see below, it may be necessary to balance small bias against other desirable properties.

5.3.3 The Bias of Some Estimators. We now derive the bias of some frequently used estimators. The propositions to be proved appear in italics.

The empirical distribution function F̂_X (5.3) has zero bias as an estimator of the cumulative distribution function F_X. That is,

B(F̂_X) = 0.    (5.18)

To prove this, recall that n F̂_X(y) is the number of random variables X_k in the sample such that X_k ≤ y. As usual, all random variables are assumed to be independent and identically distributed. Since the random variables are identically distributed, P(X_k ≤ y) = F_X(y). Thus, using independence, we see that the integer-valued random variable n F̂_X(y) has the binomial distribution B(n, F_X(y)). Therefore E(n F̂_X(y)) = n F_X(y) for all y. This proves (5.18).

The sample mean µ̂ (5.6) is an unbiased estimator of µ. That is,

B(µ̂) = 0.

Similarly, S² is an unbiased estimator of σ², while σ̂² (5.11) is a biased estimator of σ². The bias of the latter is given by⁵

B(\hat\sigma^2) = -\frac{1}{n}\sigma^2.    (5.20)

The bias of S² and σ̂² is derived as follows. First, note that

\sum_{i=1}^{n} (X_i - \hat\mu)^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \hat\mu)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\hat\mu - \mu)^2.

Then

E(\hat\sigma^2) = E\Big(\frac{1}{n} \sum_{k=1}^{n} (X_k - \hat\mu)^2\Big)
               = E\Big(\frac{1}{n} \sum_{k=1}^{n} (X_k - \mu)^2\Big) - E\big((\hat\mu - \mu)^2\big)
               = \frac{1}{n} \sum_{k=1}^{n} \sigma^2 - \mathrm{Var}(\hat\mu)    (5.21)
               = \sigma^2 - \mathrm{Var}(\hat\mu).    (5.22)

The step that results in (5.21) requires the 'identically distributed' assumption. We will show below that Var(µ̂) = (1/n)σ² if the random variables in the sample are also independent.⁶ Thus, (5.20) is proven. The unbiasedness of S² follows from the relationship S² = (n/(n−1))σ̂².

Similar results are obtained for the multivariate mean and the sample covariance matrix:

B(µ̂) = 0
B(Ĉ) = 0
B(\hat\Sigma) = -\frac{1}{n}\Sigma.
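The bias in (5.20) and the unbiasedness of S² can be checked numerically. The sketch below is ours, not from the text: it draws standard normal samples (so σ² = 1) of size n = 5 and averages the two estimators over many replications; the helper names `sigma_hat_sq` and `s_sq` are hypothetical.

```python
import random
import statistics

def sigma_hat_sq(x):
    """Biased estimator: average squared deviation from the sample mean (divisor n)."""
    m = statistics.fmean(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def s_sq(x):
    """Unbiased estimator, via the relationship S^2 = (n/(n-1)) * sigma_hat^2."""
    n = len(x)
    return sigma_hat_sq(x) * n / (n - 1)

random.seed(1)
n, reps = 5, 20000
biased = unbiased = 0.0
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    biased += sigma_hat_sq(x) / reps
    unbiased += s_sq(x) / reps

# Expectations: E(sigma_hat^2) = sigma^2 - sigma^2/n = 0.8 here; E(S^2) = sigma^2 = 1.
print(biased, unbiased)
```

With the seed fixed, the two averages come out near 0.8 and 1.0 respectively, matching B(σ̂²) = −σ²/n and B(S²) = 0 up to Monte Carlo error.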

The uncertainty of the estimator of the mean vector