$B(\hat\mu) = 0$. The proof of (5.19) is straightforward:

$$
E(\hat\mu) = \frac{1}{n} \sum_k E(X_k) = \frac{1}{n}\, n E(X) = \mu.
$$

The sample variance $S^2$ (4.5) is an unbiased estimator of the variance $\sigma^2$.⁵

$$
\mathrm{Cov}(\hat\mu, \hat\mu) = \frac{1}{n} \Sigma,
$$

but the uncertainty of the estimator of the covariance matrix $\Sigma$ is not easily characterized

4 The median of a random variable $X$ is a value $x_{0.5}$ such that $P(X \le x_{0.5}) \ge 0.5$ and $P(X \ge x_{0.5}) \ge 0.5$ (see [2.6.4]). If the distribution of $X$ is symmetric about the mean $\mu = E(X)$ (i.e., $f_X(\mu - x) = f_X(\mu + x)$ for all $x \ge 0$), then $x_{0.5} = \mu$. If $X$ is skewed, with a large tail to the right, $x_{0.5} < \mu$, and $x_{0.5} > \mu$ if $X$ is skewed with a large tail on the left.

5 It is assumed here that the sample consists of iid random variables. Both estimators are, in general, biased if the independence assumption is replaced by the more general assumption that the sample is obtained from a stationary, ergodic stochastic process.

6 The bias is caused by the $\mathrm{Var}(\hat\mu)$ term in (5.22). This term can be considerably greater than $\sigma^2/n$ when the independence assumption is replaced by the stationary and ergodic assumption. Then the 'memory' within the sample tends to inflate the variance of $\hat\mu$ (see Section 6.6).
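The unbiasedness of the sample mean can also be checked empirically. Below is a minimal Monte Carlo sketch (not from the original text; the values $\mu = 2$, $\sigma = 3$, the sample size $n = 10$, and the replication count are arbitrary illustrative choices) that averages $\hat\mu$ over many independent iid normal samples:

```python
import random
import statistics

def sample_mean(n, mu, sigma, rng):
    """Draw an iid normal sample of size n and return the estimator mu-hat."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))

# Average mu-hat over many independent samples; since B(mu-hat) = 0,
# the average settles near the true mean mu = 2.0.
rng = random.Random(1)
replicates = [sample_mean(10, 2.0, 3.0, rng) for _ in range(20000)]
print(statistics.fmean(replicates))
```

Each individual $\hat\mu$ scatters with standard deviation $\sigma/\sqrt{n} \approx 0.95$, but the average over 20 000 replicates lands close to 2, consistent with $E(\hat\mu) = \mu$.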

5: Estimation

86

because it involves all of the fourth moments of $X$. This is possible using the Wishart distribution when $X$ is multivariate normal [2.8.9] (see [197], [147]).

5.3.4 Asymptotically Unbiased Estimators. We have shown that the empirical distribution function $\hat F_X$, the sample mean $\hat\mu = \bar X$, and the sample variance $S^2$ are all unbiased estimators of the distribution function, of the mean, and of the variance, respectively, when the sample consists of iid random variables. On the other hand, $\hat\sigma^2$ (5.11) is a biased estimator of the variance. Here the bias disappears as sample size increases. Indeed,

$$
\lim_{n \to \infty} B(\hat\sigma^2) = 0.
$$

Estimators with this property are said to be asymptotically unbiased. Many biased estimators are asymptotically unbiased, for example, the estimator of the correlation coefficient $\rho$ (5.15) or the estimator of the L-moments (5.16).

5.3.5 Variances of Some Estimators. We derive here the expression for the variance of the sample mean used in [5.3.3] as well as some other results. Again we assume that the sample consists of $n$ independent and identically distributed random variables.

The variance of the empirical distribution function $\hat F_X$ (5.3) at point $x$ is given by

$$
\mathrm{Var}\big(\hat F_X(x)\big) = \frac{1}{n} F_X(x)\big(1 - F_X(x)\big). \tag{5.23}
$$

The proof of (5.18) shows that $n \hat F_X(x) \sim B(n, F_X(x))$. Therefore, using (2.9), we obtain $\mathrm{Var}\big(n \hat F_X(x)\big) = n F_X(x)\big(1 - F_X(x)\big)$, proving (5.23).

The variance of the sample mean $\hat\mu$ (5.6) is given by

$$
\mathrm{Var}(\hat\mu) = \frac{1}{n} \sigma^2. \tag{5.24}
$$

To demonstrate this we first note that

$$
\mathrm{Var}(\hat\mu)
= E\Big(\big(\tfrac{1}{n} \textstyle\sum_{k=1}^n X_k\big)^2\Big) - \mu^2
= \frac{1}{n^2} \sum_{k,j=1}^n E(X_k X_j) - \mu^2
= \frac{1}{n^2} \sum_{k,j=1}^n E\big((X_k - \mu)(X_j - \mu)\big).
$$

Now, using independence, all the expectations in the last expression vanish except those where $k = j$. Consequently

$$
\mathrm{Var}(\hat\mu) = \frac{1}{n^2} \sum_k \sigma^2 = \frac{1}{n} \sigma^2.
$$

The variance of $\hat\sigma^2$ (5.11) is given by

$$
\mathrm{Var}(\hat\sigma^2) = \frac{1}{n}(\gamma^* - \sigma^4) - \frac{2}{n^2}(\gamma^* - 2\sigma^4) + \frac{1}{n^3}(\gamma^* - 3\sigma^4), \tag{5.25}
$$

where $\gamma^* = E\big((X - \mu)^4\big)$ is the fourth central moment.⁷ The variance of $S^2$ is $n^2/(n-1)^2$ times the variance of $\hat\sigma^2$. The proof of this result is lengthy but elementary (see [325]).

When the sample consists of iid normal random variables, the variance of the sample variance $S^2$ and the biased variance estimator $\hat\sigma^2$ are

$$
\mathrm{Var}(\hat\sigma^2) = \frac{2(n-1)}{n^2} \sigma^4 \tag{5.26}
$$

$$
\mathrm{Var}(S^2) = \frac{2}{n-1} \sigma^4. \tag{5.27}
$$

For normal random variables, $\gamma_2 = 0$, so (5.26) and (5.27) are a direct consequence of (5.25).

It can be shown that the estimator (5.15) of the correlation coefficient $\rho$ has asymptotic variance equal to $(1 - \rho_{ij}^2)^2/n$, meaning that

$$
\lim_{n \to \infty} n\, \mathrm{Var}(\hat\rho_{ij}) = (1 - \rho_{ij}^2)^2.
$$

We describe the uncertainty of this estimator when samples are finite in [8.2.3].

Hosking provides an expression for the asymptotic covariance matrix of the L-moment estimator (5.16), but this expression is difficult to use because it depends upon the form of the distribution of the elements of the sample.

5.3.6 Consistency. Another desirable property of an estimator is that it be consistent. An estimator $\hat a$ is 'consistent' if its mean squared error (5.17) goes to zero with increasing sample size. That is, if

$$
\lim_{n \to \infty} M(\hat a; a) = 0.
$$

All of the estimators discussed in [5.3.3]–[5.3.5] can be shown to be consistent using the following proposition.

7 The fourth central moment is related to the kurtosis via $\gamma_2 = \gamma^*/\sigma^4 - 3$ (see (2.19)).
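Formulas (5.24), (5.26), and (5.27) lend themselves to a quick numerical check. The sketch below (illustrative only; the choices $\mu = 0$, $\sigma = 2$, $n = 10$, and the replication count are assumptions, not from the text) estimates the three variances from repeated iid normal samples:

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n, reps = 0.0, 2.0, 10, 40000

means, biased_vars, unbiased_vars = [], [], []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    means.append(xbar)
    biased_vars.append(ss / n)          # sigma-hat^2, divisor n
    unbiased_vars.append(ss / (n - 1))  # S^2, divisor n - 1

# Monte Carlo estimates of the three variances:
print(statistics.pvariance(means))          # theory (5.24): sigma^2/n = 0.4
print(statistics.pvariance(biased_vars))    # theory (5.26): 2(n-1)sigma^4/n^2 = 2.88
print(statistics.pvariance(unbiased_vars))  # theory (5.27): 2 sigma^4/(n-1) = 32/9
```

The three estimates land close to the theoretical values 0.4, 2.88, and $32/9 \approx 3.56$; note that the ratio of the last two theoretical values is $n^2/(n-1)^2 = 100/81$, as stated above.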

5.3: Properties of Estimators 87
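Consistency is easy to illustrate for the sample mean, whose mean squared error is $\sigma^2/n$ by (5.24) and therefore vanishes as $n \to \infty$. A brief Monte Carlo sketch (the sample sizes, $\mu = 1$, $\sigma = 2$, and the replication count are arbitrary illustrative choices, not from the text):

```python
import random
import statistics

def mse_of_mean(n, mu=1.0, sigma=2.0, reps=2000, seed=0):
    """Monte Carlo estimate of the mean squared error M(mu-hat; mu)."""
    rng = random.Random(seed)
    sq_errors = (
        (statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) - mu) ** 2
        for _ in range(reps)
    )
    return statistics.fmean(sq_errors)

# M(mu-hat; mu) = sigma^2 / n shrinks toward zero as n grows (consistency).
mses = [mse_of_mean(n) for n in (10, 40, 160)]
print(mses)
```

The estimated mean squared errors decrease roughly in proportion to $1/n$ (theory: 0.4, 0.1, 0.025), which is the consistency property of [5.3.6].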

Figure 5.3: Bias and variance contribute to the expected mean squared error.

The mean squared error of an estimator $\hat a$ is the sum of its squared bias and its variance (see Figure 5.3). That is,

$$
M(\hat a; a) = B(\hat a)^2 + \mathrm{Var}(\hat a). \tag{5.28}
$$

The consequences of bias correction are interesting even in this limited context, that is, where a scale correction will make an estimator unbiased. In particular, the 'improved' estimator $\tilde a$ may not always be more efficient than the original $\hat a$. If the scaling factor $c(n) > 1$, then $\tilde a$ is more efficient than $\hat a$ because both components of the expected mean square error, the squared bias and the variance, have been reduced. On the other hand, if $c(n) < 1$, the bias is reduced but the variance is enhanced. Thus, it is generally advised that the 'improved' estimator be accepted with caution.

The scaling factor that turns the biased $\hat\sigma^2$ into the unbiased $S^2$ is $c(n) = (n-1)/n < 1$. The mean squared error for the unbiased estimator $S^2$ is

$$
M(S^2; \sigma^2) = \mathrm{Var}(S^2) = \frac{2}{n-1} \sigma^4,
$$

while that for the biased estimator $\hat\sigma^2$ is

$$
M(\hat\sigma^2; \sigma^2) = \frac{1}{n^2} \sigma^4 + \frac{2(n-1)}{n^2} \sigma^4 = \frac{2n-1}{n^2} \sigma^4.
$$

Since

$$
\frac{2n-1}{n^2} < \frac{2}{n-1},
$$

we see that the biased estimator $\hat\sigma^2$ is slightly more