mean (4.2) and the variance (4.5) are derived in this setting.

The standard notation used to differentiate a parameter p from its estimator is to indicate the estimator with a hat, as in p̂. Confusion can arise because the notation does not make it clear when p̂ represents a random variable and when it represents a realization of a random variable. Estimators should be viewed as random variables unless the context makes it clear that a particular value has been realized. The language we use also gives verbal cues that help to distinguish between the two; we generally think of an estimator as a function on a sample (and hence as a random variable) and an estimate as a particular value that is realized by an estimator. Just to exercise this notation, the estimators of the mean and variance that are introduced in Section 4.3 are μ̂ = X̄ and σ̂² = S². Intuitively, these estimators behave as we would expect. They take values in the neighbourhoods of the true values.

5.1.2 Estimation and the 'iid' Assumptions. In Chapter 4 we stressed the importance of the 'iid' (or sampling) assumptions in the process of inference. However, these assumptions are often not satisfied in climate research. Even so, many estimators will still produce useful parameter estimates. But it is much more difficult (sometimes even impossible) to construct confidence intervals or other measures of the uncertainty of the point estimate.

5.1.3 Some Ways in Which to Violate the 'iid' Assumptions. The 'independence' assumption is violated when methods that require independence are applied to serially correlated data. A possible solution is to sub-sample the data, that is, remove data from the complete data set until the gaps between the remaining observations are long enough to ensure independence.
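The sub-sampling step just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the text; the AR(1) series, the coefficient 0.9, and the gap of 5 are arbitrary choices made for the example.

```python
import random

def subsample(data, gap):
    """Keep every `gap`-th observation, widening the spacing between
    retained values until they can be treated as roughly independent."""
    return data[::gap]

# Synthetic serially correlated (AR(1)) series, for illustration only.
random.seed(1)
x, series = 0.0, []
for _ in range(100):
    x = 0.9 * x + random.gauss(0.0, 1.0)  # each value depends on the previous one
    series.append(x)

thinned = subsample(series, gap=5)
print(len(series), len(thinned))  # 100 20
```

In practice the gap would be chosen from the decorrelation time of the series, so that the retained observations are far enough apart to be nearly independent.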


Information is generally lost by sub-sampling and the quality of the estimator is not improved (in terms of bias or mean squared error; see Section 5.3). The estimate computed from the sub-sampled data is generally less certain than that computed from all of the data.

However, sometimes the use of the entire data set leads to problems. For example, when serially correlated data are not evenly distributed in time, the use of all of the data can lead to severe biases (systematic errors). For example, suppose that we want to estimate the expected (i.e., mean) daily summer rainfall at a location affected by the El Niño phenomenon using a 31-year data set of rainfall observations. A naive estimate could be constructed by averaging over all observations without accounting for the characteristics of the data set. Suppose that the data set contains 1 year of very good daily data (obtained during a special observing project) and 30 years of once weekly observations. Further, suppose that the special observing project took place during an El Niño year in which there was a marked lack of rain. If we average over all the available data, then the year of the special observing project has seven times more influence on the estimate than any of the other years. It is very likely, then, that the computed average underestimates the true expected (long-term mean) rainfall. Sub-sampling is an appropriate solution to this problem.

The 'identically distributed' assumption is violated when the sampled process is non-stationary. For example, if there are annual or diurnal cycles in the mean of the sampled process, the sampling method affects the way in which an estimated mean can be interpreted. A data set that contains observations taken at frequent, equally spaced intervals over an integral number of years or days will provide good estimates of the annual or daily mean respectively. On the other hand, if all the data come from winter, or from the early morning, then the estimate will not be representative of the true annual mean value.

5.2 Examples of Estimators

5.2.0 The Setting. We again assume that the result of the sampling process can be represented by a sample of n independent and identically distributed random variables {X₁, . . . , Xₙ}. In general, we use X to represent any of the iid random variables in the sample and assume that the (common) probability density function of X is f_X(·). The only difference between the current setup and the standard normal setup in Chapter 4 is that we do not yet assume a specific form for f_X.

Having now set the stage, we carry on to introduce a number of estimators. Whenever possible, we write the estimators in their random (rather than realized) form to emphasize that they are subject to sampling variability inherited from the sampling process.

5.2.1 Histograms. The frequency histogram is a crude estimator of the true probability density function, f_X, of X. To obtain a frequency histogram or a relative frequency distribution, the real line, R (or the complex plane, or the multi-dimensional space), is partitioned into K subsets Θ_k such that

$$\bigcup_{k=1}^{K} \Theta_k = \mathbf{R} \quad\text{and}\quad \Theta_k \cap \Theta_j = \emptyset \ \text{for } k \neq j. \tag{5.1}$$

The number of observations that fall into each Θ_k is counted, and the total count is divided by the total number of observations so we obtain

$$\hat{H}(\Theta_k) = \frac{|\{X_i : X_i \in \Theta_k\}|}{n},$$

where |S| denotes the number of elements in set S. Ĥ(Θ_k) is an estimator of

$$P(X \in \Theta_k) = \int_{\Theta_k} f_X(x)\, dx,$$

which in turn is a discretized approximation of the density function f_X. Consequently, the random step function

$$\hat{f}_X(x) = \frac{\hat{H}(\Theta_k)}{\int_{\Theta_k} dx} \quad \text{if } x \in \Theta_k, \tag{5.2}$$

is a crude estimator of the true density function.¹ The denominator in (5.2) is the area of subset Θ_k (or the length of the interval, if the partitions (5.1) are intervals, as is often true). The denominator in (5.2) has been introduced to ensure that ∫_R f̂_X(x) dx = 1. It turns out, with suitable regularity conditions, that this estimator converges to the true density function as sample size n → ∞, if the number of elements in each subset tends to infinity as n → ∞, and if the number of subsets K also goes to infinity as n → ∞.

¹ Kernel-type density estimators produce much better density function estimates. See, for example, Silverman [350] or Jones, Marron, and Sheather [200].
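As an illustration of (5.1) and (5.2), the following Python sketch computes the relative frequencies Ĥ(Θ_k) and the step-function density estimate for equal-width intervals partitioning [lo, hi). The function name and the example values are our own, not from the text.

```python
def histogram_density(sample, lo, hi, K):
    """Relative frequencies H(Theta_k) and the step-function density
    estimate f(x) = H(Theta_k) / width for K equal-width bins
    partitioning [lo, hi); values outside [lo, hi) are ignored."""
    n = len(sample)
    width = (hi - lo) / K
    counts = [0] * K
    for x in sample:
        if lo <= x < hi:
            counts[int((x - lo) / width)] += 1
    H = [c / n for c in counts]        # estimates P(X in Theta_k)
    f_hat = [h / width for h in H]     # equation (5.2): count fraction / bin length
    return H, f_hat

H, f = histogram_density([0.1, 0.2, 0.3, 0.4], 0.0, 1.0, 5)
print(H)  # [0.25, 0.5, 0.25, 0.0, 0.0]
```

Note that the f values, multiplied by the bin width, sum to 1, which is exactly the normalization the denominator in (5.2) is introduced to guarantee.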


The histogram is also an estimator of probabilities. The probability that X ∈ [a, b] is conveniently estimated by

$$\hat{P}(X \in [a, b]) = \int_a^b \hat{f}_X(x)\, dx = \frac{|\{X_i : X_i \in [a, b]\}|}{n} = \hat{H}([a, b]).$$

That is, the probability of obtaining an observation in a given interval or region the next time the
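The interval-probability estimate Ĥ([a, b]) reduces to a simple counting operation. A minimal sketch, with a function name and sample values of our own choosing:

```python
def interval_probability(sample, a, b):
    """Estimate P(X in [a, b]) by the relative frequency of
    observations that fall in [a, b] -- the histogram estimate H([a, b])."""
    inside = sum(1 for x in sample if a <= x <= b)
    return inside / len(sample)

print(interval_probability([0.1, 0.5, 0.7, 0.9], 0.0, 0.5))  # 0.5
```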