q − 1 > 0 and nα > p, along with their variances. This clearly includes the parameters themselves as special cases, as well as the moments, the harmonic mean, the mode, the coefficient of variation, the mean excess function, and the Gini index (see Section 3.6.7 below), but not the quantiles, the geometric mean, and the skewness and kurtosis coefficients. The resulting family of estimators can be expressed in terms of a difference in the confluent hypergeometric function and the variances in terms of a bivariate hypergeometric function.

3.6.5 Robust Estimation

A well-known problem with ML estimators (and indeed many classical estimators) is that they are very sensitive to extreme observations and model deviations such as gross errors in the data. Victoria-Feser (1993) and Victoria-Feser and Ronchetti (1994) proposed robust alternatives to ML estimation in the context of income distribution models. Following Hampel et al. (1986), they assessed the robustness of a statistic T_n = T_n(x_1, …, x_n) in terms of the influence function. In order to define

this function, it is convenient to consider Tn as a functional of the empirical

distribution function

F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x),

where δ_x denotes a point mass at x. If we write T(F_n) := T_n(x_1, …, x_n), the influence function (IF) at the (parametric) model F_θ, θ ∈ Θ ⊆ ℝ^k, is defined via the population counterpart of T(F_n), namely T(F_θ), as

IF(x; T, F_\theta) = \lim_{\epsilon \to 0} \frac{T[(1 - \epsilon)F_\theta + \epsilon \delta_x] - T(F_\theta)}{\epsilon},   (3.87)

that is, as the directional derivative of T at F_θ in the direction of δ_x.


The IF describes the effect of a small contamination (namely, εδ_x) at a point x on the functional/estimate, standardized by the mass ε of the contamination. Hence, the linear approximation ε·IF(x; T, F_θ) measures the asymptotic bias of the estimator caused by the contamination.
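A finite-sample analogue of the IF is the sensitivity curve, n[T_n(x_1, …, x_{n−1}, x) − T_{n−1}(x_1, …, x_{n−1})], which can be computed directly. A minimal sketch (the reference sample and the choice of functionals are illustrative, not from the text) contrasts a functional with an unbounded IF (the mean) against one with a bounded IF (the median):

```python
import random
import statistics

random.seed(0)
# Fixed reference sample (standard normal, n - 1 = 99) shared by both functionals.
sample = [random.gauss(0.0, 1.0) for _ in range(99)]

def sensitivity_curve(T, sample, x):
    """Finite-sample analogue of the IF: n * (T(sample plus x) - T(sample))."""
    n = len(sample) + 1
    return n * (T(sample + [x]) - T(sample))

for x in (1.0, 10.0, 100.0):
    sc_mean = sensitivity_curve(statistics.mean, sample, x)
    sc_med = sensitivity_curve(statistics.median, sample, x)
    print(f"x = {x:6.1f}  SC(mean) = {sc_mean:8.2f}  SC(median) = {sc_med:6.2f}")
```

The mean's sensitivity curve grows linearly in x (unbounded IF), while the median's flattens out once x exceeds the bulk of the sample (bounded IF).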

In the case of the ML estimator, the IF is proportional to the score function s(x; θ) = (∂/∂θ) log f(x; θ). In the Pareto case we have

s(x; \alpha) = \frac{1}{\alpha} - \log x + \log x_0,

which is seen to be unbounded in x. Thus, a single point can carry the MLE

arbitrarily far. (This is also the case for most other size distributions.) Clearly,

a desirable robustness property for an estimator is a bounded IF. Estimators

possessing this property are referred to as bias-robust (or, more concisely, B-robust)

estimators. An optimal B-robust estimator (OBRE) as de¬ned by Hampel et al.

(1986) belongs to the class of M estimators, that is, it is a solution Tn of the system

of equations

\sum_{i=1}^{n} \psi(x_i; T_n) = 0

for some function ψ.
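The practical consequence of the unbounded score can be checked numerically. A minimal sketch, assuming x_0 = 1 is known so that the MLE has the closed form α̂ = n / Σ log(x_i/x_0) (sample size, seed, and outlier magnitudes are illustrative):

```python
import math
import random

def pareto_mle(xs, x0):
    """Closed-form ML estimate of the Pareto tail index alpha, x0 assumed known."""
    n = len(xs)
    return n / sum(math.log(x / x0) for x in xs)

random.seed(1)
x0, alpha = 1.0, 2.0
# Clean Pareto(alpha, x0) sample via inverse transform: x = x0 * U^(-1/alpha).
clean = [x0 * random.random() ** (-1.0 / alpha) for _ in range(200)]

print("clean MLE:", round(pareto_mle(clean, x0), 3))
# A single gross error drags the MLE toward zero; the larger the outlier,
# the larger the bias -- reflecting the score's unboundedness in x.
for outlier in (1e3, 1e6, 1e12):
    print(f"outlier = {outlier:.0e}  MLE = {pareto_mle(clean + [outlier], x0):.3f}")
```

One contaminated observation out of 201 suffices to move the estimate arbitrarily far from the true α = 2.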

The OBRE is optimal in the sense that it is the M estimator that minimizes the

trace of the asymptotic covariance matrix under the constraint that it has a bounded

influence function. There are several variants of this estimator depending on the way

one chooses to bound the IF. Victoria-Feser and Ronchetti (1994) employed the

so-called standardized OBRE. For a given bound, c, say, on the IF, it is defined

implicitly by

\sum_{i=1}^{n} \psi(x_i; T_n) = \sum_{i=1}^{n} \{s(x_i; \theta) - a(\theta)\} W_c(x_i; \theta) = 0,

where

W_c(x_i; \theta) = \min\left\{1, \; \frac{c}{\|A(\theta)\{s(x_i; \theta) - a(\theta)\}\|}\right\}.

Here the k × k matrix A(θ) and the k × 1 vector a(θ) are defined implicitly by

E\{\psi(x; \theta)\,\psi(x; \theta)^{\top}\} = \{A(\theta)A(\theta)^{\top}\}^{-1}

and

E\,\psi(x; \theta) = 0.


The idea behind the OBRE is to have an estimator that is as similar as possible to the ML estimator for the bulk of the data (for efficiency reasons) and therefore to use the score as its ψ function for those values and to truncate the score if a certain bound c is exceeded (for robustness reasons). The constant c can be considered the regulator between robustness and efficiency: for small c the estimator is quite robust but loses efficiency relative to the MLE, and vice versa for large c. The matrix A(θ) and the vector a(θ) can be considered the Lagrange multipliers for the constraints resulting from a bounded IF and the condition of Fisher consistency, T(F_θ) = θ.

Victoria-Feser and Ronchetti suggested using a c so that 95% efficiency at the model is achieved. In the Pareto case this means that c = 3 should be employed. For computational purposes they recommend an algorithm based on the Newton-Raphson

Raphson method using the MLE, a trimmed moment estimate, or a less robust

OBRE (large c) as the initial value. We refer the interested reader to Victoria-Feser

and Ronchetti (1994) for further algorithmic details.
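The truncation idea can be illustrated in a deliberately simplified scalar form: a Huber-type M estimator that clips the Pareto score at ±b and solves the resulting estimating equation by bisection. This is only a sketch of the principle, not the standardized OBRE itself: it omits the implicitly defined A(θ) and the consistency shift a(θ), so it is only approximately Fisher consistent, and all names and constants are illustrative.

```python
import math
import random

def clipped_score_estimate(xs, x0, b, lo=1e-6, hi=100.0, tol=1e-10):
    """Huber-type truncated-score M estimator of the Pareto alpha.

    Solves sum_i clip(s(x_i; alpha), -b, b) = 0 by bisection, where
    s(x; alpha) = 1/alpha - log(x/x0) is the score. Unlike the true
    OBRE, this sketch omits the consistency shift a(theta).
    """
    def psi_sum(alpha):
        return sum(max(-b, min(b, 1.0 / alpha - math.log(x / x0))) for x in xs)
    # psi_sum is monotone decreasing in alpha, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_sum(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(2)
x0, alpha = 1.0, 2.0
clean = [x0 * random.random() ** (-1.0 / alpha) for _ in range(500)]
contaminated = clean + [1e9]  # one gross error

mle = lambda xs: len(xs) / sum(math.log(x / x0) for x in xs)
print("MLE, contaminated:      ", round(mle(contaminated), 3))
print("truncated score (b=1.5):", round(clipped_score_estimate(contaminated, x0, b=1.5), 3))
```

The outlier's score contribution is clipped at −b instead of entering at its full (huge) magnitude, so the truncated-score estimate stays close to its clean-sample value while the MLE is pulled downward.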

Also aiming at a favorable tradeoff between efficiency and robustness, Brazauskas and Serfling (2001a) proposed a new class of estimators called the generalized median (GM) estimators. These are defined by taking the median of the evaluations h(X_{i_1}, …, X_{i_k}) of a given kernel h(x_1, …, x_k) over all subsets of observations taken k at a time [of which there are \binom{n}{k}]. Specifically, in the case of the Pareto parameter α,

\hat{\alpha}_{GM} = \mathrm{med}\{h(X_{i_1}, \ldots, X_{i_k})\},   (3.88)

where {i_1, …, i_k} is a set of distinct indices from {1, …, n}, with two particular choices of kernel h(x_1, …, x_k):

h^{(1)}(x_1, \ldots, x_k) = \frac{1}{C_k} \, \frac{1}{k^{-1} \sum_{j=1}^{k} \log x_j - \log X_{1:k}}   (3.89)

and

h^{(2)}(x_1, \ldots, x_k; X_{1:n}) = \frac{1}{C_{n,k}} \, \frac{1}{k^{-1} \sum_{j=1}^{k} \log x_j - \log X_{1:n}}.   (3.90)

Here C_k and C_{n,k} are median-unbiasing factors, chosen in order to assure that in each case the distribution of h^{(j)}(X_{i_1}, …, X_{i_k}), j = 1, 2, has median α. For n = 50, 100, and 200, Brazauskas and Serfling provided the approximation C_{n,k} ≈ k/[k(1 − 1/n) − 1/3]. Note that the kernel h^{(1)} can be viewed as providing the MLE based on a particular subsample and thus inherits the efficiency properties of the MLE in extracting the information about α pertaining to that sample. h^{(2)} is a modification that always employs the minimum of the whole sample instead of the minimum of the particular subsample.
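Under these definitions, the GM estimator with kernel h^{(2)} can be sketched directly; the choices k = 3 and n = 50 are illustrative, the data are simulated from a Pareto distribution with α = 2 and x_0 = 1, and the median-unbiasing factor uses the approximation for C_{n,k} quoted above:

```python
import itertools
import math
import random
import statistics

def gm_estimate(xs, k):
    """Generalized median (GM) estimate of the Pareto alpha with kernel h^(2):
    median over all k-subsets of 1 / (mean of log x_j - log of the sample
    minimum), divided by the median-unbiasing factor C_{n,k}."""
    n = len(xs)
    log_min = math.log(min(xs))  # h^(2) uses the minimum of the whole sample
    c_nk = k / (k * (1.0 - 1.0 / n) - 1.0 / 3.0)  # Brazauskas-Serfling approximation
    evals = []
    for subset in itertools.combinations(xs, k):
        denom = sum(math.log(x) for x in subset) / k - log_min
        if denom > 0:  # guard against degenerate (tied) subsets
            evals.append(1.0 / denom)
    return statistics.median(evals) / c_nk

random.seed(3)
alpha = 2.0
sample = [random.random() ** (-1.0 / alpha) for _ in range(50)]  # Pareto(2), x0 = 1
print("GM estimate (k = 3):", round(gm_estimate(sample, 3), 3))
```

Because a single outlier enters only the roughly k/n fraction of subsets that contain it, and the final statistic is a median over all subset evaluations, the estimate moves little under contamination.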

In Brazauskas and Serfling (2001b), these estimators are compared to several estimators of trimmed mean and quantile type with respect to efficiency-robustness tradeoffs. Efficiency criteria are exact and asymptotic relative MSEs with respect to


the MLE; the robustness criterion is the breakdown point, with upper outliers receiving