and moreover have opposite signs (compared to Pareto's work), contradicting Pareto's remark that the shift parameter m is positive for earnings from employment.

In a study reassessing the “conventional wisdom” that earnings are approximately lognormally distributed but with an upper tail better modeled by a Pareto distribution, Harrison (1981) considered the gross weekly earnings of 91,968 full-time male workers aged 21 and over from the 1972 British New Earnings Survey, disaggregated by occupational group and divided into 34 earnings ranges.

Although for the aggregated data he found that the main body of the distribution, comprising 85% of all employees, is “tolerably well described” by the lognormal distribution, for the (extreme) upper tail the fit provided by the Pareto distribution is “distinctly superior.” Specifically, there is evidence for a fairly stable Pareto tail with a coefficient in the vicinity of 3.85. However, he pointed out that “. . . a strict interpretation . . . suggests that . . . [the Pareto distribution] applies to only a small part of the distribution rather than to the top 20% of all employees” [as implied by Lydall's (1959) model of hierarchical earnings]. When disaggregated data divided into 16 occupational groups are considered, the stability of the Pareto tail often disappears, with estimates of the Pareto coefficient varying quite markedly for different lower thresholds.
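The threshold check can be sketched under stated assumptions: observations at or above a lower threshold t are treated as Pareto type I with known scale t, in which case the ML estimate of a is n divided by the sum of log(x/t) over the n exceedances. For genuinely Pareto data the estimate should stay roughly constant as t varies, which is the stability Harrison looked for. The function names are our own and the data are simulated; Harrison worked with grouped data, so this is not his procedure.

```python
import math
import random


def sample_pareto(n, a, x0, rng):
    """Draw n Pareto type I variates by inverse transform: x = x0 * U**(-1/a)."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite
    return [x0 * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]


def pareto_ml_alpha(data, threshold):
    """ML estimate of a from observations >= threshold, taking the threshold as the scale."""
    tail = [x for x in data if x >= threshold]
    log_sum = sum(math.log(x / threshold) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("not enough distinct observations above the threshold")
    return len(tail) / log_sum


# Hypothetical earnings (in units of the lowest threshold) drawn from a Pareto with a = 3.85
rng = random.Random(1)
earnings = sample_pareto(50_000, 3.85, 1.0, rng)
for t in (1.0, 1.5, 2.0, 3.0):
    print(f"threshold {t}: a_hat = {pareto_ml_alpha(earnings, t):.2f}")
```

With real disaggregated data, a systematic drift in a_hat as t rises is the symptom Harrison reported; with simulated Pareto data the estimates differ only by sampling noise.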

Ratz and van Scherrenberg (1981) studied the distribution of incomes of registered professional engineers, in all disciplines, in the province of Ontario, Canada, annually for the period 1955–1978. Modeling the relationship between the distribution's parameters and years of experience by regression techniques, they found a negative relation between experience and the shape parameter a. Since a smaller a corresponds to a heavier upper tail, this is evidence that the incomes of more experienced engineers are more spread out than those of engineers at the beginning of their careers, a fairly intuitive result.
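If the fitted model is the Pareto type I (an assumption on our part, since the summary above does not name the distribution), the link between a and spread can be made explicit through the Gini coefficient, which for a > 1 equals 1/(2a - 1) and therefore rises as a falls. A minimal sketch, with a function name of our own choosing:

```python
def pareto_gini(a):
    """Gini coefficient of a Pareto type I distribution, 1 / (2a - 1); requires a > 1."""
    if a <= 1:
        raise ValueError("the mean is infinite for a <= 1, so the Gini formula does not apply")
    return 1.0 / (2.0 * a - 1.0)


for a in (1.5, 2.0, 3.0, 4.0):
    print(f"a = {a}: Gini = {pareto_gini(a):.3f}")
# a = 1.5: Gini = 0.500
# a = 2.0: Gini = 0.333
# a = 3.0: Gini = 0.200
# a = 4.0: Gini = 0.143
```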

Ransom and Cramer (1983) argued that in most studies income distribution functions are put forth as approximate descriptive devices that are not meant to hold exactly. They therefore suggested employing a measurement error model, viewing observed income y as the sum of two independent variates, y = x + u, where x is the systematic component and u is a N(0, σ²) error term. Fitting a model of this type with a systematic component following a Pareto distribution to U.S. family incomes for 1960 and 1969, they discovered that the error accounts for about a third of the total variation, rendering the underlying distribution almost meaningless. The results are also inferior to those of other three-parameter models, notably the Singh–Maddala distribution.
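One way to read the statement that the error accounts for about a third of the total variation is as a variance decomposition: with x Pareto type I (scale x0) and u independent N(0, σ²), Var(y) = Var(x) + σ², where Var(x) = a x0²/((a - 1)²(a - 2)) for a > 2. The sketch below spells out this structure only; the parameter values are illustrative rather than Ransom and Cramer's estimates, their ML fitting procedure is not reproduced, and the function names are hypothetical.

```python
import math
import random


def pareto_variance(a, x0):
    """Variance of a Pareto type I variate; infinite unless a > 2."""
    if a <= 2:
        return math.inf
    return a * x0 ** 2 / ((a - 1) ** 2 * (a - 2))


def error_share(a, x0, sigma):
    """Share of Var(y) due to the N(0, sigma^2) error in y = x + u."""
    return sigma ** 2 / (pareto_variance(a, x0) + sigma ** 2)


def simulate_observed(n, a, x0, sigma, rng):
    """Draw n observed incomes y = x + u with independent Pareto x and normal u."""
    return [x0 * (1.0 - rng.random()) ** (-1.0 / a) + rng.gauss(0.0, sigma)
            for _ in range(n)]


# Illustrative values: a = 3, x0 = 1 gives Var(x) = 0.75; sigma^2 = 0.375 gives a 1/3 error share
print(error_share(3.0, 1.0, math.sqrt(0.375)))  # approximately 0.3333
print(simulate_observed(5, 3.0, 1.0, math.sqrt(0.375), random.Random(0)))
```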

Cowell, Ferreira, and Litchfield (1998) studied income distribution in Brazil over the 1980s (the decade of the international debt crisis). Brazil is of particular interest to scholars researching the distribution of income because it exhibits one of the most unequal distributions in the world, with 51% of total income going to the richest 10% and only 2.1% going to the poorest 20% (in 1995). Applying nonparametric density estimation, the authors found that the conventional approach employing a normal kernel and a fixed window width does not seem to work well with data as heavily skewed as these. When a Pareto distribution is fitted to incomes above $1,000, it turns out that inequality among the very rich was not too extreme in 1981, with an a in the vicinity of 3. However, the situation worsened considerably over the 1980s, with a decreasing to about 2 in 1990.
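To see what the drop from about 3 to about 2 means, note that in a Pareto type I population with a > 1 the richest fraction p receives a share p^(1 - 1/a) of total income. The sketch applies this only to the population the Pareto model was fitted to (incomes above $1,000), not to Brazil as a whole; the function name is our own.

```python
def pareto_top_share(p, a):
    """Income share of the richest fraction p of a Pareto type I population (a > 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if a <= 1.0:
        raise ValueError("total income is infinite for a <= 1")
    return p ** (1.0 - 1.0 / a)


for a in (3.0, 2.0):
    print(f"a = {a}: richest 1% of the tail hold {pareto_top_share(0.01, a):.1%} of its income")
# a = 3.0: richest 1% of the tail hold 4.6% of its income
# a = 2.0: richest 1% of the tail hold 10.0% of its income
```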

More recently, analyzing the extreme right tail of Japanese incomes and income tax payments for the fiscal year 1998, Aoyama et al. (2000) obtained estimates of a in the vicinity of 2.

From these studies (and the older ones cited in Chapter 1), it emerges that the Pareto distribution is usually unsuitable as an approximation to the full distribution of income (as has long been known). However, it should be noted that the distribution has been used successfully for interpolation with grouped income data, where it is often desirable to introduce a distributional assumption for the open-ended category (e.g., “U.S. $100,000 and over”). See Cowell and Mehta (1982) or Parker and Fenwick (1983) for further discussion of this topic.
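A minimal sketch of one such interpolation, assuming Pareto behavior above the second-highest class boundary: the counts N1 and N2 above the two highest boundaries c1 < c2 give a = ln(N1/N2)/ln(c2/c1), and the mean of the open-ended class is then a c2/(a - 1). The counts are invented, the function names are our own, and the cited papers discuss refinements not shown here.

```python
import math


def alpha_from_counts(c1, n_above_c1, c2, n_above_c2):
    """Solve (c2 / c1) ** (-a) = n_above_c2 / n_above_c1 for the Pareto coefficient a."""
    return math.log(n_above_c1 / n_above_c2) / math.log(c2 / c1)


def open_class_mean(lower, a):
    """Mean of a Pareto type I tail above `lower`; finite only for a > 1."""
    if a <= 1.0:
        raise ValueError("the open-ended class has infinite mean for a <= 1")
    return a * lower / (a - 1.0)


# Hypothetical grouped data: 1,200 units at or above $50,000, of which 300 are at or above $100,000
a_hat = alpha_from_counts(50_000, 1_200, 100_000, 300)
print(round(a_hat, 3), round(open_class_mean(100_000, a_hat)))  # 2.0 200000
```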

Wealth Data

Steindl (1972) obtained an estimate of 1.7 for a from Swedish wealth data of 1955 and 1968. For Dutch data, he found that the Pareto coefficient increased slightly from 1.45 in 1959 to 1.52 in 1967.


Chesher (1979) estimated a Pareto type I model for Irish wealth data of 1966 (grouped into 26 classes), obtaining an estimate of a as low as 0.45. However, it emerges that the lognormal distribution performs much better on these data, with χ² and likelihood improvements of about 93%. The fit of the Pareto distribution is very poor in the upper tail, with 274 times the observed number of individuals predicted in the highest wealth class. Chesher attributed this to the fact that the sparsely populated upper classes carry little weight in the multinomial ML procedure he employed. When an attempt is made to incorporate the 65% of individuals whose estate size is unrecorded, the Pareto distribution outperforms the lognormal distribution only for individuals whose wealth exceeds £40,000 (comprising only 0.4% of the population). Thus, the Pareto distribution does not seem to be appropriate for these data.
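The remark about weights can be seen from the multinomial log-likelihood for grouped data, the sum over classes of n_j log P_j(a): each class contributes in proportion to its count, so sparsely populated top classes barely affect the estimate even when the fitted P_j is badly off. The sketch assumes a Pareto type I whose scale equals the lowest class boundary and an open-ended top class, and maximizes over a by golden-section search, which presumes the log-likelihood is unimodal on the search bracket. Chesher's actual class boundaries and implementation are not reproduced; the function names are hypothetical and the counts are synthetic.

```python
import math


def class_probs(bounds, a):
    """Pareto type I probabilities of [b0, b1), ..., [b_{k-1}, b_k), [b_k, inf) with scale b0."""
    x0 = bounds[0]
    surv = [(x0 / b) ** a for b in bounds] + [0.0]  # survival function at each boundary
    return [surv[j] - surv[j + 1] for j in range(len(bounds))]


def grouped_loglik(a, bounds, counts):
    """Multinomial log-likelihood: sum of n_j * log P_j(a) over occupied classes."""
    return sum(n * math.log(p) for n, p in zip(counts, class_probs(bounds, a)) if n > 0)


def grouped_ml_alpha(bounds, counts, lo=0.05, hi=10.0, tol=1e-8):
    """Golden-section search for the a maximizing grouped_loglik on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if grouped_loglik(c, bounds, counts) > grouped_loglik(d, bounds, counts):
            hi, d = d, c  # maximum lies in [lo, d]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d  # maximum lies in [c, hi]
            d = lo + g * (hi - lo)
    return (lo + hi) / 2.0


# Synthetic counts proportional to Pareto (a = 1.5) class probabilities
bounds = [1.0, 2.0, 4.0, 8.0, 16.0]
counts = [round(100_000 * p) for p in class_probs(bounds, 1.5)]
a_hat = grouped_ml_alpha(bounds, counts)
expected = [sum(counts) * p for p in class_probs(bounds, a_hat)]
# ratio of predicted to observed counts per class (cf. Chesher's factor of 274)
print(round(a_hat, 3), [round(e / n, 2) for e, n in zip(expected, counts)])
```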

Analyzing data from the 1996 Forbes 400 list of the richest people in the United States, Levy and Solomon (1997) obtained an estimate of the Pareto coefficient of 1.36. Since the data comprise only extremely large fortunes, a Pareto tail fits them quite adequately.
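The estimation method used by Levy and Solomon is not described here; one common quick estimate for a ranked list such as the Forbes 400 is the rank-size regression. Under a Pareto tail the rank r of a fortune w satisfies approximately r ∝ w^(-a), so regressing log rank on log size by ordinary least squares gives a slope of about -a. This estimator is known to be biased in small samples, so the sketch (with a function name of our own and synthetic data) is for illustration only.

```python
import math


def rank_size_alpha(sizes):
    """Minus the OLS slope of log(rank) on log(size); sizes must be positive."""
    ordered = sorted(sizes, reverse=True)  # rank 1 = largest
    xs = [math.log(s) for s in ordered]
    ys = [math.log(r) for r in range(1, len(ordered) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic "top 400": exact Pareto quantiles with a = 1.36, so the fit is perfect
a_true, n = 1.36, 400
fortunes = [(r / n) ** (-1.0 / a_true) for r in range(1, n + 1)]
print(round(rank_size_alpha(fortunes), 4))  # 1.36
```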

Firm Sizes

In a classical study on the size distribution of firms, Steindl (1965) obtained Pareto coefficients in the range between 1.0 and 1.5. For all corporations in the United States in 1931 and 1955 (by assets), the parameter a is approximately equal to 1.1; for German firms in 1950 and 1959 (by turnover), it is about 1.1 in manufacturing and about 1.3 in retail trade, whereas for German firms in 1954 (by employment) it is about 1.2 in manufacturing.

Quandt (1966a) investigated the distribution of firm sizes (size being measured in terms of assets) in the United States. Using the Fortune lists of the 500 largest firms in 1955 and 1960 and 30 samples representing industries according to four-digit S.I.C. classes, he concluded that the Pareto types I–III (the Pareto type III is erroneously referred to as the Champernowne distribution in his paper) are appropriate models for only about half of the samples. The best of the three appears to be the type III variant, while the classical Pareto type I provides only six adequate fits. Pareto type I and II distributions seem to be appropriate for the two Fortune samples, however. Overall, the lognormal distribution does considerably better than Pareto distributions for these data.

Engwall (1968) studied the largest firms (according to sales) in 1965 within five areas: the United States, all countries outside the United States, Europe, Scandinavia, and Sweden, obtaining a shape parameter a between 1 and 2 in all cases.

More recently, Okuyama, Takayasu, and Takayasu (1999) obtained Pareto coefficients in the range (0.7, 1.4). They considered annual company incomes utilizing Moody's Company Data and Moody's International Company Data as well as Japanese data on companies having incomes above 40 million yen (the former databases comprise about 10,000 U.S. companies and 11,000 non-U.S. companies, respectively; the latter comprises 85,375 Japanese firms). It turns out that there are differences not only between countries but also between industry sectors, thus confirming earlier work by Takayasu and Okuyama (1998).


It becomes clear that for firm sizes the Pareto coefficient is somewhat smaller than for incomes and bounded from above by 2. Since a Pareto distribution with a < 2 has infinite variance, this implies firm size distributions in the domain of attraction of a nonnormal stable law.
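The reasoning can be made explicit through the moments. For the Pareto type I distribution with scale x0, the k-th moment is E[X^k] = a x0^k/(a - k) when k < a and infinite otherwise, so a < 2 implies an infinite variance; the classical central limit theorem then does not apply, and suitably normalized sums converge instead to a stable law with index a. A small sketch of the moment formula, with names of our own choosing:

```python
import math


def pareto_moment(k, a, x0=1.0):
    """E[X**k] for a Pareto type I variate: a * x0**k / (a - k) if k < a, else infinity."""
    return a * x0 ** k / (a - k) if k < a else math.inf


for a in (1.1, 1.5, 2.5, 3.85):
    mean, second = pareto_moment(1, a), pareto_moment(2, a)
    variance = math.inf if math.isinf(second) else second - mean ** 2
    print(f"a = {a}: mean = {mean:.3f}, variance = {variance:.3f}")
```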

Insurance Losses

A number of researchers have suggested the Pareto distribution as a plausible model for fire loss amounts.

Benckert and Sternberg (1957) postulated the Pareto law for the distribution of fire losses in Swedish homes during the period 1948–1952, obtaining estimates in the vicinity of 0.5 for the damages to four types of houses.

Andersson (1971) used the Pareto distribution to model fire losses in the Northern countries (Denmark, Finland, Norway, and Sweden) for the periods 1951–1958 and 1959–1966, obtaining Pareto coefficients in the range from 1.25 to 1.76 and confirming an international trend toward an increase in the number of large claims from the first to the second period, as measured by a decrease in the parameter a. However, this trend appeared to be less pronounced for the Northern countries.
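Why a smaller a means more large claims follows from the Pareto survival function P(X > x) = (x0/x)^a for x ≥ x0, where x0 is the threshold above which the Pareto model applies. The sketch (function name hypothetical) compares, at the two ends of the reported range of coefficients, the chance that a claim above x0 exceeds ten times x0:

```python
def pareto_survival(x, a, x0):
    """P(X > x) for a Pareto type I variate with scale x0 and coefficient a."""
    return 1.0 if x <= x0 else (x0 / x) ** a


for a in (1.76, 1.25):
    print(f"a = {a}: P(X > 10 x0) = {pareto_survival(10.0, a, 1.0):.4f}")
# a = 1.76: P(X > 10 x0) = 0.0174
# a = 1.25: P(X > 10 x0) = 0.0562
```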

As was the case with firm sizes, it is noteworthy that fire insurance data seem to imply an a less than 2, thus pointing toward distributions in the domain of attraction of a nonnormal stable law. However, for automobile insurance data Benktander (1962) obtained a considerably larger estimate, â = 2.7. The paper by Seal (1980) contains a more extensive list of estimates of a compiled from the early actuarial literature up to the 1970s.

We must reiterate that a number of studies using “the Pareto distribution,” notably in the actuarial literature, actually employ not the classical Pareto distribution (3.1) but the Pareto type II distribution, which is a special case of the beta type II (Pearson type VI) distribution. They will therefore be mentioned in Chapter 6.

3.8 STOPPA DISTRIBUTIONS

Stoppa (1990a,b) proposed a generalization of the classical Pareto distribution by introducing a power transformation of the Pareto c.d.f. Thus, the c.d.f. of the Stoppa