Pareto (II) yielding inferior results (Suruga, 1982). Compared to eight other

distributions utilizing income data from the 1975 Japanese Income Redistribution

Survey (in grouped form), the lognormal ranks 6th to 8th for various strata in terms of SSE

and information criteria (Atoda, Suruga, and Tachibanaki, 1988). In a later study

modeling individual incomes from the same source, Tachibanaki, Suruga, and Atoda

(1997) employed six different distributions; here the lognormal is almost always the

worst model.

Henniger and Schmitz (1989) considered the lognormal distribution when

comparing various parametric models (including, among others, gamma and

Singh–Maddala) for the UK Family Expenditure Survey for the period 1968–1983

to nonparametric fittings. However, for the whole population all parametric

models are rejected; for subgroups, models such as the Singh–Maddala or

Fisk perform considerably better than the lognormal in terms of goodness-of-fit

tests.

Summarizing research on the distribution of income in Poland over 50 years,

Kordos (1990) observed that the two-parameter lognormal distribution describes the

Polish data until 1980 with a reasonable degree of accuracy. In particular, for the

distribution of monthly wages in 1973 the lognormal model compares favorably with

alternative beta type I, beta type II, and gamma fittings.

Bordley, McDonald, and Mantrala (1996) fit the lognormal model to U.S. family

incomes for 1970, 1975, 1980, 1985, and 1990. For all five data sets the distribution

is outperformed by 13 out of 15 considered distributions, mainly of beta and gamma

type, by very wide margins; only the (one-parameter) exponential distribution does

worse.

Creedy, Lye, and Martin (1997) estimated the two-parameter lognormal

distribution for individual earnings from the 1987 U.S. Current Population Survey

(March Supplement). The distribution is outperformed, by wide margins, by a

generalized lognormal-type distribution, see (4.67) below, as well as by the standard

gamma and a generalized gamma distribution.

Botargues and Petrecolla (1997, 1999a,b) fit the lognormal distribution to the

labor incomes for the province of greater Buenos Aires for each year from 1992 to

1997. However, the model is outperformed by several other distributions, notably the

Dagum models.

Wealth Data

Sargan (1957) considered British wealth data for 1911–1913, 1924–1930, 1935–1938,

and 1946–1947. Graphical methods indicate a fairly good approximation to a

lognormal distribution.

Chesher (1979) estimated a lognormal model for the distribution of wealth in

Ireland (grouped into 26 classes) in 1966 over the population of individuals with a

recorded estate size. It is clear that the lognormal distribution is superior to the Pareto

distribution on these data, with $\chi^2$ and likelihood improvements of about 93%. In view

of the conventional wisdom that the Pareto distribution is an appropriate model for the

upper tail, it is particularly noteworthy that the fit is “unexpectedly good in the upper

tail” (p. 7).
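
A likelihood comparison of this kind can be sketched in a few lines. The block below is only an illustration under stated assumptions: it uses a synthetic, ungrouped sample (not Chesher's grouped estate data), fits the two-parameter lognormal and the Pareto type I by maximum likelihood, and compares the maximized log-likelihoods. The function names `lognormal_loglik` and `pareto_loglik` are hypothetical helpers introduced here.

```python
import math
import random


def lognormal_loglik(xs):
    """Maximized log-likelihood of the two-parameter lognormal for positive data."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu_hat = sum(logs) / n
    var_hat = sum((v - mu_hat) ** 2 for v in logs) / n  # ML estimate of sigma^2
    return -sum(logs) - 0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)


def pareto_loglik(xs):
    """Maximized log-likelihood of the Pareto type I, with x0 set to the sample minimum."""
    n = len(xs)
    x0_hat = min(xs)
    sum_logs = sum(math.log(x) for x in xs)
    alpha_hat = n / (sum_logs - n * math.log(x0_hat))  # ML estimate of the shape
    return (n * math.log(alpha_hat) + n * alpha_hat * math.log(x0_hat)
            - (alpha_hat + 1.0) * sum_logs)


# Synthetic sample (an assumption for illustration, not real wealth data).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
print("lognormal log-likelihood:", lognormal_loglik(sample))
print("Pareto I log-likelihood: ", pareto_loglik(sample))
```

Because the synthetic sample is drawn from a lognormal, the lognormal is expected to win here; with real grouped data the likelihood would instead be built from class probabilities, and the Pareto would often be fitted to the upper tail only.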

Bhattacharjee and Krishnaji (1985) fit the lognormal distribution to Indian data of

landholdings, for 14 states for 1961–1962. However, the distribution is

outperformed by both the gamma and loggamma distributions.

Firm Sizes

In his pioneering research, Kalecki (1945) considered the size distribution of factories

(size being defined as “number of workers”) in the U.S. manufacturing industry in

1937, finding the agreement between actual and calculated series to be “fairly good.”

Observing that the empirical Lorenz curve for British firm size data is roughly

symmetric about the alternate diagonal of the unit square, Hart and Prais (1956)

approximated the distribution of firm sizes by a lognormal distribution, the

best-known distribution possessing this property. However, in the discussion of the Hart

and Prais paper, their choice was criticized by Champernowne (1956) and

Kendall (1956), both of whom provided general expressions for distributions with

this property.
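
The symmetry property of the lognormal Lorenz curve can be checked numerically. The sketch below assumes the standard closed form $L(p) = \Phi(\Phi^{-1}(p) - \sigma)$ for the lognormal Lorenz curve (a well-known result, not quoted from this passage) and verifies that the mirror image of each point $(p, L(p))$ about the alternate diagonal $y = 1 - x$, namely $(1 - L(p), 1 - p)$, again lies on the curve. The helper names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def lognormal_lorenz(p, sigma):
    """Lorenz curve of the lognormal, L(p) = Phi(Phi^{-1}(p) - sigma), for 0 < p < 1."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - sigma)


def symmetry_gap(p, sigma):
    """Return L(1 - L(p)) - (1 - p), which is zero (up to rounding)
    when the curve is symmetric about the alternate diagonal."""
    lp = lognormal_lorenz(p, sigma)
    return lognormal_lorenz(1.0 - lp, sigma) - (1.0 - p)


for sigma in (0.5, 1.0):
    worst = max(abs(symmetry_gap(k / 10.0, sigma)) for k in range(1, 10))
    print(f"sigma={sigma}: largest symmetry gap = {worst:.2e}")
```

The gap vanishes for every shape parameter $\sigma$, and the location parameter does not enter the Lorenz curve at all, which is why it is absent from the sketch.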

Quandt (1966a), in a study investigating the distribution of firm sizes (size being

measured in terms of assets) in the United States, found the lognormal distribution to

be more appropriate than the Pareto distributions of types I–III for his data. He

considered the Fortune lists of the 500 largest firms in the United States in 1955 and

1960, and 30 samples representing industries according to four-digit S.I.C. classes.

However, Pareto type I and II distributions appear to fit the two Fortune samples

rather well.

More recently, Stanley et al. (1995) used the lognormal to model the size

distribution of American firms (by sales), noting that the model overpredicts in the

upper tails. Hart and Oulton (1997) studied the size distribution (by employment) of

50,441 independent UK firms in 1993 and arrived at the opposite conclusion: for

UK firms there is excess mass in the upper tail compared to a lognormal benchmark

model. Voit (2001) considered the size (defined in terms of annual sales) of 570

German firms over the period 1987–1997. He noted that for these data the

lognormal lower tail decreases too fast toward the abscissa.

Insurance Losses

The lognormal distribution is favored by a number of studies for a diverse variety of

types of insurance.

Benckert (1962) studied industrial and nonindustrial fire losses and business

interruption and accident insurance as well as automobile third-party insurance in

Sweden for 1948–1952.

Ferrara (1971) employed a three-parameter lognormal distribution for modeling

industrial fire losses in Italy for the period 1963–1965.

Benckert and Jung (1974) studied fire insurance claims for four types of houses in

Sweden for the period 1958–1969, concluding that for one class of buildings (“stone

dwellings”) the lognormal distribution provides a reasonable fit.

Considering automobile bodily injury loss data, Hewitt and Lefkowitz (1979)

employed the two-parameter lognormal distribution as well as a lognormal-gamma

mixture. The latter model performs considerably better on these data.

Hogg and Klugman (1983) fit the lognormal distribution to a small data set (35

observations) of hurricane losses and found that it fits about as well as the Weibull

distribution. They also considered data for malpractice losses, for which (variants of)

Pareto distributions are preferable to the lognormal distribution.

Cummins et al. (1990) fit the two-parameter lognormal distribution to aggregate

fire losses. However, most of the distributions they considered (mainly of the gamma

and beta type) seem to be more appropriate. The same authors also considered data

on the severity of fire losses and fit the lognormal distribution to both grouped and

individual observations. Again, most of the other distributions they considered do

considerably better for these data.

Burnecki, Kukla, and Weron (2000) used the lognormal distribution when

modeling property insurance losses and found that it outperforms the Pareto distribution for

these data.

Overall, it would thus seem that the popular lognormal distribution is not the best

choice for modeling income, firm sizes, and insurance losses.

4.10 GENERALIZED LOGNORMAL DISTRIBUTION

The material in this section has been collected from diverse sources and to the best

of our knowledge appears here in unified form for the first time.
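
As a numerical preview of the construction introduced in this section, the following sketch evaluates the generalized error density (4.41) and the density of $X = \exp Y$ obtained from it by the change of variables $f_X(x) = f_Y(\log x)/x$. It is a minimal illustration, not the authors' code; the function names `ged_pdf` and `generalized_lognormal_pdf` are hypothetical. It also compares the shape values $r = 2$ and $r = 1$ with the normal and Laplace densities.

```python
import math


def ged_pdf(y, mu, sigma_r, r):
    """Generalized error density (4.41): location mu, scale sigma_r > 0, shape r > 0."""
    norm = 2.0 * r ** (1.0 / r) * sigma_r * math.gamma(1.0 + 1.0 / r)
    return math.exp(-abs(y - mu) ** r / (r * sigma_r ** r)) / norm


def generalized_lognormal_pdf(x, mu, sigma_r, r):
    """Density of X = exp(Y), Y generalized error: f_X(x) = f_Y(log x) / x for x > 0."""
    if x <= 0.0:
        return 0.0
    return ged_pdf(math.log(x), mu, sigma_r, r) / x


def normal_pdf(y, mu, sigma):
    """Normal density, used only as a reference for the case r = 2."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(y, mu, b):
    """Laplace density with scale b, used only as a reference for the case r = 1."""
    return math.exp(-abs(y - mu) / b) / (2.0 * b)


y, mu, s = 0.3, 0.1, 1.2
print("r = 2 vs normal :", ged_pdf(y, mu, s, 2.0), normal_pdf(y, mu, s))
print("r = 1 vs Laplace:", ged_pdf(y, mu, s, 1.0), laplace_pdf(y, mu, s))
print("generalized lognormal density at x = 2, r = 1.5:",
      generalized_lognormal_pdf(2.0, mu, s, 1.5))
```

For $r = 2$ the second function reduces to the ordinary two-parameter lognormal density, so the extra shape parameter $r$ is what allows heavier or lighter tails on the log scale.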

Vianelli (1982a,b, 1983) proposed a three-parameter generalized lognormal

distribution. It is obtained as the distribution of $X = \exp Y$, where $Y$ follows a

generalized error distribution, with density

\[
f(y) = \frac{1}{2 r^{1/r} \sigma_r \Gamma(1 + 1/r)} \exp\left\{ -\frac{1}{r \sigma_r^r} |y - \mu|^r \right\}, \qquad -\infty < y < \infty, \tag{4.41}
\]

where $-\infty < \mu < \infty$ is the location parameter, $\sigma_r = [E|Y - \mu|^r]^{1/r}$ is the scale

parameter, and $r > 0$ is the shape parameter. Like many of the distributions discussed

in this book, the generalized error distribution is known under a variety of names and

it was (re)discovered several times in different contexts. For $r = 2$ we arrive at the

normal distribution and $r = 1$ yields the Laplace distribution. The generalized error

distribution is thus known as both a generalized normal distribution, in particular in

the Italian literature (Vianelli, 1963), and a generalized Laplace distribution. The

generalized form was apparently first proposed by Subbotin (1923) in a Russian

publication. Box and Tiao (1973) called it the exponential power distribution, the

name under which this distribution is presumably best known in statistical literature,

and used the following parameterization of the p.d.f.:

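\[
f(y) = \frac{1}{2^{(3+\beta)/2} \sigma \Gamma[(3+\beta)/2]} \exp\left\{ -\frac{1}{2} \left| \frac{y - \mu}{\sigma} \right|^{2/(1+\beta)} \right\}, \qquad -\infty < y < \infty, \tag{4.42}
\]

where $-1 < \beta \le 1$, $\sigma > 0$. Here $\beta = 0$ corresponds to the normal and $\beta = 1$ to the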