lognormal characteristics depending only on the shape parameter σ (such as the

coefficient of variation, the Gini coefficient, and various other inequality measures)

is equivalent to a test for equality of the variances computed for the logarithms of the

data, having a normal distribution, as noted by Iyengar (1960) who used this

approach when testing for the equality of Gini coefficients. Thus, the problem may

be solved, in the case where k independent samples are available, by the classical

Bartlett test for the homogeneity of variances. If dependence must be taken into

account, Singh and Singh suggested the following LR statistic:

$$Q = n \log |G| + n \operatorname{tr} G - nk,$$

where $G = I + n\hat{\sigma}_0^2 S^{-1} - S^{-1} S_d$, $\hat{\sigma}_0^2 = \operatorname{tr} S_d/(nk)$, and $S$ is a standard estimator of

the covariance matrix, namely, $S = (s_{ij})$, where $s_{ij} = \sum_l (Y_{il} - \bar{Y}_i)(Y_{jl} - \bar{Y}_j)$, and $S_d$ is

126 LOGNORMAL DISTRIBUTIONS

a diagonal matrix defined in terms of the diagonal of $S$. The approximate distribution

of $Q$ is a $\chi^2_{k-1}$.
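
For the independent-sample case mentioned above, the Bartlett statistic can be computed directly from the logarithms of the data. The following is a minimal sketch in Python using only the standard library. The function name `bartlett_log_variances` and the simulated samples are illustrative assumptions, not part of the cited work. Under the hypothesis of a common shape parameter, the statistic is approximately chi-square with k - 1 degrees of freedom.

```python
import math
import random

def bartlett_log_variances(samples):
    """Bartlett statistic for equal variances of the log-data of k samples."""
    logs = [[math.log(x) for x in s] for s in samples]
    k = len(logs)
    sizes = [len(y) for y in logs]
    total = sum(sizes)
    # Unbiased sample variance of each log-sample.
    variances = []
    for y in logs:
        mean = sum(y) / len(y)
        variances.append(sum((v - mean) ** 2 for v in y) / (len(y) - 1))
    # Pooled variance, weighted by degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes)
                      - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1  # statistic, degrees of freedom

# Simulated (hypothetical) data: a and b share sigma = 0.5, c has sigma = 1.5.
random.seed(0)
a = [random.lognormvariate(0.0, 0.5) for _ in range(200)]
b = [random.lognormvariate(1.0, 0.5) for _ in range(150)]
c = [random.lognormvariate(0.0, 1.5) for _ in range(180)]
stat_equal, df_equal = bartlett_log_variances([a, b])
stat_unequal, df_unequal = bartlett_log_variances([a, b, c])
```

Because rescaling a lognormal sample shifts only the mean of its logarithms, the statistic is unaffected by differences in the scale parameter, which is what makes it a test on the shape parameter alone.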

4.9 EMPIRICAL RESULTS

Because the lognormal is one of the two classical size distributions, a large number of

empirical studies employing it are available.

Income Data

The lognormal distribution has been fit to various income data for at least the last

50 years. One of the earliest investigations was completed by Kalecki (1945) who

considered it for the United Kingdom personal incomes for 1938–1939. He found

the two-parameter lognormal fit for the whole range of incomes to be quite poor, but

a two-parameter model for incomes above a certain threshold (that is, a three-parameter

lognormal distribution) provides a good approximation.
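
To make the link between the thresholded two-parameter model and the three-parameter lognormal concrete: if X - τ is lognormal for a threshold τ, then log(X - τ) is normal, and for a known τ the parameters can be estimated from the shifted logarithms. The sketch below assumes that τ is given, which is a simplifying assumption (in practice it would typically have to be estimated as well). The function name and the simulated data are hypothetical.

```python
import math
import random

def fit_lognormal_above_threshold(incomes, tau):
    """Fit log(X - tau) ~ N(mu, sigma^2) using incomes strictly above tau."""
    logs = [math.log(x - tau) for x in incomes if x > tau]
    n = len(logs)
    mu = sum(logs) / n
    # Maximum likelihood estimate of sigma (divisor n) for known tau.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

# Simulated (hypothetical) incomes with threshold 100, mu = 5, sigma = 0.8.
random.seed(1)
tau = 100.0
data = [tau + random.lognormvariate(5.0, 0.8) for _ in range(5000)]
mu_hat, sigma_hat = fit_lognormal_above_threshold(data, tau)
```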

Champernowne (1952) employed the three-parameter lognormal when studying

Bohemian data of 1933. He found that a two-parameter log-logistic distribution fits

as well as the lognormal.

Steyn (1959, 1966) considered income data for South African white males for

1951 and 1960 that are adequately described by a mixture of a lognormal and a

doubly truncated lognormal distribution.

Employing a three-parameter lognormal distribution, Metcalf (1969) studied

the relationship between changes in three distributional characteristics (the mean as well as income

levels at certain bottom and top quantiles divided by median income) and aggregate

economic activity by means of regression techniques for the period 1949–1965.

He established separate patterns of movement in these measures for each of three

family groups: families with a male head and a wife in the paid labor force, families

with a male head and a wife not in the labor force, and families with a female head

(these groups received about 88% of all personal income and almost 98% of all personal

income going to families for the period under study). In particular, increases in real

wages and employment rates appear to improve the relative position of low-income

families that are labor-force-oriented and to lower the relative (but not absolute)

position of high-income families. Also, families with a female head responded less

elastically to employment and real wage changes than did families with a male head.

Using nonparametric bounds on the Gini coefficient developed by Gastwirth

(1972), Gastwirth and Smith (1972) found that the implied Gini indices derived from

two- and three-parameter lognormal distributions fall outside these bounds for U.S.

individual gross adjusted incomes for 1955–1969 and concluded that lognormal

distributions are inappropriate for modeling these data.
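
The kind of check described here can be sketched for the two-parameter case, where the Gini coefficient depends only on the shape parameter: G = 2Φ(σ/√2) - 1, which equals erf(σ/2). The snippet below computes this implied Gini index and compares it with a pair of bounds. The bounds passed in are placeholders supplied by the user, not Gastwirth's actual values, and the function names are hypothetical.

```python
import math

def lognormal_gini(sigma):
    """Gini coefficient of a two-parameter lognormal with shape sigma."""
    # G = 2*Phi(sigma/sqrt(2)) - 1 and Phi(x) = (1 + erf(x/sqrt(2)))/2,
    # so the expression simplifies to erf(sigma/2).
    return math.erf(sigma / 2.0)

def within_bounds(sigma, lower, upper):
    """True if the implied Gini lies inside user-supplied bounds."""
    return lower <= lognormal_gini(sigma) <= upper
```

A fitted σ whose implied Gini falls outside nonparametric bounds computed from the same data is evidence against the lognormal model, which is the logic of the comparison reported above.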

In a thorough study, Kmietowicz and Webley (1975) fit the

lognormal distribution to data for rural households from the 1963–1964 Income and

Expenditure Survey of the Central Province of Kenya. They employed various fitting

procedures in order to cope with some peculiarities of the data and found that the fit

is better for the entire province than for any of its five districts. Also, they used

lognormal distributions to “predict” the size distribution for urban households, for

which only the average household income was available.

Kloek and van Dijk (1977) fit the lognormal distribution to Australian family

disposable incomes for the period 1966–1968, disaggregated by age of the head of

the family, occupation and education of the head of the family, and by family size.

For some subsamples, the fit of the distribution is comparable to the log-t (which has

one additional parameter); however, the Champernowne distribution often performs

better.

Kloek and van Dijk (1978) considered 1973 Dutch earnings data, to which they

fit several income distributions. They found that a substantially better approximation

(compared to the two-parameter lognormal distribution) is obtained by using three-

and four-parameter families such as the log-t or Champernowne distributions.

McDonald and Ransom (1979a) considered the distribution of U.S. family

income for 1960 and 1969 through 1975. When compared to alternative beta,

gamma, and Singh–Maddala approximations using three different estimation

techniques, the lognormal always provides the worst approximation in terms of

sum of squared errors (SSE) and chi-square criteria.
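
For grouped income data, both criteria can be computed from the observed class counts and the class probabilities implied by a fitted lognormal. The sketch below takes SSE to be the sum of squared differences between observed and model-implied class proportions, which is an assumption about the exact definition used in the study. The class boundaries and counts in the usage lines are made up for illustration.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of the two-parameter lognormal distribution."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def grouped_fit_criteria(edges, counts, mu, sigma):
    """SSE of class proportions and Pearson chi-square for grouped data.

    edges: increasing interior class boundaries; the first class is
    (0, edges[0]] and the last is (edges[-1], infinity), so
    len(counts) == len(edges) + 1.
    """
    n = sum(counts)
    cdf = [0.0] + [lognormal_cdf(e, mu, sigma) for e in edges] + [1.0]
    probs = [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]
    sse = sum((c / n - p) ** 2 for c, p in zip(counts, probs))
    chi2 = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    return sse, chi2

# Hypothetical grouped data: four income classes, 100 households.
example_sse, example_chi2 = grouped_fit_criteria(
    [0.5, 1.0, 2.0], [24, 26, 25, 25], mu=0.0, sigma=1.0)
```

Comparing these two numbers across candidate distributions fitted to the same classes gives the kind of ranking reported in the study.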

In a detailed study comparing the performance of the Pareto and lognormal

distributions, Harrison (1979, 1981) considered the gross weekly earnings of 91,968

full-time male workers aged 21 and over from the 1972 British New Earnings

Survey, disaggregated by occupational groups and divided into 34 earnings ranges.

For the aggregated data he found that the main body of the distribution comprising

85% of the total number of employees is “tolerably well described” by the lognormal

distribution, whereas for the (extreme) upper tail the Pareto distribution is “distinctly

superior.” However, he pointed out that “the lognormal performs less well, even in

the main body of the distribution, than is usually believed . . . ; and a strict

interpretation . . . suggests that it [the Pareto distribution] applies to only a small part

of the distribution rather than to the top 20% of all employees” [as implied

approximately by Lydall's (1968) model of hierarchical earnings]. When

disaggregated data divided into 16 occupational groups are considered, the fit of

the lognormal distribution improves considerably, the strongest evidence for

lognormality being found for the group “textiles, clothing and footwear.”

Nonetheless, there are still problems in the tails for some distributions, with the

difficulties being more persistent in the lower tail in a number of cases.

Dagum (1983) estimated a two-parameter lognormal distribution for 1978 U.S.

family incomes. The distribution is outperformed by wide margins by the Dagum

type III and type I as well as the Singh–Maddala distribution (four- and three-parameter

models, respectively), and even the two-parameter gamma distribution

does considerably better. In particular, the mean income is substantially

overestimated.

For the French wages stratified by occupation for 1970–1978 the three-parameter

lognormal model outperforms a three-parameter Weibull distribution as well as a four-parameter

beta type I model, but the Dagum type II, the Singh–Maddala, and a

Box–Cox-transformed logistic appear to be more appropriate for these data

(Espinguet and Terraza, 1983).

Arguing that in most studies income distribution functions are put forth as

approximate descriptive devices that are not meant to hold exactly, Ransom and

Cramer (1983) suggested employing a measurement error model, viewing observed

income as the sum of a systematic component and an independent $N(0, \sigma^2)$ error

term. Utilizing models with systematic components following Pareto, lognormal,

and gamma distributions, they found that the lognormal variant performs best in

terms of chi-square statistics for U.S. family incomes for 1960. However, these

goodness-of-fit tests still reject all three models.

McDonald (1984) estimated the lognormal distribution for 1970, 1975, and

1980 U.S. family incomes. However, the distribution is outperformed by 9 out of

10 alternative models (of gamma or beta type), for all three data sets. In McDonald

and Xu (1995), the distribution is outperformed by all 10 alternative models, again

mainly of beta and gamma type, for 1985 U.S. family incomes.

Kmietowicz (1984) used a bivariate lognormal model for the distribution of

household size and income when analyzing a subsample consisting of 200 rural

households from the Household Budget and Living Conditions Survey, Iraq, for the

period 1971–1972. The model was also fitted to data from other household budget

surveys, namely, Iraq 1971–1972 (urban), Iraq 1976 (rural and urban), and Kenya

1963–1964 (rural). In all cases, the distribution of household income per head

follows the lognormal distribution more closely than the marginal distribution of

household income.
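
Under the bivariate lognormal model, income per head is itself lognormal, which is why its fit can be assessed in the same way as that of the marginals. A short derivation, with notation assumed here for illustration ($H$ for household size, $Y$ for household income, and parameters $\mu_H, \mu_Y, \sigma_H, \sigma_Y, \rho$ for the joint normal distribution of the logarithms):

$$
(\log H, \log Y) \sim N_2\!\left( \begin{pmatrix} \mu_H \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_H^2 & \rho\,\sigma_H \sigma_Y \\ \rho\,\sigma_H \sigma_Y & \sigma_Y^2 \end{pmatrix} \right)
\;\Longrightarrow\;
\log\frac{Y}{H} = \log Y - \log H \sim N\!\left( \mu_Y - \mu_H,\; \sigma_Y^2 + \sigma_H^2 - 2\rho\,\sigma_H \sigma_Y \right).
$$

The shape parameter of income per head therefore depends on the correlation between log size and log income as well as on the two marginal shape parameters.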

Kmietowicz and Ding (1993) considered the distribution of household incomes

per head in the Jiangsu province of China (the city of Shanghai is located at the

southeastern tip of this province), for 1980, 1983, and 1986. The fit is quite poor for

the 1980 data but somewhat better for 1983 incomes; however, chi-square goodness-of-fit

tests reject the lognormal distribution as an appropriate model. For 1986 this is

no longer the case and hence the lognormal distribution may be considered

appropriate for these data. (Note the substantial economic changes in China

beginning in 1982–1983.)

For Japanese incomes for 1963–1971 the distribution is outperformed by the