Are all large-sample series Normally Distributed?

Some formal mathematical theorems and proofs support the theory that

“as the sample size gets larger most Density Functions become more like

the Normal Density Function.” Therefore, for example, if a series has a

left-skewed Density Function when a sample of 20 observations is used, it

may behave more like a symmetrical (that is, a zero-skewed) Normal

Density Function if the sample size is, for example, 1000 observations.

Even if the Density Function does not have the classic bell shape of a

Chapter 7: Probability Density Functions & Confidence Intervals

normal curve, it can behave like a Normal Density Function if it satisfies,

to a sufficient extent, the conditions that imply normality:

• The mode, mean and median are very close to each other.

• The Density Function is roughly symmetrical around the

mode/median/mean.
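The two conditions above can be checked roughly on a data series. The sketch below is only an illustration: the helper names `skewness` and `looks_roughly_normal`, the tolerance of 0.25 and both sample series are assumptions made for this example, and the check is an informal screen, not a formal test of normality. The mode is left out because it is rarely informative for a short series of continuous values.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Population skewness: the average cubed z-score (zero for a symmetric series)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def looks_roughly_normal(values, tolerance=0.25):
    """Hypothetical screen: mean close to median, and skewness close to zero.

    The tolerance is an arbitrary illustrative threshold, not a formal test.
    """
    s = pstdev(values)
    # Gap between mean and median, measured in standard deviations.
    mean_median_gap = abs(mean(values) - median(values)) / s
    return mean_median_gap < tolerance and abs(skewness(values)) < tolerance

# Made-up series: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 1, 2, 2, 3, 10, 50]

print(looks_roughly_normal(symmetric))
print(looks_roughly_normal(right_skewed))
```

The symmetric series passes both conditions, while the skewed series fails because its mean is pulled far away from its median.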

Statistics & Econometrics: Dependence of Methodologies on the

assumption of Normality

Assuming that variables are distributed Normally is a practice that

underlies, and even permits, most hypothesis testing in econometrics

and statistics. Without this assumption, statistics, as we know it, would

lose much of its power to estimate coefficients and establish relationships

amongst variables.

Assume you have three variables: X1, X2, and X3. X1 is measured in

dollars with a mean of $2.30, X2 also in dollars with a mean of $30,000

and X3 in tons. You assume that all the variables are distributed

Normally. This permits you to make inferences about the series. Once

you know the mean and standard deviation for X1, you can make

statements like “60% of the values of X1 lie below $2.62,” “18% of the

values of X2 lie between $24,000 and $28,000,” or “Over 40% of the values

of X3 lie below 24 tons.” (Note: the figures are chosen arbitrarily.) This

is fine. But the problem is that the

relation between the “mean, standard deviation, X values and probability”

must be calculated anew for each of the variables because they are

measured in different units (dollars versus tons) and/or on different

scales and ranges (X1 versus X2 in our example).

Statistical Analysis with Excel

This limits the usefulness of using the Normal Density Function to assess

the relation between series values and the probability of values falling

below, at, or above them. In practical terms, you would need a

statistics textbook that lays out the relationship between an X value and

probability for all possible combinations of mean and standard deviation!
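Statements of the kind quoted earlier can be computed directly once a mean and a standard deviation are fixed. The sketch below uses Python's standard `statistics.NormalDist` class rather than Excel. The means come from the example in the text, but both standard deviations are assumptions chosen only for illustration, so the resulting shares are not meant to reproduce the arbitrary figures quoted above.

```python
from statistics import NormalDist

# Means come from the text; both standard deviations are assumed for illustration.
x1 = NormalDist(mu=2.30, sigma=1.26)      # dollars
x2 = NormalDist(mu=30_000, sigma=5_000)   # dollars

# Share of X1 values lying below $2.62.
share_x1_below = x1.cdf(2.62)

# Share of X2 values lying between $24,000 and $28,000.
share_x2_between = x2.cdf(28_000) - x2.cdf(24_000)

print(f"{share_x1_below:.3f}")
print(f"{share_x2_between:.3f}")
```

Note that each variable needs its own Density Function object, with its own mean and standard deviation, which is exactly the inconvenience described above.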

The Standard Normal and its power

Luckily, a method removes the need for such exhaustive table listings.

This method involves rescaling every series that follows a Normal Density

Function to a common scale such that, on the new scale, the variables

have a mean/mode/median of zero and a standard deviation of one. The

process is called “standardization,” and this standardized Density

Function is called the standard Normal Density Function or the

Z-Density Function.
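Standardization amounts to subtracting the mean and dividing by the standard deviation, Z = (X - mean) / standard deviation. The sketch below is a minimal illustration in Python; the helper name `standardize` and the two standard deviations are assumptions made for this example. It shows that one standard Normal curve gives the same probabilities as the separate Density Functions of two differently scaled series.

```python
from statistics import NormalDist

def standardize(x, mean, std_dev):
    """Return the Z-score: distance from the mean in standard deviations."""
    return (x - mean) / std_dev

# Illustrative values; the standard deviations are assumptions.
z1 = standardize(2.62, mean=2.30, std_dev=1.26)
z2 = standardize(24_000, mean=30_000, std_dev=5_000)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# The same standard Normal curve answers questions about both series.
p1 = standard_normal.cdf(z1)
p2 = standard_normal.cdf(z2)

# Cross-check against each series' own Normal Density Function.
p1_direct = NormalDist(2.30, 1.26).cdf(2.62)
p2_direct = NormalDist(30_000, 5_000).cdf(24_000)
print(abs(p1 - p1_direct) < 1e-9, abs(p2 - p2_direct) < 1e-9)
```

Because the Z-scores carry no units, a single table (or a single function) for the standard Normal is enough for any Normally distributed series.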

The Z-scores are also used to standardize the Density Functions of the

means of variables or the estimates of statistical coefficients. If the

standard error of the mean for the population from which the sample is

drawn is unknown (as is typically the case), then the T Density Function is used

instead of the Z Density Function.

7.2.A THE PROBABILITY DENSITY FUNCTION (PDF) AND

CUMULATIVE DENSITY FUNCTION (CDF)

PDF:

NORMDIST (x, mean, standard deviation, false): the height of the Density

Function at X (the relative likelihood of values near X, not a probability)

CDF:

NORMDIST (x, mean, standard deviation, true): probability of values

lying to the left of X
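For readers who want to check the worksheet results outside Excel, the two forms of the call can be mimicked in Python. This is only a sketch: `normdist` is a hypothetical helper written for this example (it is not part of any library) that wraps the standard `statistics.NormalDist` class and takes its four arguments in the same order as the worksheet function.

```python
from statistics import NormalDist

def normdist(x, mean, std_dev, cumulative):
    """Hypothetical Python analogue of the worksheet call NORMDIST(x, mean, sd, flag)."""
    dist = NormalDist(mean, std_dev)
    # cumulative=True -> CDF (area to the left of x); False -> PDF (curve height at x).
    return dist.cdf(x) if cumulative else dist.pdf(x)

height_at_mean = normdist(0, 0, 1, False)    # density at the mean of a standard Normal
area_left_of_mean = normdist(0, 0, 1, True)  # half of the values lie below the mean
print(height_at_mean, area_left_of_mean)
```

The last argument plays the same role as the true/false flag in the worksheet function: it switches between the height of the curve and the area to its left.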

Figure 106: The dialog for estimating the probability associated with a value of a point in a

series that follows a Normal Density Function

Figure 107: The Cumulative Density Function (CDF) for a series that follows a Normal

Density Function. The arrows show the value to the left of which lie 95% of the values in the

Density Function.
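The value marked by the arrows in such a figure can be found by inverting the CDF. The sketch below does this in Python with `NormalDist.inv_cdf`; the mean of 100 and standard deviation of 15 are assumptions chosen only for illustration and are not taken from the figure.

```python
from statistics import NormalDist

# Mean and standard deviation are assumed purely for illustration.
series = NormalDist(mu=100, sigma=15)

# Value with 95% of the distribution to its left (the inverse of the CDF).
cutoff = series.inv_cdf(0.95)

# Round trip: the CDF evaluated at that value should give back 0.95.
print(round(cutoff, 2), round(series.cdf(cutoff), 4))
```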

The Cumulative Density Function (CDF) is the integral of the function on

the right hand side in the above equation. The range of integration is

negative infinity (or the population minimum) to the X value being

studied.
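The equation referred to above is not reproduced in this text. Assuming it is the usual formula for the Normal Density Function with mean \mu and standard deviation \sigma (symbols introduced here for illustration), the PDF f and the CDF F can be written as:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
F(x) = \int_{-\infty}^{x} f(t)\,dt
```

Here f(x) corresponds to NORMDIST with the last argument set to false, and F(x) to NORMDIST with the last argument set to true.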

Menu path to function: INSERT / FUNCTION / STATISTICAL /

NORMDIST.

Data requirements: The data series should follow the assumed Density

Function type (Normal).