• ASSUMPTIONS UNDERLYING REGRESSION MODELS

• CONDUCTING THE REGRESSION

This chapter requires the Analysis ToolPak Add-Ins; chapter 9 shows how
to launch the Add-Ins.

12.1 ASSUMPTIONS UNDERLYING REGRESSION MODELS

The field of econometrics uses regression analysis to create quantitative
models that can predict the value of a series if one knows the values of
several other variables. For example, the wage per hour can be predicted
if one knows the values of the variables that constitute the regression
equation. This is a big leap of faith from a correlation or confidence
interval estimate. In a correlation, the statistician is not presuming or
implying any causality. Regression analysis, on the other hand, is used
so often (and probably abused) because of its supposed ability to link
cause and effect. Skepticism of causal relationships is not only healthy
but also important, because the real power of regression lies in a
comprehensive interpretation of the results.


Chapter 12: Regression

Regression models are used to test the statistical validity of a causal
relationship presumed in a theory or hypothesis. Regression can never be
divorced from the hypothesis it is testing. The construction of the model
has to be based upon the hypothesis, and not on the availability of data.
Therefore, if you believe you have a valid hypothesis but do not have the
correct data series to represent each factor in it, the best practice is
not to run a regression analysis.

On the other hand, throwing all available variables into the model and
letting the computer select the best model is a misleading technique that
sadly has gained popularity because of the belief that the best model is
the one that fits the data best.
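
A small simulation can make the danger concrete. The Python sketch below
is illustrative only: the data are randomly generated, the names
"education" and "wage" are hypothetical, and the helper ols_r_squared is
written for this example rather than taken from Excel or the Analysis
ToolPak. It adds ten pure-noise series to a one-variable model and records
the R-square after each addition.

```python
import random


def ols_r_squared(columns, y):
    """Return R-square of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    # Design matrix: a leading column of ones (the intercept) plus each regressor.
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
# Assumed, made-up data: wage depends on education plus random error.
education = [rng.uniform(8, 20) for _ in range(n)]
wage = [2.0 + 0.8 * e + rng.gauss(0, 2) for e in education]

regressors = [education]
r2_path = [ols_r_squared(regressors, wage)]
for _ in range(10):
    # Add a pure-noise series that has nothing to do with wages.
    regressors = regressors + [[rng.gauss(0, 1) for _ in range(n)]]
    r2_path.append(ols_r_squared(regressors, wage))

for k, r2 in enumerate(r2_path):
    print(f"{k:2d} noise variables added: R-square = {r2:.4f}")
```

Because R-square cannot fall when a variable is added, the "best-fitting"
model in this run is the one stuffed with noise. That is why fit alone is
a poor criterion for choosing a model.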

The best models can only be a subset of “valid models” (that is, models
that have passed all the diagnostic tests for conformity with the
assumptions required by a regression). Furthermore, note that if the
model is shown to “not fit” the data, or the expected relationship between
variables is estimated as negligible, you still have valid results. The
difference between the hypothesis and the results is always important and
can give rise to a new perspective on the hypothesis.

The process of interpretation is called inferential analysis and is far
more important than the actual number crunching. Inferential analysis also
includes testing whether the data and model comply with the strong
assumptions underlying a regression model.

The veracity and validity of a model depend upon several diagnostic tests.
Unfortunately, many econometricians do not perform the diagnostic testing,
or simply lie about the inferences and conclusions derived from the model.

Statistical Analysis with Excel

Our book “Interpreting Regression Output” provides a summary table (a
cheat-sheet for you!) that lists the implications of the invalidity of
assumptions. (The book can be purchased at http://www.vjbooks.net).
This summary provides, in one page, what other books have spread out
over many chapters. Please use this table as a checklist before you
interpret any model. Most statistics professors and textbooks teach the
interpretation of regression results before discussing the issue of
validity. You will save yourself a lot of grief if you always perform
diagnostics after running a regression model.

Once you have a valid model, interpret the results in the logical sequence
shown in the table on interpreting regression output in our book
“Interpreting Regression Output.” This table provides a framework and
flowchart for interpretation, thereby enabling a structured and
comprehensive inferential analysis.

12.1.A ASSUMPTION 1: THE RELATIONSHIP BETWEEN ANY ONE
INDEPENDENT SERIES AND THE DEPENDENT SERIES CAN BE
CAPTURED BY A STRAIGHT LINE IN A 2-AXIS GRAPH

This is also called the assumption of linearity in the regression
coefficients. That is, none of the regression coefficients (the betas)
should have an exponential power or any other non-linear transformation.
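
As a rough illustration of what "linear in the coefficients" means, the
Python sketch below fits a model in which the X series is transformed
before it enters the regression, while the betas themselves remain
untransformed. The data, the names "experience" and "wage", the log
transformation, and the helper simple_ols are all assumptions made for
this example; none of them comes from the book's data files.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data generated exactly from wage = 3 + 2 * ln(experience).
experience = [1, 2, 4, 8, 16, 32]
wage = [3 + 2 * math.log(e) for e in experience]

# Transform the X series first. The model wage = b0 + b1 * ln(experience)
# is still linear in b0 and b1, so ordinary least squares applies, and a
# plot of wage against ln(experience) is a straight line.
log_experience = [math.log(e) for e in experience]
b0, b1 = simple_ols(log_experience, wage)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")

# By contrast, a model such as wage = b0 + experience ** b1 places a
# coefficient in an exponent and would violate this assumption.
```

The design point is that a transformation applied to a data series is
harmless here. Only a transformation applied to a beta breaks the
assumption.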

12.1.B ASSUMPTION 2: THE INDEPENDENT VARIABLES DO NOT

CHANGE IF THE SAMPLING IS REPLICATED

The independent variables are truly independent: the model uses the
variation across the X variables to explain the dependent series. The
regression attempts to explain the dependent series' variation across
the combinations of values of the independent variables.

If repeated samples are used, the model predicts the same value of the
dependent series for each combination of X values but, across the samples,
the observed Y may differ for the same combination of X values. (The gap
between the predicted and observed Y values is the residual or error.)
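
The idea of repeated sampling with unchanged X values can be sketched in
Python as follows. This is a simulation under stated assumptions, not a
procedure from the book: the data-generating rule Y = 10 + 3X + error, the
names sample_a and sample_b, and the helper simple_ols are all
hypothetical.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)  # fixed seed so the run is repeatable
x_values = [1, 2, 3, 4, 5]  # the X values are identical in every sample


def draw_sample():
    # Assumed data-generating process: Y = 10 + 3 * X + random error.
    return [10 + 3 * x + rng.gauss(0, 1) for x in x_values]


sample_a = draw_sample()
sample_b = draw_sample()

# Fit the line on the first sample; it yields one prediction per X value.
b0, b1 = simple_ols(x_values, sample_a)
predicted = [b0 + b1 * x for x in x_values]

# Residual = observed Y minus predicted Y, computed for each sample.
residuals_a = [y - p for y, p in zip(sample_a, predicted)]
residuals_b = [y - p for y, p in zip(sample_b, predicted)]

for x, p, ya, yb in zip(x_values, predicted, sample_a, sample_b):
    print(f"X={x}: predicted={p:.2f}, observed A={ya:.2f}, observed B={yb:.2f}")
```

For each X value the prediction is a single number, while the two observed
Y values differ. The residuals absorb that difference.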

12.1.C ASSUMPTION 3: THE SAMPLE SIZE MUST BE GREATER
THAN THE NUMBER OF INDEPENDENT VARIABLES (N
SHOULD BE GREATER THAN K-1)