Business Statistics, Can. ed. By Black, Chakrapani & Castillo

Chapter 14

Building Multiple Regression Models

Discrete Distributions

Prepared by Dr. Clarence S. Bayne

JMSB, Concordia University

Business Statistics, Can. Ed. © 2010 John Wiley & Sons Canada, Ltd.


Learning Objectives

• Analyze and interpret nonlinear variables in multiple regression analysis.

• Understand the role of qualitative variables and how to use them in multiple regression analysis.

• Build and evaluate multiple regression models.

• Explain what multicollinearity is and how to deal with it.


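Mathematical Transformations: Recoding Independent Variables to Create Non-linear Models

Description of Models and Equations

• First-order model with two independent variables: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$

• Second-order model with one independent variable: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \epsilon$

• Second-order model with an interaction term: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$

• Second-order model with two independent variables: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_2^2 + \beta_5 X_1 X_2 + \epsilon$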


A Curvilinear Scatter Plot of Sales Data for 13 Manufacturing Companies

[Figure: scatter plot of Sales against Number of Representatives]

Manufacturer   Sales ($1,000,000)   Number of Manufacturing Representatives
1              2.1                  2
2              3.6                  1
3              6.2                  2
4              10.4                 3
5              22.8                 4
6              35.6                 4
7              57.1                 5
8              83.5                 5
9              109.4                6
10             128.6                7
11             196.8                8
12             280.0                10
13             462.3                11


Excel Simple Linear Regression Output for the Manufacturing Example

Regression Statistics
Multiple R           0.933
R Square             0.870
Adjusted R Square    0.858
Standard Error       51.10
Observations         13

             Coefficients   Standard Error   t Stat   P-value
Intercept    -107.03        28.737           -3.72    0.003
numbers      41.026         4.779            8.58     0.000

ANOVA
             df   SS       MS       F       Significance F
Regression   1    192395   192395   73.69   0.000
Residual     11   28721    2611
Total        12   221117
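The simple linear fit reported in this output can be reproduced, as a minimal sketch, from the 13 data points of the manufacturing example. The helper name `fit_line` is illustrative only and is not part of the original slides; NumPy's least-squares solver stands in for Excel.

```python
import numpy as np

# Number of manufacturing representatives and sales ($ millions) for the
# 13 companies in the manufacturing example.
reps = np.array([2, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10, 11], dtype=float)
sales = np.array([2.1, 3.6, 6.2, 10.4, 22.8, 35.6, 57.1, 83.5,
                  109.4, 128.6, 196.8, 280.0, 462.3])


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    A = np.column_stack([np.ones_like(x), x])      # intercept column + x
    (b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - (b0 + b1 * x)
    sse = float(resid @ resid)                     # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    return float(b0), float(b1), 1.0 - sse / sst


b0, b1, r2 = fit_line(reps, sales)
print(f"sales = {b0:.2f} + {b1:.3f} * reps, R^2 = {r2:.3f}")
```

The intercept, slope and R Square should agree with the Excel output to the rounding shown there.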


Second-order Model with One Independent Variable: Manufacturing Sales Data (Table 14.2)


Scatter Plots of the Original Curvilinear Data and the More Linear Transformed Data: Figure 14.2


Computer Output for Quadratic Model to Predict Sales

Regression Statistics
Multiple R           0.986
R Square             0.973
Adjusted R Square    0.967
Standard Error       24.593
Observations         13

             Coefficients   Standard Error   t Stat   P-value
Intercept    18.067         24.673           0.73     0.481
MfgrRp       -15.723        9.5450           -1.65    0.131
MfgrRpSq     4.750          0.776            6.12     0.000

ANOVA
             df   SS       MS       F        Significance F
Regression   2    215069   107534   177.79   0.000
Residual     10   6048     605
Total        12   221117
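The quadratic model amounts to recoding the predictor: a squared column (MfgrRpSq) is added next to the original one (MfgrRp) and an ordinary multiple regression is run. The sketch below assumes NumPy in place of Excel and uses illustrative variable names.

```python
import numpy as np

reps = np.array([2, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10, 11], dtype=float)
sales = np.array([2.1, 3.6, 6.2, 10.4, 22.8, 35.6, 57.1, 83.5,
                  109.4, 128.6, 196.8, 280.0, 462.3])

# Design matrix: intercept, MfgrRp (x) and the recoded MfgrRpSq (x squared).
A = np.column_stack([np.ones_like(reps), reps, reps ** 2])
coef, *_ = np.linalg.lstsq(A, sales, rcond=None)

fitted = A @ coef
sse = float(((sales - fitted) ** 2).sum())
sst = float(((sales - sales.mean()) ** 2).sum())
r2 = 1.0 - sse / sst

# Adjusted R^2 penalises the extra term: n observations, p fitted coefficients.
n, p = A.shape
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)

print("coefficients:", np.round(coef, 3))
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Comparing these values with the simple linear fit shows the gain from the squared term: R Square rises from 0.870 to 0.973.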


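Tukey’s Ladder of Transformations: The Four-Quadrant Approach

• Upper-left quadrant: move toward $\log x$, $-1/\sqrt{x}$, ... or toward $y^2$, $y^3$, ...

• Lower-left quadrant: move toward $\log x$, $-1/\sqrt{x}$, ... or toward $\log y$, $-1/\sqrt{y}$, ...

• Upper-right quadrant: move toward $x^2$, $x^3$, ... or toward $y^2$, $y^3$, ...

• Lower-right quadrant: move toward $x^2$, $x^3$, ... or toward $\log y$, $-1/\sqrt{y}$, ...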
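As a rough illustration of moving along the ladder, the sketch below applies a few candidate transformations to the manufacturing sales data and compares how linear each relationship is, using the correlation coefficient as a simple yardstick. The choice of candidates and the use of correlation as the criterion are assumptions made for this example, not a procedure prescribed by the slides.

```python
import numpy as np

reps = np.array([2, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10, 11], dtype=float)
sales = np.array([2.1, 3.6, 6.2, 10.4, 22.8, 35.6, 57.1, 83.5,
                  109.4, 128.6, 196.8, 280.0, 462.3])


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# The sales curve rises ever more steeply, so candidate moves are up the
# ladder in x (x^2, x^3) or down the ladder in y (log y).
candidates = {
    "sales ~ reps": corr(reps, sales),
    "sales ~ reps^2": corr(reps ** 2, sales),
    "sales ~ reps^3": corr(reps ** 3, sales),
    "log(sales) ~ reps": corr(reps, np.log(sales)),
}

for name, value in candidates.items():
    print(f"{name:20s} r = {value:.3f}")
```

A transformation whose correlation is closer to 1 than that of the untransformed pair gives a more nearly linear scatter plot.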


Regression Models With Interactions

• Often in the real world of business and economics, interaction occurs between two variables

• One variable acts differently over one range of values of the second variable than it does over another range of values of that variable

• In a manufacturing plant, for example, humidity might affect the hardness of a material differently at different temperatures

• The ANOVA model in Chapter 11 addressed this problem by using an interaction variable as a blocking variable

• In regression analysis, interaction can be examined as a separate independent variable

• This is illustrated by the second-order model with two independent variables and an interaction term


Table 14.3 Share Prices of Three Stocks over a 15-Month Period

Stock 1   Stock 2   Stock 3
41        36        35
39        36        35
38        38        32
45        51        41
41        52        39
43        55        55
47        57        52
49        58        54
41        62        65
35        70        77
36        72        75
39        74        74
33        83        81
28        101       92
31        107       91

Problem Definition: The data represent the closing share prices of three corporations over a 15-month period. An investment firm wants to use the prices of stocks 2 and 3 to develop a regression model to predict the price of stock 1.


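Develop Model Using Step-by-Step Approach and Explore for Interaction

First-order with Two Independent Variables

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$

where: $Y$ = price of stock 1, $X_1$ = price of stock 2, $X_2$ = price of stock 3

Second-order with an Interaction Term

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$

or, writing the interaction as a third variable $X_3 = X_1 X_2$,

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$

where: $Y$ = price of stock 1, $X_1$ = price of stock 2, $X_2$ = price of stock 3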


Initial Regression First-order Model with Two Independent Variables

The regression equation is
Stock 1 = 50.9 - 0.119 Stock 2 - 0.071 Stock 3

Predictor   Coef      StDev    T       P
Constant    50.855    3.791    13.41   0.000
Stock 2     -0.1190   0.1931   -0.62   0.549
Stock 3     -0.0708   0.1990   -0.36   0.728

S = 4.570   R-Sq = 47.2%   R-Sq(adj) = 38.4%

Analysis of Variance

Source       DF   SS       MS       F      P
Regression   2    224.29   112.15   5.37   0.022
Error        12   250.64   20.89
Total        14   474.93


Excel Regression Second-order Model with Interaction Term for the Three Stocks

The regression equation is
Stock 1 = 12.0 + 0.879 Stock 2 + 0.220 Stock 3 - 0.00998 Inter

Predictor   Coef        StDev      T       P
Constant    12.046      9.312      1.29    0.222
Stock 2     0.8788      0.2619     3.36    0.006
Stock 3     0.2205      0.1435     1.54    0.153
Inter       -0.009985   0.002314   -4.31   0.001

S = 2.909   R-Sq = 80.4%   R-Sq(adj) = 75.1%

Analysis of Variance

Source       DF   SS       MS       F       P
Regression   3    381.85   127.28   15.04   0.000
Error        11   93.09    8.46
Total        14   474.93
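The two stock models can be compared side by side with a short sketch. It assumes NumPy rather than the package used for the slides, and the helper `ols` is an illustrative name. The interaction column is simply the product of the stock 2 and stock 3 prices.

```python
import numpy as np

# Closing prices from Table 14.3 (15 months).
stock1 = np.array([41, 39, 38, 45, 41, 43, 47, 49, 41, 35, 36, 39, 33, 28, 31],
                  dtype=float)
stock2 = np.array([36, 36, 38, 51, 52, 55, 57, 58, 62, 70, 72, 74, 83, 101, 107],
                  dtype=float)
stock3 = np.array([35, 35, 32, 41, 39, 55, 52, 54, 65, 77, 75, 74, 81, 92, 91],
                  dtype=float)


def ols(columns, y):
    """Fit y on an intercept plus the given columns; return (coef, R^2)."""
    A = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return coef, r2


# First-order model: stock 2 and stock 3 only.
coef_first, r2_first = ols([stock2, stock3], stock1)

# Second-order model: add the interaction term stock2 * stock3.
coef_inter, r2_inter = ols([stock2, stock3, stock2 * stock3], stock1)

print("first-order :", np.round(coef_first, 4), f"R^2 = {r2_first:.3f}")
print("interaction :", np.round(coef_inter, 6), f"R^2 = {r2_inter:.3f}")
```

In the interaction model the coefficients on stock 2 and stock 3 are positive and the interaction coefficient is negative, matching the signs in the predictor table.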


Response Surface for the Stock Example: Without and With Interaction


Regression Statistics from Two Excel Output Summaries With and Without Interaction

Summary Regression Statistics for Share Prices of Three Stocks

Summary Output       With No Interaction   With Interaction
Multiple R           0.687213365           0.89666084
R Square             0.47226221            0.804000661
Adjusted R Square    0.384305911           0.750546296
Standard Error       4.570195728           2.90902388
Observations         15                    15


Analysis and Conclusions

• By using the interaction term, the coefficient of determination ($R^2$) increases from 0.47 to 0.80

• The standard error decreases from 4.57 in the first model to 2.909 in the second

• The t ratios for the X1 term and the interaction term are statistically significant in the second model

• t = 3.36 with a p-value of 0.006 for X1, and t = -4.31 with a p-value of 0.001 for X1X2

• Inclusion of X1X2 helped the model account for a substantially greater amount of the variation in the dependent variable. It is a significant contributor to the model

• The second graph in Figure 14.6 shows how the interaction term bends the response surface to fit the data as stock 2 is increased

• Be cautious in interpreting the accuracy of the partial coefficients because of the high likelihood of multicollinearity


Model-Building: Search Procedures

• Search procedures are processes whereby more than one multiple regression model is developed for a given database, and the models are compared and sorted by different criteria, depending on the given procedure

• There are many search procedures. Among the most widely known are:
  – All Possible Regressions
  – Stepwise Regression
  – Forward Selection
  – Backward Elimination

• Which approach is best is subject to much debate and depends on the discipline and the philosophy of enquiry that the researcher brings to the research


All Possible Regressions

• The all possible regressions search procedure computes all possible linear multiple regression models from the data using all variables

• If a data set contains k independent variables, all possible regressions will determine $2^k - 1$ different models

• This produces all the different models with a single predictor, two predictors, three predictors, and so on up to all k predictors

• The next slide shows the predictor sets for all possible regressions with five independent variables

• If a research methodology and study design exist that identify all essential variables, the procedure enables the business researcher to examine every model

• Warning: this search through all possible models can be tedious, time consuming, inefficient, and perhaps overwhelming


All Possible Regressions with Five Independent Variables

Single Predictor: X1; X2; X3; X4; X5

Two Predictors: X1,X2; X1,X3; X1,X4; X1,X5; X2,X3; X2,X4; X2,X5; X3,X4; X3,X5; X4,X5

Three Predictors: X1,X2,X3; X1,X2,X4; X1,X2,X5; X1,X3,X4; X1,X3,X5; X1,X4,X5; X2,X3,X4; X2,X3,X5; X2,X4,X5; X3,X4,X5

Four Predictors: X1,X2,X3,X4; X1,X2,X3,X5; X1,X2,X4,X5; X1,X3,X4,X5; X2,X3,X4,X5

Five Predictors: X1,X2,X3,X4,X5
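The listing above can be generated mechanically. The following sketch enumerates every non-empty subset of five predictors with the standard library and counts the models of each size; the function name `all_subsets` is illustrative.

```python
from itertools import combinations


def all_subsets(predictors):
    """Return every non-empty combination of predictors, smallest first."""
    return [combo
            for size in range(1, len(predictors) + 1)
            for combo in combinations(predictors, size)]


predictors = ["X1", "X2", "X3", "X4", "X5"]
models = all_subsets(predictors)

# Count how many candidate models use 1, 2, ..., 5 predictors.
counts = {}
for model in models:
    counts[len(model)] = counts.get(len(model), 0) + 1

print(len(models), "models in total")
for size, count in sorted(counts.items()):
    print(f"{size} predictor(s): {count} models")
```

With five predictors this gives 5 + 10 + 10 + 5 + 1 = 31 models, that is, 2 to the power k minus 1 for k = 5, which is why the procedure becomes impractical as k grows.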


Stepwise Regression

• Stepwise regression is a step-by-step process that begins by developing a regression model with a single predictor variable and then adds and deletes predictors one step at a time

• It allows the researcher to examine the fit of the model at each step until no more significant predictors remain outside the model

• The process starts by choosing the single-predictor regression with the largest absolute t value (equivalently, the largest F value), provided it is significant at some predetermined alpha level

• If none of the independent variables meets this criterion, no model is recommended

• Other variables are then added to the equation one at a time, and each is tested for the significance of its contribution to explaining total variation, given the variables already in the model

• The procedure continues until all significant predictors are included

• Stepwise regression allows checks for multicollinearity and the dropping of variables that were included in earlier stages


Forward Selection

Like stepwise, except that variables are not reevaluated after entering the model


Backward Elimination

• Start with the “full model” (all k predictors)

• If all predictors are significant, stop

• Otherwise, eliminate the most nonsignificant predictor, and return to the previous step
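The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic (generated here, not taken from the slides), the function names are hypothetical, and "significant" is approximated by an absolute t value of at least 2.0 instead of an exact p-value.

```python
import numpy as np


def t_stats(X, y, cols):
    """t statistics of the slope coefficients for the model using cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]                 # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof       # error variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # coefficient covariance matrix
    t = coef / np.sqrt(np.diag(cov))
    return dict(zip(cols, t[1:]))             # skip the intercept


def backward_elimination(X, y, t_cut=2.0):
    """Start from the full model; drop the weakest predictor until all pass."""
    cols = list(range(X.shape[1]))
    while cols:
        t = t_stats(X, y, cols)
        weakest = min(cols, key=lambda c: abs(t[c]))
        if abs(t[weakest]) >= t_cut:          # every predictor is significant
            break
        cols.remove(weakest)                  # drop it and refit
    return cols


# Synthetic example: only columns 0 and 3 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 + 3.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

kept = backward_elimination(X, y)
print("predictors kept:", kept)
```

Because each pass refits the model, a predictor that looked weak in the full model is judged again only against the variables that remain.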


Stepwise Regression

• Perform k simple regressions, and select the best as the initial model

• Evaluate each variable not in the model
  – If none meets the criterion, stop
  – Otherwise, add the best variable to the model, reevaluate the previous variables, and drop any that are not significant

• Return to the previous step

• The criteria for including and excluding variables may be technical, based on common-sense observation, based on a body of theory, or based on the usefulness of newly discovered relationships as insights into meaning

• The researcher has to be keenly aware of the problem of spurious relationships when using these search procedures
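The add-then-reevaluate cycle can be sketched with partial F statistics. The thresholds of 4.0 mirror the F-to-Enter and F-to-Remove settings shown in the Minitab output of this chapter; everything else (function names, the synthetic data, the iteration cap) is an assumption made for illustration.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares of the model with an intercept plus cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def stepwise(X, y, f_enter=4.0, f_remove=4.0, max_steps=50):
    n, k = X.shape
    selected = []
    for _ in range(max_steps):
        # Entry step: find the outside variable with the largest partial F.
        sse_cur = sse(X, y, selected)
        best_f, best_j = 0.0, None
        for j in range(k):
            if j in selected:
                continue
            sse_new = sse(X, y, selected + [j])
            df = n - len(selected) - 2        # error df after adding j
            f = (sse_cur - sse_new) / (sse_new / df)
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None or best_f < f_enter:
            break                             # nothing significant to add
        selected.append(best_j)

        # Removal step: reevaluate earlier variables, drop any that fail.
        while selected:
            sse_full = sse(X, y, selected)
            df = n - len(selected) - 1
            partial = {j: (sse(X, y, [c for c in selected if c != j])
                           - sse_full) / (sse_full / df)
                       for j in selected}
            worst = min(partial, key=partial.get)
            if partial[worst] >= f_remove:
                break
            selected.remove(worst)
    return selected


# Synthetic example: only columns 0 and 3 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 + 3.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=100)

selected = stepwise(X, y)
print("order of entry:", selected)
```

Skipping the removal step turns this sketch into forward selection, since variables would then never be reevaluated after entering the model.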


Choosing the Variables for a Stepwise Regression: Predicting World Crude Oil Production Example

• Problem definition: predicting world crude oil production

• Choice of a method: many different types of prediction models can be constructed. The researcher adopts an econometric approach using multiple regression

• After a preliminary survey of the industry and the factors surrounding it, the researcher realizes that much of the world crude oil market is driven by variables related to usage and production in the U.S.

• The researcher identifies five independent variables as predictors:
  1. U.S. energy consumption
  2. Gross U.S. nuclear electricity generation
  3. U.S. coal production
  4. Total U.S. dry gas (natural gas) production
  5. Fuel rate of U.S.-owned automobiles


Systematic Framework Underlying Data Collection

• A survey of published and other data on energy production and usage suggests that world production of crude oil is driven by the previous years' activities in the U.S.

• It was expected that as U.S. energy consumption increased, so would world production of crude oil

• It seemed reasonable to introduce nuclear electricity generation, coal production, dry gas production and fuel rates to the study

• Rationale: their increased output may be expected to have a negative effect on crude oil production if energy consumption remained fixed

• Data on the five independent variables and the dependent variable (world crude oil production) were gathered and are presented on the next slide


Definition and Measurement of Variables: Data for Multiple Regression Model to Predict World Crude Oil Production

Y    World Crude Oil Production (millions of barrels per day)
X1   U.S. Energy Consumption (quadrillion BTUs per year)
X2   U.S. Nuclear Generation (billion kilowatt-hours)
X3   U.S. Coal Production (million short tons)
X4   U.S. Dry Gas Production (trillion cubic feet)
X5   U.S. Fuel Rate for Autos (miles per gallon)

Y      X1     X2      X3       X4     X5
55.7   74.3   83.5    598.6    21.7   13.30
55.7   72.5   114.0   610.0    20.7   13.42
52.8   70.5   172.5   654.6    19.2   13.52
57.3   74.4   191.1   684.9    19.1   13.53
59.7   76.3   250.9   697.2    19.2   13.80
60.2   78.1   276.4   670.2    19.1   14.04
62.7   78.9   255.2   781.1    19.7   14.41
59.6   76.0   251.1   829.7    19.4   15.46
56.1   74.0   272.7   823.8    19.2   15.94
53.5   70.8   282.8   838.1    17.8   16.65
53.3   70.5   293.7   782.1    16.1   17.14
54.5   74.1   327.6   895.9    17.5   17.83
54.0   74.0   383.7   883.6    16.5   18.20
56.2   74.3   414.0   890.3    16.1   18.27
56.7   76.9   455.3   918.8    16.6   19.20
58.7   80.2   527.0   950.3    17.1   19.87
59.9   81.3   529.4   980.7    17.3   20.31
60.6   81.3   576.9   1029.1   17.8   21.02
60.2   81.1   612.6   996.0    17.7   21.69
60.2   82.1   618.8   997.5    17.8   21.68
60.6   83.9   610.3   945.4    18.2   21.04
60.9   85.6   640.4   1033.5   18.9   21.48


Step 1: Stepwise Regression Results with One Predictor

Simple regressions using each independent variable in turn to predict oil production yield the initial regression equation

ŷ = 13.075 + 0.580x1, where y is world crude oil production and x1 is U.S. energy consumption. Note that the t value (11.77) in Table 14.8 is the highest of all the variables tried, and R-squared is 85.2%.


Excel Output of Regression for Crude Oil Production


Step 2: Stepwise Regression Results with Two Predictors

• x1 is retained in the model, and a search is conducted to determine which of the other variables, together with it, produces the highest significant t value (that is, adds most to explaining the variation in Y).

• The new equation emerging from the computer calculation is ŷ = 7.14 + 0.772x1 - 0.517x2, where x2 here is the U.S. fuel rate (X5 in the data table). The fuel rate has a t value of -3.75, and R-squared rises to 90.8%. Both predictors are highly significant.


Step 3: Regression Results with Three Predictors

• Step 3 continues the search for additional predictor variables

• Table 14.10 shows that none of the remaining variables makes a significant contribution when added to the regression obtained at step 2. Their t values are very small.


Minitab Stepwise Output

Stepwise Regression

F-to-Enter: 4.00   F-to-Remove: 4.00

Response is Crude Oil Production on 5 predictors, with N = 26

Step                      1        2
Constant             13.075    7.140

Energy Consumption    0.580    0.772
T-Value               11.77    11.91
P-Value               0.000    0.000

Fuel Rate                      -0.52
T-Value                        -3.75
P-Value                        0.001

S                      1.52     1.22
R-Sq                  85.24    90.83


Key Concerns

• The search procedures provide a framework for analysis and must be applied subject to common sense and an explanatory theory or analysis

• Avoid the mistake of using the strict sequential order in which variables enter a computer printout (in stepwise and forward selection) to rank the importance of the variables

• In multiple regression (unlike simple regression), the importance of an independent variable is ranked in terms of its net contribution to explaining Y when used with the other variables, not in terms of its individual correlation with Y

• Problems of multicollinearity require transformation or omission of variable(s) before or as the analysis proceeds. Adding a variable that is highly correlated with other independent variables is very problematic: it distorts the values of the coefficients and renders all tests unreliable

• An increase in R-squared is not in and of itself a good indicator of the importance of the last variable added

• Common sense and use value are the final arbiters in choosing the final model


Multicollinearity

• Condition that occurs when two or more of the independent variables of a multiple regression model are highly correlated

Effects of Multicollinearity

• Difficult, if not impossible, to interpret the estimates of the regression coefficients

• Inordinately small t values for the regression coefficients

• Standard deviations of regression coefficients are overestimated: t tests and the F test may have no meaning

• Algebraic sign of a predictor variable’s coefficient may be the opposite of what is expected

• In practice, correlations as high as 60 to 70 percent may be tolerated without causing a serious multicollinearity problem


Testing for Multicollinearity

Two techniques for determining the possible existence of multicollinearity:

• Prepare a correlation matrix of the independent variables using Excel or another software program, and identify those pairs of variables that have correlations in excess of 0.70

• The variance inflation factor (VIF): conduct a regression analysis to predict one independent variable from the other independent variables, so that the independent variable being predicted becomes the dependent variable. This is repeated for each independent variable, and the R-squared (coefficient of determination) $R_i^2$ of each regression is calculated.

$VIF = \frac{1}{1 - R_i^2}$

is the measure that determines whether the standard errors of the estimates are inflated.

• Some researchers follow the guideline that a VIF greater than 10, or an $R_i^2$ greater than 0.90 for the largest VIFs, indicates a severe multicollinearity problem
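The VIF calculation can be sketched in a few lines. The function names and the synthetic three-column data set are assumptions made for illustration; the formula itself is the one given above.

```python
import numpy as np


def vif_from_r2(r2):
    """Variance inflation factor for a given R_i^2."""
    return 1.0 / (1.0 - r2)


def vifs(X):
    """VIF of each column of X, regressing it on all the other columns."""
    n, k = X.shape
    out = []
    for i in range(k):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        out.append(vif_from_r2(r2))
    return out


# The guideline values: R^2 of 0.90 corresponds to a VIF of 10.
print(round(vif_from_r2(0.90), 2))
print(round(vif_from_r2(0.968), 2))

# Synthetic data: column c is almost a copy of column a, column b is unrelated.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
values = vifs(np.column_stack([a, b, c]))
print([round(v, 1) for v in values])
```

An $R_i^2$ of 0.968 gives a VIF of 31.25, which is the value reported as 31 for the crude oil example. In the synthetic data the two nearly identical columns receive large VIFs while the unrelated column stays close to 1.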


Correlations among Oil Production Predictor Variables

                     Energy Consumption   Nuclear   Coal     Dry Gas   Fuel Rate
Energy Consumption   1                    0.856     0.791    0.057     0.796
Nuclear              0.856                1         0.952    -0.404    0.972
Coal                 0.791                0.952     1        -0.448    0.968
Dry Gas              0.057                -0.404    -0.448   1         -0.423
Fuel Rate            0.796                0.972     0.968    -0.423    1
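The correlation-matrix check (flag pairs whose correlation exceeds 0.70) can be sketched as below. The matrix is rebuilt from the lower triangle of the table, since a correlation matrix is symmetric; the function name `flag_pairs` and the use of the absolute value are choices made for this example.

```python
import numpy as np

names = ["Energy Consumption", "Nuclear", "Coal", "Dry Gas", "Fuel Rate"]

# Lower triangle of the correlation table, row by row.
lower = [
    [1.0],
    [0.856, 1.0],
    [0.791, 0.952, 1.0],
    [0.057, -0.404, -0.448, 1.0],
    [0.796, 0.972, 0.968, -0.423, 1.0],
]

# Fill a symmetric matrix from the lower triangle.
corr = np.eye(len(names))
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        corr[i, j] = corr[j, i] = value


def flag_pairs(corr, names, threshold=0.70):
    """List the variable pairs whose absolute correlation exceeds threshold."""
    flagged = []
    for i in range(len(names)):
        for j in range(i):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[j], names[i], float(corr[i, j])))
    return flagged


flagged = flag_pairs(corr, names)
for first, second, r in flagged:
    print(f"{first} / {second}: r = {r:.3f}")
```

Every pair among energy consumption, nuclear, coal and fuel rate is flagged, while no pair involving dry gas is, which suggests that multicollinearity is a concern for most of the candidate predictors in the crude oil example.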


Problem of Interpretation When Multicollinearity Exists: World Crude Oil Production Regression

• The algebraic signs in a regression model must conform to common-sense observation or established theory

• Note the following three equations considered at different stages of the stepwise regression analysis

1. Ŷ = 44.869 + 0.7838(fuel rate). The positive fuel rate coefficient can be interpreted in terms of economic theory: price substitution effect.

2. Ŷ = 45.072 + 0.0157(coal). The positive coal coefficient is explainable in a complementary sense.

3. Ŷ = 45.806 + 0.0227(coal) - 0.3934(fuel rate). The negative fuel rate coefficient is opposite in sign to that in equation 1 and is contrary to what would normally be expected from economic theory or common-sense observation.

• The reason for the apparent contradiction in equation 3 can be attributed to multicollinearity between coal and fuel rate: $R^2$ = 0.968, or VIF = 31


Copyright Notice

Copyright © 2010 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information herein.