8/12/2019 Ch18 Multiple Regression

Multiple Regression
Chapter 17
Introduction
In this chapter we extend the simple linear regression model and allow for any number of independent variables. We expect to build a model that fits the data better than the simple linear regression model.
Introduction
(Scatter plot: Weight vs. Calories consumed)
We all believe that weight is affected by the amount of calories consumed. Yet the actual effect differs from one individual to another. Therefore, a simple linear relationship leaves much unexplained error.
Introduction
(Scatter plot: Weight vs. Calories consumed)
In an attempt to reduce the unexplained errors, we'll add a second explanatory (independent) variable.
Introduction
(Scatter plot: Weight vs. Calories consumed)
If we believe a person's height explains his/her weight too, we can add this variable to our model. The resulting multiple regression model is shown:

Weight = b0 + b1(Calories) + b2(Height) + e
Introduction
We shall use computer printout to
  Assess the model
    How well does it fit the data?
    Is it useful?
    Are any required conditions violated?
  Employ the model
    Interpreting the coefficients
    Making predictions using the prediction equation
    Estimating the expected value of the dependent variable
17.1 Model and Required Conditions
We allow k independent variables to potentially explain the dependent variable:

y = b0 + b1x1 + b2x2 + ... + bkxk + e

where y is the dependent variable, x1, ..., xk are the independent variables, b0, ..., bk are the coefficients, and e is the random error variable.
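As a minimal sketch of how the coefficients of such a model can be estimated by least squares (the data, true coefficients, and seed below are made-up illustrations, not from the chapter's examples):

```python
import numpy as np

# Simulate data from y = b0 + b1*x1 + b2*x2 + e with known coefficients
# (all numbers here are illustrative assumptions).
rng = np.random.default_rng(0)
n, k = 100, 2
X = rng.normal(size=(n, k))                  # two independent variables
e = rng.normal(scale=0.5, size=n)            # random error term
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + e

# Prepend a column of ones so the intercept b0 is estimated too.
A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)    # b = [b0, b1, b2]
print(b)                                     # estimates near (3.0, 1.5, -2.0)
```

In practice the computation is done by statistical software (Excel in this chapter), but the fitted coefficients come from exactly this kind of least-squares calculation.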
Model Assumptions - Required conditions for e
  The error e is normally distributed.
  The mean is equal to zero and the standard deviation is constant (s_e) for all values of y.
  The errors are independent.
17.2 Estimating the Coefficients and Assessing the Model
The procedure used to perform regression analysis:
  Obtain the model coefficients and statistics using statistical software.
  Diagnose violations of required conditions. Try to remedy problems when identified.
  Assess the model fit using statistics obtained from the sample.
  If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.
Estimating the Coefficients and Assessing the Model, Example
Example 1: Where to locate a new motor inn?
  La Quinta Motor Inns is planning an expansion.
  Management wishes to predict which sites are likely to be profitable.
  Several areas where predictors of profitability can be identified are:
    Competition
    Market awareness
    Demand generators
    Demographics
    Physical quality
Estimating the Coefficients and Assessing the Model, Example
(Diagram: Profitability, measured by operating margin, and its six potential predictors)
  x1 Rooms: number of hotel/motel rooms within 3 miles of the site (Competition)
  x2 Nearest: distance to the nearest La Quinta inn (Market awareness)
  x3 Office space (Customers)
  x4 College enrollment (Customers)
  x5 Income: median household income (Community)
  x6 Distance to downtown (Physical)
Estimating the Coefficients and Assessing the Model, Example
Data were collected from 100 randomly selected inns that belong to La Quinta, and the following model was run:

Margin = b0 + b1Rooms + b2Nearest + b3Office + b4College + b5Income + b6Disttwn + e

INN  MARGIN  ROOMS  NEAREST  OFFICE  COLLEGE  INCOME  DISTTWN
1    55.5    3203   4.2      549     8        37      2.7
2    33.8    2810   2.8      496     17.5     35      14.4
3    49      2890   2.4      254     20       35      2.6
4    31.9    3422   3.3      434     15.5     38      12.1
5    57.4    2687   0.9      678     15.5     42      6.9
6    49      3759   2.9      635     19       33      10.8
...
Regression Analysis, Excel Output - La Quinta

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.724611
R Square            0.525062
Adjusted R Square   0.49442
Standard Error      5.512084
Observations        100

ANOVA
            df   SS        MS        F         Significance F
Regression  6    3123.832  520.6387  17.13581  3.03E-13
Residual    93   2825.626  30.38307
Total       99   5949.458

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     38.13858      6.992948        5.453862  4.04E-07  24.25197   52.02518
Number        -0.00762      0.001255        -6.06871  2.77E-08  -0.01011   -0.00513
Nearest       1.646237      0.632837        2.601361  0.010803  0.389548   2.902926
Office Space  0.019766      0.00341         5.795594  9.24E-08  0.012993   0.026538
Enrollment    0.211783      0.133428        1.587246  0.115851  -0.05318   0.476744
Income        0.413122      0.139552        2.960337  0.003899  0.135999   0.690246
Distance      -0.22526      0.178709        -1.26048  0.210651  -0.58014   0.129622

This is the sample regression equation (sometimes called the prediction equation):

MARGIN = 38.14 - 0.0076ROOMS + 1.65NEAREST + 0.02OFFICE + 0.21COLLEGE + 0.41INCOME - 0.23DISTTWN
Model Assessment - Standard Error of Estimate
  A small value of s_e indicates (by definition) a small variation of the errors around their mean.
  Since the mean is zero, a small variation of the errors means the errors are close to zero.
  So we would prefer a model with a small standard deviation of the error rather than a large one.
  How can we determine whether the standard deviation of the error is small or large?
Model Assessment - Standard Error of Estimate
The standard deviation of the error is estimated by the Standard Error of Estimate s_e:

s_e = sqrt( SSE / (n - k - 1) )

The magnitude of s_e is judged by comparing it to the mean of y.
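A short sketch of this computation (the data are simulated with a true error standard deviation of 1; only the formula s_e = sqrt(SSE/(n-k-1)) comes from the slide):

```python
import numpy as np

# Simulated data: 2 predictors, error sd = 1 (illustrative values only).
rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([5.0, 1.0, -1.0]) + rng.normal(size=n)

# Fit by least squares, then estimate the error sd from the residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
SSE = float(np.sum((y - X @ b) ** 2))
s_e = (SSE / (n - k - 1)) ** 0.5   # standard error of estimate
print(round(s_e, 2))               # should be near 1, the true error sd
```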
Standard Error of Estimate
From the printout, s_e = 5.5121.
Calculating the mean value of y, we have y-bar = 45.739.
Model Assessment - Coefficient of Determination
  In our example it seems s_e is not particularly small, or is it?
  If s_e is small, the model fits the data well and is considered useful. The usefulness of the model is evaluated by the amount of variability in the y values explained by the model. This is measured by the coefficient of determination.
  The coefficient of determination is calculated by

  R^2 = SSR / SST = 1 - SSE / SST

  As you can see, SSE (and thus s_e) affects the value of R^2.
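For instance (the y values and fitted values below are hypothetical, chosen only to illustrate the formula):

```python
import numpy as np

# Hypothetical observations and their fitted values.
y     = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
y_hat = np.array([10.5, 11.5, 14.5, 11.5, 14.0])

SSE = np.sum((y - y_hat) ** 2)     # unexplained variation
SST = np.sum((y - y.mean()) ** 2)  # total variation
R2 = 1 - SSE / SST                 # equivalently SSR / SST
print(round(R2, 4))                # 0.9419
```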
Coefficient of Determination
From the printout, R^2 = 0.5251; that is, 52.51% of the variability in the margin values is explained by this model.
Testing the Validity of the Model
We pose the question: Is there at least one independent variable linearly related to the dependent variable?
To answer the question we test the hypothesis
  H0: b1 = b2 = ... = bk = 0
  H1: At least one bi is not equal to zero.
If at least one bi is not equal to zero, the model has some validity.
Testing the Validity of the Model
The total variation in y (SS(Total)) can be explained in part by the regression (SSR) while the rest remains unexplained (SSE):

SS(Total) = SSR + SSE, or
sum(y_i - y-bar)^2 = sum(y-hat_i - y-bar)^2 + sum(y_i - y-hat_i)^2

Note that if all the data points satisfy the linear equation without errors, y_i and y-hat_i coincide, and thus SSE = 0. In this case all the variation in y is explained by the regression (SS(Total) = SSR).
If errors exist only in small amounts, SSR will be close to SS(Total) and the ratio SSR/SSE will be large. This leads to the F ratio test presented next.
Testing for Significance
Define the Mean of the Sum of Squares-Regression (MSR) and the Mean of the Sum of Squares-Error (MSE):

MSR = SSR / k
MSE = SSE / (n - k - 1)

The ratio MSR/MSE is F-distributed:

F = MSR / MSE
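These definitions can be checked directly against the La Quinta ANOVA numbers from the printout shown earlier:

```python
# SSR, SSE, k and n are taken from the La Quinta regression output.
SSR, SSE = 3123.832, 2825.626
k, n = 6, 100

MSR = SSR / k             # mean square regression: 520.639
MSE = SSE / (n - k - 1)   # mean square error: 30.383
F = MSR / MSE
print(round(F, 2))        # 17.14, matching the F in the ANOVA table
```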
Testing for Significance
Rejection region: F > F(alpha, k, n-k-1)

Note: A large F results from a large SSR, which indicates that much of the variation in y is explained by the regression model; this is when the model is useful. Hence, the null hypothesis (which states that the model is not useful) should be rejected when F is sufficiently large. Therefore, the rejection region has the form F > F(alpha, k, n-k-1).
Testing the Model Validity of the La Quinta Inns Regression Model
The F ratio test is performed using the ANOVA portion of the regression output:

ANOVA
            df   SS        MS        F         Significance F
Regression  6    3123.832  520.6387  17.13581  3.03382E-13
Residual    93   2825.626  30.38307
Total       99   5949.458

Here k = 6, n - k - 1 = 93, and n - 1 = 99; SSR = 3123.832 and SSE = 2825.626, so MSR = SSR/k = 520.6387, MSE = SSE/(n-k-1) = 30.38307, and F = MSR/MSE = 17.13581.
From the ANOVA table: k = 6, n - k - 1 = 93, n - 1 = 99.
If alpha = .05, the critical value is F(0.05, 6, 93) = 2.17.
Since F = 17.14 > 2.17, reject H0.
Also, the p-value (Significance F) = 3.033x10^-13 < alpha; clearly the model is valid.
Interpreting the Coefficients
b0 = 38.14. This is the y-intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept.

Interpreting the coefficients b1 through bk: increasing x1 by one unit while holding the other variables constant changes y by b1:
  y  = b0 + b1x1 + b2x2 + ... + bkxk
  y' = b0 + b1(x1 + 1) + b2x2 + ... + bkxk = b0 + b1x1 + b2x2 + ... + bkxk + b1
Interpreting the Coefficients
b1 = -0.0076. In this model, for each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by .0076% (assuming the other variables are held constant).
Interpreting the Coefficients
b2 = 1.65. In this model, for each additional mile between a La Quinta inn and its nearest competitor, the average operating margin increases by 1.65% when the other variables are held constant.
b3 = 0.02. For each additional 1000 sq-ft of office space, the average operating margin increases by .02% when the other variables are held constant.
b4 = 0.21. For each additional thousand students, the average operating margin increases by .21% when the other variables remain constant.
Interpreting the Coefficients
b5 = 0.41. For each additional $1000 of median household income, the average operating margin increases by .41% when the other variables remain constant.
b6 = -0.23. For each additional mile to the downtown center, the average operating margin decreases by .23% when the other variables remain constant.
Testing the Coefficients
The hypothesis for each bi is
  H0: bi = 0
  H1: bi != 0

Test statistic: t = (b_i - beta_i) / s_{b_i},  d.f. = n - k - 1

Excel printout:
              Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept     38.13858      6.992948        5.453862  4.04E-07  24.25196697   52.02518
Number        -0.007618     0.00125527     -6.06871   2.77E-08  -0.010110585  -0.00513
Nearest       1.646237      0.63283691      2.601361  0.010803  0.389548431   2.902926
Office Space  0.019766      0.00341044      5.795594  9.24E-08  0.012993078   0.026538
Enrollment    0.211783      0.13342794      1.587246  0.115851  -0.053178488  0.476744
Income        0.413122      0.1395524       2.960337  0.003899  0.135998719   0.690246
Distance      -0.225258     0.17870889     -1.26048   0.210651  -0.580138524  0.129622

For example, a test for b1: t = (-.007618 - 0)/.001255 = -6.068. Suppose alpha = .01; the critical value is t(.005, 100-6-1) = 2.63. Since |t| > 2.63, there is sufficient evidence to reject H0 at the 1% significance level. Moreover, the p-value of the test is 2.77x10^-8; clearly H0 is strongly rejected. The number of rooms is linearly related to the margin.
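The t statistic above can be reproduced directly from the printout values (the coefficient and its standard error for the number of rooms):

```python
# Values taken from the Excel printout above.
b1, s_b1 = -0.007618, 0.00125527

t = (b1 - 0) / s_b1   # test statistic for H0: beta1 = 0
print(round(t, 2))    # -6.07, matching the printout's t Stat
```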
Testing the Coefficients
See next the interpretation of the p-value results.
Interpretation
Interpretation of the regression results for this model:
  The number of hotel and motel rooms, the distance to the nearest motel, the amount of office space, and the median household income are linearly related to the operating margin.
  Student enrollment and distance from downtown are not linearly related to the margin.
  Preferable locations have only a few other motels nearby, much office space, and affluent surrounding households.
Using the Regression Equation
The model can be used for making predictions by
  producing a prediction interval estimate of a particular value of y, for given values of xi;
  producing a confidence interval estimate for the expected value of y, for given values of xi.
The model can also be used to learn about the relationships between the independent variables xi and the dependent variable y, by interpreting the coefficients bi.
La Quinta Inns, Predictions
Predict the operating margin of an inn at a site with the following characteristics:
  3815 rooms within 3 miles,
  closest competitor 0.9 miles away,
  476,000 sq-ft of office space,
  24,500 college students,
  $35,000 median household income,
  11.2 miles to the downtown center.

MARGIN = 38.14 - 0.0076(3815) + 1.646(0.9) + 0.02(476) + 0.212(24.5) + 0.413(35) - 0.225(11.2) = 37.1%
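As a quick check, plugging these values into the rounded prediction equation gives roughly the same margin (the small discrepancy comes from coefficient rounding; Excel uses full precision):

```python
# Rounded coefficients from the printout, and the site's characteristics.
margin = (38.14 - 0.0076 * 3815 + 1.646 * 0.9 + 0.02 * 476
          + 0.212 * 24.5 + 0.413 * 35 - 0.225 * 11.2)
print(round(margin, 1))  # roughly 37.3; Excel's full-precision answer is 37.09
```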
La Quinta Inns, Predictions
Interval estimates by Excel (Data Analysis Plus):

Prediction Interval (Margin)
  Predicted value = 37.09149
  Lower limit = 25.39527
  Upper limit = 48.78771

Interval Estimate of Expected Value
  Lower limit = 32.96972
  Upper limit = 41.21326

It is predicted that the operating margin of this inn will lie between 25.4% and 48.8%, with 95% confidence.
It is expected that the average operating margin of all sites that fit this category falls between 33% and 41.2%, with 95% confidence.
The average inn would not be profitable (less than 50%).
18.2 Qualitative Independent Variables
  In many real-life situations one or more independent variables are qualitative.
  Including qualitative variables in a regression model is done via indicator variables.
  An indicator variable (I) can assume one of two values, zero or one:
    I = 1 if a first condition out of two is met; 0 if a second condition out of two is met.
  Examples:
    I = 1 if data were collected before 1980; 0 if data were collected after 1980.
    I = 1 if the temperature was below 50 degrees; 0 if the temperature was 50 degrees or more.
    I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance.
Qualitative Independent Variables; Example: Auction Car Price (II)
Example 2 (continued)
  Recall: a car dealer wants to predict the auction price of a car.
  The dealer now believes that both the odometer reading and the car's color are variables that affect a car's price.
  Three color categories are considered: white, silver, and other colors.
  Note: color is a qualitative variable.
Qualitative Independent Variables; Example: Auction Car Price (II)
Example 2 (continued)
  I1 = 1 if the color is white; 0 if the color is not white.
  I2 = 1 if the color is silver; 0 if the color is not silver.
  The category "other colors" is defined by I1 = 0 and I2 = 0.
How Many Indicator Variables?
  Note: to represent the situation of three possible colors we need only two indicator variables.
  Generally, to represent a nominal variable with m possible values, we must create m - 1 indicator variables.
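A tiny sketch of this coding scheme for the three-color example (the function name is ours, not from the text):

```python
def color_dummies(color):
    """Code a 3-level color variable with m - 1 = 2 indicator variables.

    Returns (I1, I2): I1 = 1 if white, I2 = 1 if silver;
    (0, 0) represents the "other colors" category.
    """
    return (1 if color == "white" else 0,
            1 if color == "silver" else 0)

print(color_dummies("white"))   # (1, 0)
print(color_dummies("silver"))  # (0, 1)
print(color_dummies("red"))     # (0, 0) -> the "other colors" category
```

Using a third dummy for "other colors" would make the design matrix collinear with the intercept column, which is why only m - 1 indicators are created.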
Qualitative Independent Variables; Example: Auction Car Price (II)
Solution
The proposed model is

y = b0 + b1(Odometer) + b2I1 + b3I2 + e

The data (enter them in Excel as usual):
Price  Odometer  I-1  I-2
14636  37388     1    0    (white)
14122  44758     1    0    (white)
14016  45833     0    0    (other color)
15590  30862     0    0    (other color)
15568  31705     0    1    (silver)
14718  34010     0    1    (silver)
...
Example: Auction Car Price (II) - The Regression Equation
(Scatter plot: Price vs. Odometer, with three parallel regression lines.)
From Excel we get the regression equation

PRICE = 16.837 - .0591(Odometer) + .0911(I-1) + .3304(I-2)

(The coefficients are in $1000s, with Odometer measured in 1000s of miles.)
For a white car:  Price = 16.837 - .0591(Odometer) + .0911(1) + .3304(0)
For a silver car: Price = 16.837 - .0591(Odometer) + .0911(0) + .3304(1)
For other colors: Price = 16.837 - .0591(Odometer) + .0911(0) + .3304(0)
Example: Auction Car Price (II) - The Regression Equation
Interpreting the equation PRICE = 16.837 - .0591(Odometer) + .0911(I-1) + .3304(I-2):
  For one additional mile, the auction price decreases by 5.91 cents on average.
  A white car sells, on average, for $91.10 more than a car of the "other colors" category.
  A silver car sells, on average, for $330.40 more than a car of the "other colors" category.
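For example, the first data row (a white car with 37,388 miles) can be checked against the equation; we read the equation as working in $1000s and 1000s of miles, which is our inference from the printout's intercept rather than an explicit statement in the slides:

```python
# White car, odometer 37.388 (thousand miles): I1 = 1, I2 = 0.
price = 16.837 - 0.0591 * 37.388 + 0.0911 * 1 + 0.3304 * 0
print(round(price, 3))  # about 14.718, i.e. $14,718; the observed price was $14,636
```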
Example: Auction Car Price (II) - The Regression Equation

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.837135
R Square            0.700794
Adjusted R Square   0.691444
Standard Error      0.304258
Observations        100

ANOVA
            df   SS         MS        F        Significance F
Regression  3    20.814919  6.938306  74.9498  4.65E-25
Residual    96   8.8869809  0.092573
Total       99   29.7019

           Coefficients  Standard Error  t Stat     P-value   Lower 95%  Upper 95%
Intercept  16.83725      0.1971054       85.42255   2.28E-92  16.446     17.2285
Odometer   -0.059123     0.0050653       -11.67219  4.04E-20  -0.069177  -0.049068
I-1        0.091131      0.0728916       1.250224   0.214257  -0.053558  0.235819
I-2        0.330368      0.0816498       4.046157   0.000105  0.168294   0.492442

There is insufficient evidence to infer that a white car and a car of another color sell for different auction prices.
There is sufficient evidence to infer that a silver car sells for a higher price than a car of the "other colors" category.
Qualitative Independent Variables; Example: MBA Program Admission (II)
Recall: the Dean wanted to evaluate applications for the MBA program by predicting future performance of the applicants. The following three predictors were suggested:
  Undergraduate GPA
  GMAT score
  Years of work experience
It is now believed that the type of undergraduate degree should be included in the model.
Note: the undergraduate degree is qualitative.
Qualitative Independent Variables; Example: MBA Program Admission (II)
  I1 = 1 if B.A.; 0 otherwise
  I2 = 1 if B.B.A.; 0 otherwise
  I3 = 1 if B.Sc. or B.Eng.; 0 otherwise
  The category "other group" is defined by I1 = 0, I2 = 0, I3 = 0.
Qualitative Independent Variables; Example: MBA Program Admission (II)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.746053
R Square            0.556595
Adjusted R Square   0.524151
Standard Error      0.729328
Observations        89

ANOVA
            df   SS        MS        F         Significance F
Regression  6    54.75184  9.125307  17.15544  9.59E-13
Residual    82   43.61738  0.531919
Total       88   98.36922

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  0.189814      1.406734        0.134932  0.892996  -2.60863   2.988258
UnderGPA   -0.00606      0.113968        -0.05317  0.957728  -0.23278   0.22066
GMAT       0.012793      0.001356        9.432831  9.92E-15  0.010095   0.015491
Work       0.098182      0.030323        3.237862  0.001739  0.03786    0.158504
I-1        -0.34499      0.223728        -1.54199  0.126928  -0.79005   0.100081
I-2        0.705725      0.240529        2.934058  0.004338  0.227237   1.184213
I-3        0.034805      0.209401        0.166211  0.8684    -0.38176   0.45137
Applications in Human Resources Management: Pay-Equity
Pay-equity can be handled in two different forms:
  Equal pay for equal work
  Equal pay for work of equal value
Regression analysis is extensively employed in cases of equal pay for equal work.
Human Resources Management: Pay-Equity
Example 3
Is there sex discrimination against female managers in a large firm? A random sample of 100 managers was selected and data were collected as follows:
  Annual salary
  Years of education
  Years of experience
  Gender
Human Resources Management: Pay-Equity
Solution: construct the following multiple regression model:

y = b0 + b1Education + b2Experience + b3Gender + e

Note the nature of the variables:
  Education: quantitative
  Experience: quantitative
  Gender: qualitative (Gender = 1 if male; 0 otherwise)
Human Resources Management: Pay-Equity
Solution continued (HumanResource)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.83256
R Square            0.693155
Adjusted R Square   0.683567
Standard Error      16273.96
Observations        100

ANOVA
            df   SS        MS        F         Significance F
Regression  3    5.74E+10  1.91E+10  72.28735  1.55E-24
Residual    96   2.54E+10  2.65E+08
Total       99   8.29E+10

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   -5835.1       16082.8         -0.36282  0.71754   -37759.2   26089.02
Education   2118.898      1018.486        2.08044   0.040149  97.21837   4140.578
Experience  4099.338      317.1936        12.92377  9.89E-23  3469.714   4728.963
Gender      1850.985      3703.07         0.499851  0.618323  -5499.56   9201.527

Analysis and interpretation:
  The model fits the data quite well and is very useful.
  Experience is a variable strongly related to salary.
  There is no evidence of sex discrimination.
Human Resources Management: Pay-Equity
Solution continued (HumanResource)
Further studying the data we find:
  Average experience (years) for women is 12; for men it is 17.
  Average salary for a female manager is $76,189; for a male manager it is $97,832.
Review problems