Lecture 6 Notes
• Note: I will e-mail homework 2 tonight. It will be due next Thursday.
• The Multiple Linear Regression model (Chapter 4.1)
• Inferences from multiple regression analysis (Chapter 4.2)
• In multiple regression analysis, we consider more than one independent variable \(x_1, \ldots, x_K\). We are interested in the conditional mean of \(y\) given \(x_1, \ldots, x_K\).
Automobile Example
• A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things:
(1) Which characteristics of the design are likely to affect mileage?
(2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage.
• The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.
Multivariate Correlations

              GP1000MHwy  Weight(lb)  Horsepower   Cargo    Seating
GP1000MHwy      1.0000      0.7097      0.6157     0.3405    0.2599
Weight(lb)      0.7097      1.0000      0.7509     0.1816    0.3499
Horsepower      0.6157      0.7509      1.0000    -0.0548   -0.0914
Cargo           0.3405      0.1816     -0.0548     1.0000    0.4894
Seating         0.2599      0.3499     -0.0914     0.4894    1.0000

7 rows not used due to missing values.

Scatterplot Matrix
[Scatterplot matrix of GP1000MHwy, Weight(lb), Horsepower, Cargo, and Seating]
Best Single Predictor
• To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate.
• If we use simple linear regression with each of the four independent variables, which provides the best predictions?
Best Single Predictor
• Answer: The simple linear regression that has the highest \(R^2\) gives the best predictions, because recall that
\[ R^2 = 1 - \frac{SSE}{SST} \]
• Weight gives the best predictions of GP1000MHwy based on simple linear regression.
• But we can obtain better predictions by using more than one of the independent variables.
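Since \(R^2\) in a simple linear regression equals the squared correlation between \(y\) and \(x\), the best single predictor can be read off the correlation matrix above. A quick illustrative check in Python, with the correlations copied from the JMP output:

```python
# Squared correlation with GP1000MHwy = R^2 of the simple regression.
# Values copied from the Multivariate Correlations table above.
correlations = {
    "Weight(lb)": 0.7097,
    "Horsepower": 0.6157,
    "Cargo": 0.3405,
    "Seating": 0.2599,
}

for name, r in correlations.items():
    print(f"{name}: R^2 = {r**2:.4f}")

# Weight(lb) has the highest R^2 (about 0.50), so it gives the best
# predictions among the four simple linear regressions.
```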
Multiple Linear Regression Model
• The model is
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i , \]
so that
\[ E(Y \mid x_1, \ldots, x_K) = \mu_{y \mid x_1, \ldots, x_K} = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K . \]
• Assumptions about the disturbances \(e_i\):
– The expected value of the disturbances is zero for each \(i\): \(E(e_i \mid x_{i1}, \ldots, x_{iK}) = 0\).
– The variance of each \(e_i\) is equal to \(\sigma_e^2\), i.e., \(\mathrm{Var}(e_i \mid x_{i1}, \ldots, x_{iK}) = \sigma_e^2\).
– The \(e_i\) are normally distributed.
– The \(e_i\) are independent.
Point Estimates for Multiple Linear Regression Model
• We use the same least squares procedure as for simple linear regression.
• Our estimates \(b_0, \ldots, b_K\) of \(\beta_0, \ldots, \beta_K\) are the coefficients that minimize the sum of squared prediction errors:
\[ (b_0, \ldots, b_K) = \arg\min_{b_0^*, \ldots, b_K^*} \sum_{i=1}^{n} \left( y_i - b_0^* - b_1^* x_{i1} - \cdots - b_K^* x_{iK} \right)^2 \]
The fitted equation is \(\hat{y} = b_0 + b_1 x_1 + \cdots + b_K x_K\).
• Least Squares in JMP: Click Analyze, Fit Model, put dependent variable into Y and add independent variables to the construct model effects box.
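The same least squares coefficients can be computed outside JMP with any linear algebra routine. A minimal sketch in Python; the column names mirror the car example, but the numbers are made-up placeholder data, not the car89.JMP values:

```python
import numpy as np

# Hypothetical stand-in data: y = GP1000MHwy; columns of X are
# weight, horsepower, cargo, seating. Illustrative values only.
X = np.array([
    [2500, 110, 14, 5],
    [3200, 160, 16, 5],
    [4000, 200, 18, 5],
    [2800, 130, 12, 4],
    [3600, 180, 20, 6],
    [2200, 100, 10, 4],
    [3000, 150, 15, 5],
    [3800, 190, 17, 6],
], dtype=float)
y = np.array([34.0, 40.0, 48.0, 36.0, 44.0, 31.0, 38.0, 46.0])

# Prepend a column of ones so b[0] is the intercept b_0.
X1 = np.column_stack([np.ones(len(y)), X])

# Solve min_b ||y - X1 b||^2, the least squares criterion above.
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("b_0, ..., b_K:", b)

# Predicted values and residuals, used again below.
y_hat = X1 @ b
residuals = y - y_hat
```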
Response GP1000MHwy
Whole Model

Actual by Predicted Plot
[Actual GP1000MHwy versus predicted values]

Summary of Fit
  RSquare                      0.589015
  RSquare Adj                  0.573208
  Root Mean Square Error       3.542778
  Mean of Response             37.33359
  Observations (or Sum Wgts)   109

Analysis of Variance
  Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
  Model       4        1870.7788       467.695   37.2627     <.0001
  Error     104        1305.3330        12.551
  C. Total  108        3176.1118

Parameter Estimates
  Term         Estimate    Std Error   t Ratio   Prob>|t|
  Intercept    19.100521   2.098478     9.10      <.0001
  Weight(lb)   0.0040877   0.001203     3.40      0.0010
  Horsepower   0.0426999   0.01567      2.73      0.0075
  Cargo        0.0533      0.013787     3.87      0.0002
  Seating      0.0268912   0.428283     0.06      0.9501

Residual by Predicted Plot
[GP1000MHwy residual versus GP1000MHwy predicted]
Root Mean Square Error
• Estimate of \(\sigma_e\):
\[ s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - (K+1)}} \]
• \(s_e\) = Root Mean Square Error in JMP.
• For simple linear regression of GP1000MHwy on Weight, \(s_e = 3.86\). For multiple linear regression of GP1000MHwy on weight, horsepower, cargo, seating, \(s_e = 3.54\).
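Computed by hand, the only subtlety is that the divisor is \(n-(K+1)\), not \(n\). A minimal sketch continuing the hypothetical fit above (the residuals here are made up):

```python
import numpy as np

def root_mean_square_error(residuals, K):
    """s_e = sqrt(SSE / (n - (K+1))), K = number of independent variables."""
    n = len(residuals)
    sse = np.sum(residuals ** 2)
    return np.sqrt(sse / (n - (K + 1)))

# Made-up residuals from a fit with K = 4 predictors:
residuals = np.array([1.2, -0.8, 2.1, -1.5, 0.3, -0.9, 1.0, -1.4])
print(root_mean_square_error(residuals, K=4))
```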
Residuals and Root Mean Square Errors
• The predicted value for observation \(i\) is
\[ \hat{E}(Y \mid X_1 = x_{i1}, \ldots, X_K = x_{iK}) = b_0 + b_1 x_{i1} + \cdots + b_K x_{iK} \]
• Residual for observation \(i\) = prediction error for observation \(i\) =
\[ y_i - \hat{E}(Y \mid X_1 = x_{i1}, \ldots, X_K = x_{iK}) = y_i - (b_0 + b_1 x_{i1} + \cdots + b_K x_{iK}) \]
• Root mean square error = typical size of the absolute value of the prediction error.
• As with the simple linear regression model, if the multiple linear regression model holds:
– About 95% of the observations will be within two RMSEs of their predicted value.
• For the car data, about 95% of the time, the actual GP1000M will be within 2*3.54 = 7.08 GP1000M of the predicted GP1000M of the car based on the car's weight, horsepower, cargo and seating.
Inferences about Regression Coefficients
• Confidence intervals: A \((1-\alpha)100\%\) confidence interval for \(\beta_k\) is
\[ b_k \pm t_{\alpha/2} s_{b_k} \]
Degrees of freedom for \(t\) equals \(n-(K+1)\). The standard error of \(b_k\), \(s_{b_k}\), is found on the JMP output.
• Hypothesis test:
\[ H_0 : \beta_k = \beta_k^* \quad \text{vs.} \quad H_a : \beta_k \neq \beta_k^* \]
with test statistic
\[ t = \frac{b_k - \beta_k^*}{s_{b_k}} \]
Decision rule for the test: Reject \(H_0\) if \(t \geq t_{\alpha/2}\) or \(t \leq -t_{\alpha/2}\). The p-value for testing \(H_0 : \beta_k = 0\) is printed in the JMP output under Prob>|t|.
Inference Examples
• Find a 95% confidence interval for \(\beta_{\mathrm{Horsepower}}\).
• Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.
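Both answers can be checked numerically against the Parameter Estimates table above. A minimal sketch in Python with scipy; the estimates and standard errors are copied from the JMP output, and the degrees of freedom are \(n-(K+1) = 109-5 = 104\):

```python
from scipy import stats

df = 109 - (4 + 1)                 # n - (K+1) = 104
t_crit = stats.t.ppf(0.975, df)    # t_{alpha/2} for a 95% CI

# 95% CI for beta_Horsepower: b_k +/- t_{alpha/2} * s_{b_k}
b_hp, se_hp = 0.0426999, 0.01567
print("CI:", (b_hp - t_crit * se_hp, b_hp + t_crit * se_hp))

# Test H0: beta_Seating = 0 at the 0.05 significance level.
b_seat, se_seat = 0.0268912, 0.428283
t_stat = b_seat / se_seat
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))
print("t =", t_stat, "p =", p_value)   # p is about 0.95, so do not reject H0
```

The computed p-value agrees with the 0.9501 shown under Prob>|t|, so seating adds essentially nothing once weight, horsepower and cargo are in the model.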
Partial Slopes vs. Marginal Slopes
• Multiple Linear Regression Model:
\[ \mu_{y \mid x_1, \ldots, x_K} = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K \]
• The coefficient \(\beta_k\) is a partial slope. It indicates the change in the mean of \(y\) that is associated with a one unit increase in \(x_k\) while holding all other variables \(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_K\) fixed.
• A marginal slope is obtained when we perform a simple regression with only one \(X\), ignoring all other variables. Consequently the other variables are not held fixed.
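The distinction is easiest to see on simulated data in which a lurking variable is correlated with the predictor of interest. A minimal sketch with entirely made-up data, built so that the marginal and partial slopes disagree in sign, just as in the examples that follow:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# x2 (think: temperature) is strongly negatively correlated with
# x1 (think: rainfall), so x2 is not held fixed when x1 varies.
x1 = rng.normal(10, 2, n)
x2 = 50 - 2.0 * x1 + rng.normal(0, 1, n)

# True model: y increases with BOTH x1 and x2.
y = 5 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)

# Marginal slope: simple regression of y on x1 alone.
marginal = np.polyfit(x1, y, 1)[0]

# Partial slope: multiple regression of y on x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])
partial = np.linalg.lstsq(X, y, rcond=None)[0][1]

print("marginal slope:", marginal)  # about 2 + 3*(-2) = -4: negative
print("partial slope:", partial)    # about +2: positive with x2 held fixed
```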
Simple Linear Regression

Bivariate Fit of GP1000MHwy By Seating
[Scatterplot of GP1000MHwy versus Seating with fitted line]

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|
  Intercept   30.829816   2.277905    13.53      <.0001
  Seating     1.3022488   0.442389     2.94      0.0040
Multiple Linear Regression

Response GP1000MHwy
Whole Model

Parameter Estimates
  Term         Estimate    Std Error   t Ratio   Prob>|t|
  Intercept    19.100521   2.098478     9.10      <.0001
  Weight(lb)   0.0040877   0.001203     3.40      0.0010
  Cargo        0.0533      0.013787     3.87      0.0002
  Seating      0.0268912   0.428283     0.06      0.9501
  Horsepower   0.0426999   0.01567      2.73      0.0075
Partial Slopes vs. Marginal Slopes Example
• In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years.
• Data is in rainfall.JMP.
Bivariate Fit of Yield By Total Spring Rainfall
[Scatterplot of Yield versus Total Spring Rainfall with fitted line]

Linear Fit
  Yield = 76.666667 - 1.6666667 Total Spring Rainfall

Summary of Fit
  RSquare                      0.027778
  RSquare Adj                  -0.13426
  Root Mean Square Error       13.94433
  Mean of Response             60
  Observations (or Sum Wgts)   8

Parameter Estimates
  Term                    Estimate    Std Error   t Ratio   Prob>|t|
  Intercept               76.666667   40.5546      1.89      0.1076
  Total Spring Rainfall   -1.666667   4.025382    -0.41      0.6932
Bivariate Fit of Average Spring Temperature By Total Spring Rainfall
[Scatterplot of Average Spring Temperature versus Total Spring Rainfall]
Higher rainfall is associated with lower temperature.
Multiple Linear Regression

Response Yield
Parameter Estimates
  Term                         Estimate    Std Error   t Ratio   Prob>|t|
  Intercept                    -144.7619   55.8499     -2.59      0.0487
  Total Spring Rainfall        5.7142857   2.680238     2.13      0.0862
  Average Spring Temperature   2.952381    0.692034     4.27      0.0080
Rainfall is estimated to be beneficial once temperature is held fixed.
Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.