Experimental Statistics - week 12

Page 1: Experimental Statistics           - week 12

Experimental Statistics - week 12

Chapter 11: Linear Regression and Correlation

Chapter 12: Multiple Regression

Page 2: Experimental Statistics           - week 12

April 5 -- Lab

Page 3: Experimental Statistics           - week 12

Analysis of Variance Approach

Mathematical Fact

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

SS(Total) = SS(Regression) + SS(Residuals)

p. 649

SS(Total) = Syy

SS(Regression) = SS “explained” by the model

SS(Residuals) = SS “unexplained” by the model
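The decomposition can be checked numerically. The sketch below is only an illustration: it uses the weight-loss data from the "Data – Page 572" slide of this deck and assumes the usual closed-form least-squares estimates for a single predictor (slope = Sxy/Sxx); the variable names are made up for the example.

```python
# Weight-loss data (Y = wtloss, X = exptime) from the "Data - Page 572" slide.
x = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares fit for one predictor (assumed, standard formulas).
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares in the decomposition.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((fi - y_bar) ** 2 for fi in y_hat)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(f"SS(Total)      = {ss_total:.5f}")
print(f"SS(Regression) = {ss_regression:.5f}")
print(f"SS(Residuals)  = {ss_residual:.5f}")
print(f"Reg + Res      = {ss_regression + ss_residual:.5f}")
```

The three values should agree with the Corrected Total, Model and Error sums of squares in the REG output shown later for the same data.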

Page 4: Experimental Statistics           - week 12

Plot of Production vs Cost

Page 5: Experimental Statistics           - week 12

SS(???)

Page 6: Experimental Statistics           - week 12

SS(???)

Page 7: Experimental Statistics           - week 12

SS(???)

Page 8: Experimental Statistics           - week 12

$R^2$

$R^2$ measures the proportion of the variability in Y that is explained by the regression on X

$$R^2 = \frac{SS(\text{Regression})}{SS(\text{Total})}$$

Page 9: Experimental Statistics           - week 12

Two data sets with the same Y values:

 Y    X (data set 1)    X (data set 2)
10         12                 7
15          8                 8
12          8                 8
20          7                12
 8         12                 4
17          4                12
14         15                11
24         11                15

$\bar{y} = 15$

$s_y = 5.3$

Page 10: Experimental Statistics           - week 12

The GLM Procedure
Dependent Variable: y

                        Sum of
Source            DF    Squares
Model              1     19.575
Error              6    174.425
Corrected Total    7    194.000

$$R^2 = \frac{19.575}{194.0} = .101$$

The GLM Procedure
Dependent Variable: y

                                    Sum of
Source                        DF    Squares
Model = SS(Reg)                1    170.492
Error = SS(Res)                6     23.508
Corrected Total = SS(Total)    7    194.000

$$R^2 = \frac{170.492}{194.0} = .879$$
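The two Model sums of squares can be reproduced by hand. The sketch below is an illustration under one assumption: the Y and X columns are my reading of the preceding data slide (the same eight Y values paired with the same X values in two different orders). It uses the identity SS(Regression) = Sxy²/Sxx for simple linear regression; `r_squared` is a made-up helper name.

```python
def r_squared(x, y):
    """Return (SS(Regression), SS(Total), R^2) for a simple linear regression."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # For one predictor, SS(Regression) = b1^2 * Sxx = Sxy^2 / Sxx.
    ss_regression = s_xy ** 2 / s_xx
    return ss_regression, s_yy, ss_regression / s_yy

# Assumed reading of the data slide: one Y column, two orderings of X.
y = [10, 15, 12, 20, 8, 17, 14, 24]
x_set1 = [12, 8, 8, 7, 12, 4, 15, 11]
x_set2 = [7, 8, 8, 12, 4, 12, 11, 15]

ss_reg1, ss_tot1, r2_set1 = r_squared(x_set1, y)
ss_reg2, ss_tot2, r2_set2 = r_squared(x_set2, y)

print(f"data set 1: SS(Reg) = {ss_reg1:.3f}, SS(Total) = {ss_tot1:.3f}, R^2 = {r2_set1:.3f}")
print(f"data set 2: SS(Reg) = {ss_reg2:.3f}, SS(Total) = {ss_tot2:.3f}, R^2 = {r2_set2:.3f}")
```

SS(Total) is the same in both cases because it depends only on Y; only the split between "explained" and "unexplained" changes.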

Page 11: Experimental Statistics           - week 12

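RECALL

Theoretical Model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where the $\varepsilon$'s (errors) are distributed $N(0, \sigma^2)$
- i.e. all the errors have the same variance $\sigma^2$

Regression line

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

$\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to minimize

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2$$

i.e. the $e_i = y_i - \hat{y}_i$ are the residuals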
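To make "chosen to minimize" concrete, the sketch below computes the least-squares estimates for the weight-loss data used later in the deck and checks that nudging either estimate increases the sum of squared residuals. The closed-form formulas (slope = Sxy/Sxx, intercept = ȳ − slope·x̄) are the standard ones and are assumed here; the function name `sse` and the step size 0.1 are arbitrary choices for the illustration.

```python
# Weight-loss data (Y = wtloss, X = exptime) from the "Data - Page 572" slide.
x = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

def sse(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Standard closed-form least-squares estimates (assumed, not from the slides).
b1_hat = s_xy / s_xx
b0_hat = y_bar - b1_hat * x_bar
best = sse(b0_hat, b1_hat)

# Every nearby line should have a larger sum of squared residuals.
perturbed = [
    sse(b0_hat + d0, b1_hat + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print(f"intercept = {b0_hat:.5f}, slope = {b1_hat:.5f}, SSE = {best:.5f}")
print("all perturbed lines worse:", all(p > best for p in perturbed))
```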

Page 12: Experimental Statistics           - week 12

Residual Analysis

Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate

Residual Plot: Plot of x vs residuals

Page 13: Experimental Statistics           - week 12

Page 14: Experimental Statistics           - week 12

Page 15: Experimental Statistics           - week 12

Study Time Data

PROC GLM;
  MODEL score=time;
  OUTPUT out=new r=resid;   /* save the residuals in data set new */
RUN;

PROC GPLOT;
  TITLE 'Plot of Residuals';
  PLOT resid*time;          /* residuals (vertical) vs time (horizontal) */
RUN;

Page 16: Experimental Statistics           - week 12

Average Height of Girls by Age

Page 17: Experimental Statistics           - week 12

Average Height of Girls by Age

Page 18: Experimental Statistics           - week 12

Residual Plot

Page 19: Experimental Statistics           - week 12

Residual Analysis

Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate

Residual Plot:
- plot of x vs residuals

Normality of Residuals:
- probability plot
- histogram

Page 20: Experimental Statistics           - week 12

Residuals from Car Dataset fit using √hp

Page 21: Experimental Statistics           - week 12

Residuals from Car Dataset fit using log(hp)

Page 22: Experimental Statistics           - week 12

Weight loss in a chemical compound as a function of how long it is exposed to air

Data – Page 572

Y = weight loss (wtloss)

X = exposure time (exptime)

  Y     X
 4.3    4
 5.5    5
 6.8    6
 8.0    7
 4.0    4
 5.2    5
 6.6    6
 7.5    7
 2.0    4
 4.0    5
 5.7    6
 6.5    7

Page 23: Experimental Statistics           - week 12

PROC REG;
  MODEL wtloss=exptime / r cli clm;
  OUTPUT out=new r=resid;
RUN;

The REG Procedure

Dependent Variable: wtloss

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917

Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701

Parameter Estimates

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1     -1.73333     1.16518      -1.49      0.1677
exptime       1      1.31667     0.20761       6.34      <.0001

Page 24: Experimental Statistics           - week 12

Plot of Residuals - MLR Model

The REG Procedure
Dependent Variable: wtloss

Output Statistics

       Dependent    Predicted       Std Error
Obs     Variable        Value    Mean Predict        95% CL Mean         95% CL Predict     Residual
  1       4.3000       3.5333          0.3884     2.6679    4.3987     1.5437    5.5229       0.7667
  2       5.5000       4.8500          0.2543     4.2835    5.4165     2.9710    6.7290       0.6500
  3       6.8000       6.1667          0.2543     5.6001    6.7332     4.2877    8.0456       0.6333
  4       8.0000       7.4833          0.3884     6.6179    8.3487     5.4937    9.4729       0.5167
  5       4.0000       3.5333          0.3884     2.6679    4.3987     1.5437    5.5229       0.4667
  6       5.2000       4.8500          0.2543     4.2835    5.4165     2.9710    6.7290       0.3500
  7       6.6000       6.1667          0.2543     5.6001    6.7332     4.2877    8.0456       0.4333
  8       7.5000       7.4833          0.3884     6.6179    8.3487     5.4937    9.4729       0.0167
  9       2.0000       3.5333          0.3884     2.6679    4.3987     1.5437    5.5229      -1.5333
 10       4.0000       4.8500          0.2543     4.2835    5.4165     2.9710    6.7290      -0.8500
 11       5.7000       6.1667          0.2543     5.6001    6.7332     4.2877    8.0456      -0.4667
 12       6.5000       7.4833          0.3884     6.6179    8.3487     5.4937    9.4729      -0.9833
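The "95% CL Mean" and "95% CL Predict" columns can be reproduced from the data. The sketch below is an illustration, not SAS's own code: it assumes the standard formulas for the standard error of the estimated mean, sqrt(MSE·(1/n + (x0 − x̄)²/Sxx)), and of a prediction, sqrt(MSE + SE(mean)²), and it hard-codes the t critical value 2.2281 (upper 2.5% point with 10 df, from a standard t table). `intervals` is a made-up helper name.

```python
import math

# Weight-loss data (Y = wtloss, X = exptime) from the "Data - Page 572" slide.
x = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# Assumption: t critical value for a 95% interval with n - 2 = 10 df (t table).
T_CRIT = 2.2281

def intervals(x0):
    """Fitted value, SE of the mean, 95% CL for the mean, 95% CL for a prediction."""
    fit = b0 + b1 * x0
    se_mean = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    se_pred = math.sqrt(mse + se_mean ** 2)  # extra MSE term: a new observation's own error
    cl_mean = (fit - T_CRIT * se_mean, fit + T_CRIT * se_mean)
    cl_pred = (fit - T_CRIT * se_pred, fit + T_CRIT * se_pred)
    return fit, se_mean, cl_mean, cl_pred

for x0 in (4, 5, 6, 7):
    fit, se_mean, cl_mean, cl_pred = intervals(x0)
    print(f"x = {x0}: fit = {fit:.4f}, SE mean = {se_mean:.4f}, "
          f"CL mean = ({cl_mean[0]:.4f}, {cl_mean[1]:.4f}), "
          f"CL predict = ({cl_pred[0]:.4f}, {cl_pred[1]:.4f})")
```

The prediction limits are always wider than the limits for the mean, and both are narrowest near x̄ = 5.5.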

Page 25: Experimental Statistics           - week 12

Page 26: Experimental Statistics           - week 12

The REG Procedure

Dependent Variable: wtloss

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917

Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701

Parameter Estimates

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1     -1.73333     1.16518      -1.49      0.1677
exptime       1      1.31667     0.20761       6.34      <.0001

???

For testing H0: $\beta_0 = 0$

For testing H0: $\beta_1 = 0$

Page 27: Experimental Statistics           - week 12

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917

Recall: SS(Regression) = “Model SS”

SS(Residual) = “Error SS”

Page 28: Experimental Statistics           - week 12

H0: there is no linear relationship between X and Y

H1: there is a linear relationship between X and Y

Reject H0 if $F > F_{\alpha}(1, n - 2)$

where

$$F = \frac{MS(\text{Regression})}{MS(\text{Residual})} = \frac{MS(\text{Regression})}{MSE}$$

Page 29: Experimental Statistics           - week 12

H0: there is no linear relationship between weight loss and exposure time

H1: there is a linear relationship between weight loss and exposure time
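As a rough check of this test for the weight-loss data, the sketch below recomputes F from the sums of squares in the REG output. The 5% level and the critical value 4.96 (upper 5% point of F with 1 and 10 df, from a standard F table) are assumptions; the slides do not state a significance level.

```python
# Sums of squares and sample size taken from the REG output for wtloss on exptime.
ss_regression = 26.00417   # "Model" sum of squares
ss_residual = 6.46500      # "Error" sum of squares
n = 12

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / 1
mse = ss_residual / (n - 2)

f_stat = ms_regression / mse

# Assumption: alpha = 0.05, so compare with the upper 5% point of F(1, 10).
F_CRIT_05 = 4.96
reject_h0 = f_stat > F_CRIT_05

print(f"MSE = {mse:.5f}, F = {f_stat:.2f}, reject H0 at the 5% level: {reject_h0}")
```

The computed F matches the "F Value" column of the output, and the very small "Pr > F" there leads to the same decision.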

Page 30: Experimental Statistics           - week 12

Note: In simple linear regression

H0: there is no linear relationship between X and Y

H1: there is a linear relationship between X and Y

and

H0: $\beta_1 = 0$

H1: $\beta_1 \neq 0$

are equivalent and $F = t^2$
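The relation F = t² can be seen numerically on the weight-loss data. The sketch below is an illustration that assumes the standard formula for the standard error of the slope, sqrt(MSE/Sxx), which is not stated on the slides; variable names are made up for the example.

```python
import math

# Weight-loss data (Y = wtloss, X = exptime) from the "Data - Page 572" slide.
x = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# F statistic from the analysis of variance.
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mse = ss_residual / (n - 2)
f_stat = ss_regression / mse

# t statistic for the slope (standard error formula assumed: sqrt(MSE / Sxx)).
se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1

print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_stat:.2f}")
```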

Page 31: Experimental Statistics           - week 12

Multiple Regression

Use of more than one independent variable to predict Y -- call these $x_1, x_2, \ldots, x_k$

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

Assumptions:

- the $\varepsilon$'s (errors) are distributed $N(0, \sigma^2)$
- i.e. all the errors have the same variance $\sigma^2$
- errors are independent

Page 32: Experimental Statistics           - week 12

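Data

$$\begin{array}{ccccc}
y_1 & x_{11} & x_{12} & \cdots & x_{1k} \\
y_2 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
y_n & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{array}$$

$x_{ij}$ = ith observation, jth independent variable

and so we have

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$$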

Page 33: Experimental Statistics           - week 12

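Goal:

Find “best” prediction equation of the form

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$$

As before:

$\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are chosen to minimize

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})]^2$$

where $e_i = y_i - \hat{y}_i$ is called the ith residual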

Page 34: Experimental Statistics           - week 12

Again: the solution involves calculus

-- solving the Normal Equations on page 627
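In practice the normal equations are solved numerically. The sketch below is one possible illustration: it assumes the usual matrix form (X'X)b = X'y (the textbook's page 627 may write them differently), uses the two-predictor weight-loss data from the "Data – Page 628" slide, and solves the 3-by-3 system with a small hand-written Gaussian elimination. The function name `solve` is made up for the example.

```python
# Weight-loss data: Y = wtloss, X1 = exptime, X2 = humidity ("Data - Page 628" slide).
x1 = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
x2 = [0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4]
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

# Design matrix rows: a leading 1 for the intercept, then the predictors.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
p = 3

# Normal equations in matrix form (assumed): (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]

def solve(a, b):
    """Solve a * sol = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = (m[r][size] - known) / m[r][r]
    return sol

b0, b1, b2 = solve(xtx, xty)
print(f"intercept = {b0:.5f}, exptime = {b1:.5f}, humidity = {b2:.5f}")
```

The solution should agree with the Parameter Estimates in the MLR output for these data.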

Page 35: Experimental Statistics           - week 12

Analysis of Variance

                         Sum of        Mean
Source         DF        Squares       Square                     F Value
Model          k         SS(Reg.)      MS(Reg.) = SS(Reg.)/k      MS(Reg.)/MSE
Error          n-k-1     SSE           MSE = SSE/(n-k-1)
Corr. Total    n-1       SS(Total)

Page 36: Experimental Statistics           - week 12

Multiple Regression Setting

H0: there is no linear relationship between Y and the independent variables

H1: there is a linear relationship between Y and the independent variables

Reject H0 if $F > F_{\alpha}(k, n - k - 1)$

where

$$F = \frac{MS(\text{Regression})}{MSE}$$

Page 37: Experimental Statistics           - week 12

$R^2$

- in MLR Setting has the same interpretation as before

$R^2$ measures the proportion of the variability in Y that is explained by the regression

$$R^2 = \frac{SS(\text{Regression})}{SS(\text{Total})}$$
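As a small worked example, the sketch below computes R² for the weight-loss data with one predictor (exptime) and with two (exptime and humidity), using the sums of squares printed in the two REG outputs in these slides. Nothing is refit here; the numbers are simply copied from those tables.

```python
# Sums of squares copied from the REG outputs for the weight-loss data.
ss_total = 32.46917          # Corrected Total (same Y in both fits)
ss_reg_simple = 26.00417     # Model SS with exptime only
ss_reg_multiple = 31.12417   # Model SS with exptime and humidity

# R^2 = SS(Regression) / SS(Total) in both settings.
r2_simple = ss_reg_simple / ss_total
r2_multiple = ss_reg_multiple / ss_total

print(f"R^2, exptime only:         {r2_simple:.4f}")
print(f"R^2, exptime and humidity: {r2_multiple:.4f}")
```

SS(Total) does not change when a predictor is added, so the increase in R² comes entirely from the larger SS(Regression).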

Page 38: Experimental Statistics           - week 12

Weight loss in a chemical compound as a function of exposure time and humidity

Data – Page 628

Y = weight loss (wtloss)

X1 = exposure time (exptime)

X2 = relative humidity (humidity)

  Y     X1    X2
 4.3    4     .2
 5.5    5     .2
 6.8    6     .2
 8.0    7     .2
 4.0    4     .3
 5.2    5     .3
 6.6    6     .3
 7.5    7     .3
 2.0    4     .4
 4.0    5     .4
 5.7    6     .4
 6.5    7     .4

Page 39: Experimental Statistics           - week 12

Chemical Weight Loss – MLR Output

The REG Procedure
Dependent Variable: wtloss

Number of Observations Read    12
Number of Observations Used    12

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               2     31.12417    15.56208     104.13    <.0001
Error               9      1.34500     0.14944
Corrected Total    11     32.46917

Root MSE           0.38658    R-Square    0.9586
Dependent Mean     5.50833    Adj R-Sq    0.9494
Coeff Var          7.01810

Parameter Estimates

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      0.66667     0.69423       0.96      0.3620
exptime       1      1.31667     0.09981      13.19      <.0001
humidity      1     -8.00000     1.36677      -5.85      0.0002

Page 40: Experimental Statistics           - week 12

H0: there is no linear relationship between weight loss and the variables exposure time and humidity

H1: there is a linear relationship between weight loss and the variables exposure time and humidity
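The overall F test for these hypotheses can be worked through from the MLR analysis of variance table. In the sketch below the sums of squares come from the REG output; the 5% level and the critical value 4.26 (upper 5% point of F with 2 and 9 df, from a standard F table) are assumptions, since the slides do not state a significance level.

```python
# Values from the MLR REG output (wtloss on exptime and humidity).
ss_regression = 31.12417   # "Model" sum of squares
ss_error = 1.34500         # "Error" sum of squares
n = 12                     # observations
k = 2                      # independent variables

# Mean squares use k and n - k - 1 degrees of freedom.
ms_regression = ss_regression / k
mse = ss_error / (n - k - 1)
f_stat = ms_regression / mse

# Assumption: alpha = 0.05, so compare with the upper 5% point of F(2, 9).
F_CRIT_05 = 4.26
reject_h0 = f_stat > F_CRIT_05

print(f"MS(Reg) = {ms_regression:.5f}, MSE = {mse:.5f}, F = {f_stat:.2f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```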

Page 41: Experimental Statistics           - week 12

Examining Contributions of Individual X variables

Use t-test for the X variable in question.

- this tests the effect of that particular independent variable while all other independent variables stay constant.

Parameter Estimates

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      0.66667     0.69423       0.96      0.3620
exptime       1      1.31667     0.09981      13.19      <.0001
humidity      1     -8.00000     1.36677      -5.85      0.0002
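Each t value in the table is the estimate divided by its standard error. The sketch below recomputes them from the printed estimates and, as an assumption, compares |t| with 2.262, the two-sided 5% critical value for 9 error degrees of freedom from a standard t table; the slides themselves rely on the "Pr > |t|" column instead.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (0.66667, 0.69423),
    "exptime": (1.31667, 0.09981),
    "humidity": (-8.00000, 1.36677),
}

# Assumption: two-sided test at the 5% level with n - k - 1 = 9 df (t table).
T_CRIT = 2.262

t_values = {name: est / se for name, (est, se) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name:<10} t = {t:6.2f}   |t| > {T_CRIT}: {significant[name]}")
```

Under these assumptions both exptime and humidity contribute significantly given the other variable, while the intercept is not significantly different from zero.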