1 Experimental Statistics - week 14 Multiple Regression – miscellaneous topics


Polynomial Regression:

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon \]

- we looked at this briefly in Lab

- basically a multiple regression where the independent variables are powers of a single independent variable

- use SAS to compute the independent variables x^2, x^3, … , x^p
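A minimal sketch of how the powers might be computed in a DATA step before fitting the polynomial with PROC REG. The data set name `poly`, the variable names, and the six data values are made up purely for illustration; they do not come from the lecture or the lab.

```sas
/* Hypothetical data set: names and values are illustrative only */
data poly;
  input x y;
  x2 = x**2;   /* squared term */
  x3 = x**3;   /* cubic term   */
  datalines;
1 2.1
2 3.9
3 9.2
4 15.8
5 24.9
6 36.3
;
run;

/* Cubic polynomial fitted as an ordinary multiple regression */
proc reg data=poly;
  model y = x x2 x3;
run;
quit;
```

Each power enters the MODEL statement as its own independent variable, which is why polynomial regression can be treated as a special case of multiple regression.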


Outlier Detection

- there are tests for outliers

- throwing away outliers should technically be done only when there is evidence that the values “do not belong”


Use of Dummy Variables in Regression

Example 6.1, Text pages 268-269

Does a drug retain its potency after 1 year of storage?
2 groups: 1) fresh product 2) product stored for 1 year
(n = 10 observations from each group -- independent samples)

Fresh   Stored
10.2    9.8
10.5    9.6
 .       .
 .       .
 .       .

Variable measured is potency reading

Question: How would you compare groups?

1-Factor ANOVA Model

\[ y_{ij} = \mu_i + \varepsilon_{ij} \]

where $\mu_F$ = mean of fresh product
      $\mu_S$ = mean of 1-year old product

We want to test:

\[ H_0: \mu_F = \mu_S \]
\[ H_1: \mu_F \neq \mu_S \]

data ott269;
  input type $ y;
  datalines;
F 10.2
F 10.5
F 10.3
F 10.8
F 9.8
 . . .
S 9.6
S 9.8
S 9.9
;
proc glm;
  class type;
  model y=type;
  means type/lsd;
  title 'ANOVA -- Potency Data - page 269 (t-test)';
run;

ANOVA -- Potency Data - page 269 (t-test)

The GLM Procedure

Class Level Information
Class   Levels   Values
type         2   F S

Dependent Variable: y

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1   1.45800000       1.45800000      17.95   0.0005
Error              18   1.46200000       0.08122222
Corrected Total    19   2.92000000

R-Square   Coeff Var   Root MSE   potency Mean
0.499315   2.821734    0.284995   10.10000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
type      1   1.45800000    1.45800000      17.95   0.0005

Source   DF   Type III SS   Mean Square   F Value   Pr > F
type      1   1.45800000    1.45800000      17.95   0.0005

Since p = .0005 we reject

\[ H_0: \mu_F = \mu_S \]

in favor of

\[ H_1: \mu_F \neq \mu_S \]

and conclude that storage time does make a difference.

t Tests (LSD) for y

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                          0.05
Error Degrees of Freedom       18
Error Mean Square              0.081222
Critical Value of t            2.10092
Least Significant Difference   0.2678

Means with the same letter are not significantly different.

t Grouping Mean N type

A 10.3700 10 F

B 9.8300 10 S

Fresh product has higher potency on average.
Also – estimated difference in means = 10.37 – 9.83 = .54
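As an arithmetic check on the SAS output above (a worked sketch, not part of the original slide), the reported Least Significant Difference can be reproduced from the critical t value, the Error Mean Square, and the n = 10 observations per group:

```latex
\[
\mathrm{LSD} = t_{.025,\,18}\sqrt{\frac{2\,\mathrm{MSE}}{n}}
             = 2.10092\sqrt{\frac{2(0.081222)}{10}}
             \approx 0.2678
\]
\[
|\bar{y}_F - \bar{y}_S| = |10.37 - 9.83| = 0.54 > 0.2678
\]
```

Because the observed difference exceeds the LSD, the two means receive different letters in the t Grouping.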

Regression analysis – requires the independent variables to be quantitative

Let’s consider recoding the group membership variable (i.e. F and S) into the numeric scores:

0 = fresh 1 = stored one year

and running a regression analysis with this new “dummy” variable as a “quantitative” independent variable - let’s call the “dummy” variable x.

Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$

data ott269;
  input x y;
  datalines;
0 10.2
0 10.5
0 10.3
0 10.8
0 9.8
 . . .
1 9.6
1 9.8
1 9.9
;
proc reg;
  model y=x;
  title 'Regression Analysis -- Potency Data - page 269';
run;

The REG Procedure

Dependent Variable: y
Number of Observations Read   20
Number of Observations Used   20

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1   1.45800          1.45800         17.95   0.0005
Error              18   1.46200          0.08122
Corrected Total    19   2.92000

Root MSE          0.28500    R-Square   0.4993
Dependent Mean   10.10000    Adj R-Sq   0.4715
Coeff Var         2.82173

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   10.37000             0.09012           115.06   <.0001
x            1   -0.54000             0.12745            -4.24   0.0005

Regression Equation: $\hat{y} = 10.37 - .54x$
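A short check (added here as a sketch, using only the numbers in the two outputs) of why PROC REG and PROC GLM report the same p-value of 0.0005: the t statistic for x is the estimate divided by its standard error, and its square equals the ANOVA F value up to rounding.

```latex
\[
t = \frac{-0.54000}{0.12745} \approx -4.24,
\qquad
t^2 \approx 17.95 = F
\]
```

The slope estimate of -0.54 is also the estimated difference in means (stored minus fresh), matching 9.83 - 10.37 from the LSD table.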

Note: the regression model

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

On the basis of this model:

\[ \mu_F = \beta_0 \]
\[ \mu_S = \beta_0 + \beta_1 \]
\[ \beta_1 = \mu_S - \mu_F \]


Dummy Variables with More than 2 Groups

Example: Balloon Data - 4 groups

Recall: Balloon Data

“Research Question”: Is the average time required to inflate the balloons the same for each color?

Col. 1-2 - observation number
Col. 3 - color (1=pink, 2=yellow, 3=orange, 4=blue)
Col. 4-7 - inflation time in seconds

 1122.4
 2324.6
 3120.3
 4419.8
 5324.3
 6222.2
 7228.5
 8225.7
 9320.2
10119.6
11228.8
12424.0
13417.1
14419.3
15324.2
16115.8
17218.3
18117.5
19418.7
20322.9
21116.3
22414.0
23416.6
24218.1
25218.9
26416.0
27220.1
28322.5
29316.0
30119.3
31115.9
32320.3

Analysis using 1-factor ANOVA Model with 4 Groups

GLM Procedure
ANOVA --- Balloon Data

Dependent Variable: time

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   126.1512500      42.0504167       3.85   0.0200
Error              28   305.6475000      10.9159821
Corrected Total    31   431.7987500

R-Square   Coeff Var   Root MSE   time Mean
0.292153   16.31069    3.303934   20.25625

Source   DF   Type I SS     Mean Square   F Value   Pr > F
color     3   126.1512500   42.0504167       3.85   0.0200

LSD Results

Grouping   Mean     N   color
A          22.575   8   2 (yellow)
A
A          21.875   8   3 (orange)

B          18.388   8   1 (pink)
B
B          18.188   8   4 (blue)

Dummy Variables

For 4 groups -- 3 dummy variables needed.

$x_1 = 1$ if obs. is in group 2, $x_1 = 0$ otherwise
$x_2 = 1$ if obs. is in group 3, $x_2 = 0$ otherwise
$x_3 = 1$ if obs. is in group 4, $x_3 = 0$ otherwise

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \]
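A sketch of how the three dummy variables could be generated in a DATA step rather than typed into the data file. The data set name `balloon` and the variable names `obs`, `color`, and `time` are assumptions made for this example; the 32 records are the balloon data from the lecture, rewritten in list (space-separated) form.

```sas
/* Assumed names: balloon, obs, color, time */
data balloon;
  input obs color time;
  x1 = (color = 2);   /* 1 if yellow, 0 otherwise */
  x2 = (color = 3);   /* 1 if orange, 0 otherwise */
  x3 = (color = 4);   /* 1 if blue,   0 otherwise */
  datalines;
1 1 22.4
2 3 24.6
3 1 20.3
4 4 19.8
5 3 24.3
6 2 22.2
7 2 28.5
8 2 25.7
9 3 20.2
10 1 19.6
11 2 28.8
12 4 24.0
13 4 17.1
14 4 19.3
15 3 24.2
16 1 15.8
17 2 18.3
18 1 17.5
19 4 18.7
20 3 22.9
21 1 16.3
22 4 14.0
23 4 16.6
24 2 18.1
25 2 18.9
26 4 16.0
27 2 20.1
28 3 22.5
29 3 16.0
30 1 19.3
31 1 15.9
32 3 20.3
;
run;

/* Group 1 (pink) is the reference level: all three dummies are 0 */
proc reg data=balloon;
  model time = x1 x2 x3;
run;
quit;
```

In SAS a comparison such as `(color = 2)` evaluates to 1 when true and 0 when false, so each assignment produces a 0/1 indicator directly.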

Balloon Data Set with Dummy Variables:

Col. 1-2 - observation number
Col. 3 - color (1=pink, 2=yellow, 3=orange, 4=blue)
Col. 5 - X1
Col. 6 - X2
Col. 7 - X3
Col. 9-12 - inflation time in seconds

 11 000 22.4
 23 010 24.6
 31 000 20.3
 44 001 19.8
 53 010 24.3
 62 100 22.2
 72 100 28.5
 82 100 25.7
 93 010 20.2
101 000 19.6
112 100 28.8
124 001 24.0
134 001 17.1
144 001 19.3
153 010 24.2
161 000 15.8
172 100 18.3
181 000 17.5
194 001 18.7
203 010 22.9
211 000 16.3
224 001 14.0
234 001 16.6
242 100 18.1
252 100 18.9
264 001 16.0
272 100 20.1
283 010 22.5
293 010 16.0
301 000 19.3
311 000 15.9
323 010 20.3

ANOVA --- Balloon Data using Dummy Variables
The REG Procedure

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   126.15125        42.05042         3.85   0.0200
Error              28   305.64750        10.91598
Corrected Total    31   431.79875

Root MSE          3.30393    R-Square   0.2922
Dependent Mean   20.25625    Adj R-Sq   0.2163
Coeff Var        16.31069

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   18.38750             1.16812            15.74   <.0001
x1           1    4.18750             1.65197             2.53   0.0171
x2           1    3.48750             1.65197             2.11   0.0438
x3           1   -0.20000             1.65197            -0.12   0.9045

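According to the Model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \]

\[ \mu_1 = \beta_0 \]
\[ \mu_2 = \beta_0 + \beta_1 \]
\[ \mu_3 = \beta_0 + \beta_2 \]
\[ \mu_4 = \beta_0 + \beta_3 \]

so that

\[ \beta_0 = \mu_1, \quad \beta_1 = \mu_2 - \mu_1, \quad \beta_2 = \mu_3 - \mu_1, \quad \beta_3 = \mu_4 - \mu_1 \]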
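These relationships can be checked against the output (a worked sketch using the group means from the LSD table and the PROC REG parameter estimates; the table rounds the pink and blue means to 18.388 and 18.188, so agreement is to rounding):

```latex
\[
\hat\beta_0 = 18.3875 \approx \bar{y}_1 = 18.388
\]
\[
\hat\beta_1 = 4.1875 \approx \bar{y}_2 - \bar{y}_1 = 22.575 - 18.388
\]
\[
\hat\beta_2 = 3.4875 \approx \bar{y}_3 - \bar{y}_1 = 21.875 - 18.388
\]
\[
\hat\beta_3 = -0.2000 \approx \bar{y}_4 - \bar{y}_1 = 18.188 - 18.388
\]
```

The intercept estimates the mean of the reference group (pink), and each dummy-variable coefficient estimates how far that group's mean lies from the pink mean.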


Problem 1

Recall: Mixed Model

Multiple Comparisons for Fixed Effect (Inspection Level)

-- Use MSAB in place of MSE

2 marginal means ($\bar{y}_1$ and $\bar{y}_2$) are declared to be significantly different (using LSD) if

\[ |\bar{y}_1 - \bar{y}_2| \ge t_{\alpha/2}(v)\sqrt{\frac{2\,MS_{AB}}{N}} \]

where
▪ N denotes the # of observations involved in the computation of a marginal mean
▪ v denotes the df associated with AB interaction

General Rule

When comparing means using a multiple comparison procedure (e.g. LSD, Bonferroni, etc.), use the MS used in the denominator of the associated F-test.

Note:

By default, SAS gives multiple comparison results using MSE.
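One way to make PROC GLM follow the general rule is the E= option of the MEANS statement, which names the model effect whose mean square should replace MSE in the comparisons. The sketch below applies it to the Ewe Study model from Problem 1, where group is tested against ewe(group); it assumes the ewe data set (not shown in these notes) is the most recently created data set.

```sas
proc glm;
  class group ewe week;
  model milk = group ewe(group) week group*week;
  random ewe(group) / test;
  /* E= tells MEANS to use MS(ewe(group)), not MSE, as the error term */
  means group / lsd e=ewe(group);
run;
```

With this option the LSD for group should be based on the 6 degrees of freedom of ewe(group) instead of the 30 error degrees of freedom.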

Ewe Data – problem 1

PROC GLM;
  class group ewe week;
  TITLE 'Ewe Study';
  model milk=group ewe(group) week group*week;
  random ewe(group)/test;
  means group week/lsd;
  output out=newe r=resmilk;
RUN;

The GLM Procedure
Dependent Variable: milk

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              23   388848.1481      16906.4412       8.54   <.0001
Error              30    59411.1111       1980.3704
Corrected Total    53   448259.2593

R-Square   Coeff Var   Root MSE   milk Mean
0.867463   21.57157    44.50135   206.2963

Source       DF   Type I SS     Mean Square   F Value   Pr > F
group         2   256803.7037   128401.8519     64.84   <.0001
ewe(group)    6   114788.8889    19131.4815      9.66   <.0001
week          5     2970.3704      594.0741      0.30   0.9090
group*week   10    14285.1852     1428.5185      0.72   0.6983

Ewe Study
The GLM Procedure

Source       Type III Expected Mean Square
group        Var(Error) + 6 Var(ewe(group)) + Q(group,group*week)
ewe(group)   Var(Error) + 6 Var(ewe(group))
week         Var(Error) + Q(week,group*week)
group*week   Var(Error) + Q(group*week)

Ewe Study
The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance

Dependent Variable: milk

Source    DF   Type III SS   Mean Square   F Value   Pr > F
* group    2   256804        128402           6.71   0.0295
Error      6   114789        19131

Error: MS(ewe(group))
* This test assumes one or more other fixed effects are zero.

Source              DF   Type III SS   Mean Square   F Value   Pr > F
ewe(group)           6   114789        19131            9.66   <.0001
* week               5   2970.370370   594.074074       0.30   0.9090
group*week          10   14285         1428.518519      0.72   0.6983
Error: MS(Error)    30   59411         1980.370370

* This test assumes one or more other fixed effects are zero.


t Tests (LSD) for Group Differences

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                          0.05
Error Degrees of Freedom       30
Error Mean Square              1980.37
Critical Value of t            2.04227
Least Significant Difference   30.295

Means with the same letter are not significantly different.

t Grouping Mean N group

A 291.67 18 1

B 204.44 18 2

C 122.78 18 3


Corrected:

“Error” Degrees of Freedom =

“Error” Mean Square =

Critical Value of t =

Least Significant Difference =
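One way to work out the corrected values, applying the general rule with MS(ewe(group)) as the error term for group. This is a sketch: the critical value 2.447 is taken from a standard t table (6 df, two-sided α = 0.05) rather than from SAS output, so the last digits are approximate.

```latex
\[
\text{``Error'' df} = 6, \qquad
\text{``Error'' MS} = MS(\text{ewe(group)}) = 19131.48, \qquad
t_{.025,\,6} \approx 2.447
\]
\[
\mathrm{LSD} \approx 2.447\sqrt{\frac{2(19131.48)}{18}} \approx 112.8
\]
\[
291.67 - 204.44 = 87.23 < 112.8, \qquad
204.44 - 122.78 = 81.66 < 112.8, \qquad
291.67 - 122.78 = 168.89 > 112.8
\]
```

Under this corrected LSD only groups 1 and 3 differ significantly, whereas the default output based on MSE (LSD = 30.295) declared all three groups different.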

Ewe Data interaction plot – Milk Production by Week

[Figure: plot with axis labels “residuals” and “milk”]

Why non-normal?


Ewe Data – Box Plots


Problem 2

Kidney Data

[Figure panels: Original Model; Log Model; Outliers Removed]

Kidney Data

Original Model: R² = .855
Log Model: R² = .866

Kidney Data

Original Model: R² = .855
Outliers Removed: R² = .871

Kidney Data

Log Model: R² = .866
Log Model – Outliers Removed: R² = .901


Problem 3

Survival Data

[Figure panels: Original Variables; Log Survival vs Other Original Variables]

Survival Data

[Figure panels: Original Variables; Survival vs Square of Independent Variables]

Dependent Variable: Log(Survival)

Number in Model   Adjusted R-Square   R-Square   Variables in Model
4                 0.7122              0.7229     clot prog enzyme liver
3                 0.7115              0.7196     clot prog enzyme
5                 0.7104              0.7239     clot prog enzyme liver age
4                 0.7098              0.7207     clot prog enzyme age
3                 0.6781              0.6871     prog enzyme liver
4                 0.6758              0.6879     prog enzyme liver age
2                 0.6412              0.6479     prog enzyme
3                 0.6388              0.6489     prog enzyme age
4                 0.5357              0.5531     clot enzyme liver age

Dependent Variable: Survival

Number in Model   Adjusted R-Square   R-Square   Variables in Model
4                 0.6999              0.7112     clot prog enzyme liver
5                 0.6970              0.7112     clot prog enzyme liver age
3                 0.6908              0.6995     clot prog enzyme
4                 0.6878              0.6995     clot prog enzyme age
3                 0.6520              0.6618     prog enzyme liver
4                 0.6487              0.6618     prog enzyme liver age
2                 0.5750              0.5829     prog enzyme
3                 0.5709              0.5829     prog enzyme age


Survival Data – Log(Survival) Model without Age

Grades
“Conditional” – under assumption of good performance on next Thursday’s lab

Final Exam
-- optional (scheduled for 8:00 AM – 11:00 AM Friday, May 6)
-- “in class” exam
-- will be averaged in equally with the other 2 exams to comprise 75% of grade
-- can raise or lower final grade

From Syllabus
GRADE COMPUTATION:
Exam Grades (75%)
Daily Assignments (25%)

Dummy Variables

We showed that 1-factor ANOVA can be run using regression analysis with dummy variables.

Question: What’s the benefit?

Answer: Dummy variables can be mixed in with regular quantitative variables to give a combination of regression and ANOVA analyses.

Dummy Variables for 4 Groups:

For 4 groups -- 3 dummy variables needed.

$x_1 = 1$ if obs. is in group 2, $x_1 = 0$ otherwise
$x_2 = 1$ if obs. is in group 3, $x_2 = 0$ otherwise
$x_3 = 1$ if obs. is in group 4, $x_3 = 0$ otherwise

x1, x2, x3
0, 0, 0 → group 1
1, 0, 0 → group 2
0, 1, 0 → group 3
0, 0, 1 → group 4

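Dummy Variables for 4 Groups:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \]

\[ \mu_1 = \beta_0 \]
\[ \mu_2 = \beta_0 + \beta_1 \]
\[ \mu_3 = \beta_0 + \beta_2 \]
\[ \mu_4 = \beta_0 + \beta_3 \]

so that

\[ \beta_0 = \mu_1, \quad \beta_1 = \mu_2 - \mu_1, \quad \beta_2 = \mu_3 - \mu_1, \quad \beta_3 = \mu_4 - \mu_1 \]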


Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   18.38750             1.16812            15.74   <.0001
x1           1    4.18750             1.65197             2.53   0.0171
x2           1    3.48750             1.65197             2.11   0.0438
x3           1   -0.20000             1.65197            -0.12   0.9045

We conclude $\mu_2 - \mu_1 \neq 0$, i.e. reject $H_0: \mu_1 = \mu_2$ (i.e. “pink” ≠ “yellow”)

We reject $H_0: \mu_1 = \mu_3$, i.e. conclude “pink” ≠ “orange”

We do not reject $H_0: \mu_1 = \mu_4$, i.e. we cannot conclude “pink” and “blue” are different
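As a check on where these t values come from (a sketch using MSE = 10.91598 from the PROC REG ANOVA table and n = 8 balloons per color), each dummy-variable coefficient is a difference of two group means, so all three share one standard error:

```latex
\[
SE(\hat\beta_j) = \sqrt{\mathrm{MSE}\left(\tfrac{1}{8} + \tfrac{1}{8}\right)}
               = \sqrt{\frac{10.91598}{4}} \approx 1.65197
\]
\[
t_1 = \frac{4.1875}{1.65197} \approx 2.53, \qquad
t_2 = \frac{3.4875}{1.65197} \approx 2.11, \qquad
t_3 = \frac{-0.2000}{1.65197} \approx -0.12
\]
```

Each test compares one color with pink only; comparisons such as yellow versus orange are not read directly from this output.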


Recall:

There was an issue with the order in which balloons were inflated
- lab assistant “improved”
- we tried to account for this by randomizing run order

[Figure: plot of inflation time vs. run order]

Another Strategy:

Put “run order” in the model.

PROC REG;
  MODEL y=x1 x2 x3 id;
  TITLE 'ANOVA --- Balloon Data using Dummy Variables and Run Order';
RUN;

Recall:

t-tests in a MLR model test the effects of individual independent variables while all other independent variables stay constant

- in this example, we can test for color effects while “adjusting for” or taking out the effect of run order

PROC REG ANOVA Table – Balloon Data

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4   249.57490        62.39373         9.24   <.0001
Error              27   182.22385         6.74903
Corrected Total    31   431.79875

Root MSE          2.59789    R-Square   0.5780
Dependent Mean   20.25625    Adj R-Sq   0.5155
Coeff Var        12.82513

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   21.85333             1.22494            17.84   <.0001
x1           1    4.05420             1.29932             3.12   0.0043
x2           1    3.75410             1.30044             2.89   0.0076
x3           1   -0.12002             1.29908            -0.09   0.9271
id           1   -0.21328             0.04987            -4.28   0.0002

We can see that:
- pink is still significantly different from yellow and orange and not significantly different from blue
- there is a significant “run order” effect

ANOVA --- Balloon Data with No Randomization
The GLM Procedure

Dependent Variable: time

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   157.9609375      52.6536458       5.37   0.0048
Error              28   274.4987500       9.8035268
Corrected Total    31   432.4596875

R-Square   Coeff Var   Root MSE   time Mean
0.365262   15.46440    3.131058   20.24688

Source   DF   Type I SS     Mean Square   F Value   Pr > F
color     3   157.9609375   52.6536458       5.37   0.0048

t Tests (LSD) for time

Alpha                          0.05
Error Degrees of Freedom       28
Error Mean Square              9.803527
Critical Value of t            2.04841
Least Significant Difference   3.2068

Means with the same letter are not significantly different.

t Grouping   Mean     N   color
     A       23.475   8   1
     A
B    A       21.088   8   2
B
B    C       18.625   8   4
     C
     C       17.800   8   3

Non-randomized Balloon Data

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   157.96094        52.65365         5.37   0.0048
Error              28   274.49875         9.80353
Corrected Total    31   432.45969

Root MSE          3.13106    R-Square   0.3653
Dependent Mean   20.24688    Adj R-Sq   0.2973
Coeff Var        15.46440

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   23.47500             1.10700            21.21   <.0001
x1           1   -2.38750             1.56553            -1.53   0.1385
x2           1   -5.67500             1.56553            -3.62   0.0011
x3           1   -4.85000             1.56553            -3.10   0.0044

Non-randomized Balloon Data

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4   158.41667        39.60417         3.90   0.0126
Error              27   274.04302        10.14974
Corrected Total    31   432.45969

Root MSE          3.18587    R-Square   0.3663
Dependent Mean   20.24688    Adj R-Sq   0.2724
Coeff Var        15.73510

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   23.70938             1.57865            15.02   <.0001
x1           1   -1.97083             2.53061            -0.78   0.4429
x2           1   -4.84167             4.24308            -1.14   0.2639
x3           1   -3.60000             6.11036            -0.59   0.5606
id           1   -0.05208             0.24579            -0.21   0.8338

Survival Data

DATA survival;
  INPUT clot prog enzyme liver age gender alch1 alch2 survival;
  /* Assumed step, not shown on the slide: natural log of survival, */
  /* needed because the second PROC REG uses lgsurv                 */
  lgsurv = log(survival);
  DATALINES;
6.7 62  81 2.59 50 0 1 0  695
5.1 59  66 1.70 39 0 0 0  403
7.4 57  83 2.16 55 0 0 0  710
6.5 73  41 2.01 48 0 0 0  349
7.8 65 115 4.30 45 0 0 1 2343
5.8 38  72 1.42 65 1 1 0  348
 . . .
;

Gender: 0=male, 1=female

Alcohol Use   alch1   alch2
None            0       0
Moderate        1       0
Heavy           0       1

PROC reg;
  MODEL survival=clot prog enzyme liver age/selection=adjrsq;
  output out=new r=ressurv p=predsurv;
RUN;

PROC reg;
  MODEL lgsurv=clot prog enzyme liver age/selection=adjrsq;
  output out=new r=ressvlg p=predsvlg;
RUN;

Adjusted R-Square Selection Method

Dependent Variable: survival

Number in Model   Adjusted R-Square   R-Square   Variables in Model
6                 0.7611              0.7745     clot prog enzyme liver alch1 alch2
5                 0.7606              0.7718     clot prog enzyme liver alch2
7                 0.7592              0.7749     clot prog enzyme liver age alch1 alch2
7                 0.7591              0.7748     clot prog enzyme liver gender alch1 alch2
6                 0.7587              0.7723     clot prog enzyme liver age alch2
6                 0.7587              0.7722     clot prog enzyme liver gender alch2
8                 0.7571              0.7753     clot prog enzyme liver age gender alch1 alch2
7                 0.7568              0.7727     clot prog enzyme liver age gender alch2
5                 0.7416              0.7536     clot prog enzyme alch1 alch2

Dependent Variable: log(survival)

Number in Model   Adjusted R-Square   R-Square   Variables in Model
6                 0.7649              0.7781     clot prog enzyme liver gender alch2
7                 0.7634              0.7789     clot prog enzyme liver gender alch1 alch2
5                 0.7628              0.7738     clot prog enzyme liver alch2
7                 0.7627              0.7782     clot prog enzyme liver age gender alch2
6                 0.7614              0.7747     clot prog enzyme liver alch1 alch2
8                 0.7612              0.7790     clot prog enzyme liver age gender alch1 alch2
6                 0.7605              0.7740     clot prog enzyme liver age alch2
7                 0.7591              0.7749     clot prog enzyme liver age alch1 alch2

Dependent Variable: lgsurv
Number of Observations Read   108
Number of Observations Used   108

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5   19.79454         3.95891         63.66   <.0001
Error             102    6.34335         0.06219
Corrected Total   107   26.13789

Root MSE         0.24938    R-Square   0.7573
Dependent Mean   6.36909    Adj R-Sq   0.7454
Coeff Var        3.91545

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   4.31842              0.12650            34.14   <.0001
prog         1   0.01180              0.00157             7.51   <.0001
enzyme       1   0.01216              0.00135             9.02   <.0001
liver        1   0.13055              0.03001             4.35   <.0001
gender       1   0.05179              0.04935             1.05   0.2964
alch2        1   0.32690              0.06095             5.36   <.0001
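For reference, the Adj R-Sq value used throughout the model-selection output can be reproduced from R-Square, the number of observations n, and the number of independent variables p. The sketch below uses the standard adjusted R-square formula with the values reported for this model (n = 108, p = 5):

```latex
\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.7573)\,\frac{107}{102}
          \approx 0.7454
\]
```

Unlike R-Square, this quantity can decrease when a variable that adds little is included, which is why it is used to rank models of different sizes.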

Alcohol Use

None: (0,0) mean survival = 640.5
Moderate: (1,0) mean survival = 608.4
Severe: (0,1) mean survival = 815.2