Group #4 AMS 572 – Data Analysis ANCOVA Professor: Wei Zhu 1/85


Page 1: Group  #4 AMS 572 – Data  Analysis

Group #4 AMS 572 – Data Analysis

ANCOVA

Professor: Wei Zhu

Page 2: Group  #4 AMS 572 – Data  Analysis

Team 4
• Lin Wang (Lana)
• Zhide Mo (Jeff)
• Juan E. Mojica
• Yuan Bian
• Hemal Khandwala
• Xiaochen Li (Joe)
• Ruofeng Wen
• Miao Zhang
• Xian Lin (Ben)
• Lei Lei

Page 3: Group  #4 AMS 572 – Data  Analysis

Team 4


Page 4: Group  #4 AMS 572 – Data  Analysis

Introduction to ANCOVA


Page 5: Group  #4 AMS 572 – Data  Analysis

What is ANCOVA?

ANCOVA = Analysis of Covariance

ANCOVA = a merge of ANOVA (Analysis of Variance) and Linear Regression

Page 6: Group  #4 AMS 572 – Data  Analysis

Development and Application of ANOVA


Page 7: Group  #4 AMS 572 – Data  Analysis

ANOVA
• Described by R. A. Fisher to assist in the analysis of data from agricultural experiments.
• Compares the means of any number of experimental conditions without any increase in the Type I error rate (a Type I error means H0 is rejected when it is true).
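Why not simply run a t-test for every pair of groups? A rough way to see the problem is the familywise error rate 1 − (1 − α)^k for k pairwise tests. The sketch below (the function name `familywise_error` is hypothetical) assumes the pairwise tests are independent, which they are not when they share a group, so the numbers are only an approximation of the inflation that a single ANOVA F test avoids.

```python
from math import comb


def familywise_error(alpha: float, n_groups: int) -> float:
    """Probability of at least one Type I error when every pair of the
    n_groups means is compared with its own test at level alpha.

    Treats the pairwise tests as independent, which they are not (tests
    that share a group are correlated), so this is only an approximation.
    """
    k = comb(n_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** k    # P(at least one false rejection)


for a in (2, 3, 5):
    print(a, "groups:", round(familywise_error(0.05, a), 4))
```

With three groups the rate is already about 14%, and with five groups about 40%, while one overall F test is run at the nominal 5%.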

Page 8: Group  #4 AMS 572 – Data  Analysis

ANOVA: a way of determining whether the average scores of groups differ significantly.

Psychology: ANOVA is used to assess the average effect of different experimental conditions on subjects in terms of a particular dependent variable.

Page 9: Group  #4 AMS 572 – Data  Analysis

Ronald Aylmer Fisher (Feb. 17, 1890 – July 29, 1962)

An English statistician, evolutionary biologist, and geneticist.

Contributions: Analysis of Variance (ANOVA), maximum likelihood, the F-distribution, etc.

Page 10: Group  #4 AMS 572 – Data  Analysis

Development and Application of Linear Regression

Page 11: Group  #4 AMS 572 – Data  Analysis

Linear Regression
• Developed and applied in areas different from those of ANOVA
• Developed in biology and psychology
• The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon

Page 12: Group  #4 AMS 572 – Data  Analysis

Regression toward the Mean

Francis Galton studied the heights of parents and their adult children.

Conclusion: the children of short parents are usually shorter than average, but still taller than their parents.

[Figure: short parents (5'6" and 5'4") have a child of 5'8", taller than both parents but below the average height of 5'9"]
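Galton's finding can be written as the regression prediction E[child] = mean + r·(mid-parent − mean) with |r| < 1: the prediction moves toward the mean. The sketch below is only an illustration; the population mean of 69 inches (5'9") and the correlation r = 0.5 are hypothetical values, not Galton's estimates, and it assumes parents and children have the same mean and standard deviation so that the slope equals r.

```python
def predicted_child_height(parent_mid: float, mean: float = 69.0, r: float = 0.5) -> float:
    """Predicted adult height of a child (inches) given the mid-parent height.

    Assumes parents and children share the same mean and standard deviation,
    so the regression slope equals the correlation r. The default mean
    (69 in = 5'9") and r = 0.5 are hypothetical illustration values.
    """
    return mean + r * (parent_mid - mean)


mid_parent = (66 + 64) / 2           # parents of 5'6" and 5'4" -> 65 inches
child = predicted_child_height(mid_parent)
print(child)                         # 67.0: taller than the parents, still below 69
```

Tall parents are pulled toward the mean in the same way: a 73-inch mid-parent gives a prediction of 71 inches.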

Page 13: Group  #4 AMS 572 – Data  Analysis

Regression is applied to data obtained from correlational or non-experimental research.

Regression analysis helps us understand how the value of the dependent variable changes when one independent variable is changed.

Page 14: Group  #4 AMS 572 – Data  Analysis

Francis Galton (Feb. 16, 1822 – Jan. 17, 1911): English anthropologist, eugenicist, and statistician.

Contributions:
• Widely promoted regression toward the mean
• Created the statistical concept of correlation
• A pioneer in eugenics; coined the term in 1883
• The first to apply statistical methods to the study of human differences

Page 15: Group  #4 AMS 572 – Data  Analysis

What is ANCOVA?
• A statistical technique that combines regression and ANOVA (analysis of variance).
• Originally developed by R. A. Fisher to increase the precision of experimental analysis.
• Applied most frequently in quasi-experimental research, which involves variables that cannot be controlled directly.

Page 16: Group  #4 AMS 572 – Data  Analysis


Page 17: Group  #4 AMS 572 – Data  Analysis

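One-Way Layout Experiment

Factor A has a levels (treatments 1, 2, …, a); treatment i has $n_i$ samples $y_{i1}, \ldots, y_{in_i}$.

Treatment | 1 | 2 | … | a
Samples | $y_{11}, \ldots, y_{1n_1}$ | $y_{21}, \ldots, y_{2n_2}$ | … | $y_{a1}, \ldots, y_{an_a}$
Sample Mean | $\bar y_1$ | $\bar y_2$ | … | $\bar y_a$
Sample SD | $s_1$ | $s_2$ | … | $s_a$

Balanced design, if $n_1 = n_2 = \cdots = n_a = n$.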

Page 18: Group  #4 AMS 572 – Data  Analysis

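• $Y_{ij} = \mu_i + \varepsilon_{ij}$, where the errors $\varepsilon_{ij} \sim N(0, \sigma^2)$ are independent
• $\mu_i = \mu + \alpha_i$, where $\mu$ is the grand mean
• $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, n_i$

This is a linear model to represent $Y_{ij}$.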

Page 19: Group  #4 AMS 572 – Data  Analysis

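ESTIMATORS

$\hat\mu = \bar y = \dfrac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}$ (grand mean), where $N = \sum_{i=1}^{a} n_i$

$\hat\mu_i = \bar y_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, $\quad \hat\alpha_i = \bar y_i - \bar y$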

Page 20: Group  #4 AMS 572 – Data  Analysis

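What is SSA?

Treatment | 1 | 2 | … | a
Sample Mean | $\bar y_1$ | $\bar y_2$ | … | $\bar y_a$
Sample SD | $s_1$ | $s_2$ | … | $s_a$
$n_i(\bar y_i - \bar y)^2$ | $n_1(\bar y_1 - \bar y)^2$ | $n_2(\bar y_2 - \bar y)^2$ | … | $n_a(\bar y_a - \bar y)^2$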

Page 21: Group  #4 AMS 572 – Data  Analysis

What is SSA?
• $SSA = \sum_{i=1}^{a} n_i(\bar y_i - \bar y)^2$, the factor A sum of squares
• $MSA = \dfrac{SSA}{a-1}$, the factor A mean square, with $a - 1$ d.f.

Page 22: Group  #4 AMS 572 – Data  Analysis

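What is SSE?

Treatment | 1 | 2 | … | a
Sample Mean | $\bar y_1$ | $\bar y_2$ | … | $\bar y_a$
Sample SD | $s_1$ | $s_2$ | … | $s_a$
$(y_{ij} - \bar y_i)^2$ | $(y_{1j} - \bar y_1)^2$ | $(y_{2j} - \bar y_2)^2$ | … | $(y_{aj} - \bar y_a)^2$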

Page 23: Group  #4 AMS 572 – Data  Analysis

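What is SSE?
• $SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar y_i)^2 = \sum_{i=1}^{a}(n_i - 1)s_i^2$, the error sum of squares
• $MSE = \dfrac{SSE}{N-a}$, the error mean square, with $N - a$ d.f., where $N = \sum_{i=1}^{a} n_i$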

Page 24: Group  #4 AMS 572 – Data  Analysis

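What is SST?

Treatment | 1 | 2 | … | a
Sample Mean | $\bar y_1$ | $\bar y_2$ | … | $\bar y_a$
Sample SD | $s_1$ | $s_2$ | … | $s_a$

$SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2$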

Page 25: Group  #4 AMS 572 – Data  Analysis

What is SST?
• $SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2$, the total sum of squares, with $N - 1$ d.f.
• ANOVA identity: $SST = SSA + SSE$

Page 26: Group  #4 AMS 572 – Data  Analysis

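ANOVA TABLE

Source of Variance | Sum of Squares | Degrees of Freedom | Mean Square | F
Treatments | SSA | $a - 1$ | $MSA = SSA/(a-1)$ | $F = MSA/MSE$
Error | SSE | $N - a$ | $MSE = SSE/(N-a)$ |
Total | SST | $N - 1$ | |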
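Every entry of the ANOVA table can be computed directly from the definitions of SSA, SSE and SST. The function `one_way_anova` and the three small groups below are hypothetical, chosen so the arithmetic is easy to check by hand; the sketch also confirms the ANOVA identity SST = SSA + SSE.

```python
def one_way_anova(groups):
    """One-way ANOVA table quantities for a (possibly unbalanced) layout.

    groups: list of lists; groups[i] holds the observations y_ij of treatment i.
    Returns a dict with SSA, SSE, SST, their degrees of freedom, MSA, MSE and F.
    """
    a = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    means = [sum(g) / len(g) for g in groups]                     # ybar_i
    grand = sum(sum(g) for g in groups) / N                       # ybar (grand mean)
    ssa = sum(ni * (m - grand) ** 2 for ni, m in zip(n, means))   # between treatments
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)  # within treatments
    sst = sum((y - grand) ** 2 for g in groups for y in g)        # total
    msa, mse = ssa / (a - 1), sse / (N - a)
    return {"SSA": ssa, "SSE": sse, "SST": sst,
            "df": (a - 1, N - a, N - 1), "MSA": msa, "MSE": mse, "F": msa / mse}


table = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(table)
```

Here SSA = 54 and SSE = 6 add up to SST = 60, and F = 27 on (2, 6) d.f.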

Page 27: Group  #4 AMS 572 – Data  Analysis

Theoretical Background

Page 28: Group  #4 AMS 572 – Data  Analysis

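Model of ANOVA

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

• $Y_{ij}$: data, the jth observation of the ith group
• $\mu$: grand mean of Y
• $\alpha_i$: effect of the ith group (we focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)
• $\varepsilon_{ij}$: error, $N(0, \sigma^2)$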

Page 29: Group  #4 AMS 572 – Data  Analysis

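Model of Linear Regression

$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \varepsilon_{ij}$

• $Y_{ij}$: data, the (ij)th observation
• $X_{ij}$: predictor
• $\beta_0$, $\beta_1$: intercept and slope (we focus on their estimates)
• $\varepsilon_{ij}$: error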

Page 30: Group  #4 AMS 572 – Data  Analysis

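ANCOVA is ANOVA merged with Linear Regression

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar X_{..}) + \varepsilon_{ij}$

• $X_{ij}$: known covariate, with grand mean $\bar X_{..}$ (what is this guy doing here?)
• $\alpha_i$: effect of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)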

Page 31: Group  #4 AMS 572 – Data  Analysis

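How to perform ANCOVA

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar X_{..}) + \varepsilon_{ij}$

⇓

$Y_{ij}(\text{adjust}) = Y_{ij} - \beta(X_{ij} - \bar X_{..}) = \mu + \alpha_i + \varepsilon_{ij}$

(This is just the ANOVA model!)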

Page 32: Group  #4 AMS 572 – Data  Analysis

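$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar X_{..}) + \varepsilon_{ij}$

$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \varepsilon_{ij}$

Within each group, consider $\alpha_i$ a constant: the ANCOVA model is then a simple linear regression with intercept $\beta_0 = \mu + \alpha_i - \beta\bar X_{..}$ and slope $\beta_1 = \beta$. Notice that we actually only need the estimate of the slope β, not of the intercept.

How do we get $\hat\beta$, then?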

Page 33: Group  #4 AMS 572 – Data  Analysis

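How do we get $\hat\beta$, then? (2)

• Within each group, do least squares:
$\hat\beta_i = \dfrac{\sum_j (X_{ij} - \bar X_{i.})(Y_{ij} - \bar Y_{i.})}{\sum_j (X_{ij} - \bar X_{i.})^2}$

• Assume that $\beta_1 = \cdots = \beta_a = \beta$ (a common slope in all groups)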

Page 34: Group  #4 AMS 572 – Data  Analysis

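How do we get $\hat\beta$, then? (3)

• We use the pooled estimate of β:
$\hat\beta = \dfrac{\sum_i \hat\beta_i \sum_j (X_{ij} - \bar X_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar X_{i.})^2} = \dfrac{\sum_i \sum_j (X_{ij} - \bar X_{i.})(Y_{ij} - \bar Y_{i.})}{\sum_i \sum_j (X_{ij} - \bar X_{i.})^2}$

where $\hat\beta_i = \dfrac{\sum_j (X_{ij} - \bar X_{i.})(Y_{ij} - \bar Y_{i.})}{\sum_j (X_{ij} - \bar X_{i.})^2}$ is the least-squares slope within group i.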
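The pooled estimate is simply the ratio of the summed within-group cross-products to the summed within-group $\sum_j (X_{ij} - \bar X_{i.})^2$, which is the same as weighting each group's slope by the spread of its covariate. A minimal sketch, with a hypothetical function name and made-up data:

```python
def pooled_slope(xs, ys):
    """Within-group least-squares slopes and their pooled estimate.

    xs, ys: lists of lists, one inner list per group (covariate X_ij, response Y_ij).
    Returns (beta_i for each group, pooled beta_hat).
    """
    slopes, sxx_total, sxy_total = [], 0.0, 0.0
    for x, y in zip(xs, ys):
        xbar, ybar = sum(x) / len(x), sum(y) / len(y)
        sxx = sum((xi - xbar) ** 2 for xi in x)                        # sum_j (X_ij - Xbar_i.)^2
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # cross-products
        slopes.append(sxy / sxx)   # beta_i-hat for group i
        sxx_total += sxx
        sxy_total += sxy
    # Pooled estimate: summed cross-products over summed Sxx,
    # i.e. the Sxx-weighted average of the per-group slopes.
    return slopes, sxy_total / sxx_total


slopes, beta_hat = pooled_slope([[1, 2, 3], [1, 2, 3, 4, 5]],   # hypothetical X
                                [[2, 4, 6], [1, 2, 3, 4, 5]])   # hypothetical Y
print(slopes, beta_hat)
```

The group slopes are 2 and 1, but the pooled slope is about 1.17 rather than 1.5 because the second group's covariate is more spread out and therefore carries more weight.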

Page 35: Group  #4 AMS 572 – Data  Analysis

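ANCOVA begins: $Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar X_{..}) + \varepsilon_{ij}$

1. In each group, find the slope estimate via linear regression: $\hat\beta_i = \dfrac{\sum_j (X_{ij} - \bar X_{i.})(Y_{ij} - \bar Y_{i.})}{\sum_j (X_{ij} - \bar X_{i.})^2}$
2. Pool them together: $\hat\beta = \dfrac{\sum_i \hat\beta_i \sum_j (X_{ij} - \bar X_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar X_{i.})^2}$
3. Get rid of the covariate: $\tilde Y_{ij}(\text{adjust}) = Y_{ij} - \hat\beta(X_{ij} - \bar X_{..})$
4. Do ANOVA on the model $\tilde Y_{ij}(\text{adjust}) = \mu + \alpha_i + \varepsilon_{ij}$
5. Go home and have dinner (a yummy cheeseburger and an iced Coke?)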
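The steps above can be sketched in Python; `_centered_sums` and `ancova` are hypothetical helper names and the data are made up. One point goes beyond the slides and comes from standard ANCOVA theory: because β is estimated from the data, the error sum of squares has N − a − 1 degrees of freedom, and the exact F statistic for the group effect compares the model with group effects against a single common regression line. For a model containing the group and the covariate (no interaction), this should agree with the Type III F for the group in SAS PROC GLM.

```python
def _centered_sums(x, y):
    """Return (Sxx, Syy, Sxy) about the sample means of x and y."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((u - xbar) ** 2 for u in x)
    syy = sum((v - ybar) ** 2 for v in y)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    return sxx, syy, sxy


def ancova(xs, ys):
    """One-way ANCOVA with a single covariate and a common slope.

    xs, ys: lists of lists, one inner list per group.
    Returns (pooled slope, adjusted group means, F statistic, (df1, df2)).
    """
    a = len(xs)
    N = sum(len(x) for x in xs)
    # Steps 1-2: pooled within-group sums give the common slope beta-hat.
    within = [_centered_sums(x, y) for x, y in zip(xs, ys)]
    sxx_w = sum(s[0] for s in within)
    syy_w = sum(s[1] for s in within)
    sxy_w = sum(s[2] for s in within)
    beta = sxy_w / sxx_w
    # Residual SS of the full model (group effects + common slope), N - a - 1 d.f.
    # It equals the within-group SS of the adjusted responses from step 3.
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    # Reduced model: one regression line for everybody (all alpha_i = 0), N - 2 d.f.
    all_x = [u for x in xs for u in x]
    all_y = [v for y in ys for v in y]
    sxx_t, syy_t, sxy_t = _centered_sums(all_x, all_y)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t
    df1, df2 = a - 1, N - a - 1
    f_stat = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    # Step 3 applied to the group means: ybar_i - beta * (xbar_i - xbar..).
    xbar_all = sum(all_x) / N
    adjusted_means = [sum(y) / len(y) - beta * (sum(x) / len(x) - xbar_all)
                      for x, y in zip(xs, ys)]
    return beta, adjusted_means, f_stat, (df1, df2)


beta, adj, f, dfs = ancova([[1, 2, 3], [2, 3, 4], [3, 4, 5]],     # hypothetical X
                           [[2, 3, 5], [4, 6, 7], [7, 8, 10]])    # hypothetical Y
print(beta, adj, f, dfs)
```

For these toy data the pooled slope is 1.5 and F is about 15.3 on (2, 5) d.f.; the adjusted means spread out less than the raw means because part of the raw difference is explained by the covariate.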

Page 36: Group  #4 AMS 572 – Data  Analysis

ANCOVA, ANOVA and Regression

[Figure: ANOVA/ANCOVA and Regression, both within the General Linear Model]

Page 37: Group  #4 AMS 572 – Data  Analysis

Simple Linear Regression

$Y = \beta_0 + \beta_1 X + \varepsilon$

• Y: response variable
• X: predictor
• $\beta_0$: intercept; $\beta_1$: slope
• $\varepsilon$: error

All of them are scalars!

Page 38: Group  #4 AMS 572 – Data  Analysis

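Multiple Linear Regression

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where

$\mathbf{Y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,(m-1)} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,(m-1)} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_{m-1} \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$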

Page 39: Group  #4 AMS 572 – Data  Analysis

ANOVA: Dummy Variable Regression

$Y_i = \beta_0 + \beta_1 Z_i + \varepsilon_i$

• $Y_i$: outcome of the ith unit
• $Z_i$: categorical variable (binary)
• $\varepsilon_i$: residual for the ith unit
• $\beta_0$: coefficient for the intercept
• $\beta_1$: coefficient for the slope

More about $Z_i$: $Z_i = 1$ if unit i is in the treatment group; $Z_i = 0$ if unit i is in the control group.
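To see why this regression is "ANOVA in disguise", fit it by least squares: with a 0/1 dummy, the intercept equals the control-group mean and the slope equals the difference between the treatment and control means, so testing $\beta_1 = 0$ is testing equal group means. A small sketch with a hypothetical function name and made-up data:

```python
def dummy_regression(y, z):
    """Least-squares fit of Y_i = b0 + b1 * Z_i + e_i with a 0/1 dummy Z_i.

    Returns (b0, b1). With a binary Z, b0 is the control-group mean and
    b1 is the treatment mean minus the control mean.
    """
    n = len(y)
    ybar, zbar = sum(y) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b1 = szy / szz          # slope on the dummy
    b0 = ybar - b1 * zbar   # intercept
    return b0, b1


y = [3, 5, 8, 10, 12]       # hypothetical outcomes
z = [0, 0, 1, 1, 1]         # 0 = control, 1 = treatment
b0, b1 = dummy_regression(y, z)
print(b0, b1)               # close to 4.0 (control mean) and 6.0 (10 - 4)
```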

Page 40: Group  #4 AMS 572 – Data  Analysis

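Two-way ANOVA

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

• $Y_{ijk}$: response variable
• $\mu$: overall mean response
• $\alpha_i$: effect due to the ith level of factor A
• $\beta_j$: effect due to the jth level of factor B
• $(\alpha\beta)_{ij}$: effect due to any interaction between the ith level of A and the jth level of B
• $\varepsilon_{ijk}$: residual for the kth unit in cell (i, j)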

Page 41: Group  #4 AMS 572 – Data  Analysis

General Linear Model

$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$

• $y_i$: the ith response variable
• $X_{1i}, \ldots, X_{pi}$: categorical variables (coded as dummy variables) and/or continuous variables
• $\varepsilon_i$: random error

The above formula can be simply denoted as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. What can this X be?

Before we see an example of X, recall what we have learned: the General Linear Model covers (1) simple linear regression; (2) multiple linear regression; (3) ANOVA; (4) two-way/n-way ANOVA.

Page 42: Group  #4 AMS 572 – Data  Analysis

X: Interaction Between Random Variables

X in the GLM might be expanded as
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$

where $X_3$ could be the INTERACTION between $X_1$ and $X_2$:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$

Did you see the trick? Next, let us see what assumptions must be satisfied before using ANCOVA.
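The "trick" is that the product $X_1 X_2$ is simply one more column of the design matrix, so the model is still linear in the β's and all the least-squares machinery applies unchanged. A minimal sketch of building such rows; the function names, predictor values and coefficients are hypothetical:

```python
def design_row(x1: float, x2: float, interaction: bool = True) -> list:
    """One row of the GLM design matrix X for Y = b0 + b1*X1 + b2*X2 (+ b3*X1*X2)."""
    row = [1.0, x1, x2]        # intercept column, then the two predictors
    if interaction:
        row.append(x1 * x2)    # X3 = X1*X2 is just another column of X
    return row


def predict(row: list, betas: list) -> float:
    """Linear predictor (row of X) times beta."""
    return sum(r * b for r, b in zip(row, betas))


X = [design_row(x1, x2) for x1, x2 in [(0, 1), (1, 0), (2, 3)]]   # hypothetical values
print(X)
print([predict(row, [1.0, 0.5, -0.5, 2.0]) for row in X])          # hypothetical betas
```

In ANCOVA, letting $X_1$ be a group dummy and $X_2$ the covariate turns $\beta_3$ into a difference in slopes between groups, which is why a group-by-covariate interaction term is used to check the homogeneity of regression.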

Page 43: Group  #4 AMS 572 – Data  Analysis

Before using ANCOVA…

Test the Three Assumptions
1. Test the homogeneity of variance.
2. Test the homogeneity of regression, i.e., whether $H_0: \beta_1 = \cdots = \beta_i = \cdots = \beta_a$.
3. Test whether there is a linear relationship between the dependent variable and the covariate.

Page 44: Group  #4 AMS 572 – Data  Analysis

1. Test the Homogeneity of Variance (1)

For each i, calculate $MSE_i = SSE_i / df_i = SSE_i / (n_i - 2)$.

Use $\max_i(MSE_i)$ and $\min_i(MSE_i)$ to do an F test, to make sure that $\sigma^2$ is constant across the different levels:
$F = \dfrac{\max_i(MSE_i)}{\min_i(MSE_i)}$
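The ratio above can be computed by fitting a separate regression of Y on the covariate in each group and taking the largest over the smallest $MSE_i$. The sketch below uses hypothetical function names and data; the max/min variance ratio is usually associated with Hartley's Fmax test, whose critical values come from a dedicated table, and that lookup is not included here.

```python
def group_mse(x, y):
    """MSE_i = SSE_i / (n_i - 2) from a simple regression of y on x within one group."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((u - xbar) ** 2 for u in x)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    syy = sum((v - ybar) ** 2 for v in y)
    sse = syy - sxy ** 2 / sxx     # residual SS of the within-group fit
    return sse / (n - 2)


def max_min_mse_ratio(xs, ys):
    """max_i MSE_i / min_i MSE_i over the groups (one inner list per group)."""
    mses = [group_mse(x, y) for x, y in zip(xs, ys)]
    return max(mses) / min(mses)


xs = [[1, 2, 3, 4], [1, 2, 3, 4]]     # hypothetical covariate values
ys = [[1, 3, 2, 4], [2, 6, 4, 8]]     # hypothetical responses
print(max_min_mse_ratio(xs, ys))
```

In this toy example the second group's responses are exactly twice the first group's, so its MSE is four times larger and the ratio is about 4.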

Page 45: Group  #4 AMS 572 – Data  Analysis

2. Test Whether $H_0: \beta_1 = \cdots = \beta_i = \cdots = \beta_a$ (1)

Page 46: Group  #4 AMS 572 – Data  Analysis

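2. Test Whether $H_0: \beta_1 = \cdots = \beta_i = \cdots = \beta_a$

(1) Define $SSE_G = \sum_{i=1}^{a} SSE_i$, the sum of squares of errors within groups.
$SSE_i$ is calculated based on $\hat\beta_i$, the slope fitted separately in group i.
AND, $SSE_G$ is generated by the random error $\varepsilon$ only.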

Page 47: Group  #4 AMS 572 – Data  Analysis

2. Test Whether $H_0: \beta_1 = \cdots = \beta_i = \cdots = \beta_a$

(2) We can calculate SSE based on a common $\hat\beta$ (the pooled estimate). This SSE is generated by:
• Random error $\varepsilon$
• Differences between the distinct $\beta_i$

(3) Let $SSB = SSE - SSE_G$, the sum of squares between groups. SSB is constituted by the differences between the different $\beta_i$.

Page 48: Group  #4 AMS 572 – Data  Analysis

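2. Test Whether $H_0: \beta_1 = \cdots = \beta_i = \cdots = \beta_a$

(4) With n observations in each of the a groups:
$df_b = df_e - df_e^G = [a(n-1) - 1] - a(n-2) = a - 1$
$MSB = SSB / df_b = SSB / (a - 1)$, the mean square between groups
$MSE_G = SSE_G / df_e^G = SSE_G / [a(n-2)]$, the mean square within groups

Do an F test on MSB and $MSE_G$ to see whether we can reject $H_0$:
$F = MSB / MSE_G$, which follows $F_{a-1,\, a(n-2)}$ under $H_0$.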
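Steps (1) to (4) can be sketched as one function: $SSE_G$ adds up the residual SS of a separate line in each group, SSE uses one pooled slope, and their difference SSB measures how much the slopes disagree. The function name and the data are hypothetical; the error degrees of freedom are written as N − 2a, which equals a(n − 2) in the balanced case used on the slide.

```python
def slope_homogeneity_f(xs, ys):
    """F = MSB / MSE_G for H0: beta_1 = ... = beta_a (equal within-group slopes).

    xs, ys: lists of lists, one inner list per group.
    Returns (F, df_b, df_eG).
    """
    a = len(xs)
    N = sum(len(x) for x in xs)
    sxx_w = syy_w = sxy_w = 0.0
    sse_g = 0.0
    for x, y in zip(xs, ys):
        xbar, ybar = sum(x) / len(x), sum(y) / len(y)
        sxx = sum((u - xbar) ** 2 for u in x)
        syy = sum((v - ybar) ** 2 for v in y)
        sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
        sse_g += syy - sxy ** 2 / sxx      # SSE_i: group i fits its own slope
        sxx_w += sxx
        syy_w += syy
        sxy_w += sxy
    sse = syy_w - sxy_w ** 2 / sxx_w       # SSE with one common (pooled) slope
    ssb = sse - sse_g                      # SSB = SSE - SSE_G
    df_b, df_eg = a - 1, N - 2 * a         # N - 2a = a(n - 2) when balanced
    f_stat = (ssb / df_b) / (sse_g / df_eg)
    return f_stat, df_b, df_eg


F, df_b, df_eg = slope_homogeneity_f([[1, 2, 3, 4], [1, 2, 3, 4]],   # hypothetical X
                                     [[1, 3, 2, 4], [2, 6, 4, 8]])   # hypothetical Y
print(F, df_b, df_eg)
```

Here the group slopes are 0.8 and 1.6, yet F is only about 0.71 on (1, 4) d.f., so with this little data the difference in slopes would not be declared significant.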

Page 49: Group  #4 AMS 572 – Data  Analysis

3. Test Linear Relationship (1)

Assumption 3: test for a linear relationship between the dependent variable and the covariate, $H_0: \beta = 0$.

How to do it? An F test on SSR and SSE, where SSR is the sum of squares of regression.

Page 50: Group  #4 AMS 572 – Data  Analysis

3. Test Linear Relationship (2): How to calculate SSR and MSR?

From each $x_i$, compute $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$.

SSR is the sum of the squares of the differences between $\hat y_i$ and $\bar y$:
$SSR = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$, $\quad MSR = SSR / 1$

Page 51: Group  #4 AMS 572 – Data  Analysis

3. Test Linear Relationship (3): How to calculate SSE and MSE?

From each $x_i$, compute $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$.

SSE is the error obtained from the sum of the squares of the differences between $y_i$ and $\hat y_i$:
$SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2$, $\quad MSE = SSE / (n - 2)$

Page 52: Group  #4 AMS 572 – Data  Analysis

3. Linear Relationship Test (4)

$F = \dfrac{MSR}{MSE}$, which follows $F_{1,\, n-2}$ under $H_0: \beta = 0$.

Based on this test statistic, we decide whether to reject $H_0$ ($\beta = 0$).

Assume Assumptions 1 and 2 have already been passed.
• If $H_0$ ($\beta = 0$) is not rejected, we do ANOVA.
• Otherwise, we do ANCOVA.

So, any time we want to use ANCOVA, we need to test the three assumptions first!
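The three slides on the linear-relationship test amount to a few lines of arithmetic: fit $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, split the variation into SSR and SSE, and form F = MSR/MSE. A minimal sketch with a hypothetical function name and made-up data (the p-value lookup against $F_{1, n-2}$ is left out):

```python
def regression_f(x, y):
    """F = MSR / MSE for H0: slope = 0 in the simple regression of y on x.

    Returns (F, SSR, SSE).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((u - xbar) ** 2 for u in x)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    b1 = sxy / sxx                    # least-squares slope
    b0 = ybar - b1 * xbar             # least-squares intercept
    yhat = [b0 + b1 * u for u in x]   # fitted values
    ssr = sum((f - ybar) ** 2 for f in yhat)          # regression SS, 1 d.f.
    sse = sum((v - f) ** 2 for v, f in zip(y, yhat))  # error SS, n - 2 d.f.
    msr, mse = ssr / 1, sse / (n - 2)
    return msr / mse, ssr, sse


F, ssr, sse = regression_f([1, 2, 3, 4], [1, 3, 2, 4])   # hypothetical data
print(F, ssr, sse)
```

For these four points SSR = 3.2 and SSE = 1.8, giving F ≈ 3.56 on (1, 2) d.f.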

Page 53: Group  #4 AMS 572 – Data  Analysis

Application of ANCOVA


Page 54: Group  #4 AMS 572 – Data  Analysis

Our case
• In this hypothetical study, a sample of 36 teams (id in the data set) of 12-year-old children attending a summer camp participated in a study to determine which one of three different tree-watering techniques worked best to promote tree growth.

Technique | Frequency | Code
Watering the base with a hose | 10 minutes once per day | 1
Watering the ground surrounding (drip system) | 2 hours each day | 2
Deep watering (sunk pipe) | 10 minutes every 3 days | 3

Page 55: Group  #4 AMS 572 – Data  Analysis

Conditions for the experiment
• From a large set of equally sized and equally healthy fast-growing trees, each team was given a tree to plant at the start of the camp.
• Each team was responsible for the watering and general care of their tree.
• At the end of the summer, the height of each tree was measured.

Page 56: Group  #4 AMS 572 – Data  Analysis

Concerns
• Some children might have had more gardening experience than others, and
• any knowledge gained as a result of that prior experience might affect the way the tree was planted, and perhaps even the way in which the children cared for the tree and carried out the watering regime.

How to approach this? Create an indicator for that knowledge (i.e., a 40-point scale of gardening experience).

Page 57: Group  #4 AMS 572 – Data  Analysis

Data Structure

id | watering technique (grouping: 1, 2, 3) | tree growth (dv: dependent variable) | gardening exp (cov: covariate variable)
1 | 1 | 39 | 24
2 | 1 | 36 | 18
3 | 1 | 30 | 21
4 | 1 | 42 | 24
… | … | … | …
32 | 3 | 36 | 15
33 | 3 | 30 | 18
34 | 3 | 39 | 18
35 | 3 | 27 | 9
36 | 3 | 24 | 6

Page 58: Group  #4 AMS 572 – Data  Analysis

Data Structure

The same data (id, watering technique, tree growth, gardening exp) map onto the ANCOVA model
$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar X_{..}) + \varepsilon_{ij}$:
• Grouping (1, 2, 3): watering technique, with group effect $\alpha_i$
• Dependent variable $Y_{ij}$: tree growth
• Covariate variable $X_{ij}$: gardening experience
• $\mu$: overall mean response
• $\beta$: regression coefficient parameter
• $\varepsilon_{ij}$: residual error

Page 59: Group  #4 AMS 572 – Data  Analysis

Model Assumptions (ANCOVA in SAS)
• Linearity of regression
• Homogeneity of regression
• Homogeneity of variance
• $\varepsilon$ and the dv are normally distributed

Page 60: Group  #4 AMS 572 – Data  Analysis

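The Pearson correlation coefficient between the covariate and the dependent variable is 0.81150.

$\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$, $\quad \rho_{X,Y} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$

$r = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar Y)^2}}$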
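The sample formula for r translates directly into code. The sketch below (hypothetical function name) applies it to the nine rows shown in the data excerpt only; since the remaining 27 teams are not listed, the result is not expected to reproduce the reported 0.81150, which was computed on all 36 teams.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between sequences x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    sxx = sum((u - xbar) ** 2 for u in x)
    syy = sum((v - ybar) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


# Only the nine rows visible in the excerpt (ids 1-4 and 32-36).
gardening_exp = [24, 18, 21, 24, 15, 18, 18, 9, 6]        # covariate X
tree_growth = [39, 36, 30, 42, 36, 30, 39, 27, 24]        # dependent variable Y
print(round(pearson_r(gardening_exp, tree_growth), 4))
```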

Page 61: Group  #4 AMS 572 – Data  Analysis

Assumptions

There is clearly a strong linear component to the relationship: the linearity-of-regression assumption appears to be met by the data set.

Page 62: Group  #4 AMS 572 – Data  Analysis

Assumptions (Homogeneity of Regression)

The assumption of homogeneity of regression is tested by examining the interaction of the covariate and the independent variable. If the interaction is not statistically significant, as is the case here, then the assumption is met.

Page 63: Group  #4 AMS 572 – Data  Analysis

Output

The model contains the effects of both the covariate and the independent variable.

The effects of the covariate and the independent variable are separately evaluated in this summary table.

Page 64: Group  #4 AMS 572 – Data  Analysis

Output


Page 65: Group  #4 AMS 572 – Data  Analysis

Output

Watering techniques coded as 1 (hose watering) and 3 (deep watering) are the only two groups whose means differ significantly


Page 66: Group  #4 AMS 572 – Data  Analysis

Experiment Conclusions
• We can assert that prior gardening experience and knowledge were quite influential in how well the trees fared under the attention of the young campers.
• When we statistically control for (equate) the gardening experience and knowledge of the children, the watering technique was a relatively strong factor in how much growth was seen in the trees.
• On the basis of the adjusted means, we may therefore conclude that, when we statistically control for gardening experience, deep watering is more effective than hose watering but is not significantly more effective than drip watering.

Page 67: Group  #4 AMS 572 – Data  Analysis

SAS Code for ANCOVA

GROUP VARIABLE, DEPENDENT VARIABLE and COVARIATE

THIS IS ANCOVA!!!!!


Page 68: Group  #4 AMS 572 – Data  Analysis

ENTERPRISE GUIDE APPROACH

Page 69: Group  #4 AMS 572 – Data  Analysis

ENTERPRISE GUIDE APPROACH

Tasks -> Graph -> Scatter Plot

Page 70: Group  #4 AMS 572 – Data  Analysis

ENTERPRISE GUIDE APPROACH

Tasks -> ANOVA -> Linear Models

Page 71: Group  #4 AMS 572 – Data  Analysis

ENTERPRISE GUIDE APPROACH

Page 72: Group  #4 AMS 572 – Data  Analysis

QUESTIONS? THANK YOU!