

Single-Factor Studies

KNNL – Chapter 16

Single-Factor Models

• Independent Variable can be qualitative or quantitative

• If Quantitative, we typically assume a linear, polynomial, or no “structural” relation

• If Qualitative, we typically have no “structural” relation

• Balanced designs have equal numbers of replicates at each level of the independent variable

• When no structure is assumed, we refer to the models as “Analysis of Variance” (ANOVA) models, and use indicator variables for the treatments in the regression model

Single-Factor ANOVA Model

• Model Assumptions for Model Testing
  - All probability distributions are normal
  - All probability distributions have equal variance
  - Responses are random samples from their probability distributions, and are independent

• Analysis Procedure
  - Test for differences among factor level means
  - Follow-up (post-hoc) comparisons among pairs or groups of factor level means

Cell Means Model

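$r$ = number of levels of the study factor

$n_i$ = number of replicates (cases, units) for the $i^{th}$ level of the study factor, $i = 1, \ldots, r$

$n_T = n_1 + \cdots + n_r$ = overall sample size (number of cases)

$$Y_{ij} = \mu_i + \varepsilon_{ij} \qquad i = 1, \ldots, r; \quad j = 1, \ldots, n_i$$

$Y_{ij}$ = response for the $j^{th}$ case within the $i^{th}$ level of the study factor

$\mu_i$ = population mean for the $i^{th}$ level of the study factor

$\varepsilon_{ij} \sim NID(0, \sigma^2)$, where NID = Normally and Independently Distributed

$$E\{Y_{ij}\} = \mu_i \qquad \sigma^2\{Y_{ij}\} = \sigma^2 \qquad Y_{ij} \text{ are independent } N(\mu_i, \sigma^2)$$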

Cell Means Model – Regression Form

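Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}: \qquad
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} +
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$$

$$\sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2\{\mathbf{Y}\} = \sigma^2 \mathbf{I} \quad (6 \times 6)$$

$$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} \mu_1 & \mu_1 & \mu_2 & \mu_2 & \mu_3 & \mu_3 \end{bmatrix}'$$

$$\mathbf{X'X} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad
\mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y} =
\begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}
\begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix} =
\begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix} =
\begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix}$$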
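A small numerical check of the cell means regression form can make the matrix algebra concrete. The sketch below assumes made-up responses for r = 3 levels with 2 replicates each (the data values and variable names are illustrative, not from the text) and confirms that the least squares solution equals the level means.

```python
from fractions import Fraction

# Hypothetical responses (illustration only): r = 3 levels, 2 replicates each,
# ordered Y11, Y12, Y21, Y22, Y31, Y32.
y = [Fraction(v) for v in (10, 12, 15, 17, 20, 24)]

# Indicator design matrix X (6 x 3): column i flags membership in level i.
X = [[1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0],
     [0, 0, 1], [0, 0, 1]]

n_rows, n_cols = len(X), len(X[0])

# X'X: cross-products of the columns of X.
XtX = [[sum(X[k][i] * X[k][j] for k in range(n_rows)) for j in range(n_cols)]
       for i in range(n_cols)]

# X'Y: totals of the responses within each level.
XtY = [sum(X[k][i] * y[k] for k in range(n_rows)) for i in range(n_cols)]

# X'X is diagonal for this coding, so applying its inverse divides by n_i.
beta_hat = [XtY[i] / XtX[i][i] for i in range(n_cols)]

# Level means computed directly, for comparison.
level_means = [sum(y[2 * i: 2 * i + 2]) / 2 for i in range(n_cols)]

print(XtX)
print([float(b) for b in beta_hat], [float(m) for m in level_means])
```

Exact fractions are used so the comparison between the regression solution and the level means is not blurred by rounding.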

Model Interpretations

• Factor Level Means
  - Observational Studies: The $\mu_i$ represent the population means among units from the populations of factor levels
  - Experimental Studies: The $\mu_i$ represent the means of the various factor levels, had they been assigned to a population of experimental units

• Fixed and Random Factors
  - Fixed Factors: All levels of interest are observed in the study
  - Random Factors: Factor levels included in the study represent a sample from a population of factor levels

Fitting ANOVA Models

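Notation:

$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij} \qquad \bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{n_i} \qquad
Y_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij} \qquad
\bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n_T} = \frac{\sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}}{n_T}$$

Least Squares and Maximum Likelihood Estimation

Error Sum of Squares:

$$Q = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 \qquad
\frac{\partial Q}{\partial \mu_k} = -2\sum_{j=1}^{n_k} (Y_{kj} - \mu_k) \qquad k = 1, \ldots, r$$

Setting $\partial Q / \partial \mu_k = 0$:

$$\sum_{j=1}^{n_k} Y_{kj} = n_k \hat{\mu}_k \quad \Rightarrow \quad
\hat{\mu}_k = \frac{\sum_{j=1}^{n_k} Y_{kj}}{n_k} = \bar{Y}_{k\cdot} \qquad k = 1, \ldots, r$$

Likelihood:

$$L(\mu_1, \ldots, \mu_r, \sigma^2 \mid Y_{11}, \ldots, Y_{r n_r}) =
\frac{1}{(2\pi\sigma^2)^{n_T/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2\right]$$

Maximizing the likelihood with respect to $\mu_1, \ldots, \mu_r$ is equivalent to minimizing $Q$, so the maximum likelihood estimators are also $\hat{\mu}_i = \bar{Y}_{i\cdot}$.

Fitted values: $\hat{Y}_{ij} = \hat{\mu}_i = \bar{Y}_{i\cdot}$

Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$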
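To illustrate that the level means minimize the error sum of squares Q, the sketch below computes the estimates, fitted values and residuals for a small unbalanced data set. The data and the function name `error_ss` are assumptions made for illustration only.

```python
# Hypothetical unbalanced data (illustration only): level label -> responses.
groups = {"A": [10.0, 12.0], "B": [15.0, 17.0, 19.0], "C": [20.0, 24.0]}

def error_ss(groups, mus):
    """Q = sum over levels and cases of (Y_ij - mu_i)^2 for candidate means."""
    return sum((y - mus[k]) ** 2 for k, ys in groups.items() for y in ys)

# Least squares (and maximum likelihood) estimates: the level means.
mu_hat = {k: sum(ys) / len(ys) for k, ys in groups.items()}

# Fitted values equal the level mean; residuals are deviations from it.
residuals = {k: [y - mu_hat[k] for y in ys] for k, ys in groups.items()}

q_min = error_ss(groups, mu_hat)

# Shifting every candidate mean away from the level mean increases Q.
q_shifted = error_ss(groups, {k: m + 0.5 for k, m in mu_hat.items()})

print(mu_hat)
print(q_min, q_shifted)
```

Within each level the residuals sum to zero, which is exactly the condition obtained by setting the derivative of Q to zero.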

Analysis of Variance

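Total deviation = deviation of treatment mean from overall mean + deviation from treatment mean (residual):

$$Y_{ij} - \bar{Y}_{\cdot\cdot} = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$$

$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 =
\sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 +
\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$$

since the cross-product term vanishes:

$$2\sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot}) = 0$$

Total (Corrected) Sum of Squares:

$$SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 \qquad df_{TO} = n_T - 1$$

Treatment Sum of Squares:

$$SSTR = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 =
\sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \qquad df_{TR} = r - 1$$

Error Sum of Squares:

$$SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 \qquad df_{E} = n_T - r$$

$$SSTO = SSTR + SSE \qquad df_{TO} = df_{TR} + df_{E}$$

Note: Useful result:

$$s_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_i - 1} \quad \Rightarrow \quad
SSE = \sum_{i=1}^{r} (n_i - 1) s_i^2 \qquad df_E = \sum_{i=1}^{r} (n_i - 1) = n_T - r$$

Mean Squares:

$$MSTR = \frac{SSTR}{r - 1} \qquad MSE = \frac{SSE}{n_T - r}$$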
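The partition of the total sum of squares can be verified numerically. The following is a minimal sketch, assuming made-up data and a hypothetical helper named `anova_table`; it computes SSTO, SSTR, SSE, their degrees of freedom and the mean squares directly from the definitions.

```python
import math

def anova_table(groups):
    """Sums of squares, degrees of freedom and mean squares for one factor."""
    all_y = [y for ys in groups for y in ys]
    n_t, r = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_t
    level_means = [sum(ys) / len(ys) for ys in groups]

    # SSTO: deviations of every case from the overall mean.
    ssto = sum((y - grand_mean) ** 2 for y in all_y)
    # SSTR: level-mean deviations from the overall mean, weighted by n_i.
    sstr = sum(len(ys) * (m - grand_mean) ** 2
               for ys, m in zip(groups, level_means))
    # SSE: deviations of cases from their own level mean.
    sse = sum((y - m) ** 2 for ys, m in zip(groups, level_means) for y in ys)

    return {"SSTO": ssto, "SSTR": sstr, "SSE": sse,
            "df_TO": n_t - 1, "df_TR": r - 1, "df_E": n_t - r,
            "MSTR": sstr / (r - 1), "MSE": sse / (n_t - r)}

# Hypothetical unbalanced data (illustration only).
groups = [[10.0, 12.0], [15.0, 17.0, 19.0], [20.0, 24.0]]
table = anova_table(groups)

print(table)
print(math.isclose(table["SSTO"], table["SSTR"] + table["SSE"]))
```

The final line checks the additivity SSTO = SSTR + SSE up to floating-point rounding.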

ANOVA Table

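$$\begin{array}{lcccc}
\text{Source} & df & SS & MS & E\{MS\} \\ \hline
\text{Treatments} & r - 1 & SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 & MSTR = \dfrac{SSTR}{r - 1} & \sigma^2 + \dfrac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{r - 1} \\
\text{Error} & n_T - r & SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 & MSE = \dfrac{SSE}{n_T - r} & \sigma^2 \\
\text{Total} & n_T - 1 & SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 & &
\end{array}$$

Note:

$$\mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T} \qquad
SSTR = \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2 - n_T \bar{Y}_{\cdot\cdot}^2 \qquad
SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2$$

Expected values of the pieces:

$$E\{Y_{ij}^2\} = \sigma^2 + \mu_i^2 \quad \Rightarrow \quad
E\left\{\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2\right\} = n_T \sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2$$

$$E\{\bar{Y}_{i\cdot}^2\} = \frac{\sigma^2}{n_i} + \mu_i^2 \quad \Rightarrow \quad
E\left\{\sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2\right\} = r\sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2$$

$$E\{\bar{Y}_{\cdot\cdot}^2\} = \frac{\sigma^2}{n_T} + \mu_{\cdot}^2 \quad \Rightarrow \quad
E\{n_T \bar{Y}_{\cdot\cdot}^2\} = \sigma^2 + n_T \mu_{\cdot}^2$$

Combining these:

$$E\{SSE\} = (n_T - r)\sigma^2 \qquad
E\{SSTR\} = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2 - n_T \mu_{\cdot}^2 = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2$$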

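F-Test for $H_0: \mu_1 = \cdots = \mu_r$

$$H_0: \mu_1 = \cdots = \mu_r \qquad H_A: \text{Not all } \mu_i \text{ are equal}$$

Test Statistic:

$$F^* = \frac{MSTR}{MSE}$$

Under the null hypothesis (and independence and normality of errors):

$$\frac{SSTR}{\sigma^2} \sim \chi^2_{r-1} \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n_T - r}$$

and SSTR and SSE are independent (they are independent even if $H_0$ is false).

$$F^* = \frac{\left.\dfrac{SSTR}{\sigma^2}\right/(r - 1)}{\left.\dfrac{SSE}{\sigma^2}\right/(n_T - r)} = \frac{MSTR}{MSE}
\;\overset{H_0}{\sim}\; F(r - 1,\; n_T - r)$$

Decision Rule: Reject $H_0$ if

$$F^* = \frac{MSTR}{MSE} > F(1 - \alpha;\; r - 1,\; n_T - r)$$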

General Linear Test of Equal Means

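$$H_0: \mu_1 = \cdots = \mu_r = \mu_c \quad \text{Common Mean (Reduced Model)}$$

$$H_A: \text{Not all } \mu_i \text{ are equal} \quad \text{(Complete Model)}$$

Reduced Model:

$$\hat{Y}_{ij} = \hat{\mu}_c = \bar{Y}_{\cdot\cdot} \qquad
SSE(R) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 =
\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = SSTO \qquad df_R = n_T - 1$$

Complete (Full) Model:

$$\hat{Y}_{ij} = \hat{\mu}_i = \bar{Y}_{i\cdot} \qquad
SSE(F) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 =
\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = SSE \qquad df_F = n_T - r$$

Test Statistic:

$$F^* = \frac{\dfrac{SSE(R) - SSE(F)}{df_R - df_F}}{\dfrac{SSE(F)}{df_F}} =
\frac{\dfrac{SSTO - SSE}{(n_T - 1) - (n_T - r)}}{\dfrac{SSE}{n_T - r}} =
\frac{\dfrac{SSTR}{r - 1}}{\dfrac{SSE}{n_T - r}} = \frac{MSTR}{MSE}$$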
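The equivalence of the general linear test and the ANOVA F-statistic can be checked on a small example. The sketch below assumes made-up data and a hypothetical function `general_linear_test`; it fits the reduced model (common mean) and the full model (separate level means) and compares the resulting F* with MSTR/MSE.

```python
import math

def general_linear_test(groups):
    """Return F* computed from SSE(R), SSE(F) and directly as MSTR/MSE."""
    all_y = [y for ys in groups for y in ys]
    n_t, r = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_t
    level_means = [sum(ys) / len(ys) for ys in groups]

    # Reduced model: every fitted value is the overall mean.
    sse_reduced = sum((y - grand_mean) ** 2 for y in all_y)
    df_reduced = n_t - 1

    # Full model: every fitted value is the mean of its own level.
    sse_full = sum((y - m) ** 2
                   for ys, m in zip(groups, level_means) for y in ys)
    df_full = n_t - r

    f_glt = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

    # Direct ANOVA form for comparison.
    sstr = sum(len(ys) * (m - grand_mean) ** 2
               for ys, m in zip(groups, level_means))
    f_anova = (sstr / (r - 1)) / (sse_full / (n_t - r))
    return f_glt, f_anova

# Hypothetical unbalanced data (illustration only).
groups = [[10.0, 12.0], [15.0, 17.0, 19.0], [20.0, 24.0]]
f_glt, f_anova = general_linear_test(groups)
print(f_glt, f_anova, math.isclose(f_glt, f_anova))
```

The statistic would then be compared with F(1 − α; r − 1, n_T − r) from a table or a statistics package; that lookup is deliberately left out of this standard-library sketch.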

Factor Effects Model

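Alternative Form of Model (Necessary for interactions in multi-factor models):

$$\mu_i = \mu_{\cdot} + (\mu_i - \mu_{\cdot}) = \mu_{\cdot} + \tau_i \qquad
\tau_i = \mu_i - \mu_{\cdot} = \text{"Effect" of the } i^{th} \text{ factor level}$$

$$Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij} \qquad \varepsilon_{ij} \sim NID(0, \sigma^2)$$

Defining $\mu_{\cdot}$:

Unweighted Mean:

$$\mu_{\cdot} = \frac{\sum_{i=1}^{r} \mu_i}{r} \quad \Rightarrow \quad \sum_{i=1}^{r} \tau_i = 0$$

Weighted Mean:

$$\mu_{\cdot} = \sum_{i=1}^{r} w_i \mu_i \quad \text{s.t.} \quad \sum_{i=1}^{r} w_i = 1 \quad \Rightarrow \quad \sum_{i=1}^{r} w_i \tau_i = 0$$

Weights may represent the population sizes in observational studies

Note:

$$\mu_1 = \cdots = \mu_r \quad \Leftrightarrow \quad \tau_1 = \cdots = \tau_r = 0$$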

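Regression Approach – Factor Effects Model

Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$ and Unweighted Mean Model: $\tau_1 + \tau_2 + \tau_3 = 0 \Rightarrow \tau_3 = -\tau_1 - \tau_2$

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}: \qquad
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix}
\begin{bmatrix} \mu_{\cdot} \\ \tau_1 \\ \tau_2 \end{bmatrix} +
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$$

$$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} - \tau_1 - \tau_2 \\ \mu_{\cdot} - \tau_1 - \tau_2 \end{bmatrix} =
\begin{bmatrix} \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_3 \\ \mu_{\cdot} + \tau_3 \end{bmatrix}$$

$$\mathbf{X'X} = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 4 \end{bmatrix} \qquad
\mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ Y_{11} + Y_{12} - Y_{31} - Y_{32} \\ Y_{21} + Y_{22} - Y_{31} - Y_{32} \end{bmatrix}$$

$$(\mathbf{X'X})^{-1} = \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/3 & -1/6 \\ 0 & -1/6 & 1/3 \end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y} =
\begin{bmatrix} \hat{\mu}_{\cdot} \\ \hat{\tau}_1 \\ \hat{\tau}_2 \end{bmatrix} =
\begin{bmatrix} \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{1\cdot} - \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{2\cdot} - \bar{Y}_{\cdot\cdot} \end{bmatrix}$$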
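The factor effects coding for r = 3 with two replicates per level can be checked with exact arithmetic. The sketch below assumes made-up responses and uses the effect-coded design matrix with rows (1, 1, 0), (1, 0, 1) and (1, −1, −1); it confirms the inverse of X'X (with off-diagonal entries −1/6) and that the estimates are the overall mean and the deviations of the level means from it.

```python
from fractions import Fraction

# Hypothetical responses (illustration only), ordered Y11, Y12, ..., Y32.
y = [Fraction(v) for v in (10, 12, 15, 17, 20, 24)]

# Effect-coded design: intercept, tau_1 column, tau_2 column.
X = [[1, 1, 0], [1, 1, 0],
     [1, 0, 1], [1, 0, 1],
     [1, -1, -1], [1, -1, -1]]

XtX = [[sum(X[k][i] * X[k][j] for k in range(6)) for j in range(3)]
       for i in range(3)]
XtY = [sum(X[k][i] * y[k] for k in range(6)) for i in range(3)]

# Candidate inverse of X'X for this balanced design.
XtX_inv = [[Fraction(1, 6), Fraction(0), Fraction(0)],
           [Fraction(0), Fraction(1, 3), Fraction(-1, 6)],
           [Fraction(0), Fraction(-1, 6), Fraction(1, 3)]]

# Multiplying the candidate inverse by X'X should give the identity matrix.
product = [[sum(XtX_inv[i][k] * XtX[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]

# beta_hat = (X'X)^{-1} X'Y = [mu_dot_hat, tau_1_hat, tau_2_hat].
beta_hat = [sum(XtX_inv[i][k] * XtY[k] for k in range(3)) for i in range(3)]

grand_mean = sum(y) / 6
level_means = [sum(y[2 * i: 2 * i + 2]) / 2 for i in range(3)]
tau_3_hat = -beta_hat[1] - beta_hat[2]   # from the sum-to-zero constraint

print(XtX, product)
print(beta_hat, tau_3_hat)
```

The effect of the third level is not estimated as a separate coefficient; it is recovered from the constraint that the effects sum to zero.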

Factor Effects Model with Weighted Mean

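Weights are relative sample sizes: $w_i = \dfrac{n_i}{n_T}$

$$\mu_{\cdot} = \sum_{i=1}^{r} \frac{n_i}{n_T}\mu_i \qquad
\sum_{i=1}^{r} w_i \tau_i = \sum_{i=1}^{r} \frac{n_i}{n_T}\tau_i = 0 \quad \Rightarrow \quad \sum_{i=1}^{r} n_i \tau_i = 0$$

$$\tau_r = -\frac{n_1}{n_r}\tau_1 - \cdots - \frac{n_{r-1}}{n_r}\tau_{r-1}$$

$$Y_{ij} = \mu_{\cdot} + \tau_1 X_{ij1} + \cdots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$$

$$X_{ij1} = \begin{cases} 1 & \text{if } i = 1 \\ -\dfrac{n_1}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}
\qquad \cdots \qquad
X_{ij,r-1} = \begin{cases} 1 & \text{if } i = r - 1 \\ -\dfrac{n_{r-1}}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}$$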
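The weighted-mean coding can be illustrated by building the design rows for an unbalanced layout. The sketch below assumes made-up data and a hypothetical helper `design_row`; rather than solving the normal equations, it plugs in the weighted overall mean and the level-mean deviations and checks that the coded model reproduces every level mean, including the last level, whose effect is implied by the constraint that the n_i-weighted effects sum to zero.

```python
from fractions import Fraction

def design_row(i, n):
    """Row [1, X_ij1, ..., X_ij,r-1] for a case in level i (0-based index)."""
    r = len(n)
    row = [Fraction(1)]
    for k in range(r - 1):
        if i == k:
            row.append(Fraction(1))                # own-level indicator
        elif i == r - 1:
            row.append(Fraction(-n[k], n[r - 1]))  # last level: -n_k / n_r
        else:
            row.append(Fraction(0))
    return row

# Hypothetical unbalanced data (illustration only).
groups = [[10, 12], [15, 17, 19], [20, 24]]
n = [len(g) for g in groups]
n_t = sum(n)

level_means = [Fraction(sum(g), len(g)) for g in groups]
# Weighted overall mean with weights w_i = n_i / n_T.
mu_dot = sum(Fraction(ni, n_t) * m for ni, m in zip(n, level_means))
tau = [m - mu_dot for m in level_means]

# Coefficients carried by the regression: mu_dot and the first r - 1 effects.
beta = [mu_dot] + tau[:-1]
fitted = [sum(x * b for x, b in zip(design_row(i, n), beta))
          for i in range(len(n))]

print(design_row(len(n) - 1, n))
print(fitted, level_means)
```

Because the fitted values equal the level means, which minimize the error sum of squares, these plug-in coefficients coincide with the least squares solution for this coding.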

Regression for Cell Means Model

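$$Y_{ij} = \mu_1 X_{ij1} + \cdots + \mu_r X_{ijr} + \varepsilon_{ij}$$

$$X_{ij1} = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{if } i \neq 1 \end{cases}
\qquad \cdots \qquad
X_{ijr} = \begin{cases} 1 & \text{if } i = r \\ 0 & \text{if } i \neq r \end{cases}$$

When fitting with a regression package, no intercept is used

$$\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_r \end{bmatrix} \qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \vdots \\ \bar{Y}_{r\cdot} \end{bmatrix}$$

Under $H_0: \mu_1 = \cdots = \mu_r = \mu_c$:

$$\mathbf{X} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \qquad
\boldsymbol{\beta} = \mu_c \qquad \hat{\boldsymbol{\beta}} = \hat{\mu}_c = \bar{Y}_{\cdot\cdot}$$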

Randomization (aka Permutation) Tests

• Treats the units in the study as a finite population of units, each with a fixed error term $\varepsilon_{ij}$

• When the randomization procedure assigns the unit to treatment $i$, we observe $Y_{ij} = \tau_i + \varepsilon_{ij}$

• When there are no treatment effects (all $\tau_i = 0$), $Y_{ij} = \varepsilon_{ij}$

• We can compute a test statistic, such as $F^*$, under all (or in practice, many) potential treatment arrangements of the observed units (responses)

• The p-value is the proportion of these test statistics that are as or more extreme than the one from the original arrangement

• Total number of potential permutations = $n_T!/(n_1! \cdots n_r!)$
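The randomization procedure can be sketched with the standard library alone. The example below assumes made-up data, a hypothetical function `permutation_p_value`, and random sampling of arrangements instead of full enumeration; it reshuffles the treatment labels, recomputes F* each time, and reports the proportion of statistics at least as large as the observed one.

```python
import math
import random

def f_statistic(y, labels):
    """F* = MSTR / MSE for responses y grouped by treatment labels."""
    groups = {}
    for value, label in zip(y, labels):
        groups.setdefault(label, []).append(value)
    n_t, r = len(y), len(groups)
    grand_mean = sum(y) / n_t
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
               for g in groups.values())
    sse = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (sstr / (r - 1)) / (sse / (n_t - r))  # assumes SSE > 0

def permutation_p_value(y, labels, n_perm=5000, seed=0):
    """Proportion of sampled arrangements with F* >= the observed F*."""
    rng = random.Random(seed)
    f_obs = f_statistic(y, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # a random reassignment of units to treatments
        # Small relative tolerance so ties are not lost to rounding.
        if f_statistic(y, shuffled) >= f_obs * (1 - 1e-9):
            count += 1
    return f_obs, count / n_perm

def number_of_arrangements(sizes):
    """n_T! / (n_1! ... n_r!) distinct treatment arrangements."""
    total = math.factorial(sum(sizes))
    for n_i in sizes:
        total //= math.factorial(n_i)
    return total

# Hypothetical data (illustration only).
y = [10.0, 12.0, 15.0, 17.0, 19.0, 20.0, 24.0]
labels = ["A", "A", "B", "B", "B", "C", "C"]

f_obs, p_value = permutation_p_value(y, labels)
print(f_obs, p_value, number_of_arrangements([2, 3, 2]))
```

With only a few hundred possible arrangements, as here, full enumeration would also be feasible; sampling is shown because it scales to larger studies.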

Power Approach to Sample Size Choice - Tables

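When the means are not all equal, the $F$-statistic is non-central $F$:

$$F^* \sim F(r - 1,\; n_T - r,\; \phi) \quad \text{where} \quad
\phi = \frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{r}} \quad \text{where} \quad
\mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$

When all sample sizes are equal ($n_i = n$):

$$\phi = \frac{1}{\sigma}\sqrt{\frac{n\sum_{i=1}^{r} (\mu_i - \mu_{\cdot})^2}{r}} \quad \text{where} \quad
\mu_{\cdot} = \frac{\sum_{i=1}^{r} \mu_i}{r}$$

The power of the test, when conducted at significance level $\alpha$:

$$\text{Power} = \Pr\{F^* > F(1 - \alpha;\; r - 1,\; n_T - r) \mid \phi\} \qquad \text{See Table B.11}$$

Choose sample sizes so that the power is sufficiently high for specific mean levels of interest $\mu_1, \ldots, \mu_r$ or effects levels of interest $\tau_1, \ldots, \tau_r$

Table B.12 is simple to use for equal sample sizes and $\Delta = \max(\mu_i) - \min(\mu_i)$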

Power Approach to Sample Size Choice – R Code

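When the means are not all equal, the $F$-statistic is non-central $F$:

$$F^* \sim F(r - 1,\; n_T - r,\; \lambda) \quad \text{where} \quad
\lambda = \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{\sigma^2} \quad \text{where} \quad
\mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$

When all sample sizes are equal ($n_i = n$):

$$\lambda = \frac{n\sum_{i=1}^{r} (\mu_i - \mu_{\cdot})^2}{\sigma^2} \quad \text{where} \quad
\mu_{\cdot} = \frac{\sum_{i=1}^{r} \mu_i}{r}$$

The power of the test, when conducted at significance level $\alpha$:

$$\text{Power} = \Pr\{F^* > F(1 - \alpha;\; r - 1,\; n_T - r) \mid F^* \sim F(r - 1,\; n_T - r,\; \lambda)\}$$

In R:

    F(1 - alpha; r - 1, nT - r)  =  qf(1 - alpha, r - 1, nT - r)

    Power  =  1 - pf(qf(1 - alpha, r - 1, nT - r), r - 1, nT - r, lambda)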
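The two noncentrality parameterizations used in the power slides, φ for the tables and λ for software, are linked by λ = rφ². The sketch below is written in Python with the standard library only, so it computes the noncentrality parameters but not the power itself, which needs a noncentral F distribution. The function name `noncentrality` and the inputs are assumptions chosen for illustration.

```python
import math

def noncentrality(mu, n, sigma):
    """Return (phi, lam) for level means mu, sample sizes n and error SD sigma."""
    r = len(mu)
    n_t = sum(n)
    # Weighted overall mean, mu_dot = sum(n_i * mu_i) / n_T.
    mu_dot = sum(n_i * mu_i for n_i, mu_i in zip(n, mu)) / n_t
    weighted_ss = sum(n_i * (mu_i - mu_dot) ** 2 for n_i, mu_i in zip(n, mu))
    phi = math.sqrt(weighted_ss / r) / sigma   # parameter used with the tables
    lam = weighted_ss / sigma ** 2             # parameter passed to software
    return phi, lam

# Hypothetical planning values (illustration only).
mu = [10.0, 12.0, 14.0]
n = [5, 5, 5]
sigma = 2.0

phi, lam = noncentrality(mu, n, sigma)
print(phi, lam, math.isclose(lam, len(mu) * phi ** 2))
```

The value of `lam` is what would be supplied as the noncentrality argument in a call such as the R expression for power.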

Power Approach to Finding “Best” Treatment

Goal: Determining the best treatment (one with highest or lowest mean):

$1 - \alpha$ = Probability the treatment with highest (lowest) sample mean has highest (lowest) population mean

$\lambda$ = Difference between highest (lowest) mean and 2nd highest (lowest) mean

$r$ = Number of treatments

Table B.13 gives $\dfrac{\lambda\sqrt{n}}{\sigma}$ for various $r$, $1 - \alpha$

Solve for $n$ for given $\lambda$, $\sigma$
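Solving for n from a tabled value of λ√n/σ is a one-line rearrangement, n = (c·σ/λ)², rounded up. The sketch below assumes a hypothetical function `required_n`; the argument `table_value` stands for the entry c read from Table B.13 for the chosen r and 1 − α, and the number used in the example call is a placeholder, not an actual table entry.

```python
import math

def required_n(table_value, sigma, diff):
    """Smallest integer n with diff * sqrt(n) / sigma >= table_value."""
    if diff <= 0 or sigma <= 0:
        raise ValueError("sigma and diff must be positive")
    # Rearranging table_value = diff * sqrt(n) / sigma for n.
    return math.ceil((table_value * sigma / diff) ** 2)

# Placeholder inputs (illustration only; look up the real entry in Table B.13).
n_per_treatment = required_n(table_value=3.0, sigma=2.0, diff=1.5)
print(n_per_treatment)
```

Rounding up keeps the achieved probability of correct selection at or above the target level.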
