Chapter 11 Asymptotic Evaluationsjwatkins/J3_anova.pdf · 2020. 4. 28. · Chapter 11 Asymptotic Evaluations Analysis of Variance 1/27. Linear Models One Way Analysis of VarianceCon

Linear Models One Way Analysis of Variance Confidence Intervals Example Contrasts

Chapter 11Asymptotic Evaluations

Analysis of Variance

1 / 27


OutlineLinear Models

Multiple Linear RegressionAnalysis of Variance

One Way Analysis of VarianceSample MeansSums of SquaresThe F Statistic

Confidence Intervals

ExampleHoney Bee Queen Development Time

Contrasts2 / 27


Basic Set-up

For linear models, we begin with a general structure

y = Xβ + ε.

• y is a matrix whose rows form a series of multivariate measurements, the responsevariables,

• X is a matrix of explanatory variables,

• β is a matrix of parameters, and

• ε is a matrix containing residuals (i.e., errors or noise).

If the residuals have a multivariate normal distribution, then least squares estimation isa maximum likelihood estimation procedure for the β.

3 / 27


Multiple Linear Regression

For multiple linear regression:

• y = (y1, y2, . . . , yn)T is a column vector of responses,

• X is a matrix of predictors,

X =

1 x11 · · · x1k

1 x21 · · · x2k...

.... . .

...1 xn1 · · · xnk

.• β = (β0, β1, . . . , βk)T is a column vector of parameters, and

• ε = (ε1, ε2, . . . , εn)T is a column vector of “errors”.

yi = β0 + β1x1i + · · ·+ βkx1k + εi .

4 / 27


Analysis of Variance

Example. The data on 30 forest plots in Borneo are the number of trees per plot.

never logged logged 1 year ago logged 8 years ago

nj 12 12 9yj 23.750 14.083 15.778sj 5.065 4.981 5.761

We compute these statistics from the data y11, . . . yn11, y12, . . . yn22 and y13, . . . yn33,

yj =1

nj

nj∑i=1

yij and s2j =

1

nj − 1

nj∑i=1

(yij − yj)2.

5 / 27


Overview• The basic question is: Are these

means the same (the null hypothesis)or not (the alternative hypothesis)?

• The basic idea of the test is toexamine the ratio of s2

between, thevariance between groups (indicated bythe variation in the center lines of theboxes) and s2

residual , a statistic thatmeasures the variances within groups.

• If the resulting ratio test statistic issufficiently large, then we say, basedon the data, that the means of thesegroups are distinct and we reject H0.

never logged logged 1 year ago logged 8 years ago

510

15

20

25

30

35

Figure: Side-by-side boxplots of the number oftrees per plot.

6 / 27


One Way Analysis of Variance

The hypothesis for one way analysis of variance is

H0 : µj = µk for all j , k and H1 : µj 6= µk for some j , k.

The data {yij , 1 ≤ i ≤ nj , 1 ≤ j ≤ q} represents that we have nj observation for thej-th group and that we have q groups. The total number of observations is denoted byn = n1 + · · ·+ nq. The model is

yij = µj + εij

where εij are independent N(0, σ2) random variables with σ2 unknown.

For X , here called the design matrix, xij is 1 if the i-th observation belongs to group jand 0 otherwise.

7 / 27


Linear ModelsAssume that β ∈ Rm and that X is a n ×m matrix of rank m < n. Let Y1, . . . ,Yn areindependent normally distributed random variables with mean vector µ = Xβ. Then,the likelihood ratio test of the hypothesis

H0 : Aβ = 0 versus H1 : Aβ 6= 0.

where A is a r ×m matrix has critical region

C = {y;F (y) ≥ F0}.

F is given by

F (y) =

∑nk=1(yk − µk)2 −

∑nk=1(yk − µk)2∑n

k=1(yk − µk)2

n −m

r.

8 / 27


Linear Models

For the expressionn∑

k=1

(yk − µk)2,

• the vector µ is the minimum value under the restriction µ = Xβ, and

• The vector µ is the minimum value under the pair of restrictions µ = Xβ andAβ = 0.

Proof. The likelihood function

L(β, σ2|x, y) =1

(2πσ2)n/2exp− 1

2σ2

n∑k=1

(yk − µk)2 =1

(2πσ2)n/2exp− 1

2σ2(y − µ)T (y − µ)

9 / 27


Linear ModelsThe likelihood ratio

Λ(x, y) =sup{L(β, σ2|x, y); y = Xβ,Aβ = 0}

sup{L(β, σ2|x, y); y = Xβ}

For the numerator, let β be the maximum likelihood estimator for the parameter β andlet µ = X β. Then, the the maximum likelihood estimator for σ2 is

σ2 =1

n

n∑k=1

(yk − µk)2 =1

n(y − µ)T (y − µ)

Therefore

L(β, σ2|x, y) =exp−n

2

(2πσ2)n/2.

10 / 27


Linear ModelsSimilarly, for the denominator, let µ = X ˆβ and

σ2 be the corresponding maximum

likelihood estimates when the null hypothesis is true. Then,

L( ˆβ,σ2|x, y) =

exp−n2

(2πσ2)n/2

.

Consequently, the likelihood ratio test,

λ0 ≥ Λ(x, y) =

(σ2σ2

)n

λ−1/n0 − 1 ≤

σ2

σ2− 1

F (y) = (λ−1/n0 − 1)

n −m

r≤

σ2

σ2− 1

n −m

r

=

∑nk=1(yik − µk)2 −


k=1(yk − µk)2

n −m

r.

11 / 27


Sample MeansFor the F statistic, we introduce two types of sample means:• For Θ, differentiate with respect to µj . The maximum valueis the within group

means, the sample mean inside each of the groups,

µj = y j =1

nj

nj∑i=1

yij , j = 1. . . . , q.

• For Θ0, The For µj are all equal to some value For µ. So, differentiate withrespect to µ to see that the maximum value is the mean of the data taken as awhole, known as the grand mean,

µ = y =1

n

q∑j=1

nj∑i=1

yij =1

n

q∑j=1

nj yj ,

the weighted average of the yj with weights nj , the sample size in each group.

Exercise. For the Borneo rain forest example, show that the grand mean is 18.06055.12 / 27


Sums of SquaresFor the numerator of F (y), we have total sums of squares

SStotal =n∑

k=1

(yk − µk) =

q∑j=1

nj∑i=1

(yij − y)2,

the total square variation of individual observations from their grand mean. The teststatistic is determined by decomposing SStotal . We first rewrite the interior sum as

nj∑i=1

(yij − y)2 =

nj∑i=1

(yij − y j)2 + nj(y j − y)2 = (nj − 1)s2

j + nj(y j − y)2.

Here, s2j is the unbiased sample variance based on the observations in the j-th group.

Exercise. Show the first equality above. (Hint: Begin with the difference in the twosums.)

13 / 27


Sums of SquaresHere m = q, the number of groups. Also,

A =

1 0 · · · 0 −10 1 · · · 0 −1...

.... . . 0 −1

0 0 · · · 1 −1

has q − 1 rows showing µj = µq, j = 1, . . . , q − 1. Thus, rank(A) = q − 1.

F (y) =

∑nk=1(yk − µk)2 −


k=1(yk − µk)2

n −m

r.

=

∑qj=1 nj(y j − y)2/(q − 1)∑qj=1(nj − 1)s2

j /(n − q)=

SSbetween/(q − 1)

SSresidual/(n − q)

14 / 27


Sums of SquaresThis analysis yields a decomposition of the variation

SStotal = SSresidual + SSbetween

with

SSresidual =

q∑j=1

nj∑i=1

(yij − y j)2 =

q∑j=1

(nj − 1)s2j and SSbetween =

q∑j=1

nj(y j − y)2.

For the rain forest example, we find that

SSresidual = (12− 1) · 5.0652 + (12− 1) · 4.9812 + (9− 1) · 5.7612 = 820.6234

and

SSbetween = 12 · (23.750− y)2 + 12 · (14.083− y)2 + 9 · (15.778− y)2) = 625.1793

15 / 27


Sums of Squares

source of degrees of sums of meanvariation freedom squares square

between groups q − 1 SSbetween s2between = SSbetween/(q − 1)

residuals n − q SSresidual s2residual = SSresidual/(n − q)

total n − 1 SStotal

• The q − 1 degrees of freedom between groups is derived from the q groups minus1 degree of freedom used to compute y .

• The n − q degrees of freedom within the groups is derived from the nj − 1 degreeof freedom used to compute the variances s2

j .

16 / 27


Sums of Squares

The analysis of variance information for the Borneo rain forest data is summarized inthe table below.

source of degrees of sums of meanvariation freedom squares square

between groups 2 625.2 312.6

residuals 30 820.6 27.4

total 32 1445.8

17 / 27


The F Statistic

The test statistic is

F =s2between

s2residual

=SSbetween/(q − 1)

SSresidual/(n − q).

• Under the null hypothesis, F is a constant multiple of the ratio of twoindependent χ2 random variables, namely SSbetween and SSresidual .

• This ratio is called an F random variable with q− 1 numerator degrees of freedomand n − q denominator degrees of freedom and written Fq−1,n−q

18 / 27


F Statistic.

For the rain forest data,

F =s2between

s2residual

=312.6

27.4= 11.43.

The critical value for an 0.01 leveltest is 5.390. So, we reject H0 stat-ing mean number of trees does notdepend on logging history.

> 1-pf(11.43,2,30)

[1] 0.0002041322

> qf(0.99,2,30)

[1] 5.390346

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

x

df(

x,

2,

30

)

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

Figure: Upper tail critical values. The density for anF2,30 random variable. The indicated values 3.316,4.470, and 5.390 are critical values for significance levelsα = 0.05, 0.02, and 0.01, respectively.

Exercise. Use R to determine these critical values.19 / 27


Confidence IntervalsConfidence intervals are determined using the data from all of the groups as anunbiased estimate s2

residuals = SSresiduals/(n − q) for the variance, σ2. This allows us toincrease the degrees of freedom in the t distribution and reduce the margin of error.Thus, the γ-level confidence interval for µj is

yj ± t(1−γ)/2,n−qsresidual/√nj .

The interval for the difference in µj − µk is similar to that for a pooled two-sample tconfidence interval,

yj − yk ± t(1−γ)/2,n−qsresidual

√1

nj+

1

nk.

The 95% confidence interval for mean number of trees on a lot logged 1 year ago

14.083± 2.042

√27.4√12

= 14.083± 4.714 = (9.369, 18.979).

Exercise. Give the 95% confidence interval for the difference in trees between plotsnever logged plots versus logged 8 years ago.

20 / 27


Honey Bee Queen Development Time• The development time for a

European queen in a honey beehive is suspected to depend onthe temperature of the hive.

• To examine this, queens arereared in a low (31.1◦ C), amedium (32.8◦ C) and a hightemperature hive (34.4◦ C).

• The hypothesis is that highertemperatures increase metabolismrate and thus reduce the timeneeded from the time the egg islaid until an adult queen honeybee emerges from the cell.

Figure: Emerging adult honey bee queen

21 / 27


Honey Bee Queen Development TimeThe hypothesis is

H0 : µlow = µmed = µhigh versus H1 : µlow , µmed , µhigh differ

where µlow ,µmed , and µhigh are, respectively, the mean development time in days forqueen eggs reared in a low, a medium, and a high temperature hive.

Here are the data and a boxplot:

> ehblow<-c(16.2,14.6,15.8,15.8,15.8,15.8,16.2,16.7,15.8,16.7,15.3,14.6,

15.3,15.8)

> ehbmed<-c(14.5,14.7,15.9,15.5,14.7,14.7,14.7,15.5,14.7,15.2,15.2,15.9,

14.7,14.7)

> ehbhigh<-c(13.9,15.1,14.8,15.1,14.5,14.5,14.5,14.5,13.9,14.5,14.8,14.8,

13.9,14.8,14.5,14.5,14.8,14.5,14.8)

> boxplot(ehblow,ehbmed,ehbhigh)

22 / 27


Honey Bee Queen Development Time

> ehb<-c(ehblow,ehbmed,ehbhigh)

> temp<-c(rep(1,length(ehblow)),

rep(2,length(ehbmed)),

rep(3,length(ehbhigh)))

> ftemp<-factor(temp,c(1:3))

> anova(lm(ehb~ftemp))

1 2 3

14.0

14.5

15.0

15.5

16.0

16.5

Analysis of Variance Table

Response: ehb

Df Sum Sq Mean Sq F value Pr(>F)

ftemp 2 11.222 5.6111 23.307 1.252e-07 ***

Residuals 44 10.593 0.2407

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

23 / 27


Contrasts

After completing a one way analysis of variance, resulting, as above, in rejecting thenull hypotheses, a typical follow-up procedure is the use of contrasts. Contrasts use asa null hypothesis that some linear combination of the means equals to zero.

To see if the mean queen development time for medium hive temperature is midwaybetween the time for the high and low temperature hives, we have the contrast,

H0 :1

2(µlow + µhigh) = µmed versus H1 :

1

2(µlow + µhigh) 6= µmed

or

H0 :1

2µlow − µmed +

1

2µhigh = 0 versus H1 :

1

2µlow − µmed +

1

2µhigh 6= 0.

24 / 27


ContrastsNotice that, under the null hypothesis, the mean

E

[1

2Ylow − Ymed +

1

2Yhigh

]=

1

2µlow − µmed +

1

2µhigh = 0

and the variance

Var

(1

2Ylow − Ymed +

1

2Yhigh

)=

1

4

σ2

nlow+

σ2

nmed+

1

4

σ2

nhigh.

This leads to the test statistic

t =12 ylow − ymed + 1

2 yhigh

sresidual√

14nlow

+ 1nmed

+ 14nhigh

=12 15.743− 15.043 + 1

2 14.563

0.4906√

14·14 + 1

14 + 14·19

= 0.7005.

The p-value,

> 2*(1-pt(0.7005,44))

[1] 0.487303

again, is considerably too high to reject the null hypothesis. 25 / 27


ContrastsIf we want to see if the rain forest has seen a change in logged areas over the past 8years in the mean number of trees. This can be written as

H0 : µ2 = µ3 versus H1 : µ2 6= µ3

orH0 : µ2 − µ3 = 0 versus H1 : µ2 − µ3 6= 0

Under the null hypothesis, the test statistic has a t-distribution with n − q= 33− 3 = 30 degrees of freedom. Here

t =y2 − y3

sresidual

√1n2

+ 1n3

=14.083− 15.778

5.234√

112 + 1

9

= −0.7344,

Exercise. Compute the p-value for this two-sided test and comment on the strength ofthe evidence against the null hypothesis.

26 / 27


ContrastsExercise. Under the null hypothesis appropriate for one way analysis of variance, withnj observations in group j = 1, . . . , q and Yj =

∑nji=1 Yij/nj ,

E [c1Y1 + · · ·+ cqYq] = c1µ1 + · · ·+ cqµq, Var(c1Y1 + · · ·+ cqYq) =c2

1σ2

n1+ · · ·+

c2qσ

2

nq.

In general, a contrast begins with a linear combination of the means

ψ = c1µ1 + · · ·+ cqµq.

The hypothesis isH0 : ψ = 0 versus H1 : ψ 6= 0.

For sample means, y1, . . . , yq, the test statistic is

t =c1y1 + · · ·+ cq yq

sresidual

√c2

1

n1+ · · ·+ c2

q

nq

.

which, under the null hypothesis, has a t distribution with n − q degrees of freedom.27 / 27

Documents

Chapter 11 Asymptotic Evaluationsjwatkins/J3_anova.pdf · 2020. 4. 28. · Chapter 11 Asymptotic Evaluations Analysis of Variance 1/27. Linear Models One Way Analysis of VarianceCon