
  • Stat 502 Design and Analysis of Experiments

    One-Factor ANOVA

    Fritz Scholz

    Department of Statistics, University of Washington

    January 7, 2014

    1

  • One-Factor ANOVA

    ANOVA is an acronym for Analysis of Variance.

    The primary focus is the difference in means of several populations, or the difference in mean response under several treatments.

    Variance in ANOVA alludes to the analysis technique.

    The overall data variation is decomposed into several variation components.

    How much of that variation is due to changing the sampled population or changing the treatment?

    How much variation cannot be attributed to such systematic changes?

    2

  • ANOVA Illustrated

    [Figure: measurements (about 25 to 35) shown three ways: the full data set with its grand mean, the data broken down into individual samples with their sample means (residual variation), and the decomposed variation.]

    3

  • The Notion of Factor in One-Factor ANOVA

    It is difficult to explain the notion of 2-dimensional space to someone who has lived only in 1-dimensional space, or 3-dimensional space to someone who lives in flatland, or 4-dimensional space to us in the "real" 3-dimensional world.

    The term Factor similarly alludes to different possible directions/dimensions in which changes can take place in populations or in treatments.

    Example: In soldering circuit boards we could have several types of flux (say 3) and also several methods of cleaning the boards (say 4).

    Combining each with each, we thus could have 3 × 4 = 12 distinct treatments.

    However, it is more enlightening to view the effects of flux and cleaning method separately. Each would be called a factor: the flux factor and the cleaning factor.

    We can then ask which factor is responsible for changes in the mean response.

    4

  • More Than 2 Treatments or Populations

    Again we deal with circuit boards.

    Now we investigate 3 types of fluxes: X, Y, Z.

    With 18 circuit boards, randomly assign each flux to 6 boards.

    In principle, this gives us the randomization reference distribution and thus a logical basis for a test of the hypothesis H0: no flux differences.

    Randomize the order of soldering/cleaning, coating, and humidity chamber slots. These randomizations avoid unintended biases from hidden factors (dimensions).

    (18 choose 6) × (12 choose 6) × (6 choose 6) = 18,564 · 924 · 1 = 17,153,136 flux allocations.

    Note the growth in the number of splits when dividing 18 into 3 groups of 6.

    The full randomization reference distribution may be pushing the computing limits =⇒ simulated reference distribution.
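    A quick R check of this count:

    choose(18, 6) * choose(12, 6) * choose(6, 6)   # = 17153136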

    5

  • The Flux3 Data

    SIR Responses (units: log10(Ohm))

       X     Y     Z
      9.9  10.7  10.9
      9.6  10.4  11.0
      9.6   9.5   9.5
      9.7   9.6  10.0
      9.5   9.8  11.7
     10.0   9.9  10.2

    [Figure: SIR (log10(Ohm)), about 9.5 to 11.5, plotted by Flux X, Y, Z.]

    6

  • Differences in the Fluxes?

    To examine whether the fluxes are in some way different in their effects we could again focus on differences between the means of the SIR responses.

    Denote these means by µ1 = µX, µ2 = µY, and µ3 = µZ.

    Mathematically, X ≡ Y and Y ≡ Z =⇒ X ≡ Z. It would seem that testing H0,XY : X ≡ Y and H0,YZ : Y ≡ Z might suffice. Statistically, X ≈ Y and Y ≈ Z allows for the possibility that X and Z are sufficiently different.

    To guard against this we could perform all 3 possible 2-sample tests for the respective hypothesis testing problems:

    H0,XY : X ≡ Y  vs.  H1,XY : µX ≠ µY
    H0,YZ : Y ≡ Z  vs.  H1,YZ : µY ≠ µZ
    H0,XZ : X ≡ Z  vs.  H1,XZ : µX ≠ µZ

    7

  • Probability of Overall Type I Error?

    If we do each such test at level α, what is our chance of getting a rejection by at least one of these tests when in fact all 3 fluxes are equivalent? (Compare 2 versus 4 engines on aircraft, a controversy between Boeing and Airbus.)

    If these 3 tests are independent of each other we would have

    P0(Overall Type I Error)
    = P0(reject at least one of the hypotheses)
    = 1 − P0(accept all of the hypotheses)
    = 1 − P0(accept H0,XY ∩ accept H0,XZ ∩ accept H0,YZ)
    = 1 − (1 − α)³ = 0.142625 for α = .05.

    P0 indicates that all 3 fluxes are the same and that we are dealing with the null or randomization reference distribution.

    8

  • Engine Failure

    If pF = probability of in-flight shutdown = 1/10000, the chance of at least one shutdown on a flight with k engines is

    P(at least one shutdown) = 1 − (1 − pF)^k ≈ k × pF.

    k    1 − (1 − pF)^k    k × pF
    2    .00019999         .0002
    4    .00039994         .0004

    (1 − pF)^k assumes that engine shutdowns are independent. This independence is the goal of ETOPS (Extended-range Twin-engine Operational Performance Standards), http://en.wikipedia.org/wiki/ETOPS

    E.g., different engines are serviced by different mechanics, or at separate maintenance events.

    9


  • The Multiple Comparison Issue

    If you expose yourself to multiple rare opportunities of making a wrong decision, the chance of making a wrong decision at least once (the overall type I error) is much higher than planned for in the individual tests.

    This problem is referred to as the multiple comparison issue.

    How much higher is it?

    A calculation based on independence is not correct.

    Any 2 comparisons involve a common sample ⇒ dependence. Boole's inequality bounds the overall type I error probability:

    π0 = P0(Overall Type I Error)
       = P0(reject H0,XY ∪ reject H0,XZ ∪ reject H0,YZ)
       ≤ P0(reject H0,XY) + P0(reject H0,XZ) + P0(reject H0,YZ)
       = 3α = .15 when α = .05
       = expected number of false rejections

    How much smaller than this upper bound is the true π0?

    10

  • Overall Type I Error Probability

    Evaluate it for the randomization reference distribution.

    Get the randomization reference distribution of X̄ − Ȳ for splits of the 18 SIR values into 3 groups of 6.

    Take the difference of averages for the first two groups.

    Do this by simulation: Nsim0 = 10000 times.

    For α = .05 get the .95-quantile tcrit of this simulated |X̄ − Ȳ| reference distribution. It serves equally well for tests based on |X̄ − Z̄| or |Ȳ − Z̄|. Why?

    Then simulate another Nsim1 = 10000 such splits, computing |X̄ − Ȳ|, |X̄ − Z̄|, and |Ȳ − Z̄| each time, and tally the proportions of each individually exceeding tcrit and the proportion of at least one of them exceeding tcrit.

    The resulting proportions are 0.0451, 0.0460, 0.0491 for the individual tests (≈ the targeted α = .05) and 0.1186 for the overall type I error rate.

    The code typeIerror.rateRand is posted on the web.
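    A minimal sketch of this simulation (not the posted typeIerror.rateRand code; SIR is assumed to hold the 18 Flux3 values in X, Y, Z order):

    SIR <- c(9.9, 9.6, 9.6, 9.7, 9.5, 10.0,      # X
             10.7, 10.4, 9.5, 9.6, 9.8, 9.9,     # Y
             10.9, 11.0, 9.5, 10.0, 11.7, 10.2)  # Z
    all3 <- function(y) {                  # all three |mean differences| of a random split
      g <- sample(rep(1:3, each = 6))      # random split into 3 groups of 6
      m <- tapply(y, g, mean)
      c(abs(m[1] - m[2]), abs(m[1] - m[3]), abs(m[2] - m[3]))
    }
    set.seed(1)
    ref <- replicate(10000, all3(SIR)[1])  # reference distribution of |Xbar - Ybar|
    tcrit <- quantile(ref, 0.95)           # critical point for alpha = .05
    D <- replicate(10000, all3(SIR))       # fresh splits, all three statistics
    rowMeans(D > tcrit)                    # individual rejection rates, each ~ .05
    mean(apply(D > tcrit, 2, any))         # overall type I error rate, ~ .12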

    11

  • A Global Testing View

    Rather than using all 3 pairwise test statistics |X̄ − Ȳ|, |X̄ − Z̄|, and |Ȳ − Z̄| separately, we will address this in a global way, using a single discrepancy statistic.

    For now we will focus on the population view.

    In the context of a 3-population model we will test the hypothesis H0 : µ1 = µ2 = µ3 (common value unspecified =⇒ composite hypothesis) against the alternative H1 : µi ≠ µj for some i ≠ j.

    More generally we may have t treatments and ni observations Yi,1, ..., Yi,ni for the i-th treatment, i = 1, ..., t.

    Test H0 : µ1 = ... = µt against H1 : µi ≠ µj for some i ≠ j.

    For the Flux3 data: t = 3, n1 = n2 = n3 = 6, a balanced design.

    When the ni are not all the same =⇒ unbalanced design.

    12

  • Useful Models for Treatment Variation

    We have measurements Yij, the j-th response under the i-th treatment, j = 1, ..., ni and i = 1, ..., t.

    A total of N = n1 + ... + nt measurements.

    Treatment Means Model:

    Yij = µi + εij  with  E(εij) = 0 and var(εij) = σ²

    View the εij (i.i.d.) as response variation/error/noise.

    Treatment Effects Model:

    Yij = µ + τi + εij  with  E(εij) = 0 and var(εij) = σ²

    µ = µ̄ = Σ_ij µi/N = Σ_i ni µi/N = grand mean, or ni/N-weighted average of the µi

    τi = µi − µ = µi − µ̄ is the i-th treatment effect

    εij (i.i.d.) = within treatment variation, E(εij) = 0, var(εij) = σ².

    Note: the τi satisfy the constraint Σ_ij τi = Σ_i ni τi = 0.

    13

  • The Reduced Model

    In contrast to the full model with varying treatment means, as discussed on the previous slide, we assume in the reduced model a single mean for all observations:

    Yij = µ + εij  with  E(εij) = 0 and var(εij) = σ²,

    i.e., there is no variation or change due to treatments.

    The reduced model corresponds to our stated hypothesis

    H0 : µ1 = ... = µt  or equivalently  H0 : τ1 = ... = τt = 0

    This is a special case of our previous full population model.

    Test this hypothesis by fitting the full model and the reduced model to the data and comparing the quality of the fits relative to each other via some discrepancy metric.

    14

  • Full Model Fitting by Least Squares (Gauss/Legendre)

    Minimize the Sum of Squares criterion

    SS(µ1, ..., µt) = Σ_{i=1}^t Σ_{j=1}^{ni} (Yij − µi)²  over µ = (µ1, ..., µt).

    Using Ȳi. = Σ_{j=1}^{ni} Yij/ni, the fact Σ_{j=1}^{ni} (Yij − Ȳi.) = 0, and (a+b)² = a² + b² + 2ab:

    SS(µ1, ..., µt) = Σ_i Σ_j (Yij − Ȳi. + Ȳi. − µi)²
                    = Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − µi)² + 2 Σ_i Σ_j (Yij − Ȳi.)(Ȳi. − µi)
                    = Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − µi)²  ≥  Σ_i Σ_j (Yij − Ȳi.)²

    The least squares estimates (LSE) µ̂i = Ȳi. minimize SS(µ1, ..., µt) =⇒ SS(µ̂1, ..., µ̂t) = Σ_i Σ_j (Yij − Ȳi.)².

    15

  • The Dot Notation

    If a1, ..., an are n numbers then

    a. = Σ_{i=1}^n ai  and  ā. = Σ_{i=1}^n ai/n.

    For an array of numbers aij, i = 1, ..., m, j = 1, ..., n, write

    a.j = Σ_{i=1}^m aij,  ā.j = Σ_{i=1}^m aij/m,  ai. = Σ_{j=1}^n aij,  āi. = Σ_{j=1}^n aij/n,

    a.. = Σ_{i=1}^m Σ_{j=1}^n aij  and  ā.. = Σ_{i=1}^m Σ_{j=1}^n aij/(mn).

    Similarly for higher dimensional arrays aijk, i = 1, ..., m, j = 1, ..., n, k = 1, ..., ℓ:

    aij. = Σ_{k=1}^ℓ aijk  and  āij. = Σ_{k=1}^ℓ aijk/ℓ,  and so on.

    16

  • Reduced Model Fitting by Least Squares

    Minimize the SS-criterion SS(µ) = Σ_{i=1}^t Σ_{j=1}^{ni} (Yij − µ)².

    Ȳ.. = Σ_ij Yij / Σ_i ni = Σ_i (ni/N) Ȳi.  =⇒  Σ_i Σ_j (Yij − Ȳ..) = 0

    =⇒ SS(µ) = Σ_ij (Yij − µ)² = Σ_ij (Yij − Ȳ.. + Ȳ.. − µ)²
             = Σ_ij (Yij − Ȳ..)² + Σ_ij (Ȳ.. − µ)² + 2 Σ_ij (Yij − Ȳ..)(Ȳ.. − µ)
             = Σ_ij (Yij − Ȳ..)² + Σ_ij (Ȳ.. − µ)²  ≥  Σ_ij (Yij − Ȳ..)²

    The least squares estimate (LSE) µ̂ = Ȳ.. minimizes SS(µ) =⇒ SS(µ̂) = Σ_ij (Yij − Ȳ..)².

    17

  • Means and Variances of Least Squares Estimates

    E(Ȳi.) = E( (1/ni) Σ_{j=1}^{ni} Yij ) = (1/ni) Σ_j E(Yij) = (1/ni) Σ_j µi = µi

    var(Ȳi.) = var( (1/ni) Σ_{j=1}^{ni} Yij ) = (1/ni²) Σ_j var(Yij) = (1/ni²) Σ_j σ² = σ²/ni

    E(Ȳ..) = E( (1/N) Σ_i Σ_j Yij ) = E( Σ_i (ni/N) Ȳi. ) = Σ_i (ni/N) µi = µ̄

    var(Ȳ..) = var( Σ_i (ni/N) Ȳi. ) = Σ_i (ni/N)² var(Ȳi.) = Σ_i (ni/N²) σ² = σ²/N

    18

  • Sum of Squares (SS) Decomposition

    Using Σ_{j=1}^{ni} (Yij − Ȳi.) = 0 =⇒ sum of squares decomposition

    SST = Σ_i Σ_j (Yij − Ȳ..)² = Σ_i Σ_j (Yij − Ȳi. + Ȳi. − Ȳ..)²
        = Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − Ȳ..)² + 2 Σ_i Σ_j (Yij − Ȳi.)(Ȳi. − Ȳ..)
        = Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − Ȳ..)²
        = SSE + SSTreat

    ANOVA decomposition of total SS variation SST:
    SST = SSE + SSTreat = SSW + SSB
    SST = error variation + treatment variation
    SST = variation within samples + variation between samples.

    19

  • How to Compare the Model Fits?

    How should we compare the two model fits

    SSE = Σ_i Σ_j (Yij − Ȳi.)²  and  SST = Σ_i Σ_j (Yij − Ȳ..)²  ?

    Under H0 (reduced model) both fits should be somewhat comparable, except that the full model fit gave us more freedom in minimizing the sum of squares. The previous slide showed

    SSTreat + SSE = SST = Σ_i Σ_j (Yij − Ȳ..)²  ≥  Σ_i Σ_j (Yij − Ȳi.)² = SSE

    with SST − SSE = SSTreat = Σ_i Σ_j (Ȳi. − Ȳ..)².

    For a fair comparison make allowances for this extra freedom.

    Understand E(SST) and E(SSE) when H0 is true or false.

    20

  • Unbiasedness of s²: E(s²) = σ²

    Assume X1, ..., Xn are i.i.d. with mean µ and variance σ².

    E( (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² ) = E(s²) = σ²  ⇒  s² is unbiased.

    Using E(Y²) = var(Y) + [E(Y)]²:

    E((n − 1)s²) = E( Σ_{i=1}^n (Xi − X̄)² ) = E( Σ_{i=1}^n (Xi² − 2Xi X̄ + X̄²) ) = E( Σ_{i=1}^n Xi² − nX̄² )
                 = n(σ² + µ²) − n(var(X̄) + [E(X̄)]²)
                 = n(σ² + µ²) − n(σ²/n + µ²) = (n − 1)σ²

    =⇒ E(s²) = σ²

    21

  • E(MSE) = σ²

    si² = Σ_{j=1}^{ni} (Yij − Ȳi.)²/(ni − 1)  =⇒  Σ_{i=1}^t (ni − 1)si² = SSE

    and the result from the previous slide shows

    E( Σ_i Σ_j (Yij − Ȳi.)² ) = E( Σ_{i=1}^t (ni − 1)si² ) = Σ_{i=1}^t (ni − 1)σ² = (N − t)σ²

    so the Mean Square for Error

    MSE = SSE/(N − t) = Σ_i Σ_j (Yij − Ȳi.)² / (N − t)

    is an unbiased estimate for σ².

    This is true whether H0 : µ1 = ... = µt holds or not (and without normality).

    22

  • E(MSTreat) = σ² + ?

    SSTreat = Σ_{i=1}^t ni(Ȳi. − Ȳ..)² = Σ_i ni(Ȳi.² − 2Ȳi.Ȳ.. + Ȳ..²) = Σ_i ni Ȳi.² − NȲ..²

    =⇒ E(SSTreat) = Σ_i ni E(Ȳi.²) − N E(Ȳ..²)   (with/without normality)
                  = Σ_i ni (var(Ȳi.) + [E(Ȳi.)]²) − N (var(Ȳ..) + [E(Ȳ..)]²)
                  = Σ_i ni (σ²/ni + µi²) − N (σ²/N + µ̄²)
                  = (t − 1)σ² + Σ_i ni(µi − µ̄)²

    E(MSTreat) = E( SSTreat/(t − 1) ) = σ² + Σ_i ni(µi − µ̄)²/(t − 1) = σ² + Σ_i ni τi²/(t − 1).

    23

  • A Test Statistic for H0

    Under H0 both MSTreat and MSE are unbiased estimates of σ².

    H0 is false =⇒ Σ_{i=1}^t ni(µi − µ̄)²/(t − 1) > 0 =⇒ E(MSTreat) > E(MSE).

    MSTreat will then generally be somewhat larger than MSE.

    More so when the µi are more dispersed.

    The ni act as magnifiers!

    This suggests F = MSTreat/MSE as a plausible test statistic.

    Looking at the ratio makes more sense than looking at the difference.

    Any such difference should be viewed relative to MSE.

    Use this test statistic also in our randomization test.

    24

  • Equivalent Form for the F -Statistic under Randomization

    In SST = SSTreat + SSE the sum SST stays constant over all partitions of the full data set into t groups of sizes n1, ..., nt.

    SSTreat = Σ_{i=1}^t ni Ȳi.² − NȲ..² = Fequiv − NȲ..²

    Fequiv = Σ_{i=1}^t ni Ȳi.² varies with the partition splits.

    Ȳ.. also stays constant over all such partition splits.

    F = [(N − t)/(t − 1)] · SSTreat/SSE = [(N − t)/(t − 1)] · SSTreat/(SST − SSTreat)
      = [(N − t)/(t − 1)] · (Fequiv − NȲ..²)/(SST − (Fequiv − NȲ..²))  is increasing (↗) in Fequiv.

    Thus the randomization distribution of F is in 1-1 correspondence with the randomization distribution of Fequiv.

    We can then take it as an alternate and more easily calculable test statistic for computing p-values under H0.

    25

  • Randomization Distribution for Flux3

    [Figure: histogram (frequency) of the simulated randomization distribution of the F-equivalent test statistic, range about 1830 to 1835; observed F-equivalent test statistic 1832.3; p-value = 0.04296; based on 1e+05 simulations.]

    26

  • R Code for Randomization Distribution

    Ftest.rand [function listing truncated in this copy; a minimal sketch follows]
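    A minimal sketch of such a randomization computation (the hypothetical helper Fequiv.rand below is not necessarily the posted Ftest.rand; SIR and FLUX as defined on slide 32):

    Fequiv.rand <- function(y, groups, Nsim = 1e5) {
      n <- table(groups)                                  # fixed group sizes
      Fequiv <- function(g) sum(n * tapply(y, g, mean)^2) # sum_i n_i * Ybar_i.^2
      obs <- Fequiv(groups)
      ref <- replicate(Nsim, Fequiv(sample(groups)))      # random re-splits of the data
      mean(ref >= obs)                                    # randomization p-value
    }
    ## e.g. Fequiv.rand(SIR, factor(FLUX)) gives a p-value near 0.043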

  • F -Distribution as Approximation to the Randomization Distribution

    As in the case of the 2-sample problem one finds that the F_{t−1,N−t} distribution often provides a good approximation to the randomization distribution of F.

    The randomization distribution of F is obtained from that of Fequiv via

    F = [(N − t)/(t − 1)] · (Fequiv − NȲ..²)/(SST − (Fequiv − NȲ..²))

    The next slide shows the quality of this approximation for the Flux3 data set.

    28

  • Randomization Distribution for Flux3

    [Figure: density histogram of the simulated randomization distribution of the F-statistic (0 to 10) with the superimposed F-density; observed F-statistic 3.6452; p-value = 0.04296, and p-value = 0.05126 from the F-distribution; based on 1e+05 simulations.]

    29

  • Assuming Normality

    We now assume that the Yij are independent, normal r.v.'s with the previously indicated model parameters.

    Whether H0 : µ1 = ... = µt is true or not, (ni − 1)si² ∼ σ²χ²_{ni−1}. Further, s1², ..., st² are independent and thus

    SSE = Σ_{i=1}^t (ni − 1)si² ∼ σ²χ²_{n1−1} + ... + σ²χ²_{nt−1} ∼ σ²χ²_{N−t}

    SSE is independent of Ȳ1., ..., Ȳt., since si² and Ȳi. are independent for all i and all pairs (si², Ȳi.) are independent. =⇒ SSE and SSTreat are independent.

    Is SSTreat = Σ_{i=1}^t ni Ȳi.² − NȲ..² ∼ σ²χ²_f ?

    What degrees of freedom f?

    Under H0 we expect f = t − 1 since E(MSTreat) = E(SSTreat/(t − 1)) = σ².

    30

  • The Distribution of F

    The previous slide and Appendix A, slide 148, establish the following: SSE and SSTreat are independent and

    SSE/σ² ∼ χ²_{N−t}  and  SSTreat/σ² ∼ χ²_{t−1,λ}  with  λ = Σ_{i=1}^t ni(µi − µ̄)²/σ²

    =⇒ F = [SSTreat/(t − 1)] / [SSE/(N − t)] ∼ F_{t−1,N−t,λ}

    For H0 : µ1 = ... = µt this becomes the F_{t−1,N−t} distribution.

    We reject H0 whenever F ≥ F_{t−1,N−t}(1 − α) = Fcrit = qf(1 − α, t − 1, N − t) = (1 − α)-quantile of the F_{t−1,N−t} distribution.

    Power function: β(λ) = Pλ(F ≥ F_{t−1,N−t}(1 − α)) = 1 − pf(Fcrit, t − 1, N − t, λ)
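    In R, for the Flux3 layout (t = 3, N = 18) and an illustrative noncentrality value λ = 2:

    t.treat <- 3; N <- 18; alpha <- 0.05
    Fcrit <- qf(1 - alpha, t.treat - 1, N - t.treat)           # reject H0 when F >= Fcrit
    power <- 1 - pf(Fcrit, t.treat - 1, N - t.treat, ncp = 2)  # beta(lambda) at lambda = 2
    c(Fcrit = Fcrit, power = power)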

    31

  • R's anova and lm Applied to Flux3

    > SIR <- c(9.9, 9.6, 9.6, 9.7, 9.5, 10.0, 10.7, 10.4, 9.5, 9.6,   # assignment lines
    +          9.8, 9.9, 10.9, 11.0, 9.5, 10.0, 11.7, 10.2)           # reconstructed from
    > FLUX <- rep(c("X", "Y", "Z"), each = 6)                         # the printed output
    > SIR
     [1]  9.9  9.6  9.6  9.7  9.5 10.0 10.7 10.4  9.5  9.6
    [11]  9.8  9.9 10.9 11.0  9.5 10.0 11.7 10.2
    > FLUX
     [1] "X" "X" "X" "X" "X" "X" "Y" "Y" "Y" "Y" "Y" "Y"
    [13] "Z" "Z" "Z" "Z" "Z" "Z"
    > anova(lm(SIR ~ as.factor(FLUX)))  # see ?anova & ?lm
    Analysis of Variance Table

    Response: SIR
                    Df Sum Sq Mean Sq F value  Pr(>F)
    as.factor(FLUX)  2 2.1733  1.0867  3.6452 0.05126 .
    Residuals       15 4.4717  0.2981
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    32

  • Discussion of Noncentrality Parameter λ

    The power of the ANOVA F-test is a monotone function of λ = Σ_{i=1}^t ni(µi − µ̄)²/σ² (see Appendix B, slide 151).

    Let us consider the drivers in λ.

    λ ↗ as σ ↘, provided the µi are not all the same. The more difference between the µi, the higher λ.

    Increasing the sample sizes will magnify ni(µi − µ̄)². For fixed σ and µi not all equal, the power increases strictly:

    ∂(λσ²)/∂ni = (µi − µ̄)² − Σ_j 2nj(µj − µ̄)(µi − µ̄)/N = (µi − µ̄)² > 0

    since ∂µ̄/∂ni = ∂( Σ_j nj µj / Σ_j nj )/∂ni = (µi − µ̄)/N.

    The sample sizes we can plan for.

    Later: Blocking units into homogeneous groups ⇒ smaller σ.

    33

  • Optimal Allocation of Sample Sizes?

    We have N experimental units available for testing the effects of t treatments, and suppose that N is a multiple of t, say N = n × t (n and t integer).

    It would seem best to use samples of equal size n for each of the t treatments, i.e., we would opt for a balanced design.

    Then we would not emphasize one treatment over any other.

    Is there an optimality criterion that could be used as justification?

    Plan for a balanced design upfront. How large should n be?

    Then something goes wrong with a few observations and they have to be discarded from the analysis.

    Thus we need to be prepared for unbalanced designs.

    34

  • A Sample Size Allocation Rationale

    We may be concerned with alternatives where all means but one are the same.

    We want to achieve a given power β against such a mean, which deviates by ∆ from the other means (which coincide).

    Since we won't know upfront which mean sticks out, we would want to maximize the minimum power against all such contingencies. Max-Min Strategy!

    If µ1 = µ + ∆ and µ2 = ... = µt = µ, then µ̄ = µ + n1∆/N.

    With a bit of algebra we get

    λ1 = Σ_{i=1}^t ni(µi − µ̄)²/σ² = (N∆²/σ²) · (n1/N)(1 − n1/N) = (N∆²/σ²) R1

    and similarly λi = (N∆²/σ²) · (ni/N)(1 − ni/N) = (N∆²/σ²) Ri for the other cases.

    35

  • The Max-Min Solution

    For fixed σ > 0 and ∆ ≠ 0 the maxmin power

    max_{n1,...,nt} min_{1≤i≤t} [λi] = max_{n1,...,nt} min_{1≤i≤t} [(N∆²/σ²) Ri]

    is achieved when n1 = ... = nt. Here Ri = (ni/N)(1 − ni/N).

    Reason: Ri = (ni/N)(1 − ni/N) increases for ni/N ≤ 1/2. Since n1 + ... + nt = N = nt is fixed, one can increase the smallest Ri only at the expense of lowering some higher Rj. This increase only happens when something is left to lower.

    ⇒ max_{n1,...,nt} min_{1≤i≤t} [λi] = (N∆²/σ²)(n/N)(1 − n/N)  for n = n1 = ... = nt
                                       = n(∆²/σ²)(1 − n/(nt)) = n(∆²/σ²)(t − 1)/t = n · λ0.

    Interpret λ0 = (∆²/σ²)(t − 1)/t more generally as Σ(µi − µ̄)²/σ².

    36

  • An Alternate Rationale (Dean and Voss, p. 52)

    Let C∆ = {µ = (µ1, ..., µt) : max(µ) − min(µ) ≥ ∆} for ∆ > 0.

    For fixed σ > 0, ∆ > 0, find sample sizes n1, ..., nt (with Σ ni = nt = N fixed) to maximize the power, i.e., maximize

    min_{µ∈C∆} λ(µ) = N min_{µ∈C∆} Σ_{i=1}^t pi(µi − µ̄)²/σ²  with  pi = ni/N

    It can again be shown that equal sample size allocation, i.e., n1 = ... = nt = n, is the optimal (max-min) strategy.

    Suppose µ1 ≤ µ2, ..., µt−1 ≤ µt = µ1 + ∆; then λ(µ) is minimized over the restricted µ2, ..., µt−1 when these are = µ̄:

    µ̄ = (p2 + ... + pt−1)µ̄ + p1µ1 + pt(µ1 + ∆)

    =⇒ µ̄ = [p1/(p1 + pt)]µ1 + [pt/(p1 + pt)](µ1 + ∆) = p̃1µ1 + p̃t(µ1 + ∆)

    =⇒ µt − µ̄ = ∆p̃1  and  µ1 − µ̄ = −p̃t∆

    min_{µ∈C∆} λ(µ) = [N(p1 + pt)∆²/σ²](p̃1p̃t² + p̃1²p̃t) = (N∆²/σ²) · p1pt/(p1 + pt)

    For fixed p1 + pt this is maximized by p1 = pt, i.e., n1 = nt. q.e.d.

    37

  • Some Discussion of maxmin Results

    The previous two maxmin situations are different.

    In the first we stipulated one mean different from the others, with the others assumed to have no treatment effect, i.e., to be the same.

    In the second we assumed two means to differ by ∆ > 0, while the others were somewhere in between, i.e., we focus on the maximum treatment difference.

    In either case nothing was known about the indices of the differing treatment effects.

    My hunch is that more general results like this can be formulated and solved with the same resolution.

    A research problem for formulation and resolution!?

    38

  • sample.sizeANOVA (see web page)

    Just as in the case of planning appropriate sample sizes for the two-sample situation, the F-test encounters the same difficulties in terms of the varying impacts of the common sample size n per treatment.

    n affects the critical point of the level-α F-test through tcrit = qf(1 − alpha, t − 1, N − t) = qf(1 − alpha, t − 1, n*t − t).

    n also enters the power function 1 − pf(tcrit, t − 1, n*t − t, lambda), and n enters the power function through λ. Here λ = n(∆/σ)²(t − 1)/t or λ = n(∆/σ)²/2.

    In either case we should have a reasonable upper bound σu for σ, or express ∆ not in absolute terms but in relation to the unknown σ by specifying ∆/σ.

    To facilitate the choice of an appropriate n per treatment, the function sample.sizeANOVA is given on the class web page.

    39

  • Usage of sample.sizeANOVA

    function (delta.per.sigma = .5, t.treat = 3, nrange = 2:30,
              alpha = .05, power0 = NULL)
    {
    # delta.per.sigma is the ratio of delta over sigma for which
    # one wants to detect a delta shift in one mean while all other
    # means stay the same, or delta is the maximum difference
    # between any two means to be detected. t.treat is the number of
    # treatments. alpha is the desired significance level. nrange is a
    # range of sample sizes over which the power will be calculated
    # for that delta.per.sigma. power0 is an optional value for the
    # target power that will be highlighted on the plot.
    ...
    }

    40

  • Example Usage of sample.sizeANOVA

    The following three function calls invoke the default t.treat=3 to produce the plots on the next three slides.

    > sample.sizeANOVA()
    > sample.sizeANOVA(nrange=30:100)
    > sample.sizeANOVA(nrange=70:100, power0=.9)

    n = 77 is the minimal sample size under the first rationale.

    n = 103 is the minimal sample size under the alternate rationale.

    41

  • Sample Size Determination

    [Figure: power curves β(λ) = β(n·λ0) and β(λ) = β(n·λ1) against the minimum sample size n (2 to 30) per each of t = 3 treatments, for ∆/σ = 0.5 and α = 0.05; λ0 = (∆/σ)²·(t − 1)/t (all µi the same except for one, differing by ±∆) and λ1 = (1/2)·(∆/σ)² (max(µ1, ..., µt) − min(µ1, ..., µt) = ∆).]

    42

  • Sample Size Determination (increased n)

    [Figure: the same power curves β(n·λ0) and β(n·λ1) for t = 3, ∆/σ = 0.5, α = 0.05, now over n from 30 to 100.]

    43

  • Sample Size Determination (magnified)

    [Figure: the same power curves β(n·λ0) and β(n·λ1) for t = 3, ∆/σ = 0.5, α = 0.05, magnified over n from 70 to 110, with the target power β = 0.9 highlighted.]

    44

  • The Effect of t

    Even though the number of treatments t does not affect λ1, it affects the power function through the degrees of freedom in tcrit = qf(1 − alpha, t − 1, n*t − t) and in 1 − pf(tcrit, t − 1, n*t − t, λ).

  • Sample Size Determination (magnified)

    [Figure: the same power curves β(n·λ0) and β(n·λ1), now for t = 6 treatments, ∆/σ = 0.5, α = 0.05, over n from 70 to 140, with the target power β = 0.9 highlighted.]

    46

  • Degrees of Freedom and Geometry: Single Sample

    (X1, X2, ..., Xn)′ = (X̄, X̄, ..., X̄)′ + (X1 − X̄, X2 − X̄, ..., Xn − X̄)′, an orthogonal (⊥) decomposition,

    ⊥ because (X̄, ..., X̄) · (X1 − X̄, ..., Xn − X̄)′ = X̄ · Σ_{i=1}^n (Xi − X̄) = 0.

    (X̄, ..., X̄) varies in one dimension, along 1′ = (1, ..., 1).

    The residual vector (X1 − X̄, ..., Xn − X̄) varies in its (n − 1)-dimensional orthogonal complement.

    The n residuals thus have n − 1 degrees of freedom.

    47

  • Orthogonal Decomposition of Sample Vector

    [Figure: in the (x1, x2)-plane, the sample vector x = (x1, x2) decomposed into x̄ = (x̄, x̄) along the diagonal and the orthogonal residual vector x − x̄ = (x1 − x̄, x2 − x̄).]

    Pythagoras: |x|² = |x̄|² + |x − x̄|² = Σ_i x̄² + Σ_i (xi − x̄)² = nx̄² + Σ_i (xi − x̄)², our previous SS decomposition.

    48

  • Degrees of Freedom and Geometry in t Samples

    Decomposition of the total dimension N = Σ ni into subspace dimensions:

    N = 1 + Σ (ni − 1) + (t − 1)

    (Y11, ..., Y1n1, ..., Yt1, ..., Ytnt)′
      = (Ȳ.., ..., Ȳ..)′
      ⊥+ (Y11 − Ȳ1., ..., Y1n1 − Ȳ1., ..., Yt1 − Ȳt., ..., Ytnt − Ȳt.)′
      ⊥+ (Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)′

    Σ_i Σ_j Yij² = Σ_i Σ_j Ȳ..² + Σ_i Σ_j (Yij − Ȳ..)²
                 = Σ_i Σ_j Ȳ..² + Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − Ȳ..)²

    49

  • Orthogonalities

    Σ_i Σ_j Ȳ..(Ȳi. − Ȳ..) = Ȳ.. Σ_i ni(Ȳi. − Ȳ..) = Ȳ..(Σ_i Σ_j Yij − NȲ..) = 0

    Σ_i Σ_j Ȳ..(Yij − Ȳi.) = Ȳ.. Σ_i (ni Ȳi. − ni Ȳi.) = 0

    Σ_i Σ_j (Ȳi. − Ȳ..)(Yij − Ȳi.) = Σ_i (Ȳi. − Ȳ..) Σ_j (Yij − Ȳi.) = 0

    50

  • Dimensions of Subspaces or Degrees of Freedom

    Let 1n′ = (1, 1, ..., 1) denote an n-vector of 1's, and let Ei denote the N-vector with 1ni in the i-th sample block and 0 elsewhere. The vectors

    D = (Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)′ = (Ȳ1. − Ȳ..)E1 + ... + (Ȳt. − Ȳ..)Et

    span a (t − 1)-dimensional subspace M ⊂ R^N, because E1, ..., Et span a t-dimensional subspace of R^N and D is always orthogonal to 1N = E1 + ... + Et, since 1N′D = Σ_{i=1}^t ni(Ȳi. − Ȳ..) = 0.

    Note that Σ_{i=1}^t ai Ei ⊥ 1N = (E1 + ... + Et)  ⇐⇒  Σ_{i=1}^t ni ai = 0.

    51

  • More on Dimensions and Degrees of Freedom

    Using the standard orthonormal basis vectors eij (with 1 in vector position (i, j) and 0 in all other positions) we have that

    R = (Y11 − Ȳ1., ..., Y1n1 − Ȳ1., ..., Yt1 − Ȳt., ..., Ytnt − Ȳt.)′
      = (Y11 − Ȳ1.)e11 + ... + (Y1n1 − Ȳ1.)e1n1 + ... + (Yt1 − Ȳt.)et1 + ... + (Ytnt − Ȳt.)etnt  ⊥ Ei ∀i

    because Σ_{j=1}^{ni} (Yij − Ȳi.) = 0 for all i. Thus R lives in the (N − t)-dimensional orthogonal complement M_{N−t} of E1, ..., Et.

    Any vector v ∈ M_{N−t} is of the form v = a11e11 + ... + atnt etnt with Σ_{j=1}^{ni} aij = 0 for i = 1, ..., t. Thus the R vectors span M_{N−t}.

    52

  • Orthogonal Decomposition of Sample Space

    [Figure: orthogonal decomposition of the sample space into the direction of (Ȳ.., ..., Ȳ..) (dimension 1), the treatment-mean directions (Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ..) (dimension t − 1), and the residual directions (Y11 − Ȳ1., ..., Ytnt − Ȳt.) (dimension N − t).]

    |(Y11, ..., Ytnt)|² = |(Ȳ.., ..., Ȳ..)|² + |(Y11 − Ȳ1., ..., Ytnt − Ȳt.)|² + |(Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)|²

    Σ_i Σ_j Yij² = Σ_i Σ_j Ȳ..² + Σ_i Σ_j (Yij − Ȳi.)² + Σ_i Σ_j (Ȳi. − Ȳ..)²

    53

  • Coagulation Example

    Import the data:

    > coag <- read.csv("coag.csv")  # import command reconstructed; the original line was truncated
    > coag$ctime
     [1] 59 60 62 63 63 64 65 66 67 71 66 67 68 68 68
    [16] 71 56 59 60 61 62 63 63 64
    > coag$diet
     [1] A A A A B B B B B B C C C C C C D D D D D D D
    [24] D
    Levels: A B C D

    54

  • Plot for Coagulation Example

    [Figure: coagulation time (sec), about 50 to 80, plotted by diet A, B, C, D with horizontal mean lines; nA = 4, nB = 6, nC = 6, nD = 8.]

    55

  • ANOVA for Coagulation Example

    The plot used jitter(coag$ctime) to plot ctime in the vertical direction and to plot its horizontal mean lines.

    This perturbs observations by a small random amount to make tied observations more visible.

    The means for diet A and D would have coincided otherwise.

    > anova(lm(ctime ~ diet, data = coag))
    # assumes data frame coag with variables ctime and diet
    # is in the work space
    Analysis of Variance Table

    Response: ctime
              Df Sum Sq Mean Sq F value    Pr(>F)
    diet       3    228    76.0  13.571 4.658e-05 ***
    Residuals 20    112     5.6
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    56

  • lm for Coagulation Example

    > out <- lm(ctime ~ diet, data = coag)  # assignment reconstructed from context
    > names(out)
     [1] "coefficients"  "residuals"     "effects"
     [4] "rank"          "fitted.values" "assign"
     [7] "qr"            "df.residual"   "contrasts"
    [10] "xlevels"       "call"          "terms"
    [13] "model"
    > out$coefficients # or out$coef
     (Intercept)        dietB        dietC        dietD
    6.100000e+01 5.000000e+00 7.000000e+00 -2.515253e-15

    Note that these are the estimates µ̂A = 61 (Intercept), µ̂B − µ̂A = 5, µ̂C − µ̂A = 7, µ̂D − µ̂A = 0.

    57

  • Residuals from lm for Coagulation Example

    > out$residuals
                1             2             3             4
    -2.000000e+00 -1.000000e+00  1.000000e+00  2.000000e+00
                5             6             7             8
    -3.000000e+00 -2.000000e+00 -1.000000e+00  1.111849e-16
                9            10            11            12
     1.000000e+00  5.000000e+00 -2.000000e+00 -1.000000e+00
               13            14            15            16
    -5.534852e-17 -5.534852e-17 -5.534852e-17  3.000000e+00
               17            18            19            20
    -5.000000e+00 -2.000000e+00 -1.000000e+00 -1.663708e-16
               21            22            23            24
     1.000000e+00  2.000000e+00  2.000000e+00  3.000000e+00

    Numbers such as -5.534852e-17 should be treated as 0 (computing quirks).

    58

  • Rounded Residuals from lm for Coagulation Example

    > round(out$resid, 4)
     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    -2 -1  1  2 -3 -2 -1  0  1  5 -2 -1  0  0  0  3 -5 -2 -1  0
    21 22 23 24
     1  2  2  3

    59

  • Fitted Values from lm for Coagulation Example

    > out$fitted.values
     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
    61 61 61 61 66 66 66 66 66 66 68 68 68 68 68 68 61 61 61
    20 21 22 23 24
    61 61 61 61 61

    60

  • Randomization Test for Coagulation Example

    [Figure: histogram (frequency) of the simulated randomization distribution of the F-equivalent test statistic, range about 98300 to 98600; observed F-equivalent test statistic 98532; p-value = 7e-05; based on 1e+05 simulations.]

    61

  • F -Approximation to Coagulation Randomization Test

    [Figure: density histogram of the simulated randomization distribution of the F-statistic (0 to 25); observed F-statistic 13.571; p-value = 7e-05, and p-value = 4.658e-05 from the F-distribution; based on 1e+05 simulations.]

    62

  • Comparing Treatment Means Ȳi.

    When H0 : µ1 = ... = µt is not rejected at level α, there is little purpose in looking closer at differences between the Ȳi. for the various treatments.

    Any such perceived differences could easily have come about by simple random variation, even when the hypothesis is true.

    Why then read something into randomness?

    It would be like reading tea leaves!

    However, when the hypothesis is rejected it is quite natural to ask in which way the hypothesis was contradicted.

    63

  • Confidence Intervals for µi

    A first step in understanding differences in the µi is to look at their estimates µ̂i = Ȳi. and their confidence intervals.

    In any such confidence interval we can now use the pooled variance s² from all t samples, not just the variance si² from the i-th sample, i.e., we get

    µ̂i ± t_{N−t,1−α/2} × s/√ni

    as our 100(1 − α)% confidence interval for µi.

    This follows as before from the independence of µ̂i and s, the fact that (µ̂i − µi)/(σ/√ni) ∼ N(0, 1) and s²/σ² ∼ χ²_{N−t}/(N − t), combining to

    (µ̂i − µi)/(s/√ni) = [(µ̂i − µi)/(σ/√ni)] / (s/σ) ∼ t_{N−t}
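    A sketch computing these pooled-s intervals for the Flux3 data (SIR and FLUX as on slide 32); the results match the pooled-s column of the table on slide 70:

    ni <- tapply(SIR, FLUX, length)
    mu <- tapply(SIR, FLUX, mean)
    s2 <- sum((ni - 1) * tapply(SIR, FLUX, var)) / (sum(ni) - 3)  # pooled variance (MSE)
    tq <- qt(0.975, sum(ni) - 3)                                  # t_{N-t, .975}
    cbind(lower = mu - tq * sqrt(s2 / ni), upper = mu + tq * sqrt(s2 / ni))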

    64

  • Validity of Pooling?

    Using s² instead of si² improves (narrows) the confidence intervals for µi.

    This narrowing comes about because t_{N−t,1−α/2} then uses much higher degrees of freedom (N − t ≫ ni − 1) and thus shrinks, up to a point (see later plot).

    The validity of this improvement depends strongly on the assumption that the population variances σ² behind all t samples are the same, or at least approximately so.

    Recall earlier discussion of this issue for 2-sample t-test.

    65

  • Standard Errors SE (θ̂)

    Suppose θ̂ is an estimator for a parameter θ of interest. We denote by σ²_θ̂ = var(θ̂) = g(θ, ψ) its sampling variance and by σ_θ̂ = √g(θ, ψ) its sampling standard deviation.

    The estimated sampling standard deviation of θ̂, i.e., σ̂_θ̂ = √g(θ̂, ψ̂) = SE(θ̂), is the standard error of θ̂.

    Example 1: µ̂ = X̄ as estimate of µ has variance var(µ̂) = σ²/n ⇒ SE(µ̂) = s/√n.

    Example 2: s² ∼ σ²χ²_{n−1}/(n − 1) as estimate of σ² has sampling variance

    var(s²) = σ⁴ · 2(n − 1)/(n − 1)² = 2σ⁴/(n − 1)  =⇒  SE(s²) = s²√(2/(n − 1))

    Note the different roles of (θ, ψ) in these two examples.

    Example 1: θ = µ and ψ = σ², and we only use ψ̂ in SE(θ̂).

    Example 2: θ = σ² and there is no ψ. We only use θ̂ in SE(θ̂).

    66

  • 95%-Rule of Thumb Using SEs

    If θ̂ ≈ N(θ, σ²_θ̂) ≈ N(θ, SE²(θ̂)), as is often the case, then θ̂ ± 2 × SE(θ̂) is an approximate 95% confidence interval for θ.

    This results from z.975 = qnorm(.975) = 1.959964 ≈ 2.

    This works especially well for Student-t based intervals

    µ̂i ± t_{f,.975} × s/√ni = Ȳi. ± t_{N−t,.975} × s/√ni

    because t_{f,.975} ≈ z.975 for large f; see the next slide.

    67

  • t_{f,.975} → z.975 = 1.96 ≈ 2

    [Figure: t_{.975,f} plotted against degrees of freedom f (20 to 140), decreasing toward z.975 = 1.96 ≈ 2.]

    68

  • Why the Rule of Thumb Works for s²

    Why should the rule of thumb work for s² as estimator of σ²?

    Recall: s² ∼ σ²χ²_{n−1}/(n − 1). The CLT =⇒ approximate normality for s², since

    (n − 1)s²/σ² = χ²_{n−1} = Σ_{i=1}^{n−1} Zi² ≈ N(n − 1, 2(n − 1))

    =⇒ s² ≈ N(σ², 2σ⁴/(n − 1))  =⇒  s² ± 2 × SE(s²) = s² ± 2 × s²√(2/(n − 1))

    since SE(s²) = s²√(2/(n − 1)) is the estimate of σ²√(2/(n − 1)), the sampling standard deviation of s².

    69

  • Table of Confidence Intervals for Flux3 Data

    Although for testing H0 : µ1 = µ2 = µ3 in the case of the Flux3 data the p-value of .05126 was not significant at level α = .05, we illustrate the concepts of the different types of confidence intervals for the means.

    Flux   µ̂i      si     s      95% intervals using si   95% intervals using s
    X      9.717   0.194  0.546  [9.513, 9.920]           [9.242, 10.192]
    Y      9.983   0.471  0.546  [9.489, 10.477]          [9.508, 10.458]
    Z     10.550   0.797  0.546  [9.714, 11.386]          [10.075, 11.025]

    70

  • Plots of Confidence Intervals for Flux3 Data

    [Figure: SIR confidence intervals for Flux X, Y, Z, using the pooled s² = Σ_{i=1}^t si²(ni − 1)/(N − t) and using the individual si².]

    71

  • Tables of Confidence Intervals for the Coagulation Data

    For testing H0 : µ1 = µ2 = µ3 = µ4 in the case of the coagulation data the p-value of 4.7 · 10^−5 is highly significant. We again illustrate the concepts of the different types of confidence intervals for the means.

    Diet   µ̂i   si    s     95% intervals using si   95% intervals using s
    A      61   1.9   2.2   [57.9, 64.1]             [58.7, 63.3]
    B      66   2.1   2.2   [63.8, 68.2]             [64.1, 67.9]
    C      68   1.5   2.2   [66.4, 69.6]             [66.1, 69.9]
    D      61   2.6   2.2   [58.8, 63.2]             [59.4, 62.6]

    72

  • Plots of Confidence Intervals for Coagulation Data

    [Figure: coagulation time (sec) confidence intervals for diets A, B, C, D, using the pooled s² = Σ_{i=1}^t si²(ni − 1)/(N − t) and using the individual si².]

    73

  • Simultaneous Confidence Intervals

    When constructing intervals of the type

    µ̂i ± t_{N−t,1−α/2} s/√ni  or  µ̂i ± t_{ni−1,1−α/2} si/√ni  for i = 1, ..., t

    we should be aware that these intervals don't simultaneously cover their respective targets µi with probability 1 − α.

    They do so individually. For example,

    P( µi ∈ µ̂i ± t_{ni−1,1−α/2} si/√ni, i = 1, ..., t )
    = Π_{i=1}^t P( µi ∈ µ̂i ± t_{ni−1,1−α/2} si/√ni ) = (1 − α)^t < 1 − α.

    To get simultaneous 1 − α coverage probability we should choose 1 − α* for the individual interval coverage probability, to get

    (1 − α*)^t = 1 − α  or  α* = 1 − (1 − α)^(1/t) ≈ α/t = α̃t.

    A problem in using a pooled estimate s: No independence!

    74

  • α* = 1 − (1 − α)^(1/t) ≈ α/t

    [Figure: α*_t = 1 − (1 − α)^(1/t) and α̃_t = α/t plotted against the desired overall α (0 to 0.10) for t = 3, 5, 6, 10; the two adjustments are nearly indistinguishable.]

    75

  • Dealing with Dependence from Using Pooled s

    Using a pooled estimate s for the standard deviation σ, the previous confidence intervals are no longer independent.

    However, it can be shown that

    P( µi ∈ µ̂i ± t_{N−t,1−α*/2} s/√ni, i = 1, ..., t )
    ≥ Π_{i=1}^t P( µi ∈ µ̂i ± t_{N−t,1−α*/2} s/√ni ) = (1 − α*)^t = 1 − α

    This comes from the positive dependence between confidence intervals through s.

    If one interval is more (less) likely to cover µi due to s, so are the other intervals more (less) likely to cover their µj.

    Using the same compensation as in the independence case would let us err on the conservative side, i.e., give us higher confidence than the targeted 1 − α.

    76

  • Boole's and Bonferroni's Inequality

    For any m events E1, ..., Em Boole's inequality states

    P(E1 ∪ ... ∪ Em) ≤ P(E1) + ... + P(Em)

    For any m events E1, ..., Em Bonferroni's inequality states

    P(E1 ∩ ... ∩ Em) ≥ 1 − Σ_{i=1}^m (1 − P(Ei)) = Σ_{i=1}^m P(Ei) − (m − 1)

    The statements are equivalent by taking complements.

    If Ei = { µi ∈ µ̂i ± t_{N−t,1−α̃/2} s/√ni } with P(Ei) = 1 − α̃, then the simultaneous coverage probability is bounded from below:

    P( ∩_{i=1}^t Ei ) ≥ 1 − Σ_{i=1}^t (1 − P(Ei)) = 1 − tα̃ = 1 − α  if  α̃ = α̃t = α/t.

    We can achieve at least 1 − α coverage probability by choosing the individual coverage appropriately, namely 1 − α̃ = 1 − α/t. Almost the same adjustment.

    77

  • Decomposing the Mean Vector µ

    The µi variation is best understood via the familiar decomposition:

    µ = (µ1, ..., µ1, ..., µt, ..., µt)′ = µ̄ · 1N + (µ1 − µ̄, ..., µ1 − µ̄, ..., µt − µ̄, ..., µt − µ̄)′

    The two vectors on the right are orthogonal to each other.

    The first vector represents the projection of µ onto 1N.

    The second represents the projection of µ onto V_{t−1}, a (t − 1)-dimensional subspace of the (N − 1)-dimensional orthogonal complement V_{N−1} to 1N. See slide 51.

    The second vector captures all aspects of variation in µ.

    78

  • Motivating Contrasts

    Any linear function of (µ1 − µ̄, ..., µt − µ̄) has to be of the form C = Σ_{i=1}^t ci µi with Σ_{i=1}^t ci = 0:

    Σ_i ai(µi − µ̄) = Σ_i ai µi − Σ_i ai Σ_j (nj/N)µj
                   = Σ_i ai µi − Σ_i (ni/N)µi Σ_j aj = Σ_i ci µi  with  ci = ai − (ni/N) Σ_j aj

    where Σ_i ci = Σ_i ai − Σ_i (ni/N) Σ_j aj = Σ_i ai − Σ_j aj = 0.

    Such a function C = Σ_{i=1}^t ci µi of the µi, with Σ_{i=1}^t ci = 0, is called a contrast.

    79

  • Examples of Contrasts

    Say we have 4 treatments with respective means µ1, ..., µ4.

    We may be interested in contrasts of the form C12 = µ1 − µ2 with c′ = (c1, ..., c4) = (1, −1, 0, 0). Similarly for the other differences Cij = µi − µj. There are (4 choose 2) = 6 such contrasts.

    Sometimes one of the treatments, say the first, is singled out as the control. We may then be interested in just the 3 contrasts C12, C13 and C14, or we may be interested in C1.234 = µ1 − (µ2 + µ3 + µ4)/3 with c′ = (1, −1/3, −1/3, −1/3).

    Sometimes the first 2 treatments share something in common and so do the last 2. One might then try C12.34 = (µ1 + µ2)/2 − (µ3 + µ4)/2 with c′ = (1/2, 1/2, −1/2, −1/2), the difference of average treatment effects between the 2 camps.

    80

  • Estimates and Confidence Intervals for Contrasts

    A natural estimate of C = Σ_{i=1}^t ci µi is Ĉ = Σ_{i=1}^t ci µ̂i = Σ_{i=1}^t ci Ȳi..

    We have E(Ĉ) = E( Σ_i ci Ȳi. ) = Σ_i ci E(Ȳi.) = Σ_i ci µi = C

    and var(Ĉ) = var( Σ_i ci Ȳi. ) = Σ_i ci² var(Ȳi.) = Σ_i ci² σ²/ni.

    Under the normality assumption for the Yij we have

    (Ĉ − C) / ( s √(Σ_i ci²/ni) ) ∼ t_{N−t}  where  s² = Σ_i (ni − 1)si²/(N − t) = Σ_ij (Yij − Ȳi.)²/(N − t) = MSE.

    =⇒ Ĉ ± t_{N−t,1−α/2} · s · √(Σ_i ci²/ni)

    is a 100(1 − α)% confidence interval for C.
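    A sketch for computing such an interval in R (the helper ci.contrast is hypothetical; shown for C = µ1 − (µ2 + µ3 + µ4)/3 in the coagulation data):

    ci.contrast <- function(y, g, cc, alpha = 0.05) {
      ni <- tapply(y, g, length); ybar <- tapply(y, g, mean)
      N <- sum(ni); t.treat <- length(ni)
      s2 <- sum((ni - 1) * tapply(y, g, var)) / (N - t.treat)  # pooled variance (MSE)
      Chat <- sum(cc * ybar)                                   # contrast estimate
      SE <- sqrt(s2 * sum(cc^2 / ni))                          # SE(Chat)
      Chat + c(-1, 1) * qt(1 - alpha/2, N - t.treat) * SE
    }
    ci.contrast(coag$ctime, coag$diet, c(1, -1/3, -1/3, -1/3))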

    81

  • Testing H0 : C = 0

    Based on the duality of testing and confidence intervals we can test the hypothesis H0 : C = 0 by rejecting it whenever the previous confidence interval does not contain C = 0.

    Similarly, reject H0 : C = C0 whenever the previous confidence interval does not contain C = C0.

    Another notation for this interval is

    Ĉ ± t_{N−t,1−α/2} · SE(Ĉ)  where  SE(Ĉ) = s · √(Σ_{i=1}^t ci²/ni).

    SE(Ĉ) is the standard error of Ĉ.

    82

  • Simultaneous Confidence Intervals for Contrasts

    As with simultaneous confidence intervals for means, we need to face the issue of simultaneous coverage probability in relation to the individual coverage probability of each interval.

    We will introduce/compare several such procedures, although there are still others.

    Multiple comparisons is a very active research area:

    Simultaneous Statistical Inference by Miller (1966)
    Multiple Comparison Procedures by Hochberg and Tamhane (1987)
    Multiple Comparisons: Theory and Methods by Hsu (1996)
    Multiple Comparisons and Multiple Tests by Westfall (2000)
    Multiple Comparisons Using R by Bretz, Hothorn, Westfall (2011)

    83

  • Paired Comparisons: Fisher's Protected LSD Method

    After rejecting H0 : µ1 = ... = µt one is often interested in looking at all (t choose 2) pairwise contrasts Cij = µi − µj.

    The following procedure is referred to as Fisher's Protected Least Significant Difference (LSD) Method.

    It consists of possibly two stages:

    1) Perform the α-level F-test for testing H0. If H0 is not rejected, stop.

    2) If H0 is rejected, form all (t choose 2) (1 − α)-level confidence intervals for Cij = µi − µj:

    Îij = µ̂i − µ̂j ± t_{N−t,1−α/2} · s · √(1/ni + 1/nj)

    and declare all µi − µj ≠ 0 for which Îij does not contain zero.

    Here LSD = t_{N−t,1−α/2} · s · √(1/ni + 1/nj) is the Least Significant Difference.

    84

  • Comments on Fisher's Protected LSD Method

    If H0 is true, the chance of making any statements contradicting H0 is at most α.

    This is the protected aspect of this procedure.

    However, when H0 is not true there are many possible contingencies, some of which can give us a higher than desired chance of pronouncing a significant difference when in fact there is none.

    E.g., if all but one mean (say µ1) are equal and µ1 is far away from µ2 = ... = µt, our chance of rejecting H0 is almost 1.

    However, among the intervals for µi − µj, 2 ≤ i < j, we may find a significantly higher than α proportion of cases with wrongly declared differences.

    This is due to the multiple comparison issue.

    85

  • Pairwise Comparisons: Tukey-Kramer Method

    The Tukey-Kramer method is based on the distribution of the studentized range

    Q_{t,f} = max_{1≤i<j≤t} |Zi − Zj| / (s/σ),

    where Z1, ..., Zt are i.i.d. N(0, 1) and f·(s/σ)² ∼ χ²_f independently of the Zi; its (1 − α)-quantile q_{t,f,1−α} is qtukey(1 − α, t, f) in R.

    For a balanced design (ni = n, f = N − t) this yields simultaneous (1 − α)-coverage intervals

    µi − µj ∈ µ̂i − µ̂j ± q_{t,f,1−α} s/√n  ∀ i < j.

  • Tukey-Kramer Method: Unequal Sample Sizes

    The simultaneous intervals for all pairwise mean differences were due to Tukey.

    The method was limited by the requirement of equal sample sizes.

    This was addressed by Kramer in the following way.

    In the above confidence intervals replace n in 1/√n by n*ij, where n*ij is the harmonic mean of ni and nj, i.e., 1/n*ij = (1/ni + 1/nj)/2 or n*ij = 2ninj/(ni + nj).

    A different adjustment for each pair (i, j)!

    It was possible to show

    P( µi − µj ∈ µ̂i − µ̂j ± q_{t,f,1−α} s/√n*ij  ∀ i < j ) ≥ 1 − α

    i.e., simultaneous confidence intervals with coverage ≥ 1 − α.

    87

  • Tukey-Kramer Method for Coagulation Data

    coag.tukey [function listing truncated in this copy; a built-in alternative is sketched below]
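    R's built-in TukeyHSD produces these Tukey-Kramer intervals from an aov fit (its q_{t,f,1−α}/√2 · s·√(1/ni + 1/nj) form is algebraically identical to q_{t,f,1−α} s/√n*ij):

    TukeyHSD(aov(ctime ~ diet, data = coag), conf.level = 0.95)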

  • Tukey-Kramer Results for Coagulation Data

    > coag.tukey()
               [,1]       [,2]
    [1,]  -9.275446 -0.7245544
    [2,] -11.275446 -2.7245544
    [3,]  -4.056044  4.0560438
    [4,]  -5.824075  1.8240748
    [5,]   1.422906  8.5770944
    [6,]   3.422906 10.5770944

    Declare significant differences in µ1 − µ2, µ1 − µ3, µ2 − µ4, and µ3 − µ4.

    Under H0 the risk of declaring any significant differences is ≤ α:

    P_{H0}( 0 ∉ µ̂i − µ̂j ± q_{t,f,1−α} s/√n*ij for some i < j ) ≤ α

    90

  • Bonferroni Confidence Intervals for Pairwise Contrasts

    Applying Bonferroni's method for a simultaneous confidence statement, use α̃ = α/(t choose 2) for the individual confidence statements

    µi − µj ∈ µ̂i − µ̂j ± t_{N−t,1−α̃/2} · s · √(1/ni + 1/nj)

    The individual coverage probability is 1 − α̃.

    The joint coverage probability for all pairwise contrasts is

    P( µi − µj ∈ µ̂i − µ̂j ± t_{N−t,1−α̃/2} · s · √(1/ni + 1/nj)  ∀ i < j )
    ≥ 1 − (t choose 2)(1 − (1 − α̃)) = 1 − (t choose 2) α̃ = 1 − α

    91

  • Scheffé's Confidence Intervals for All Contrasts

    Scheffé took the F-test for testing H0 : µ1 = ... = µt and converted it into a simultaneous coverage statement about confidence intervals for all contrasts c′µ = Σ_{i=1}^t ci µi:

    P( c′µ ∈ c′µ̂ ± √((t − 1) · F_{t−1,N−t,1−α}) · s · (Σ_{i=1}^t ci²/ni)^{1/2}  ∀ c ) = 1 − α

    A coverage statement for an infinite number of contrasts.

    It can be applied conservatively to all pairwise contrasts.

    The resulting intervals tend to be quite conservative.

    But it compares well with Bonferroni type intervals if applied to many contrasts.

    92

  • Pairwise Comparison Intervals for Coagulation Data

    (simultaneous) 95%-Intervals

    mean         Tukey-Kramer     Fisher's          Bonferroni       Scheffé's all
    difference                    protected LSD     inequality       contrasts method
    µ1 − µ2      -9.28   -0.72    -8.19   -1.81     -9.47   -0.53    -9.66   -0.34
    µ1 − µ3     -11.28   -2.72   -10.19   -3.81    -11.47   -2.53   -11.66   -2.34
    µ1 − µ4      -4.06    4.06    -3.02    3.02     -4.24    4.24    -4.42    4.42
    µ2 − µ3      -5.82    1.82    -4.85    0.85     -6.00    2.00    -6.17    2.17
    µ2 − µ4       1.42    8.58     2.33    7.67      1.26    8.74     1.10    8.90
    µ3 − µ4       3.42   10.58     4.33    9.67      3.26   10.74     3.10   10.90

    Using any of the four methods, declare significant differences in µ1 − µ2, µ1 − µ3, µ2 − µ4, and µ3 − µ4.

    93

  • Simultaneous Paired Comparisons (95%)

    [Figure: simultaneous 95% intervals for µ1 − µ2, µ1 − µ3, µ1 − µ4, µ2 − µ3, µ2 − µ4, µ3 − µ4 (coagulation data), comparing Tukey-Kramer pairwise comparisons, Fisher's protected LSD, Bonferroni intervals, and Scheffé's intervals for all contrasts.]

    94

  • Simultaneous Paired Comparisons (99%)

    [Figure: the same comparison of Tukey-Kramer, Fisher's protected LSD, Bonferroni, and Scheffé intervals for the coagulation data at 1 − α = 0.99.]

    95

  • Model Diagnostics

    Model: Yij = µi + εij, j = 1, ..., ni, i = 1, ..., t.

    We made the following assumptions:

    A1: {εij} are independent;
    A2: var(εij) = var(Yij) = σ² for all i, j (homogeneity of variances or homoscedasticity);
    A3: {εij} are normally distributed.

    These assumptions allow us to

    i) perform the F-test for homogeneity of means,
    ii) do power calculations,
    iii) plan sample sizes to achieve a desired power,
    iv) obtain simultaneous confidence intervals for means/contrasts.

    We will examine A2 and A3.

    We won't deal with A1. Use judgment; examine serial correlation?

    96

  • Checking Normality

    Here we would like to check normality of εij = Yij − µi, j = 1, ..., ni, i = 1, ..., t.

    Not knowing µi, we estimate the error term εij via ε̂ij = Yij − µ̂i = Yij − Ȳi..

    If normality holds then a normal QQ-plot of all these N = n1 + ... + nt estimated error terms (also called residuals) should look roughly linear with intercept near zero.

    qqnorm(residual.vector) =⇒ normal QQ-plot. Slope ≈ σ. We have done this before in the single sample situation and won't show repeats.

    It is also possible to perform the formal EDF-based tests of fit (KS, CvM, and AD), but they would require minor modifications in the package nortest, not available now.

    97

  • Checking Normality by Simulation

    We can adapt the KS, CvM, and AD EDF test-of-fit criteria and simulate their null distributions.

    Limiting results by Pierce (1978) support this.

    Use them to judge significant non-normality in residuals.

    DKS = max{ max_i [ i/N − U(i) ], max_i [ U(i) − (i − 1)/N ] }

    DCvM = Σ_{i=1}^N [ U(i) − (2i − 1)/(2N) ]² + 1/(12N)

    DAD = −N − (1/N) Σ_{i=1}^N (2i − 1)[ log(U(i)) + log(1 − U(i)) ]

    where Uij = Φ( (Yij − Ȳi.)/s ) and U(1) ≤ ... ≤ U(N) are the Uij in increasing order.

    98

  • The Simulation

    The distribution of

    Uij = Φ( (Yij − Ȳi.)/s ) = Φ( [(Yij − µi)/σ − (Ȳi. − µi)/σ] / (s/σ) )

    does not depend on any unknown parameters.

    Thus we may as well simulate the Yij i.i.d. ∼ N(0, 1), compute Ȳi., i = 1, ..., t, and s, and then the Uij; sort these values and compute the respective EDF criteria.

    Repeat this over and over, say Nsim = 10000 times, and compare the EDF criteria for the actual data set against these simulated null distributions to obtain estimated p-values.

    View this as potential homework.

    It may be advantageous to modify the above EDF criteria if sample sizes are quite different (uncharted territory).
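    A minimal sketch of this simulation (KS criterion only; the helpers DKS and sim.DKS are hypothetical), assuming a balanced design with t groups of size n:

    DKS <- function(U) {                      # KS distance for a set of U's
      N <- length(U); U <- sort(U)
      max((1:N)/N - U, U - (0:(N - 1))/N)
    }
    sim.DKS <- function(t = 3, n = 6, Nsim = 10000) {
      replicate(Nsim, {
        Y <- matrix(rnorm(t * n), n, t)       # one N(0,1) sample per column
        res <- sweep(Y, 2, colMeans(Y))       # residuals Yij - Ybar_i.
        s <- sqrt(sum(res^2) / (t * n - t))   # pooled s with N - t degrees of freedom
        DKS(pnorm(res / s))                   # U_ij = Phi((Yij - Ybar_i.)/s)
      })
    }
    ## estimated p-value: mean(sim.DKS() >= DKS.observed)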

    99

  • Hermit Crab Count Data

    Hermit Crab counts were obtained at 6 different coastline sites.

    Each site obtained counts at 25 randomly selected transects.

    Download the data file crab.csv from the web into your work directory, then import it:

    crab <- read.csv("crab.csv")  # import command reconstructed; the original line was truncated

  • Plot of Hermit Crab Counts

    [Figure: hermit crab counts (0 to 400) plotted by site (1 to 6).]

    101

  • ANOVA for Hermit Crab Count Data

    > out.lm <- lm(crab$count ~ as.factor(crab$site))  # assignment reconstructed from the output below
    > anova(out.lm)
    Analysis of Variance Table

    Response: crab$count
                          Df Sum Sq Mean Sq F value  Pr(>F)
    as.factor(crab$site)   5  76695   15339  2.9669 0.01401 *
    Residuals            144 744493    5170
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    > qqnorm(out.lm$residuals)
    > qqline(out.lm$residuals)

    produced the (not so) normal QQ-plot for the ANOVA residuals on the next slide.

    102

  • Normal QQ-Plot of Hermit Crab Count ANOVA Residuals

    [Figure: normal Q-Q plot of the ANOVA residuals (sample quantiles 0 to 400 against theoretical quantiles −2 to 2), clearly departing from a straight line.]

    103

  • Checking for Homoscedasticity

    The appropriate indicators for checking a constant variance over all t treatment groups would seem to be s1², ..., st².

    There are various rules of thumb involving Fmin = min(s1², ..., st²)/max(s1², ..., st²).

    For example, if Fmin > 1/3 the constant variance assumption should be OK, while for Fmin < 1/7 we should deal with it.

    Where the 1/3 or 1/7 come from, and what to do in between, is not clear.

    With R it is simple to simulate the distribution of Fmin.

    104

  • Fmin.test

    The R function Fmin.test is on the class web site.

    It simulates the Fmin distribution, assuming normal samples with equal variances.

    The sample sizes may vary.

    The documentation for Fmin.test is inside the function body.

    Use it to explore any desired rule of thumb, calculating the proportion of Fmin values ≤ the rule-of-thumb value.

    If Fmin.observed is provided, it calculates the estimated p-value from this simulated distribution.

    See the next two slides for examples.

    The validity of this test depends strongly on data normality.
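    A minimal sketch of the simulation behind it (the hypothetical Fmin.sim below omits the plotting and p-value reporting that the posted Fmin.test adds):

    Fmin.sim <- function(k = 3, n = 8, Nsim = 10000) {
      n <- rep(n, length.out = k)             # group sizes, possibly unequal
      replicate(Nsim, {
        v <- sapply(n, function(ni) var(rnorm(ni)))  # k sample variances
        min(v) / max(v)                              # Fmin
      })
    }
    Fmin <- Fmin.sim(3, 8)
    mean(Fmin <= 1/7)   # proportion below the 1/7 rule of thumb (about .046, cf. next slide)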

    105

  • Fmin.test(k=3,n=8,a.recip=7)

    [Figure: histogram of the simulated Fmin = min(s1², ..., sk²)/max(s1², ..., sk²) distribution for k = 3, n = 8, Nsim = 10000, a = 1/7; the proportion of values ≤ 1/7 is 0.046.]

    106

  • Fmin.test(k=3, n=c(3,3,4), a.recip=7, Fmin.observed=.1)

    [Figure: histogram of the simulated Fmin distribution for k = 3, n = (3, 3, 4), Nsim = 10000, a = 1/7, Fmin.observed = 0.1; annotated proportions 0.413 (values ≤ 1/7) and 0.307 (estimated p-value for Fmin.observed = 0.1).]

    107

  • Diagnostic Plots for Checking Homoscedasticity

    One first diagnostic is to plot the residuals Yij − Ȳi· versus the corresponding fitted values Ȳi· for j = 1, …, ni, i = 1, …, t. Compare the difference in information in the next two plots.

    The second display shows that variability increases with the fitted value. Often there is such a relationship between variability and the mean.

    There are ways to deal with this by using variance stabilizing transforms of the Yij.


  • plot(crab$site,out.lm$residuals,col="blue",

    xlab="site",ylab="residuals",cex.lab=1.3)

    [Figure: ANOVA residuals plotted against site (1 to 6).]

  • plot(out.lm$fitted.values,out.lm$residuals,

    col="blue",xlab="fitted values",

    ylab="residuals",cex.lab=1.3)

    [Figure: ANOVA residuals plotted against the fitted values (about 10 to 70); the vertical spread grows with the fitted value.]

  • Levene's Test for Homoscedasticity

    The modified Levene test looks at Xij = |Yij − Ỹi|, where Ỹi denotes the median of the i-th treatment sample.

    Originally the test used Ȳi· in place of Ỹi, whence "modified." The idea is as follows: if the standard deviations in the t samples Yi1, …, Yi,ni, i = 1, …, t are the same, then one would expect roughly equal means for the Xij.

    Check this by performing an ANOVA F -test on the Xij values.

    The ANOVA F-test for means is not as sensitive to the normality assumption as the F-test or Fmin.test for comparing variances.


  • Levene's Test for Crab Count Data

    crab.levene applies this test to the hermit crab counts.
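    In outline, the computation might look as follows (a sketch assuming the crab data frame from crab.csv, not necessarily the crab.levene code itself):

    med <- tapply(crab$count, crab$site, median)         # per-site medians
    X <- abs(crab$count - med[as.character(crab$site)])  # X_ij = |Y_ij - median_i|
    anova(lm(X ~ as.factor(crab$site)))                  # ANOVA F-test on the X_ij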

  • A Multiplicative Error Model

    Variability in the crab count data seemed proportional to the count averages.

    The residuals also did not look particularly normal.

    Some random phenomena are not so much driven by additive accumulation of random contributions but more so by multiplicative accumulations.

    A crab colony could have started with an initial group of size X0.

    This group produced a random number X0 · X1 of new crabs, where X1 represents the reproduction rate per crab.

    This rate is variable or random.

    The next generation would have X0 · X1 · X2 crabs, and so on. This motivates the following variation model: Y = µ × ε = µ · (X1 · X2 · …), where the random term ε has mean µε and standard deviation σε.

    ⇒ var(Y) = µ² · var(ε) and µY = E(Y) = µ · E(ε). Thus σY is proportional to µY, since both are proportional to µ.
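    A quick numerical check of this proportionality (the means and the error distribution below are made-up illustration values, not the crab data):

    mu <- c(5, 20, 80)                    # three hypothetical means
    eps <- function(n) rlnorm(n, 0, 0.3)  # a positive multiplicative error term
    Y <- sapply(mu, function(m) m * eps(10000))
    apply(Y, 2, sd)                       # sd grows in proportion to mu
    apply(Y, 2, sd) / apply(Y, 2, mean)   # roughly constant ratio sigma_Y / mu_Y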


  • Variance Stabilization and Normality under log-Transform

    Multiplicative error model =⇒ σ ∝ µ. Using log(Y) = log(µ) + log(ε) breaks the link:

    E(log(Y)) = log(µ) + E(log(ε)) and var(log(Y)) = var(log(ε))

    µ affects the mean E(log(Y)) but no longer var(log(Y)).

    An example of variance stabilization!

    There is further benefit in viewing the multiplicative error term ε as a product of several random contributors.

    By taking the transform log(Y ):

    V = log(Y) = log(µ) + log(ε) = log(µ) + log(X1) + log(X2) + …

    we can appeal to the CLT for the sum of the log(Xi ) terms.

    This justifies a normal additive error model for V, i.e., V = µ̃ + ε̃ with ε̃ ∼ N(0, σ²). Applying this to the count data ⇒ the following familiar model:

    Vij = log(Yij) = µ̃i + ε̃ij with ε̃ij i.i.d. ∼ N(0, σ²).


  • The Problem of Zero Counts

    Some observed counts are zero ⇒ problem of log(0). We look at two ways of dealing with it.

    1. Adding a small fraction, say 1/6, to all counts.

    1/6 > 0 is somewhat arbitrary. This is a technical solution, keeping all the data.

    2. Eliminate all zero counts.

    This may be justified if a zero count just means that there were no crabs in that transect to begin with. It is not a matter of not seeing them, since the population size is small.

    This reduces the count data to 150 − 33 = 117 counts.
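    In R the two work-arounds might look like this (assuming the crab data frame, with 1/6 as the added fraction chosen above):

    V1 <- log(crab$count + 1/6)   # 1. shift every count by 1/6, keeping all 150
    pos <- crab$count > 0         # 2. drop the 33 zero counts ...
    V2 <- log(crab$count[pos])    #    ... leaving 117 log-counts
    site2 <- crab$site[pos]       #    with the matching site labels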


  • Box Plots for count and log(count+1/6)

    [Figure: box plots by site (1 to 6), n = 25 per site: counts on the original scale (0 to 400) and count + 1/6 on a log scale (0.2 to 100).]

  • Normal QQ-Plots of 150 Residuals

    [Figure: normal QQ-plot of the 150 residuals from the log(count + 1/6) fit, with the residuals coming from zero counts marked separately, shown alongside several simulated normal QQ-plots of the same sample size for comparison.]

  • ANOVA for log(count+1/6)

    Analysis of Variance Table

    Response: log(count + 1/6)
                     Df Sum Sq Mean Sq F value  Pr(>F)
    as.factor(site)   5  54.73   10.95  2.3226 0.04604 *
    Residuals       144 678.60    4.71
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
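    This table could be produced along the lines of the earlier ANOVA call, e.g. (assuming the crab data frame):

    anova(lm(log(crab$count + 1/6) ~ as.factor(crab$site)))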


  • Box Plots for count and log(count[count>0])

    [Figure: box plots by site (1 to 6): counts on the original scale (n = 25 per site) and the positive counts on a log scale, with reduced sample sizes n = (20, 21, 20, 19, 21, 16).]

  • Normal QQ-Plots of 117 Residuals

    [Figure: normal QQ-plot of the 117 residuals from the positive-count log fit, shown alongside several simulated normal QQ-plots of the same sample size for comparison.]