GMS MS 700, Lecture 7-2


  • 8/6/2019 GMS MS 700, Lecture 7-2


    Hypothesis Testing

    Analysis of Variance (ANOVA)

    GMS MS 700/GMS AN 704

    Elementary Biostatistics

    March 23, 2011

  •

    Hypothesis Testing

    continuous outcomes: z-test or t-test

    one sample

    two samples

    paired samples (matched samples)

    discrete outcomes: χ² test

    one sample (χ² goodness-of-fit test)

    two samples (χ² test of independence)

  •

    Hypothesis Testing

    continuous outcomes: ANOVA

    more than two samples/groups

    several types of ANOVA:

    one-way (one-factor): extension of the two-sample t-test

    randomized block (no interaction effects)

    multi-factor (possible interaction effects)

    repeated measures: extension of the paired-samples t-test

  •

    What is ANOVA?

    One-way ANOVA allows us to compare the means of 2 or more groups or categories (the independent variable, IV) on one dependent variable (DV) to determine whether the groups differ significantly from one another on the DV.

    To use ANOVA, you must have as the independent variable a categorical (nominal) variable with at least two independent groups (e.g., treatment vs. control, fuel 1 vs. fuel 2), and as the dependent variable a continuous (interval or ratio) variable.

    ANOVA is very similar to a t-test, particularly when comparing only 2 groups: then the F statistic equals the square of the pooled-variance t statistic, and the two tests give the same p-value. When looking at 3 or more groups, however, ANOVA tests for any difference among the means in a single test, instead of a series of t-tests that each add to the risk of a false positive.
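The two-group equivalence can be checked numerically. This is a minimal pure-Python sketch; the helper names `pooled_t` and `one_way_f` are our own, and the data are Fuel 1 and Fuel 2 from Data Set 1 later in this lecture.

```python
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)  # pooled variance
    se = (sp2 * (1 / len(a) + 1 / len(b))) ** 0.5
    return (ma - mb) / se

def one_way_f(*groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    k, n_total = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

fuel_1 = [40, 44, 42, 44, 40]  # Fuel 1, Data Set 1
fuel_2 = [50, 54, 52, 52, 52]  # Fuel 2, Data Set 1
t = pooled_t(fuel_1, fuel_2)
f = one_way_f(fuel_1, fuel_2)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")  # t = -9.129, t^2 = 83.333, F = 83.333
```

The sign of t only reflects which group is listed first; F, being built from squared deviations, is always non-negative.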

  •

    t-Tests vs. ANOVA

    t-tests allow us to decide whether the observed difference between two group means is large enough not to be due to chance (i.e., statistically significant).

    But the more t-tests we run, the greater the chance of rejecting the null hypothesis when it is true (Type I error).

    ANOVA takes into account the number of groups being compared: it tests all of the group means at once at a single significance level (α), so concluding significance among 3 or more groups does not inflate the Type I error rate.

    Rather than finding a simple difference between 2 means as in a t-test, ANOVA measures how far the group means vary around the grand mean, using the squared differences between each group mean and the grand mean.
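To put a number on how quickly multiple t-tests inflate the Type I error, here is a rough sketch. It assumes the pairwise tests are independent, which they are not exactly (tests that share a group are correlated), so the figures are approximations; `familywise_error` is our own name.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Approximate probability of at least one Type I error when all
    pairwise t-tests among k groups are run at level alpha.

    Treats the tests as independent; pairwise tests sharing a group are
    actually correlated, so this is only a rough guide.
    """
    m = comb(k, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (2, 3, 4, 5):
    print(f"{k} groups: {comb(k, 2)} pairwise tests, familywise error ~ {familywise_error(k):.3f}")
```

With 3 groups (3 t-tests) the approximate chance of at least one false rejection is already about .14 rather than .05, which is the problem a single ANOVA F test avoids.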

  •

    H0: There is no difference in MPG between fuels.

    HA: There is a difference in MPG between fuels.

    (What is the IV? What is the DV?)

    Data Set 1

    Fuel 1 Fuel 2 Fuel 3

    40 50 56

    44 54 56

    42 52 54

    44 52 58

    40 52 56

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

    Data Set 2

    Fuel 1 Fuel 2 Fuel 3

    36 54 34

    48 40 74

    34 58 58

    44 62 42

    48 46 72

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

  •

    One-Way (One-Factor) ANOVA (one IV): An Intuitive Decomposition of Sum of Squares/Variance

    Variance: the near average of the squared differences of a set of observations around its mean ("near" because the sum is divided by n - 1 rather than n)

    $$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

    One-Way ANOVA: compare the between-group (between-factor) variance to the within-group (within-factor) variance

    In ANOVA, a variance is referred to as a mean square

    The F statistic is the ratio of these two variances: F = mean square between groups / mean square within groups
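As a small check of the variance formula, this sketch (our own `sample_variance` helper) divides the sum of squared deviations by n - 1 and compares the result with Python's standard-library `statistics.variance`, which also computes the sample (n - 1) variance.

```python
from statistics import variance

def sample_variance(xs):
    """s^2 = sum((X - mean)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

fuel_1 = [40, 44, 42, 44, 40]  # Fuel 1, Data Set 1
print(sample_variance(fuel_1))  # 4.0: sum of squares 16 divided by n - 1 = 4
print(variance(fuel_1))         # the standard library gives the same value
```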

  •

    Hypothesis Testing for More than 2 Means: ANOVA

    Continuous outcome

    k independent samples, k > 2

    H0: μ1 = μ2 = μ3 = … = μk

    H1: Means are not all equal

    Test statistic:

    $$F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)}$$

    Find the critical value in Table 4 (F distribution), df = (k - 1), (N - k)
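The test statistic translates directly into code. This is a sketch with our own function name `one_way_anova`, applied to Data Set 1; a statistics package would normally do this for you.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) following the slide's formula.

    Numerator:   sum_j n_j * (mean_j - grand_mean)^2 / (k - 1)
    Denominator: sum_j sum_i (x_ij - mean_j)^2 / (N - k)
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

data_set_1 = [[40, 44, 42, 44, 40],   # Fuel 1
              [50, 54, 52, 52, 52],   # Fuel 2
              [56, 56, 54, 58, 56]]   # Fuel 3
f, df1, df2 = one_way_anova(data_set_1)
print(f"F({df1}, {df2}) = {f:.1f}")  # F(2, 12) = 97.5
```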

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 1: Decision Rule

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    40 50 56

    44 54 56

    42 52 54

    44 52 58

    40 52 56

    M1 = 42 M2 = 52 M3 = 56

    Grand M = 50

    k - 1 = 3 - 1 = 2; N - k = 15 - 3 = 12

    F(2, 12) = 3.89 (α = .05; Table 4)

    $$F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)}$$

    Reject H0 if F > 3.89.
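When the numerator df is exactly 2, as here, the F distribution's upper tail has a closed form, P(F > f) = (1 + 2f/d2)^(-d2/2), so the Table 4 value can be checked without a table. This sketch relies on that special case only (the helper names are ours); for other numerator df, use Table 4 or a statistics library.

```python
def f_upper_tail_df1_2(f, d2):
    """P(F > f) for F(2, d2). Valid ONLY when the numerator df is 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

def f_critical_df1_2(alpha, d2):
    """Critical value c with P(F > c) = alpha for F(2, d2), by inverting the tail formula."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)

crit = f_critical_df1_2(0.05, 12)
print(round(crit, 2))                          # 3.89, matching Table 4
print(round(f_upper_tail_df1_2(crit, 12), 3))  # 0.05
```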

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 1

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    40 50 56

    44 54 56

    42 52 54

    44 52 58

    40 52 56

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

    SST = (40 - 50)² + (44 - 50)² + … + (58 - 50)² + (56 - 50)²

    = 552 units of variation

    Data Set 1
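The total sum of squares is every observation's squared deviation from the grand mean. A minimal sketch for Data Set 1 (variable names are ours):

```python
# Data Set 1: one list of mileages per fuel
fuels = [
    [40, 44, 42, 44, 40],  # Fuel 1
    [50, 54, 52, 52, 52],  # Fuel 2
    [56, 56, 54, 58, 56],  # Fuel 3
]

values = [x for fuel in fuels for x in fuel]
grand_mean = sum(values) / len(values)

# Total SS: squared deviation of each observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in values)
print(grand_mean, sst)  # 50.0 552.0
```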

  •

    An Intuitive Decomposition of Sum of Squares:

    Data Set 1

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    40 50 56

    44 54 56

    42 52 54

    44 52 58

    40 52 56

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

    SSB = 5 [(42 - 50)² + (52 - 50)² + (56 - 50)²]

    = 5 [ 64 + 4 + 36]

    = 520 units of variation

    Data Set 1
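The between-groups sum of squares weights each group mean's squared deviation from the grand mean by the group size, which is where the factor 5 comes from. A minimal sketch (our variable names):

```python
fuels = [
    [40, 44, 42, 44, 40],  # Fuel 1
    [50, 54, 52, 52, 52],  # Fuel 2
    [56, 56, 54, 58, 56],  # Fuel 3
]

values = [x for fuel in fuels for x in fuel]
grand_mean = sum(values) / len(values)
group_means = [sum(fuel) / len(fuel) for fuel in fuels]

# Between-groups SS: n_j * (group mean - grand mean)^2, summed over groups
ssb = sum(len(fuel) * (m - grand_mean) ** 2 for fuel, m in zip(fuels, group_means))
print(group_means, ssb)  # [42.0, 52.0, 56.0] 520.0
```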

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 1

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    40 50 56

    44 54 56

    42 52 54

    44 52 58

    40 52 56

    M1 = 42 M2 = 52 M3 = 56

    Grand M = 50

    SSW1 = (40 - 42)² + … + (40 - 42)² = 16 for Fuel 1

    SSW2 = (50 - 52)² + … + (52 - 52)² = 8 for Fuel 2

    SSW3 = (56 - 56)² + … + (56 - 56)² = 8 for Fuel 3

    SSW = 16 + 8 + 8 = 32 units of variation
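The within-groups sum of squares uses each observation's deviation from its own group mean. This sketch (our variable names) computes it for Data Set 1 and also confirms the identity SSTOTAL = SSBETWEEN + SSWITHIN.

```python
fuels = [
    [40, 44, 42, 44, 40],  # Fuel 1
    [50, 54, 52, 52, 52],  # Fuel 2
    [56, 56, 54, 58, 56],  # Fuel 3
]
values = [x for fuel in fuels for x in fuel]
grand_mean = sum(values) / len(values)
group_means = [sum(fuel) / len(fuel) for fuel in fuels]

# Within-groups SS, one value per fuel, then summed
ssw_by_fuel = [sum((x - m) ** 2 for x in fuel) for fuel, m in zip(fuels, group_means)]
ssw = sum(ssw_by_fuel)

# Check the decomposition SSTOTAL = SSBETWEEN + SSWITHIN
sst = sum((x - grand_mean) ** 2 for x in values)
ssb = sum(len(fuel) * (m - grand_mean) ** 2 for fuel, m in zip(fuels, group_means))
print(ssw_by_fuel, ssw, sst == ssb + ssw)  # [16.0, 8.0, 8.0] 32.0 True
```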

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 1: Conclusion

    Source of Variation    Sum of Squares   df   Mean Square   F      p
    Between Groups         520              2    260           97.5   < .001
    Within Groups/Error    32               12   2.67
    Total                  552              14

    Reject H0 because F = 97.5 > critical F = 3.89 (α = .05).

    Conclude that there is a significant difference between fuels in MPG.
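The whole table, including the p-value, can be reproduced in a few lines. This sketch uses our own helper names, and the p-value relies on the closed-form F tail that holds only when the numerator df is 2 (true here).

```python
def anova_table(groups):
    """One-way ANOVA: sums of squares, df, mean squares, and F."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "f": ms_b / ms_w}

def p_value_df1_2(f, d2):
    """Upper-tail p-value of F(2, d2); valid ONLY when the numerator df is 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
tab = anova_table(data_set_1)
p = p_value_df1_2(tab["f"], tab["df_w"])

print(f"Between  SS={tab['ss_b']:6.1f}  df={tab['df_b']:2d}  MS={tab['ms_b']:7.2f}  F={tab['f']:.2f}")
print(f"Within   SS={tab['ss_w']:6.1f}  df={tab['df_w']:2d}  MS={tab['ms_w']:7.2f}")
print(f"p = {p:.1e}")  # far below .001, which SPSS prints as .000
```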

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 2: Decision Rule

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    36 54 34

    48 40 74

    34 58 58

    44 62 42

    48 46 72

    M1 = 42 M2 = 52 M3 = 56

    Grand M = 50

    k - 1 = 3 - 1 = 2; N - k = 15 - 3 = 12

    F(2, 12) = 3.89 (α = .05; Table 4)

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 2

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fuel 1 Fuel 2 Fuel 3

    36 54 34

    48 40 74

    34 58 58

    44 62 42

    48 46 72

    M1 = 42 M2 = 52 M3 = 56

    Grand M = 50

    SST = (36 - 50)² + (48 - 50)² + … + (42 - 50)² + (72 - 50)²

    = 2280 units of variation

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 2

    SSB = 5 [(42 - 50)² + (52 - 50)² + (56 - 50)²]

    = 5 [ 64 + 4 + 36]

    = 520 units of variation (NOTE: Same as for Data Set 1)

    Data Set 2

    Fuel 1 Fuel 2 Fuel 3

    36 54 34

    48 40 74

    34 58 58

    44 62 42

    48 46 72

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

    SSTOTAL = SSBETWEEN + SSWITHIN

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 2

    SSTOTAL = SSBETWEEN + SSWITHIN

    SSW1 = (36 - 42)² + … + (48 - 42)² = 176 for Fuel 1

    SSW2 = (54 - 52)² + … + (46 - 52)² = 320 for Fuel 2

    SSW3 = (34 - 56)² + … + (72 - 56)² = 1264 for Fuel 3

    SSW = 176 + 320 + 1264 = 1760 units of variation

    Data Set 2

    Fuel 1 Fuel 2 Fuel 3

    36 54 34

    48 40 74

    34 58 58

    44 62 42

    48 46 72

    M1 = 42 M2= 52 M3 = 56

    Grand M= 50

  •

    An Intuitive Decomposition of Sum of Squares

    Data Set 2: Conclusion

    Source of Variation    Sum of Squares   df   Mean Square   F      p
    Between Groups         520              2    260           1.77   .212
    Within Groups/Error    1760             12   146.7
    Total                  2280             14

    Fail to reject H0 because F = 1.77 < critical F = 3.89 (α = .05).

    Conclude that there is not a significant difference between fuels in MPG.
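The same one-way ANOVA computation, written out again so the snippet stands alone, reproduces the Data Set 2 table. Helper names are ours, and the p-value again uses the closed-form F tail that is valid only for 2 numerator df.

```python
def anova_table(groups):
    """One-way ANOVA: sums of squares, df, mean squares, and F."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "f": ms_b / ms_w}

def p_value_df1_2(f, d2):
    """Upper-tail p-value of F(2, d2); valid ONLY when the numerator df is 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

data_set_2 = [[36, 48, 34, 44, 48],   # Fuel 1
              [54, 40, 58, 62, 46],   # Fuel 2
              [34, 74, 58, 42, 72]]   # Fuel 3
tab = anova_table(data_set_2)
p = p_value_df1_2(tab["f"], tab["df_w"])
print(f"SSB={tab['ss_b']:.0f} SSW={tab['ss_w']:.0f} MSW={tab['ms_w']:.1f} F={tab['f']:.2f} p={p:.3f}")
```

Same group means and the same SSB as Data Set 1, but the much larger within-groups spread shrinks F below the critical value.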

  •

    One-Way (One-Factor) ANOVA:

    An Intuitive Decomposition of Sum of Squares/Variance

    Between-Group Variance   Within-Group Variance   Likely Statistical Outcome
    small                    small                   hard to say
    small                    large                   factor has little or no effect; fail to reject H0
    large                    small                   factor has a large effect; reject H0
    large                    large                   hard to say

  •

    Post-Hoc Tukey HSD Test between Means

    $$\text{Tukey HSD} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \sqrt{\frac{MS_e}{n_g}} = \sqrt{\frac{2.67}{5}} = .73$$

    where MS_e = the within-groups (error) mean square and n_g = the number of cases in each group

    Tukey1-2 = |42 - 52| / .73 = 13.7, p < .01

    Tukey1-3 = |42 - 56| / .73 = 19.2, p < .01

    Tukey2-3 = |52 - 56| / .73 = 5.48, p < .01

    The critical value of the Tukey statistic (see Table D) is based on the number of groups/factor levels (3 here) and the df of the error term (12 here): 3.77 for α = .05 and 5.05 for α = .01.

    Each of the 3 means is significantly different from the others at the .01 level of significance: mileage for Fuel 3 > mileage for Fuel 2 > mileage for Fuel 1.
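A sketch of the pairwise Tukey comparisons for Data Set 1. It computes only the statistic; the studentized-range critical values (3.77 and 5.05) are copied from the lecture's Table D rather than computed, and the variable names are ours.

```python
from itertools import combinations
from math import sqrt

means = {"Fuel 1": 42, "Fuel 2": 52, "Fuel 3": 56}  # Data Set 1 group means
ms_error = 32 / 12             # within-groups mean square (2.67 in the ANOVA table)
n_g = 5                        # cases per group
s_xbar = sqrt(ms_error / n_g)  # standard error of a group mean, about 0.73

# Critical values for 3 groups and 12 error df, taken from the lecture (Table D)
q_crit_05, q_crit_01 = 3.77, 5.05

q_stats = {}
for a, b in combinations(means, 2):
    q = abs(means[a] - means[b]) / s_xbar
    q_stats[(a, b)] = q
    print(f"{a} vs {b}: {q:.2f}  significant at .05: {q > q_crit_05}, at .01: {q > q_crit_01}")
```

Using the unrounded standard error gives 13.69 and 19.17 for the first two comparisons; the slide's 13.7 and 19.2 come from dividing by the rounded .73.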

  •

    SPSS Input for Data Set 1

    Fuel Mileage

    1 40

    1 44

    1 42

    1 44

    1 40

    2 50

    2 54

    2 52

    2 52

    2 52

    3 56

    3 56

    3 54

    3 58

    3 56

  •

    SPSS Output for Data Set 1

    Test of Homogeneity of Variances

    Mileage

    Levene Statistic   df1   df2   Sig.
    1.000              2     12    .397

    (Levene's test tests the H0 that the error variance of the dependent variable is equal across groups.)

    ANOVA

    Mileage

                      Sum of Squares   df   Mean Square   F        Sig.
    Between Groups    520.000          2    260.000       97.500   .000
    Within Groups     32.000           12   2.667
    Total             552.000          14
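Levene's test is itself a one-way ANOVA, run on the absolute deviations of each observation from its group mean. This sketch assumes that mean-centered version (Levene's original form) is what the output above reports; it reproduces the 1.000 and .397. Helper names are ours, and the p-value uses the closed-form tail valid only for F with 2 numerator df.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

def levene_statistic(groups):
    """Mean-centered Levene statistic: ANOVA on |x - group mean|."""
    abs_dev = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)

data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
w = levene_statistic(data_set_1)
# df = (k - 1, N - k) = (2, 12); with numerator df 2 the F tail has a closed form
p = (1 + 2 * w / 12) ** (-12 / 2)
print(f"Levene = {w:.3f}, p = {p:.3f}")  # Levene = 1.000, p = 0.397
```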

  •

    An Intuitive Decomposition of SS: Practice

    Decision Rule

    Data Set 3

    Fuel 1 Fuel 2 Fuel 3

    20 25 28

    22 27 28

    21 26 27

    22 26 29

    20 26 28

    M1 = 21 M2= 26 M3 = 28

    Grand M= 25

  •

    An Intuitive Decomposition of SS: Practice

    Between-Groups Variance

    Data Set 3

    Fuel 1 Fuel 2 Fuel 3

    20 25 28

    22 27 28

    21 26 27

    22 26 29

    20 26 28

    M1 = 21 M2= 26 M3 = 28

    Grand M= 25

  •

    An Intuitive Decomposition of SS: Practice

    Within-Groups Variance

    Data Set 3

    Fuel 1 Fuel 2 Fuel 3

    20 25 28

    22 27 28

    21 26 27

    22 26 29

    20 26 28

    M1 = 21 M2= 26 M3 = 28

    Grand M= 25

  •

    An Intuitive Decomposition of SS: Practice

    Data Set 3

    Fuel 1 Fuel 2 Fuel 3

    20 25 28

    22 27 28

    21 26 27

    22 26 29

    20 26 28

    M1 = 21 M2= 26 M3 = 28

    Grand M= 25

    Source of Variation    Sum of Squares   df   Mean Square   F   p
    Between Groups
    Within Groups/Error
    Total
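After filling in the practice table by hand, you can check your answers with a sketch like this one. It applies the same one-way ANOVA formulas used for Data Set 1; the helper name `anova_table` is ours.

```python
def anova_table(groups):
    """One-way ANOVA: sums of squares, df, mean squares, and F."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "f": ms_b / ms_w}

data_set_3 = [[20, 22, 21, 22, 20],   # Fuel 1
              [25, 27, 26, 26, 26],   # Fuel 2
              [28, 28, 27, 29, 28]]   # Fuel 3
tab = anova_table(data_set_3)

rows = (("Between Groups", tab["ss_b"], tab["df_b"], tab["ms_b"]),
        ("Within Groups/Error", tab["ss_w"], tab["df_w"], tab["ms_w"]))
for source, ss, df, ms in rows:
    print(f"{source:20s} SS={ss:7.2f} df={df:2d} MS={ms:7.3f}")
print(f"{'Total':20s} SS={tab['ss_b'] + tab['ss_w']:7.2f} df={tab['df_b'] + tab['df_w']:2d}")
print(f"F = {tab['f']:.2f}")
```

The degrees of freedom are again (2, 12), so the same critical value, 3.89 at α = .05, applies.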

  •

    One-Way (One-Factor) ANOVA:

    Fisher's Randomized Block Design

    In some cases, an extraneous factor is a systematic source of variance that inflates the error term.

    The goal of a randomized block design is to block the extraneous source of variance and remove it from the error term, thus increasing the between-groups F value.

    In effect, the randomized block design removes unexplained variance from the error term by attributing it to an extraneous factor that is affecting the results.

    Fisher (from whom we get our F value) developed the block design to account for extraneous variance in crop yield associated with farm location (e.g., northern vs. central vs. southern locales in England) in order to test whether there were real differences in his main experimental factor, fertilizer type.

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fertilizer 1 Fertilizer 2

    38 50

    42 52

    29 38

    32 41

    18 27

    22 28

    M1 = 30.17 M2= 39.33

    Grand M= 34.75

    SST = (38 - 34.75)² + (42 - 34.75)² + … + (27 - 34.75)² + (28 - 34.75)²

    = 1232.25 units of variation

    Data Set Unblocked

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fertilizer 1 Fertilizer 2

    38 50

    42 52

    29 38

    32 41

    18 27

    22 28

    M1 = 30.17 M2= 39.33

    Grand M= 34.75

    Data Set Unblocked

    SSB = 6 [(30.17 - 34.75)² + (39.33 - 34.75)²]

    = 252.1 units of variation

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSWITHIN

    Fertilizer 1 Fertilizer 2

    38 50

    42 52

    29 38

    32 41

    18 27

    22 28

    M1 = 30.17 M2= 39.33

    Grand M= 34.75

    Data Set Unblocked

    SSW1 = (38 - 30.17)² + … + (22 - 30.17)² for Fertilizer 1

    SSW2 = (50 - 39.33)² + … + (28 - 39.33)² for Fertilizer 2

    SSW = SSW1 + SSW2 = 980.17 units of variation

  •

    One-Factor Randomized Block Design

    Source of Variation    Sum of Squares   df   Mean Square   F      p
    Between Groups         252.1            1    252.1         2.57   .140
    Within Groups/Error    980.2            10   98.02
    Total                  1232.3           11   112.03

    Fertilizer 1 Fertilizer 2

    38 50

    42 52

    29 38

    32 41

    18 27

    22 28

    M1 = 30.17 M2 = 39.33

    Grand M = 34.75

    Data Set Unblocked
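The unblocked analysis is an ordinary one-way ANOVA with two groups. This sketch (our variable names) reproduces the sums of squares and F for the fertilizer data, ignoring sector.

```python
fertilizer_1 = [38, 42, 29, 32, 18, 22]
fertilizer_2 = [50, 52, 38, 41, 27, 28]
groups = [fertilizer_1, fertilizer_2]

values = [x for g in groups for x in g]
grand = sum(values) / len(values)          # 34.75
means = [sum(g) / len(g) for g in groups]  # about 30.17 and 39.33

sst = sum((x - grand) ** 2 for x in values)
ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

df_b, df_w = len(groups) - 1, len(values) - len(groups)  # 1 and 10
f = (ssb / df_b) / (ssw / df_w)
print(f"SST={sst:.2f} SSB={ssb:.2f} SSW={ssw:.2f} F({df_b}, {df_w})={f:.2f}")
# SST=1232.25 SSB=252.08 SSW=980.17 F(1, 10)=2.57
```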

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSBLOCK + SSWITHIN

    Blocked Variable   Fertilizer 1   Fertilizer 2   Sector Mean
    Northern Sector    38             50             MN = 45.5
                       42             52
    Central Sector     29             38             MC = 35
                       32             41
    Southern Sector    18             27             MS = 23.75
                       22             28

    M1 = 30.17   M2 = 39.33   Grand M = 34.75

    SST = (38 - 34.75)² + (42 - 34.75)² + … + (27 - 34.75)² + (28 - 34.75)²

    = 1232.25 units of variation

    Data Set Blocked

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSBLOCK + SSWITHIN

    Blocked Variable   Fertilizer 1   Fertilizer 2   Sector Mean
    Northern Sector    38             50             MN = 45.5
                       42             52
    Central Sector     29             38             MC = 35
                       32             41
    Southern Sector    18             27             MS = 23.75
                       22             28

    M1 = 30.17   M2 = 39.33   Grand M = 34.75

    Data Set Blocked

    SSB = 6 [(30.17 - 34.75)² + (39.33 - 34.75)²]

    = 252.1 units of variation (NOTE: Same as for Unblocked Data Set)

  •

    One-Factor Randomized Block Design

    SSTOTAL = SSBETWEEN + SSBLOCK + SSWITHIN

    Blocked Variable   Fertilizer 1   Fertilizer 2   Sector Mean
    Northern Sector    38             50             MN = 45.5
                       42             52
    Central Sector     29             38             MC = 35
                       32             41
    Southern Sector    18             27             MS = 23.75
                       22             28

    M1 = 30.17   M2 = 39.33   Grand M = 34.75

    Data Set Blocked

    SSBL = 4 [(45.5 - 34.75)² + (35 - 34.75)² + (23.75 - 34.75)²]

    = 946.5 units of variation

  •

    One-Factor Randomized Block Design

    Source of Variation          Sum of Squares   df   Mean Square   F       p
    Blocked/Extraneous Factor    946.5            2    473.25        112.4   < .001
    Between Groups               252.1            1    252.1         59.9    < .001
    Within Groups/Error*         33.7             8    4.21
    Total                        1232.3           11   112.03

    Blocked Variable   Fertilizer 1   Fertilizer 2   Sector Mean
    Northern Sector    38             50             MN = 45.5
                       42             52
    Central Sector     29             38             MC = 35
                       32             41
    Southern Sector    18             27             MS = 23.75
                       22             28

    M1 = 30.17   M2 = 39.33   Grand M = 34.75

    *Was 980.2 unblocked: 980.2 - 946.5 = 33.7
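The blocked table follows from the decomposition SSTOTAL = SSBETWEEN + SSBLOCK + SSWITHIN, with the error SS obtained by subtraction as on the slide. This is a sketch with our own helper names and sector labels; it fits fertilizer and sector as additive effects with no interaction term, in line with the "no interaction effects" description of the randomized block design earlier in the lecture.

```python
# (fertilizer, sector, bushels): the same observations as the blocked data set
rows = [
    (1, "Northern", 38), (1, "Northern", 42), (1, "Central", 29), (1, "Central", 32),
    (1, "Southern", 18), (1, "Southern", 22),
    (2, "Northern", 50), (2, "Northern", 52), (2, "Central", 38), (2, "Central", 41),
    (2, "Southern", 27), (2, "Southern", 28),
]

def sizes_and_means(rows, column):
    """(count, mean bushels) for each level of a column (0 = fertilizer, 1 = sector)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[2])
    return [(len(v), sum(v) / len(v)) for v in groups.values()]

y = [row[2] for row in rows]
grand = sum(y) / len(y)
fert = sizes_and_means(rows, 0)
sector = sizes_and_means(rows, 1)

sst = sum((v - grand) ** 2 for v in y)
ss_treat = sum(n * (m - grand) ** 2 for n, m in fert)    # between fertilizers
ss_block = sum(n * (m - grand) ** 2 for n, m in sector)  # between sectors (blocks)
ss_error = sst - ss_treat - ss_block                     # what is left over

df_treat, df_block = len(fert) - 1, len(sector) - 1
df_error = len(y) - 1 - df_treat - df_block
ms_error = ss_error / df_error
f_treat = (ss_treat / df_treat) / ms_error
f_block = (ss_block / df_block) / ms_error

print(f"Block:      SS={ss_block:.1f} df={df_block} F={f_block:.1f}")
print(f"Fertilizer: SS={ss_treat:.1f} df={df_treat} F={f_treat:.1f}")
print(f"Error:      SS={ss_error:.1f} df={df_error} MS={ms_error:.2f}")
```

With unrounded values the block F comes out at about 112.5; the slide's 112.4 appears to come from dividing 473.25 by the rounded error mean square 4.21.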

  •

    SPSS Input for Blocked Data Set

    Fertilizer Plot Bushels

    1 1 38

    1 1 42

    1 2 29

    1 2 32

    1 3 18

    1 3 22

    2 1 50

    2 1 52

    2 2 38

    2 2 41

    2 3 27

    2 3 28