Treatment comparisons
ANOVA can determine if there are differences among the treatments, but what is the nature of those differences?
Are the treatments measured on a continuous scale?
– Look at response surfaces (linear regression, polynomials)
Is there an underlying structure to the treatments?
– Compare groups of treatments using orthogonal contrasts or a limited number of preplanned mean comparison tests
Are the treatments unstructured?
– Use appropriate mean comparison tests
Comparison of Means
Pairwise Comparisons
– Least Significant Difference (LSD)
Simultaneous Confidence Intervals
– Dunnett Test (making all comparisons to a control)
– Bonferroni Inequality
Other Multiple Comparisons - “Data Snooping”
– Fisher’s Protected LSD (FPLSD)
– Student-Newman-Keuls test (SNK)
– Tukey’s honestly significant difference (HSD)
– Waller and Duncan’s Bayes LSD (BLSD)
– False Discovery Rate Procedure
Often misused - intended to be used only for data from experiments with unstructured treatments
Multiple Comparison Tests
Fixed Range Tests – a constant value is used for all comparisons
– Application
• Hypothesis Tests
• Confidence Intervals
Multiple Range Tests – values used for comparison vary across a range of means
– Application
• Hypothesis Tests
Variety Trials
In a breeding program, you need to examine large numbers of selections and then narrow to the best
In the early stages, selection is based on single plants or single rows of related plants. Seed and space are limited, so replication is difficult
When numbers have been reduced and there is sufficient seed, you can conduct replicated yield trials and you want to be able to “pick the winner”
Least Significant Difference
Calculating a t for testing the difference between two means:
$t = (\bar{Y}_1 - \bar{Y}_2) / \sqrt{s^2_{\bar{Y}_1 - \bar{Y}_2}}$
– any difference for which $t > t_\alpha$ would be declared significant
Further, $t_\alpha \sqrt{s^2_{\bar{Y}_1 - \bar{Y}_2}}$ is the smallest difference for which significance would be declared
– therefore $LSD = t_\alpha \sqrt{s^2_{\bar{Y}_1 - \bar{Y}_2}}$
– or with equal replication, where r is the number of observations forming the mean, $LSD = t_\alpha \sqrt{2\,MSE / r}$
Do’s and Don’ts of using LSD
LSD is a valid test when
– making comparisons planned in advance of seeing the data (this includes the comparison of each treatment with the control)
– comparing adjacent ranked means
The LSD should not (unless F for treatments is significant) be used for
– making all possible pairwise comparisons
– making more comparisons than df for treatments
Pick the Winner
A plant breeder wanted to measure resistance to stem rust for six wheat varieties
– planted 5 seeds of each variety in each of four pots
– placed the 24 pots randomly on a greenhouse bench
– inoculated with stem rust
– measured seed yield per pot at maturity
Ranked Mean Yields (g/pot)
Variety   Rank   Mean Yield (Yi)   Difference (Yi-1 - Yi)
F         1      95.3
D         2      94.0               1.3
E         3      75.0              19.0
B         4      69.0               6.0
A         5      50.3              18.7
C         6      24.0              26.3
ANOVA
Source df MS F
Variety 5 2,976.44 24.80
Error 18 120.00
Compute LSD at 5% and 1%
$LSD_{0.05} = t_{0.05} \sqrt{2\,MSE / r} = 2.101 \sqrt{(2 \times 120) / 4} = 16.27$
$LSD_{0.01} = t_{0.01} \sqrt{2\,MSE / r} = 2.878 \sqrt{(2 \times 120) / 4} = 22.29$
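The two LSD values can be checked with a few lines of Python. This is a minimal sketch: the function name `lsd` is illustrative, and the t values (2.101 and 2.878 for 18 error df) are taken from the slide rather than computed.

```python
import math

def lsd(t_crit: float, mse: float, r: int) -> float:
    """Least significant difference for equally replicated means."""
    # The standard error of a difference between two means is sqrt(2 * MSE / r).
    return t_crit * math.sqrt(2.0 * mse / r)

MSE = 120.0  # error mean square from the ANOVA table
R = 4        # pots (replicates) per variety

# Tabled t values for 18 error df, as quoted on the slide.
lsd_05 = lsd(2.101, MSE, R)
lsd_01 = lsd(2.878, MSE, R)

print(f"LSD(0.05) = {lsd_05:.2f}")
print(f"LSD(0.01) = {lsd_01:.2f}")
```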
Back to the data...
Variety   Rank   Mean Yield (Yi)   Difference (Yi-1 - Yi)
F         1      95.3
D         2      94.0               1.3
E         3      75.0              19.0*
B         4      69.0               6.0
A         5      50.3              18.7*
C         6      24.0              26.3**
* $LSD_{0.05}$ = 16.27
** $LSD_{0.01}$ = 22.29
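The starring of adjacent differences can be reproduced mechanically. The sketch below assumes the ranked means and the two LSD values from the example; the helper name `flag` is made up for illustration.

```python
# Ranked variety means (g/pot) from the example table.
ranked_means = [("F", 95.3), ("D", 94.0), ("E", 75.0),
                ("B", 69.0), ("A", 50.3), ("C", 24.0)]
LSD_05, LSD_01 = 16.27, 22.29

def flag(diff: float) -> str:
    """Return '**' (exceeds the 1% LSD), '*' (exceeds the 5% LSD) or ''."""
    if diff > LSD_01:
        return "**"
    if diff > LSD_05:
        return "*"
    return ""

# Compare each mean with the one ranked immediately above it.
flags = {}
for (_, upper), (name, lower) in zip(ranked_means, ranked_means[1:]):
    diff = upper - lower
    flags[name] = flag(diff)
    print(f"{name}: {diff:5.1f}{flags[name]}")
```

Only adjacent ranked means are compared here, which is one of the uses the slides describe as valid for the unprotected LSD.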
Pairwise Comparisons
If you have 10 varieties and want to look at all possible pairwise comparisons
– that would be t(t-1)/2 or 10(9)/2 = 45
– that’s a few more than t-1 df = 9
LSD would only allow 9 comparisons
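A quick way to see the count is to enumerate the pairs. In this sketch the labels V1 to V10 are hypothetical stand-ins for the 10 varieties.

```python
from itertools import combinations

def n_pairwise(t: int) -> int:
    """Number of distinct pairs among t treatment means: t(t-1)/2."""
    return t * (t - 1) // 2

# Hypothetical labels standing in for the 10 varieties.
varieties = [f"V{i}" for i in range(1, 11)]
pairs = list(combinations(varieties, 2))

print("pairs by formula:    ", n_pairwise(len(varieties)))
print("pairs by enumeration:", len(pairs))
print("treatment df (t-1):  ", len(varieties) - 1)
```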
Type I vs Type II Errors
Type I error - saying something is different when it is really the same (Paranoia)
– the rate at which this type of error is made is the significance level
Type II error - saying something is the same when it is really different (Sloth)
– the probability of committing this type of error is designated $\beta$
– the probability that a comparison procedure will pick up a real difference is called the power of the test and is equal to $1 - \beta$
Type I and Type II error rates are inversely related to each other
For a given Type I error rate, the rate of Type II error depends on
– sample size
– variance
– true differences among means
Nobody likes to be wrong...
Protection against Type I is choosing a significance level
Protection against Type II is a little harder because
– it depends on the true magnitude of the difference, which is unknown
– choose a test with sufficiently high power
Reasons for not using LSD for more than t-1 comparisons
– the chance for a Type I error increases dramatically as the number of treatments increases
– for example, with only 20 means, you could make a Type I error 95% of the time (in 95/100 experiments)
Comparisonwise vs Experimentwise Error
Comparisonwise error rate ($\alpha = \alpha_C$)
– measures the proportion of all differences that are expected to be declared real when they are not
Experimentwise error rate ($\alpha_E$)
– the risk of making at least one Type I error among the set (family) of comparisons in the experiment
– measures the proportion of experiments in which one or more differences are falsely declared to be significant
– the probability of being wrong increases as the number of means being compared increases
Experimentwise error rate ($\alpha_E$)
Probability of no Type I errors = $(1 - \alpha_C)^x$
where x = number of pairwise comparisons
Max x = t(t-1)/2, where t = number of treatments
Probability of at least one Type I error
$\alpha_E = 1 - (1 - \alpha_C)^x$
Comparisonwise error rate
$\alpha_C = 1 - (1 - \alpha_E)^{1/x}$
if t = 10 and $\alpha_C$ = 0.05, Max x = 45 and $\alpha_E$ = 90%
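The two conversions are easy to tabulate. The sketch below uses illustrative function names and assumes, as the formulas themselves do, that the x comparisons behave as independent tests; pairwise comparisons that share means are not strictly independent, so the numbers are approximations.

```python
def max_comparisons(t: int) -> int:
    """Maximum number of pairwise comparisons among t treatments."""
    return t * (t - 1) // 2

def experimentwise_rate(alpha_c: float, x: int) -> float:
    """Probability of at least one Type I error in x independent comparisons."""
    return 1.0 - (1.0 - alpha_c) ** x

def comparisonwise_rate(alpha_e: float, x: int) -> float:
    """Per-comparison rate that yields experimentwise rate alpha_e over x comparisons."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / x)

# Experimentwise rate when every comparison is tested at 0.05.
for t in (2, 5, 10, 20):
    x = max_comparisons(t)
    print(f"t={t:2d}  x={x:3d}  alpha_E={experimentwise_rate(0.05, x):.3f}")

# Per-comparison rate needed to hold alpha_E at 0.05 with 10 treatments.
print(f"alpha_C = {comparisonwise_rate(0.05, max_comparisons(10)):.5f}")
```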
Fisher’s protected LSD (FPLSD)
Uses comparisonwise error rate
Computed just like LSD, but you don’t use it unless the F for treatments tests significant
$LSD = t_\alpha \sqrt{2\,MSE / r}$
So in our example data, any difference between means that is greater than 16.27 is declared to be significant
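The protection step can be sketched as a gate in front of the pairwise tests. The function name `fplsd` is illustrative, and the critical value 2.77 for F at the 5% level with 5 and 18 df is an assumption read from a standard F table; it does not appear on the slides.

```python
from itertools import combinations

def fplsd(means, f_obs, f_crit, lsd):
    """Fisher's protected LSD: run pairwise tests only if the overall F is significant."""
    if f_obs <= f_crit:
        return []  # protection step: no pairwise comparisons are made
    return [(a, b)
            for (a, mean_a), (b, mean_b) in combinations(means.items(), 2)
            if abs(mean_a - mean_b) > lsd]

# Example data from the slides (variety means in g/pot).
means = {"F": 95.3, "D": 94.0, "E": 75.0, "B": 69.0, "A": 50.3, "C": 24.0}
F_OBS = 24.80   # F for varieties from the ANOVA table
F_CRIT = 2.77   # assumed tabled F(0.05; 5, 18), not given in the source
LSD_05 = 16.27

significant = fplsd(means, F_OBS, F_CRIT, LSD_05)
print(len(significant), "of 15 pairs declared different")
```

Because the gate only checks the overall F, every pair is still tested at the comparisonwise rate once it opens.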
Waller-Duncan Bayes LSD (BLSD)
Do ANOVA and compute F (MST/MSE) with q and f df (corresponds to table nomenclature)
Choose error weight ratio, k
– k=100 corresponds to 5% significance level
– k=500 for a 1% test
Obtain t from table (A7 in Petersen)
– depends on k, F, q (treatment df) and f (error df)
Compute $BLSD = t \sqrt{2\,MSE / r}$
Any difference greater than BLSD is significant
Does not provide complete control of experimentwise Type I error
Reduces Type II error
Duncan’s New Multiple-Range Test
Alpha varies depending on the number of means involved in the test
Alpha 0.05
Error Degrees of Freedom 6
Error Mean Square 113.0833
Number of Means    2       3       4       5       6
Critical Range     26.02   26.97   27.44   27.67   27.78
Means with the same letter are not significantly different.
Duncan Grouping   Mean    N   variety
A                 95.30   2   6
A                 94.00   2   4
B A               75.00   2   5
B A               69.00   2   2
B                 50.30   2   1
C                 22.50   2   3
Student-Newman-Keuls Test (SNK)
Rank the means from high to low
Compute t-1 significant differences, $SNK_j$, using the HSD
$SNK_j = Q_{\alpha,k,v} \sqrt{MSE / r}$, where j = 1, 2, ..., t-1 and k = 2, 3, ..., t
Compare the highest and lowest
– if less than SNK, no differences are significant
– if greater than SNK, compare next highest mean with next lowest using next SNK
Uses experimentwise $\alpha$ for the extremes and comparisonwise $\alpha$ for adjacent
Using SNK with example data:
Variety   Rank   Mean Yield (Yi)
F         1      95.3
D         2      94.0
E         3      75.0
B         4      69.0
A         5      50.3
C         6      24.0
18 df for error
se = SQRT(MSE/r) = SQRT(120/4) = 5.477
SNK = Q*se
k     2       3       4       5       6
Q     2.97    3.61    4.00    4.28    4.49
SNK   16.27   19.77   21.91   23.44   24.59
5 + 4 + 3 + 2 + 1 = 15 comparisons
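The step-down logic can be sketched in Python using the Q values tabled above. This is one common way to organize the procedure, not necessarily the exact bookkeeping the slides intend: ranges are tested from widest to narrowest, and any pair lying inside a range already declared non-significant is not tested again. The function name `snk_test` is illustrative.

```python
import math

# Means ranked high to low, and tabled Q values for k = 2..6 with 18 error df.
means = [("F", 95.3), ("D", 94.0), ("E", 75.0),
         ("B", 69.0), ("A", 50.3), ("C", 24.0)]
Q = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}

se = math.sqrt(120.0 / 4)                  # sqrt(MSE / r)
snk = {k: q * se for k, q in Q.items()}    # critical difference for each span k

def snk_test(ranked, crit):
    """Return {(high, low): True/False} for every pair, True = significant."""
    n = len(ranked)
    homogeneous = []   # index ranges (i, j) already declared non-significant
    results = {}
    for span in range(n, 1, -1):           # widest range first
        for i in range(n - span + 1):
            j = i + span - 1
            pair = (ranked[i][0], ranked[j][0])
            # A pair inside a non-significant range is not tested further.
            if any(a <= i and j <= b for a, b in homogeneous):
                results[pair] = False
                continue
            diff = ranked[i][1] - ranked[j][1]
            results[pair] = diff > crit[span]
            if not results[pair]:
                homogeneous.append((i, j))
    return results

results = snk_test(means, snk)
not_significant = {pair for pair, sig in results.items() if not sig}
print("not significantly different:", sorted(not_significant))
```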
Tukey’s honestly significant difference (HSD)
From a table of studentized range values, select a value of Q which depends on k (the number of means) and v (error df) (Appendix Table VII in Kuehl)
Compute HSD as $HSD = Q_{\alpha,k,v} \sqrt{MSE / r}$
For any pair of means, if the difference is greater than HSD, it is significant
Uses an experimentwise error rate
Dunnett’s test is a special case where all treatments are compared to a control
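Applied to the wheat example, HSD is a single critical difference used for every pair. The sketch below reuses Q = 4.49 (k = 6 means, 18 error df) from the SNK table in these slides; the variable names are illustrative.

```python
import math
from itertools import combinations

# Example data from the slides (variety means in g/pot).
means = {"F": 95.3, "D": 94.0, "E": 75.0, "B": 69.0, "A": 50.3, "C": 24.0}
MSE, R = 120.0, 4
Q_6_18 = 4.49  # studentized range for k = 6 means, 18 error df (from the SNK table)

hsd = Q_6_18 * math.sqrt(MSE / R)

# One fixed critical value is applied to all 15 pairs.
significant = [(a, b)
               for (a, mean_a), (b, mean_b) in combinations(means.items(), 2)
               if abs(mean_a - mean_b) > hsd]

print(f"HSD = {hsd:.2f}")
print(len(significant), "of 15 pairs declared different")
```

With the same data the unprotected 5% LSD of 16.27 would flag more pairs, which illustrates the more conservative behavior of HSD.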
Bonferroni Inequality
$\alpha_E \le x \cdot \alpha_C$
where x = number of pairwise comparisons
$\alpha_C = \alpha_E / x$
where $\alpha_E$ = maximum desired experimentwise error rate
Advantages
– simple
– strict control of Type I error
Disadvantage
– very conservative, low power to detect differences
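The Bonferroni adjustment is a one-line calculation. In this sketch the function name is illustrative, and the "implied" column uses the independence formula $1 - (1 - \alpha_C)^x$ only as a rough check that the bound holds.

```python
def bonferroni_alpha(alpha_e: float, x: int) -> float:
    """Per-comparison rate that keeps the experimentwise rate at or below alpha_e."""
    return alpha_e / x

ALPHA_E = 0.05
for t in (6, 10):
    x = t * (t - 1) // 2                       # all pairwise comparisons
    alpha_c = bonferroni_alpha(ALPHA_E, x)
    implied = 1.0 - (1.0 - alpha_c) ** x       # experimentwise rate if tests were independent
    print(f"t={t:2d}  x={x:2d}  alpha_C={alpha_c:.5f}  implied alpha_E={implied:.4f}")
```

The per-comparison rate shrinks quickly as x grows, which is the source of the low power noted above.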
False Discovery Rate
[Figure: False Positive Procedure. Probability (y-axis, 0.00 to 0.25) plotted against Rank (x-axis, 1 to 21), with the annotation "Reject H0".]
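The slides do not say which false discovery rate procedure is meant. As an assumption, the sketch below uses the Benjamini-Hochberg step-up rule, a widely used FDR procedure that ranks the p-values and compares each with a threshold that rises with rank. The p-values are hypothetical and are not taken from the wheat example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where H0 is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k whose ordered p-value satisfies p(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Hypothetical p-values from eight pairwise comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p, q=0.05)
for p_value, rejected in zip(p, decisions):
    print(f"p = {p_value:.3f}  reject H0: {rejected}")
```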
Most Popular
FPLSD test is widely used, and widely abused
BLSD is preferred by some because
– It is a single value and therefore easy to use
– Larger when F indicates that the means are homogeneous and small when means appear to be heterogeneous
The False Discovery Rate has nice features, but is it widely accepted in the literature?
Tukey’s HSD test
– widely accepted and often recommended by statisticians
– may be too conservative if Type II error has more serious consequences than Type I error