PSY 1950: Post-hoc and Planned Comparisons
October 6, 2008
Preamble
• Presentations
• Tutoring
• Problem 1e:
If you decide to reject the null hypothesis, you know the probability that you are making the wrong decision
Visual depiction of F-ratio
Subpopulations
• Cournot (1843): “...it is clear that nothing limits... the number of features according to which one can distribute [natural events or social facts] into several groups or distinct categories.”
• e.g., the chance of a male birth:
– Legitimate vs. illegitimate
– Birth order
– Parent age
– Parent profession
– Parent health
– Parent religion
• “… usually these attempts through which the experimenter passed don’t leave any traces; the public will only know the result that has been found worth pointing out; and as a consequence, someone unfamiliar with the attempts which have led to this result completely lacks a clear rule for deciding whether the result can or can not be attributed to chance.”
Large Surveys and Observational Studies
• Abundant data
• Limited a priori hypotheses
• e.g., Genome Superstruct Project (GSP)
– Genetic testing
– Cognitive testing
– Structural brain imaging
– Functional brain imaging
ANOVA
• One-way ANOVA
– k(k-1)/2 possible pairwise comparisons among k levels
– e.g., with 5 levels, 10 possible comparisons
• Factorial ANOVA
– The issue above, plus
– Multiple possible main effects/interactions
– e.g., with a 2 x 2 x 2 design, 7 possible effects (3 main effects, 3 two-way interactions, 1 three-way interaction)
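The two counts above can be checked with a short sketch. The function names and the factor labels A, B, C are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb


def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k group means: k(k-1)/2."""
    return comb(k, 2)


def factorial_effects(factors):
    """List every main effect and interaction for a fully crossed design.

    Each non-empty subset of the factors is one testable effect,
    so there are 2**len(factors) - 1 of them.
    """
    effects = []
    for size in range(1, len(factors) + 1):
        effects.extend(combinations(factors, size))
    return effects


print(pairwise_comparisons(5))                    # one-way ANOVA with 5 levels
for effect in factorial_effects(["A", "B", "C"]):  # 2 x 2 x 2 design
    print(" x ".join(effect))
```

The number of levels within each factor does not change the count of effects; only the number of factors does.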
Families
• Set of hypotheses = Family
• Type I error rate for a set of hypotheses = Familywise error rate
– e.g., across pairwise comparisons in one-way ANOVA: if no mean differences exist, what is the chance of finding a significant one?
– e.g., across main effects/interactions in factorial ANOVA: if no main effects or interactions exist for a particular ANOVA, what is the chance of finding a significant one?
– e.g., whole experiment with multiple ANOVAs: if no effects exist for the entire experiment, what is the chance of finding a significant one?
Family Size
• “If these inferences are unrelated in terms of their content or intended use (although they may be statistically dependent), then they should be treated separately and not jointly” (Hochberg and Tamhane, 1987)
• e.g., suicide rates for 50 states, with 1225 possible pairwise comparisons
– From a federal perspective, how big is the family?
– How about from a state perspective?
Familywise α
• If the family consists of two independent comparisons, each with α = .05, AND if both corresponding null hypotheses are true:
– The probability of NOT making a Type I error on either test is: .95 x .95 = .9025
– The probability of making one or more Type I errors is: 1 - .9025 = .0975
• If the family consists of c independent comparisons, each with α = .05, AND if all corresponding null hypotheses are true:
– The probability of NOT making a Type I error on any test is: (1 - .05)^c
– The probability of making one or more Type I errors is: 1 - (1 - .05)^c
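As a rough illustration of how quickly the familywise rate grows, the formula 1 - (1 - α)^c can be evaluated directly. The function name is hypothetical, and the sketch assumes independent comparisons with every null hypothesis true, as stated on the slide.

```python
def familywise_error_rate(alpha: float, c: int) -> float:
    """Probability of at least one Type I error across c independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** c


# Per-comparison alpha of .05 for families of increasing size
for c in (1, 2, 5, 10, 21):
    print(c, round(familywise_error_rate(0.05, c), 4))
```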
A Priori vs. Post-hoc Comparisons
• A priori comparisons
– Chosen before data collection
– Limited, deliberate comparisons
• Post hoc (a posteriori) comparisons
– Conducted after data collection
– Exhaustive, exploratory comparisons
Significance of Overall F
• Prerequisite for some tests (e.g., Fisher’s LSD)
• Efficient test of overall null hypothesis
• Need MSwithin for many tests
A Priori Comparisons
• Single-stage tests
– Multiple t-tests
– Linear contrasts
– Bonferroni t (Dunn’s test)
– Dunn-Sidak test
• Multistage tests
– Bonferroni/Holm
Multiple t-tests
• Replace s²pooled with MSwithin
• Use dfwithin
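A minimal sketch of this substitution: the two-group t statistic is computed with MSwithin (pooled over all groups) in place of the two-group pooled variance, and is evaluated on dfwithin. The function names and the data are made up for illustration, and the p-value lookup is left out because it needs a t table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def ms_within(groups):
    """Pooled within-group variance across ALL groups, with its df."""
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return ss / df, df


def t_with_ms_within(g1, g2, all_groups):
    """t for one pairwise comparison, using MSwithin and dfwithin."""
    msw, df = ms_within(all_groups)
    se = sqrt(msw * (1 / len(g1) + 1 / len(g2)))
    return (mean(g1) - mean(g2)) / se, df


# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
t, df = t_with_ms_within(groups[0], groups[2], groups)
print(round(t, 3), df)
```

Using all groups to estimate error variance gives more degrees of freedom than a two-group t-test, which is the practical payoff of the substitution.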
Linear Contrasts
• Compare more than one mean with another mean
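The slide does not show the contrast formula, so the sketch below uses the usual textbook form as an assumption: weights that sum to zero, ψ = Σ a_j M_j, and t = ψ / √(MSwithin Σ a_j²/n_j) on dfwithin. The function name, weights, and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def linear_contrast_t(groups, weights):
    """Return (psi, t, df_within) for a linear contrast among group means."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    df_within = sum(len(g) - 1 for g in groups)
    ms_within = sum((len(g) - 1) * variance(g) for g in groups) / df_within
    # psi: weighted combination of the group means
    psi = sum(w * mean(g) for w, g in zip(weights, groups))
    se = sqrt(ms_within * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return psi, psi / se, df_within


# Average of the first two groups compared with the third group
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
psi, t, df = linear_contrast_t(groups, [0.5, 0.5, -1.0])
print(round(psi, 3), round(t, 3), df)
```

With weights (1, -1, 0) the same function reduces to a simple pairwise comparison.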
Bonferroni t (Dunn’s Test)
• If c independent tests are performed:
– αcorrected = α / c
– pcorrected = p x c
• Imprecise math
– e.g., for uncorrected p = .05 with c = 21, pcorrected = 1.05 (greater than 1)
– Exact: pcorrected = 1 - (1 - .05)^c
Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3-62.
Perneger, T. V. (1998). What is wrong with Bonferroni adjustments. BMJ, 316, 1236-1238.
Dunn-Sidak Test
• Identical to Bonferroni, except uses correct math: pcorrected = 1 - (1 - p)^c
• Less conservative than Bonferroni
– e.g., for uncorrected p = .05 with c = 10: pBonferroni = .50, pSidak = .40
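The two corrections can be compared side by side with a small sketch. The function names are hypothetical; capping the Bonferroni value at 1 is a common convention added here, not something stated on the slides.

```python
def bonferroni_p(p: float, c: int) -> float:
    """Bonferroni-corrected p: p x c, capped at 1 (the cap is a convention)."""
    return min(1.0, p * c)


def sidak_p(p: float, c: int) -> float:
    """Dunn-Sidak-corrected p for c independent tests: 1 - (1 - p)^c."""
    return 1.0 - (1.0 - p) ** c


# Uncorrected p = .05 across families of different sizes
for c in (3, 10, 21):
    print(c, round(bonferroni_p(0.05, c), 3), round(sidak_p(0.05, c), 3))
```

The Sidak value never exceeds the Bonferroni value, and the gap widens as c grows.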
Multistage Bonferroni (e.g., Holm)
• Calculate t for all c contrasts of interest
• Order results based on |t|: |t1| > |t2| > |t3|
• Apply different Bonferroni corrections for α or p based on position in the above sequence, stopping when t is nonsignificant
– For t1, c1 = 3; if p1 > .05/3, then…
– For t2, c2 = 2; if p2 > .05/2, then…
– For t3, c3 = 1; use α = .05/1
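The step-down logic above might be sketched as follows. The sketch assumes the p-values for the c contrasts are already in hand and that all contrasts share the same df, so ordering by ascending p matches ordering by descending |t|. The function name and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm's sequentially rejective Bonferroni procedure.

    Returns a list of booleans (True = reject H0), in the input order.
    """
    c = len(p_values)
    # Smallest p first, i.e. largest |t| first when df are equal
    order = sorted(range(c), key=lambda i: p_values[i])
    reject = [False] * c
    for step, idx in enumerate(order):
        threshold = alpha / (c - step)  # alpha/c, then alpha/(c-1), ..., alpha/1
        if p_values[idx] > threshold:
            break  # stop: this and all remaining contrasts are nonsignificant
        reject[idx] = True
    return reject


print(holm([0.010, 0.040, 0.020]))  # hypothetical p-values for 3 contrasts
print(holm([0.030, 0.010, 0.040]))
```

A single-stage Bonferroni test would hold every contrast to .05/3; the multistage version relaxes the criterion after each rejection, which is where its extra power comes from.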
Post-hoc Comparisons
• Fisher’s LSD
• Tukey’s test
• Newman-Keuls test
• The Ryan procedure (REGWQ)
• Scheffé’s test
• Dunnett’s test
Fisher’s LSD Test
• LSD = Least significant difference
• Two-stage process:
– Conduct ANOVA: if F is nonsignificant, stop; if F is significant…
– Make pairwise comparisons using multiple t-tests (MSwithin in place of s²pooled, with dfwithin)
• Ensures familywise α = .05 for complete null
• Ensures familywise α = .05 for partial null when c = 3
Studentized Range Statistic (q)
• If Ml and Ms represent the largest and smallest means and r is the number of means in the set: q = (Ml - Ms) / √(MSwithin / n), where n is the number of scores per group
• Order means from smallest to largest
• Determine r, calculate q, lookup p
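The first two steps can be sketched in a few lines. The sketch assumes equal group sizes and the standard definition q = (Ml - Ms) / √(MSwithin / n). The function name and data are hypothetical, and the p lookup is omitted because it requires a studentized range table.

```python
from math import sqrt
from statistics import mean, variance


def studentized_range_q(groups):
    """Return (q, r, df_within) for the largest vs. smallest group mean.

    Assumes every group has the same number of scores, n.
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = sorted(mean(g) for g in groups)  # order means smallest to largest
    df_within = sum(len(g) - 1 for g in groups)
    ms_within = sum((len(g) - 1) * variance(g) for g in groups) / df_within
    q = (means[-1] - means[0]) / sqrt(ms_within / n)
    return q, len(groups), df_within


q, r, df = studentized_range_q([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(round(q, 3), r, df)  # look up q for r means and df_within in a q table
```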
Tukey’s HSD Test
• Determines minimum difference between treatment means that is necessary for significance
• HSD = honestly significant difference
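One way to picture the minimum difference is HSD = q_crit x √(MSwithin / n), with q_crit read from a studentized range table for r means and dfwithin. In the sketch below, q_crit is an input supplied by the reader; the value 4.34 is an approximate tabled value for r = 3, df = 6, α = .05 and should be checked against a table. The function names and the means are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q_crit: float, ms_within: float, n: int) -> float:
    """Minimum mean difference needed for significance (equal n assumed)."""
    return q_crit * sqrt(ms_within / n)


def significant_pairs(means, hsd):
    """Index pairs whose absolute mean difference exceeds the HSD."""
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > hsd]


means = [2.0, 3.0, 5.0]                            # hypothetical group means
hsd = tukey_hsd(q_crit=4.34, ms_within=1.0, n=3)   # q_crit from a table (assumed)
print(round(hsd, 3), significant_pairs(means, hsd))
```

Every pair is held to the same criterion, the one appropriate for the full set of r means, which is why the test is conservative for means that are close together in rank.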
Scheffé
• Not for post-hoc pairwise comparisons
• Not for a priori comparisons
• Howell: “I can’t imagine when I would ever use it, but I have to include it here because it is such a standard test”
Newman-Keuls (S-N-K) Test
• Readjusts r based upon which means are being tested
• Doesn’t control familywise α at .05
Comparing Different Procedures
Which Test?
• One contrast
– Simple: t-test
– Complex: linear contrast
• Several contrasts
– A priori: Multistage Bonferroni (e.g., Holm)
– Post-hoc: Fisher’s LSD
• Many contrasts
– Ryan REGWQ or Tukey
• Find critical values for different tests
– With a control: Dunnett
– Planned: Bonferroni
– Not planned: Scheffé
Imaging Data
• 200,000 tests on 200,000 voxels
• About 10,000 expected false positives when α = .05 (200,000 x .05), if every null hypothesis is true
• Bonferroni?
– No: the correction treats the tests as independent, but neighboring voxels are correlated, making it overly conservative
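The scale of the problem is easy to put in numbers: the expected count of false positives is the number of tests times α, and a Bonferroni correction divides α by the number of tests. The sketch assumes every null hypothesis is true; the function names are hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of Type I errors if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_alpha(n_tests: int, alpha: float) -> float:
    """Per-test alpha under a Bonferroni correction."""
    return alpha / n_tests


n_voxels = 200_000
print(expected_false_positives(n_voxels, 0.05))  # uncorrected testing
print(bonferroni_alpha(n_voxels, 0.05))          # per-voxel threshold if corrected
```

A per-voxel threshold this small shows why a correction that ignores the correlation between neighboring voxels costs so much power.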