Introduction to Biostatistics/Hypothesis Testing
Brian Healy, PhD
Course objectives
Introduction to concepts of biostatistics
– Types of data
– Hypothesis testing
– p-value
– Choosing the best statistical test
– Study design
– When you should get help
Statistical thinking, not math proofs
Office hour
Tuesday 9-11 in Room 2.140 of the Simches building
If you plan to come, please email me ([email protected]) with a brief description of your data so that I can prepare
Beyond the scope
Tutorial for a specific statistical package
– I will show output from some packages (STATA, SAS, GraphPad)
Topics that will be mentioned, but not focused on
– Mixed models
– Principal components analysis
– ROC curves
Class objectives
Introduction to biostatistics
– Stages of a research study
– Types of data
– Hypothesis test
– t-test
– Wilcoxon test
Questions and requests for next time
Research study
I. Study design
• Experimental question: What are you trying to learn? How will you prove this?
• Sample selection: Who are you going to study?
II. Data collection
• What should be collected?
III. Analysis of data
• Results: Was there any effect?
• Conclusions: What does this all mean? To whom do the results apply?
How is statistics related to each stage?
I. Study design
• Experimental question: Define outcome, sources of variability, unit, and analysis plan
• Sample selection: Sample size, type of sample
II. Data collection
• What to collect?
III. Analysis of data
• Results: Hypothesis test
• Conclusion: Significance of effect/generalizability
Experimental question: What? How?
Sample selection: Who? How many?
Collect Data
Analysis: Is there an effect?
Conclusion: To whom?
Example
Multiple sclerosis (MS) is a progressive neurological disorder
We would like to find treatments that help patients
Unfortunately, it is very difficult to determine a patient’s disease course because there are many things going on
How do we measure the change in the disease?
What is the outcome?
Outcome variables
An outcome variable is the dependent variable of interest
The common outcome variables in MS experiments are:
– Expanded disability status scale (EDSS): ordinal measure of disease severity
– Presence/absence of disease progression
– Expression of a cytokine of interest (e.g., IFN)
– Time to next relapse
Types of variables
Continuous variable: Age, expression level
Dichotomous variable: Dead/alive, wild type/mutant
Categorical variable: Race, nominal scales
Ordinal variable: Mild/Moderate/Severe, level of stat knowledge
Count outcomes: Number of lesions
Time to event outcome: Time to death
Continuous variables
Summary statistics
– Location: mean, median
– Variability: standard deviation
Graphs
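As a minimal sketch of the summary statistics listed above, Python's standard `statistics` module computes both location and variability. The expression values are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical expression levels for one group (illustrative values, not study data)
expression = [30.1, 34.6, 28.9, 41.2, 38.0, 35.4]

mean = statistics.mean(expression)      # location
median = statistics.median(expression)  # location, less sensitive to outliers
sd = statistics.stdev(expression)       # variability: sample SD (n - 1 denominator)

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f}")
```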
Dichotomous variables
Summary statistics
– Table
– Proportion
Graph

          Male   Female
Number     20      30
Percent    40      60

Categorical variables
Summary statistics
– Table
– Proportion
Graphs
[Figure: bar graph of provider of mental health: medical professional, mental health professional, other]
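A short sketch of the table-and-proportion summary, rebuilding the Male/Female counts from the dichotomous table from hypothetical individual records. The same code works unchanged for a categorical variable such as provider type; only the list of levels changes.

```python
from collections import Counter

# Hypothetical individual records matching the sex table: 20 male, 30 female
records = ["Male"] * 20 + ["Female"] * 30

counts = Counter(records)
n = sum(counts.values())
for level in ("Male", "Female"):
    pct = 100 * counts[level] / n  # proportion expressed as a percent
    print(f"{level:<7} number={counts[level]:>3} percent={pct:.0f}%")
```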
Ordinal variable
Summary statistics
– Mean: may be appropriate for scales or questionnaires
– Ordered table: appropriate for ordered categories with uncertain difference in magnitude
– Rank

          Mild   Moderate   Severe
Number     14       15         4
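For an ordinal variable, an ordered table keeps the categories in their natural order rather than alphabetical order. This sketch uses the Mild/Moderate/Severe counts from the table; the cumulative percentage column is an added illustration, not part of the original table.

```python
# Counts from the severity table; the list fixes the clinical order of the levels
levels = ["Mild", "Moderate", "Severe"]
counts = {"Mild": 14, "Moderate": 15, "Severe": 4}

n = sum(counts.values())
cumulative = 0
for level in levels:  # iterate in clinical order, not alphabetically
    cumulative += counts[level]
    print(f"{level:<9} n={counts[level]:>2} "
          f"pct={100 * counts[level] / n:5.1f}% cum={100 * cumulative / n:5.1f}%")
```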
Time to event
Survival time
– Median
Graph
– Kaplan-Meier curve
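A minimal Kaplan-Meier sketch, written out by hand to show the mechanics: at each event time, estimated survival is multiplied by the fraction of at-risk subjects who did not have the event, and censored subjects simply leave the risk set. The median survival time is read off as the first time the curve drops to 0.5 or below. The follow-up data are hypothetical; in practice a statistical package would produce the curve.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (time, estimated survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Collect every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # conditional survival past t
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


def median_survival(curve):
    """Smallest event time at which estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up times (e.g., months) and event indicators
times = [2, 3, 3, 5, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1]
curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve))
```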
Description vs. comparison
In many instances, description of the outcome variable is the focus
– Estimate and confidence interval
Based on results from the survey, description is not enough; rather, comparison is of interest
What do we need for comparison?
– A second variable, usually called the explanatory variable
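As an illustration of "estimate and confidence interval," here is a sketch of a 95% normal-approximation (Wald) interval for a proportion, applied to the 30 of 50 females in the dichotomous-variable table. The Wald interval is one choice among several (the Wilson interval behaves better in small samples); the slides do not specify which interval is intended.

```python
import math
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Estimate and normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, (p - half_width, p + half_width)


estimate, (low, high) = wald_ci(30, 50)  # 30 of 50 subjects are female
print(f"estimate={estimate:.2f}, 95% CI=({low:.3f}, {high:.3f})")
```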
Explanatory variables
Explanatory variables are the independent variables that we believe affect the outcome variables in some way
In MS clinical studies, these can be
– Presence of disease
– Intervention/treatment (clinical trial)
– Genotype
– Expression of another cytokine
– Time
Types of analysis: independent samples

Outcome        | Explanatory  | Analysis
Continuous     | Dichotomous  | t-test, Wilcoxon test
Continuous     | Categorical  | ANOVA, linear regression
Continuous     | Continuous   | Correlation, linear regression
Dichotomous    | Dichotomous  | Chi-square test, logistic regression
Dichotomous    | Continuous   | Logistic regression
Time to event  | Dichotomous  | Log-rank test
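The independent-samples table is essentially a lookup keyed by outcome type and explanatory-variable type. A hypothetical helper (`suggest_analysis` is an illustrative name, not a package function) encoding exactly those rows might look like this; any combination outside the table falls back to the advice given elsewhere in these slides.

```python
# Rows of the independent-samples table, keyed by (outcome type, explanatory type)
ANALYSIS = {
    ("continuous", "dichotomous"): "t-test, Wilcoxon test",
    ("continuous", "categorical"): "ANOVA, linear regression",
    ("continuous", "continuous"): "Correlation, linear regression",
    ("dichotomous", "dichotomous"): "Chi-square test, logistic regression",
    ("dichotomous", "continuous"): "Logistic regression",
    ("time to event", "dichotomous"): "Log-rank test",
}


def suggest_analysis(outcome, explanatory):
    """Look up the table; combinations not listed are referred to a statistician."""
    key = (outcome.lower(), explanatory.lower())
    return ANALYSIS.get(key, "Not in table: consult a statistician")


print(suggest_analysis("Continuous", "Dichotomous"))  # the CD-26 comparison
```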
Comparison of two groups
Question: Is the expression of CD-26 different in relapsing MS patients compared to progressive MS patients?
What is the outcome?
– We measure CD-26 using flow cytometry
– Continuous variable
What is the explanatory variable?
– Group membership (relapsing vs. progressive)
– Dichotomous variable
How would you answer this question?
– Collect a sample from each group
Results
Mean values:
– Relapsing patients = 34.6
– Progressive patients = 41.8
The progressive patients had greater production, but are we certain that there is a difference between these groups?
– Statistically significant
– Clinically meaningful
What is the variability in the data?
[Figure: data from Experiment 1 and Experiment 2; the means in the two groups are the same in both experiments]
Is there a difference in Experiment 1? In Experiment 2?
Hypothesis test
Reasons for differences between groups
Actual effect: there is a difference between the two groups
Chance
Bias
Confounding
Statistical tests are designed to determine whether the observed difference between the groups was likely due to chance
Chance experiment
Experiment: I flip a coin
– If heads, I win $1
– If tails, you win $1
What if the following happened?
– 2 heads in a row
– 5 heads in a row
– 15 heads in a row
Are you suspicious?
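Suspicion grows quickly because, for a fair coin, independent flips multiply: the chance of k heads in a row is 0.5^k. A one-function sketch (the function name is illustrative):

```python
def prob_all_heads(k, p_heads=0.5):
    """Probability of k heads in a row when flips are independent."""
    return p_heads ** k


for k in (2, 5, 15):
    prob = prob_all_heads(k)
    print(f"{k:>2} heads in a row: p = {prob:.6f} (about 1 in {round(1 / prob):,})")
```

Two heads in a row happens one time in four, five heads about one time in 32, and fifteen heads about one time in 32,768.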
Null hypothesis
In all experiments, we have an initial belief
– In the coin example, you believed that there was a 50/50 chance of heads
We always set up the null hypothesis as the statement we hope to reject.
For our study, the null hypothesis is that the mean in the relapsing MS patients is the same as the mean in the progressive MS patients.
What is rare enough?
[Figure: distribution of the test statistic under the null hypothesis, with tail areas of 0.05 marked]
This curve is the distribution of the statistic under the null hypothesis
If the observed value is sufficiently rare under the null, we reject the null hypothesis
0.05 corresponds to a 1 out of 20 chance
P-value
Definition: the probability of the observed result or something more extreme under the null hypothesis
If the probability of the event is sufficiently small, we conclude that the difference is likely not due simply to chance and that there is an actual effect.
If the p-value is small enough, we call the effect statistically significant
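To make the definition concrete for the coin example, this sketch computes an exact one-sided p-value: the probability, under the null hypothesis of a fair coin, of getting at least the observed number of heads. It is one-sided because only an excess of heads favors me. The counts (8 of 10, 15 of 15) are illustrative.

```python
from math import comb


def upper_tail_p(heads, flips, p_heads=0.5):
    """P(X >= heads) for X ~ Binomial(flips, p_heads): the observed result or more extreme."""
    return sum(comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
               for k in range(heads, flips + 1))


print(upper_tail_p(8, 10))   # 56/1024 = 0.0546875, just above 0.05
print(upper_tail_p(15, 15))  # 1/32768, far below 0.05
```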
What if p>0.05?
In this case, the difference between the groups is not statistically significant (at the 0.05 level).
“If two values are not significantly different, then by definition are they not identical?”
– No
– The two groups are not significantly different, but we cannot say that they are the same
– We fail to reject the null hypothesis; we do not accept that the null is true
– Bayesian statistics
Bias
Is there something in my design that led to my result?
Steps for hypothesis testing
1) State null hypothesis
2) State type of data for explanatory and outcome variable
3) Determine appropriate statistical test
4) State summary statistics if possible
5) Calculate p-value (stat package)
6) Decide whether to reject or not reject the null hypothesis
• NEVER accept null
7) Write conclusion
Example
1) H0: mean_relapsing = mean_progressive
2) Explanatory: group membership (dichotomous)
Outcome: cytokine production (continuous)
• What test can we use to compare a continuous outcome with a dichotomous explanatory variable?
Two sample t-test
A two sample t-test is a test for differences in means in two samples.
Assumption: Underlying population distribution is normal
The method of calculating the p-value is beyond the scope of this class, but it is easily found online
Can get p-value from statistical package
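The fractional degrees of freedom reported in the results (df = 22.8) suggest the Welch version of the two sample t-test, which does not assume equal variances in the two groups. A stdlib-only sketch of the statistic and the Welch-Satterthwaite degrees of freedom, on hypothetical data, follows. Turning t and df into a p-value requires the t distribution, which a statistical package provides; in Python, for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.

```python
import math
import statistics


def welch_t(a, b):
    """Two-sample t statistic and Welch-Satterthwaite df (unequal variances allowed)."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1  # squared standard error of mean(a)
    v2 = statistics.variance(b) / n2  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical measurements for two independent groups
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -1.90, df = 5.9
```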
Results
4) mean_relapsing = 34.6, mean_progressive = 41.8
5) Calculate p-value:
Two Sample t-test
t = -1.19, df = 22.8, p-value = 0.25
95 percent confidence interval: (-5.3, 19.7)
6) Fail to reject the null hypothesis because the p-value is greater than 0.05
7) Conclusion: The difference between the groups is not statistically significant.
[Figure: Experiment 1 (p<0.0001) and Experiment 2 (p=0.25)]
Significant difference in Experiment 1
Added variance in Experiment 2 led to a non-significant result
What does this mean?
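The contrast between the two experiments can be reproduced from summary statistics alone: keep the group means fixed and change only the spread. This sketch uses the CD-26 group means; the standard deviations and group sizes are hypothetical.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic computed from group means, SDs, and sizes."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se


# Same group means as the CD-26 example; SDs and sizes are hypothetical
low_spread = t_from_summary(34.6, 2, 12, 41.8, 2, 12)
high_spread = t_from_summary(34.6, 15, 12, 41.8, 15, 12)
print(f"SD = 2:  t = {low_spread:.2f}")
print(f"SD = 15: t = {high_spread:.2f}")
```

The mean difference is identical in both runs; only the larger standard error in the second run shrinks t toward 0, which is what drives the p-value up.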
Example
Experimental Autoimmune Encephalomyelitis (EAE) in mice is the animal model for multiple sclerosis (MS)
The effects of various interventions are first tested in mice
A common hypothesis is that treating mice with a specific intervention will either inhibit or promote the disease
How do we measure the change in the disease?
What is the outcome?
Monkey wrench
What if the underlying data are not normal?
An outcome in an EAE study is the disease grade, which is an ordinal scale
[Figure: Disease severity scores; frequency of each score (0 to 4) in KO and wild-type mice]
Wilcoxon rank sum test
The Wilcoxon rank sum test is a nonparametric test that allows group comparison with
– Ordinal data
– Rank data
– Underlying data that are non-normal
– Outliers
The steps for a hypothesis test using a Wilcoxon test are exactly the same
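A sketch of the rank sum test's mechanics: pool the two samples, rank them (tied values share their average rank, which matters for ordinal scores), sum the ranks in one group, and compare that sum with what the null hypothesis predicts. The p-value here uses the large-sample normal approximation with a tie correction and no continuity correction, so it can differ from package output (for example `scipy.stats.mannwhitneyu`, which offers exact and asymptotic methods). The KO and wild-type scores in the usage lines are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U for x, z, two-sided p) via the normal approximation with tie correction.

    Assumes the pooled values are not all identical (variance would be 0).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    r1 = sum(ranks[:n1])            # rank sum of the first group
    u = r1 - n1 * (n1 + 1) / 2      # Mann-Whitney U for the first group
    # Each tie group of size t reduces the variance of U
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Hypothetical ordinal disease scores
ko = [0, 1, 1, 2, 1, 0, 3]
wild_type = [2, 2, 1, 3, 4, 2, 1]
print(rank_sum_test(ko, wild_type))
```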
Hypothesis test
1) H0: median_KO = median_Wild type
2) Predictor: dichotomous
Outcome: ordinal
3) Test: Wilcoxon rank sum test
4) Median_KO = 1; Median_Wild type = 2
5) Calculate p-value: p = 0.19
6) Fail to reject null hypothesis
7) There is not significant evidence of a difference between the two groups
Dependent observations
Up to now we have assumed that observations are independent
What if we have related observations?
– On and off treatment on the same subject
– Left and right eye from the same subject
– Multiple observations over time
The big advantage of dependent observations is that the same subject is observed under multiple conditions
Independent tests fail to account for correlation
Example
In MS patients, the intensity of areas of the brain on T1-weighted MRI is of interest to determine if there is damage
In particular, the intensity of the putamen on the left and right sides of the brain was measured in 35 MS patients
We believed that there would be more significant hypointensity on the left side
There may be a difference between the groups
Are we interested just in the mean on each side?
The difference between the two sides is the outcome
Is the difference significantly different from 0?
Hypothesis test
1) H0: mean_left = mean_right
2) Paired continuous data with side as explanatory variable
3) Paired t-test
4) Mean difference = 0.063
5) p-value = 0.046
6) Since the p-value is less than 0.05, we can reject the null hypothesis
7) We conclude that the intensity is unequal in the two sides of the brain
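A paired t-test is a one-sample t-test on the within-subject differences, testing whether their mean is 0. A minimal sketch on hypothetical left/right values (not the 35-patient data):

```python
import math
import statistics


def paired_t(left, right):
    """Paired t-test statistic: one-sample t-test on within-subject differences."""
    diffs = [l - r for l, r in zip(left, right)]  # one difference per subject
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d, mean_d / se, n - 1  # mean difference, t, degrees of freedom


# Hypothetical intensities for 4 subjects (left and right side of the same subject)
mean_d, t, df = paired_t([5, 7, 6, 9], [4, 5, 6, 7])
print(f"mean difference = {mean_d:.2f}, t = {t:.2f}, df = {df}")
```

A package then converts t and df into the p-value (for example `scipy.stats.ttest_rel` in Python).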
Types of analysis: dependent samples

Outcome      | Predictor    | Analysis
Continuous   | Dichotomous  | Paired t-test, Wilcoxon signed rank test
Continuous   | Categorical  | Repeated measures ANOVA
Continuous   | Continuous   | Mixed model
Dichotomous  | Dichotomous  | McNemar’s test
Dichotomous  | Continuous   | Repeated measures logistic regression
Other dependent samples
Continuous outcome/categorical explanatory variable
– Subject is measured under three conditions
– Subject is measured at three time points
[Figure: each dot represents an observation for a mouse at each of the markers; there was a negative control in this experiment (Group = 0)]
What should we do?
What is the hypothesis?
– Is the expression of any of the markers different from the control?
Repeated measures ANOVA/mixed model
– Can proceed with the usual hypothesis test
Must always think about the assumptions of the model
– Do we have equal variance?
Consult a statistician
Why use dependent samples?
Sometimes it is required by the study design
Can often increase power, depending on the outcome, because one major source of variability is accounted for
– Changes over time
Consult a statistician if you want to determine the best study design
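A sketch of why pairing can increase power, using hypothetical measurements on five subjects who differ widely from one another but shift consistently between two conditions. Analyzing the within-subject differences removes the between-subject variability; treating the same data as two independent samples (inappropriate here) buries the shift in that variability.

```python
import math
import statistics

# Hypothetical measurements on 5 subjects: large between-subject spread,
# but a small, consistent within-subject shift
left = [10, 20, 30, 40, 50]
right = [11, 21.5, 30.8, 41.2, 51]

# Paired analysis: subject-to-subject variability cancels in the differences
diffs = [r - l for l, r in zip(left, right)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Independent (Welch) analysis: ignores that each pair comes from one subject
se_indep = math.sqrt(statistics.variance(left) / len(left)
                     + statistics.variance(right) / len(right))
t_indep = (statistics.mean(right) - statistics.mean(left)) / se_indep

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")
```

The mean difference is the same in both analyses; the paired standard error is far smaller, so the paired t statistic is much larger.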
Helpful website
http://www.ats.ucla.edu/stat/stata/whatstat/default.htm
Shows how to complete many of these analyses in various statistical packages