HST 190: Introduction to Biostatistics
Lecture 1: Basic principles of statistical data analysis

HST 190: Introduction to Biostatistics - Harvard University · 2019-09-23 · 6 HST 190: Intro to Biostatistics •Ask questions during the lecture as well as on Piazza §take notes!



Welcome!

• Statistical reasoning is the process of drawing scientific conclusions from data in a rational, consistent way
• Goals for the course:
  § develop an intuition for the key concepts that underpin the statistical analysis of data
  § read the “Methods” section of an article, and understand/critique the approach taken
  § learn to analyze and draw scientific conclusions from your own data

Outline

Lecture  Topic(s)
1        Basic principles of statistical data analysis
2        Principles of probability & Estimation of parameters
3        Two-sample comparisons, hypothesis testing and power/sample size calculations
4        Clinical trials & Simple linear regression
5        Multiple linear regression
6        Methods for binary outcomes
7        Logistic regression
8        Analysis of time-to-event data
9        Project presentations
10       Review before the exam

Course Logistics

• Eight lectures
  § each 2-2.5 hours long
• Reading will be assigned prior to each lecture
  § given the pace of the course, this is strongly encouraged
• Problem sets following each lecture
  § include Matlab exercises
  § due at 9am on the day of the following lecture (unless specified otherwise)
• During breaks in the middle we will:
  § complete group exercises
  § learn Matlab
  § discuss course projects
• You will also work on a group project and present results during one of the class meetings
• In-class exam will take place during last meeting
  § 28th August
  § open-book

Suggestions

• Ask questions during the lecture as well as on Piazza
  § take notes!
• Material presented in different sequence from Rosner
  § consult Rosner for a different approach
• Lots of material in a short time
  § feel free to ask for help!
• There will be many formulae
  § goal is not to memorize them
  § even though we have access to software, hand calculations can help cultivate intuition

How to Prioritize

• The course is pass/fail.
• Exam is open-book, so don’t spend time memorizing formulas. Learn when and why to use each procedure; you can always refer to your notes to see how.
• To get the most out of this course, you should:
  § attend lectures
  § submit solutions to all the problem sets
  § participate in class discussions, group exercises, and Piazza
  § complete a project
  § take the final exam

Resources

• Lecture Notes (Canvas->Files)
  § Get bonus points for finding typos!
• Introduction to Matlab (Canvas->Files)
• Rosner textbook, 7th ed. (required; a lifelong reference)
• Piazza
• Pagano & Gauvreau textbook
• See Syllabus for additional references.

Basic steps of data analysis

• To set the stage, let’s consider two motivating questions:
  1) is there an association between time spent in the operating room and post-surgical outcomes for lung cancer resection?
  2) can we develop an enhanced breast cancer risk model?
• The questions have been left deliberately vague! It’s often the case that scientific questions are initially imprecisely posed
• Integral to the process of research is translating science into statistics, and back again
  § as you read papers, it is important to consider how the authors thought through this process

• There are many (possibly infinite!) ways in which one could characterize ‘basic steps’ but a reasonable outline might be:
  I. Understand the context of the analysis
  II. Establish the scientific goals
  III. Translate the scientific goals into statistical language
  IV. Choose statistical methods to employ
  V. Implementation and running the analysis
  VI. Interpretation

• Sometimes, the way forward is clear and, in that sense, the process is prescriptive
  § features/issues that are common to all analyses
• In many instances, however, the way forward isn’t clear
  § aspects of the analysis don’t fit in with what you currently know
  § these may relate to the science, data and/or statistics aspects
• Solutions include:
  § appealing to the published literature (scientific and statistical)
  § adopting or adapting existing methods
  § developing new methods
• Regardless, dealing with these issues will require some creativity, and there is seldom, if ever, one ‘correct’ data analysis
  § different data analyses correspond to different scientific questions
  § which scientific question is ‘right’?

I. Understanding the Context

• From the perspective of a biostatistician, the purpose of data analysis is to learn about some population using information in a sample
• Learn about covariates in terms of association with or prediction of an outcome
  § notationally we often think in terms of $X$ and $Y$
  § possibly within or across certain sub-populations denoted, say, by $Z$
• Context usually involves three things:
  1) the background science
  2) the nature of the available data
  3) the population of interest, often called the ‘target population’

Lung cancer surgery

Q: Is there an association between time spent in the operating room and post-surgical outcomes?

• Background science:
  § longer operating time -> greater exposure to anesthesia
  § shortening operating time might reduce adverse post-surgical outcomes
    o complications during the hospital stay
    o recurrence of lung cancer
    o mortality
  § may also lead to decreased costs/increased efficiency
    o increased capacity for the operating room
    o shorter post-surgical hospital stay
• Available data:
  § ≈400 surgeries at Brigham and Women’s Hospital
  § performed between 1997-2008
  § demographic, clinical, tumor and follow-up information
• Target population:
  § patients who undergo elective surgery for early stage non-small cell lung cancer
  § need to be aware of different surgery sub-types
    o lobectomy, segmentectomy, wedge resection
    o thoracotomy, video-assisted thoracic surgery
  § what do we think about the (relatively) long timeframe?
  § generalizability beyond BWH?

Breast cancer risk

Q: Can we develop an enhanced breast cancer risk model?

• Background science:
  § the ‘Gail model’ for breast cancer risk was developed in the late 1980s
    o age, race
    o age at menarche, age at birth of first child
    o family history, number of prior biopsy examinations and atypical hyperplasia
  § the model was validated in a number of subsequent studies
  § subsequent research identified a number of additional risk factors for breast cancer
    o breast density, use of hormone replacement therapy and body mass index
• Available data:
  § 2,392,998 screening mammograms from the Breast Cancer Surveillance Consortium
    o NCI-funded nationwide network of mammography registries
  § mammograms performed between 1996-2002
  § outcomes are ascertained via linkages with cancer registries
• Target population:
  § screening mammograms performed on women aged 35-84 years
    o unit of analysis is the mammogram, not the woman
  § who undergoes screening? who doesn’t?
    o how might this impact the interpretation of the study?

Nature of the available data

• What were the data collection procedures?
  § convenience sample or part of a designed study?
  § what was the setting/timeframe?
  § observational study or randomized design?
  § cross-sectional, prospective, or retrospective?
  § stratification and/or matching?
• How were the procedures followed?
  § any systematic deviations from the ‘ideal’ data collection process?
  § may be due to patients?
    o refusal to participate/respond
    o inaccurate responses
  § may be due to researchers?
    o were uniform procedures applied to all (potential) participants?
    o are we actually measuring what we think we are measuring?
• Have there been any interim data cleaning/manipulation efforts?
  § cleaning of ‘strange’ values
    o set to some threshold value or to missing
    o exclusion from the dataset
  § construction of derived variables
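The cleaning choices listed above (set to a threshold, set to missing, derive a new variable) can be written down explicitly, which makes them easier to report and revisit. A minimal sketch in Python, assuming hypothetical records with `weight_kg` and `height_m` fields and hypothetical plausibility limits; none of these names or limits come from the lecture.

```python
# Hypothetical plausibility limits for weight in kg (an assumption for illustration).
WEIGHT_RANGE = (30.0, 250.0)

def clean_weight(value, policy="missing"):
    """Handle a 'strange' weight: set it to missing or clamp it to a threshold."""
    low, high = WEIGHT_RANGE
    if value is None or low <= value <= high:
        return value
    if policy == "missing":
        return None
    if policy == "threshold":
        return min(max(value, low), high)
    raise ValueError(f"unknown policy: {policy}")

def bmi(weight_kg, height_m):
    """Derived variable: body mass index, missing if either input is missing."""
    if weight_kg is None or height_m is None:
        return None
    return weight_kg / height_m ** 2

# Made-up records; the second weight looks like a data-entry error.
records = [
    {"id": 1, "weight_kg": 70.0, "height_m": 1.75},
    {"id": 2, "weight_kg": 700.0, "height_m": 1.60},
]
for r in records:
    r["weight_clean"] = clean_weight(r["weight_kg"], policy="missing")
    r["bmi"] = bmi(r["weight_clean"], r["height_m"])
```

Keeping the raw value alongside the cleaned one, as done here, lets a later analyst see which cleaning policy was applied and check whether the results are sensitive to it.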

Populations

• In practice, the ‘population’ can be
  § an actual, potentially observable population
  § a hypothetical (sometimes infinite) population
• Might refer to the ‘target population’ to emphasize that there is a specific population in mind
• Defining the target population is crucial in that it provides the context for the scientific question of interest
  § who would we like our results to generalize to?
• Narrow vs. broad definitions of the target population
  § heterogeneity vs. homogeneity
  § what are the trade-offs?

• What comes first... the data or the population?
  § depends on when you get involved
• If the data has already been collected:
  § for which population could we consider the sample as being ‘representative’?
  § may need to focus the dataset by excluding certain folks
    o implicitly changes the population to which one can generalize
    o sample size vs. mixing of effects
  § is there scope for additional data collection efforts?
• If the data has not been collected:
  § much greater flexibility for choosing/defining the population of interest

Learning from data

• Recall, the goal is to learn about the relationships between a subset of covariates
• Achieved by collecting and analyzing a sample from the population
  § an important aspect of ‘context’ is that this is indeed what we are doing
    o or, at least, hoping to do!
• Suppose we could enumerate the entire population
  § that is, the sample is the population
• In this case observed data characterizes relationships completely

• Note when we have a complete enumeration, there is no sampling variability
  § we don’t have to worry about making statements about the population on the basis of information in the sample
  § the sample is the population
• We don’t have to consider or quantify uncertainty associated with only observing a sub-sample
  § no need for standard errors, confidence intervals or p-values
  § maybe no need for statistical methods!
• Most of the time we can’t enumerate the entire population
  § typically, this isn’t logistically and/or financially feasible
• So…
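The contrast between sampling and complete enumeration can be seen in a small simulation. The sketch below uses Python's standard library and a simulated, hypothetical population of operating times (not the lecture's data): sample means vary from sample to sample, less so for larger samples, and not at all when the 'sample' is the whole population.

```python
import random
import statistics

random.seed(190)  # arbitrary seed so the sketch is reproducible

# Hypothetical finite population of operating times in minutes (simulated).
population = [random.gauss(180, 40) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_means(n, repeats=500):
    """Means of repeated simple random samples of size n, drawn without replacement."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]

# Spread of the sample mean across repeated samples (an empirical standard error).
spread_small = statistics.stdev(sample_means(25))
spread_large = statistics.stdev(sample_means(400))

# Complete enumeration: every 'sample' is the population, so the estimate never varies.
census_means = [
    statistics.fmean(random.sample(population, len(population))) for _ in range(5)
]
spread_census = max(census_means) - min(census_means)
```

Standard errors and confidence intervals exist to quantify the kind of spread captured by `spread_small` and `spread_large`; with a census there is nothing left to quantify.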

II. Establish the scientific goals

• Broadly speaking one can classify scientific goals as:
  § description or exploration of a population
  § evaluation of some hypothesis
  § prediction of future outcomes
• A single analysis may have several goals
  § depends on scientific setting and background

Lung cancer surgery

Q: Is there an association between time spent in the operating room and post-surgical outcomes?

• Description/exploration:
  § what is the nature of the association?
  § does the association differ across surgery types?
• Hypothesis testing:
  § a priori hypothesis among the collaborators that shorter times are associated with better post-surgical outcomes

Breast cancer risk

Q: Can we develop an enhanced breast cancer risk model?

• Prediction:
  § use all the available information in the best possible way to predict the risk of breast cancer
  § build prediction models that cater to specific settings with varying amounts/type of information?
    o at home/online
    o in the physician’s office
• Why might description/exploration and hypothesis testing be of less interest?

Description/exploration

• Goal is to characterize the relationships among a set of covariates in the population of interest
• An important issue is whether or not the goal is to establish causation
  § typically requires a greater understanding of the science
• Typically, although not always, viewed as hypothesis generating
  § we have a cool dataset, let’s see what we can find...
  § there is a fine, often blurry line between exploration and hypothesis testing
    o what came first... the data or the question?

Hypothesis testing

• Goal is to make some confirmatory statement
• Typically framed in the context of making a ‘decision’ between two competing hypotheses
  $H_0$: null hypothesis
  $H_1$: alternative hypothesis
• Assume the null hypothesis holds and look for evidence to the contrary
• Standard hypothesis testing reduces the potential decisions to:
  1. fail to reject $H_0$
  2. reject $H_0$ (implicitly in favor of $H_1$)
  § decision should be accompanied by some measure of uncertainty
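One way to make the "assume the null and look for evidence to the contrary" logic concrete is a permutation test, sketched below in Python. This is an illustration rather than the method used in the course: the data are made up, and the significance level `alpha = 0.05` is a conventional choice assumed here.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means under H0 of no difference."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical lengths of stay (days) after shorter vs. longer operations.
short_ops = [4, 5, 5, 6, 4, 5, 6, 5]
long_ops = [7, 8, 6, 9, 7, 8, 10, 7]

alpha = 0.05  # assumed significance level
p = permutation_p_value(short_ops, long_ops)
decision = "reject H0" if p < alpha else "fail to reject H0"
```

The p-value is the accompanying measure of uncertainty: it is the proportion of label shufflings that produce a difference at least as large as the one observed.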

Prediction

• Goal is to estimate future outcomes or risk
  § typically framed in terms of building the best possible model
• What do we mean by ‘best’?
  § need some means of judging accuracy and penalizing poor predictions
  § ideally based on real world consequences
    o e.g. false-positive vs. false-negative for breast cancer
• Sometimes a single best model is inappropriate
  § a model may work well in one population and not others
  § inputs may not always be available (e.g. genetic information)
• To what extent do we need to care about causation?
  § do we need to understand the ‘true’ underlying mechanisms?
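The idea that ‘best’ depends on real-world consequences can be illustrated by attaching different costs to false positives and false negatives. The Python sketch below uses made-up predicted risks and hypothetical cost weights; `expected_cost` is an illustrative helper, not a standard library function.

```python
def expected_cost(threshold, cases, cost_fp=1.0, cost_fn=10.0):
    """Average misclassification cost when flagging everyone with risk >= threshold.

    `cases` is a list of (predicted_risk, has_disease) pairs; the costs are
    hypothetical weights chosen for illustration.
    """
    total = 0.0
    for risk, has_disease in cases:
        flagged = risk >= threshold
        if flagged and not has_disease:
            total += cost_fp  # false positive
        elif not flagged and has_disease:
            total += cost_fn  # false negative
    return total / len(cases)

# Made-up predictions for illustration only.
cases = [(0.05, False), (0.10, False), (0.20, False), (0.30, True),
         (0.40, False), (0.60, True), (0.80, True), (0.90, True)]

thresholds = [0.1, 0.25, 0.5, 0.7]
# With false negatives weighted ten times as heavily, a low threshold wins.
best = min(thresholds, key=lambda t: expected_cost(t, cases))
# Reversing the weights moves the preferred threshold upward.
best_reversed = min(
    thresholds, key=lambda t: expected_cost(t, cases, cost_fp=10.0, cost_fn=1.0)
)
```

The same predictions lead to different preferred thresholds under the two cost settings, which is one reason a single ‘best’ model or decision rule may not exist.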

The real world

• Unfortunately, the scientific goals are not always clear at the outset
• Typically, it is the case that:
  § there are many scientific goals that are of interest, and/or
  § the goal can be interpreted in a number of ways
• Primarily a problem because investigators need precise statements to be able to proceed
  § to translate the scientific goals into statistical ones
• Towards refining study goals, a couple of useful questions are:
  1) who is the intended (primary) audience?
  2) what will be actionable from the results?

• Consider the question: What is Mrs. Jones’ risk of breast cancer?
• How one proceeds depends, at least in part, on how this information will be used:
  Researchers
    o determine eligibility for a randomized study of some novel preventative agent
  Patients
    o decision as to whether or not she should get in touch with her physician
  Physicians
    o planning for future screening schedule
  Policy-makers
    o monitor the public health burden of breast cancer

• Related questions include:
  § is interest in all breast cancers or some specific tumor type?
  § risk over which timeframe?
    o 1 year?
    o 5 years?
    o lifetime?
  § how much information will the interested ‘user’ have access to?
    o will detailed family history information be available?
    o will genetic information be available?
• Different answers to all these questions define different scientific goals

III. Translating scientific goals into statistical terms/tasks

• Once the scientific goals are ‘established’ we need to translate them into the language of statistics
• Moving forward requires:
  § precise and clear definitions of all relevant covariates
  § specification of key relationships of interest

Scientific goal          Statistical task
Description/exploration  Estimation
Hypothesis testing       Inference
Prediction               Estimation

Precisely defining covariates

• Each of the potential goals is trying to say something about the relationships among a set of covariates
• Prior to any analysis we need clear definitions for all relevant covariates:
  § response variables
  § exposure(s) of interest
  § interaction terms and/or effect modifiers
  § predictors of the response
  § predictors of the exposure(s) of interest
• There will be overlap across these various types of variables
  § e.g., a covariate may be a predictor of both the response and of the exposure of interest

• Often not as straightforward as one might think, mainly because there is often choice involved
• Suppose the response of interest is ‘diagnosis of breast cancer’
  § over which timeframe?
  § for which sub-types?
• Suppose the exposure of interest is ‘operating time’
  § when does time start?
  § when does time stop?
• Define (and perhaps re-define) until everything is clear!

Lung cancer surgery

Q: Is there an association between time spent in the operating room and post-surgical outcomes?

• Responses:
  § hospital stay of >7 days (binary)
  § number of major complications during hospital stay (count)
    o need a list of ‘major’ complications
  § time to death (continuous, right-censored)
• Exposure of interest:
  § operating time defined as the time from the first incision to the time of the first stitch to close up (continuous)
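Writing the definitions above as code forces each choice to be explicit. The following Python sketch derives the three responses from a single made-up record. The field names, the list of ‘major’ complications, and the assumption that patients still alive are censored at a fixed study end date are all hypothetical.

```python
from datetime import date

def derive_responses(record, study_end, major_complications):
    """Turn one raw surgery record into the three response variables listed above."""
    stay_days = (record["discharge"] - record["surgery"]).days
    died = record["death"] is not None
    last_seen = record["death"] if died else study_end
    return {
        "long_stay": stay_days > 7,  # binary
        "n_major": sum(c in major_complications for c in record["complications"]),  # count
        "followup_days": (last_seen - record["surgery"]).days,  # time to death or censoring
        "event": died,  # False means right-censored: still alive at study_end
    }

# Hypothetical list of 'major' complications and a made-up patient record.
MAJOR = {"pneumonia", "respiratory failure"}
record = {
    "surgery": date(2005, 3, 1),
    "discharge": date(2005, 3, 10),
    "death": None,
    "complications": ["pneumonia", "nausea"],
}
out = derive_responses(record, study_end=date(2008, 12, 31), major_complications=MAJOR)
```

A right-censored time needs two pieces of information, the follow-up time and an event indicator, which is why the sketch returns both.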

Breast cancer risk

Q: Can we develop an enhanced breast cancer risk model?

• Response:
  § diagnosis of breast cancer within 1 year of the screening mammogram (binary)
• Exposure of interest:
  § age, race, education, breast density, HRT use...
  § a total of 13 potential predictors
  § all categorical
    o at least in the available dataset

IV. Choosing statistical methods

• One way of viewing all the statistical methods available is as a collection of tools
  § different statistical tools for different statistical tasks
  § develop understanding of a collection of tools over the course of your career
• A toolbox of statistical tools/methods
  § basic methods, that everyone should be able to use
  § specialized methods
    o sophisticated tools that require ‘training’
    o constantly being developed and published in the literature
  § sometimes new questions require new methods

• For the most part, the tools that researchers employ are determined by the issues we’ve considered so far
  § scientific goals
  § nature of the available data
  § population of interest
• Even given all this information, there are often several choices of statistical tools/methods
• How to choose between all the available approaches?
  § interpretation (to be discussed later)
  § operating characteristics
    o e.g. bias and statistical efficiency
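Operating characteristics such as bias and efficiency can be explored by simulation. As a rough illustration in Python, the sketch below compares the sample mean and the sample median as estimators of the center of a normal population. The population, sample size, and number of repetitions are arbitrary assumptions.

```python
import random
import statistics

def simulate(estimator, true_mean=0.0, sd=1.0, n=30, repeats=2000, seed=1):
    """Estimate the bias and variance of `estimator` by repeated sampling."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(true_mean, sd) for _ in range(n)])
        for _ in range(repeats)
    ]
    bias = statistics.fmean(estimates) - true_mean
    variance = statistics.variance(estimates)
    return bias, variance

# The same seed means both estimators see exactly the same simulated samples.
bias_mean, var_mean = simulate(statistics.fmean)
bias_median, var_median = simulate(statistics.median)
```

For normally distributed data both estimators are approximately unbiased, but the mean has the smaller variance, so it is the more efficient choice in this setting. With heavy-tailed data the comparison can go the other way.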

V. Implementation and running the analysis

• Seemingly the most ‘prescriptive’ of the steps
  § in a perfect world, turn the handle... and you’re done!
• Unfortunately, actually performing the analysis is not always straightforward
• Many choices for statistical software
  § R, Matlab, SAS, Stata, WinBUGS, ...
  § each has numerous resources, including already-written code available online
  § not all methods have been implemented in all software packages

• Performing the analyses can also highlight all sorts of problems
  § EDA (exploratory data analysis) might highlight data issues
    o missing data
    o unusual values
    o unusual observed relationships
• Issues like this may require a re-think of the scientific goals
  § if you can’t answer this question, which question can you answer?
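A first exploratory pass often amounts to counting missing values and flagging observations that look unusual. The Python sketch below does this for a made-up list of operating times. The 1.5 × IQR rule is one common convention, assumed here rather than prescribed by the lecture.

```python
import statistics

def eda_summary(values):
    """Count missing entries and flag values outside the 1.5 * IQR fences."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n_missing": len(values) - len(observed),
        "unusual": [v for v in observed if v < low or v > high],
    }

# Made-up operating times in minutes; None marks a missing value.
times = [150, 165, 170, None, 180, 185, 190, 200, 960, None, 175]
summary = eda_summary(times)
```

A flagged value such as 960 minutes is a prompt for a question (data-entry error, or a genuinely long operation?) and not an automatic exclusion.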

VI. Interpretation

• It’s important to distinguish interpretation of the model from interpretation of the results
• Specification of the model is something that we have control over
  § it should be straightforward to provide a precise interpretation of its components
    o you cannot be pedantic enough on this point
  § should be able to do this before you even see the data
• Consider the linear regression model:
  $E[Y \mid X] = \beta_0 + \beta_1 X$
  § How do we interpret $\beta_1$?
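Under this model, $\beta_1$ is the difference in the mean of $Y$ between two sub-populations whose values of $X$ differ by one unit, and $\beta_0$ is the mean of $Y$ when $X = 0$. The Python sketch below fits the line by ordinary least squares to made-up data and checks that reading of the slope. The variable meanings (hours of operating time, days of stay) are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for E[Y | X] = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data: operating time in hours (x) and length of stay in days (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_line(x, y)

# b1 is the difference in fitted mean Y between groups whose X differs by one unit.
diff_per_unit = (b0 + b1 * 3.0) - (b0 + b1 * 2.0)
```

Note that this interpretation compares groups of patients. It says nothing, on its own, about what would happen to an individual if their $X$ were changed by one unit.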

Interpretation of the results

• Here are some results... what does it all mean?!?
  § translation of statistics back to science
• Interpreting the results requires a detailed understanding of both the scientific and statistical context
  § usually requires discussion with collaborators
• Sometimes the results don’t support the initial hypotheses!
  § e.g., Breitner et al (2008) Neurology
  § Risk of dementia and AD with prior exposure to NSAIDs in an elderly community-based cohort
  § see the next slide

[Slide 43: figure not reproduced]

• These can be particularly challenging situations
• Are these results ‘right’?
  § are we misinterpreting our assumptions/models?
  § are there data issues that we aren’t aware of?
  § is the code wrong?
  § are the results sensitive to particular analysis choices?
• It may be that the results are ‘right’
  § perhaps a new understanding of the mechanism of interest
  § perhaps the results pertain to a population that hasn’t been studied before

Learning about populations

• It is seldom possible to specify one, single target population
  § often the case there are many interesting target populations
• Flexibility to consider different populations depends on whether or not the sample has been collected
• If the sample has not been collected, one might consider
  § a range of scientific questions
  § the feasibility of collecting data across different populations
• If the sample has been collected, flexibility depends on the nature and scope of the available data

Breast cancer screening

• Broad goal of screening is to detect cancer as early as possible
  § balance between public health goals and costs
  § cannot screen everyone all of the time
  § there are also ‘harms’ associated with screening
  § mammography is not perfect
  § real consequences associated with false-positives
• Current recommendations are (broadly):
  § all women aged 50 or older get screened every two years
  § also, women in their 40s who are at ‘high risk’

Q: How good is mammography as a screening modality?
  § answer depends, in part, on the population of interest
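One reason the answer depends on the population is that the positive predictive value of a screening test changes with disease prevalence, even if sensitivity and specificity stay fixed. The Python sketch below applies Bayes' rule with hypothetical test characteristics and prevalences; none of the numbers are taken from the studies cited in the lecture, and in practice sensitivity and specificity may themselves vary across populations.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(cancer | positive screen) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics and prevalences, chosen only for illustration.
sens, spec = 0.85, 0.90
ppv_low = positive_predictive_value(sens, spec, prevalence=0.005)
ppv_high = positive_predictive_value(sens, spec, prevalence=0.05)
```

With these assumed inputs, a positive result is far more likely to reflect cancer in the higher-prevalence population, so the balance of benefits and false-positive harms differs between the two.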

• Rosenberg et al (2006) Radiology.
  § all women who undergo screening mammography
• Yankaskas et al (2010) JNCI.
• Miglioretti et al (2004) JAMA.
• Goldman et al (2008) Medical Care.

Remarks

• Except in the most trivial of settings, the data analysis process is collaborative and iterative
• How you proceed will depend on many things:
  § the nature of the data
  § your philosophy
  § the philosophy of your collaborators
• Getting the science ‘right’ is often the hardest part
  § goals are seldom precise at the outset
  § going back-and-forth between the science and statistics is typically a very instructive process
  § to do a good job usually requires knowledge of the science

• More often than not, there is scope for prescription as well as for creativity
  § sometimes there is an obvious way forward
  § other times there isn’t
• What came first... the question or the data?
• There is seldom one ‘right’ scientific question or data analysis
  § Box and Draper (1987):
    Essentially, all models are wrong but some are useful.