Exploratory Research is More Reliable than Confirmatory Research
Clark Glymour, Carnegie Mellon University
Confirmatory “Logic”
• Hypothesis H: A, B are causally connected
• Null hypothesis: A, B are independent
• Have data D, choose test statistic S, choose alpha
• Reject the null hypothesis if the p-value of S(D) < alpha
• If the null is rejected, H is confirmed.
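As a minimal sketch of this confirmatory recipe, assume A and B are roughly bivariate normal (so "independent" and "uncorrelated" coincide) and take Fisher's z-transform of the sample correlation as the test statistic S. The slide names no particular test; the function names below are hypothetical and only the Python standard library is used.

```python
import math
import random

def fisher_z_pvalue(xs, ys):
    """Two-sided p-value for the null 'A, B are uncorrelated' (Fisher z-test).

    Assumes approximately bivariate-normal data and len(xs) > 3.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)           # sample correlation
    z = math.atanh(r) * math.sqrt(n - 3)     # approx. N(0, 1) under the null
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability

def confirmatory_test(xs, ys, alpha=0.05):
    """Reject the null (and call H 'confirmed') iff the p-value is below alpha."""
    p = fisher_z_pvalue(xs, ys)
    return p < alpha, p

if __name__ == "__main__":
    random.seed(1)
    a = [random.gauss(0, 1) for _ in range(500)]
    b = [0.5 * x + random.gauss(0, 1) for x in a]   # B depends on A
    print(confirmatory_test(a, b))                   # rejects the null
```

Note that the whole decision rests on one test of one hypothesis at one alpha, which is exactly the feature the following slides take issue with.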
The Argument Against “Confirmatory Research”
• Many, many possible positive hypotheses (typically of causal effects) in a domain (psychology, epidemiology, biomedical science).
• Most are false (way fewer than 10% are true).
• Selection of which hypotheses to test is independent of their truth.
• Most tests are at alpha = .05 and power < .8.
• Positive published results are from rejected null hypotheses.
• Conclusion: “if you use p = 0.05 as a criterion for claiming that you have discovered an effect you will make a fool of yourself at least 30% of the time”, Colquhoun, 2014, R. Soc. Open Sci.
The Base Rate Calculation Illustrated
• Suppose N >>> 1 hypotheses, 10% of which are true positives (cause-effect), for each of which the null hypothesis of independence is tested independently with alpha = Pr(rejecting null | null is true) = 0.05 and power w = Pr(rejecting null | alternative is true) = .8
• Probability of finding a false positive association: Pr(reject null | null true) × Pr(null true) = .05 × .9 = .045
• Probability of finding a positive association: .045 + Pr(reject null | alt true) × Pr(alt true) = .045 + (w × .1)
• Ratio of false positives found to all positives found: .045 / (.045 + w × .1)
• E.g., if power = .8 and alpha = .05, the expected proportion of false positives is .045 / (.045 + .8 × .1) = .045 / .125 = .36
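The arithmetic above can be checked in a few lines of Python. The function and argument names are illustrative only: `prior_true` stands for the 10% base rate of true hypotheses and `power` for w.

```python
def false_discovery_proportion(alpha, power, prior_true):
    """Expected share of rejected nulls that are false positives.

    alpha      = Pr(reject null | null true)
    power      = Pr(reject null | alternative true)   (w on the slide)
    prior_true = Pr(alternative true), the share of tested hypotheses that are true
    """
    false_pos = alpha * (1 - prior_true)   # .05 x .9 = .045
    true_pos = power * prior_true          # w x .1
    return false_pos / (false_pos + true_pos)

print(round(false_discovery_proportion(0.05, 0.8, 0.1), 2))   # 0.36
```

Lowering power to .5 with the same alpha and base rate raises the expected false-positive share to .045 / .095, about .47.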
Why Do Scientists Publish “Confirmatory Studies” at .05?
• Because they think they know most of the causal relations?
• Because if they used a lower alpha their results would not be “significant”?
• Because a real search would find that they can’t infer much from the data?
• Publish or perish!
The Argument that “Exploratory,” “High Throughput” Research Is the Worst
• Because it tests more hypotheses, and so produces more false positive effects:
• “The greater the number and the lesser [sic] the selection of tested relationships in a scientific field, the less likely the research findings are to be true. Thus research findings are more likely true in confirmatory designs … than in hypothesis generating designs” [because in exploratory studies a lot of false hypotheses are tested, and the more that are tested, the more errors will be made.] “Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery oriented research … should have extremely low PPV” [Positive Predictive Value, the probability that a reported result is true]. (p. 0698) (Ioannidis, 2005, PLOS Medicine).
Balderdash! Ignorance! Dogma! Superstition!
• Search for causal relations is just parameter estimation.
• Stop thinking in terms of testing and confirmation; think accuracies of estimators: hypothesis tests are just cogs in an estimation procedure.
• Consistent estimators for rare relations exist, employing either (quasi-)Bayesian calculations or classical hypothesis tests, or both.
• The procedures never postulate a connection without multiple assessments.
• Appropriately used, the procedures are amazingly accurate.
Example: SGS Algorithm
Variables: X1, X2, …, XN. X1 – X2 is inferred if and only if the null hypotheses X1 ⫫ X2 | Z are rejected for each and every set Z ⊆ {X3, …, XN}. In PC, the subsets to test on are selected dynamically, but assuming Faithfulness the tests are equivalent to SGS.
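A sketch of the SGS adjacency rule as stated above. The conditional-independence test is passed in as a function `independent(x, y, z)`, a hypothetical interface that returns True when the test of x ⫫ y | z does not reject the null, so any finite-sample test (or, for checking, an oracle read off a known graph) can be plugged in. Enumerating every subset Z is exponential in N, which is the cost PC's dynamic choice of subsets avoids.

```python
from itertools import combinations

def sgs_adjacencies(variables, independent):
    """Adjacency phase of SGS: keep x - y only if independence of x and y
    is rejected given every subset z of the remaining variables.

    independent(x, y, z) -> True when the test of x ⫫ y | z does NOT reject
    the null (hypothetical interface; any CI test or oracle can be used).
    """
    edges = set()
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        # All subsets of the other variables, from the empty set upward.
        subsets = (frozenset(s)
                   for size in range(len(others) + 1)
                   for s in combinations(others, size))
        if not any(independent(x, y, z) for z in subsets):
            edges.add(frozenset((x, y)))
    return edges

# Hypothetical oracle for the chain X1 -> X2 -> X3: the only independence
# is X1 ⫫ X3 given any set containing X2.
def chain_oracle(x, y, z):
    return {x, y} == {"X1", "X3"} and "X2" in z

print(sgs_adjacencies(["X1", "X2", "X3"], chain_oracle))
```

With the chain oracle the procedure keeps X1 – X2 and X2 – X3 and drops X1 – X3, because one conditioning set ({X2}) fails to reject independence.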
Example: FGS Algorithm
• Iterative algorithm starting with a totally disconnected graph of the variables. A connection is added only if it improves the likelihood more than any other addition or none, and more than k ln(S), where k is positive and S is the sample size.
• E.g., in the first step a single connection is added only if it improves the likelihood sufficiently and more than the addition of any other edge. For a million-variable case, an edge is added only if its likelihood is better than that of ~10^12 alternatives.
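A minimal sketch of the greedy rule described above, assuming linear Gaussian data and reading "improves the likelihood" as the gain in maximized log-likelihood. This is a simplified forward-only search over DAGs written for illustration, not the FGS implementation itself (GES-style searches also include a backward phase and work over equivalence classes); the names `greedy_forward`, `local_loglik`, `creates_cycle` and the default k = 1 are hypothetical. Each step scores every candidate edge, so with N variables the first step compares N(N - 1) ordered pairs, roughly 10^12 when N = 10^6, which appears to be where the slide's figure comes from.

```python
import math
import numpy as np

def local_loglik(data, child, parents):
    """Maximized Gaussian log-likelihood of `child` regressed (with intercept) on `parents`."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_var = np.mean((y - X @ beta) ** 2)   # MLE of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * resid_var) + 1)

def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle, i.e. dst is an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])   # walk backwards along parent links
    return False

def greedy_forward(data, k=1.0):
    """Start from the empty graph; at each step add the single edge with the largest
    log-likelihood gain, but only if that gain exceeds k * ln(S)."""
    S, n_vars = data.shape
    parents = {v: [] for v in range(n_vars)}
    threshold = k * math.log(S)
    while True:
        best_gain, best_edge = threshold, None
        for src in range(n_vars):
            for dst in range(n_vars):
                if src == dst or src in parents[dst] or dst in parents[src]:
                    continue                  # skip self-loops and already-adjacent pairs
                if creates_cycle(parents, src, dst):
                    continue
                gain = (local_loglik(data, dst, parents[dst] + [src])
                        - local_loglik(data, dst, parents[dst]))
                if gain > best_gain:
                    best_gain, best_edge = gain, (src, dst)
        if best_edge is None:                 # no addition beats k * ln(S): stop
            return parents
        src, dst = best_edge
        parents[dst].append(src)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = 1000
    x0 = rng.normal(size=S)
    x1 = 0.8 * x0 + rng.normal(size=S)   # X0 -> X1
    x2 = rng.normal(size=S)              # unrelated to both
    print(greedy_forward(np.column_stack([x0, x1, x2])))
```

The default k = 1 is an arbitrary illustrative choice; raising k demands a larger likelihood gain before any edge is added. In the linear Gaussian case the direction of the first edge is not identified by the score (both directions give the same gain), so only the adjacency should be read off.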
Simulations
Accuracies for causal graph recovery with Fast Greedy Search (FGS) for linear Gaussian data, sample size 1,000:
Similar results with PC-Max.
The “Anti-Exploration Argument” Has Everything Backwards
• With very sparse causal relations, automated search with number of variables >> sample size has, in the simulations, < 2% false positive causal connections and < 2% false directions.
• The sparser the graph, the more accurate the positive results of the procedure.
Empirical Results from
• Economics
• Ecology
• Planetary science
• Climate science
• Gene regulation
• Educational Research
• Neuropsychology
• Etc.
Why?
• The procedures are asymptotically correct.
• They use data in which LOTS of variables have been measured.
• Each positive causal claim is tested or assessed multiple times, against multiple competing hypotheses in multiple subsamples of the data.
• The procedures are biased against positive results.
• The procedures have an adjustable bias against weak effects and in favor of strong effects, and can be used to find the variables with the strongest total effect size for an outcome of interest.
• The reverse of Ioannidis’ concern about rare positive relations holds: the procedures are most reliably accurate, most informative, and most feasible when the true positive causal relations are rare.
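One way to see the adjustable bias against weak effects, under assumptions not stated on this slide: linear Gaussian variables, and the FGS rule that an edge is added only if it raises the maximized log-likelihood by more than k ln(S). For adding one parent to a node, the log-likelihood gain depends only on the sample (partial) correlation r between the two variables, which yields a minimum detectable effect size that grows with k and shrinks with S:

```latex
\begin{aligned}
\Delta\ell &= -\tfrac{S}{2}\,\ln\!\left(1 - r^2\right) > k \ln S
  \iff r^2 > 1 - S^{-2k/S} \\
S = 1000,\ k = 1 &: \quad r^2 > 1 - e^{-0.01382} \approx 0.0137, \quad |r| \gtrsim 0.117 \\
S = 1000,\ k = 2 &: \quad r^2 > 1 - e^{-0.02763} \approx 0.0273, \quad |r| \gtrsim 0.165
\end{aligned}
```

Raising k therefore screens out weaker dependencies, while a larger sample lowers the bar, since S^{-2k/S} approaches 1 as S grows.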
Morals
• Research on fast, reliable algorithms for causal estimation in a variety of settings is where the action is and should be: latent structure, feedback relations, mixed populations, sample selection bias, time series, all in “high dimensions.”
• Almost everything said and written in statistics about the superiority of “confirmatory” research and the evils of data-driven hypothesis search is wrong, very wrong.
• “I and my friends can’t think of a way to do X, therefore X is impossible” is a crummy inference.
Read, Then Work
• P. Spirtes, et al., Causation, Prediction and Search
• B. Shipley, Cause and Correlation in Biology, 2nd edition
• J. D. Ramsey
• Bühlmann
• Web pages of: P. Spirtes, T. Richardson, Marloes Maathuis, David Bessler, Kevin Hoover, for a start