Center for Big Data Analytics and Discovery Informatics Artiﬁcial … · 2018. 9. 9. · Center...

Center for Big Data Analytics and Discovery Informatics Artificial Intelligence Research Laboratory

Fall 2018 Vasant G Honavar

Evaluating Classifier Performance

Vasant Honavar
Artificial Intelligence Research Laboratory
Informatics Graduate Program
Computer Science and Engineering Graduate Program
Bioinformatics and Genomics Graduate Program
Neuroscience Graduate Program
Center for Big Data Analytics and Discovery Informatics
Huck Institutes of the Life Sciences
Institute for Cyberscience
Clinical and Translational Sciences Institute
Northeast Big Data Hub
Pennsylvania State University

vhonavar@ist.psu.edu
http://faculty.ist.psu.edu/vhonavar

http://ailab.ist.psu.edu

Why Evaluate Classifiers?

• To know how well a classifier can be expected to perform when it is put to use
• To choose the best model from among a set of alternatives

Evaluating a Classifier

• How can we measure the performance of classifiers?
• How well can a classifier be expected to perform on novel data, i.e., data not seen during training?
• We can estimate the performance (e.g., accuracy, sensitivity) of the classifier using an evaluation dataset (not used for training).
• How close is the estimated performance to the true performance?

Classification Error

• Error = classifying a record as belonging to one class when it belongs to another class.
• Error rate = percentage of misclassified samples out of the total samples in the validation data.

Naïve Baseline

Naïve baseline: classify all samples as belonging to the most prevalent class.

• We hope to do better than the naïve baseline.
• When the goal is to identify high-value but rare outcomes, we may do well by doing worse than the naïve baseline in terms of accuracy.
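A minimal sketch of the naïve baseline and the error rate, assuming labels are plain Python lists. The function names and the 95/5 split of the data are hypothetical. It illustrates why a low error rate can hide the fact that none of the rare, high-value outcomes are found.

```python
from collections import Counter

def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def naive_baseline(y_train):
    """Most prevalent class in the training labels."""
    return Counter(y_train).most_common(1)[0][0]

# Hypothetical data: 95 common outcomes (0) and 5 rare, high-value outcomes (1).
y = [0] * 95 + [1] * 5
majority = naive_baseline(y)          # for simplicity, y doubles as the training labels
baseline_pred = [majority] * len(y)
rare_found = sum(t == 1 and p == 1 for t, p in zip(y, baseline_pred))
print(majority, error_rate(y, baseline_pred), rare_found)
```

The baseline reaches a 5% error rate (95% accuracy) while identifying none of the five rare outcomes.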

Estimating Classifier Performance

N: total number of instances in the dataset
TP_j: number of true positives for class j
FP_j: number of false positives for class j
TN_j: number of true negatives for class j
FN_j: number of false negatives for class j

$$\text{Accuracy} = \frac{TP + TN}{N} = P(\text{label} = \text{class})$$

• Perfect classifier ⟺ Accuracy = 1
• Popular measure
• Biased in favor of the majority class! Should be used with caution!

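Classifier Learning: Measuring Performance

| Label \ Class | C1 | ¬C1 |
|---|---|---|
| C1 | TP = 55 | FP = 5 |
| ¬C1 | FN = 10 | TN = 30 |

(Rows: label assigned by the classifier; columns: true class.)

$$N = TP + FN + FP + TN = 100$$

$$\text{accuracy} = \frac{TP + TN}{N} = \frac{55 + 30}{100} = 0.85$$

$$\text{sensitivity} = \frac{TP}{TP + FN} = \frac{55}{55 + 10} \approx 0.846$$

$$\text{specificity} = \frac{TP}{TP + FP} = \frac{55}{55 + 5} \approx 0.917$$

$$\text{false alarm} = \frac{FP}{FP + TN} = \frac{5}{5 + 30} \approx 0.143$$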
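The same measures can be computed directly from the four counts. This is a small sketch; the function name `measures` is hypothetical, and "specificity" follows this document's usage (TP/(TP + FP)), which many other sources call precision.

```python
def measures(tp, fp, fn, tn):
    """Measures as defined in these notes for a two-class confusion matrix."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),   # also called recall or hit rate
        "specificity": tp / (tp + fp),   # this document's usage; usually called precision
        "false_alarm": fp / (fp + tn),
    }

m = measures(tp=55, fp=5, fn=10, tn=30)
for name, value in m.items():
    print(f"{name:<12} {value:.3f}")
```

For the worked example this prints 0.850, 0.846, 0.917 and 0.143, matching the values above.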

When One Class is More Important than Another

In many cases it is more important to identify members of a specific target class:
– Tax fraud
– Credit default
– Response to promotional offer
– Detecting electronic network intrusion
– Predicting delayed flights
– Diagnosing cancer
– Predicting nuclear reactor meltdown

In such cases, we may tolerate greater overall error in return for better predictions of the more important class.

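Measuring Classifier Performance: Sensitivity

$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Count}(\text{label} = c \wedge \text{class} = c)}{\text{Count}(\text{class} = c)} = P(\text{label} = c \mid \text{class} = c)$$

• Perfect classifier → Sensitivity = 1
• Probability of correctly labeling members of the target class
• Also called recall or hit rate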


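Measuring Classifier Performance: Specificity

$$\text{Specificity} = \frac{TP}{TP + FP} = \frac{\text{Count}(\text{label} = c \wedge \text{class} = c)}{\text{Count}(\text{label} = c)} = P(\text{class} = c \mid \text{label} = c)$$

• Perfect classifier → Specificity = 1
• Also called precision
• Probability that a positive prediction is correct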

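Measuring Performance: Precision, Recall, and False Alarm Rate

$$\text{Precision}_j = \text{Specificity}_j = \frac{TP_j}{TP_j + FP_j}$$

$$\text{Recall}_j = \text{Sensitivity}_j = \frac{TP_j}{TP_j + FN_j}$$

$$\text{FalseAlarm} = \frac{FP}{FP + TN} = \frac{\text{Count}(\text{label} = c \wedge \text{class} = \neg c)}{\text{Count}(\text{class} = \neg c)} = P(\text{label} = c \mid \text{class} = \neg c)$$

• Perfect classifier → Precision = 1
• Perfect classifier → Recall = 1
• Perfect classifier → False Alarm Rate = 0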


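Measuring Performance: Correlation Coefficient

$$CC_j = \frac{(TP_j \times TN_j) - (FP_j \times FN_j)}{\sqrt{(TP_j + FN_j)(TP_j + FP_j)(TN_j + FP_j)(TN_j + FN_j)}}, \qquad -1 \le CC_j \le 1$$

Equivalently, CC_j is the correlation between the classifier's labels and the true classes over the dataset D:

$$CC_j = \frac{1}{|D|} \sum_{d_i \in D} \frac{(\text{label}_{ij} - \overline{\text{label}_j})(\text{class}_{ij} - \overline{\text{class}_j})}{\sigma_{\text{label}_j}\, \sigma_{\text{class}_j}}$$

where label_ij = 1 iff the classifier assigns d_i to class c_j, class_ij = 1 iff the true class of d_i is class c_j, and σ_label_j, σ_class_j are the (population) standard deviations of these indicators over D.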
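The two forms of the correlation coefficient can be checked against each other on the worked example. This sketch assumes 0/1 indicator lists; the function names are hypothetical, and returning 0.0 when the denominator vanishes is our own convention, not something the slides specify.

```python
import math

def cc_from_counts(tp, fp, fn, tn):
    """Correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 when undefined (our convention)

def cc_from_indicators(label, cls):
    """Pearson correlation of two 0/1 vectors, using population standard deviations."""
    n = len(label)
    mean_l, mean_c = sum(label) / n, sum(cls) / n
    cov = sum((x - mean_l) * (y - mean_c) for x, y in zip(label, cls)) / n
    sd_l = math.sqrt(sum((x - mean_l) ** 2 for x in label) / n)
    sd_c = math.sqrt(sum((y - mean_c) ** 2 for y in cls) / n)
    return cov / (sd_l * sd_c)

# Rebuild the 100 samples of the worked example (TP=55, FP=5, FN=10, TN=30).
label = [1] * 55 + [1] * 5 + [0] * 10 + [0] * 30   # label_i: classifier assigns class c_j
cls = [1] * 55 + [0] * 5 + [1] * 10 + [0] * 30     # class_i: true class is c_j
print(round(cc_from_counts(55, 5, 10, 30), 4), round(cc_from_indicators(label, cls), 4))
```

Both forms give about 0.685 for this example.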

Beware of terminological confusion in the literature!

• Some bioinformatics authors use "accuracy" incorrectly to refer to recall (i.e., sensitivity) or precision (i.e., specificity).
• In medical statistics, specificity sometimes refers to sensitivity for the negative class, i.e., $\frac{TN}{TN + FP}$.
• Some authors use false alarm rate to refer to the probability that a positive prediction is incorrect, i.e., $\frac{FP}{TP_j + FP} = 1 - \text{Precision}_j$.

When you write:
• provide the formula in terms of TP, TN, FP, FN.

When you read:
• check the formula in terms of TP, TN, FP, FN.

Measuring Classifier Performance

• TP, FP, TN, FN provide the relevant information.
• No single measure tells the whole story.
• A classifier with 98% accuracy can be useless if 98% of the population does not have cancer and the 2% that do are misclassified by the classifier.
• Use of multiple measures is recommended.
• Beware of terminological confusion!

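Micro-averaged performance measures

Performance on a random sample:

$$CC_{\text{MicroAverage}} = \frac{\left(\sum_j TP_j\right)\left(\sum_j TN_j\right) - \left(\sum_j FP_j\right)\left(\sum_j FN_j\right)}{\sqrt{\left(\sum_j TP_j + \sum_j FN_j\right)\left(\sum_j TP_j + \sum_j FP_j\right)\left(\sum_j TN_j + \sum_j FP_j\right)\left(\sum_j TN_j + \sum_j FN_j\right)}}$$

$$\text{Precision}_{\text{MicroAverage}} = \frac{\sum_j TP_j}{\sum_j TP_j + \sum_j FP_j}$$

$$\text{Recall}_{\text{MicroAverage}} = \frac{\sum_j TP_j}{\sum_j TP_j + \sum_j FN_j}$$

$$\text{FalseAlarm}_{\text{MicroAverage}} = \frac{\sum_j FP_j}{\sum_j FP_j + \sum_j TN_j}$$

$$\text{Accuracy}_{\text{MicroAverage}} = \frac{1}{N}\sum_j TP_j$$

Etc.

• Micro-averaging gives equal importance to each sample.
• Classes with a large number of instances dominate.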

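Macro-averaged performance measures

$$CC_{\text{MacroAverage}} = \frac{1}{M}\sum_{j=1}^{M} CC_j$$

$$\text{Specificity}_{\text{MacroAverage}} = \frac{1}{M}\sum_{j=1}^{M} \text{Specificity}_j$$

$$\text{Sensitivity}_{\text{MacroAverage}} = \frac{1}{M}\sum_{j=1}^{M} \text{Sensitivity}_j$$

Macro-averaging gives equal importance to each of the M classes.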
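To see how micro- and macro-averaging can disagree, here is a small sketch that builds one-vs-rest counts per class and averages precision (this document's "specificity") and recall (sensitivity) both ways. The function names and the labels are hypothetical.

```python
def per_class_counts(y_true, y_pred, classes):
    """One-vs-rest (TP, FP, FN, TN) for each class c_j."""
    counts = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts[c] = (tp, fp, fn, len(y_true) - tp - fp - fn)
    return counts

def micro_macro(counts):
    """Micro- and macro-averaged precision and recall."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    m = len(counts)
    return {
        "micro_precision": tp / (tp + fp),
        "micro_recall": tp / (tp + fn),
        # Assumes every class is predicted at least once and occurs at least once.
        "macro_precision": sum(c[0] / (c[0] + c[1]) for c in counts.values()) / m,
        "macro_recall": sum(c[0] / (c[0] + c[2]) for c in counts.values()) / m,
    }

# Hypothetical labels: a large class "a" and a small class "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]
result = micro_macro(per_class_counts(y_true, y_pred, ["a", "b"]))
print(result)
```

Micro recall is 0.9 because the large class dominates, while macro recall is 0.75 because the small class, with only half of its members found, counts as much as the large one.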

Cutoff for classification

Most machine learning algorithms classify via a 2-step process. For each sample:
1. Compute the probability of belonging to class "1".
2. Compare it to a cutoff value, and classify accordingly.

• The default cutoff value is 0.50: if ≥ 0.50, classify as "1"; if < 0.50, classify as "0".
• Different cutoff values can be used to trade off one measure against another (more on this later).
• Question: How would this work in the case of K nearest neighbor?
• If the cutoff is 0.50: 13 samples are classified as "1".
• If the cutoff is 0.80: seven samples are classified as "1".

Cutoff Table

| Actual Class | Prob. of "1" | Actual Class | Prob. of "1" |
|---|---|---|---|
| 1 | 0.996 | 1 | 0.506 |
| 1 | 0.988 | 0 | 0.471 |
| 1 | 0.984 | 0 | 0.337 |
| 1 | 0.980 | 1 | 0.218 |
| 1 | 0.948 | 0 | 0.199 |
| 1 | 0.889 | 0 | 0.149 |
| 1 | 0.848 | 0 | 0.048 |
| 0 | 0.762 | 0 | 0.038 |
| 1 | 0.707 | 0 | 0.025 |
| 1 | 0.681 | 0 | 0.022 |
| 1 | 0.656 | 0 | 0.016 |
| 0 | 0.622 | 0 | 0.004 |
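Applying a cutoff to the table takes only a few lines. This sketch assumes a sample is classified as "1" when its probability is at or above the cutoff, as in the default rule above; `confusion_at` is a hypothetical helper name.

```python
# (actual class, estimated probability of "1") for the 24 samples in the cutoff table
TABLE = [
    (1, 0.996), (1, 0.988), (1, 0.984), (1, 0.980), (1, 0.948), (1, 0.889),
    (1, 0.848), (0, 0.762), (1, 0.707), (1, 0.681), (1, 0.656), (0, 0.622),
    (1, 0.506), (0, 0.471), (0, 0.337), (1, 0.218), (0, 0.199), (0, 0.149),
    (0, 0.048), (0, 0.038), (0, 0.025), (0, 0.022), (0, 0.016), (0, 0.004),
]

def confusion_at(table, cutoff):
    """Classify as 1 when prob >= cutoff and return (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for actual, prob in table:
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1:
            if actual == 1:
                tp += 1
            else:
                fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

for cutoff in (0.50, 0.80):
    tp, fp, fn, tn = confusion_at(TABLE, cutoff)
    print(f"cutoff {cutoff:.2f}: {tp + fp} classified as 1, TP={tp} FP={fp} FN={fn} TN={tn}")
```

Raising the cutoff from 0.50 to 0.80 removes both false positives but turns four more true positives into false negatives: the trade-off the ROC curve summarizes.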

Receiver Operating Characteristic (ROC) Curve

• The confusion matrix, and hence the previous measures of classifier performance, are threshold dependent.
• We can often trade off recall versus precision, e.g., by adjusting the classification threshold θ.
• Is there a threshold-independent measure of classifier performance?
  – The ROC curve is a plot of sensitivity against false alarm rate, which is the same as (1 − specificity) when specificity is taken in the medical-statistics sense TN/(TN + FP); it characterizes this tradeoff for a given classifier.
  – The ROC curve is obtained by plotting sensitivity against (1 − specificity) while varying the classification threshold.

[Figure: Receiver operating characteristic (ROC) curve]

Measuring Performance of Classifiers: ROC curves

• ROC curves offer a more complete picture of the performance of the classifier as a function of the classification threshold.
• A classifier h is better than another classifier g if ROC(h) dominates ROC(g).
• ROC(h) dominates ROC(g) → AreaROC(h) > AreaROC(g)

[Figure: ROC curve]
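One way to trace an ROC curve is to sweep the cutoff over every distinct probability in the cutoff table and record (false alarm rate, sensitivity) after each step; the area under the curve then follows from the trapezoidal rule. This is a sketch under those assumptions; `roc_points` and `auc` are hypothetical names.

```python
def roc_points(table):
    """(false alarm rate, sensitivity) pairs as the cutoff sweeps from high to low."""
    pos = sum(actual for actual, _ in table)
    neg = len(table) - pos
    ranked = sorted(table, key=lambda ap: ap[1], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][1]
        # Samples tied at this score cross the cutoff together.
        while i < len(ranked) and ranked[i][1] == score:
            if ranked[i][0] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# (actual class, estimated probability of "1") from the cutoff table
TABLE = [
    (1, 0.996), (1, 0.988), (1, 0.984), (1, 0.980), (1, 0.948), (1, 0.889),
    (1, 0.848), (0, 0.762), (1, 0.707), (1, 0.681), (1, 0.656), (0, 0.622),
    (1, 0.506), (0, 0.471), (0, 0.337), (1, 0.218), (0, 0.199), (0, 0.149),
    (0, 0.048), (0, 0.038), (0, 0.025), (0, 0.022), (0, 0.016), (0, 0.004),
]
pts = roc_points(TABLE)
print(pts[-1], round(auc(pts), 4))
```

For this table the area is 135/144 = 0.9375, which also equals the fraction of (positive, negative) pairs in which the positive sample receives the higher probability.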

Misclassification Costs May Differ

• The cost of making a misclassification error may be higher for one class than for the other(s).
• Looked at another way, the benefit of making a correct classification may be higher for one class than for the other(s).

Example: Response to Promotional Offer

• Suppose we send an offer to 1000 people, with a 1% average response rate.
• "1" = response, "0" = nonresponse.
• The "naïve rule" (classify everyone as "0") has an error rate of 1% (seems good).
• Using machine learning, suppose we can correctly classify eight 1's as 1's,
• but at the cost of misclassifying twenty 0's as 1's and two 1's as 0's.

Error rate = (2 + 20)/1000 = 2.2% (higher than the naïve rate)

Confusion Matrix

| | Predict as 1 | Predict as 0 |
|---|---|---|
| Actual 1 | 8 | 2 |
| Actual 0 | 20 | 970 |

Introducing Costs & Benefits

Suppose:
• Profit from a "1" is $10.
• Cost of sending an offer is $1.

Then:
• Under the naïve rule, all are classified as "0", so no offers are sent: no cost, no profit.
• Under the DM (data mining) predictions, 28 offers are sent:
  – 8 respond, with a profit of $10 each
  – 20 fail to respond, at a cost of $1 each
  – 972 receive nothing (no cost, no profit)

Profit Matrix

| | Predict as 1 | Predict as 0 |
|---|---|---|
| Actual 1 | $80 | 0 |
| Actual 0 | ($20) | 0 |
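Putting the confusion matrix and the profit matrix side by side shows why the classifier with the higher error rate is still preferable. This sketch follows the profit matrix above (each correctly targeted responder yields $10, each offer to a nonresponder costs $1). The dictionary layout and the `evaluate` name are hypothetical.

```python
def evaluate(confusion, profit_per_response=10, cost_per_offer=1):
    """confusion maps (actual, predicted) -> count; returns (error rate, total profit)."""
    n = sum(confusion.values())
    errors = confusion[(1, 0)] + confusion[(0, 1)]
    # As in the profit matrix: responders who receive an offer earn $10 each,
    # offers sent to nonresponders cost $1 each, and no offer means no cost or profit.
    profit = confusion[(1, 1)] * profit_per_response - confusion[(0, 1)] * cost_per_offer
    return errors / n, profit

model = {(1, 1): 8, (1, 0): 2, (0, 1): 20, (0, 0): 970}
naive = {(1, 1): 0, (1, 0): 10, (0, 1): 0, (0, 0): 990}
print(evaluate(model), evaluate(naive))
```

The model's error rate is 2.2% against the naïve rule's 1%, yet it earns $60 in total ($80 − $20) where the naïve rule earns nothing.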

Evaluating a Classifier

• What we have done so far is to estimate the classifier's performance on some available data.
• How well can a classifier be expected to perform on novel data?
• Performance estimated on training data is often optimistic relative to performance on novel data.
• We can estimate the performance (e.g., accuracy, sensitivity) of the classifier using evaluation data (not used for training).
• How close is the estimated performance to the true performance?

Evaluation of a classifier with limited data

• Holdout method: use part of the data for training and the rest for testing.
• We may be lucky or unlucky: the training data or the test data may not be representative.
• Solution: run multiple experiments with disjoint training and test datasets in which each class is represented in roughly the same proportion as in the entire dataset.
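A stratified holdout split, repeated with different random seeds, is one way to carry out the solution above. This is a minimal sketch assuming labels are given as a list; `stratified_holdout` and the 90/10 labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split sample indices into disjoint train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        k = round(len(indices) * test_fraction)   # test samples drawn from this class
        test.extend(indices[:k])
        train.extend(indices[k:])
    return sorted(train), sorted(test)

labels = [0] * 90 + [1] * 10
for seed in range(3):   # repeated experiments with different random splits
    train, test = stratified_holdout(labels, 0.3, seed)
    print(seed, len(train), len(test), sum(labels[i] for i in test))
```

Every split keeps the 9:1 class ratio in both parts (27 and 3 in the test set), so no single split can leave the rare class out of testing.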

Classifier evaluation

[Figure: labeled data (Data, Label) is split into training data and testing data. A classifier is trained on the training data; we pretend we don't know the labels of the testing data, classify it, and compare the predicted labels to the actual labels.]

Comparing algorithms

[Figure: model 1 and model 2 each classify the same labeled test data; comparing each model's predicted labels with the true labels gives evaluation score 1 and score 2.]

Is model 2 better than model 1? Model 2 is better if score 2 > score 1.

When would we want to do this type of comparison?

Is model 2 better?
• Model 1: 85% accuracy; Model 2: 80% accuracy
• Model 1: 85.5% accuracy; Model 2: 85.0% accuracy
• Model 1: 0% accuracy; Model 2: 100% accuracy

Comparing scores: significance

• Just comparing scores on one dataset isn't enough!
• We don't just want to know which system is better on one particular dataset; we want to know if model 1 is better than model 2 in general.
• Put another way, we want to be confident that the difference is real and not just due to random chance.

How do we know how variable a model's accuracy is?

Variance of performance

• We need multiple accuracy scores!
• How can we get them?

Repeated experimentation

[Figure: labeled data split into training data and testing data.]

Instead of one evaluation with a particular split of training and test data, run multiple evaluations with different splits of training and test data.

K-fold cross validation

• Break the data into K equal-sized parts.
• Repeat for all parts/splits: train on K − 1 parts and evaluate on the other.

[Figure: splits 1, 2, 3; evaluating on each split gives score 1, score 2, score 3.]

K-fold cross validation

• Better utilization of labeled data
• More robust: don't just rely on one evaluation set to evaluate the approach (or for optimizing parameters)
• Multiplies the computational overhead by K (we have to train K models instead of just one)
• 10 is the most common choice of K

Estimating the performance of a classifier: K-fold cross-validation

Partition the data (multi)set S into K equal parts S_1, ..., S_K with roughly the same class distribution as S.

    Error_c ← 0
    For i = 1 to K do
        S_Train ← S − S_i
        S_Test ← S_i
        α ← Learn(S_Train)
        Error_c ← Error_c + Error(α, S_Test)
    Error ← Error_c / K
    Output(Error)
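The pseudocode above can be sketched in Python as follows. Folds are made stratified by dealing each class's shuffled samples round-robin across the folds, which is one simple way (an assumption, not the slides' prescription) to give each part roughly the class distribution of S. `learn` and `error` are caller-supplied functions, and the majority-class learner used here is hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k, seed=0):
    """Partition indices into k folds with roughly the class distribution of the whole set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:              # deal samples round-robin across the folds
            folds[position % k].append(i)
            position += 1
    return folds

def cross_validate(X, y, learn, error, k=10, seed=0):
    """Error_c / K, where Error_c sums Error(alpha, S_test) over the K folds."""
    folds = stratified_folds(y, k, seed)
    error_c = 0.0
    for i in range(k):
        test = set(folds[i])
        train = [j for j in range(len(y)) if j not in test]
        alpha = learn([X[j] for j in train], [y[j] for j in train])
        error_c += error(alpha, [X[j] for j in folds[i]], [y[j] for j in folds[i]])
    return error_c / k

# Hypothetical learner: always predicts the majority class of its training labels.
def learn_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def error_majority(alpha, X_test, y_test):
    return sum(t != alpha for t in y_test) / len(y_test)

X = list(range(100))
y = [0] * 80 + [1] * 20
print(cross_validate(X, y, learn_majority, error_majority, k=10))
```

With 80 samples of class 0 and 20 of class 1, every fold holds 8 and 2 of them, so the estimate is an error of about 0.2.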

Estimating classifier performance

Recommended procedure:
• Use K-fold cross-validation (K = 5 or 10) to obtain performance estimates (accuracy, precision, recall, points on the ROC curve, etc.) and 95% confidence intervals around the mean.
• Compute mean values of the performance estimates and their standard deviations.
• Report mean values of the performance estimates and their standard deviations or 95% confidence intervals around the mean.
• Be skeptical: repeat experiments several times with different random splits of the data into K folds!
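A sketch of the reporting step, assuming the K fold scores are already available. The interval uses Student's t with K − 1 degrees of freedom; the two critical values are standard table entries hard-coded here (an assumption for K = 5 and K = 10 only), and `summarize` is a hypothetical name.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (K - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262}

def summarize(scores):
    """Mean, sample standard deviation, and 95% t-interval for K fold scores."""
    k = len(scores)
    if k - 1 not in T_CRIT_95:
        raise ValueError("add the t critical value for K - 1 degrees of freedom")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)                 # divides by K - 1
    half_width = T_CRIT_95[k - 1] * sd / math.sqrt(k)
    return mean, sd, (mean - half_width, mean + half_width)

# Ten fold accuracies (the model 1 scores from one of the comparison tables in these notes)
fold_accuracies = [84, 83, 78, 80, 82, 79, 83, 83, 85, 83]
mean, sd, ci = summarize(fold_accuracies)
print(f"{mean:.1f} ± {sd:.1f} (95% CI {ci[0]:.1f} to {ci[1]:.1f})")
```

For these scores the mean is 82 with a standard deviation of about 2.3, and the 95% interval runs from roughly 80.4 to 83.6.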

Leave-one-out cross validation

• K-fold cross validation where K = number of samples
• aka "jackknifing"
• Pros/cons?
• When would we use this?

Leave-one-out cross-validation

• K-fold cross validation with K = n, where n is the total number of samples available
• n experiments, each using n − 1 samples for training and the remaining sample for testing
• Leave-one-out cross-validation does not guarantee the same class distribution in training and test data!

Extreme case: 50% class 1, 50% class 2, and a classifier that predicts the majority class label in the training data.
• True error: 50%
• Leave-one-out error estimate: 100%! Each held-out sample's class is the minority class among the remaining n − 1 samples, so it is always misclassified.
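The extreme case can be reproduced in a few lines. This sketch uses a hypothetical majority-class learner and ten balanced labels.

```python
from collections import Counter

def majority_learner(train_labels):
    """Hypothetical classifier: predict the majority class of the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

def loo_error(labels):
    """Leave-one-out error estimate: train on n - 1 samples, test on the held-out one."""
    mistakes = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        if majority_learner(train) != held_out:
            mistakes += 1
    return mistakes / len(labels)

labels = [1] * 5 + [2] * 5     # 50% class 1, 50% class 2
print(loo_error(labels))       # 1.0
```

Holding out a class-1 sample leaves four class-1 and five class-2 samples, so the learner predicts class 2, and symmetrically for class 2.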

Leave-one-out cross validation

• Can be very expensive if training is slow and/or if there are a large number of examples
• Useful in domains with limited training data: maximizes the data we can use for training
• Some classifiers permit the estimation of the leave-one-out performance measure without actually having to train K models

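Comparing systems: sample 1

| split | model 1 | model 2 |
|---|---|---|
| 1 | 87 | 88 |
| 2 | 85 | 84 |
| 3 | 83 | 84 |
| 4 | 80 | 79 |
| 5 | 88 | 89 |
| 6 | 85 | 85 |
| 7 | 83 | 81 |
| 8 | 87 | 86 |
| 9 | 88 | 89 |
| 10 | 84 | 85 |
| average | 85 | 85 |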


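Comparing systems: what's the difference?

| split | model 1 | model 2 |
|---|---|---|
| 1 | 84 | 87 |
| 2 | 83 | 86 |
| 3 | 78 | 82 |
| 4 | 80 | 86 |
| 5 | 82 | 84 |
| 6 | 79 | 87 |
| 7 | 83 | 84 |
| 8 | 83 | 86 |
| 9 | 85 | 83 |
| 10 | 83 | 85 |
| average | 82 | 85 |
| stddev | 2.3 | 1.7 |

| split | model 1 | model 2 |
|---|---|---|
| 1 | 87 | 87 |
| 2 | 92 | 88 |
| 3 | 74 | 79 |
| 4 | 75 | 86 |
| 5 | 82 | 84 |
| 6 | 79 | 87 |
| 7 | 83 | 81 |
| 8 | 83 | 92 |
| 9 | 88 | 81 |
| 10 | 77 | 85 |
| average | 82 | 85 |
| stddev | 5.9 | 3.9 |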

Comparing systems: sample 4

| split | model 1 | model 2 | model 2 − model 1 |
|---|---|---|---|
| 1 | 80 | 82 | 2 |
| 2 | 84 | 87 | 3 |
| 3 | 89 | 90 | 1 |
| 4 | 78 | 82 | 4 |
| 5 | 90 | 91 | 1 |
| 6 | 81 | 83 | 2 |
| 7 | 80 | 80 | 0 |
| 8 | 88 | 89 | 1 |
| 9 | 76 | 77 | 1 |
| 10 | 86 | 88 | 2 |
| average | 83 | 85 | |
| stddev | 4.9 | 4.7 | |

Model 2 is never worse than model 1: it is better on 9 of the 10 splits and tied on split 7.


How do we decide if model 2 is better than model 1?

Statistical tests

Setup:
– Assume some default hypothesis about the data that you'd like to disprove, called the null hypothesis
– e.g., model 1 and model 2 are not statistically different in performance

Test:
– Calculate a test statistic from the data (often assuming something about the data)
– Based on this statistic, with some probability we can reject the null hypothesis, that is, show that it does not hold

t-test

Determines whether two samples are likely to come from distributions with the same mean.

t-test
• Null hypothesis: model 1 and model 2 accuracies are no different, i.e., they come from the same distribution
• Result: a p-value; low values indicate that a difference as large as the one observed would be unlikely if it were due to random chance alone

Calculating the t-test

For our setup, we'll do what's called a "paired t-test":
– The values can be thought of as pairs, where they were calculated under the same conditions
– In our case, the same train/test split
– Gives more power than the unpaired t-test (we have more information)

For almost all experiments, we'll do a "two-tailed" version of the t-test: http://en.wikipedia.org/wiki/Student's_t-test

p-value
• The result of a statistical test is often a p-value.
• p-value: the probability, assuming the null hypothesis holds, of obtaining a result at least as extreme as the one observed. If we reject the null hypothesis whenever p falls below a threshold α, then when the null hypothesis is in fact true we reject it incorrectly (i.e., we are wrong) with probability α.
• Common values to consider "significant": 0.05 (95% confident), 0.01 (99% confident), and 0.001 (99.9% confident)

Applying the paired t-test to the samples above:
• Sample 1 (averages 85 vs. 85): p = 1, so there is no evidence of a difference.
• The sample with averages 82 vs. 85 and standard deviations 5.9 vs. 3.9: p = 0.15, so the difference is not significant at the 0.05 level.
• The same test applies to the other two samples (averages 82 vs. 85 with standard deviations 2.3 vs. 1.7, and sample 4 with averages 83 vs. 85).
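The paired t statistic is the mean per-split difference divided by its standard error. This sketch uses only the standard library and compares |t| with the two-tailed 5% critical value for 9 degrees of freedom (2.262, a standard table entry hard-coded as an assumption) rather than computing an exact p-value; `paired_t` is a hypothetical name.

```python
import math
import statistics

def paired_t(scores1, scores2):
    """Paired t statistic and degrees of freedom for per-split score differences."""
    diffs = [b - a for a, b in zip(scores1, scores2)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1

# Sample 4 from the tables above.
model1 = [80, 84, 89, 78, 90, 81, 80, 88, 76, 86]
model2 = [82, 87, 90, 82, 91, 83, 80, 89, 77, 88]
t, df = paired_t(model1, model2)
T_CRIT = 2.262   # two-tailed, alpha = 0.05, df = 9 (standard t-table value)
print(f"t = {t:.2f}, df = {df}, significant at 0.05: {abs(t) > T_CRIT}")
```

For sample 4, t ≈ 4.64, well above 2.262: the small but consistent per-split gains matter more than the large spread of each model's scores. If SciPy is available, `scipy.stats.ttest_rel(model2, model1)` returns the same statistic together with a two-tailed p-value.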

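Statistical tests on test data

[Figure: labeled data (data with labels) is split into all-training data and test data; the all-training data is further split into training data and development data, where we can use cross-validation with a t-test.]

Can we do that here, on the test data?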

Bootstrap resampling

Given a test set t with n samples, do m times:
– sample n examples with replacement from the test set to create a new test set t′
– evaluate the model(s) on t′

Calculate a t-test (or other statistical test) on the collection of m results.

[Figure: bootstrap resampling. Test data t is resampled into Test′1, Test′2, ..., Test′m; model A is evaluated on each to give A score 1, ..., A score m, and model B to give B score 1, ..., B score m; the paired scores are then compared with a paired t-test (or other analysis).]
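A sketch of the resampling step, assuming each model's per-example correctness (1 or 0) on the test set t is already known. Both models are scored on the same resample t′ each time, so the m results form pairs that can be passed to a paired test. The function name and the 10-example data are hypothetical.

```python
import random

def bootstrap_scores(correct_a, correct_b, m=1000, seed=0):
    """Accuracy of models A and B on m bootstrap resamples t' of a test set t.

    correct_a[i] (correct_b[i]) is 1 if model A (B) classified test example i correctly.
    Both models are scored on the same resample, so the m results are paired.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same test set")
    rng = random.Random(seed)
    n = len(correct_a)
    scores_a, scores_b = [], []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]      # n draws with replacement
        scores_a.append(sum(correct_a[i] for i in idx) / n)
        scores_b.append(sum(correct_b[i] for i in idx) / n)
    return scores_a, scores_b

# Hypothetical per-example correctness on a 10-example test set.
a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
b = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
scores_a, scores_b = bootstrap_scores(a, b, m=200)
wins = sum(sb > sa for sa, sb in zip(scores_a, scores_b))
print(f"B beats A on {wins} of {len(scores_a)} resamples")
```

Because the resamples are shared, the per-resample differences, not just the two separate score distributions, carry the information a paired test uses.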

Experimentation good practices

Never look at your test data!

During development:
– Compare different models/hyperparameters on development data
– Use cross-validation to get more consistent results
– If you want to be confident in the results, use a t-test and look for p ≤ 0.05 (or even better)

For final evaluation, use bootstrap resampling combined with a t-test to compare models.

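Estimating the performance of a classifier

The true error of a hypothesis h with respect to a target function f and an instance distribution D is

$$\text{Error}_D(h) \equiv \Pr_{x \sim D}\left[f(x) \neq h(x)\right]$$

The sample error of a binary classifier h with respect to a target function f and a data sample S is

$$\text{Error}_S(h) \equiv \frac{1}{|S|}\sum_{x \in S} \delta\big(f(x), h(x)\big), \qquad \delta(a, b) = 1 \text{ iff } a \neq b; \quad \delta(a, b) = 0 \text{ otherwise}$$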

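Estimating classifier performance

Example: Domain(X) = {a, b, c, d}. The true error of h is the probability mass, under D, of the instances on which h and f disagree:

$$\text{error}(h) = \Pr_{x \sim D}\left[h(x) \neq f(x)\right] = \sum_{x \in \{a, b, c, d\}:\; h(x) \neq f(x)} D(X = x)$$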

Evaluating the performance of a classifier

• Sample error estimated from training data is an optimistic estimate:
$$\text{Bias} = E\left[\text{Error}_S(h)\right] - \text{Error}_D(h)$$
• For an unbiased estimate, h must be evaluated on an independent sample S (which is not the case if S is the training set!).
• Even when the estimate is unbiased, it can vary across samples! If h misclassifies 8 out of 100 samples,
$$\text{Error}_S(h) = \frac{8}{100} = 0.08$$
• How close is the sample error to the true error?

How close is the estimated error to the true error?
• Choose a sample S of size n according to distribution D.
• Measure Error_S(h).
• Error_S(h) is a random variable (the outcome of a random experiment).
• Given Error_S(h), what can we conclude about Error_D(h)?

More generally, given the estimated performance of a hypothesis, what can we say about its actual performance?

Evaluating performance when we can afford to test on a large independent test set


How close is the estimated accuracy to its true value?

Question: How close is p (the true probability) to p̂ (the observed proportion)?

This problem is an instance of a well-studied problem in statistics:
• The problem of estimating the proportion of a population that exhibits some property, given the observed proportion over a random sample of the population.
• In our case, the property of interest is that h correctly (or incorrectly) classifies a sample.
• Testing h on a single random sample x drawn according to D amounts to performing a random experiment, which succeeds if h correctly classifies x and fails otherwise.

We can model the output of a classifier h as a Bernoulli trial: if p denotes the true error of h, each test sample is misclassified with probability p (and correctly classified with probability 1 − p).

The number r of misclassifications observed in n trials is a random variable Y that follows the binomial distribution. The probability of observing r misclassified examples in a sample of size n is

$$P(r) = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$

Error_S(h) is a Random Variable

$$P(r) = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$

Recall basic statistics

Consider a random experiment with discrete-valued outcomes y_1, y_2, ..., y_M.
• The expected value of the corresponding random variable Y is
$$E[Y] \equiv \sum_{i=1}^{M} y_i \Pr(Y = y_i)$$
• The variance of Y is
$$\text{Var}(Y) \equiv E\left[(Y - E[Y])^2\right]$$
• The standard deviation of Y is
$$\sigma_Y \equiv \sqrt{\text{Var}(Y)}$$

The mean of a Bernoulli trial with success rate p is p, and its variance is p(1 − p). If n trials are taken from the same Bernoulli process, the observed success rate p̂ has the same mean p and variance

$$\frac{p(1 - p)}{n}$$

For large n, the distribution of p̂ follows a Gaussian distribution.

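Binomial Probability Distribution

$$P(r) = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$

is the probability P(r) of r heads in n coin flips, if p = Pr(heads).
• The expected, or mean, value of X is
$$E[X] \equiv \sum_{i=0}^{n} i\, P(i) = np$$
• The variance of X is
$$\text{Var}(X) \equiv E\left[(X - E[X])^2\right] = np(1 - p)$$
• The standard deviation of X is
$$\sigma_X \equiv \sqrt{E\left[(X - E[X])^2\right]} = \sqrt{np(1 - p)}$$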
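The mean and variance formulas can be checked numerically by summing over the probability mass function. This sketch uses `math.comb` (Python 3.8+); the values n = 100 and p = 0.08 are hypothetical, chosen to match the "8 errors in 100 samples" example.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """P(r): probability of r misclassifications in n test samples when the true error is p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 100, 0.08    # hypothetical: true error 0.08, test set of 100 samples
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * q for r, q in enumerate(pmf))
var = sum((r - mean) ** 2 * q for r, q in enumerate(pmf))
print(round(sum(pmf), 6), round(mean, 6), round(var, 6), round(sqrt(var), 4))
```

The probabilities sum to 1, and the computed mean and variance agree with np = 8 and np(1 − p) = 7.36.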

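Estimators, Bias, Variance, Confidence Interval

$\text{Error}_S(h) = \frac{r}{n}$ is an estimator for $p = \text{Error}_D(h)$, with

$$\sigma_{\text{Error}_S(h)} = \sqrt{\frac{\text{Error}_D(h)\,\big(1 - \text{Error}_D(h)\big)}{n}} \approx \sqrt{\frac{\text{Error}_S(h)\,\big(1 - \text{Error}_S(h)\big)}{n}}$$

An N% confidence interval for some parameter p is an interval that is expected, with probability N%, to contain p.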
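Combining the approximate standard deviation above with the Normal approximation gives the familiar interval Error_S(h) ± z_N · σ. This is a sketch of that standard construction, not a formula stated on the slides. It uses `statistics.NormalDist` (Python 3.8+) to obtain z_N, and it checks the np(1 − p) ≥ 5 rule with the observed error substituted for p, which is an assumption. `error_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist

def error_interval(errors, n, confidence=0.95):
    """Approximate N% confidence interval for Error_D(h) from r = errors out of n test samples."""
    e = errors / n
    if n * e * (1 - e) < 5:    # observed error stands in for the unknown p here
        raise ValueError("normal approximation is questionable: n * e * (1 - e) < 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width

print(error_interval(8, 100))   # h misclassifies 8 of 100 samples
```

For 8 errors in 100 samples, the 95% interval is roughly 0.027 to 0.133: the estimate of 0.08 is quite uncertain at this sample size.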

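Normal distribution approximates binomial

Error_S(h) = r/n follows a binomial distribution (scaled by 1/n), with
• mean
$$\mu_{\text{Error}_S(h)} = \text{Error}_D(h)$$
• standard deviation
$$\sigma_{\text{Error}_S(h)} = \sqrt{\frac{\text{Error}_D(h)\,\big(1 - \text{Error}_D(h)\big)}{n}}$$

We can approximate this by a Normal distribution with the same mean and variance when np(1 − p) ≥ 5.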

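Normal distribution

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

• The expected, or mean, value of X is given by E[X] = µ.
• The variance of X is given by Var(X) = σ².
• The standard deviation of X is given by σ_X = σ.
• The probability that X will fall in the interval (a, b) is given by
$$\int_a^b p(x)\, dx$$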

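How close is the estimated accuracy to its true value?

Let the probability that a Gaussian random variable X, with zero mean and unit variance, takes a value between −z and z be c: Pr[−z ≤ X ≤ z] = c. By symmetry, c = 1 − 2 Pr[X ≥ z].

| Pr[X ≥ z] | z |
|---|---|
| 0.001 | 3.09 |
| 0.005 | 2.58 |
| 0.01 | 2.33 |
| 0.05 | 1.65 |
| 0.10 | 1.28 |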

How close is the estimated accuracy to its true value?

But p̂ does not have zero mean and unit variance, so we normalize to get

$$\Pr\left[-z < \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} < z\right] = c$$

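How close is the estimated accuracy to its true value?

To find confidence limits: given a particular confidence figure c, use the table to find the z corresponding to the probability ½(1 − c). Use linear interpolation for values not in the table. Solving the inequality above for p gives the confidence limits

$$p = \left(\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}}{n} - \frac{\hat{p}^2}{n} + \frac{z^2}{4n^2}}\right) \Bigg/ \left(1 + \frac{z^2}{n}\right)$$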
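The limits above (usually known as the Wilson score interval) are straightforward to compute. In this sketch, `statistics.NormalDist().inv_cdf` replaces the table lookup and linear interpolation, which is an assumption rather than the slides' procedure. The function name and the example values (p̂ = 0.75, n = 1000, c = 0.80) are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(p_hat, n, c):
    """Confidence limits for the true proportion p, given observed p_hat over n trials."""
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)   # z with Pr[X >= z] = (1 - c) / 2
    center = p_hat + z * z / (2 * n)
    spread = z * math.sqrt(p_hat / n - p_hat * p_hat / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - spread) / denom, (center + spread) / denom

lo, hi = wilson_interval(0.75, 1000, 0.80)
print(round(lo, 3), round(hi, 3))
```

With an observed accuracy of 75% on 1000 test samples and c = 80% (z ≈ 1.28), the true accuracy lies between about 0.732 and 0.767. The interval is not exactly centered on p̂ because of the z²/2n term.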
